Results
A total of 27 longitudinal vaginal septum patients (mean age ± SD, 29 ± 6 years), with 164 ultrasonographic images and 43 oblique vaginal septum patients (mean age ± SD, 21 ± 8 years) with 262 ultrasonographic images were finally included for constructing deep learning models (Table S3).
We constructed three DL models based on 70 cases with 426 ultrasonographic images. In the validation set, ConvNeXt-B had the best AUCof 0.897 (variance, 0.005, [95%CI, 0.843–0.949]) with highest accuracy of 0.834 (variance, 0.008, 95%CI, 0.762–0.905) and highest specificity of 0.845 (variance, 0.010, [95%CI, 0.778–0.911]). ResNet50 had an AUC of 0.876 (variance, 0.004, [95%CI, 0.813–0.936]) secondary only to ConvNeXt-B, with the best sensitivity of 0.829 (variance, 0.019, [95%CI, 0.769–0.890]). Relatively, the AUC of ViT-B/16 was only 0.664 (variance, 0.011, [95%CI, 0.567–0.761]), junior to both ResNet50 and ConvNeXt-B, with the worst performance, of which the accuracy, sensitivity and specificity were all lower than 0.800 (Table 1 ).
Table 1 Discriminative performance of image-based DL models according to 5-fold cross-validation Model Dataset Accuracy [Variance, (95%CI)] Sensitivity [Variance, (95%CI)] Specificity [Variance, (95%CI)] AUC [Variance, (95%CI)] ResNet50 Training 0.999 (0.000, [0.997–1.000]) 0.998 (0.000, [0.996–1.000]) 1.000 (0.000, [1.000–1.000]) 1.000 (0.000, [1.000–1.000]) ResNet50 Validation 0.813 (0.004, [0.734–0.892]) 0.829 (0.019 , [0.769–0.890]) 0.790 (0.007, [0.707–0.872]) 0.876 (0.004, [0.813–0.936]) ConvNeXt-B Training 1.000 (0.000, [1.000–1.000]) 1.000 (0.000, [1.000–1.000]) 1.000 (0.000, [1.000–1.000]) 1.000 (0.000, [1.000–1.000]) ConvNeXt-B Validation 0.834 (0.008 , [0.762–0.905]) 0.823 (0.010, [0.758–0.888]) 0.845 (0.010 , [0.778–0.911]) 0.897 (0.005 , [0.843–0.949]) ViT_B/16 Training 0.999 (0.000, [0.998–1.000]) 0.999 (0.000, [0.998–1.000]) 1.000 (0.000, [1.000–1.000]) 0.999 (0.000, [0.998–1.000]) ViT_B/16 Validation 0.650 (0.006, [0.551–0.749]) 0.706 (0.013, [0.614–0.797]) 0.578 (0.006, [0.476–0.680]) 0.664(0.011, [0.567–0.761]) Abbreviation : DL deep learning, 95%CI 95% confidence interval, AUC area under the curve
Discriminative performance of image-based DL models according to 5-fold cross-validation
Abbreviation : DL deep learning, 95%CI 95% confidence interval, AUC area under the curve
Further, multiple comparisons of diagnostic performance on validation set between any two of the three models were conducted. The accuracy and AUC of ConvNextB in image-level significantly outperformed ViT-B/16 with P value of 0.046 and 0.028, respectively. But there was no significant difference of the four metrics between ResNet50 and ViT-B/16 or ResNet50 and ConvNextB (Table 2 ).
We also considered combining the three DL models. The combined DL model with AUC of 0.891 (variance, 0.005, [95%CI, 0.837–0.944]) did not outperformed ConvNeXt-B at image-level (Table. S4).
Table 2 Comparison of the discriminative performance of image-based DL models on validation set Model 1 Model 2 P _value P _value P _value P _value (Accuracy) (Sensitivity) (Specificity) (AUC) ResNet50 ViT_B/16 0.168 0.517 0.395 0.094 ConvNeXt-B ViT_B/16 0.046 0.517 0.09 0.028 ResNet50 ConvNeXt-B 0.237 0.916 0.527 0.295 Abbreviation : DL deep learning, AUC area under the curve
Comparison of the discriminative performance of image-based DL models on validation set
Abbreviation : DL deep learning, AUC area under the curve
Besides, we constructed the DL models in case-based manner using 5-fold cross-validation. The ROC on validation set in each fold of the 5-fold cross-validation is shown in Fig. 4 a. It’s obvious that the AUCs of the models had variations among different data divisions. Therefore, for each performance metric of the model, the average of the metric values on validation set of each fold is considered as the final metric. Remarkably, ConvNeXt-B demonstrated the best AUC of 0.897 (variance, 0.004, [95%CI, 0.734–1.000]), with highest accuracy of 0.817 (variance, 0.004, [95%CI, 0.598–0.994]), sensitivity of 0.817 (variance, 0.016, [95%CI, 0.632–0.984]), and specificity of 0.811 (variance, 0.017, [95%CI, 0.622–0.979]). ResNet50 had a median AUC of 0.842 (variance, 0.004, 95%CI, 0.639–0.997) and relatively worse accuracy of 0.772 (variance, 0.005, 95%CI, 0.534–0.988), sensitivity of 0.808 (variance, 0.030, 95%CI, 0.631–0.960), and specificity of 0.709 (variance, 0.041, 95%CI, 0.505–0.905). Still, ViT-B/16 showed the worst performance with its all metrics less than 0.800 (Table 3 ). Fig. 4 ROC of DL models on validation set according to 5-fold cross-validation. a . ROC of cv1 on validation set; b . ROC of cv2 on validation set; c . ROC of cv3 on validation set; d . ROC of cv4 on validation set; e . ROC of cv5 on validation set. ROC, the receiver operating curve. DL, deep learning
ROC of DL models on validation set according to 5-fold cross-validation. a . ROC of cv1 on validation set; b . ROC of cv2 on validation set; c . ROC of cv3 on validation set; d . ROC of cv4 on validation set; e . ROC of cv5 on validation set. ROC, the receiver operating curve. DL, deep learning
Table 3 Discriminative performance of case-based DL models according to 5-fold cross-validation Model Dataset Accuracy [Variance, (95%CI)] Sensitivity [Variance, (95%CI)] Specificity [Variance, (95%CI)] AUC [Variance, (95%CI)] ResNet50 Training 1.000 (0.000, [1.000–1.000]) 1.000 (0.000, [1.000–1.000]) 1.000 (0.000, [1.000–1.000]) 1.000 (0.000, [1.000–1.000]) ResNet50 Validation 0.772 (0.005, [0.534–0.988]) 0.808 (0.030, [0.631–0.960]) 0.709 (0.041, [0.505–0.905]) 0.842 (0.004, [0.639–0.997]) ConvNeXt-B Training 1.000 (0.000, [1.000–1.000]) 1.000 (0.000, [1.000–1.000]) 1.000 (0.000, [1.000–1.000]) 1.000 (0.000, [1.000–1.000]) ConvNeXt-B Validation 0.817 (0.004 , [0.598–0.994]) 0.817 (0.016 , [0.632–0.984]) 0.811 (0.017 , [0.622–0.979]) 0.897 (0.004 , [0.734–1.000]) ViT_B/16 Training 1.000 (0.000, [1.000–1.000]) 1.000 (0.000, [1.000–1.000]) 1.000 (0.000, [1.000–1.000]) 1.000 (0.000, [1.000–1.000]) ViT_B/16 Validation 0.668 (0.014, [0.407–0.920]) 0.725 (0.040, [0.522–0.925]) 0.572 (0.024, [0.304–0.831]) 0.681 (0.030, [0.434–0.908]) Abbreviation : DL deep learning, 95%CI 95% confidence interval, AUC area under the curve
Discriminative performance of case-based DL models according to 5-fold cross-validation
Abbreviation : DL deep learning, 95%CI 95% confidence interval, AUC area under the curve
We then compared four metrics of case-based DL models in validation set (Table 4 ). Except the sensitivity, the accuracy, specificity and AUC of ResNet50 significantly outperformed ViT-B/16 ( P = 0.016, P = 0.032 and P < 0.01, respectively). Likewise, compared to ViT-B/16, ConvNeXt-B had significantly greater accuracy ( P = 0.036), specificity ( P < 0.01) and AUC ( P < 0.01), but the sensitivity ( P = 0.249). The four metrics of ConvNeXt-B and ResNet5 showed no significant differences ( P ˃0.05) (Table 4 ).
To figure out whether the combined DL model will behave better with case classification. We analyzed its case-level performance. Still, the combined DL model with specificity of 0.743 (variance, 0.022, [0.533–0.945]) and AUC of 0.891 (variance, 0.003, [95%CI, 0.736–1.000]), failed to identify the cases as well as ConvNeXt-B (Table. S4).
Table 4 Comparison of the discriminative performance of case-based DL models in validation set Model 1 Model 2 P _value P _value P _value P _value (Accuracy) (Sensitivity) (Specificity) (AUC) ResNet50 ViT_B/16 0.016 0.151 0.032 < 0.01 ConvNeXt-B ViT_B/16 0.036 0.249 < 0.01 < 0.01 ResNet50 ConvNeXt-B 0.834 0.753 0.548 0.841 Abbreviation : DL deep learning, AUC area under the curve
Comparison of the discriminative performance of case-based DL models in validation set
Abbreviation : DL deep learning, AUC area under the curve
Since the models had relatively better performance on the validation set in the third fold of the 5-fold cross-validation, we visualized the deep features based on the models trained in the third fold (Fig. 5 ). Fig. 5 Feature map of representative images on validation set according to 5-fold cross-validation. An image of longitudinal vaginal septum and an image of oblique vaginal septum were randomly selected to show the focused region of ConvNeXt-B, ResNet50 and ViT-B/16. DL, deep learning
Feature map of representative images on validation set according to 5-fold cross-validation. An image of longitudinal vaginal septum and an image of oblique vaginal septum were randomly selected to show the focused region of ConvNeXt-B, ResNet50 and ViT-B/16. DL, deep learning
Materials
This is a retrospectively analytic study. We collected accessible cases with a diagnosis of longitudinal vaginal septum or oblique vaginal septum in the Department of Obstetrics and Gynecology, Tongji Hospital, Wuhan between January 2013 and January 2023. The ultrasonographic images were digitally stored in the computer center of our hospital. Patients with ultrasonographic images for analysis must simultaneously meet the following criteria for inclusion: (1) patients surgically diagnosed with a longitudinal or oblique vaginal septum; (2) patients with gynecological ultrasound imaging record indicating longitudinal/oblique vaginal septum in our hospital; (3) only preoperative ultrasound imaging of longitudinal/oblique vaginal septum will be adopted; (4) only B-mode images will be used. The exclusion criteria are as follows: (1) patients without longitudinal/oblique vaginal septum; (2) patients without ultrasound imaging records; (3) postoperative ultrasound imaging; (4) ultrasound imaging during pregnancy; (5) ultrasound images that are not in B-mode or with low quality. The flowchart of data collection and inclusion is shown in Fig. 1 . Fig. 1 Flowchart of data collection for multiple DL models construction. Case data and ultrasonographic images were collected after receiving the ethical approval in Tongji hospital, Wuhan. Strict exclusion criteria were established to enroll the data. 5- fold cross-validation was used to construct deep learning models. DL, deep learning
Flowchart of data collection for multiple DL models construction. Case data and ultrasonographic images were collected after receiving the ethical approval in Tongji hospital, Wuhan. Strict exclusion criteria were established to enroll the data. 5- fold cross-validation was used to construct deep learning models. DL, deep learning
The original ultrasonographic images contain noisy information at the edge that may result in unsatisfactory performance of the DL models. Besides, previous studies have revealed that some key features like hematocolpos were beneficial to define oblique vaginal septum [ 8 ]. The longitudinal and oblique vaginal septa were frequently accompanied by uterus anomalies, such as septate uterus and uterus didelphys [ 25 , 26 ]. Therefore, the images were segmented by using ITK-SNAP 4.0 with a red minor rectangular or trapezoid mask to annotate the region of interest (ROI) containing uterus and vagina and simultaneously to exclude extra interfering information as much as possible (Fig. 2 ). Fig. 2 Illustration of ultrasound images segmentation of longitudinal and oblique vaginal septa. Original ultrasound images of longitudinal and oblique vaginal septa were collected. Then red mask was used to mark the region of uterus and vagina, which was defined as the region of interest and taken as input information of the deep learning models. ROI, region of interest
Illustration of ultrasound images segmentation of longitudinal and oblique vaginal septa. Original ultrasound images of longitudinal and oblique vaginal septa were collected. Then red mask was used to mark the region of uterus and vagina, which was defined as the region of interest and taken as input information of the deep learning models. ROI, region of interest
Three DL models were selected for image classification, among which ResNet50 is the most commonly used CNN-based DL model [ 23 ]. ViT-B/16 is a transformer-based DL model, and by co-opting partial idea of transformer, ConvNeXt-B can achieve a similar effect through depthwise convolution [ 24 , 27 ].
ResNet50 is a representative network of the ResNets. As shown in Fig. 3 a, ResNet50 consists of six layers, including input layer, convolution layer, pooling layer, residual block, global average pooling layer, and output layer. The core idea is to introduce short-circuit connections to convert learning objectives into residuals, thereby reducing the impact on model performance when increasing network depth. Fig. 3 DL model neural network architectures for discriminating longitudinal and oblique vaginal septa. a . The neural network architecture of ResNet50; b . The neural network architecture of ConvNeXt-B; c . The neural network architecture of ViT-B/16. DL, deep learning
DL model neural network architectures for discriminating longitudinal and oblique vaginal septa. a . The neural network architecture of ResNet50; b . The neural network architecture of ConvNeXt-B; c . The neural network architecture of ViT-B/16. DL, deep learning
ResNet50 is one of the most widely used deep learning models in image classification, of which the advantage is that it solves the gradient vanishing problem in deep networks through residual connections, allowing deeper networks to be trained, and showing good performance on a variety of tasks and data sets. But due to more reliance on capturing local features, it is weak in capturing global features of images.
ConvNeXt comes in multiple versions, with variations in the number of channels and blocks within each stage. ConvNeXt is composed of an image preprocessing layer, a processing layer, and a classification layer. Its core lies in the processing layer, which contains four stages. Each stage repeats several blocks, and the number of repetitions defines the depth of the model. The aim is to extract image features through a series of convolutional operations. In ConvNeXt-B, the input channel counts for the four stages respectively are 128, 256, 512 and 1024, and the number of block repetitions respectively are 3, 3, 27 and 3 (Fig. 3 b).
Compared with ResNet50, the advantage of ConvNeXt-B is that it uses large convolution kernels to expand the receptive field, thereby capturing more contextual information. At the same time, through the introduction of deep separable convolution, the model maintains high performance while relatively reducing the consumption of computing resources. The disadvantage is that the use of large convolution kernels requires more memory usage.
ViT-B/16 is a variant of ViT-B, with a patch size of 16 × 16, and ViT-B is the base resolution variant in the ViT model. The core modules of the ViT model consist of three parts: the embedding layer, the transformer encoder, and the multilayer perceptron (MLP) head. In the embedding layer, the input image is divided into a series of image patches, and each image patch is represented as a sequence of vectors after undergoing a linear mapping. In the transformer encoder, ViT utilizes multiple transformer encoder layers to process the input sequence of image patches. Each transformer encoder layer comprises a self-attention mechanism and a feed-forward network. The self-attention mechanism captures the relationships between image patches and generates context-aware feature representations for them. In the MLP head, the output of the ViT model is used for classification prediction through an additional linear layer. Typically, a global average pooling operation is applied to the output of the last transformer encoder layer, aggregating the features of the image patch sequence into a global feature vector, which is then classified using a linear layer (Fig. 3 c).
The advantage of ViT-B/16 is that the transformer architecture used can capture global dependencies in images through the self-attention mechanism. Since it does not rely on convolutional layers, the model can be easily extended to different image sizes and resolutions; its disadvantage is requiring more data to train. After the ultrasound image is input into the model, the features extracted by each layer of the ViT-B/16 network are a mixture of local and global information, while the ResNet50 and ConvNeXt-B networks extract local features first and then refine global features using local features.
We constructed DL models for the differential diagnosis of longitudinal vaginal septum and oblique vaginal septum using 5-fold cross-validation. The size of each input ultrasound image was uniformly scaled to 224 × 224 and normalized for each channel. In 5-fold cross-validation, all patients were divided into five parts, with ultrasound images of the same individual in the same part, and the proportion of individuals with longitudinal vaginal septum and oblique vaginal septum kept constant in each part. During the training of each fold, one part was selected as the validation set, and the other four parts were used as the training set. Besides, the hyperparameters including learning rate for model training were set as shown in Table S1. The optimizer used is Adam, which calculates the first and second moments of the gradient to adjust the learning rate for each parameter of models. Betas are the exponential decay factors in Adam that can be used to control the adjustment of learning rate. Eps is a constant added to the denominator to prevent division by zero errors, and weight-decay is the L2 regularization term, which is used to control the magnitude of parameters to prevent overfitting. The hyperparameter epoch was adjusted to a suitable value so that the loss of the training set reached plateau. During the training process, the optimal model parameters were selected according to the highest accuracy achieved on the training set.
In addition, we further considered combining the three well-trained DL models, using the outputs of each model as inputs for the XGBoost classification algorithm to construct a combined classification model. By utilizing the Optuna library in Python, we can search the XGBoost hyperparameter space based on the Bayesian optimization algorithm, and automatically obtain the best model hyperparameters. The parameter settings of Optuna are as shown in Table S2.
To visually compare the differences between different models, the Grad-CAM package in Python was used to show the local focused regions of each model.
Quantitative data were compared using t test or Mann-Whitney U while qualitative data were compared using chi-square test. Receiver operating characteristic (ROC) curves were plotted to evaluate and compare the performance of DL models. Four metrics including accuracy, sensitivity, specificity and AUC were selected to quantify the performance of the models. 95% confidence intervals (95%CIs) of the numeric variables were calculated. The average value of these metrics was taken as the final performance estimated value. The difference was considered statistically significant with P < 0.05. The data was analyzed using Python 3.10.8.
Discussion
In this study, we totally introduced three DL deep learning algorithms to establish ultrasonographic diagnostic models to classify longitudinal vaginal septum and oblique vaginal septum. Importantly, we found that ConvNeXt-B showed the best capability to discriminate these two vaginal anomalies with the highest AUC of 0.897.
The longitudinal vaginal septum and oblique vaginal septum clinically belongs to two classifications of female vaginal anomalies [ 4 ]. Longitudinal vaginal septum may be asymptomatic in adolescence, thus avoiding medical examination, which will cause fertility problems and make the transvaginal delivery challenging in women of child-bearing age. Relatively, oblique vaginal septum will severely impair the patients’ life quality and reproductive ability, with earlier onset and more complications. Besides, oblique vaginal septum can be divided into three subtypes: Type I, with a complete closure on obstructed-side vagina; Type II, patients with a small hole on the septum; Type II, with the presence of a fistula between two-sided cervixes. Type I patients tend to suffer from heavy dysmenorrhea and therefore are early diagnosed while type II and type III patients may encounter a delayed diagnosis for incomplete obstruction [ 5 , 8 ].
Auxiliary imaging like ultrasonography and magnetic resonance imaging has opened a novel era of non-invasive diagnosis of congenital female reproductive tract anomalies [ 28 ]. Nowadays, for the integrated advantages, 3D-US is a clinically frequently selected diagnostic imaging approach for female reproductive malformations. However, due to factors like confusing manifestations and complex anatomical structures, sonographers are confronted with a series of difficulties when using 3D-US to classify longitudinal and oblique vaginal septa. For instance, patients with longitudinal vaginal septum can be correlated with uterus malformations, similar to patients with oblique vaginal septum [ 29 – 31 ]. In addition, patients with type II/III oblique vaginal septum can discharge menstrual blood through the small septal hole or the cervix fistula, thus complaining alleviated obstructive symptoms, which are likely to be confused with longitudinal vaginal septum [ 5 ]. Simultaneously, both the longitudinal and the oblique vaginal septa derive from alternative embryologic development of müllerian duct [ 29 ]. Clinicians and sonographers who lack the knowledge of embryology and mullerian duct anomalies and lack clinical exposure to related cases, may fail to distinguish longitudinal and oblique vaginal septa. As is reported in the literature and presented in Table.S2, oblique vaginal septum started to disturb the females after menarche [ 5 ], when transabdominal ultrasound imaging may be more suitable for examination but may produce unclear images if the patients were with visceral obesity or did not cooperate well. Moreover, without pre-existing indicative clinical symptoms, even experienced sonographer may omit the vaginal anomalies when the ultrasonic probe can be pushed forward successfully. In addition, lower-level or local hospital are scarcely exposed to cases of vaginal anomalies and usually poorly equipped, where a reliable diagnostic tool may greatly help. Taking the above issues into consideration, it’s imperative to promote the diagnostic performance of ultrasound imaging for discriminating longitudinal and oblique vaginal septa.
As is reported, compared to CNN-based models, the transformer-based networks perform well in various domains through the self-attention mechanism and global information processing capability [ 27 ]. However, in this study, we found that ViT-B/16 does not perform as well as CNN-based models in all metrics and takes the longest time for training to reach convergence. In contrast, ConvNeXt-B not only performed better than ViT-B/16 in term of accuracy, sensitivity, specificity and AUC, but also can converge quickly in the training process.
Transformer was initially used in natural language processing and later its ideas were introduced to the field of image analysis, which is more suitable for tasks of global dependency like text translation. In this study, we aimed to identify the vaginal anomalies of ultrasonographic images in a DL supportive manner. However, given that the ROI of the ultrasonographic images of longitudinal and oblique vaginal septum can’t be in detail depicted manually, we almost took the whole picture as input data. Therefore, the extra noisy information may lead to the poor performance of the DL models, especially the transformer-based model (ViT-B/16). Notably, CNN-based DL models are designed to capture the local dependency, thus achieving satisfactory performance in the image recognition task. ConvNeXt-B can capture features of variable scales through optimizing the traditional pooling layers and prompt the performance of network by co-opting from ResNets and by combing with features of variable scales. As we can see, compared to other DL models, ConvNeXt-B can reach convergence and show relatively better capability of classification with only several epoch. We further tried to combine the three DL models. Compared with ConvNeXt-B, which has the best performance as a single model, the AUC of the combined model in both image-based and case-based recognition decreased, indicating that ensemble learning may be not necessary to improve the performance of ConvNeXt-B model.
To the best of our knowledge, this is the first time that deep learning models were used to identify longitudinal vaginal septum and oblique vaginal septum. ConvNeXt-B demonstrated the most outstanding performance with the AUC of 0.897, indicating its potential of accurate discriminative diagnosis of longitudinal vaginal septum and oblique vaginal septum in clinical application.
But there are still some limitations. The sample size seems relatively insufficient due to the low incidence of longitudinal and oblique vaginal septum. Besides, ultrasound images with normal vagina weren’t enrolled to assess the ability of discerning vaginal anomalies. Because in clinical practice, sonographers do not capture and store the images of healthy vagina unless they have a preliminary diagnosis of vaginal anomalies before ultrasound imaging. What’s more, we didn’t enroll multicenter cases and images to establish and validate the DL models. In the future, multicenter data with larger sample-size will be collected to improve the robustness of the DL models. In addition, the inclusion criteria of image quality should be clearly defined and better conducted. Concerning the image segmentation, we utilized ITK-SNAP 4.0 to depict the ROI manually. If possible, computer-assisted image segmentation will be conducted to automatically extract the ROI in our following studies. Using the DL algorithms, we are aiming to achieve accurate classification of other MDAs, like uterus septum.
In conclusion, ConvNeXt-B shows good performance on the discriminating of longitudinal vaginal septum and oblique vaginal septum and may have promising application in DL aided ultrasonographic diagnostic network, further benefiting the establishment of appropriate clinical decision-making.
Introduction
Müllerian duct anomalies (MDAs) are a collection of female congenital malformations probably involving uterus, cervix and vagina, with an incidence of about 5%, which have negative impacts on women’s physiology and psychology [ 1 ]. It’s reported that the prevalence of MDAs varies from 6 to 15% in women disturbed by fertility concerns and recurrent miscarriages, rising to 25% in women suffering both infertility and recurrent pregnancy loss [ 2 , 3 ]. Oblique vaginal septum belongs to an important subgroup of vaginal malformations of MDAs. According to the American Society for Reproductive Medicine (ASRM) classification of MDAs, it is defined as oblique vaginal septum when asymmetric septum and obstructed hemivagina occur, frequently co-existent with didelphic uterus/septate uterus and ipsilateral renal agenesis [ 4 – 6 ]. The clinical presentations of oblique vaginal septum are quite diverse and severe, such as dysmenorrhea, vaginal/pelvic mass and cyclic lower abdomen pain, etc [ 5 , 7 , 8 ]. Relatively, longitudinal vaginal septum is another main group of vaginal malformations, with a septum of variable length, and patients with longitudinal vaginal septum are commonly asymptomatic or present with dyspareunia [ 9 ]. Hence, considering the reproductive threatening and pain bearing, precisely identifying the longitudinal and oblique vaginal septa is of much importance.
In clinical practice, three-dimensional ultrasonography (3D-US) tends to be the diagnostic imaging of first-choice for vaginal anomalies, with comparable accuracy, greater accessibility, better operation convenience and lower economic cost [ 10 – 12 ]. However, due to the rarity of incidence, complexity of anatomical structures, and confusion of clinical manifestations of oblique vaginal septum and longitudinal vaginal septum, the inexperienced sonographers are likely to make a missed diagnosis and misdiagnosis, leading to the delay in treatment and increased risk of some severe complications including endometriosis, pelvic adhesions and infertility [ 13 – 15 ]. So, there is an urgent need to optimize the ultrasonography based diagnostic modality of longitudinal vaginal septum and oblique vaginal septum, to aid the sonographers make precise diagnosis and the clinicians make appropriate interventional decision-making.
In the past decade, deep learning (DL) technology has played an increasingly crucial role in medical images analysis [ 16 , 17 ]. CNN as a powerful DL algorithm, has shown outstanding talent to acquire the representative feature of images automatically, making DL-assisted diagnostic decision-making viable. It’s reported that about 80% studies of medical images analysis involved the CNN [ 18 , 19 ]. In obstetrical and gynecological congenital malformation field, CNN-based models have been widely applied in fetal ultrasound imaging classification. Xie et al. [ 19 ] utilized CNN-based DL algorithms to identify abnormal fetal brain ultrasound images. Li et al. [ 20 ] proposed a novel CNN-based model to recognize fetal lip abnormalities in ultrasound images accurately. Transformer is designed to conduct nature language processing or computer vision, which has attracted attention in medical image analysis as well [ 21 ]. Recently, transformer-based models have been introduced to improve the recognizing of fetal congenital heart disease [ 22 ].
However, the discrimination of longitudinal and oblique vaginal septa in ultrasound images using DL techniques has not been reported. In this study, two CNN [ 17 ]-based models (ResNet50 and ConvNeXt-B) and one transformer-based model (ViT-B/16) will be used for the classification of ultrasound images of longitudinal vaginal septum and oblique vaginal septum. Among them, ResNet50 was the most commonly used CNN-based DL model [ 23 ]. ViT-B/16 was a transformer-based DL model, and ConvNeXt-B was created by introducing part of the idea of transformer into CNN through depthwise convolution to achieve a similar effect [ 24 ]. With the assistance of deep learning models, we hope to achieve accurate ultrasonographic diagnosis of longitudinal and oblique vaginal septa and to improve treatment decision-making.
Supplementary Material
Supplementary Material 1.
Supplementary Material 1.
Text is read by the "Ask this paper" AI Q&A widget below.
Extraction quality varies by source — PMC NXML preserves structure
cleanly, OA-HTML may include some navigation residue, and OA-PDF can
have broken hyphenation. The publisher copy
(via DOI)
is the canonical version.