Deep Learning-based Classification Model using SMOTE Resampling Technique to Identify Potent Inhibitors of Lethal Factor of Anthrax and Principal Component, Chemical Space Analysis

Research Article

Madhulata Kumari, Mohd Asif Shah, Saurav Mallik, Kanad Ray

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-5315945/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted (Version 1)

Abstract

Anthrax is a highly lethal disease caused by Bacillus anthracis. Lethal factor (LF), together with protective antigen, directly contributes to anthrax symptoms in humans. This research work identified small-molecule inhibitors of anthrax lethal factor.
We developed a consolidated computational strategy that combines a deep learning-based SMOTE + artificial neural network (ANN) hybrid model with principal component analysis, t-SNE, activity cliff, constellation plot, scaffold, and fingerprint analyses to identify potential drug candidates against anthrax. The best model showed 0.98 accuracy, 0.99 specificity, 0.99 sensitivity, 0.99 F1-score, 0.99 recall, 0.99 ROC, and 0.99 precision. The trained hybrid model selected 134 FDA-approved drugs, 338 experimental drugs, 51 phytochemical compounds from the phytochemical database, and eight natural products from NCI Diversity Set IV as anthrax inhibitors. We found that scaffolds of ring systems with substitution patterns, such as 4-oxopyrrolo[3,2-c]quinolone, enhanced the biological activity of anthrax inhibitors. Fingerprint analysis indicated greater than 80% similarity, linked to the ring system with substitution pattern scaffold. We conclude that the SMOTE + ANN model could be an efficient method for the virtual screening of large databases and a new way to screen small molecules against anthrax.

Keywords: Deep learning, Bacillus, Anthrax, Scaffold analysis, Chemical analysis, Activity cliff, Principal component analysis, t-SNE, Fingerprint analysis, Phytochemical database, SMOTE

Introduction

Bacillus anthracis is a gram-positive, encapsulated, rod-shaped, aerobic, spore-forming bacterial pathogen [1]. Anthrax toxin is composed of lethal factor (LF, 90 kDa), edema factor (EF, 89 kDa), and protective antigen (PA, 83 kDa) [2–4]. Each component is nontoxic on its own; however, LF combined with PA forms lethal toxin, and EF combined with PA forms edema toxin, both of which contribute directly to anthrax symptoms and lethality [5–9]. PA is necessary for the toxin to enter the cytoplasm, where it acts.
Anthrax toxin is an exotoxin, and both LF and EF are intracellularly active enzymes [10] that exert their harmful effects within the cytoplasm. The genes that code for the toxins are found on the pXO1 plasmid (182 kb; accession no. NC001496) [11]. These genes, cya, lef, and pag, encode EF, LF, and PA, respectively [12–15]. The lethal factor protein is directly linked to cell death. When the LF gene is inactivated or the protein is defective, the virulence of the Bacillus anthracis strain is reduced significantly (by a factor of roughly 1000). Inhibiting LF, the primary toxic component of anthrax toxin, is therefore necessary to treat anthrax at all stages of the disease. The survival of cells or organisms in a diseased state is inversely proportional to the concentration of this toxin. Numerous researchers have demonstrated the significance of LF, and many research groups developing anthrax drugs with both traditional and computational methods use it as a target for inhibitors. Martino Forino et al. created numerous compounds that blocked LF and examined their effectiveness using a fragment-based strategy [16]. LF is thus a critical toxin component and a therapeutic target for anthrax inhibitors.

Anthrax infections in humans are uncommon, but they have been reported in workers in the cattle industry. However, anthrax spores remain viable for a long time and can be released in areas where people congregate, and inhalation of the spores is exceedingly deadly. These characteristics make anthrax a viable candidate for deployment as a biological weapon. Armed forces from several nations began developing it as a biological weapon, and the threat increased when terrorists started using it. The unintentional release of anthrax spores from a military research facility in the former Soviet Union in 1979 caused at least 79 respiratory infections and 68 fatalities. Goldberg et al.
found that the zinc-dependent metalloproteinase LF is a critical component of anthrax toxin and an important potential target for drug design [17]. Hydroxamates are known anthrax LF inhibitors [18]. Lee et al. identified several aminoglycoside antibiotics as direct competitive inhibitors, with neomycin B being the most effective among them [19]. A quantitative high-throughput screening (qHTS) assay was developed to screen for small molecules that may reduce or prevent the internalization of anthrax toxin, using LF-beta-lactamase fusion proteins. TEM-1 beta-lactamase (developed by Dr. Thomas Bugge's group at the NIH) was fused to the PA-binding region of the LF N-terminus (amino acids 1–254) to create the fusion protein. In the presence of PA, the LF-beta-lactamase fusion protein is internalized and acts on the beta-lactamase substrate CCF2/AM, which is trapped in cells after cleavage by cytoplasmic esterases. When beta-lactamase hydrolyzes CCF2, energy transfer to the acceptor is disrupted and the substrate fluoresces at 447 nm (blue light). Fluorescence intensity was monitored on an EnVision plate reader (PerkinElmer, Boston, MA) with the filter set Lambda (EX) = 405 nm, Lambda (EM) = 460 nm/530 nm [20].

Artificial Intelligence in drug discovery

The traditional drug development process is tedious, expensive, time-consuming, and inefficient; the success rate is very low, with only about one hit molecule in 100,000 reaching the market. Artificial intelligence is revolutionizing drug development because it can quickly identify possible biologically active molecules among millions of candidate compounds [21]. Its goals include pattern recognition, biomarker identification, and classification, among others. Artificial neural networks (ANNs), in particular, have been employed as an alternative to ADMET testing and QSAR model evaluation to attain these goals [22].
A benefit of deep neural networks is their ability to model extremely complex biological spaces [23]. Deep neural networks play a significant role in drug discovery and are applied at different stages of the drug development process, from target identification to the clinical phase. Deep learning algorithms can extract physical, chemical, and biological properties from chemical structures and accurately predict their biological activity. Artificial neural network methodology is applied in drug development as an alternative to traditional approaches [24]. Pang et al. applied a deep neural network to drug-target interaction prediction based on feature representation [25]. An ANN was used to predict the pharmacokinetics of aminoglycosides in severely ill patients [26]. Bilsland et al. developed an ANN to screen for senescence-inducing compounds using known agonist compounds [27]. Domine et al. used an ANN algorithm to predict adverse drug effects [28]. Moon et al. built an ANN for dose determination of HMG-CoA-reductase inhibitors [29]. Kumari et al. proposed deep learning for virtual screening of compounds against SARS-CoV-2 [30]. Kumari et al. used deep learning, quantitative structure-activity relationship (QSAR) modeling, molecular docking, molecular dynamics, and free energy calculation in drug design and development [31]. Deep learning is a well-suited computational approach for the virtual screening of small-molecule inhibitors, accelerating the drug discovery process and reducing the time and cost of experimental work. Moreover, the experimental screening of anthrax is dangerous and requires a high level of laboratory safety. The aim of this study was to predict novel potential drug candidates for anthrax infection.
The proposed work is therefore an effective deep learning model for an imbalanced anti-anthrax bioassay dataset, designed to increase the classification rate and to reduce the false positive rate in minority classes without increasing the false negative rate in the majority classes. Under-sampling, over-sampling, and the synthetic minority over-sampling technique (SMOTE) are among the resampling techniques employed to solve the imbalanced dataset problem. SMOTE generates additional minority samples to achieve class balance, while the ANN learns a hierarchical feature representation from the balanced data to screen biologically active molecules from unknown chemical libraries. This study used a high-throughput virtual screening approach to rapidly identify anthrax inhibitors using hybrid algorithms based on deep learning. To obtain potential drug candidates for anthrax, we collated an experimental qHTS bioassay dataset for lethal toxin internalization. The strategy was to construct a deep learning-based classification model to predict the biological activity of anthrax inhibitors. Using descriptors important for bioactivity, we searched five databases (phytochemical compounds, the natural-product NCI Diversity Set IV, FDA-approved drugs, experimental drugs, and natural products from the ZINC database) for novel inhibitors of lethal toxin to treat anthrax. Further, we analysed the chemical space of the anthrax inhibitors dataset. The study suggests that deep learning can generate potent drug candidates for treating anthrax.

Results and discussion

In this study, we built deep learning-based models to predict inhibitors of anthrax. In the qHTS assay for anthrax lethal toxin internalization, compounds are first classified as active or inactive. An inactive compound has a PUBCHEM_ACTIVITY_SCORE of zero, while the PUBCHEM_ACTIVITY_SCORE of active compounds ranges between 40 and 100.
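The labelling rule stated above can be written as a small function. This is a minimal sketch rather than the authors' curation code: the function name, the toy records, and the choice to return `None` for scores that fall outside both stated ranges are assumptions made for illustration, since the text does not say how such compounds were handled.

```python
from typing import Optional

# Score ranges as stated in the text: 0 = inactive, 40-100 = active.
ACTIVE_MIN, ACTIVE_MAX = 40, 100


def label_from_activity_score(score: int) -> Optional[int]:
    """Map a PUBCHEM_ACTIVITY_SCORE to a binary class label.

    Returns 0 (inactive), 1 (active), or None for scores outside both
    stated ranges (assumption: such compounds are left unlabelled).
    """
    if score == 0:
        return 0
    if ACTIVE_MIN <= score <= ACTIVE_MAX:
        return 1
    return None


# Hypothetical (substance id, activity score) records for illustration only.
records = [("SID-A", 0), ("SID-B", 85), ("SID-C", 40), ("SID-D", 20)]
labelled = [(sid, label_from_activity_score(score)) for sid, score in records]

# Keep only the compounds that received a class label.
training_rows = [(sid, label) for sid, label in labelled if label is not None]
```

Returning `None` instead of forcing a class keeps ambiguous scores out of both classes, which is one conservative option among several.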
Data pre-processing was carried out before model training. Each molecule was converted into a vector of 179 descriptors that describe the structural and functional properties of anthrax inhibitors. An efficient model cannot be trained on heavily imbalanced data, because once the imbalance reaches a certain level the classification performance of the model declines substantially. To address this, we created balanced datasets using resampling techniques and used them to train the model, increasing its overall classification accuracy. We trained deep learning models on the balanced data and monitored the statistical parameters of classification to manage the hybrid sampling process. The proposed ANN architecture used three hidden layers with the ReLU activation function, followed by one dense output layer with a sigmoid function for binary classification. We optimized the model with the Adam optimizer, a learning rate of 0.001, and 100 epochs. The accuracy and loss of the ANN model are shown in Fig. 1 and Fig. 2, respectively. The results showed that the ANN model with SMOTE resampling has better predictive ability on the external dataset.

In this study, we focused on imbalanced data and investigated the performance of resampling techniques. The best resampling method was chosen by comparing model performance. We first combined under-sampling, over-sampling, and SMOTE with the ANN to determine which model was superior. The statistical results (Table 1) show that the accuracy of the ANN model with SMOTE is the highest. The training loss curve of the hybrid model (SMOTE + ANN) drops sharply at first, then fluctuates as the number of epochs increases, and finally decreases slowly. The training loss curves converge fastest during epochs 1–20, and the trained model achieves robust performance.
The loss curve of the test dataset also converges quickly at the start and then more slowly as the number of epochs increases. The SMOTE + ANN model can therefore require less training time to predict the biological activity of molecules. The best hybrid model, SMOTE + ANN, was chosen by comparing the performance of the models across several statistical parameters. Table 1 presents the statistical results on the test set. Figure 3 shows the accuracy of the models, which measures their overall effectiveness. Figure 4 displays the sensitivity and specificity bar chart. Among the classification models compared, the hybrid SMOTE + ANN model obtained 98% accuracy, sensitivity, specificity, recall, F-measure, and ROC, and 99% precision. Additionally, the ROC was calculated to demonstrate the model's robustness; it is frequently used for quick performance evaluation of virtual screening techniques. Figure 5 shows the AUC curve of the SMOTE + ANN model, which displayed a value of 0.98 (c). The confusion matrix shows the fraction of compounds that were identified (a): for the SMOTE + ANN model, TP is 0.98 and TN is 0.99, while FN is 0.015 and FP is 0.018. The comparison thus revealed that SMOTE + ANN was the best of the three hybrid models. The findings imply that this strategy might work well for filtering large databases. On the imbalanced dataset, the ANN classifier gains significantly from SMOTE, which is effective at solving the classification model's class imbalance problem.

Table 1. Statistical results of the deep learning-based models on the anti-anthrax test set.

Classification model   Accuracy  Specificity  Sensitivity  Precision  Recall  F1-score  ROC*
Under-sampling ANN     0.66      0.46         0.87         0.61       0.87    0.71      0.66
Over-sampling ANN      0.97      0.96         0.99         0.96       0.99    0.97      0.98
SMOTE ANN              0.98      0.98         0.98         0.99       0.98    0.98      0.98

*ROC: Receiver Operating Characteristic.
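The ANN architecture evaluated in Table 1 (three ReLU hidden layers followed by a single sigmoid output unit, fed with 179 descriptors) can be illustrated with a plain-Python forward pass. This is a sketch under stated assumptions, not the authors' implementation: the hidden-layer widths (64, 32, 16), the uniform weight initialisation, the random input vector, and the 0.5 decision threshold are all hypothetical, since the text reports only the depth, the activations, the Adam optimizer, the learning rate, and the epoch count. No training is performed here.

```python
import math
import random


def relu(x: float) -> float:
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dense(inputs, weights, biases, activation):
    """One fully connected layer; weights[j][i] links input i to unit j."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def init_layer(n_in, n_out, rng):
    """Hypothetical initialisation: small uniform weights, zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_network(n_inputs, hidden_sizes, rng):
    """Hidden layers plus one output unit, as a list of (weights, biases)."""
    sizes = [n_inputs, *hidden_sizes, 1]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def predict_proba(network, descriptors):
    """ReLU on every hidden layer, sigmoid on the output unit."""
    activations = descriptors
    for weights, biases in network[:-1]:
        activations = dense(activations, weights, biases, relu)
    weights, biases = network[-1]
    return dense(activations, weights, biases, sigmoid)[0]


rng = random.Random(0)
network = build_network(179, (64, 32, 16), rng)   # widths are assumptions
example = [rng.random() for _ in range(179)]      # stand-in descriptor vector
probability = predict_proba(network, example)
predicted_label = 1 if probability >= 0.5 else 0  # threshold is an assumption
```

In practice a framework such as Keras or PyTorch would supply the layers, the Adam optimizer, and the binary cross-entropy loss; the sketch only shows how the activations compose.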
Deployment of SMOTE + ANN hybrid model

Five unknown chemical libraries were virtually screened using the proposed SMOTE + ANN hybrid model. To identify compounds that are effective anti-anthrax inhibitors, we selected (a) 134 FDA-approved drugs, (b) 338 experimental drugs, (c) 130 phytochemical compounds from a phytochemical database, (d) 15 natural products from NCI Diversity Set IV, and (e) 8098 natural compounds from the ZINC database. We then applied Lipinski's rule of five (RO5) to the selected molecules, which led to the identification of 51 phytochemical compounds from the phytochemical database, 8 natural products from NCI Diversity Set IV, and 3050 natural compounds from the ZINC database as anthrax inhibitors.

Chemical space analysis

In this work, chemical space was characterized by eight types of chemical properties, namely druglikeness, ligand efficiency, toxicity properties, chemical shape, atom counts, ring counts, functional group counts, and 3-dimensional globularity. We calculated 57 descriptors using the DataWarrior tool and then employed PCA to find the important descriptors. Fig. 6(A) shows a PCA-based three-dimensional scatter plot of the active and inactive compounds against the lethal factor of anthrax. Fig. 6(B–H) shows PCA-based two-dimensional scatter plots of (B) activity score, (C) druglikeness, (D) mutagenic, (E) tumorigenic, (F) reproductive effective, (G) irritant, and (H) 2D eigenvalues, based on 50 descriptors. The eigenvalues of the important descriptors are listed in Table 2. The first three principal components cover 50% of the variation. The t-SNE algorithm is an effective tool for visualizing the molecular similarity of the compounds in the anthrax inhibitor dataset. We selected 427 compounds with activity scores. Figure 7 shows a two-dimensional t-SNE map of anthrax inhibitors with activity scores.
Each data point represents a chemical structure, and the colour indicates the activity score. Molecular similarity can be mapped into a constellation plot; Fig. 8 shows a constellation plot of anthrax inhibitors.

Table 2. Eigenvalues of first two principal components of important descriptors.

S. No.  Variable name               PC1        PC2
1       Non-H Atoms                 0.265295   0.002054
2       Total Surface Area          0.261924   -0.00658
3       VDW-Volume                  0.258821   -0.04426
4       VDW-Surface                 0.258007   -6.79E-04
5       Monoisotopic Mass           0.249458   0.004788
6       Molweight                   0.248822   0.004555
7       Total Molweight             0.247677   -0.00612
8       Rings Closures              0.199163   -0.05212
9       Small Rings                 0.196141   -0.0646
10      H-Acceptors                 0.187382   0.195857
11      Molecular Complexity        0.181031   -0.00443
12      Electronegative Atoms       0.16195    0.204544
13      Non-C/H Atoms               0.16195    0.204544
14      Rotatable Bonds             0.1524     0.061341
15      Hetero-Rings                0.148183   0.061238
16      LELP from Pubchem_Sid       0.147187   -0.12893
17      sp3-Atoms                   0.139905   -0.22066
18      Polar Surface Area          0.125943   0.227964
19      Aromatic Rings              0.125713   0.165036
20      Aromatic Atoms              0.125614   0.158279
21      Non-Aromatic Hetero-Rings   0.099007   -0.1434
22      Saturated Hetero-Rings      0.093897   -0.176
23      Saturated Rings             0.091614   -0.26605
24      Non-Aromatic Rings          0.088381   -0.25144
25      Hetero-Aromatic Rings       0.087967   0.207168
26      Amides                      0.08713    0.108793
27      Aromatic Nitrogens          0.078695   0.187258
28      Acidic Oxygens              0.076084   -0.02449
29      cLogP                       0.074702   -0.12574
30      Carbo-Rings                 0.074069   -0.15133
31      Carbo-Aromatic Rings        0.073619   -0.00277
32      Amines                      0.061288   -0.16471
33      Globularity SVD             0.055727   -0.11326
34      Druglikeness                0.053129   -0.03755
35      Alkyl-Amines                0.050453   -0.21251
36      Basic Nitrogens             0.050438   -0.17797
37      Stereo Centers              0.044728   -0.1834
38      Molecular Flexibility       0.042233   0.060851
39      H-Donors                    0.041249   0.050932
40      Fragments                   0.039836   -0.06471
41      Aromatic Amines             0.034193   -0.01587
42      Relative PSA                0.031168   0.27485
43      Saturated Carbo-Rings       0.026642   -0.20324
44      Symmetric atoms             0.018181   -0.10439
45      Non-Aromatic Carbo-Rings    0.013587   -0.21297
46      LLE from Pubchem_Sid        -0.05773   0.14529
47      Shape Index                 -0.08173   0.068126
48      cLogS                       -0.08885   -0.00984
49      LE from Pubchem_Sid         -0.20474   0.057717
50      Globularity Vol             -0.21441   -0.11112

Constellation plots

The constellation plots were constructed from a dataset of 472 anthrax inhibitors. The analog series of the dataset comprise 153 cores, which contain the 472 compounds. Figure 8 shows the constellation plot, which gives chemical space and chemical substructure information based on a network and coordinates. Each node represents an analog series. Figure 8 depicts a single constellation of the most active inhibitors among all molecules. Analog series that are close in chemical space have scaffold diversity. The activity score can be mapped onto the constellation plot, with colours ranging from red to green. Red represents a low activity score (activity score < 50), whereas green represents the highest activity score (activity score: 90). Linking lines between data points represent the molecules they share. Fig. 9 shows that the activity score and fragment-based similarity are quantitatively related. The activity score versus fragment similarity (> 80%) indicates that molecules with related descriptors also have similar fragment pairs. The compound with the maximum activity score of 85 has 16 similar molecules. Figure 9 shows the molecular similarity chart of the non-mutagenic, non-tumorigenic, non-reproductive effective, and non-irritant anthrax inhibitors, which belong to the same 4-oxopyrrolo[3,2-c]quinolone scaffold.

Scaffold analysis

Furthermore, we analyzed the Murcko scaffolds and the ring systems with substitution patterns. Murcko scaffolds contain only the ring systems, with side chains removed. Among the 411 distinct scaffolds, 377 are singletons. Of the remaining 34 scaffolds, 28 Murcko scaffolds have a frequency of two.
Figure 10 shows the six most common Murcko scaffolds and their respective frequencies, namely: 4-oxopyrrolo[3,2-c]quinolone (6), dibenzhydryl (4), diphenyl (3), benzene (14), phenoxy (4), and piperidin-2-yl]ethyl]-2-methyl sulfanyl phenothiazine (3). Aside from benzene, the most frequent scaffold was 4-oxopyrrolo[3,2-c]quinolone, followed by the dibenzhydryl and phenoxy Murcko scaffolds.

Ring systems with substitution pattern

In this scaffold type, every single-ring and annelated ring system carries its exocyclic, non-hydrogen substituents. Among the 295 distinct scaffolds, 176 are singletons. Of the remaining 119 scaffolds, 18 have a frequency of more than ten. Figure 11 shows the 18 most common ring systems with substituents scaffolds and their respective frequencies. Aside from benzene, whose frequency was 118, the most frequent scaffold was piperazine (50), followed by furan (29), morpholine (21), piperidine (15), 4-oxopyrrolo[3,2-c]quinolone (15), benzodioxane (14), thienyl (13), thiazole (12), triazole (11), and benzodioxole (11). These scaffolds are responsible for the inhibitory effects. Molecules containing the piperazine group have activity scores ranging from 40 to 50, whereas molecules containing the 4-oxopyrrolo[3,2-c]quinolone group have activity scores ranging from 40 to 85, which makes the latter a suitable scaffold for anthrax drug candidates. The ring system with substitution pattern scaffold shows that a change in atoms or groups results in a change in biological activity. Using fragment pair analysis, we found that molecules sharing the 4-oxopyrrolo[3,2-c]quinolone scaffold have similar chemical and biological activity. Similar structures tend to produce the same biological effect and have a lower failure rate. Working from scaffolds is a low-risk approach that still produces structural diversity. These structures can act like drugs and have strong, adaptable binding affinities for a range of targets.
The ring systems with substitution pattern scaffold was used in the present study to assess the value of the rings' connectivity network based on the plain ring system, which indicated the importance of specific heteroatoms and their relative positions in the rings. The results suggest that potential anti-anthrax drug candidates might be significantly influenced by the type of ring structure and its specific substituents. This method can be used in future studies to assess the impact of ring structure in relation to the kind and placement of substituents. Numerous substituents can be placed at a number of positions to vary the scaffold's properties.

Conclusion

In this work, a deep learning-based hybrid model and virtual screening enabled us to identify novel potential inhibitors of anthrax lethal factor from chemical databases. A small list of compounds from the experimental anthrax assay enabled us to predict several drug candidates with potential inhibitory activity. We developed a deep learning-based hybrid SMOTE + ANN model to identify potential drug candidates against anthrax. The best model showed 0.98 accuracy, specificity, sensitivity, recall, ROC, and F1-score, and 0.99 precision. We selected 134 FDA-approved drugs, 338 experimental drugs, 51 phytochemical compounds from the phytochemical database, 8 natural products from NCI Diversity Set IV, and 3050 natural compounds from the ZINC database as anthrax inhibitors. Additionally, the present work explored the chemical space of anthrax inhibitors with the help of activity cliff analysis, constellation plots, and t-SNE plots. This novel visualization of the chemical space is able to identify promising analog series together with their activity scores. We found that scaffolds of ring systems with substitution patterns, such as 4-oxopyrrolo[3,2-c]quinolone, enhanced the biological activity of anthrax inhibitors.
Fingerprint analysis showed greater than 80% similarity, connected with the ring system with substitution pattern scaffold. The results demonstrate that the deep learning model can be used to widen the search for small molecules that inhibit anthrax and to make full use of the diverse compound databases that are now publicly available.

Materials and Methods

High-throughput screen data collection

We extracted experimental anthrax lethal toxin inhibitors from a publicly available database [20]. The total number of chemical molecules was 70,086. After completing the data curation process, we obtained 57,593 unique molecules. The total numbers of active and inactive inhibitors were 471 and 57,122, respectively. 2D chemical structures were converted into 3D using CORINA software [32].

Data pre-processing

Data pre-processing is an important step that prepares the data for a deep learning model and improves data quality. Pre-processing techniques remove outliers and scale the features to an equivalent range, and they clean, transform, reduce, and integrate the data in order to handle missing values and noisy data and make the data ready for model development. We therefore applied data pre-processing techniques before building the deep learning model.

Descriptor calculation

Molecular descriptors encode the physical, chemical, and structural information of molecules. The PowerMV tool was used to generate molecular descriptors, namely 147 pharmacophore fingerprints, 24 weighted Burden numbers, and eight molecular properties [33]. The DataWarrior tool was used for computing principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE), constellation plots, and scaffold analysis [34].

Dealing with an imbalanced dataset

Data imbalance reflects an unequal distribution of classes within the dataset. In this case, working with the imbalanced dataset is challenging because the ANN model showed poor performance on it.
The majority class has a significantly larger sample size, while the minority class has a much smaller one. The ratio between the two classes is high, which would result in inaccurate classification and lower accuracy. This is a prevalent issue in virtual screening for drug discovery. The anti-anthrax dataset has 57594 compounds, of which only 0.8% are active, whereas 99.2% are inactive. Deep learning algorithms can suffer from under-fitting or over-fitting because they are biased toward the dominant class. To address the issue of imbalanced data, we used resampling techniques such as under-sampling, over-sampling, and SMOTE. A resampling technique algorithmically creates a balanced dataset from the imbalanced dataset. A hybrid model based on deep learning is then constructed using this balanced dataset to further improve the overall statistical results. Figure 12 displays the workflow for deep learning-based virtual screening.

Random under-sampling technique

This method uses all molecules in the minority class and randomly removes molecules from the majority class until both classes are equal in size. In this investigation, the entire dataset was subjected to under-sampling. As a result, the number of compounds in the majority class was reduced from 57122 to 472, which equals the number of compounds in the minority class of active chemicals. However, under-sampling carries risks because potentially important information about the removed molecules could be lost.

Random over-sampling technique

Over-sampling the minority class is another way to deal with imbalanced datasets. This method uses all molecules in the majority class and randomly duplicates molecules from the minority class until both classes are equal in size, but the duplicated molecules do not provide any new information to the model. Instead, new molecules can be created by synthesizing from the existing ones.
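The two random resampling schemes, and the idea of synthesizing new minority samples rather than duplicating them, can be sketched in plain Python. This is an illustrative sketch, not the authors' code: the toy two-dimensional points, the seed, and the neighbour count `k=3` are assumptions, and the `smote` function is a simplified version of the interpolation step (pick a minority sample, pick one of its nearest minority neighbours, and place a new point on the segment between them). Library implementations such as imbalanced-learn's `SMOTE` are the usual choice in practice.

```python
import math
import random


def random_undersample(majority, minority, rng):
    """Keep all minority samples; draw an equally sized random majority subset."""
    return rng.sample(majority, len(minority)), list(minority)


def random_oversample(majority, minority, rng):
    """Keep all majority samples; duplicate random minority samples to match."""
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(majority), list(minority) + extra


def smote(minority, n_synthetic, k, rng):
    """Simplified SMOTE: interpolate between a sample and a near neighbour."""
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen sample (Euclidean).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (hypothetical): 20 "inactive" and 4 "active" 2-D descriptor vectors.
rng = random.Random(42)
majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20)]
minority = [[rng.gauss(3, 0.5), rng.gauss(3, 0.5)] for _ in range(4)]

under_major, under_minor = random_undersample(majority, minority, rng)
over_major, over_minor = random_oversample(majority, minority, rng)
synthetic = smote(minority, len(majority) - len(minority), k=3, rng=rng)
smote_minor = minority + synthetic
```

All three schemes end with equal class sizes; they differ in whether majority information is discarded, minority samples are repeated, or new minority points are interpolated.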
All instances of the majority class are used. We applied over-sampling to the entire dataset. As a result, the minority class (active molecules) expanded from 472 compounds to 57122, which equals the number of inactive molecules in the majority class.

Synthetic Minority Over-sampling Technique

SMOTE, described by Nitesh Chawla et al., is an over-sampling technique that generates synthetic molecules from the minority class of an imbalanced dataset and mitigates overfitting and underfitting problems [35, 36]. Nakamura et al. applied a Learning Vector Quantization-based Synthetic Minority Over-sampling Technique to biomedical data [37]. Seo et al. applied a machine-learning approach to optimize the SMOTE ratio in a class-imbalanced dataset for intrusion detection [38]. Pandey et al. used SMOTE for the automatic detection of arrhythmia from an imbalanced ECG database [39]. To handle a class-imbalanced dataset, Derhab et al. presented a hybrid model of SMOTE and ANN [40]. Kumari et al. proposed a hybrid SMOTE-ENN + ANN model to identify Marburg virus inhibitors [41]. We therefore used the SMOTE technique to solve the imbalanced data problem.

Dataset division

After the resampling procedure, the dataset was divided in a 4:1 ratio, giving 80% of the data as the training set and 20% as the test set. Hyperparameters are adjusted during the training phase using the training dataset; adjusting them is necessary to prevent overfitting. The test set is then used to adjust the hybrid model's hyperparameters and to assess the classification model's statistical performance in terms of accuracy, sensitivity, specificity, precision, recall, F1-score, and ROC.

Model development

Artificial Neural Network

We created an ANN model, which is a form of deep learning. Deep learning is a subset of machine learning that uses layers of nodes to convert inputs into outputs. ANNs are based on the brain's neural network [42].
Both linear and non-linear datasets can be easily fitted with ANNs. ANNs learn by being trained on example data: a training algorithm is applied repeatedly to change the weights between neurons until the desired outcome is obtained. The learning rate, optimizer, activation function, and initial values of the weights are only a few of the model parameters that must be optimized to maximize the ANN model's effectiveness. An ANN is made up of three types of layers: the input layer, the hidden layers, and the output layer [43]. The input layer and the output layer represent the classifier's input and output, respectively. The layers between them, which transform the data features to create predictions, are known as the hidden layers. An ANN can have many hidden layers. In this model, the chosen parameters are fed into the input vector and sent to the hidden layers, which use mathematical operators to process the data. After training is finished, the trained model is applied to a test dataset with comparable features. The ANN model known as the Multi-layer Perceptron (MLP) [44] is represented by Eq. 1:

$$y_{j}=f\left(\sum_{i=1}^{n}w_{ij}x_{i}+b_{j}\right) \tag{1}$$

where \(f\) is the activation function, \(y_{j}\) is the output result, \(x_{i}\) is the input vector, \(w_{ij}\) is a weight, \(n\) is the number of input nodes, and \(b_{j}\) is the bias. The activation function in a neural network is responsible for transforming the node's summed weighted input into the node's output for that input. The sigmoid is a mathematical function with an S-shaped curve [45].
The sigmoid function transforms a real value into one that can be interpreted as a probability. Sigmoid-type functions map the entire number line into a small range, such as between 0 and 1 or between -1 and 1; the logistic sigmoid used here maps into the range 0 to 1. In this study, the sigmoid function is used by each neuron in a multi-layer neural network to predict probabilities as outputs in the range 0 to 1. The sigmoid function is defined in Eq. 2:

$$S(x)=\frac{1}{1+e^{-x}} \tag{2}$$

Here, \(x\) is the input.

Rectified linear units (ReLU)

The rectified linear activation function (ReLU) is one of the most commonly used activation functions in neural network models [46]. The function returns 0 for any negative input and returns the input value itself for any positive input. It can therefore be written as in Eq. 3:

$$f(x)=\max(0,x) \tag{3}$$

where \(x\) is the input to a neuron and \(f(x)\) is the ReLU function.

Before developing the final model, we went through several rounds of trial and error. For model optimization, we used a combination of hyperparameters: the hidden layers, the layer type (dense layer), the activation function (ReLU), the output-layer function (sigmoid), the Adam optimizer, and the number of epochs. We used binary cross-entropy as the loss function. Training a model to achieve the best results can take some time, so we used the Google Colab computing environment for model development.

Model performance

In binary classification on imbalanced data, the accuracy on the minority class is much more important than the accuracy on the majority class, so handling the minority class with resampling techniques improves the overall classification performance. We evaluated the model's performance for comparative analysis using the results of the confusion matrix and loss and gain. The confusion matrix is a table that allows visualization of a classification model's performance [47, 48].
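To make the table concrete, here is a small sketch that tabulates a two-class confusion matrix with rows for the actual class and columns for the predicted class, the layout described in the text that follows. The label coding (1 for active, 0 for inactive), the function name `confusion_matrix`, and the eight example labels are assumptions made for illustration.

```python
def confusion_matrix(actual, predicted):
    """Return a 2x2 table: rows are actual classes, columns are predicted
    classes, both ordered as (active, inactive)."""
    index = {1: 0, 0: 1}  # assumed coding: 1 = active, 0 = inactive
    table = [[0, 0], [0, 0]]
    for a, p in zip(actual, predicted):
        table[index[a]][index[p]] += 1
    return table


# Hypothetical labels for eight compounds.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 0, 1, 0, 1]

table = confusion_matrix(actual, predicted)
print("                 pred. active  pred. inactive")
print(f"actual active    {table[0][0]:>12}  {table[0][1]:>14}")
print(f"actual inactive  {table[1][0]:>12}  {table[1][1]:>14}")
```

Off-diagonal cells count the compounds for which the two classes were confused.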
The minority class represents the active compounds, while the majority class represents the inactive compounds. Each column of the matrix represents the instances of a predicted class, and each row represents the instances of an actual class, which makes it easy to see whether the model is confusing the two classes. True positives (TPs) and true negatives (TNs) are the correctly predicted active and inactive compounds, respectively; false positives (FPs) are inactive compounds that are incorrectly classified as active, and false negatives (FNs) are active compounds that are incorrectly classified as inactive. We evaluated accuracy, sensitivity, specificity, precision, recall, F-measure, and ROC based on the confusion matrix data [49].

Accuracy represents the overall performance of the classification model. It is defined as the ratio of correctly classified compounds to the total number of compounds in the dataset (Eq. 4):

$$Accuracy=\frac{TP+TN}{TP+TN+FP+FN} \tag{4}$$

Sensitivity is the proportion of all active compounds that the learning method predicts as active anti-anthrax compounds. It is identical to the true positive rate (Eq. 5):

$$Sensitivity=\frac{TP}{TP+FN} \tag{5}$$

Specificity is the proportion of all inactive compounds that the learning model predicts to be inactive against anthrax. It is identical to the true negative rate (Eq. 6):

$$Specificity=\frac{TN}{TN+FP} \tag{6}$$

Precision is defined as the number of correctly predicted active compounds divided by the total number of compounds that the model classifies as active (Eq. 7):

$$Precision=\frac{TP}{TP+FP} \tag{7}$$

Recall is calculated as the number of correctly predicted active compounds divided by the total number of active compounds. It is therefore the same quantity as sensitivity (Eq. 8):

$$Recall=\frac{TP}{TP+FN} \tag{8}$$
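The measures in Eq. 4 to Eq. 8 follow directly from the four confusion-matrix counts, as the sketch below shows. It also computes the F1-score that the text defines next in Eq. 9. The function name `classification_metrics` and the counts in the example are hypothetical and are not results from the study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute Eq. 4-8 and the F1-score (Eq. 9) from confusion-matrix counts.

    Assumes every denominator is non-zero.
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)   # Eq. 4
    sensitivity = tp / (tp + fn)                 # Eq. 5
    specificity = tn / (tn + fp)                 # Eq. 6
    precision = tp / (tp + fp)                   # Eq. 7
    recall = sensitivity                         # Eq. 8, same as Eq. 5
    f1 = 2 * precision * recall / (precision + recall)  # Eq. 9
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for a test set of 200 compounds.
metrics = classification_metrics(tp=90, tn=85, fp=15, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because recall and sensitivity share one formula, reporting both adds no extra information; the pair precision and recall, or sensitivity and specificity, is what distinguishes the two kinds of error.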
The F1-score measures the model's accuracy on the dataset and is the harmonic mean of precision and recall. An F1-score of 1 indicates a flawless model. The F1-score is used to assess the effectiveness of the classification model, while sensitivity and specificity are used to track changes in the proportion of instances assigned to the minority (active) and majority (inactive) classes, respectively. The F1-score increases when both recall and precision are higher (Eq. 9):

$$F1\text{-}score=2\times\frac{Precision\times Recall}{Precision+Recall} \tag{9}$$

The area beneath the receiver operating characteristic (ROC) curve is known as the area under the curve (AUC). The ROC curve is a two-dimensional plot of the true positive rate against the false positive rate that visualizes classification performance. The AUC lies between 0 and 1; values close to 1 indicate that the model is far more likely to score an active compound above an inactive one, i.e., that true positives are obtained with few false positives.

Compound library and chemicals

To find new potential anti-anthrax inhibitors, we extracted chemical molecules from five libraries: a) 2510 FDA-approved drugs from DrugBank; b) 6186 experimental drugs from DrugBank [50]; c) 918 phytochemical compounds; d) 423 natural products from the NCI divsetIV [51]; and e) 112,267 natural compounds from the ZINC database [52]. Then, to prioritize lead compounds, we applied Lipinski's rule of five (RO5) [53]. The remaining compounds contained the primary anti-anthrax molecules predicted by the proposed model.

Abbreviations

ANN: Artificial Neural Network
AUC: Area Under Curve
ReLU: Rectified Linear Unit
ROC: Receiver Operating Characteristic
SMOTE: Synthetic Minority Over-sampling Technique

Declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.
Data Availability Statement

The datasets generated and/or analysed during the current study are available in the NCBI PubChem repository, https://pubchem.ncbi.nlm.nih.gov/bioassay/912

Competing interest

The authors declare no conflict of interest.

Author Contributions: M.K. and K.R. worked on conceptualization, methodology, formal analysis, and writing (original draft preparation), while M.A.S. and S.M. participated in writing (review and editing), validation, supervision, and project administration. All authors have read and agreed to the published version of the manuscript.

Funding: This research received no external funding from the authors' institutes.

References

1. Nestorovich EM, Bezrukov SM. Designing inhibitors of anthrax toxin. Expert Opin Drug Discov. 2014 Mar;9(3):299-318. doi: 10.1517/17460441.2014.877884.
2. Barth H, Aktories K, Popoff MR, et al. Binary bacterial toxins: biochemistry, biology, and applications of common clostridium and bacillus proteins. Microbiol Mol Biol Rev. 2004;68:373-402.
3. Abrami L, Reig N, van der Goot FG. Anthrax toxin: the long and winding road that leads to the kill. Trends Microbiol. 2005;13(2):72–78. PMID: 15680766.
4. Bhatnagar R, Batra S. Anthrax toxin. Crit Rev Microbiol. 2001;27(3):167-200. doi: 10.1080/20014091096738.
5. Firoved AM, Miller GF, Moayeri M, Kakkar R, Shen Y, Wiggins JF, McNally EM, Tang WJ, Leppla SH. Bacillus anthracis edema toxin causes extensive tissue lesions and rapid lethality in mice. Am J Pathol. 2005;167:1309–1320.
6. Collier RJ, Young JA. Anthrax toxin. Annu Rev Cell Dev Biol. 2003;19:45-70. doi: 10.1146/annurev.cellbio.19.111301.140655.
7. Duesbery NS, Vande Woude GF. Anthrax toxins. Cell Mol Life Sci. 1999 Sep;55(12):1599-609. doi: 10.1007/s000180050399.
8. Moayeri M, Leppla SH. The roles of anthrax toxin in pathogenesis. Curr Opin Microbiol. 2004 Feb;7(1):19-24. doi: 10.1016/j.mib.2003.12.001.
9. Banks DJ, Ward SC, Bradley KA. New insights into the functions of anthrax toxin. Expert Rev Mol Med. 2006 Apr 11;8(7):1-18. doi: 10.1017/S1462399406010714.
10. Lowe DE, Glomski IJ. Cellular and physiological effects of anthrax exotoxin and its relevance to disease. Front Cell Infect Microbiol. 2012 Jun 1;2:76. doi: 10.3389/fcimb.2012.00076. PMID: 22919667; PMCID: PMC3417473.
11. Thorne C. Bacillus anthracis. In: Sonenshein AL, Hoch JA, Losick R (eds). Bacillus subtilis and other gram-positive bacteria: biochemistry, physiology, and molecular genetics. American Society for Microbiology, Washington, D.C.; 1993. p. 113-124.
12. Robertson DL, Leppla SH. Molecular cloning and expression in Escherichia coli of the lethal factor gene of Bacillus anthracis. Gene. 1986;44:71-78.
13. Mock M, Labruyere E, Glaser P, Danchin A, Ullmann A. Cloning and expression of the calmodulin-sensitive Bacillus anthracis adenylate cyclase in Escherichia coli. Gene. 1988;64:277-284.
14. Robertson DL, Tippetts MT, Leppla SH. Nucleotide sequence of the Bacillus anthracis edema factor gene (cya): a calmodulin-dependent adenylate cyclase. Gene. 1988;73:363-371.
15. Tippetts MT, Robertson DL. Molecular cloning and expression of the Bacillus anthracis edema factor toxin gene: a calmodulin-dependent adenylate cyclase. J Bacteriol. 1988;170:2263-2266.
16. Forino M, Johnson S, Wong TY, Rozanov DV, Savinov AY, Li W, Fattorusso R, Becattini B, Orry AJ, Jung D, Abagyan RA, Smith JW, Alibek K, Liddington RC, Strongin AY, Pellecchia M. Efficient synthetic inhibitors of anthrax lethal factor. Proc Natl Acad Sci U S A. 2005 Jul 5;102(27):9499-504. doi: 10.1073/pnas.0502733102. Epub 2005 Jun 27. PMID: 15983377; PMCID: PMC1160517.
17. Goldberg AB, Turk BE. Inhibitors of the Metalloproteinase Anthrax Lethal Factor. Curr Top Med Chem. 2016;16(21):2350-8. doi: 10.2174/1568026616666160413135732. PMID: 27072692; PMCID: PMC5208045.
18. Li F, Chvyrkova I, Terzyan S, Wakeham N, Turner R, Ghosh AK, Zhang XC, Tang J. Inhibition of anthrax lethal factor: lability of hydroxamate as a chelating group. Appl Microbiol Biotechnol. 2012 May;94(4):1041-9. doi: 10.1007/s00253-012-3893-7. Epub 2012 Jan 25. PMID: 22270239; PMCID: PMC3364607.
19. Lee LV, Bower KE, Liang FS, Shi J, Wu D, Sucheck SJ, Vogt PK, Wong CH. J Am Chem Soc. 2004;126(15):4774-4775.
20. National Center for Biotechnology Information (2023). PubChem Bioassay Record for AID 912, Source: National Center for Advancing Translational Sciences (NCATS). Retrieved February 5, 2023 from https://pubchem.ncbi.nlm.nih.gov/bioassay/912.
21. Paul D, Sanap G, Shenoy S, Kalyane D, Kalia K, Tekade RK. Artificial intelligence in drug discovery and development. Drug Discov Today. 2021 Jan;26(1):80-93. doi: 10.1016/j.drudis.2020.10.010. Epub 2020 Oct 21. PMID: 33099022; PMCID: PMC7577280.
22. Cheirdaris DG. Artificial Neural Networks in Computer-Aided Drug Design: An Overview of Recent Advances. In: Vlamos P (ed). GeNeDis 2018. Advances in Experimental Medicine and Biology, vol 1194. Springer, Cham; 2020. https://doi.org/10.1007/978-3-030-32622-7_10
23. Kim J, Park S, Min D, Kim W. Comprehensive Survey of Recent Drug Discovery Using Deep Learning. Int J Mol Sci. 2021 Sep 15;22(18):9983. doi: 10.3390/ijms22189983. PMID: 34576146; PMCID: PMC8470987.
24. Bourquin J, Schmidli H, van Hoogevest P, Leuenberger H. Basic concepts of artificial neural networks (ANN) modeling in the application to pharmaceutical development. Pharm Dev Technol. 1997;2(2):95–109.
25. Peng J, Li J, Shang X. A learning-based method for drug-target interaction prediction based on feature representation learning and deep neural network. BMC Bioinformatics. 2020;21(Suppl 13):394.
26. Yamamura S. Clinical application of artificial neural network (ANN) modeling to predict pharmacokinetic parameters of severely ill patients. Adv Drug Deliv Rev. 2003;55:1233-1251.
27. Bilsland AE, Pugliese A, Liu Y, et al. Identification of a selective G1-phase benzimidazolone inhibitor by a senescence-targeted virtual screen using artificial neural networks. Neoplasia. 2015;17(9):704–715.
28. Domine D, Guillon C, Devillers J, Lacroix R, Lacroix J, Doré JC. Nonlinear neural mapping analysis of the adverse effects of drugs. SAR QSAR Environ Res. 1998;8(1-2):109–120.
29. Moon A, Smith T. A preliminary evaluation of neural network analysis for pharmacodynamic modeling of the dosing of the hydroxymethylglutaryl coenzyme A-reductase inhibitors simvastatin and atorvastatin. Clin Ther. 2002;24(4):653–661.
30. Kumari M, Subbarao N. Deep learning model for virtual screening of novel 3C-like protease enzyme inhibitors against SARS coronavirus diseases. Comput Biol Med. 2021;132:104317.
31. Kumari M, Subbarao N. Development of a deep learning-based quantitative structure-activity relationship model to identify potential inhibitors against the 3C-like protease of SARS-CoV-2. Future Med Chem. 2022 Nov;14(21):1541-1559. doi: 10.4155/fmc-2021-0063. Epub 2022 Sep 30. PMID: 36177879.
32. Sadowski J, Gasteiger J, Klebe G. Comparison of automatic three-dimensional model builders using 639 X-ray structures. J Chem Inf Model. 1994;34(4).
33. Liu K, Feng J, Young SS. PowerMV: a software environment for molecular viewing, descriptor generation, data analysis and hit evaluation. J Chem Inf Model. 2005;45(2):515–522.
34. Sander T, Freyss J, von Korff M, Rufener C. DataWarrior: an open-source program for chemistry aware data visualization and analysis. J Chem Inf Model. 2015 Feb 23;55(2):460-73. doi: 10.1021/ci500588j. Epub 2015 Feb 2. PMID: 25558886.
35. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res. 2002;16:321–357.
36. Blagus R, Lusa L. SMOTE for high-dimensional class-imbalanced data. BMC Bioinformatics. 2013 Mar 22;14:106. doi: 10.1186/1471-2105-14-106. PMID: 23522326; PMCID: PMC3648438.
37. Nakamura M, Kajiwara Y, Otsuka A, Kimura H. LVQ-SMOTE - Learning Vector Quantization based Synthetic Minority Over-sampling Technique for biomedical data. BioData Min. 2013 Oct 2;6(1):16. doi: 10.1186/1756-0381-6-16. PMID: 24088532; PMCID: PMC4016036.
38. Seo JH, Kim YH. Machine-Learning Approach to Optimize SMOTE Ratio in Class Imbalance Dataset for Intrusion Detection. Comput Intell Neurosci. 2018 Nov 1;2018:9704672. doi: 10.1155/2018/9704672. PMID: 30515202; PMCID: PMC6236522.
39. Pandey SK, Janghel RR. Automatic detection of arrhythmia from imbalanced ECG database using CNN model with SMOTE. Australas Phys Eng Sci Med. 2019 Dec;42(4):1129-1139. doi: 10.1007/s13246-019-00815-9. Epub 2019 Nov 14. PMID: 31728941.
40. Derhab A, Aldweesh A, Emam AZ, Khan FK. Intrusion Detection System for Internet of Things Based on Temporal Convolution Neural Network and Efficient Feature Engineering. Wirel Commun Mob Comput. 2020;2020:6689134.
41. Kumari M, Subbarao N. A hybrid resampling algorithms SMOTE and ENN based deep learning models for identification of Marburg virus inhibitors. Future Med Chem. 2022 May;14(10):701-715. doi: 10.4155/fmc-2021-0290. Epub 2022 Apr 8. PMID: 35393862.
42. Dreyfus SE. Artificial neural networks, back propagation, and the Kelley-Bryson gradient procedure. J Guid Control Dyn. 1990;13:926-928.
43. Banadkooki FB, Ehteram M, Ahmed AN, Teo FY, Ebrahimi M, Fai CM, Huang YF, El-Shafie A. Suspended sediment load prediction using artificial neural network and ant lion optimization algorithm. Environ Sci Pollut Res Int. 2020;30:38094-38116.
44. Agatonovic-Kustrin S, Beresford R. Basic concepts of artificial neural network (ANN) modeling and its application in pharmaceutical research. J Pharm Biomed Anal. 2000;22(5):717-27.
45. LeCun YA, Bottou L, Orr GB, Muller KR. Efficient backprop. In: Neural Networks: Tricks of the Trade, Second Edition. 2012. p. 9–48.
46. Nair V, Hinton GE. Rectified linear units improve restricted Boltzmann machines. In: Proc Int Conf Mach Learn. 2010. p. 807–814.
47. Sokolova M, Lapalme G. A systematic analysis of performance measures for classification tasks. Inf Process Manage. 2009;45:427–437.
48. Ting KM. Confusion Matrix. In: Sammut C, Webb GI (eds). Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA; 2017.
49. Fawcett T. An introduction to ROC analysis. Pattern Recognit Lett. 2006;27:861–874.
50. https://www.drugbank.com/datasets [access date: 11 October 2023]
51. https://wiki.nci.nih.gov/display/NCIDTPdata/Compound+Sets [access date: 12 October 2023]
52. http://zinc15.docking.org [access date: 15 October 2023]
53. Lipinski CA, Lombardo F, Dominy BW, Feeney PJ. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Deliv Rev. 2001;46:3-26.

Additional Declarations

No competing interests reported.
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-5315945","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":378294404,"identity":"2d02bbea-cf89-4c7b-a803-9922fac6b568","order_by":0,"name":"Madhulata Kumari","email":"","orcid":"","institution":"Amity University","correspondingAuthor":false,"prefix":"","firstName":"Madhulata","middleName":"","lastName":"Kumari","suffix":""},{"id":378294405,"identity":"3bc08947-078c-4578-9ffc-8c346bbd62e1","order_by":1,"name":"Mohd Asif Shah","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA30lEQVRIie3OMQrCMBSA4SeFujxwFUrxBEJKIA4GzxIRMiuCOHZzc1YEz9DiBQIZugQ9QBfFC1RcBAeNowgNOjnkhyQk8PEC4PP9a4JzYo/XgkZqt8pNpHwjjZV7jNRfkG4anKujONDWRufXMfA4U8EprSNMhawtRMnaezmNViBppsLEQYDB8FZyMEgiBD3MFPSO9aR5rYTY845Bekd4WNK8OKYgsR9TjBhkdoqyBB0f0zizZEQTE876SEZ0rXFST4rF7nITg2Rrgl2J80G8LBZ5LYHg7UY+Xnw+n8/3S08ubkrTnLI70AAAAABJRU5ErkJggg==","orcid":"","institution":"Kardan University","correspondingAuthor":true,"prefix":"","firstName":"Mohd","middleName":"Asif","lastName":"Shah","suffix":""},{"id":378294406,"identity":"386dab7c-2be0-4b86-a172-36fe96d806c8","order_by":2,"name":"Saurav Mallik","email":"","orcid":"","institution":"Harvard T. H. 
Chan School of Public Health","correspondingAuthor":false,"prefix":"","firstName":"Saurav","middleName":"","lastName":"Mallik","suffix":""},{"id":378294407,"identity":"5f9533b0-d53f-4d52-85c7-f27e0d287aae","order_by":3,"name":"Kanad Ray","email":"","orcid":"","institution":"Amity University Rajasthan","correspondingAuthor":false,"prefix":"","firstName":"Kanad","middleName":"","lastName":"Ray","suffix":""}],"badges":[],"createdAt":"2024-10-23 05:53:29","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-5315945/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-5315945/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":69069624,"identity":"b17d1ec8-4a7c-42de-be36-1518e86c8b12","added_by":"auto","created_at":"2024-11-15 09:46:35","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":53774,"visible":true,"origin":"","legend":"\u003cp\u003eThe SMOTE+ANN hybrid model exhibits the highest accuracy when training and test dataset accuracy are plotted against the number of training epochs.\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-5315945/v1/ecb413ae49568d63694ab39a.png"},{"id":69069625,"identity":"a6e6f4d6-e224-4e1b-ad8d-3837108862a0","added_by":"auto","created_at":"2024-11-15 09:46:35","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":67247,"visible":true,"origin":"","legend":"\u003cp\u003eThe SMOTE+ANN hybrid model has the lowest loss value when the loss value of the training and test datasets are plotted against the number of epochs.\u003c/p\u003e","description":"","filename":"2.png","url":"https://assets-eu.researchsquare.com/files/rs-5315945/v1/e541c0437579d4a7f21c2830.png"},{"id":69070392,"identity":"6c5641a2-4d68-4976-8476-e920b19a2aa0","added_by":"auto","created_at":"2024-11-15 09:54:35","extension":"png","order_by":3,"title":"Figure 
3","display":"","copyAsset":false,"role":"figure","size":17674,"visible":true,"origin":"","legend":"\u003cp\u003eBar chart of SMOTE+ANN hybrid model exhibits the highest accuracy and recall (0.98) and precision (0.99) of the anti-anthrax testset among the deep learning models utilising the resampling algorithms.\u003c/p\u003e","description":"","filename":"3.png","url":"https://assets-eu.researchsquare.com/files/rs-5315945/v1/fab72fa21e33fab9638ea57a.png"},{"id":69070393,"identity":"14ed902e-2934-4f8b-a697-6f9be2a11026","added_by":"auto","created_at":"2024-11-15 09:54:35","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":15886,"visible":true,"origin":"","legend":"\u003cp\u003eBar chart showing specificity and sensitivity for the deep learning models using resampling algorithms, where the SMOTE+ANN hybrid model shows a maximum specificity (0.98) of the anti-anthrax testset.\u003c/p\u003e","description":"","filename":"4.png","url":"https://assets-eu.researchsquare.com/files/rs-5315945/v1/023b39f45591d7606550cf9c.png"},{"id":69070753,"identity":"846bf509-920c-4cf8-993f-c34b0a05df59","added_by":"auto","created_at":"2024-11-15 10:02:35","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":45134,"visible":true,"origin":"","legend":"\u003cp\u003eThe ROC plot depicts significant AUC curve values for the deep learning models of SMOTE+ ANN hybrid model shows the maximum AUC curve (0.98) of the anti-anthrax testset.\u003c/p\u003e","description":"","filename":"5.png","url":"https://assets-eu.researchsquare.com/files/rs-5315945/v1/7afced940652c30c178db51b.png"},{"id":69070394,"identity":"05a558cb-790f-4e99-a30e-6b54810bfb64","added_by":"auto","created_at":"2024-11-15 09:54:35","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":483460,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003e(\u003c/strong\u003eA) 
PCA-based three-dimensional scatter plot of active and inactive compounds of anthrax lethal factor (B) two-dimensional scatter plot of activity score, (C) Druglikeness (D) Mutagenic (E) Tumerogenic (F) Reproductive effective (G) Irritant, anthrax inhibitors, and H) 2D-Eigenvalues, based on 50 descriptors. The plot shows 472 data points, each one representing a chemical structure.\u003c/p\u003e","description":"","filename":"6.png","url":"https://assets-eu.researchsquare.com/files/rs-5315945/v1/7d4e61369605aa6d89337ed4.png"},{"id":69069627,"identity":"86b19a59-d6fd-4a75-a579-314c4196b5e1","added_by":"auto","created_at":"2024-11-15 09:46:35","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":181035,"visible":true,"origin":"","legend":"\u003cp\u003et-SNE visualization of chemical space for a dataset of Anthrax inhibitors based on Activity Score. ( t-SNE settings: perplexity = 20, learning rate = 50, iterations = 1000). The plot shows 472 data points, each representing a chemical structure, and the colour of these points indicates activity scores.\u003c/p\u003e","description":"","filename":"7.png","url":"https://assets-eu.researchsquare.com/files/rs-5315945/v1/cc97f06d77f1e710b228c962.png"},{"id":69070754,"identity":"ae90f8d3-7e14-4b4a-84f4-3b884c3db481","added_by":"auto","created_at":"2024-11-15 10:02:35","extension":"png","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":223423,"visible":true,"origin":"","legend":"\u003cp\u003eConstellation plot for a dataset of Anthrax inhibitors. The plot shows 472 data points, each one representing an analogs series. 
Every circle in the plot represents a core; Selected analog series are identified with structures: 1,5-dimethyl-N-[2-(2-methylpiperidin-1-yl)ethyl]-4-oxopyrrolo[3,2-c]quinoline-2-carboxamide (7966425, activity score is 85).\u003c/p\u003e","description":"","filename":"8.png","url":"https://assets-eu.researchsquare.com/files/rs-5315945/v1/cbf6fd1e41846fe2300654ff.png"},{"id":69069633,"identity":"c5c795e9-d8eb-40d4-bca0-25a0d8a7ecca","added_by":"auto","created_at":"2024-11-15 09:46:35","extension":"png","order_by":9,"title":"Figure 9","display":"","copyAsset":false,"role":"figure","size":147535,"visible":true,"origin":"","legend":"\u003cp\u003eStructure similarity chart of non-mutagenic, non-tumorigenic, non-reproductive effective, and non-irritant compounds with activity score against Anthrax lethal factor using cell-based inhibition dataset.\u003c/p\u003e","description":"","filename":"9.png","url":"https://assets-eu.researchsquare.com/files/rs-5315945/v1/17dc1876d871440b5ee663f6.png"},{"id":69069629,"identity":"486cacc4-8e5c-4bd1-8974-5d5c2627603b","added_by":"auto","created_at":"2024-11-15 09:46:35","extension":"png","order_by":10,"title":"Figure 10","display":"","copyAsset":false,"role":"figure","size":53114,"visible":true,"origin":"","legend":"\u003cp\u003eMurcko scaffold structures versus frequency scatter plot. 
The frequency of Murcko scaffold is colour-coded, with blue representing the highest frequency and red the lowest frequency.\u003c/p\u003e","description":"","filename":"10.png","url":"https://assets-eu.researchsquare.com/files/rs-5315945/v1/0b6e546fdfcdb0ff88ab00a4.png"},{"id":69069634,"identity":"6477215c-63be-446d-b022-9e9b727684b5","added_by":"auto","created_at":"2024-11-15 09:46:35","extension":"png","order_by":11,"title":"Figure 11","display":"","copyAsset":false,"role":"figure","size":157300,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003e(A-B) \u003c/strong\u003eRing systems with substitution pattern versus frequency scatter plot. The frequency of plain ring is colour-coded, with blue representing the highest frequency and red the lowest frequency.\u003c/p\u003e","description":"","filename":"11.png","url":"https://assets-eu.researchsquare.com/files/rs-5315945/v1/ef50a769c72e686e72e31613.png"},{"id":69069635,"identity":"4872f497-3fae-4d0b-bf5d-95ab69bc2c84","added_by":"auto","created_at":"2024-11-15 09:46:35","extension":"png","order_by":12,"title":"Figure 12","display":"","copyAsset":false,"role":"figure","size":171549,"visible":true,"origin":"","legend":"\u003cp\u003eThe illustration of the deep learning model pipeline for identifying anthrax inhibitors.\u003c/p\u003e","description":"","filename":"12.png","url":"https://assets-eu.researchsquare.com/files/rs-5315945/v1/f2ce0a1aa04312f4548e1791.png"},{"id":82072457,"identity":"a913741b-cd02-49f7-9af4-26d0577d0baf","added_by":"auto","created_at":"2025-05-06 13:23:59","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":2560383,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-5315945/v1/c1e9d68e-e9e8-470e-9f47-3bf8299ca0af.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Deep Learning-based Classification Model using SMOTE 
Resampling Technique to Identify Potent Inhibitors of Lethal Factor of Anthrax and Principal Component, Chemical Space Analysis","fulltext":[{"header":"Introduction","content":"\u003cp\u003eBacillus anthracis is a gram-positive, encapsulated, rod-shaped, aerobic, spore-forming bacterial pathogen [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]. The main components of anthrax toxin are composed of lethal factor (LF, 90 kDa), edema factor (EF, 89 kDa), and protective antigen (PA, 83 kDa) [\u003cspan additionalcitationids=\"CR3\" citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]. An individual component is nontoxic; however, the composition of two components: LF combined with PA is called a lethal toxin, and EF with PA is called edema toxin, which directly contributes to the anthrax symptoms and lethality [\u003cspan additionalcitationids=\"CR6 CR7 CR8\" citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]. PA is necessary for the toxin to enter the cytoplasm, where it acts. Exotoxin makes up anthrax toxin, and intracellular active enzymes LF and EF are both [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]. Within the cytoplasm, they manifest their harmful effects. Genes that code for toxins can be found on the pXO1 (182 kb; accession no. NC001496) plasmid [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]. These genes, \u003cem\u003ecya, lef\u003c/em\u003e, and \u003cem\u003epag\u003c/em\u003e, respectively encode EF, LF, and PA [\u003cspan additionalcitationids=\"CR13 CR14\" citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e]. 
The protein known as the lethal factor toxin is directly linked to cell death. The LF gene is inactivated, or this protein is defective, which significantly reduces the virulence of the Bacillus anthracis strain (by a factor of roughly 1000). In order to treat Anthrax at all stages of disease inhibition, LF, the primary toxin component of the anthrax toxin, is necessary. The survival of cells or organisms in a sick state is inversely proportional to the concentration of this toxin. Numerous researchers have demonstrated the significance of LF toxin. Many research groups using both traditional and computational methods to develop anthrax drugs use it as a target for inhibitors. Martino Forino et al. created numerous compounds that blocked the LF and examined their effectiveness using a fragment-based strategy [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eLF is a critical component of one of these toxins and a therapeutic target for anthrax inhibitors. Anthrax infections in humans are uncommon, but they have been reported in workers in the cattle industry. However, the anthrax spore can be maintained for a long time and discharged in areas where people congregate, making inhalation of the spore exceedingly deadly. Due to these characteristics, Anthrax is a viable candidate for deployment as a biological weapon. Armed forces from several nations began working on it as a biological weapon, and the threat increased when terrorists started using it. There were at least 79 respiratory infections and 68 fatalities due to the unintentional release of anthrax spores from a military research facility in the former Soviet Union in 1979.\u003c/p\u003e \u003cp\u003eGoldberg et al. found that zinc-dependent metalloproteinase LF is a critical component of anthrax toxin and an important potential target for drug design [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e]. 
Hydroxamates are known as anthrax LF inhibitors [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e]. Several aminoglycoside antibiotics were found by Lee et al. as direct competitive inhibitors, with neomycin B being the most effective among them [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eA quantitative high throughput test was created to screen small compounds that may reduce or prevent the internalization of the anthrax toxin using LF-beta-lactamase fusion proteins. The TEM-1 beta-lactamase (developed by Dr. Thomas Bugge's group at the NIH) was fused to the PA-binding region of the LF N-terminal (1-254 amino acids) to create the fusion protein. LF-beta-lactamase fusion proteins will internalize in the presence of PA and act on beta-lactamase substrate (CCF2/AM) trapped in cells due to cytoplasm esterase cleavage. When beta-lactamase hydrolyzes CCF2, acceptor fluorescence is released and fluoresces at 447 nm (blue light). The following filter set, Lambda (EX)\u0026thinsp;=\u0026thinsp;405 nm, Lambda (EM)\u0026thinsp;=\u0026thinsp;460 nm/530 nm, was used to monitor fluoresce intensity using an EnVision plate reader (PerkinElmer, Boston, MA) [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e].\u003c/p\u003e\n\u003ch3\u003eArtificial Intelligence in drug discovery\u003c/h3\u003e\n\u003cp\u003eThe traditional drug development process is tedious, expensive, time-consuming, and inefficient; the success rate is very low, with only one hit molecule from one lakh launching in the market. The drug development process is being revolutionized by artificial intelligence, which can quickly identify possible biologically active molecules from millions of candidate compounds in a short amount of time [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e]. 
Its goals include pattern recognition, biomarker identification and/or categorization, among others. Artificial neural networks (ANNs), in particular, have been employed as an alternative to ADMET factor testing and QSAR modeling evaluation to attain these goals [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e]. A benefit of deep neural networks is their ability to capture extremely complicated biological spatial settings [\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e]. Deep neural networks play a significant role in the drug discovery process and are applied at different stages of drug development, from target identification to the clinical phase. Deep learning algorithms can extract physical, chemical, and biological properties from chemical structures and accurately predict their biological activity. Artificial neural network methodology is applied in the drug development process as an alternative to traditional drug development [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e]. Pang et al. applied a deep neural network to drug-target interaction prediction based on feature representation [\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e]. An ANN was used to predict the pharmacokinetics of aminoglycosides in severely ill patients [\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e]. Bilsland et al. developed an ANN to screen for senescence-inducing compounds using known agonist compounds [\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e]. Domine et al. used an ANN algorithm to predict adverse drug effects [28]. Moon et al. built an ANN for dose determination of HMG-CoA-reductase inhibitors [29]. Kumari et al. proposed deep learning for virtual screening of compounds against SARS-CoV-2 [\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e30\u003c/span\u003e]. Kumari et al. 
used deep learning, quantitative structure-activity relationship (QSAR), molecular docking, molecular dynamics, and free energy calculation in drug design and development [\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e31\u003c/span\u003e]. Deep learning is an excellent computational approach for the virtual screening of small molecule inhibitors, accelerating the drug discovery process and reducing the time and cost of experimental work. In addition, experimental screening against anthrax is dangerous and requires a high level of laboratory safety.\u003c/p\u003e \u003cp\u003eThe aim of this study was to predict novel potential drug candidates for anthrax infection. The proposed work is a deep learning model for an imbalanced anti-anthrax bioassay dataset, intended to increase the classification rate and to reduce the false positive rate in minority classes without increasing the false negative rate in the majority classes. To address the imbalanced dataset problem, we employed resampling techniques including under-sampling, over-sampling, and the synthetic minority over-sampling technique (SMOTE). SMOTE generates additional minority samples to achieve class balance, while the ANN learns hierarchical feature representations from the balanced data to screen biologically active molecules from unknown chemical libraries. This study used a high-throughput screening approach to rapidly identify anthrax inhibitors using hybrid algorithms based on deep learning. To obtain potential drug candidates for anthrax, we collated an experimental qHTS bioassay dataset for lethal toxin internalization. The strategy was to construct a deep learning-based classification model to predict the biological activity of anthrax inhibitors. 
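\u003c/p\u003e \u003cp\u003eThe SMOTE step described above can be illustrated with a minimal, standard-library sketch. It is not the implementation used in this study: the function name smote, the neighbour count k, the random seed, and the two-descriptor toy data are all assumptions made for illustration. Each synthetic sample is placed at a random position on the line segment between a minority sample and one of its nearest minority-class neighbours.\u003c/p\u003e

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    # Create n_new synthetic minority samples by linear interpolation
    # between a random minority sample and one of its k nearest
    # minority-class neighbours (Euclidean distance).
    if len(minority) < 2:
        raise ValueError('SMOTE needs at least two minority samples')
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # fraction of the way from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Made-up two-descriptor toy data: 3 actives (minority), 8 inactives.
actives = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]]
inactives = [[float(i), 5.0] for i in range(8)]
new_actives = smote(actives, n_new=len(inactives) - len(actives), k=2)
balanced_actives = actives + new_actives
print(len(balanced_actives), len(inactives))
```

\u003cp\u003eBecause every synthetic point is an interpolation between two real minority samples, the new samples stay inside the region already occupied by the minority class. In practice, resampling of this kind is normally applied to the training split only, so that no synthetic information reaches the test set.\u003c/p\u003e \u003cp\u003e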
Before the model's development, we used descriptors important for bioactivity across five databases (phytochemical compounds, the natural product NCI Diversity Set IV, FDA-approved drugs, experimental drugs, and natural products from the ZINC database) to search for novel inhibitors of the lethal toxin to treat anthrax. Further, we analysed the chemical space of the anthrax inhibitors dataset. The study suggested that deep learning could generate potent drug candidates for treating anthrax.\u003c/p\u003e"},{"header":"Results and discussion","content":"\u003cp\u003eIn this study, we built deep learning-based models to predict inhibitors of anthrax. In the qHTS assay for anthrax Lethal Toxin Internalization, compounds are first classified as active or inactive. An inactive compound's PUBCHEM_ACTIVITY_SCORE is zero, while an active compound's PUBCHEM_ACTIVITY_SCORE ranges between 40 and 100. Data pre-processing was carried out before the model was trained. Each molecule was converted into a vector of 179 descriptors that describe the structural and functional properties of anthrax inhibitors. An efficient model cannot be fully trained on such data because, once the class imbalance reaches a certain level, the classification performance of the model declines substantially. To address the imbalance, we created balanced dataset samples using resampling techniques and then used those samples to train a model, with the aim of increasing the classification model's overall accuracy. In this study, we trained deep learning models on the balanced data and monitored the statistical parameters of model classification to manage the hybrid sampling process.\u003c/p\u003e \u003cp\u003eThe proposed ANN architecture employed three hidden layers with the ReLU activation function, along with one dense layer with a sigmoid function for binary classification. We optimized the model using the Adam optimizer with a learning rate of 0.001 for 100 epochs. 
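\u003c/p\u003e \u003cp\u003eTo make the architecture above concrete, the following sketch shows only the forward pass of such a network in plain Python. The input size of 179 comes from the text; the hidden-layer widths (64, 32, 16), the random initial weights, the placeholder input, and the 0.5 decision threshold are assumptions, since they are not reported here. Training with Adam for 100 epochs is not shown and would normally be done with a deep learning framework.\u003c/p\u003e

```python
import math
import random


def init_layer(n_in, n_out, rng):
    # Small random weights and zero biases for one dense layer.
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer):
    # Affine transform: one weighted sum plus bias per output unit.
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_active_probability(x, layers):
    # Three ReLU hidden layers followed by a single sigmoid output unit.
    *hidden, output = layers
    for layer in hidden:
        x = relu(dense(x, layer))
    return sigmoid(dense(x, output)[0])


rng = random.Random(0)
sizes = [179, 64, 32, 16, 1]  # 179 is from the text; 64/32/16 are assumed widths
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
descriptor_vector = [rng.random() for _ in range(179)]  # placeholder input
probability = predict_active_probability(descriptor_vector, layers)
label = 'active' if probability >= 0.5 else 'inactive'  # assumed threshold
print(label, round(probability, 3))
```

\u003cp\u003eThe sigmoid output is read as the predicted probability that a compound is active, which is what makes a single output unit sufficient for binary classification.\u003c/p\u003e \u003cp\u003e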
The performance of the ANN model was measured in terms of accuracy and loss, as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e and Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e, respectively. The results showed that the ANN model with SMOTE resampling has a better predictive ability for the external dataset.\u003c/p\u003e \u003cp\u003eIn this study, we focused on imbalanced data and investigated the performance of resampling techniques. The best resampling method was chosen by comparing the model\u0026rsquo;s performance. We first combined resampling methods (under-sampling, over-sampling, and SMOTE) with the ANN to test which model was superior. The statistical results are shown in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e \u003cp\u003eThe accuracy of the ANN model with SMOTE is the highest of the three. The training loss curve of the hybrid model (SMOTE\u0026thinsp;+\u0026thinsp;ANN) shows a sharp drop at first, then fluctuates as the number of epochs increases, and finally drops slowly. The training loss converges fastest during epochs 1\u0026ndash;20, and the model performs robustly on the training data. The loss curve of the test dataset likewise converges quickly at first and then more slowly as the number of epochs increases. Therefore, the SMOTE\u0026thinsp;+\u0026thinsp;ANN model can take less training time to predict the biological activity of molecules.\u003c/p\u003e \u003cp\u003eThe best SMOTE\u0026thinsp;+\u0026thinsp;ANN hybrid model was chosen by comparing the performance of the models across various statistical parameters. Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e presents the statistical findings of the test validation. 
The model's overall effectiveness was assessed by its accuracy (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e). Figure\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e displays the sensitivity and specificity bar chart. Among the classification models compared, the hybrid SMOTE\u0026thinsp;+\u0026thinsp;ANN model obtained 98% accuracy, sensitivity, specificity, recall, F-measure, and ROC, and 99% precision. Additionally, ROC was calculated to demonstrate the model's robustness; it is frequently used for a quick performance evaluation of virtual screening techniques. Figure\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e shows the SMOTE\u0026thinsp;+\u0026thinsp;ANN model's ROC curve, which displayed an AUC value of 0.98 (c). The confusion matrix (a) shows the proportion of compounds that were identified: the SMOTE\u0026thinsp;+\u0026thinsp;ANN model's TP is 0.98 and TN is 0.99, while FN is 0.015 and FP is 0.018. Thus, comparison analysis revealed that SMOTE\u0026thinsp;+\u0026thinsp;ANN was the best hybrid model of the three. The findings imply that this strategy might work well for screening large databases. On the imbalanced dataset, the ANN classifier model gains significantly from using SMOTE. 
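\u003c/p\u003e \u003cp\u003eFor reference, the statistics reported in Table 1 follow from the four confusion-matrix counts by their standard definitions. The short sketch below uses hypothetical counts, not the study's data, and the function name classification_metrics is illustrative.\u003c/p\u003e

```python
def classification_metrics(tp, fp, tn, fn):
    # Standard binary-classification metrics from confusion-matrix counts.
    sensitivity = tp / (tp + fn)  # identical to recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        'accuracy': accuracy,
        'specificity': specificity,
        'sensitivity': sensitivity,
        'precision': precision,
        'recall': sensitivity,
        'f1': f1,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(name, round(value, 3))
```

\u003cp\u003eSensitivity and recall are the same quantity, which is why those two columns of Table 1 carry identical values for every model.\u003c/p\u003e \u003cp\u003e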
SMOTE is effective at solving the classification model's problem of class imbalance.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eThe statistical results of deep learning-based models of the anti-anthrax testset\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"8\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eClassification Model\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAccuracy\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eSpecificity\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eSensitivity\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/th\u003e \u003cth 
align=\"left\" colname=\"c7\"\u003e \u003cp\u003eF1-score\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003e*ROC\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eUnder-sampling\u003c/p\u003e \u003cp\u003eANN\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.66\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.46\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.87\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.61\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.87\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.71\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.66\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eOver-sampling\u003c/p\u003e \u003cp\u003eANN\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.97\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.97\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.98\u003c/p\u003e 
\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSMOTE\u003c/p\u003e \u003cp\u003eANN\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.98\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e*ROC: Receiver Operating Characteristic.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e\n\u003ch3\u003eDeployment of SMOTE + ANN hybrid model\u003c/h3\u003e\n\u003cp\u003eFive unknown chemical libraries were virtually screened using the suggested SMOTE\u0026thinsp;+\u0026thinsp;ANN hybrid model. In order to identify natural compounds that are effective anti-anthrax inhibitors, we screened out (a) 134 FDA-approved drugs, (b) 338 experimental drugs, (c) 130 phytochemical compounds from a phytochemical database, (d) 15 natural products from NCI divsetIV, and (e) 8098 natural compounds from ZINC database. 
Then, we applied Lipinski's rule of five (RO5) to the selected molecules, which led to the identification of 51 phytochemical compounds from the phytochemical database, 8 natural products from NCI divsetIV, and 3050 natural compounds from the ZINC database as anthrax inhibitors.\u003c/p\u003e\n\u003ch3\u003eChemical space analysis\u003c/h3\u003e\n\u003cp\u003eIn this research work, chemical space was characterized by eight types of chemical properties, namely druglikeness, ligand efficiency, toxic properties, chemical shape, atom counts, ring counts, functional counts, and 3-dimensional globularity. We calculated 57 descriptors using DataWarrior tools. After that, we employed PCA to find important descriptors. Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e (A) shows a three-dimensional PCA scatter plot of the active and inactive compounds for the lethal factor of anthrax. The remaining panels of Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e show PCA-based two-dimensional scatter plots of (B) activity score, (C) Druglikeness, (D) Mutagenic, (E) Tumorigenic, (F) Reproductive effective, (G) Irritant, and (H) 2D eigenvalues, which are based on 50 descriptors. The eigenvalues of the important descriptors are listed in Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e. The first three principal components cover 50% of the variation. The t-SNE algorithm is an effective tool for visualizing the molecular similarity of the inhibitors in the anthrax dataset. We selected 427 compounds with activity scores. Figure\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003e shows a two-dimensional t-SNE map of anthrax inhibitors with activity scores. Each data point represents a chemical structure, and the colour indicates the activity score. 
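\u003c/p\u003e \u003cp\u003eAs a brief aside on the PCA summary above, a statement such as the first three principal components covering 50% of the variation corresponds to the cumulative share of the total eigenvalue sum. The sketch below shows that calculation with made-up eigenvalues; the function name and the numbers are illustrative and are not taken from the study.\u003c/p\u003e

```python
def cumulative_explained_variance(eigenvalues):
    # Fraction of total variance covered by the first 1, 2, 3, ...
    # principal components, taken in order of decreasing eigenvalue.
    total = sum(eigenvalues)
    running = 0.0
    shares = []
    for value in sorted(eigenvalues, reverse=True):
        running += value
        shares.append(running / total)
    return shares


# Made-up eigenvalues of a covariance matrix, for illustration only.
example_eigenvalues = [4.0, 3.0, 2.0, 1.0]
shares = cumulative_explained_variance(example_eigenvalues)
print([round(s, 2) for s in shares])
```

\u003cp\u003eReading off the third entry of the returned list gives the share of variance retained by a three-component projection such as the three-dimensional scatter plot.\u003c/p\u003e \u003cp\u003e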
Molecular similarity can be mapped into a constellation plot; Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003e shows a constellation plot of anthrax inhibitors.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eEigenvalues of first two Principal components of important descriptors.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eS. 
No.\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eVariable Name\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003ePC1\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003ePC2\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e1.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNon-H Atoms\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.265295\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.002054\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e2.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eTotal Surface Area\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.261924\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-0.00658\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e3.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eVDW-Volume\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.258821\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-0.04426\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e4.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eVDW-Surface\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.258007\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c4\"\u003e \u003cp\u003e-6.79E-04\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e5.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMonoisotopic Mass\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.249458\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.004788\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e6.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMolweight\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.248822\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.004555\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e7.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eTotal Molweight\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.247677\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-0.00612\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e8.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRings Closures\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.199163\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-0.05212\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e9.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eSmall Rings\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.196141\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-0.0646\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e10.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eH-Acceptors\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.187382\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.195857\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e11.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMolecular Complexity\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.181031\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-0.00443\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e12.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eElectronegative Atoms\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.16195\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.204544\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e13.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNon-C/H Atoms\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.16195\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.204544\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd 
align=\"left\" colname=\"c1\"\u003e \u003cp\u003e14.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRotatable Bonds\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.1524\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.061341\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e15.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eHetero-Rings\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.148183\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.061238\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e16.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLELP from Pubchem_Sid\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.147187\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-0.12893\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e17.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003esp3-Atoms\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.139905\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-0.22066\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e18.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePolar Surface Area\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e 
\u003cp\u003e0.125943\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.227964\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e19.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAromatic Rings\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.125713\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.165036\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e20.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAromatic Atoms\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.125614\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.158279\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e21.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNon-Aromatic Hetero-Rings\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.099007\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-0.1434\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e22.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eSaturated Hetero-Rings\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.093897\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-0.176\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e23.\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eSaturated Rings\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.091614\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-0.26605\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e24.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNon-Aromatic Rings\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.088381\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-0.25144\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e25.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eHetero-Aromatic Rings\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.087967\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.207168\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e26.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAmides\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.08713\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.108793\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e27.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAromatic Nitrogens\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.078695\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c4\"\u003e \u003cp\u003e0.187258\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e28.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAcidic Oxygens\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.076084\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-0.02449\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e29.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ecLogP\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.074702\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-0.12574\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e30.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCarbo-Rings\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.074069\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-0.15133\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e31.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCarbo-Aromatic Rings\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.073619\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-0.00277\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e32.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAmines\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.061288\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-0.16471\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e33.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eGlobularity SVD\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.055727\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-0.11326\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e34.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDruglikeness\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.053129\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-0.03755\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e35.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAlkyl-Amines\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.050453\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-0.21251\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e36.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eBasic Nitrogens\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.050438\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-0.17797\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" 
colname=\"c1\"\u003e \u003cp\u003e37.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eStereo Centers\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.044728\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-0.1834\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e38.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMolecular Flexibility\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.042233\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.060851\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e39.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eH-Donors\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.041249\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.050932\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e40.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFragments\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.039836\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-0.06471\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e41.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAromatic Amines\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.034193\u003c/p\u003e \u003c/td\u003e 
\u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-0.01587\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e42.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRelative PSA\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.031168\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.27485\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e43.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eSaturated Carbo-Rings\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.026642\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-0.20324\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e44.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eSymmetric atoms\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.018181\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-0.10439\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e45.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNon-Aromatic Carbo-Rings\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.013587\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-0.21297\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e46.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c2\"\u003e \u003cp\u003eLLE from Pubchem_Sid\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e-0.05773\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.14529\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e47.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eShape Index\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e-0.08173\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.068126\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e48.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ecLogS\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e-0.08885\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-0.00984\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e49.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLE from Pubchem_Sid\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e-0.20474\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.057717\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e50.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eGlobularity Vol\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e-0.21441\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-0.11112\u003c/p\u003e 
\u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e\n\u003ch3\u003eConstellation plots\u003c/h3\u003e\n\u003cp\u003eThe constellation plots were constructed from a dataset of 472 anthrax inhibitors, which are grouped into 153 analog-series cores. Figure\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003e shows the constellation plot, which conveys chemical space and chemical substructure information based on a network and coordinates. Each node represents an analog series. Figure\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003e depicts a single constellation of the most active inhibitors among all molecules. Analog series that lie close together in chemical space show scaffold diversity. The activity score can be mapped onto a constellation plot, with colours ranging from red to green.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThe highest activity score is 90. Linking lines represent molecules shared between data points. Red represents a low activity score (activity score\u0026thinsp;\u0026lt;\u0026thinsp;50), whereas green represents the highest activity score (activity score: 90). Fig.\u0026nbsp;\u003cspan refid=\"Fig9\" class=\"InternalRef\"\u003e9\u003c/span\u003e shows that the activity score and fragment-based similarity are quantitatively related. A fragment similarity of \u0026gt;\u0026thinsp;80% among molecules with similar activity scores indicates that molecules with related descriptors also have similar fragment pairs. One compound has a maximum activity score of 85 and has 16 similar molecules. 
Figure\u0026nbsp;\u003cspan refid=\"Fig9\" class=\"InternalRef\"\u003e9\u003c/span\u003e shows the molecular similarity chart of the non-mutagenic, non-tumorigenic, non-irritant anthrax inhibitors without reproductive effects, which share a similar scaffold, 4-oxopyrrolo[3,2-c]quinolone.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e\n\u003ch3\u003eScaffold analysis\u003c/h3\u003e\n\u003cp\u003eFurthermore, we analyzed the Murcko scaffolds and the ring systems with substitution patterns. A Murcko scaffold retains only the ring systems of a molecule (with the linkers that connect them) and removes the side chains. Among the 411 distinct scaffolds, 377 are singletons. Of the remaining 34 scaffolds, 28 occur twice. Figure\u0026nbsp;\u003cspan refid=\"Fig10\" class=\"InternalRef\"\u003e10\u003c/span\u003e shows the six most common Murcko scaffolds and their respective frequencies, namely 4-oxopyrrolo[3,2-c]quinolone (6), dibenzhydryl (4), diphenyl (3), benzene (14), phenoxy (4), and piperidin-2-yl]ethyl]-2-methyl sulfanyl phenothiazine (3). After benzene, the most frequent scaffold was 4-oxopyrrolo[3,2-c]quinolone, followed by the dibenzhydryl and phenoxy Murcko scaffolds.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eRing systems with substitution pattern\u003c/h2\u003e \u003cp\u003eIn this scaffold type, each single ring or annelated ring system carries its exocyclic, non-hydrogen substituents. Among the 295 distinct scaffolds, 176 are singletons. Of the remaining 119 scaffolds, 18 occur more than ten times.\u003c/p\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig11\" class=\"InternalRef\"\u003e11\u003c/span\u003e shows the 18 most common ring systems with substitution pattern scaffolds and their respective frequencies. Benzene was the most frequent, with a frequency of 118. 
The second most frequent scaffold was piperazine (50), followed by furan (29), morpholine (21), piperidine (15), 4-oxopyrrolo[3,2-c]quinolone (15), benzodioxane (14), thienyl (13), thiazole (12), triazole (11), and benzodioxole (11). These scaffolds are responsible for inhibitory effects. Molecules containing the piperazine group have activity scores ranging from 40 to 50. Molecules containing the 4-oxopyrrolo[3,2-c]quinolone group have activity scores ranging from 40 to 85, making this scaffold a suitable drug candidate for anthrax. The ring systems with substitution pattern scaffold shows that a change in atoms or groups results in a change in biological activity. Using fragment pair analysis, we found that molecules sharing the 4-oxopyrrolo[3,2-c]quinolone scaffold have similar chemical properties and biological activity. Similar structures tend to produce the same biological effect and have a lower failure rate. Scaffold-based design is a low-risk approach that still produces structural diversity. These structures can act like drugs and have strong, adaptable binding affinities for a range of targets. The ring systems with substitution pattern scaffold was used in the present study to assess the value of the rings' connectivity network based on the plain ring system, which indicated the value of specific heteroatoms and their relative positions in the rings. The results suggest that potential anti-anthrax drug candidates might be significantly influenced by the type of ring structure and its specific substituents. This method can be used in future studies to assess the impact of ring structure in relation to the type and placement of substituents. 
Numerous substituents can be placed at a number of positions to enhance the scaffold's varied properties.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"Conclusion","content":"\u003cp\u003eIn this research work, a deep learning-based hybrid model and virtual screening enabled us to identify novel potential inhibitors of anthrax lethal factor from a chemical database. A small list of compounds from the experimental lab assay of anthrax enabled us to predict several drug candidates with potential inhibitory activity. We developed a deep learning-based hybrid SMOTE\u0026thinsp;+\u0026thinsp;ANN model to identify potential drug candidates against anthrax. The best model showed 0.98 accuracy, specificity, sensitivity, recall, ROC, and F1-score, and 0.99 precision. The screen identified 134 FDA-approved drugs, 338 experimental drugs, 51 phytochemical compounds from the phytochemical database, 8 natural products from the NCI Diversity Set IV, and 3050 natural compounds from the ZINC database as potential anthrax inhibitors. Additionally, the present work explored the chemical space of the anthrax inhibitors using activity cliffs, constellation plots, and t-SNE plots. This novel visualization of the chemical space enables the identification of promising analog series together with their activity scores. We found that ring systems with substitution pattern scaffolds such as 4-oxopyrrolo[3,2-c]quinolone enhanced the biological activity of anthrax inhibitors. Fingerprint similarity was greater than 80% and is associated with the ring systems with substitution pattern scaffold. 
The results demonstrate that the deep learning model can be used to widen the search for small molecules that inhibit anthrax and to make full use of the diverse compound databases that are now publicly available.\u003c/p\u003e"},{"header":"Materials and Methods","content":"\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003eHigh-throughput screen data collection\u003c/h2\u003e \u003cp\u003e We extracted experimental anthrax lethal toxin inhibitors from a publicly available database [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e]. The total number of chemical molecules was 70,086. After completing the data curation process, we obtained 57,593 unique molecules, of which 471 were active and 57,122 were inactive. 2D chemical structures were converted into 3D using CORINA software [\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e32\u003c/span\u003e].\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003eData pre-processing\u003c/h2\u003e \u003cp\u003eData pre-processing is an important step that prepares the data for building a deep learning model and improves the data quality. Data pre-processing techniques remove outliers and scale the features to an equivalent range; they also clean, transform, reduce, and integrate the data in order to handle missing values and noisy data and make the dataset ready for model development. Therefore, we applied data pre-processing techniques before building a deep learning model.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003eDescriptor calculation\u003c/h2\u003e \u003cp\u003eMolecular descriptors encode molecules' physical, chemical, and structural information. The PowerMV tool was used to generate molecular descriptors, namely 147 pharmacophore fingerprints, 24 weighted Burden numbers, and eight molecular properties [\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e33\u003c/span\u003e]. 
The DataWarrior tool was used for principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE), constellation plots, and scaffold analysis [\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e34\u003c/span\u003e].\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003eDealing with an imbalanced dataset\u003c/h2\u003e \u003cp\u003eData imbalance reflects an unequal distribution of classes within the dataset. Working with an imbalanced dataset is challenging because the ANN model performed poorly on it. The majority class has a significantly larger sample size, while the minority class has a much smaller one. The ratio between the two classes is high, which would result in inaccurate classification and lower accuracy. This is a prevalent issue in virtual screening for drug discovery.\u003c/p\u003e \u003cp\u003eThe anti-anthrax dataset has 57594 compounds; only 0.8% of them are active, whereas 99.2% are inactive. Deep learning algorithms trained on such data are biased toward the dominant class, which can cause under-fitting or over-fitting. To address the issue of imbalanced data, we used resampling techniques, namely under-sampling, over-sampling, and SMOTE. A resampling technique algorithmically creates a balanced dataset from the imbalanced dataset. A hybrid model based on deep learning is then constructed using this balanced dataset to further improve the overall statistical results. Figure\u0026nbsp;12 displays the workflow structure for deep learning-based virtual screening.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003eRandom under-sampling technique\u003c/h2\u003e \u003cp\u003eThis method uses all molecules in the minority class and randomly removes molecules from the majority class until both classes are equal in size. 
In this investigation, the entire dataset was subjected to under-sampling. As a result, the number of compounds in the majority class was reduced from 57122 to 472, which is equal to the number of active compounds in the minority class. However, under-sampling carries a risk because potentially important information about the discarded molecules can be lost.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003eRandom over-sampling technique\u003c/h2\u003e \u003cp\u003eOver-sampling the minority class is one way to deal with imbalanced datasets. This method uses all molecules in the majority class and randomly duplicates molecules from the minority class until both classes are equal in size. The duplicated molecules, however, do not give the model any new information; synthesizing new molecules from the existing ones is an alternative. We applied over-sampling to the entire dataset. As a result, the minority class (active molecules) expanded from 472 compounds to 57122, which is equal to the number of inactive molecules in the majority class.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec17\" class=\"Section2\"\u003e \u003ch2\u003eSynthetic Minority Over-sampling Technique\u003c/h2\u003e \u003cp\u003eSMOTE, described by Chawla et al. [\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e35\u003c/span\u003e, \u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e36\u003c/span\u003e], is an over-sampling technique that generates synthetic molecules for the minority class of an imbalanced dataset and helps address the overfitting and underfitting problems. Nakamura et al. applied a Learning Vector Quantization-based Synthetic Minority Over-sampling Technique to biomedical data [\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e37\u003c/span\u003e]. 
Seo et al. applied a machine-learning approach to optimize the SMOTE ratio in a class-imbalanced dataset for intrusion detection [\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e38\u003c/span\u003e]. Pandey et al. used SMOTE for the automatic detection of arrhythmia from an imbalanced ECG database [\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e39\u003c/span\u003e]. To handle a class-imbalanced dataset, Derhab et al. presented a hybrid model of SMOTE and ANN [\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e40\u003c/span\u003e]. Kumari et al. proposed a hybrid SMOTE-ENN\u0026thinsp;+\u0026thinsp;ANN model to identify Marburg virus inhibitors [\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e41\u003c/span\u003e]. Therefore, we used the SMOTE technique to solve the imbalanced data problem.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec18\" class=\"Section2\"\u003e \u003ch2\u003eDataset division\u003c/h2\u003e \u003cp\u003eThe dataset was divided in a 4:1 ratio after the resampling procedure, giving a training set with 80% of the data and a test set with 20%. Hyperparameters are adjusted during the training phase using the training dataset. Adjusting hyperparameters is necessary to prevent overfitting. The test set is then used to adjust the hybrid model's hyperparameters and assess the classification model's statistical performance in terms of accuracy, sensitivity, specificity, precision, recall, F1-score, and ROC.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec19\" class=\"Section2\"\u003e \u003ch2\u003eModel development\u003c/h2\u003e \u003cdiv id=\"Sec20\" class=\"Section3\"\u003e \u003ch2\u003eArtificial Neural Network\u003c/h2\u003e \u003cp\u003eWe created an ANN model, which is a form of deep learning. Deep learning is a subset of machine learning that uses layers of nodes to convert inputs into outputs. 
ANNs are based on the brain's neural network [\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e42\u003c/span\u003e]. Both linear and non-linear datasets can be fitted with ANNs. ANNs learn by being trained on example data: a training algorithm is applied repeatedly to change the weights between neurons until the desired outcome is obtained. The learning rate, optimizer, activation function, and initial values of the weights are only a few of the settings that must be optimized in order to maximize the ANN model's effectiveness.\u003c/p\u003e \u003cp\u003eThe input layer, the hidden layer, and the output layer are the three different types of layers that make up an ANN [\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e43\u003c/span\u003e]. The input layer and the output layer represent the classifier's input and output, respectively. The layers between the input and output layers, which transform the data features to create predictions, are known as the hidden layers. An ANN can have many hidden layers. In this model, the selected features are fed in as the input vector and sent to the hidden layer, which uses mathematical operators to process the data. After training is finished, a test dataset with comparable features is passed through the trained model. 
The ANN model known as the Multi-layer Perceptron (MLP) [\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e44\u003c/span\u003e] is represented by Eq.\u0026nbsp;\u003cspan refid=\"Equ1\" class=\"InternalRef\"\u003e1\u003c/span\u003e:\u003cdiv id=\"Equ1\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ1\" name=\"EquationSource\"\u003e\n$$\\:{y}_{j}=f\\left(\\sum_{i=1}^{n}{w}_{ij}{x}_{i}+{b}_{j}\\right)$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e1\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eWhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(f\\)\u003c/span\u003e\u003c/span\u003e is the activation function, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({y}_{j}\\)\u003c/span\u003e\u003c/span\u003e is the output of node j, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({x}_{i}\\)\u003c/span\u003e\u003c/span\u003e is the i-th component of the input vector, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({w}_{ij}\\)\u003c/span\u003e\u003c/span\u003e is the weight connecting input i to node j, n is the number of input nodes, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({b}_{j}\\)\u003c/span\u003e\u003c/span\u003e is the bias.\u003c/p\u003e \u003cp\u003eThe activation function of a node in a neural network transforms the node\u0026rsquo;s summed weighted input into the node\u0026rsquo;s output. The sigmoid is a mathematical function with an S-shaped curve [\u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e45\u003c/span\u003e]. 
Sigmoid functions map the entire number line into a small range, such as 0 to 1 or -1 to 1, so a real value is transformed into one that can be interpreted as a probability. In this study, the sigmoid function is used by each neuron in a multi-layer neural network to predict probabilities as outputs in the range 0 to 1. The sigmoid function is defined in Eq.\u0026nbsp;\u003cspan refid=\"Equ2\" class=\"InternalRef\"\u003e2\u003c/span\u003e:\u003cdiv id=\"Equ2\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ2\" name=\"EquationSource\"\u003e\n$$\\:S\\left(x\\right)=\\frac{1}{1+{e}^{-x}}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e2\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eHere, x is the input.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec21\" class=\"Section2\"\u003e \u003ch2\u003eRectified linear units (ReLU)\u003c/h2\u003e \u003cp\u003eThe rectified linear activation function (ReLU) is the most commonly used activation function in neural network models [\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e46\u003c/span\u003e]. The function returns 0 for any negative input and returns the input itself for any positive input. It can therefore be written as in Eq.\u0026nbsp;3:\u003c/p\u003e \u003cp\u003ef(x)\u0026thinsp;=\u0026thinsp;max(0, x) (Eq.\u0026nbsp;3)\u003c/p\u003e \u003cp\u003ewhere x is the input to a neuron and f(x) is the ReLU function.\u003c/p\u003e \u003cp\u003eBefore developing the final model, we went through several rounds of trial and error. For model optimization, we tried combinations of hyperparameters such as the number of hidden layers, the layer type (dense layer), the hidden-layer activation function (rectified linear unit, ReLU), the output layer function (sigmoid), the Adam optimizer, and the number of epochs. 
As a loss function, we used binary cross-entropy. Training the model to achieve the best results may take some time. For this reason, we used the Google Colab computing environment for model development.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec22\" class=\"Section2\"\u003e \u003ch2\u003eModel performance\u003c/h2\u003e \u003cp\u003eIn this binary classification problem, the accuracy on the minority class is much more important than the accuracy on the majority class, so handling the minority class with resampling techniques improves the overall classification performance. We evaluated the model's performance for comparative analysis using the results of the confusion matrix and loss and gain.\u003c/p\u003e \u003cp\u003eThe confusion matrix is a specific table that allows visualization of the classification model's performance [\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e47\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e48\u003c/span\u003e]. The active compounds form the minority class, while the inactive compounds form the majority class. Each column of the matrix represents the instances of a predicted class, and each row represents the instances of an actual class, which makes it easy to see whether the system is mixing up the two classes. True positives (TPs) and true negatives (TNs) are the correctly predicted active and inactive compounds, respectively; false positives (FPs) are inactive compounds that are incorrectly classified as active compounds, and false negatives (FNs) are active compounds that are incorrectly classified as inactive compounds. We evaluated accuracy, sensitivity, specificity, recall, F-measure, and ROC based on the confusion matrix data [\u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e49\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eThe accuracy of the classification model represents its overall performance. 
It is defined as the ratio of correctly classified compounds to the total number of compounds in the dataset (Eq.\u0026nbsp;\u003cspan refid=\"Equ3\" class=\"InternalRef\"\u003e4\u003c/span\u003e).\u003cdiv id=\"Equ3\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ3\" name=\"EquationSource\"\u003e\n$$\\:Accuracy=\\frac{TP+TN}{TP+TN+FP+FN}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e4\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eSensitivity is the proportion of all active compounds that the learning method predicts as active anti-anthrax compounds. It is the true positive rate (Eq.\u0026nbsp;\u003cspan refid=\"Equ4\" class=\"InternalRef\"\u003e5\u003c/span\u003e).\u003cdiv id=\"Equ4\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ4\" name=\"EquationSource\"\u003e\n$$\\:Sensitivity=\\frac{TP}{TP+FN}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e5\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eSpecificity is the proportion of all inactive compounds that the learning model predicts as inactive against anthrax. 
Specificity is the true negative rate (Eq.\u0026nbsp;\u003cspan refid=\"Equ5\" class=\"InternalRef\"\u003e6\u003c/span\u003e).\u003cdiv id=\"Equ5\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ5\" name=\"EquationSource\"\u003e\n$$\\:Specificity=\\frac{TN}{TN+FP}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e6\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eThe precision of the model is defined as the number of correctly predicted active compounds divided by the total number of compounds that the model classified as active (Eq.\u0026nbsp;\u003cspan refid=\"Equ6\" class=\"InternalRef\"\u003e7\u003c/span\u003e).\u003cdiv id=\"Equ6\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ6\" name=\"EquationSource\"\u003e\n$$\\:Precision=\\frac{TP}{TP+FP}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e7\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eThe recall is calculated as the number of active compounds that were correctly predicted divided by the total number of active compounds. Recall is therefore identical to sensitivity (Eq.\u0026nbsp;\u003cspan refid=\"Equ7\" class=\"InternalRef\"\u003e8\u003c/span\u003e).\u003cdiv id=\"Equ7\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ7\" name=\"EquationSource\"\u003e\n$$\\:Recall=\\frac{TP}{TP+FN}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e8\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eThe F1-score is a further measure of the model's accuracy on the dataset. It is the harmonic mean of recall and precision. An F1-score of 1 indicates a flawless model. 
The F1-score is used to assess the effectiveness of the classification model, while sensitivity and specificity track the proportion of correctly classified instances in the minority and majority classes, which correspond to the active and inactive classes, respectively. The F1-score increases when both recall and precision are higher (Eq.\u0026nbsp;\u003cspan refid=\"Equ8\" class=\"InternalRef\"\u003e9\u003c/span\u003e).\u003cdiv id=\"Equ8\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equ8\" name=\"EquationSource\"\u003e\n$$\\:F1-score=2\\times\\frac{Precision\\times Recall}{Precision+Recall}$$\u003c/div\u003e\u003cdiv class=\"EquationNumber\"\u003e9\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eThe area beneath the receiver operating characteristic (ROC) curve is known as the area under the curve (AUC). The ROC curve is a 2-D plot of the true positive rate against the false positive rate that visualizes classifier performance. The AUC lies between 0 and 1. An AUC close to 1 indicates that the model almost always ranks an active compound above an inactive one.\u003c/p\u003e \u003cdiv id=\"Sec23\" class=\"Section3\"\u003e \u003ch2\u003eCompound library and chemicals\u003c/h2\u003e \u003cp\u003eTo find new potential anti-anthrax inhibitors, we extracted chemical molecules from five libraries: (a) 2510 FDA-approved drugs from DrugBank, (b) 6186 experimental drugs from DrugBank [\u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e50\u003c/span\u003e], (c) 918 phytochemical compounds, (d) 423 natural products from the NCI Diversity Set IV [\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e51\u003c/span\u003e], and (e) 112,267 natural compounds from the ZINC database [\u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e52\u003c/span\u003e]. Then, in order to prioritize lead compounds, we applied Lipinski's rule of five (RO5) [\u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e53\u003c/span\u003e]. 
The compounds that passed this filter included the primary anti-anthrax molecules predicted by the proposed model.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e"},{"header":"Abbreviations","content":"\u003cdiv class=\"DefinitionList\"\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eANN\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eArtificial Neural Network\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eAUC\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eArea Under the Curve\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eReLU\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eRectified Linear Unit\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eROC\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eReceiver Operating Characteristic\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eSMOTE\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eSynthetic Minority Oversampling Technique\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003c/div\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eEthics approval and consent to participate\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot Applicable.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsent for publication\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot Applicable.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData Availability Statement\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe datasets generated and/or analysed during the current study are available in the \u0026ldquo;NCBI PubChem\u0026rdquo; repository site, 
https://pubchem.ncbi.nlm.nih.gov/bioassay/912\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare no conflict of interest.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor Contributions:\u003c/strong\u003e M.K. and K.R. contributed to conceptualization, methodology, formal analysis, and original draft preparation; M.A.S. and S.M. contributed to review and editing, validation, supervision, and project administration. All authors have read and agreed to the published version of the manuscript.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding:\u003c/strong\u003e This research received no external funding from the authors' institutes.\u0026nbsp;\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eNestorovich EM, Bezrukov SM. Designing inhibitors of anthrax toxin. Expert Opin Drug Discov. 2014 Mar;9(3):299-318. doi: 10.1517/17460441.2014.877884.\u003c/li\u003e\n\u003cli\u003eBarth H, Aktories K, Popoff MR, et al. Binary bacterial toxins: biochemistry, biology, and applications of common Clostridium and Bacillus proteins. Microbiol Mol Biol Rev. 2004;68:373-402.\u003c/li\u003e\n\u003cli\u003eAbrami L, Reig N, van der Goot FG. Anthrax toxin: the long and winding road that leads to the kill. Trends Microbiol. 2005;13(2):72-78. PMID: 15680766.\u003c/li\u003e\n\u003cli\u003eBhatnagar R, Batra S. Anthrax toxin. Crit Rev Microbiol. 2001;27(3):167-200. doi: 10.1080/20014091096738.\u003c/li\u003e\n\u003cli\u003eFiroved AM, Miller GF, Moayeri M, Kakkar R, Shen Y, Wiggins JF, McNally EM, Tang WJ, Leppla SH. Bacillus anthracis edema toxin causes extensive tissue lesions and rapid lethality in mice. Am J Pathol. 2005;167:1309-1320.\u003c/li\u003e\n\u003cli\u003eCollier RJ, Young JA. Anthrax toxin. 
Annu Rev Cell Dev Biol. 2003;19:45-70. doi: 10.1146/annurev.cellbio.19.111301.140655.\u003c/li\u003e\n\u003cli\u003eDuesbery NS, Vande Woude GF. Anthrax toxins. Cell Mol Life Sci. 1999 Sep;55(12):1599-609. doi: 10.1007/s000180050399.\u003c/li\u003e\n\u003cli\u003eMoayeri M, Leppla SH. The roles of anthrax toxin in pathogenesis. Curr Opin Microbiol. 2004 Feb;7(1):19-24. doi: 10.1016/j.mib.2003.12.001.\u003c/li\u003e\n\u003cli\u003eBanks DJ, Ward SC, Bradley KA. New insights into the functions of anthrax toxin. Expert Rev Mol Med. 2006 Apr 11;8(7):1-18. doi: 10.1017/S1462399406010714.\u003c/li\u003e\n\u003cli\u003eLowe DE, Glomski IJ. Cellular and physiological effects of anthrax exotoxin and its relevance to disease. Front Cell Infect Microbiol. 2012 Jun 1;2:76. doi: 10.3389/fcimb.2012.00076. PMID: 22919667; PMCID: PMC3417473.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eThorne, C.\u003c/strong\u003e 1993. \u003cem\u003eBacillus anthracis\u003c/em\u003e, p. 113-124. \u003cem\u003eIn\u003c/em\u003e A. L. Sonenshein, J. A. Hoch, and R. Losick (ed.), \u003cem\u003eBacillus subtilis\u003c/em\u003e and other gram-positive bacteria: biochemistry, physiology, and molecular genetics. American Society for Microbiology, Washington, D.C.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eRobertson, D. L., and S. H. Leppla.\u003c/strong\u003e 1986. Molecular cloning and expression in Escherichia coli of the lethal factor gene of Bacillus anthracis. Gene 44\u003cstrong\u003e:\u003c/strong\u003e71-78.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eMock, M., E. Labruyere, P. Glaser, A. Danchin, and A. Ullmann.\u003c/strong\u003e 1988. Cloning and expression of the calmodulin-sensitive Bacillus anthracis adenylate cyclase in Escherichia coli. Gene 64\u003cstrong\u003e:\u003c/strong\u003e277-284.\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eRobertson, D. L., M. T. Tippetts, and S. H. Leppla.\u003c/strong\u003e 1988. 
Nucleotide sequence of the Bacillus anthracis edema factor gene (cya): a calmodulin-dependent adenylate cyclase. Gene 73\u003cstrong\u003e:\u003c/strong\u003e363-371. \u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eTippetts, M. T., and D. L. Robertson.\u003c/strong\u003e 1988. Molecular cloning and expression of the \u003cem\u003eBacillus anthracis\u003c/em\u003e edema factor toxin gene: a calmodulin-dependent adenylate cyclase. J. Bacteriol. 170\u003cstrong\u003e:\u003c/strong\u003e2263-2266. \u003c/li\u003e\n\u003cli\u003eForino M, Johnson S, Wong TY, Rozanov DV, Savinov AY, Li W, Fattorusso R, Becattini B, Orry AJ, Jung D, Abagyan RA, Smith JW, Alibek K, Liddington RC, Strongin AY, Pellecchia M. Efficient synthetic inhibitors of anthrax lethal factor. Proc Natl Acad Sci U S A. 2005 Jul 5;102(27):9499-504. doi: 10.1073/pnas.0502733102. Epub 2005 Jun 27. PMID: 15983377; PMCID: PMC1160517.\u003c/li\u003e\n\u003cli\u003eGoldberg AB, Turk BE. Inhibitors of the Metalloproteinase Anthrax Lethal Factor. Curr Top Med Chem. 2016;16(21):2350-8. doi: 10.2174/1568026616666160413135732. PMID: 27072692; PMCID: PMC5208045.\u003c/li\u003e\n\u003cli\u003eLi F, Chvyrkova I, Terzyan S, Wakeham N, Turner R, Ghosh AK, Zhang XC, Tang J. Inhibition of anthrax lethal factor: lability of hydroxamate as a chelating group. Appl Microbiol Biotechnol. 2012 May;94(4):1041-9. doi: 10.1007/s00253-012-3893-7. Epub 2012 Jan 25. PMID: 22270239; PMCID: PMC3364607.\u003c/li\u003e\n\u003cli\u003eLee, L.V.; Bower, K.E.; Liang, F.S.; Shi, J.; Wu, D.; Sucheck, S.J.; Vogt, P.K. and Wong, C.H. (2004) J. Am. Chem. Soc., 126(15), 4774-4775.\u003c/li\u003e\n\u003cli\u003eNational Center for Biotechnology Information (2023). PubChem Bioassay Record for AID 912, Source: National Center for Advancing Translational Sciences (NCATS). Retrieved February 5, 2023 from https://pubchem.ncbi.nlm.nih.gov/bioassay/912.\u003c/li\u003e\n\u003cli\u003ePaul D, Sanap G, Shenoy S, Kalyane D, Kalia K, Tekade RK. 
Artificial intelligence in drug discovery and development. Drug Discov Today. 2021 Jan;26(1):80-93. doi: 10.1016/j.drudis.2020.10.010. Epub 2020 Oct 21. PMID: 33099022; PMCID: PMC7577280.\u003c/li\u003e\n\u003cli\u003eCheirdaris, D.G. (2020). Artificial Neural Networks in Computer-Aided Drug Design: An Overview of Recent Advances. In: Vlamos, P. (eds) GeNeDis 2018. Advances in Experimental Medicine and Biology, vol 1194. Springer, Cham. https://doi.org/10.1007/978-3-030-32622-7_10\u003c/li\u003e\n\u003cli\u003eKim J, Park S, Min D, Kim W. Comprehensive Survey of Recent Drug Discovery Using Deep Learning. Int J Mol Sci. 2021 Sep 15;22(18):9983. doi: 10.3390/ijms22189983. PMID: 34576146; PMCID: PMC8470987.\u003c/li\u003e\n\u003cli\u003eBourquin J, Schmidli H, van Hoogevest P, Leuenberger H. Basic concepts of artificial neural networks (ANN) modeling in the application to pharmaceutical development. Pharm. Dev. Technol. \u003cem\u003e2\u003c/em\u003e(2), 95\u0026ndash;109 (1997).\u003c/li\u003e\n\u003cli\u003ePeng J, Li J, Shang X. A learning-based method for drug-target interaction prediction based on feature representation learning and deep neural network. BMC Bioinformatics. \u003cem\u003e21\u003c/em\u003e(Suppl 13), 394 (2020).\u003c/li\u003e\n\u003cli\u003eYamamura S. Clinical application of artificial neural network (ANN) modeling to predict pharmacokinetic parameters of severely ill patients. Adv. Drug. Deliv. Rev. \u003cem\u003e55\u003c/em\u003e, 1233\u0026ndash;1251 (2003).\u003c/li\u003e\n\u003cli\u003eBilsland AE, Pugliese A, Liu Y et al. Identification of a selective G1-phase benzimidazolone inhibitor by a senescence-targeted virtual screen using artificial neural networks. Neoplasia 17(9), 704\u0026ndash;715 (2015).\u003c/li\u003e\n\u003cli\u003eDomine D, Guillon C, Devillers J, Lacroix R, Lacroix J, Dor\u0026eacute; JC. Nonlinear neural mapping analysis of the adverse effects of drugs. SAR QSAR Environ. Res. 
\u003cem\u003e8\u003c/em\u003e(1-2), 109\u0026ndash;120 (1998).\u003c/li\u003e\n\u003cli\u003eMoon A, Smith T. A preliminary evaluation of neural network analysis for pharmacodynamic modeling of the dosing of the hydroxymethylglutaryl coenzyme A-reductase inhibitors simvastatin and atorvastatin. Clin Ther. \u003cem\u003e24\u003c/em\u003e(4), 653\u0026ndash;661 (2002).\u003c/li\u003e\n\u003cli\u003eKumari M, Subbarao N. Deep learning model for virtual screening of novel 3C-like protease enzyme inhibitors against SARS coronavirus diseases. Comput. Biol. Med. \u003cem\u003e132\u003c/em\u003e, 104317 (2021).\u003c/li\u003e\n\u003cli\u003eKumari M, Subbarao N. Development of a deep learning-based quantitative structure-activity relationship model to identify potential inhibitors against the 3C-like protease of SARS-CoV-2. Future Med Chem. 2022 Nov;14(21):1541-1559. doi: 10.4155/fmc-2021-0063. Epub 2022 Sep 30. PMID: 36177879.\u003c/li\u003e\n\u003cli\u003eSadowski J, Gasteiger J, Klebe G. Comparison of automatic three-dimensional model builders using 639 X-ray structures. J. Chem. Inf. Model. \u003cem\u003e34\u003c/em\u003e(4) (1994).\u003c/li\u003e\n\u003cli\u003eLiu K, Feng J, Young SS. PowerMV: a software environment for molecular viewing, descriptor generation, data analysis and hit evaluation. J. Chem. Inf. Model. \u003cem\u003e45\u003c/em\u003e(2), 515\u0026ndash;522 (2005).\u003c/li\u003e\n\u003cli\u003eSander T, Freyss J, von Korff M, Rufener C. DataWarrior: an open-source program for chemistry aware data visualization and analysis. J Chem Inf Model. 2015 Feb 23;55(2):460-73. doi: 10.1021/ci500588j. Epub 2015 Feb 2. PMID: 25558886.\u003c/li\u003e\n\u003cli\u003eChawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. \u003cem\u003e16\u003c/em\u003e, 321\u0026ndash;357 (2002).\u003c/li\u003e\n\u003cli\u003eBlagus R, Lusa L. SMOTE for high-dimensional class-imbalanced data. BMC Bioinformatics. 2013 Mar 22;14:106. doi: 10.1186/1471-2105-14-106. 
PMID: 23522326; PMCID: PMC3648438.\u003c/li\u003e\n\u003cli\u003eNakamura M, Kajiwara Y, Otsuka A, Kimura H. LVQ-SMOTE - Learning Vector Quantization based Synthetic Minority Over-sampling Technique for biomedical data. BioData Min. 2013 Oct 2;6(1):16. doi: 10.1186/1756-0381-6-16. PMID: 24088532; PMCID: PMC4016036.\u003c/li\u003e\n\u003cli\u003eSeo JH, Kim YH. Machine-Learning Approach to Optimize SMOTE Ratio in Class Imbalance Dataset for Intrusion Detection. Comput Intell Neurosci. 2018 Nov 1;2018:9704672. doi: 10.1155/2018/9704672. PMID: 30515202; PMCID: PMC6236522.\u003c/li\u003e\n\u003cli\u003ePandey SK, Janghel RR. Automatic detection of arrhythmia from imbalanced ECG database using CNN model with SMOTE. Australas Phys Eng Sci Med. 2019 Dec;42(4):1129-1139. doi: 10.1007/s13246-019-00815-9. Epub 2019 Nov 14. PMID: 31728941.\u003c/li\u003e\n\u003cli\u003eDerhab A, Aldweesh A, Emam AZ, Khan FK. Intrusion Detection System for Internet of Things Based on Temporal Convolution Neural Network and Efficient Feature Engineering. Wirel. Commun. Mob. Comput. 2020, 6689134 (2020).\u003c/li\u003e\n\u003cli\u003eKumari M, Subbarao N. A hybrid resampling algorithms SMOTE and ENN based deep learning models for identification of Marburg virus inhibitors. Future Med Chem. 2022 May;14(10):701-715. doi: 10.4155/fmc-2021-0290. Epub 2022 Apr 8. PMID: 35393862.\u003c/li\u003e\n\u003cli\u003eDreyfus SE. Artificial neural networks, back propagation, and the Kelley-Bryson gradient procedure. J. Guid. Control. Dyn. 13, 926-928 (1990).\u003c/li\u003e\n\u003cli\u003eBanadkooki FB, Ehteram M, Ahmed AN, Teo FY, Ebrahimi M, Fai CM, Huang YF, El-Shafie A. Suspended sediment load prediction using artificial neural network and ant lion optimization algorithm. Environ. Sci. Pollut. Res. Int. 30, 38094-38116 (2020).\u003c/li\u003e\n\u003cli\u003eAgatonovic-Kustrin S, Beresford R. Basic concepts of artificial neural network (ANN) modeling and its application in pharmaceutical research. J. Pharm. 
Biomed. Anal. 22(5), 717-27 (2000).\u003c/li\u003e\n\u003cli\u003eLeCun YA, Bottou L, Orr GB, Muller KR. Efficient backprop, in: Neural Networks: Tricks of the Trade \u0026ndash; Second Edition, 9\u0026ndash;48 (2012).\u003c/li\u003e\n\u003cli\u003eNair V, Hinton GE. Rectified linear units improve restricted Boltzmann machines, in: Proc. - Int. Conf. Mach. Learn. 807\u0026ndash;814 (2010).\u003c/li\u003e\n\u003cli\u003eSokolova M, Lapalme G. A systematic analysis of performance measures for classification tasks. \u003cem\u003eInf. Process Manage\u003c/em\u003e. 45, 427\u0026ndash;437 (2009).\u003c/li\u003e\n\u003cli\u003eTing KM. Confusion Matrix. In: Sammut C., Webb G.I. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA. (2017).\u003c/li\u003e\n\u003cli\u003eFawcett T. An introduction to ROC analysis, \u003cem\u003ePattern Recognition Letters\u003c/em\u003e, 27, 861\u0026ndash;874 (2006).\u003c/li\u003e\n\u003cli\u003ehttps://www.drugbank.com/datasets [access date: 11 October 2023]\u003c/li\u003e\n\u003cli\u003ehttps://wiki.nci.nih.gov/display/NCIDTPdata/Compound+Sets [access date: 12 October 2023]\u003c/li\u003e\n\u003cli\u003ehttp://zinc15.docking.org [access date: 15 October 2023]\u003c/li\u003e\n\u003cli\u003eLipinski CA, Lombardo F, Dominy BW, Feeney PJ. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. \u003cem\u003eAdv. Drug. Deliv. Rev\u003c/em\u003e. 
46, 3-26 (2001).\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Deep Learning, Bacillus Anthrax, Scaffold analysis, Chemical analysis, Actif cliff, Principal component analysis, t-SNE Fingerprint analysis, phytochemical database, SMOTE","lastPublishedDoi":"10.21203/rs.3.rs-5315945/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-5315945/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eAnthrax is a highly lethal disease caused by Bacillus anthracis. Lethal factor (LF) with protective antigen directly contributes to anthrax symptoms in humans. This research work identified a small molecule inhibitors of anthrax lethal factor. We developed a consolidated computational strategy that includes a deep learning-based SMOTE\u0026thinsp;+\u0026thinsp;artificial neural network (ANN) hybrid model, principal component analysis, t-SNE, activity cliff, constellation plot, scaffold, and fingerprinting to identify potential drug candidates against Anthrax. 
The best model showed 0.98 accuracy, 0.99 specificity, 0.99 sensitivity, 0.99 F1-score, 0.99 recall, 0.99 ROC, and 0.99 precision. The trained hybrid model screened out 134 FDA-approved drugs, 338 experimental drugs, 51 phytochemical compounds of the phytochemical database, and eight natural products from NCI divest IV as anthrax inhibitors. We found scaffold of ring system with substitution patterns such as 4-oxopyrrolo[3,2-c]quinolone enhanced the biological activity of Anthrax inhibitors. Fingerprints indicated greater than 80% and are linked to the ring system using the substitution pattern scaffold. These studies conclude that SMOTE\u0026thinsp;+\u0026thinsp;ANN model could be an efficient method for the virtual screening of large database and a new way to screen small molecules against Anthrax.\u003c/p\u003e","manuscriptTitle":"Deep Learning-based Classification Model using SMOTE Resampling Technique to Identify Potent Inhibitors of Lethal Factor of Anthrax and Principal Component, Chemical Space Analysis","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-11-15 09:46:30","doi":"10.21203/rs.3.rs-5315945/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"a59b29ce-add2-4441-865a-03f52eaf51a1","owner":[],"postedDate":"November 15th, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2025-05-06T13:23:43+00:00","versionOfRecord":[],"versionCreatedAt":"2024-11-15 
09:46:30","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-5315945","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-5315945","identity":"rs-5315945","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
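The evaluation metrics defined in Eqs. 6 to 9 above (specificity, precision, recall and F1-score) can be illustrated with a minimal Python sketch. This is not the authors' code. The `ConfusionCounts` class, the helper names, the convention of returning 0.0 for an empty denominator, and the example counts are all assumptions made for illustration and are not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion-matrix counts, with 'active' as the positive class."""
    tp: int  # active compounds predicted active
    fp: int  # inactive compounds predicted active
    tn: int  # inactive compounds predicted inactive
    fn: int  # active compounds predicted inactive


def _ratio(numerator: float, denominator: float) -> float:
    # Assumed convention: return 0.0 when the denominator is zero.
    return numerator / denominator if denominator else 0.0


def specificity(c: ConfusionCounts) -> float:
    # Eq. 6: TN / (TN + FP), the true negative rate.
    return _ratio(c.tn, c.tn + c.fp)


def precision(c: ConfusionCounts) -> float:
    # Eq. 7: TP / (TP + FP).
    return _ratio(c.tp, c.tp + c.fp)


def recall(c: ConfusionCounts) -> float:
    # Eq. 8: TP / (TP + FN), identical to sensitivity.
    return _ratio(c.tp, c.tp + c.fn)


def f1_score(c: ConfusionCounts) -> float:
    # Eq. 9: harmonic mean of precision and recall.
    p, r = precision(c), recall(c)
    return _ratio(2 * p * r, p + r)


# Hypothetical counts chosen only to exercise the formulas.
counts = ConfusionCounts(tp=90, fp=10, tn=880, fn=20)
print(f"specificity={specificity(counts):.3f}")
print(f"precision={precision(counts):.3f}")
print(f"recall={recall(counts):.3f}")
print(f"F1={f1_score(counts):.3f}")
```

With these made-up counts, precision is 90/100 and recall is 90/110, so the F1-score falls between the two and closer to the smaller value, which is the behaviour expected of a harmonic mean.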
