AI-SCAN: A Scalable AI-Driven IDS for Cyber Threat Detection in Cloud Environments

Authorea preprint. This is a preprint and has not been peer reviewed. Data may be preliminary.
17 April 2025, V1 (latest version)
Authors: Khatha Mahendar and Gandla Shivakanth (ORCID 0000-0001-6787-6929)
https://doi.org/10.22541/au.174491153.30619618/v1

Abstract

AI-SCAN is a CNN-based scalable Intrusion Detection System (IDS) that detects known and unknown cyber-attacks with minimal false positives. AI-SCAN is created to solve contemporary cybersecurity challenges, employing a systematic approach involving data acquisition, preprocessing, feature selection, class balancing, model design, training, and evaluation. The model utilizes the CSE-CICIDS2018 dataset, a benchmark dataset mimicking real-world cloud network traffic with varied attack patterns, to train and test its performance.
Using techniques such as Z-score normalization, SMOTE (Synthetic Minority Over-sampling Technique) class balancing, and a customized CNN architecture that distinguishes between malicious and legitimate network traffic, the model detects attacks with state-of-the-art accuracy. Accuracy, precision, recall, and F1-score show that AI-SCAN outperforms the compared IDS models, with 97.5% accuracy in detecting attacks and high sensitivity to uncommon and novel attack patterns. The balancing strategies and architecture support scalability, robustness, and applicability for deployment in dynamic cloud environments.

1. Introduction

IDSs are widely acknowledged as essential tools for protecting computer networks from harmful activity, such as hacking or unauthorized access. IDSs monitor network environments and identify abnormal behaviours, which helps eliminate potential threats. However, traditional IDSs, especially network-based ones, have known weaknesses, including a persistently high rate of false positives, in which normal, benign activity is flagged as malicious. This creates a flood of unnecessary alerts that overwhelms analysts, wastes valuable time, and reduces efficiency. In the worst cases, the volume of false alerts delays the detection of and response to actual threats, giving attackers time to exploit vulnerabilities. These shortcomings point to an urgent need for new approaches that can answer modern cybersecurity challenges [1][2].

Beyond false positives, legacy IDSs are weak at detecting sophisticated and evolving types of cyberattacks. Legacy systems normally depend on signature-based detection techniques, which are inherently limited in their ability to identify unknown or zero-day attacks. This technique depends on pre-defined patterns and therefore misses threats that fall outside known patterns of behaviour.
Thus, networks are left exposed to emerging vulnerabilities. In addition, because static detection methods are relied on so heavily, false alarms proliferate, and it becomes increasingly difficult for administrators to sift through the noise to identify true threats. Matters are made worse by the performance bottlenecks that arise when traditional IDSs analyze the large volumes of network traffic found in modern environments. These systems fail to scale or to handle complex data streams in real time, further eroding their value in today’s fast-paced cybersecurity landscape [3].

While much progress has been made in IDS technology, contemporary systems continue to struggle with detection and response. The most important issue is the very high false positive rate: normal network activity raises many alarms that draw security personnel’s attention away from actual risks. Most existing IDSs are also based on signature-based detection models, which are inherently reactive and incapable of detecting new attack patterns. This is a particular problem now that attackers are becoming more sophisticated and employ mutated or unseen tactics to avoid detection. These weaknesses highlight a key gap in the ability of IDSs to protect against current threats and indicate the need for more adaptive and forward-looking detection methodologies [4].

The greatest limitation of current IDSs is the lack of dynamic adaptability. In the face of evolving cyber threats, systems have to identify and respond to new attack patterns without constant updates from humans. Most existing IDSs, however, are still based on rigid rule-based or signature-based frameworks that make it difficult to keep up with attackers’ changing strategies.
Once a vulnerability is exploited, attackers can bypass the system until new detection rules are implemented, which creates a window of opportunity for malicious activity. This failure to evolve with rising threats not only degrades the efficiency of IDSs but also increases the chance of extended exposure to attacks. As cybercriminals devise novel and more inventive ways of breaking network defences, the static nature of traditional IDSs becomes a liability [5][6].

Another major limitation of traditional IDS models is their opacity, or “black box” nature. The lack of explainability in these systems makes it challenging for security teams to interpret and respond to alerts. Without an understanding of how the system arrives at its conclusions, it is difficult to trust its outputs or validate its decisions. This opacity erodes trust in the system and complicates the process of refining detection models over time. Explainability is particularly important in operational contexts where precise and transparent reasoning is required to ensure the reliability and accountability of security decisions. The inability to explain why certain behaviours are flagged as suspicious not only erodes trust but also adds complexity to the investigation and mitigation processes [7].

With the proliferation of IoT technologies, the challenges facing IDSs have multiplied. IoT devices are deployed in varied environments and introduce new attack vectors, such as DDoS attacks, Mirai botnet attacks, and other tampering that can degrade network performance. These create the need for more robust and scalable security measures in IoT ecosystems. Malicious actors are exploiting the weaknesses of IoT devices to launch large-scale attacks, which leaves a wide security gap. IDSs have to be adapted to be effective in high-traffic and distributed environments.
The complexity of network traffic grows with the scale of the data flow, which poses great challenges to traditional IDSs in both performance and scalability. Most traditional IDSs cannot meet the performance and scalability requirements of modern networks [8].

The classification of security attacks into active and passive categories demonstrates the diversity of the challenges that IDSs have to overcome. Active attacks, such as Distributed Denial-of-Service (DDoS) and message spoofing, disrupt operations and are difficult to detect primarily because of their aggressive nature. Passive attacks, such as eavesdropping and traffic analysis, involve monitoring and capturing data without changing its content. These passive techniques allow attackers to remain undetected for extended periods, increasing the potential for data breaches and other security incidents. To combat these threats effectively, IDSs must incorporate real-time anomaly detection capabilities that can identify and block malicious activities before they cause significant harm [9][10].

Despite incremental improvements, many existing IDSs are limited in their ability to classify several types of attacks simultaneously. Because they rely on simple binary classification models, these systems cannot capture the complexity of modern network environments, where threats are varied and overlapping. This shortcoming calls for more advanced detection models that can classify multiple attacks accurately and reliably despite a continuously evolving and unpredictable threat landscape. Addressing these challenges requires a fundamental shift in how IDSs are designed and implemented, with a focus on scalability, adaptability, and explainability.
To this end, we propose AI-SCAN, an advanced IDS model that improves detection accuracy, scalability, and adaptability using CNN models. Specifically, our work aims to improve the detection of both known and novel cyber threats while mitigating the high false positive rates reported for traditional IDSs. AI-SCAN integrates hyperparameter tuning, class balancing methodologies, and feature selection techniques to ensure robust and efficient detection within dynamic cloud environments. These capabilities allow the model to analyze large-scale network traffic data and classify multiple types of attacks, making it a capable solution for modern cybersecurity challenges.

2. Literature review

The rapid diffusion of IoT technology is creating transformative capabilities that redefine technological platforms and connect devices to bring innovation to numerous sectors. IoT technologies create complex systems for the collection and sharing of data, and many businesses use them for automation and optimization in transportation, healthcare, home automation, and industrial applications. IoT devices are essential to healthcare because they allow for patient condition monitoring, which improves results and reduces costs [8]. Similarly, smart homes depend on IoT to provide better security, efficient energy consumption, and convenience through connected appliances. Industries use IoT in predictive maintenance, asset tracking, and process optimization, which increases productivity and reduces possible losses from equipment failure. In transportation, IoT allows for smart traffic management, fleet coordination, and self-driving vehicles, which increase safety and operational efficiency [11]. These applications underscore the role of IoT in advancing society and improving quality of life. Nevertheless, the large-scale adoption of IoT comes with severe cybersecurity threats.
The fundamental threats to IoT security result from the expansion of the attack surface and from the diversity of connected devices. Many IoT devices have restricted processing abilities, which makes them vulnerable to attacks and hard to secure effectively. Common vulnerabilities include weak authentication protocols, missing encryption, and infrequent security updates, which expose IoT systems to malicious actors seeking full control of these devices [12]. Furthermore, the highly distributed nature of IoT networks complicates the implementation of robust security measures, as devices often interact with heterogeneous systems in complex environments. DDoS attacks are a major concern because compromised IoT devices can be used to flood targeted servers with unnecessary requests. Another issue is the very large amount of data produced by IoT devices, which needs protection from unauthorized access or breaches. Robust cybersecurity mechanisms that account for the specific characteristics of IoT environments are therefore needed [13].

Given its ability to identify and remove threats autonomously in real time, artificial intelligence (AI) has emerged as a game-changing technology for enhancing cybersecurity. Machine learning algorithms such as neural networks, decision trees, and support vector machines can analyze traffic patterns and identify anomalies that may indicate security threats. For example, AI-based intrusion detection systems can monitor network traffic to spot criminal activities and alert companies to potential dangers [9]. AI further facilitates threat intelligence by automating the examination of massive volumes of threat data from many sources, which makes it possible to identify new cyberthreats. AI is especially useful in tackling the dynamic and ever-changing nature of cyber threats in IoT settings because of its agility and scalability.
However, the constant evolution of these threats necessitates regular updates and refinements to AI models, whose effectiveness may otherwise decline over time [10].

The integration of cybersecurity with sustainable IoT networks has become increasingly significant, particularly in advancing the Sustainable Development Goals (SDGs). Protecting IoT systems from cyber threats is essential to avoid disruptions that can negatively impact critical infrastructures, with consequences for both the environment and the economy. AI-based solutions, such as the Adaptive Flexible Weighted AdaBoost (AF-WAdaBoost) model, enhance IoT cybersecurity while promoting sustainability by reducing the frequency of system replacements caused by cyberattacks [14]. These advancements are in line with SDG 9 (Industry, Innovation and Infrastructure): they make sustainable industrialization easier and help build robust infrastructure [15]. Sound IoT security also reduces the environmental impact of cyber incidents, which contributes to SDG 13 (Climate Action) by reducing the cost of environmental recovery efforts. This suggests an essential role for advanced AI in making IoT ecosystems both technically and environmentally sustainable [16][17].

Research by [18] on the theoretical interaction between sustainable development and cybersecurity in inter-organizational networks shows that cybersecurity has increasingly become a major enabler of green technological development. The fear of cyber threats may even prevent organizations from automating procedures, further delaying progress on sustainability. Likewise, [19] provides a thorough review of the relationship between cybersecurity and green technology, with a focus on how both contribute to sustainability goals.
The review highlights challenges including the swift evolution of cyberthreats and the incorporation of strong security measures into sustainable technology. It concludes that a dual focus on sustainability and security is necessary for fostering technological advancement while preserving environmental benefits. Dynamic systems that can adapt to the shifting nature of threats without human intervention are necessary to meet these problems.

Although IoT technologies have vast potential, they bring cybersecurity challenges that are very different from those of traditional network systems. IoT networks often require much higher energy efficiency, safety, and performance than conventional network systems, which makes implementing typical security protocols much more challenging [20]. Some applications of AI to IoT cybersecurity have shown promise but have limitations in various areas. For example, the absence of large, representative datasets hinders the development of AI models that can address the complexity and diversity of IoT threats [11]. Most of the available datasets are outdated, overly generalized, or insufficiently comprehensive, which limits their utility for real-world applications. In addition, the computational demands of AI models often surpass the capabilities of IoT devices, creating barriers to their deployment. The opaque nature of many AI-driven systems also poses challenges because security analysts might not be able to trust or interpret the models’ decisions. Adversarial attacks, in which malicious actors alter inputs to trick AI systems, further highlight the need for strong and interpretable AI models [9][12].

To bridge these gaps, AI-SCAN is proposed as a solution aimed at improving cybersecurity in dynamic cloud environments.
Unlike most existing IDS models, AI-SCAN uses a CNN architecture to detect known and unknown cyber threats while keeping false positives low. It is trained on the benchmark dataset CSE-CICIDS2018, which simulates real-world cloud network traffic, so that diverse attack scenarios are detected with high accuracy. AI-SCAN addresses the limitations of traditional IDSs through Z-score normalization, SMOTE for class balancing, and a purpose-built CNN structure, which together target the issues of dataset representativeness, model interpretability, and computational feasibility. AI-SCAN is also scalable and robust, which makes it particularly suitable for cloud environments in which network traffic may be dynamic or hard to predict. Unlike traditional AI-driven IDSs, which must be retrained repeatedly to remain useful, AI-SCAN relies on adaptive learning, so it stays relevant without much human intervention. The architecture of the model substantially reduces the false positive rate, addressing one of the biggest open issues in IDS research. Integrated into IoT security frameworks, AI-SCAN offers organizations an intrusion detection system that is accurate, scalable, and sustainable in the long term.

Although IoT technologies hold enormous potential to transform industries and improve quality of life, their widespread adoption presents major cybersecurity challenges. Innovative approaches are required to solve them, and AI-based IDSs are a promising direction because they combine flexibility, scalability, and precision to help secure IoT ecosystems.
AI-SCAN complements existing cybersecurity frameworks by addressing crucial gaps, such as limitations in dataset size, available computational power, and the number of false positives a detection system produces, setting a new standard for modern IDSs. By combining strong AI-driven detection with sustainability considerations, AI-SCAN supports advanced cybersecurity for dynamic cloud environments and global development goals.

3. Proposed methodology

The proposed model is a scalable AI-driven IDS with reduced false positives that detects known and novel cyber threats in dynamic cloud environments. It relies on a CNN architecture and on the CSE-CICIDS2018 dataset [24], a contemporary benchmark dataset. The development process follows a systematic approach with seven major stages: data acquisition, preprocessing, feature selection, handling class imbalance, model design, training, and final performance evaluation. Together, these stages contribute to a strong and efficient system for countering modern cyber threats.

The CSE-CICIDS2018 dataset was selected for its thorough simulation of actual cloud network traffic and its variety of attack scenarios. Developed in an AWS cloud environment, the dataset contains both normal and malicious flows, which makes it relevant for evaluating IDS models designed for cloud-based applications. It has seven different scenarios simulating various types of network intrusions. Table 1 summarizes these scenarios, reflecting the diversity of the included attack types. Using up-to-date and sophisticated cyberattack data enables the IDS to identify new threats more accurately and improves generalization.
Table 1: Attack Scenarios in the CSE-CICIDS2018 Dataset [24]

No.  Attack scenario
1    Brute-force attacks on SSH and FTP
2    Denial of Service (DoS) attacks
3    Web-based attacks (SQL injection)
4    Botnet attacks
5    Infiltration of the network
6    Port scanning and probing activities
7    Distributed Denial of Service (DDoS)

Feature selection was a key factor in obtaining AI-SCAN’s peak performance. The selected features included source and destination IP addresses, port numbers, protocol types, flow durations, packet lengths, payload sizes, and connection states. These characteristics are useful for differentiating between benign and malicious network behaviour, and restricting the input to them improves classification accuracy by removing superfluous data. Recursive feature elimination (RFE) and statistical tests were used to evaluate feature relevance, so that only the most pertinent attributes were kept for model training. This method improved real-time threat detection while lowering computational overhead.

The dataset analysis revealed a significant class imbalance: the normal class greatly outnumbers every attack class. Such imbalance may lead to biased model predictions and poor detection of minority attack types. Table 2 gives the attack classes and their sample counts, which show the imbalance clearly.

Table 2: Attack Classes and Sample Distribution in the CSE-CICIDS2018 Dataset

Class             Samples
Normal            1,200,000
Brute Force SSH   22,000
Brute Force FTP   18,000
DoS               45,000
SQL Injection     9,000
Botnet            36,000
Infiltration      7,000
Port Scan         50,000
DDoS              60,000

To address this imbalance, hybrid resampling techniques were applied to the dataset. First, under-sampling was used to reduce the number of “Normal” class samples.
Thereafter, the SMOTE algorithm was applied to the minority attack classes, generating synthetic samples so that the model generalizes better and can detect rare cyberattacks. Figure 2 illustrates the class distribution before and after resampling, which shows the successful resolution of class imbalance.

Figure 1: Workflow diagram

To improve the quality and dependability of the input data, data preprocessing was carried out after the dataset was collected. The raw dataset had problems such as redundant records, missing values, outliers, and irrelevant categorical features that had to be handled to obtain optimal performance. In the cleaning phase, all null, infinite, and non-informative values were removed to prevent adverse impact on the model. Attack labels, which are categorical variables, were converted to numerical form through label encoding, and one-hot encoding was then applied so that the model could process them easily. The data were then standardized to bring all features to a uniform scale, using the Z-score normalization formula:

\(Z=\frac{x-\mu}{\sigma}\) (1)

where x is the feature value, μ is the feature mean, and σ is the feature standard deviation, so that after the transformation every feature has a mean of zero and a standard deviation of one. With standardization, all input features contribute on a comparable scale during training, which speeds up model convergence.

Performance optimization for AI-SCAN was achieved by hyperparameter tuning. A grid search was used to find the best combination of learning rate, batch size, and optimizer. The learning rate was varied between \(10^{-3}\) and \(10^{-5}\), with the most stable convergence occurring at 0.001. Batch sizes of 32, 64, and 128 were tested, and 128 gave the best performance in terms of both training efficiency and generalization.
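As a rough illustration of the grid search described above, the sketch below enumerates the candidate configurations and keeps the best-scoring one. Several parts are assumptions made only so the example runs: the intermediate learning rate 1e-4, the optimizer names, and the `toy_score` function, which stands in for training the CNN and measuring validation performance and does not represent measured data.

```python
from itertools import product

# Candidate values. The learning-rate endpoints and the batch sizes come from
# the text; the intermediate 1e-4 is an assumed grid point.
LEARNING_RATES = [1e-3, 1e-4, 1e-5]
BATCH_SIZES = [32, 64, 128]
OPTIMIZERS = ["adam", "sgd"]


def grid_search(score_fn):
    """Return (best_config, best_score) over the full Cartesian grid."""
    best_config, best_score = None, float("-inf")
    for lr, batch, opt in product(LEARNING_RATES, BATCH_SIZES, OPTIMIZERS):
        config = {"learning_rate": lr, "batch_size": batch, "optimizer": opt}
        score = score_fn(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_score(config):
    """Hypothetical stand-in for 'train the model, return validation accuracy'.

    Hand-built to peak at the configuration the text reports as best, purely
    so the example is runnable; it is not a measurement.
    """
    score = 1.0 if config["learning_rate"] == 1e-3 else 0.0
    score += config["batch_size"] / 128
    score += 0.5 if config["optimizer"] == "adam" else 0.0
    return score


best, best_value = grid_search(toy_score)
print(best, best_value)
```

With three learning rates, three batch sizes, and two optimizers, the grid has 18 configurations, so an exhaustive search remains cheap relative to a single training run on a dataset of this size.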
The adaptive learning rate adjustments of the Adam optimizer gave faster convergence than traditional optimizers such as SGD. This systematic tuning helped AI-SCAN reach high classification accuracy with few false positives.

The heart of the proposed model is its CNN architecture, designed to handle sequential network traffic data. The CNN has three Conv1D layers, each with 64 filters, a kernel size of six, and ReLU (Rectified Linear Unit) activation. These convolutional layers extract hierarchical spatial and temporal patterns from the input network traffic data, which helps the model distinguish between benign and malicious activity. After each Conv1D layer, batch normalization is applied to stabilize training, and MaxPooling1D layers down-sample the feature maps, reducing complexity while retaining the salient features. A Flatten layer then transforms the feature maps into a one-dimensional vector, which feeds two fully connected Dense layers. Each Dense layer has 64 units with ReLU activation, which facilitates high-level feature learning. A Dropout layer with a rate of 0.5 is inserted between the Dense layers to prevent overfitting by randomly deactivating neurons during training. Finally, a Dense output layer with SoftMax activation generates class probabilities, enabling the classification of network data into multiple classes. Table 3 lists the primary parameters of the proposed CNN model.
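To make the layer stack concrete, the following sketch traces sequence length and parameter counts through the architecture described above. Several details are assumptions, because the text does not state them: an input of 78 features with a single channel, no padding ("valid" convolution) with stride 1, one MaxPooling1D layer with pool size 2 after each Conv1D block, and 9 output classes (the classes listed in Table 2). The function name `trace_cnn` is hypothetical, and the batch-normalization count includes the non-trainable moving statistics.

```python
def conv1d_out(length, kernel, stride=1):
    """Output length of a 1-D convolution with 'valid' (no) padding."""
    return (length - kernel) // stride + 1


def trace_cnn(n_features=78, n_classes=9, filters=64, kernel=6,
              pool=2, dense_units=64, n_conv_blocks=3):
    """Return (flattened_size, total_params) for the assumed layer stack."""
    length, channels, params = n_features, 1, 0
    for _ in range(n_conv_blocks):
        # Conv1D: kernel * input_channels weights plus one bias per filter.
        params += (kernel * channels + 1) * filters
        length = conv1d_out(length, kernel)
        # BatchNormalization: gamma, beta, moving mean, moving variance.
        params += 4 * filters
        # MaxPooling1D with stride equal to the pool size; no parameters.
        length //= pool
        channels = filters
    flat = length * channels
    # Flatten -> Dense -> Dropout (no parameters) -> Dense -> SoftMax output.
    params += (flat + 1) * dense_units
    params += (dense_units + 1) * dense_units
    params += (dense_units + 1) * n_classes
    return flat, params


flat, total = trace_cnn()
print(flat, total)  # 320 75785
```

Under these assumptions the sequence length shrinks 78, 73, 36, 31, 15, 10, 5, so the Flatten layer emits 5 x 64 = 320 values. A different feature count or "same" padding would change both numbers, which is why they are exposed as parameters.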
Table 3: Parameters of the Proposed CNN Model

Layer                 Configuration
Conv1D (3 layers)     64 filters, kernel size = 6, ReLU
Batch Normalization   Applied after each Conv1D layer
MaxPooling1D          Pool size = 2
Flatten               Converts feature maps to a vector
Dense (2 layers)      64 units, ReLU activation
Dropout               Rate = 0.5
Dense (Output)        SoftMax activation, multi-class output

AI-SCAN was evaluated on an independent test dataset, and accuracy, precision, recall, and F1-score were computed. To assess whether AI-SCAN outperforms the other IDS models, 95% confidence intervals were computed for each metric. A paired t-test was performed to compare AI-SCAN against the baseline models, and the results showed statistically significant improvements in detection accuracy and reductions in false positives. These rigorous evaluation methods provide strong evidence of the effectiveness and reliability of AI-SCAN for intrusion detection.

Figure 2: Class-wise distribution of the CSE-CICIDS2018 dataset.

Figure 2 illustrates the distribution of attack classes before and after applying the SMOTE technique. The pre-resampling distribution shows the predominance of the ’Normal’ class, while the post-resampling distribution shows a balanced dataset, which improves the generalization of the IDS model.

4. Implementation & Results

The AI-SCAN implementation uses Python to build a CNN-based intrusion detection system that identifies cyberthreats in dynamic cloud settings. The CSE-CICIDS2018 dataset is cleaned, its categorical features are encoded, and the features are normalized. SMOTE addresses the class imbalance by ensuring that the attack classes are adequately represented. The CNN model is built from convolutional, pooling, dropout, and dense layers for multi-class classification, and it performs well and scales. The model is assessed with accuracy, precision, recall, F1-score, and the confusion matrix.
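The metrics named above can all be derived from a confusion matrix. The sketch below shows the arithmetic on a small made-up 3-class matrix (rows are true classes, columns are predicted classes); the counts and the choice of classes are illustrative only and are not results from this study.

```python
def per_class_metrics(cm):
    """Compute overall accuracy and per-class precision, recall, and F1.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    report = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, truly other
        fn = sum(cm[k]) - tp                       # truly k, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report.append({"precision": precision, "recall": recall, "f1": f1})
    return accuracy, report


# Made-up counts for three classes: Normal, DoS, Port Scan.
cm = [
    [90, 5, 5],
    [4, 45, 1],
    [6, 0, 44],
]
accuracy, report = per_class_metrics(cm)
print(round(accuracy, 3))
for name, metrics in zip(["Normal", "DoS", "Port Scan"], report):
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

Per-class values matter here because overall accuracy is dominated by the majority class; a model can score high accuracy while still missing most samples of a rare class, which per-class recall makes visible.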
Figure 3: Confusion matrix

The confusion matrix in Figure 3 gives a clear picture of classification outcomes across the attack classes. The diagonal elements show correct predictions (True Positives), while the off-diagonal elements indicate misclassifications (False Positives and False Negatives). The Normal traffic class, which makes up the majority of the dataset, is predicted very well, with few misclassifications. Minor confusion is observed between attack classes such as Brute Force SSH and Brute Force FTP, as well as between Port Scan and DoS, due to the similarity in their traffic patterns. Despite this, the misclassification rates remain negligible compared to the true positive counts. The matrix highlights the model’s capability to maintain high classification accuracy across all classes, including rare attacks like SQL Injection and Infiltration, which are typically underrepresented. This demonstrates the effectiveness of the model’s class balancing strategy (SMOTE and under-sampling) combined with its robust CNN architecture.

Figure 4(a): Performance metrics of the proposed CNN-based IDS. Figure 4(b): Per-class performance analysis of the proposed IDS.

Table 4 compares the proposed model with current intrusion detection systems. Traditional machine learning-based models, such as the Random Forest-based IDS, achieve 92.3% accuracy but suffer from low recall (85.2%) because they cannot learn sophisticated patterns in network traffic. Such models rely on manually defined features and struggle with large-scale network data in real time, leading to performance bottlenecks. Deep learning models, such as RNN- and LSTM-based IDSs, achieve higher accuracy (94.1% and 95.5%, respectively) by learning sequential patterns in network traffic.
However, they require longer training times and may not manage long-term dependencies well, which limits their ability to detect some attack patterns efficiently. Moreover, RNN-based IDS models are prone to vanishing gradients, which lowers their performance on highly imbalanced datasets. The baseline CNN-based IDS achieves 96.2% accuracy, which shows the strength of convolutional layers for extracting spatial and temporal features from network traffic. AI-SCAN goes a step further by combining feature selection, SMOTE-based class balancing, hyperparameter optimization, and an optimized CNN architecture, which raises accuracy by 1.3 percentage points over the baseline CNN. AI-SCAN outperforms all the compared models with an accuracy of 97.5%, precision of 95.8%, and recall of 94.3%, indicating its ability to differentiate between attack types and suppress false alarms. Z-score normalization, the adapted CNN architecture, and class imbalance handling help AI-SCAN maintain high generalization ability while detecting known and unknown cyber-attacks. These improvements make AI-SCAN well suited to dynamic cloud environments, where network conditions change continually and intrusion detection must be accurate and scalable.

Table 4: Comparison of Performance Metrics with Existing IDS Models

Model                                     Accuracy (%)  Precision (%)  Recall (%)  F1-score (%)  Reference
Traditional ML-Based IDS (Random Forest)  92.3          89.7           85.2        87.4          [21]
RNN-Based IDS                             94.1          91.6           90.2        90.9          [21]
LSTM-Based IDS                            95.5          93.8           92.1        92.9          [22]
CNN-Based IDS (Baseline)                  96.2          94.4           93.5        93.9          [23]
Proposed AI-SCAN (CNN-Based IDS)          97.5          95.8           94.3        95.0          Current study

5. Conclusion

This study introduces AI-SCAN, an AI-driven, scalable IDS that addresses the limitations of traditional and modern intrusion detection models in dynamic cloud environments.
By leveraging a CNN architecture tailored for sequential network traffic analysis, AI-SCAN achieves high accuracy in detecting known and novel cyber threats, minimizing false positives and classifying all attack classes robustly. The use of the CSE-CICIDS2018 dataset, comprehensive preprocessing, feature selection, and hybrid class balancing (SMOTE and under-sampling) enables the model to generalize well to complex and imbalanced datasets. The evaluation metrics support this: AI-SCAN reaches 97.5% accuracy, with gains in precision, recall, and F1-score over the compared techniques. The confusion matrix and per-class performance analysis show that the model identifies even uncommon and underrepresented attacks. Because of its efficiency, scalability, and adaptability, AI-SCAN is a viable real-time intrusion detection solution for cloud environments and a step toward improved network security in the face of changing cyber threats. To improve detection speed and real-time scalability, future research should investigate further optimization of the model design and integration with other AI techniques.

References:

1. J. M. Kizza, System Intrusion Detection and Prevention, in Guide to Computer Network Security, Springer, Verlag London, 2024, pp. 295–323.
2. Z. Ahmad, A. S. Khan, C. W. Shiang, J. Abdullah, and F. Ahmad, “Network intrusion detection system: A systematic study of machine learning and deep learning approaches,” Trans. Emerg. Telecommun. Technol., vol. 32, no. 1, p. e4150, 2021.
3. V. Hnamte and J. Hussain, “Dependable intrusion detection system using deep convolutional neural network: A novel framework and performance evaluation approach,” Telematics Inform. Rep., vol. 11, p. 100077, 2023.
4. M. Masdari and H. Khezri, “A survey and taxonomy of the fuzzy signature-based intrusion detection systems,” Appl. Soft Comput., vol. 92, p. 106301, 2020.
5. P. Panagiotou, N. Mengidis, T. Tsikrika, S. Vrochidis, and I. Kompatsiaris, “Host-based intrusion detection using signature-based and AI-driven anomaly detection methods,” Inf. Secur. Int. J., vol. 50, no. 1, pp. 37–48, 2021.
6. C. J. Chahira, “Model for improving performance of network intrusion detection based on machine learning techniques,” Ph.D. dissertation, Kabarak Univ., 2019.
7. Q. Liu, V. Hagenmeyer, and H. B. Keller, “A review of rule learning-based intrusion detection systems and their prospects in smart grids,” IEEE Access, vol. 9, pp. 57542–57564, 2021.
8. N. Moustafa and J. Slay, “UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set),” in 2015 Military Communications and Information Systems Conference (MilCIS), IEEE, Canberra, pp. 1–6, 2015.
9. G. Baldini and I. Amerini, “Online distributed denial of service (DDoS) intrusion detection based on adaptive sliding window and morphological fractal dimension,” Comput. Netw., vol. 210, p. 108923, 2022.
10. B. Dong and X. Wang, “Comparison of deep learning methods to traditional methods for network intrusion detection,” in 2016 8th IEEE International Conference on Communication Software and Networks (ICCSN), IEEE, Beijing, pp. 581–585, 2016.
11. N. Moustafa, N. Koroniotis, M. Keshk, A. Y. Zomaya, and Z. Tari, “Explainable intrusion detection for cyber defences in the Internet of Things: Opportunities and solutions,” IEEE Commun. Surv. Tutorials, vol. 25, no. 3, pp. 1775–1807, 2023.
12. T. N. Dao, D. V. Le, and X. N. Tran, “Optimal network intrusion detection assignment in multi-level IoT systems,” Comput. Netw., vol. 232, p. 109846, 2023.
13. M. Bhavsar, K. Roy, J. Kelly, and O. Olusola, “Anomaly-based intrusion detection system for IoT application,” Discover Internet Things, vol. 3, no. 1, p. 5, 2023.
14. J. Arshad, M. A. Azad, M. M. Abdeltaif, and K. Salah, “An intrusion detection framework for energy-constrained IoT devices,” Mech. Syst. Signal Process., vol. 136, p. 106436, 2020.
15. H. Liu, C. Zhong, A. Alnusair, and S. R. Islam, “FAIxID: A framework for enhancing AI explainability of intrusion detection results using data cleaning techniques,” J. Netw. Syst. Manag., vol. 29, no. 4, p. 40, 2021.
16. L. Awalbeh, F. Muheidat, M. Tawalbeh, and M. Quwaider, “IoT privacy and security: Challenges and solutions,” Appl. Sci., vol. 10, p. 4102, 2020.
17. S.-H. Lee, Y.-L. Shiue, C.-H. Cheng, Y.-H. Li, and Y.-F. Huang, “Detection and prevention of DDoS attacks on the IoT,” Appl. Sci., vol. 12, p. 12407, 2022.
18. A. Alghamdi, A. M. Al Shahrani, S. S. AlYami, I. R. Khan, P. A. Sri, P. Dutta, A. Rizwan, and P. Venkatareddy, “Security and energy efficient cyber-physical systems using predictive modeling approaches in wireless sensor network,” Wirel. Netw., vol. 30, pp. 5851–5866, 2024.
19. I. H. Sarker, M. H. Furhad, and R. Nowrozy, “AI-driven cybersecurity: An overview,” Secur. Intell. Model. Res. Dir., vol. 2, p. 173, 2021.
20. E. Simon, B. Ayyoob, and J. Sharifi, “Environmentally sustainable smart cities and their converging AI, IoT, and big data technologies and solutions: An integrated approach to an extensive literature review,” Energy Inform., vol. 6, p. 9, 2023.
21. N. Khan, M. I. Mohmand, S. U. Rehman, Z. Ullah, Z. Khan, and W. Boulila, “Advancements in intrusion detection: A lightweight hybrid RNN-RF model,” PLoS One, vol. 19, no. 6, p. e0299666, 2024. DOI: 10.1371/journal.pone.0299666. Retraction in: PLoS One, vol. 20, no. 2, p. e0319019, 2025. DOI: 10.1371/journal.pone.0319019.
22. T. S. Chu, S. S. Nair, and G. Lakshmikanthan, “Network Intrusion Detection Using Advanced AI Models: A Comparative Study of Machine Learning and Deep Learning Approaches,” Int. J. Commun. Netw. Inf. Secur., vol. 14, no. 2, pp. 359–365, 2022. Available: https://ijcnis.org/index.php/ijcnis/article/view/7708.
23. O. A. Ayeni, S. C. Ewa, and O. Owolafe, “Convolutional Neural Network Based Model for Intrusion Detection,” Int. J. RFID Secur. Cryptogr. (IJRFIDSC), vol. 6, no. 1, pp. 215–222, 2023.
24. A. Sharafaldin, I. Habibi Lashkari, and A. A. Ghorbani, “CSE-CIC-IDS2018: A Dataset for Intrusion Detection Systems,” Canadian Institute for Cybersecurity (CIC), University of New Brunswick (UNB), 2018. [Online]. Available: https://www.unb.ca/cic/datasets/ids-2018.html

Information & Authors

Copyright: This work is licensed under a Non Exclusive No Reuse License.

Keywords: class imbalance handling, cnn, cyber threat detection, dynamic cloud environments, ids

Authors and affiliations:
Khatha Mahendar, Koneru Lakshmaiah Education Foundation
Gandla Shivakanth (ORCID 0000-0001-6787-6929), Koneru Lakshmaiah Education Foundation
