A Comparative Study: Between Non Autoencoders IDS and Autoencoder Based IDS Approaches in Network Communication

preprint OA: closed
Full text JSON View at publisher
Full text 65,143 characters · extracted from preprint-html · click to expand
A Comparative Study: Between Non Autoencoders IDS and Autoencoder Based IDS Approaches in Network Communication | Research Square window.SnipcartSettings = { analytics: { enabled: false } }; (function() { var accessVector = localStorage.getItem('access_vector') || ''; window.dataLayer = window.dataLayer || []; if (accessVector) { window.dataLayer.push({ user: { profile: { profileInfo: { snid: accessVector } } } }); } })(); (function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src='https://www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);})(window,document,'script','dataLayer','GTM-K279D39R'); Browse Preprints In Review Journals COVID-19 Preprints AJE Video Bytes Research Tools Research Promotion AJE Professional Editing AJE Rubriq About Preprint Platform In Review Editorial Policies Our Team Advisory Board Help Center Sign In Submit a Preprint Cite Share Download PDF Research Article A Comparative Study: Between Non Autoencoders IDS and Autoencoder Based IDS Approaches in Network Communication Kanak Giri This is a preprint; it has not been peer reviewed by a journal. https://doi.org/ 10.21203/rs.3.rs-5350806/v2 This work is licensed under a CC BY 4.0 License Status: Posted Version 2 posted You are reading this latest preprint version Show more versions Abstract Intrusion detection is an integrated security issue in the present digital environment. Malicious cyber-attackers can frequently linger in tremendous volumes of regular data in demented network traffic. It has a superior destruct of concealing and opacity in cyberspace, making it challenging for Network Intrusion Detection Systems (NIDS) to aver catching accuracy and timing. The false-positive issue is one of the underlying drawbacks of network intrusion detection systems (NIDS), which are widely engaged to discover threats and safeguard the networks. Imbalance classes and unwarranted material data a reminiscence failure skyway to make false positive, which are inferior in company in the preparation dataset. Feature engineering is also performed in this approach using the Recursive Property Excreting method. In this experiment NSL-KDD Dataset is used. The outcome shows that our approach is finer than different state-of-art approaches in terms of different metrics like Accuracy, Precision, Recall, and F1 Score. IDS (Intrusion Detection System) Autoencoders Cybersecurity Feature Selection Random Forest Deep Learning NSL-KDD Dataset Figures Figure 1 I. Introduction The order of growth in internet users and the use of devices incorporated daily into the network are speedily maturation. With growing users and much devices in the network, the intensity of data assemblage generated and stored increases exponentially. Withal, attacks on the network and data storage person augmented in collateral [1]. This brings in contrastive methodologies for detecting unwelcome events and activities in the network and responding effectively. Cyber security is the technology, techniques, and practices used to defend web systems [2]. The enlarged adoption of wireless networks to transfer large amounts of information has caused a slew of security risks and reclusiveness issues in recent present. Communication Networks succeed a fanlike range of excitable user information assailable to threats from both internecine and outside attackers. These threats may be both manlike and automatic, are spacious, and steadily turn concealment, directing to unmarked seclusion violations [3]. As a conclusion, various prophylactic and defending measures, much as intrusion find systems, Intrusion Detection System (IDS). The Intrusion Spying System (IDS) is a ingredient of cyber surety systems [4]. By analyzing data generated by network devices, intrusion detection systems (IDS) may be used to locate, evaluate, and diagnose intrusions. On the other hand, Anomaly detection approaches use normal system behavior to create regular patterns, identifying anomalies as deviations from the norm. These approaches are pretty exciting since they can rectify all known and undiscovered types of threats [7]. The underlying issue with anomaly detection systems is that they require fine-tuning and have high false-positive rates. In recent years, advanced threat assaults have grown tremendously, but the standard network intrusion detection system relied on filtering of feature has significant limitations that make it challenging to identify new attacks promptly [8]. A large variety of strategies based on machine learning methodologies have been developed. However, they are not particularly effective in detecting all forms of intrusions [9]. ML is a part of Artificial Intelligence and is the systematic investigation of processes, techniques, and mathematical analysis used by computer systems to handle complicated problems. Because intrusion detection is a probabilistic classifier, it may be addressed using machine learning approaches [10]. It has been demonstrated that constructing IDS using ML approaches may result in higher precision; nevertheless, investigations have revealed that the most reliable and realistic IDS has yet to be established. Each IDS approach has its benefits and disadvantages under different settings. The main contributions of this paper are as follows: A Feature Selection module is introduced here, which proposes Recursive Feature Elimination using the Random Forest method to select features participating to the output most. With the help of NSL-KDD dataset, we can analyse the performance of the various state-of-art techniques applied to IDS with and without Feature Engineering. Further, an Autoencoder-based Anomaly Detection model which uses Deep Learning is proposed for effective detection of intrusions in the network. Subsequently, the introduced model is validated using different metrics and the result shows that it perform is a very significant manner compare to previous strategies. The balance content of the paper is structured as follows. Section ii addresses related studies and provides a summary of previous contributions. Section iii illustrates the proposed IDS in depth. Section IV summarises the conclusions of our planned study and suggests potential ideas for expanding on this work. II. Related Approaches This section describes existing intrusion detection systems (IDS) built for various networks that are accessible in the literature. Wisanwanichthan et al. [11] introduced a Hybrid Approach with double layer to observe low-frequency attacks. Uncovering of low-frequency attacks is someway complex compared to new attacks because they turn a slight from native scenarios and existing datasets make significantly lower sign of records of specified attacks. Initially, they did feature engineering using Principal Portion Analysis for feature selection. Then proposed a 2-layer model for attack detection where Stratum 1 is of Naive Bayes Classifier victimized to find DoS and Probe Attack Class & Layer 2 is of SVM which is used to detect R2L and U2R attack class of NSL-KDD Dataset [6]. Liu et al. [12] suggested a layered detection of intrusion technique relied on random forest and k-means algorithms for the records in binary classification; and then classified abnormal records into attack sub-classes with the help of convolutional neural network, long short-term memory, and various distinct deep learning algorithms. They also worked on the issue of unbalancing in this dataset through adaptive synthetic sampling. Experiments are performed on CIC-IDS and NSL-KDD Datasets. Elhefnawy et al. [13] introduced a Hybrid Nested Genetic-Fuzzy Algorithm (HNGFA) framework to identify attacks in which mainly the sample size is small, like R2L and U2R in the KDD'99 dataset and Backdoors and Worms in UNSW-NB15 dataset. The adaptive model is developed with the help of dual-nested Genetic-Fuzzy Algorithms (GFA). Each GFA consists of two-nested Genetic Algorithms, One is for creating the rules and the other is for tuning the rules. This algorithm works on the concept of Survival of the Fittest. The victory of a machine learning model mainly relies on the subjacent parameters on which it is functioning. Choosing the right and optimized feature combination and then tuning the classifier is pivotal, especially in supervised learning. Real-world networks and publicly available datasets may not necessarily have a normal (Gaussian) distribution of data. Instead, changeable distributions are more likely to be garbled. Normalizing the data is a must-do step before feeding it to any classifier to get rid of this skewness so that the detection rate may be improved. For normalization, we do have many alternative approaches to select for. Siddiqi et al. [1] proposed a statistical method for data normalization, which can be used for selecting the efficient normalization method out of the available options for the dataset we have. Five separate datasets were utilized with two different methods of selecting the features to represent the capacity of the suggested methodology. Tang et al. [14] suggested an improved particle swarm optimized online regularized extreme learning machine (IPSO-IRELM) model for detection of intrusion. This model works on the concept of Dynamic adaption of features while training the model and then feeding this learning to a feed forward neural network of single layer with a correction of optimizing the initial weights and deviations. Liu et al. [6] introduced an approach for intrusion detection in uneven internet traffic. They presented the Difficult Set Sampling Technique (DSSTE). Firstly, the unbalanced training set is divided into clusters using Edited Nearest Neighbour and KMeans methods, then clubbed again with its diminution samples to form a more reasonable training dataset for the classifier, especially for low sample records. In order to learn better all the differentiations in classifiers in the training stage and enhance classification accuracy. Fatani et al. [16] introduced an Intrusion Detection model which works with Meta heuristics algorithms and deep learning methodologies. The method is based and uses Convolutional Neural Networks for extraction of features. Selection of features by the TSODE approach, Differential evolution (DE) operators can be combined to achieve the new structure of Transient Search Optimization (TSO) algorithm. As seen in Table-1, several studies have already given multiple intrusion detection strategies which are operating on multiple Learning methodologies. Previous research has committed chiefly developing neural networks and a few machine learning approaches to populate the overall detection impingement. Feature selection and ensemble learning are the two most significant optimising techniques. According to the survey table, only a few systems employ feature selection processes; however, there is still ample opportunity to enhance these proposed research strategies. The following Table 1 represents the various state of the art approaches/ proposed methods for intrusion detection using various datasets without using autoencoder method. In the table content, we can easily check the features and evaluation criteria for the respective approach. Though these methods are efficient in some manner but for the growing intrusion attacks in networks it is very important to deploy an algorithm which can be self-decision maker, self-learner and self-adder of new kind of attacks in dataset. Table 1 Abstract view of the related state-of-art Techniques Authors Proposed Methodology Dataset Used Feature Selection Approach Evaluation Criteria Wisanwanichthan et al.[11] Double-Layered Hybrid Approach (DLHA), Naive Bayes classifier, Support Vector Machine NSL-KDD Pearson Correlation Coefficient (PCC) Accuracy, F Score, Precision, Detection Rate, False Alarm Rate Liu et al.[12] K-Means, Random Forest (RF), Convolutional Neural Network (CNN), Long Short-term Memory (LSTM), Deep learning NSL-KDD, CIS-IDS2017 NA Accuracy, True Positive Rate,, Training Time, Prediction Time, False Positive Rate Elhefnawy et al. [13] Hybrid Nested Genetic-Fuzzy Algorithm (HNGFA), Genetic Algorithms, Fuzzy Logic Systems KDDCUP99, UNSW-NB15 Ranking per Minor/Major using Cross-Validation Accuracy, Precision, F Score, Recall, Rate of false alarm Tang et al. [14] Improved Particle Swarm Optimized Online Regularized Extreme Learning Machine (IPSO-IRELM), UCI Dataset, NSL-KDD NA Precision, True Positive Rate, False Positive Rate, F-Score, AUC Wang et al. [15] Deep Residual Convolution Neural Network KDDCUP99 NA Precision, Recall, False Positive Rate, F Score, Accuracy Liu et al. [6] Difficult Set Sampling Technique (DSSTE) Algorithm, Edited Nearest Neighbor (ENN), KMeans Algorithm NSL-KDD, CSE-CIC-IDS2018 NA Accuracy, Precision, Recall, F-Score Fatani et al. [16] Deep Learning, Metaheuristics (MH) Algorithm, Transient Search Optimization Algorithm using Differential Evolution (TSODE) KDDCup-99, NSL-KDD, BoT-IoT, CICIDS-2017 Convolutional Neural Network Average Accuracy, Average Recall, Average Precision, Performance Improvement Rate Jiang et al. [17] Hybrid Sampling with Deep Hierarchical Network, One-side Selection, Synthetic Minority Over-sampling Technique (SMOTE) NSL-KDD, UNSW-NB15 Neural Network with convolution, Bi-directional memory of Long and Short-Term Accuracy, Precision, Recall, F-Score III. Autoencoder Based Proposed Approach The proposed model's critical steps include dataset pre-processing, classification, and outcome assessment. Each phase of the proposed system is vital and significantly impacts the model's performance. We used the NSL-KDD Dataset (An improvement of the KDD'99 data set), where redundant records are removed to produce a more reasonable dataset. Pre-Processing Total of 41 Features are there with NSL-KDD Dataset on which each set (Normal/Attack) is justified. Out of these 41 features, 3 features are based on category ('protocol_type', 'service' and 'flag'). The features which are in majority of Machine Learning require output or the input variables to be arithmetical in value. This shows that all the date in different categories must be changed to integers. We employed the One Strong and Efficient Encoding approach to translating these categorical features into numerical values. One hot encoding is an essential component of feature engineering for Machine learning. Then features are scaled in a range to avoid features with high values from weighing too heavily in the findings. Feature Selection Module The performance of any machine learning model depends on the underlying features. Feature selection is crucial in finding out specific features that most contribute to each attack type. Non-essential features may cause incorrect conclusions while increasing the computing cost of the classifier. The Random Forest (RF) method is a approach of machine learning that generates decision trees by arbitrarily splitting and combining nodes. The numerous trees vote on the ultimate categorization result. RF will classify normal and exceptional events in the dataset-KDD for the test dataset, and these exceptional events will be further partitioned. Random forest trains a collection of decision trees individually, allowing for parallel training on Spark. It is required to establish the size and extremity of trees in a good manner. Attack Detection Module This study proposes an autoencoder-based anomaly detection methodology. We utilized the NSL-KDD Dataset for this experiment. However, several flaws are associated with this dataset, such as class imbalance. Only 0.04 fraction of the samples in the NSL-KDD training dataset belong to the U2R attack type, making it severely underrepresented. The case is similar for the R2L and probe attack types, making it difficult for classifiers to detect these underrepresented types, resulting in poor accuracy. Another difficulty is that this dataset is unreasonable given the present networking context. Most network traffic is benign, with just a small amount being malicious. At the same time, in the NSL-KDD training set, attack samples account for 80% of the total dataset, rendering models trained with this dataset useless in real-world circumstances. Our autoencoder-based technique aims to address these issues. In this technique, we used a sparse autoencoder with a loss on the inputs. It has an input layer equivalent to the total features, a dropout layer, and a hidden layer of 8 neuron cells. Thus, the autoencoder's hidden representation has a compression of 122/8, requiring it to discover meaningful patterns and relationships between the features. Finally, there is a 122-unit output layer. The ReLU function is the activation of both the concealed and visible layers. To recreate its input, the autoencoder was activated. In other terms, it studies the optimal hyperplane. The system was built using just the "Normal" samples in the training dataset, enabling it to ponder the characteristics of normal conduct. It was achieved by introducing the model that would decrease the error of mean squared between its output and input. The regularization constraints imposed on the autoencoder prevent it from simply copying the input to the output and regularization the data; additionally, the dropout presented on the inputs renders the autoencoder a particular case of a denoising autoencoder, which is trained to restore the input from a disrupted, tainted variant of its own, forcing the autoencoder to learn even more characteristics of the data. Conclusion Based on Deep Learning, this research developed an autoencoder-based anomaly detection technique. The features are chosen using a Recursive Feature Elimination method and the Random Forest Algorithm. We also addressed the problem of Class Variation in the current dataset. The suggested classification model in this research is tested using the Dataset-NSL-KDD. This technique is faster at preparing data and may need less training time, also, it has been likened to a number of cutting-edge procedures. The experimental findings indicate that this approach can extract the correct information from network data, increasing the machine's effectiveness by training it on more critical features. Furthermore, the findings of the performance metrics assessed reveal that this strategy surpasses previous strategies in attack detection with high accuracy and minimal false positives. Further in this existing methodology, more features can be added and how this approach the model can be more efficient in the real world applications. Future researchers can adopt more normalization approaches to enhance the outcomes of an ML-based IDS. References Shi Z, He S, Sun J, Chen T, Chen J, Dong H (2023) An Efficient Multi-Task Network for Pedestrian Intrusion Detection, in IEEE Transactions on Intelligent Vehicles, vol. 8, no. 1, pp. 649–660, Jan. 10.1109/TIV.2022.3166911 Wang L, Yang J, Workman M, Wan P (April 2022) Effective algorithms to detect stepping-stone intrusion by removing outliers of packet RTTs. Tsinghua Sci Technol 27(2):432–442. 10.26599/TST.2021.9010041 Siddiqi MA, Pak W (2021) An Agile Approach to Identify Single and Hybrid Normalization for Enhancing Machine Learning-Based Network Intrusion Detection. IEEE Access 9(137513):137494. 10.1109/ACCESS.2021.3118361 Elsayed MA, Wrana M, Mansour Z, Lounis K, Ding SHH, Zulkernine M (2022) AdaptIDS: Adaptive Intrusion Detection for Mission-Critical Aerospace Vehicles, in IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 12, pp. 23459–23473, Dec. 10.1109/TITS.2022.3214095 Diro A, Chilamkurti N (May 2018) Distributed attack detection scheme using deep learning approach for Internet of Things. Futur Gener Comput Syst 82:761–768. 10.1016/j.future.2017.08.043 Ghasemi J, Esmaily J, Moradinezhad R (2020) Intrusion detection system using an optimized kernel extreme learning machine and efficient features, Sadhana - Acad. Proc. Eng. Sci., vol. 45, no. 1, Dec. 10.1007/s12046-019-1230-x Halbouni TS, Gunawan MH, Habaebi M, Halbouni M, Kartiwi, Ahmad R (2022) Machine Learning and Deep Learning Approaches for CyberSecurity: A Review. IEEE Access 10:19572–19585. Institute of Electrical and Electronics Engineers Inc. 10.1109/ACCESS.2022.3151248 Al-Daweri MS, Ariffin KAZ, Abdullah S, Senan MFEM (2020) An analysis of the KDD99 and UNSW-NB15 datasets for the intrusion detection system, Symmetry (Basel)., vol. 12, no. 10, pp. 1–32, Oct. 10.3390/sym12101666 Liu L, Wang P, Lin J, Liu L (2021) Intrusion Detection of Imbalanced Network Traffic Based on Machine Learning and Deep Learning. IEEE Access 9:7550–7563. 10.1109/ACCESS.2020.3048198 Chkirbene Z, Erbad A, Hamila R, Mohamed A, Guizani M, Hamdi M (2020) TIDCS: A Dynamic Intrusion Detection and Classification System Based Feature Selection. IEEE Access 8:95864–95877. 10.1109/ACCESS.2020.2994931 Le Jeune L, Goedeme T, Mentens N (2021) Machine Learning for Misuse-Based Network Intrusion Detection: Overview, Unified Evaluation and Feature Choice Comparison Framework. IEEE Access 9:63995–64015. 10.1109/ACCESS.2021.3075066 Akashdeep I, Manzoor, Kumar N (Dec. 2017) A feature reduced intrusion detection system using ANN classifier. Expert Syst Appl 88:249–257. 10.1016/j.eswa.2017.07.005 Jiang K, Wang W, Wang A, Wu H (2020) Network Intrusion Detection Combined Hybrid Sampling with Deep Hierarchical Network. IEEE Access 8:32464–32476. 10.1109/ACCESS.2020.2973730 Wisanwanichthan T, Thammawichai M (2021) A Double-Layered Hybrid Approach for Network Intrusion Detection System Using Combined Naive Bayes and SVM. IEEE Access 9:138432–138450. 10.1109/ACCESS.2021.3118573 Liu C, Gu Z, Wang J (2021) A Hybrid Intrusion Detection System Based on Scalable K-Means + Random Forest and Deep Learning. IEEE Access 9:75729–75740. 10.1109/ACCESS.2021.3082147 Elhefnawy R, Abounaser H, Badr A (2020) A hybrid nested genetic-fuzzy algorithm framework for intrusion detection and attacks. IEEE Access 8:98218–98233. 10.1109/ACCESS.2020.2996226 Tang Y, Li C (2021) An Online Network Intrusion Detection Model Based on Improved Regularized Extreme Learning Machine. IEEE Access 9:94826–94844. 10.1109/ACCESS.2021.3093313 Fatani A, Elaziz MA, Dahou A, Al-Qaness MAA, Lu S (2021) IoT Intrusion Detection System Using Deep Learning and Enhanced Transient Search Optimization. IEEE Access 9:123448. 10.1109/ACCESS.2021.3109081 Jiang H, He Z, Ye G, Zhang H (2020) Network Intrusion Detection Based on PSO-Xgboost Model. IEEE Access 8:58392–58401. 10.1109/ACCESS.2020.2982418 Additional Declarations The authors declare no competing interests. Cite Share Download PDF Status: Posted Version 2 posted You are reading this latest preprint version Show more versions Research Square lets you share your work early, gain feedback from the community, and start making changes to your manuscript prior to peer review in a journal. As a division of Research Square Company, we’re committed to making research communication faster, fairer, and more useful. We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-5350806","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":378661618,"identity":"4bae28db-13f9-4a0d-98fd-7777180f32f4","order_by":0,"name":"Kanak Giri","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA6UlEQVRIiWNgGAWjYBACCSBmbABSbOzNBx8AOTx8xGvhOZZsANLCRqQWECvHDMRhIKhFsv3swY8zflnk8/EcS6v8mmMnw8bA/PDRDTxapHnykiU39klYtrE3H7stuy0Z6DA2Y+McPFrkGHIMJB/2SBgA/ZJ2W3IbM1ALD5s0Xi38b4x/grUA/VIsua2esBZpoErJDT8gWhg/bjtMWIvkjDdmljMbwA5LlmbcdpyHjZmAXyTO5xjf7PlTZyDf3nzw489t1fb87M0PH+PTAgaMbRCamQdMElIOBn+gWn8QpXoUjIJRMApGGgAAkyhD2gE7vJYAAAAASUVORK5CYII=","orcid":"","institution":"Swami Keshwanand Institute of Engineering Technology \u0026 Gramothan","correspondingAuthor":true,"prefix":"","firstName":"Kanak","middleName":"","lastName":"Giri","suffix":""}],"badges":[],"createdAt":"2024-10-29 03:53:17","currentVersionCode":2,"declarations":{"humanSubjects":false,"vertebrateSubjects":false,"conflictsOfInterestStatement":false,"humanSubjectEthicalGuidelines":false,"humanSubjectConsent":false,"humanSubjectClinicalTrial":false,"humanSubjectCaseReport":false,"vertebrateSubjectEthicalGuidelines":false},"doi":"10.21203/rs.3.rs-5350806/v2","doiUrl":"https://doi.org/10.21203/rs.3.rs-5350806/v2","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":97669671,"identity":"7262c4bc-aa0a-450f-8ff5-617fdd3a4d67","added_by":"auto","created_at":"2025-12-08 09:28:39","extension":"docx","order_by":0,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":95019,"visible":true,"origin":"","legend":"","description":"","filename":"Paper.docx","url":"https://assets-eu.researchsquare.com/files/rs-5350806/v2/8b8f91ee5ec4197dceb4e1e7.docx"},{"id":97474431,"identity":"b346e38f-5849-4091-8a0a-38e94cd0ca18","added_by":"auto","created_at":"2025-12-04 18:34:15","extension":"json","order_by":1,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":342,"visible":true,"origin":"","legend":"","description":"","filename":"rs5350806.json","url":"https://assets-eu.researchsquare.com/files/rs-5350806/v2/344bcb22609289f449b61af1.json"},{"id":97670017,"identity":"6449e466-fdbb-4b4f-b080-d71899d3961b","added_by":"auto","created_at":"2025-12-08 09:29:35","extension":"xml","order_by":2,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":54309,"visible":true,"origin":"","legend":"","description":"","filename":"rs53508062enriched.xml","url":"https://assets-eu.researchsquare.com/files/rs-5350806/v2/7fc51a9e268947716e64875f.xml"},{"id":97670028,"identity":"a8a5ab0d-598c-4c9e-af67-b5f1ef2db9e9","added_by":"auto","created_at":"2025-12-08 09:29:35","extension":"png","order_by":4,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":45163,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-5350806/v2/0df170982438c895eeff45a5.png"},{"id":97474437,"identity":"1bfdbefa-2862-47cf-8781-5885a8cb01ea","added_by":"auto","created_at":"2025-12-04 18:34:15","extension":"xml","order_by":5,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":50967,"visible":true,"origin":"","legend":"","description":"","filename":"rs53508062structuring.xml","url":"https://assets-eu.researchsquare.com/files/rs-5350806/v2/8a3856a67033087d4ecd77df.xml"},{"id":97474434,"identity":"fcb02a88-a199-4730-9167-596dcd2fc655","added_by":"auto","created_at":"2025-12-04 18:34:15","extension":"html","order_by":6,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":58723,"visible":true,"origin":"","legend":"","description":"","filename":"earlyproof.html","url":"https://assets-eu.researchsquare.com/files/rs-5350806/v2/72a835dff59b4a3106bf1a2e.html"},{"id":97474436,"identity":"ae328f01-e2f3-4ad9-bc3c-f314a2ed209f","added_by":"auto","created_at":"2025-12-04 18:34:15","extension":"jpeg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":172003,"visible":true,"origin":"","legend":"\u003cp\u003eSuggested Model\u003c/p\u003e","description":"","filename":"floatimage1.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-5350806/v2/a52be5d0dc42bdd547c83dc7.jpeg"},{"id":97677630,"identity":"ff3a62e6-e17d-4762-839c-d94bd4638b77","added_by":"auto","created_at":"2025-12-08 09:53:47","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":559353,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-5350806/v2/63dcda6e-b4d9-44a2-bc3f-0f675a163887.pdf"}],"financialInterests":"The authors declare no competing interests.","formattedTitle":"\u003cp\u003e\u003cem\u003eA Comparative Study: Between Non Autoencoders IDS and Autoencoder Based IDS Approaches in Network Communication\u003c/em\u003e\u003c/p\u003e","fulltext":[{"header":"I. Introduction","content":"\u003cp\u003eThe order of growth in internet users and the use of devices incorporated daily into the network are speedily maturation. With growing users and much devices in the network, the intensity of data assemblage generated and stored increases exponentially. Withal, attacks on the network and data storage person augmented in collateral [1]. This brings in contrastive methodologies for detecting unwelcome events and activities in the network and responding effectively. Cyber security is the technology, techniques, and practices used to defend web systems [2]. The enlarged adoption of wireless networks to transfer large amounts of information has caused a slew of security risks and reclusiveness issues in recent present. Communication Networks succeed a fanlike range of excitable user information assailable to threats from both internecine and outside attackers. These threats may be both manlike and automatic, are spacious, and steadily turn concealment, directing to unmarked seclusion violations [3]. As a conclusion, various prophylactic and defending measures, much as intrusion find systems, Intrusion Detection System (IDS). The Intrusion Spying System (IDS) is a ingredient of cyber surety systems [4]. By analyzing data generated by network devices, intrusion detection systems (IDS) may be used to locate, evaluate, and diagnose intrusions.\u003c/p\u003e\u003cp\u003eOn the other hand, Anomaly detection approaches use normal system behavior to create regular patterns, identifying anomalies as deviations from the norm. These approaches are pretty exciting since they can rectify all known and undiscovered types of threats [7]. The underlying issue with anomaly detection systems is that they require fine-tuning and have high false-positive rates. In recent years, advanced threat assaults have grown tremendously, but the standard network intrusion detection system relied on filtering of feature has significant limitations that make it challenging to identify new attacks promptly [8].\u003c/p\u003e\u003cp\u003eA large variety of strategies based on machine learning methodologies have been developed. However, they are not particularly effective in detecting all forms of intrusions [9]. ML is a part of Artificial Intelligence and is the systematic investigation of processes, techniques, and mathematical analysis used by computer systems to handle complicated problems. Because intrusion detection is a probabilistic classifier, it may be addressed using machine learning approaches [10]. It has been demonstrated that constructing IDS using ML approaches may result in higher precision; nevertheless, investigations have revealed that the most reliable and realistic IDS has yet to be established. Each IDS approach has its benefits and disadvantages under different settings.\u003c/p\u003e\u003cp\u003eThe main contributions of this paper are as follows:\u003c/p\u003e\u003cp\u003e\u003cul\u003e\u003cli\u003e\u003cp\u003eA Feature Selection module is introduced here, which proposes Recursive Feature Elimination using the Random Forest method to select features participating to the output most.\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003eWith the help of NSL-KDD dataset, we can analyse the performance of the various state-of-art techniques applied to IDS with and without Feature Engineering.\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003eFurther, an Autoencoder-based Anomaly Detection model which uses Deep Learning is proposed for effective detection of intrusions in the network. Subsequently, the introduced model is validated using different metrics and the result shows that it perform is a very significant manner compare to previous strategies.\u003c/p\u003e\u003c/li\u003e\u003c/ul\u003e\u003c/p\u003e\u003cp\u003eThe balance content of the paper is structured as follows. Section ii addresses related studies and provides a summary of previous contributions. Section iii illustrates the proposed IDS in depth. Section IV summarises the conclusions of our planned study and suggests potential ideas for expanding on this work.\u003c/p\u003e"},{"header":"II. Related Approaches","content":"\u003cp\u003eThis section describes existing intrusion detection systems (IDS) built for various networks that are accessible in the literature.\u003c/p\u003e\u003cp\u003eWisanwanichthan et al. [11] introduced a Hybrid Approach with double layer to observe low-frequency attacks. Uncovering of low-frequency attacks is someway complex compared to new attacks because they turn a slight from native scenarios and existing datasets make significantly lower sign of records of specified attacks. Initially, they did feature engineering using Principal Portion Analysis for feature selection. Then proposed a 2-layer model for attack detection where Stratum 1 is of Naive Bayes Classifier victimized to find DoS and Probe Attack Class \u0026amp; Layer 2 is of SVM which is used to detect R2L and U2R attack class of NSL-KDD Dataset [6].\u003c/p\u003e\u003cp\u003eLiu et al. [12] suggested a layered detection of intrusion technique relied on random forest and k-means algorithms for the records in binary classification; and then classified abnormal records into attack sub-classes with the help of convolutional neural network, long short-term memory, and various distinct deep learning algorithms. They also worked on the issue of unbalancing in this dataset through adaptive synthetic sampling. Experiments are performed on CIC-IDS and NSL-KDD Datasets.\u003c/p\u003e\u003cp\u003eElhefnawy et al. [13] introduced a Hybrid Nested Genetic-Fuzzy Algorithm (HNGFA) framework to identify attacks in which mainly the sample size is small, like R2L and U2R in the KDD'99 dataset and Backdoors and Worms in UNSW-NB15 dataset. The adaptive model is developed with the help of dual-nested Genetic-Fuzzy Algorithms (GFA). Each GFA consists of two-nested Genetic Algorithms, One is for creating the rules and the other is for tuning the rules. This algorithm works on the concept of Survival of the Fittest.\u003c/p\u003e\u003cp\u003eThe victory of a machine learning model mainly relies on the subjacent parameters on which it is functioning. Choosing the right and optimized feature combination and then tuning the classifier is pivotal, especially in supervised learning. Real-world networks and publicly available datasets may not necessarily have a normal (Gaussian) distribution of data. Instead, changeable distributions are more likely to be garbled. Normalizing the data is a must-do step before feeding it to any classifier to get rid of this skewness so that the detection rate may be improved. For normalization, we do have many alternative approaches to select for. Siddiqi et al. [1] proposed a statistical method for data normalization, which can be used for selecting the efficient normalization method out of the available options for the dataset we have. Five separate datasets were utilized with two different methods of selecting the features to represent the capacity of the suggested methodology.\u003c/p\u003e\u003cp\u003eTang et al. [14] suggested an improved particle swarm optimized online regularized extreme learning machine (IPSO-IRELM) model for detection of intrusion. This model works on the concept of Dynamic adaption of features while training the model and then feeding this learning to a feed forward neural network of single layer with a correction of optimizing the initial weights and deviations.\u003c/p\u003e\u003cp\u003eLiu et al. [6] introduced an approach for intrusion detection in uneven internet traffic. They presented the Difficult Set Sampling Technique (DSSTE). Firstly, the unbalanced training set is divided into clusters using Edited Nearest Neighbour and KMeans methods, then clubbed again with its diminution samples to form a more reasonable training dataset for the classifier, especially for low sample records. In order to learn better all the differentiations in classifiers in the training stage and enhance classification accuracy.\u003c/p\u003e\u003cp\u003eFatani et al. [16] introduced an Intrusion Detection model which works with Meta heuristics algorithms and deep learning methodologies. The method is based and uses Convolutional Neural Networks for extraction of features. Selection of features by the TSODE approach, Differential evolution (DE) operators can be combined to achieve the new structure of Transient Search Optimization (TSO) algorithm.\u003c/p\u003e\u003cp\u003eAs seen in Table-1, several studies have already given multiple intrusion detection strategies which are operating on multiple Learning methodologies. Previous research has committed chiefly developing neural networks and a few machine learning approaches to populate the overall detection impingement. Feature selection and ensemble learning are the two most significant optimising techniques. According to the survey table, only a few systems employ feature selection processes; however, there is still ample opportunity to enhance these proposed research strategies. The following Table\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e represents the various state of the art approaches/ proposed methods for intrusion detection using various datasets without using autoencoder method. In the table content, we can easily check the features and evaluation criteria for the respective approach. Though these methods are efficient in some manner but for the growing intrusion attacks in networks it is very important to deploy an algorithm which can be self-decision maker, self-learner and self-adder of new kind of attacks in dataset.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eAbstract view of the related state-of-art Techniques\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"5\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAuthors\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eProposed Methodology\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eDataset Used\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003eFeature Selection Approach\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e\u003cp\u003eEvaluation Criteria\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eWisanwanichthan et al.[11]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eDouble-Layered Hybrid Approach (DLHA), Naive Bayes classifier, Support Vector Machine\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003eNSL-KDD\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003ePearson Correlation Coefficient (PCC)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003eAccuracy, F Score, Precision, Detection Rate, False Alarm Rate\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eLiu et al.[12]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eK-Means, Random Forest (RF), Convolutional Neural Network (CNN), Long Short-term Memory (LSTM), Deep learning\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003eNSL-KDD, CIS-IDS2017\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003eNA\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003eAccuracy, True Positive Rate,, Training Time, Prediction Time, False Positive Rate\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eElhefnawy et al. [13]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eHybrid Nested Genetic-Fuzzy Algorithm (HNGFA), Genetic Algorithms, Fuzzy Logic Systems\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003eKDDCUP99, UNSW-NB15\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003eRanking per Minor/Major using Cross-Validation\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003eAccuracy, Precision, F Score, Recall, Rate of false alarm\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eTang et al. [14]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eImproved Particle Swarm Optimized Online Regularized Extreme Learning Machine (IPSO-IRELM),\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003eUCI Dataset, NSL-KDD\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003eNA\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003ePrecision, True Positive Rate, False Positive Rate, F-Score, AUC\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eWang et al. [15]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eDeep Residual Convolution Neural Network\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003eKDDCUP99\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003eNA\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003ePrecision, Recall, False Positive Rate, F Score, Accuracy\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eLiu et al. [6]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eDifficult Set Sampling Technique (DSSTE) Algorithm, Edited Nearest Neighbor (ENN), KMeans Algorithm\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003eNSL-KDD, CSE-CIC-IDS2018\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003eNA\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003eAccuracy, Precision, Recall, F-Score\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eFatani et al. [16]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eDeep Learning, Metaheuristics (MH) Algorithm, Transient Search Optimization Algorithm using Differential Evolution (TSODE)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003eKDDCup-99, NSL-KDD, BoT-IoT, CICIDS-2017\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003eConvolutional Neural Network\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003eAverage Accuracy, Average Recall, Average Precision, Performance Improvement Rate\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eJiang et al. [17]\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eHybrid Sampling with Deep Hierarchical Network, One-side Selection, Synthetic Minority Over-sampling Technique (SMOTE)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003eNSL-KDD, UNSW-NB15\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003eNeural Network with convolution, Bi-directional memory of Long and Short-Term\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003eAccuracy, Precision, Recall, F-Score\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e"},{"header":"III. Autoencoder Based Proposed Approach","content":"\u003cp\u003eThe proposed model's critical steps include dataset pre-processing, classification, and outcome assessment. Each phase of the proposed system is vital and significantly impacts the model's performance. We used the NSL-KDD Dataset (An improvement of the KDD'99 data set), where redundant records are removed to produce a more reasonable dataset.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003e\u003cstrong\u003ePre-Processing\u003c/strong\u003e\u003cp\u003eTotal of 41 Features are there with NSL-KDD Dataset on which each set (Normal/Attack) is justified. Out of these 41 features, 3 features are based on category ('protocol_type', 'service' and 'flag'). The features which are in majority of Machine Learning require output or the input variables to be arithmetical in value. This shows that all the date in different categories must be changed to integers. We employed the One Strong and Efficient Encoding approach to translating these categorical features into numerical values. One hot encoding is an essential component of feature engineering for Machine learning. Then features are scaled in a range to avoid features with high values from weighing too heavily in the findings.\u003c/p\u003e\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eFeature Selection Module\u003c/strong\u003e\u003cp\u003eThe performance of any machine learning model depends on the underlying features. Feature selection is crucial in finding out specific features that most contribute to each attack type. Non-essential features may cause incorrect conclusions while increasing the computing cost of the classifier.\u003c/p\u003e\u003c/p\u003e\u003cp\u003eThe Random Forest (RF) method is a approach of machine learning that generates decision trees by arbitrarily splitting and combining nodes. The numerous trees vote on the ultimate categorization result. RF will classify normal and exceptional events in the dataset-KDD for the test dataset, and these exceptional events will be further partitioned. Random forest trains a collection of decision trees individually, allowing for parallel training on Spark. It is required to establish the size and extremity of trees in a good manner.\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eAttack Detection Module\u003c/strong\u003e\u003cp\u003eThis study proposes an autoencoder-based anomaly detection methodology. We utilized the NSL-KDD Dataset for this experiment. However, several flaws are associated with this dataset, such as class imbalance. Only 0.04 fraction of the samples in the NSL-KDD training dataset belong to the U2R attack type, making it severely underrepresented. The case is similar for the R2L and probe attack types, making it difficult for classifiers to detect these underrepresented types, resulting in poor accuracy. Another difficulty is that this dataset is unreasonable given the present networking context. Most network traffic is benign, with just a small amount being malicious. At the same time, in the NSL-KDD training set, attack samples account for 80% of the total dataset, rendering models trained with this dataset useless in real-world circumstances. Our autoencoder-based technique aims to address these issues.\u003c/p\u003e\u003c/p\u003e\u003cp\u003eIn this technique, we used a sparse autoencoder with a loss on the inputs. It has an input layer equivalent to the total features, a dropout layer, and a hidden layer of 8 neuron cells. Thus, the autoencoder's hidden representation has a compression of 122/8, requiring it to discover meaningful patterns and relationships between the features. Finally, there is a 122-unit output layer. The ReLU function is the activation of both the concealed and visible layers.\u003c/p\u003e\u003cp\u003eTo recreate its input, the autoencoder was activated. In other terms, it studies the optimal hyperplane. The system was built using just the \"Normal\" samples in the training dataset, enabling it to ponder the characteristics of normal conduct. It was achieved by introducing the model that would decrease the error of mean squared between its output and input. The regularization constraints imposed on the autoencoder prevent it from simply copying the input to the output and regularization the data; additionally, the dropout presented on the inputs renders the autoencoder a particular case of a denoising autoencoder, which is trained to restore the input from a disrupted, tainted variant of its own, forcing the autoencoder to learn even more characteristics of the data.\u003c/p\u003e"},{"header":"Conclusion","content":"\u003cp\u003eBased on Deep Learning, this research developed an autoencoder-based anomaly detection technique. The features are chosen using a Recursive Feature Elimination method and the Random Forest Algorithm. We also addressed the problem of Class Variation in the current dataset. The suggested classification model in this research is tested using the Dataset-NSL-KDD. This technique is faster at preparing data and may need less training time, also, it has been likened to a number of cutting-edge procedures. The experimental findings indicate that this approach can extract the correct information from network data, increasing the machine's effectiveness by training it on more critical features. Furthermore, the findings of the performance metrics assessed reveal that this strategy surpasses previous strategies in attack detection with high accuracy and minimal false positives.\u003c/p\u003e\u003cp\u003eFurther in this existing methodology, more features can be added and how this approach the model can be more efficient in the real world applications. Future researchers can adopt more normalization approaches to enhance the outcomes of an ML-based IDS.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eShi Z, He S, Sun J, Chen T, Chen J, Dong H (2023) An Efficient Multi-Task Network for Pedestrian Intrusion Detection, in IEEE Transactions on Intelligent Vehicles, vol. 8, no. 1, pp. 649\u0026ndash;660, Jan. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/TIV.2022.3166911\u003c/span\u003e\u003cspan address=\"10.1109/TIV.2022.3166911\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWang L, Yang J, Workman M, Wan P (April 2022) Effective algorithms to detect stepping-stone intrusion by removing outliers of packet RTTs. Tsinghua Sci Technol 27(2):432\u0026ndash;442. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.26599/TST.2021.9010041\u003c/span\u003e\u003cspan address=\"10.26599/TST.2021.9010041\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSiddiqi MA, Pak W (2021) An Agile Approach to Identify Single and Hybrid Normalization for Enhancing Machine Learning-Based Network Intrusion Detection. IEEE Access 9(137513):137494. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/ACCESS.2021.3118361\u003c/span\u003e\u003cspan address=\"10.1109/ACCESS.2021.3118361\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eElsayed MA, Wrana M, Mansour Z, Lounis K, Ding SHH, Zulkernine M (2022) AdaptIDS: Adaptive Intrusion Detection for Mission-Critical Aerospace Vehicles, in IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 12, pp. 23459\u0026ndash;23473, Dec. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/TITS.2022.3214095\u003c/span\u003e\u003cspan address=\"10.1109/TITS.2022.3214095\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eDiro A, Chilamkurti N (May 2018) Distributed attack detection scheme using deep learning approach for Internet of Things. Futur Gener Comput Syst 82:761\u0026ndash;768. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.future.2017.08.043\u003c/span\u003e\u003cspan address=\"10.1016/j.future.2017.08.043\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGhasemi J, Esmaily J, Moradinezhad R (2020) Intrusion detection system using an optimized kernel extreme learning machine and efficient features, Sadhana - Acad. Proc. Eng. Sci., vol. 45, no. 1, Dec. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1007/s12046-019-1230-x\u003c/span\u003e\u003cspan address=\"10.1007/s12046-019-1230-x\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHalbouni TS, Gunawan MH, Habaebi M, Halbouni M, Kartiwi, Ahmad R (2022) Machine Learning and Deep Learning Approaches for CyberSecurity: A Review. IEEE Access 10:19572\u0026ndash;19585. Institute of Electrical and Electronics Engineers Inc.\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/ACCESS.2022.3151248\u003c/span\u003e\u003cspan address=\"10.1109/ACCESS.2022.3151248\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eAl-Daweri MS, Ariffin KAZ, Abdullah S, Senan MFEM (2020) An analysis of the KDD99 and UNSW-NB15 datasets for the intrusion detection system, Symmetry (Basel)., vol. 12, no. 10, pp. 1\u0026ndash;32, Oct. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3390/sym12101666\u003c/span\u003e\u003cspan address=\"10.3390/sym12101666\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLiu L, Wang P, Lin J, Liu L (2021) Intrusion Detection of Imbalanced Network Traffic Based on Machine Learning and Deep Learning. IEEE Access 9:7550\u0026ndash;7563. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/ACCESS.2020.3048198\u003c/span\u003e\u003cspan address=\"10.1109/ACCESS.2020.3048198\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eChkirbene Z, Erbad A, Hamila R, Mohamed A, Guizani M, Hamdi M (2020) TIDCS: A Dynamic Intrusion Detection and Classification System Based Feature Selection. IEEE Access 8:95864\u0026ndash;95877. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/ACCESS.2020.2994931\u003c/span\u003e\u003cspan address=\"10.1109/ACCESS.2020.2994931\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLe Jeune L, Goedeme T, Mentens N (2021) Machine Learning for Misuse-Based Network Intrusion Detection: Overview, Unified Evaluation and Feature Choice Comparison Framework. IEEE Access 9:63995\u0026ndash;64015. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/ACCESS.2021.3075066\u003c/span\u003e\u003cspan address=\"10.1109/ACCESS.2021.3075066\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eAkashdeep I, Manzoor, Kumar N (Dec. 2017) A feature reduced intrusion detection system using ANN classifier. Expert Syst Appl 88:249\u0026ndash;257. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.eswa.2017.07.005\u003c/span\u003e\u003cspan address=\"10.1016/j.eswa.2017.07.005\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eJiang K, Wang W, Wang A, Wu H (2020) Network Intrusion Detection Combined Hybrid Sampling with Deep Hierarchical Network. IEEE Access 8:32464\u0026ndash;32476. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/ACCESS.2020.2973730\u003c/span\u003e\u003cspan address=\"10.1109/ACCESS.2020.2973730\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWisanwanichthan T, Thammawichai M (2021) A Double-Layered Hybrid Approach for Network Intrusion Detection System Using Combined Naive Bayes and SVM. IEEE Access 9:138432\u0026ndash;138450. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/ACCESS.2021.3118573\u003c/span\u003e\u003cspan address=\"10.1109/ACCESS.2021.3118573\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLiu C, Gu Z, Wang J (2021) A Hybrid Intrusion Detection System Based on Scalable K-Means\u0026thinsp;+\u0026thinsp;Random Forest and Deep Learning. IEEE Access 9:75729\u0026ndash;75740. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/ACCESS.2021.3082147\u003c/span\u003e\u003cspan address=\"10.1109/ACCESS.2021.3082147\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eElhefnawy R, Abounaser H, Badr A (2020) A hybrid nested genetic-fuzzy algorithm framework for intrusion detection and attacks. IEEE Access 8:98218\u0026ndash;98233. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/ACCESS.2020.2996226\u003c/span\u003e\u003cspan address=\"10.1109/ACCESS.2020.2996226\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eTang Y, Li C (2021) An Online Network Intrusion Detection Model Based on Improved Regularized Extreme Learning Machine. IEEE Access 9:94826\u0026ndash;94844. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/ACCESS.2021.3093313\u003c/span\u003e\u003cspan address=\"10.1109/ACCESS.2021.3093313\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eFatani A, Elaziz MA, Dahou A, Al-Qaness MAA, Lu S (2021) IoT Intrusion Detection System Using Deep Learning and Enhanced Transient Search Optimization. IEEE Access 9:123448. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/ACCESS.2021.3109081\u003c/span\u003e\u003cspan address=\"10.1109/ACCESS.2021.3109081\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eJiang H, He Z, Ye G, Zhang H (2020) Network Intrusion Detection Based on PSO-Xgboost Model. IEEE Access 8:58392\u0026ndash;58401. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1109/ACCESS.2020.2982418\u003c/span\u003e\u003cspan address=\"10.1109/ACCESS.2020.2982418\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":true,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"IDS (Intrusion Detection System), Autoencoders, Cybersecurity, Feature Selection, Random Forest, Deep Learning, NSL-KDD Dataset","lastPublishedDoi":"10.21203/rs.3.rs-5350806/v2","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-5350806/v2","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eIntrusion detection is an integrated security issue in the present digital environment. Malicious cyber-attackers can frequently linger in tremendous volumes of regular data in demented network traffic. It has a superior destruct of concealing and opacity in cyberspace, making it challenging for Network Intrusion Detection Systems (NIDS) to aver catching accuracy and timing. The false-positive issue is one of the underlying drawbacks of network intrusion detection systems (NIDS), which are widely engaged to discover threats and safeguard the networks. Imbalance classes and unwarranted material data a reminiscence failure skyway to make false positive, which are inferior in company in the preparation dataset. Feature engineering is also performed in this approach using the Recursive Property Excreting method. In this experiment NSL-KDD Dataset is used. The outcome shows that our approach is finer than different state-of-art approaches in terms of different metrics like Accuracy, Precision, Recall, and F1 Score.\u003c/p\u003e","manuscriptTitle":"A Comparative Study: Between Non Autoencoders IDS and Autoencoder Based IDS Approaches in Network Communication","msid":"","msnumber":"","nonDraftVersions":[{"code":2,"date":"2025-12-04 18:34:10","doi":"10.21203/rs.3.rs-5350806/v2","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}},{"code":1,"date":"2024-11-15 15:59:31","doi":"10.21203/rs.3.rs-5350806/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"929d9943-7210-44ca-9f23-b85feddb7fcd","owner":[],"postedDate":"December 4th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2024-11-25T20:53:39+00:00","versionOfRecord":[],"versionCreatedAt":"2025-12-04 18:34:10","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v2","identity":"rs-5350806","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-5350806","identity":"rs-5350806","version":["v2"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}

Text is read by the "Ask this paper" AI Q&A widget below. Extraction quality varies by source — PMC NXML preserves structure cleanly, OA-HTML may include some navigation residue, and OA-PDF can have broken hyphenation. The publisher copy (via DOI) is the canonical version.

My notes (saved in your browser only)

Ask this paper AI returns verbatim quotes from the full text · source: preprint-html

Answers must be backed by verbatim quotes from this paper's full text. Hallucinated quotes are dropped automatically; if no verbatim passage answers the question, we say so. How this works

Citation neighborhood (no data yet)

We don't have any in-corpus citations linked to this paper yet. This is a recent paper (2025) — citers typically take a year or two to land, and the OpenAlex reference graph may still be filling in.

Source provenance

europepmc
last seen: 2026-05-20T01:45:00.602351+00:00