Enhanced cyber attack detection using optimized random forest with SMOTE-based class balancing and feature selection

Authors

  • Jonson Manurung Universtias Pertahanan Republik Indonesia, Bogor, Indonesia Author
  • Adam Mardamsyah Universitas Pertahanan Republik Indonesia, Bogor, Indonesia Author
  • Baringin Sianipar Universitas HKBP Nomensen, Medan, Indonesia Author

Keywords:

CIC-IDS2017, Cyber Attack Detection, Intrusion Detection System, Machine Learning, Random Forest

Abstract

The rapid expansion of interconnected enterprise networks has intensified cybersecurity threats, while traditional signature-based intrusion detection systems remain ineffective against evolving and imbalanced attack patterns, particularly zero-day and low-frequency attacks. This study aims to develop an optimized and practically deployable intrusion detection framework by leveraging a Random Forest classifier on the CIC-IDS2017 benchmark dataset, with emphasis on robust minority attack detection, computational efficiency, and interpretability for real-world security operations. The proposed method integrates comprehensive data preprocessing, Synthetic Minority Over-sampling Technique (SMOTE) for class imbalance mitigation, feature importance–driven dimensionality reduction, and exhaustive grid search–based hyperparameter optimization within a unified machine learning pipeline. Experiments conducted on 2.52 million network flow records demonstrate that the optimized model achieves 98.14% accuracy, 96.25% weighted F1-score, and 0.993 ROC-AUC, while maintaining stable performance across all attack categories, including minority classes such as Infiltration and Botnet with F1-scores exceeding 93%. Feature selection reduced dimensionality by 58.3% and training time by 63.2% without degrading performance, enhancing deployment feasibility in enterprise intrusion detection environments. Comparative analysis confirms that the proposed approach outperforms baseline Random Forest models, traditional machine learning methods, and recent deep learning approaches while requiring significantly lower computational resources. These findings indicate that a holistically optimized Random Forest framework offers a reliable, interpretable, and operationally efficient solution for real-world network security monitoring and cyber defense systems.

References

Ahmad, Z., Shahid Khan, A., Wai Shiang, C., Abdullah, J., & Ahmad, F. (2021). Network intrusion detection system: A systematic study of machine learning and deep learning approaches. Transactions on Emerging Telecommunications Technologies, 32(1), e4150. https://doi.org/10.1002/ett.4150

Ahsan, M., Nygard, K. E., Gomes, R., Chowdhury, M. M., Rifat, N., & Connolly, J. F. (2022). Cybersecurity threats and their mitigation approaches using Machine Learning—A Review. Journal of Cybersecurity and Privacy, 2(3), 527–555. https://doi.org/10.3390/jcp2030027

Ashiku, L., & Dagli, C. (2021). Network intrusion detection system using deep learning. Procedia Computer Science, 185, 239–247. https://doi.org/10.1016/j.procs.2021.05.025

Bamakan, S. M. H., Wang, H., Tian, Y., & Shi, Y. (2016). An effective intrusion detection framework based on MCLP/SVM optimized by time-varying chaos particle swarm optimization. Neurocomputing, 199, 90–102. https://doi.org/10.1016/j.neucom.2016.03.031

Bergstra, J., & Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(1), 281–305.

Booij, T. M., Chiscop, I., Meeuwissen, E., Moustafa, N., & den Hartog, F. T. (2021). ToN_IoT: The role of heterogeneity and the need for standardization of features and attack types in IoT network intrusion data sets. IEEE Internet of Things Journal, 9(1), 485–496. https://doi.org/10.1109/JIOT.2021.3085194

Chawla, N. V, Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321–357. https://doi.org/10.1613/jair.953

Faker, O., & Dogdu, E. (2019). Intrusion detection using big data and deep learning techniques. Proceedings of the 2019 ACM Southeast Conference, 86–93. https://doi.org/10.1145/3299815.3314439

Fernández, A., García, S., Galar, M., Prati, R. C., Krawczyk, B., & Herrera, F. (2018). Learning from imbalanced data sets (Vol. 10). Springer. https://doi.org/10.1007/978-3-319-98074-4

Ferrag, M. A., Maglaras, L., Moschoyiannis, S., & Janicke, H. (2020). Deep learning for cyber security intrusion detection: Approaches, datasets, and comparative study. Journal of Information Security and Applications, 50, 102419. https://doi.org/10.1016/j.jisa.2019.102419

Kasongo, S. M., & Sun, Y. (2020). Performance analysis of intrusion detection systems using a feature selection method on the UNSW-NB15 dataset. Journal of Big Data, 7(1), 105. https://doi.org/10.1186/s40537-020-00379-6

Khraisat, A., Gondal, I., Vamplew, P., & Kamruzzaman, J. (2019). Survey of intrusion detection systems: techniques, datasets and challenges. Cybersecurity, 2(1), 20. https://doi.org/10.1186/s42400-019-0038-7

Kotsiantis, S., Kanellopoulos, D., & Pintelas, P. (2006). Handling imbalanced datasets: A review. GESTS International Transactions on Computer Science and Engineering, 30(1), 25–36.

Liu, H., & Lang, B. (2019). Machine learning and deep learning methods for intrusion detection systems: A survey. Applied Sciences, 9(20), 4396. https://doi.org/10.3390/app9204396

Moustafa, N., & Slay, J. (2015). UNSW-NB15: a comprehensive data set for network intrusion detection systems. 2015 Military Communications and Information Systems Conference, 1–6. https://doi.org/10.1109/MilCIS.2015.7348942

Probst, P., Wright, M. N., & Boulesteix, A.-L. (2019). Hyperparameters and tuning strategies for random forest. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 9(3), e1301. https://doi.org/10.1002/widm.1301

Ring, M., Wunderlich, S., Scheuer, D., Landes, D., & Hotho, A. (2019). A survey of network-based intrusion detection data sets. Computers & Security, 86, 147–167. https://doi.org/10.1016/j.cose.2019.06.005

Saputra, C. H. (2024). Integrasi Audit dan Teknik Clustering untuk Segmentasi dan Kategorisasi Aktivitas Log. Jurnal Teknologi Informasi Dan Ilmu Komputer, 11(1), 209–214. https://doi.org/10.25126/JTIIK.20241118071

Sarhan, M., Layeghy, S., Moustafa, N., & Portmann, M. (2021). NetFlow datasets for machine learning-based network intrusion detection systems. In Big Data Technologies and Applications (pp. 117–135). Springer. https://doi.org/10.1007/978-3-030-67044-3_5

Sharafaldin, I., Lashkari, A. H., & Ghorbani, A. A. (2018). Toward generating a new intrusion detection dataset and intrusion traffic characterization. Proceedings of the 4th International Conference on Information Systems Security and Privacy (ICISSP), 108–116. https://doi.org/10.5220/0006639801080116

Shone, N., Ngoc, T. N., Phai, V. D., & Shi, Q. (2018). A deep learning approach to network intrusion detection. IEEE Transactions on Emerging Topics in Computational Intelligence, 2(1), 41–50. https://doi.org/10.1109/TETCI.2017.2772792

Tama, B. A., & Rhee, K.-H. (2019). A combination of PSO-based feature selection and tree-based classifiers ensemble for intrusion detection systems. In Advances in Computer and Electrical Engineering (pp. 1–26). https://doi.org/10.4018/978-1-5225-8176-5.ch001

Thabtah, F., Hammoud, S., Kamalov, F., & Gonsalves, A. (2020). Data imbalance in classification: Experimental evaluation. Information Sciences, 513, 429–441. https://doi.org/10.1016/j.ins.2019.11.004

Thakkar, A., & Lohiya, R. (2021). A review on machine learning and deep learning perspectives of IDS for IoT: Recent updates, security issues, and challenges. Archives of Computational Methods in Engineering, 28(4), 3211–3243. https://doi.org/10.1007/s11831-020-09496-0

Vinayakumar, R., Alazab, M., Soman, K. P., Poornachandran, P., Al-Nemrat, A., & Venkatraman, S. (2019). Deep learning approach for intelligent intrusion detection system. IEEE Access, 7, 41525–41550. https://doi.org/10.1109/ACCESS.2019.2895334

Warzyński, A., & Kołaczek, G. (2020). Intrusion detection systems vulnerability: A comparative study. Computers & Security, 94, 101846. https://doi.org/10.1016/j.cose.2020.101846

Yin, C., Zhu, Y., Fei, J., & He, X. (2017). A deep learning approach for intrusion detection using recurrent neural networks. IEEE Access, 5, 21954–21961. https://doi.org/10.1109/ACCESS.2017.2762418

j15

Downloads

Published

2026-01-25

Similar Articles

You may also start an advanced similarity search for this article.

Most read articles by the same author(s)