Discipline: Computer Sciences and Information Management
Subcategory: Computer Science & Information Systems
Ochaun Marshall - University of North Carolina at Greensboro
Co-Author(s): Mutahir Nadeem, Roanoke College, Washington DC; Sarbjit Singh and Xiaohong Yuan, North Carolina Agricultural and Technical State University, Greensboro, NC; Xing Fang, Illinois State University, Normal, IL
Network security is vital for corporations and institutions. To protect valuable computer systems, network data must be analyzed so that possible intrusions can be detected. Supervised machine learning methods classify network data as normal or malicious with high accuracy, but they require fully labeled datasets, which are costly and time-consuming to produce because labeling requires a domain expert. The recently developed Ladder network, which combines supervised neural network training with unsupervised learning, shows promise in achieving high accuracy while requiring only a small number of labeled examples. We hypothesize that Ladder networks can classify network traffic with high accuracy using fewer labeled observations. We applied the Ladder network to network data classification in two experiments using the Third International Knowledge Discovery and Data Mining Tools Competition dataset (KDD1999) and compared the results with three popular supervised classifiers: Deep Belief Network, Support Vector Machine, and Random Forest. In the first experiment, we conducted four tests that varied the number of examples per class (10, 1000, 2000, and 5000). The training set included both labeled and unlabeled examples, and the proportion of labeled data for the Ladder network was kept at 50%. The Ladder network achieved results similar to those of the supervised classifiers while using a limited number of labeled samples. In the second experiment, the number of labeled examples was varied while the number of examples per class was kept constant. In that experiment, the Ladder network maintained a classification accuracy above 90% even with less than 50% of the training set labeled. We conclude that Ladder networks are valuable not only in image classification but also in intrusion detection.
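A minimal sketch of how labeled and unlabeled examples can both contribute to training, assuming the standard Ladder network cost from Rasmus et al. (2015): a supervised cross-entropy term computed on labeled examples only, plus a weighted denoising (reconstruction) term computed on every example. The helper names (`split_labeled`, `supervised_cost`, `denoising_cost`, `ladder_cost`) are hypothetical and are not the code used in these experiments; it is also an assumption that "examples per class" means the total sampled per class, with the 50% labeled ratio applied within each class.

```python
import math
import random


def split_labeled(examples_by_class, per_class, labeled_ratio, seed=0):
    """Hypothetical helper: sample `per_class` examples from each class and
    keep labels for only a fraction of them (e.g. labeled_ratio=0.5)."""
    rng = random.Random(seed)
    labeled, unlabeled = [], []
    for label, xs in examples_by_class.items():
        chosen = rng.sample(xs, per_class)
        n_lab = int(per_class * labeled_ratio)
        labeled.extend((x, label) for x in chosen[:n_lab])
        unlabeled.extend(chosen[n_lab:])  # labels are discarded
    return labeled, unlabeled


def supervised_cost(probs, targets):
    """Mean negative log-likelihood of the true class, labeled examples only.
    In a Ladder network, `probs` come from the noisy (corrupted) encoder."""
    return -sum(math.log(p[t]) for p, t in zip(probs, targets)) / len(targets)


def denoising_cost(clean, recon, lambdas):
    """Sum over layers l of lambda_l / (N * m_l) * sum_n ||z_l(n) - zhat_l(n)||^2.
    clean[l] and recon[l] hold N vectors of width m_l, for all examples."""
    total = 0.0
    for lam, z_layer, zhat_layer in zip(lambdas, clean, recon):
        n, m = len(z_layer), len(z_layer[0])
        sq = sum((a - b) ** 2
                 for z, zh in zip(z_layer, zhat_layer)
                 for a, b in zip(z, zh))
        total += lam * sq / (n * m)
    return total


def ladder_cost(probs, targets, clean, recon, lambdas):
    """Combined semi-supervised cost: supervised term + denoising term."""
    return supervised_cost(probs, targets) + denoising_cost(clean, recon, lambdas)
```

For example, with 10 examples per class and a 50% labeled ratio, `split_labeled` keeps labels for 5 examples of each class; the other 5 still contribute through `denoising_cost`, which is how unlabeled traffic records can improve the classifier without extra expert labeling. The per-layer weights `lambdas` are hyperparameters.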
In the near future, we plan to apply the Ladder network to the NSL-KDD dataset and other, newer datasets. Other future work includes exploring which other feed-forward neural network architectures could be integrated into the Ladder network structure to improve classification performance.
Funder Acknowledgement(s): National Science Foundation
Faculty Advisor: Xiaohong Yuan, xhyuan@ncat.edu
Role: I conducted both experiments, built the development environment, generated records for all of the results, created the repository for all of the classifiers, managed version control, and led the literature review.