Discipline: Technology and Engineering
Subcategory: Electrical Engineering
Tam Le - University of the District of Columbia
Co-Author(s): Nian Zhang and Sasan Haghani, University of the District of Columbia, Washington, DC
Analyzing big data with machine learning requires a classification method whose accuracy can be evaluated. Although k-nearest neighbor (KNN) classification is among the most popular and successful pattern classification techniques, it is sensitive to outliers and performs poorly when the training sample is small. Unlike the classic KNN method, in which only the nearest neighbors of a test sample are used to estimate group membership, Extended Nearest Neighbor (ENN) classifiers make a prediction by considering not only which samples are the nearest neighbors of the test sample, but also which samples count the test sample among their own nearest neighbors. Three variations of the ENN classifier exist. The first (ENN) classifies a test sample according to which class assignment yields the largest intra-class coherence among all possible classes. The second (ENN.V1) makes explicit that the classification of a test sample depends not only on its own nearest neighbors, but also on the samples that consider the test sample one of their nearest neighbors. The third (ENN.V2) is an approximation of the first ENN classifier under two constraints. In this project, the three ENN methods are applied to several data sets from the UCI Machine Learning Repository, including the indoor user prediction from RSS data set and the ‘pen digit’ data sets, and the performance of the classifiers is compared. In computational experiments on pen-based digit recognition, the average accuracies of the ENN, ENN.V1, and ENN.V2 classifiers are approximately 0.9777, 0.99827, and 0.99864, respectively.
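The first ENN decision rule described above can be illustrated with a minimal brute-force sketch. It assumes that intra-class coherence is measured, for each class, as the fraction of k-nearest-neighbor links that stay within that class, and that the test sample is tentatively added to the data under each candidate label in turn. The function names (`knn_indices`, `coherence`, `enn_predict`), the Euclidean distance, and the toy data are illustrative assumptions, not the code used in the project.

```python
from math import dist  # Euclidean distance, Python 3.8+


def knn_indices(points, i, k):
    """Indices of the k nearest neighbors of points[i], excluding itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: dist(points[i], points[j]))
    return others[:k]


def coherence(points, labels, k):
    """Assumed coherence measure: for each class, the fraction of
    nearest-neighbor links from its members that land in the same class,
    summed over all classes."""
    total = 0.0
    for c in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == c]
        same = sum(
            1
            for i in members
            for j in knn_indices(points, i, k)
            if labels[j] == c
        )
        total += same / (len(members) * k)
    return total


def enn_predict(train_x, train_y, z, k=3):
    """Try every candidate label for z and keep the one that gives the
    largest total intra-class coherence of the pooled data."""
    points = list(train_x) + [z]
    scores = {
        c: coherence(points, list(train_y) + [c], k)
        for c in sorted(set(train_y))
    }
    return max(scores, key=scores.get)


# Toy data (hypothetical): two well-separated clusters.
train_x = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
train_y = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(enn_predict(train_x, train_y, (0.5, 0.5), k=3))
```

Because the test sample is inserted into the pooled data before the neighbor lists are recomputed, a wrong label lowers the score in two ways: the test sample's own neighbors disagree with it, and the training samples that now count it among their neighbors lose a same-class link. This is the two-way relationship that distinguishes ENN from KNN. The sketch recomputes all neighbor lists for every candidate class, which is only practical for small data sets.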
Funder Acknowledgement(s): This work was supported by the National Science Foundation Grants HRD #1505509 and HRD #1435947.
Faculty Advisor: Nian Zhang, nzhang@udc.edu
Role: Analyzed the methods and ran the analysis calculations on data sets from the machine learning repository.