Discipline: Mathematics and Statistics
Subcategory: Mathematics and Statistics
Santiara Churchwell - North Carolina A&T State University
Co-Author(s): Ian Livengood, Zhilin 'Lin' Qian, and Xirui 'Siri' Zhou
In this paper, we recognize human faces from the ORL Database of Faces using Principal Component Analysis (PCA), Support Vector Machines (SVM), the Naïve Bayes classifier, and Random Forest. We introduce the prominent and relatively new concept of facial recognition, compare it to traditional methods, and present its significance in the field of biometrics. We establish the importance of determining the most accurate classifiers for the ORL dataset, and we define the data source and describe the database we used. Next, we explain our methodology for processing the data, which includes loading the image data, adding labels, extracting features with PCA, generating eigenfaces, reconstructing images, classifying the data with various methods, and performing cross-validation. Using 250 principal components in PCA, we accounted for 98.58% of the variance. We achieved a mean accuracy of 96.75% with SVM and a model accuracy of 96.25% with Random Forest. We interpret and compare these results in terms of accuracy and time efficiency. This paper also considers future work. Given more time, we would apply additional classification techniques, such as K-Nearest Neighbors and Neural Networks, and we would implement cross-validation with the Naïve Bayes classifier to ensure that our program is efficient for varying training sets defined within our dataset.
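The processing steps listed in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' actual program. The data is a synthetic stand-in for the ORL images (40 hypothetical subjects with 10 samples each, at 32x32 pixels rather than the real image size), and the linear kernel, the 5-fold split, and all variable names are assumptions. Only the choice of 250 principal components comes from the abstract.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Synthetic stand-in for the face images (assumption: not the ORL data).
rng = np.random.default_rng(0)
n_subjects, per_subject, side = 40, 10, 32
n_pixels = side * side
centers = rng.normal(size=(n_subjects, n_pixels))
noise = 0.5 * rng.normal(size=(n_subjects * per_subject, n_pixels))
X = np.repeat(centers, per_subject, axis=0) + noise  # one flattened image per row
y = np.repeat(np.arange(n_subjects), per_subject)    # subject label for each image

# Feature extraction: fit PCA with 250 components and measure explained variance.
pca = PCA(n_components=250, random_state=0).fit(X)
explained = pca.explained_variance_ratio_.sum()

# Eigenfaces are the principal axes reshaped back to image dimensions.
eigenfaces = pca.components_.reshape(-1, side, side)

# Reconstruction: project onto the components, then map back to pixel space.
reconstructed = pca.inverse_transform(pca.transform(X))

# Classification with cross-validation; PCA is refit inside each training fold
# so that no information from the held-out fold leaks into the features.
model = make_pipeline(PCA(n_components=250, random_state=0), SVC(kernel="linear"))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)

print(f"explained variance: {explained:.4f}")
print(f"mean CV accuracy: {scores.mean():.4f} (std {scores.std():.4f})")
```

Placing PCA inside the pipeline is a design choice of this sketch; the abstract does not state whether the original work fit PCA on the full dataset or per fold.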
Funder Acknowledgement(s): We are appreciative of the research experience we have been granted with the support of several generous sources. As researchers working under the grant NSF HRD-1719498, we have been given the funds necessary to explore the realm of facial recognition. We would like to extend a special thanks to the following sponsors: the NSF ACE DSA Project, Mathematics Department at North Carolina A&T State University, Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information (KLIPSHDI) of Ministry of Education, School of Computer Science and Engineering at Nanjing University of Science and Technology, Dr. Guoqing Tang of NCAT, Dr. Zhong Jin of NJUST, and Mr. Jonathan Fabish of NCAT and NCSU.
Faculty Advisor: Dr. Guoqing Tang, firstname.lastname@example.org
Role: I performed the Random Forest classification test on the ORL Database. I implemented Random Forest in Python code through OpenCV. I imported RandomForestClassifier and make_classification. I also defined the training data and the class labels for the training data as X and Y, respectively. Then, I tuned the parameters n_estimators, max_depth, oob_score, n_jobs, random_state, max_features, and min_samples_leaf to reach our desired accuracy. I then created the variable test_labels, an array of the class labels of the test set, set y_true equal to test_labels, and defined y_pred_rf to be an array of the Random Forest model's predictions of the class labels of the test set. After tuning the parameters, I calculated the model accuracy and achieved 96.25% with a standard deviation of 1.275%.
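A rough sketch of the workflow described above, assuming scikit-learn's RandomForestClassifier and make_classification. The names X, Y, test_labels, y_true, and y_pred_rf follow the description; everything else is hypothetical, including the synthetic four-class data, the 80/20 split, the 5-fold cross-validation used to obtain a standard deviation, and every parameter value. The tuned values from the original work are not given, so the settings below are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic data standing in for the PCA features (assumption: not the ORL data).
features, labels = make_classification(
    n_samples=400, n_features=50, n_informative=10, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, class_sep=2.0, random_state=0,
)

# X and Y are the training data and its class labels, as in the description.
X, test_features, Y, test_labels = train_test_split(
    features, labels, test_size=0.2, stratify=labels, random_state=0
)

# Placeholder values for the parameters named in the description.
rf = RandomForestClassifier(
    n_estimators=200,      # number of trees in the forest
    max_depth=None,        # grow each tree until its leaves are pure
    oob_score=True,        # estimate accuracy from out-of-bag samples
    n_jobs=-1,             # use all available CPU cores
    random_state=0,        # make the run reproducible
    max_features="sqrt",   # features considered at each split
    min_samples_leaf=1,    # minimum samples required at a leaf
)
rf.fit(X, Y)

y_true = test_labels
y_pred_rf = rf.predict(test_features)
accuracy = accuracy_score(y_true, y_pred_rf)

# One possible way to obtain a mean and standard deviation of the accuracy.
cv_scores = cross_val_score(rf, features, labels, cv=5)

print(f"test accuracy: {accuracy:.4f}, out-of-bag estimate: {rf.oob_score_:.4f}")
print(f"CV accuracy: {cv_scores.mean():.4f} (std {cv_scores.std():.4f})")
```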