Discipline: Computer Sciences and Information Management
Subcategory: Computer Engineering
Session: 3
Room: Hoover
Zahra Rasuli - Fisk University
Diabetes is a disease in which the body’s ability to produce or respond to the hormone insulin is impaired, resulting in abnormal carbohydrate metabolism and elevated glucose levels in the blood and urine. In this research, we use machine learning methods to diagnose diabetes from glucose, pregnancy, BMI, and other features. The research data come from the Pima Indians. The Pima Indians of Arizona have the highest reported prevalence of diabetes of any population in the world. During the 1853 Gadsden Purchase, the Pima Bajo residing in the Gila Valley were forced to colonize, and in 1959 a Pima reservation was created in Arizona; the number of people with diabetes among the Pima Indians increased tenfold. The data set has 768 samples with 8 independent variables and one dependent variable. The 8 independent variables are pregnant (number of times pregnant), glucose (plasma glucose concentration), pressure (diastolic blood pressure), triceps (triceps skin fold thickness), insulin, BMI, diabetes pedigree function, and age. The dependent variable is diabetes. Our goals for this research project were to determine the best model for diagnosing diabetes from these features and to determine which factors played the most significant role in the diagnosis. The machine learning models used in this research were Support Vector Machine, Logistic Regression, Multilayer Perceptron (MLP), Decision Tree, Random Forest, and ensemble learning methods. The Stacking Ensemble method achieved the best result, a testing accuracy of 78.1. Principal component analysis (PCA) suggests that a higher testing score would be hard to attain, because the diabetic and non-diabetic samples cannot be split into distinct regions. Glucose, pregnancy, and BMI played the most significant roles in diagnosing diabetes.
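A stacking ensemble trains several base models and then fits a meta-learner on their cross-validated predictions. The sketch below shows one way to stack the model families named above with scikit-learn's `StackingClassifier`. It is a minimal illustration under stated assumptions, not the project's actual configuration: the data are a synthetic stand-in generated with `make_classification` (768 rows, 8 features, matching the shape of the Pima data), and the hyperparameters, the 75/25 split, and the logistic-regression meta-learner are all hypothetical choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in with the same shape as the Pima data (768 rows, 8 features).
# These values are NOT the study's data; replace X, y with the real dataset.
X, y = make_classification(n_samples=768, n_features=8, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Base learners: the model families named in the abstract.
# Scale-sensitive models (SVM, logistic regression, MLP) get a StandardScaler.
# All hyperparameters here are hypothetical.
base_learners = [
    ("svm", make_pipeline(StandardScaler(), SVC(random_state=0))),
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("mlp", make_pipeline(StandardScaler(),
                          MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                                        random_state=0))),
    ("tree", DecisionTreeClassifier(max_depth=4, random_state=0)),
    ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
]

# Stacking: the logistic-regression meta-learner is fit on the base learners'
# out-of-fold predictions from 5-fold cross-validation on the training split.
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(max_iter=1000),
                           cv=5)
stack.fit(X_train, y_train)

# Accuracy on the held-out test split.
accuracy = stack.score(X_test, y_test)
print(f"stacking test accuracy: {accuracy:.3f}")
```

Using out-of-fold predictions keeps the meta-learner from simply trusting whichever base model overfits the training data most.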
Funder Acknowledgement(s): NSF-TIP
Faculty Advisor: Dr. Qian and Dr. Hota, shota@fisk.edu
Role: I developed the models that were trained and tested on the dataset. For each model, I tuned the parameters to obtain a more accurate testing score. Additionally, I created the PowerPoint presentation with my partner, Valencia.
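Parameter tuning can also be done systematically with a cross-validated grid search. The sketch below uses scikit-learn's `GridSearchCV` on a random forest. It is an assumption-laden illustration, not the procedure used in the project: the data are a synthetic stand-in from `make_classification`, and the parameter grid is hypothetical, since the abstract does not state which parameters were adjusted or what values were tried.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the 768-sample, 8-feature Pima data (NOT the study's data).
X, y = make_classification(n_samples=768, n_features=8, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Hypothetical search space; the values actually tried in the project are not stated.
param_grid = {
    "n_estimators": [50, 150],
    "max_depth": [3, 6, None],
    "min_samples_leaf": [1, 5],
}

# 5-fold cross-validated search on the training split only, so the test split
# plays no part in choosing the parameters.
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                      cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# The best combination is refit on the full training split (refit=True by
# default) and then scored once on the held-out test split.
best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)
print("best parameters:", best_params)
print(f"test accuracy of tuned model: {test_accuracy:.3f}")
```

Selecting parameters by cross-validation rather than by the test score avoids tuning toward the particular test split, which would make the reported testing score optimistic.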