Discipline: Computer Sciences and Information Management
Subcategory: Computer Science & Information Systems
Session: 1
Room: Exhibit Hall A
Charles L Green - The Pennsylvania State University, PA
Co-Author(s): Yao Ma, Michigan State University, MI; Dr. Jiangtao Huang, Michigan State University, MI; Dr. Jiliang Tang, Michigan State University, MI
Massive Open Online Courses (MOOCs) have grown rapidly in recent years. As of 2018, 101 million people had signed up for a MOOC, with 20 million new learners enrolling in one or more courses in 2018 alone. Despite these high enrollment rates, MOOC completion rates remain significantly lower than those of comparable in-person classes. According to the Open University Learning Analytics dataset compiled by Jakub Kuzilek, of the learners in the selected modules in 2013 and 2014, 47.25% passed, 21.62% failed, and 31.13% withdrew. Identifying which factors most strongly influence the final outcome of a course can help raise the pass rate while lowering the fail and withdrawal rates, and learning analytics provides the tools to do so. The goal of this study was to test different machine learning algorithms and compare their performance on this dataset. We also used the trained models to identify which factors, including demographics and engagement, are most important in predicting a course's final outcome. The algorithms predicted the probability of each of four outcomes, pass, pass with distinction, fail, and withdraw, and these predictions were compared to binary predictions of pass versus not pass. The analysis revealed a strong positive correlation between online user engagement and final outcome: a learner's engagement rate and interactions affected the likelihood of each outcome. Among the tested algorithms, Gradient Boosted Decision Trees (GBDT) achieved the highest accuracy in all tests, with Random Forest (RF) and Decision Tree (DT) close behind. Despite its higher accuracy, GBDT was less efficient than RF and DT.
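The comparison described above can be sketched in scikit-learn. This is a minimal illustration only: the synthetic dataset, feature count, and default hyperparameters are assumptions standing in for the actual OULAD features and the study's tuned configuration, with four classes mimicking the pass / distinction / fail / withdraw outcomes.

```python
# Hedged sketch: comparing the three tree-based classifiers named in the
# abstract (DT, RF, GBDT). The synthetic data below is an illustrative
# stand-in for the OULAD features, not the study's actual setup.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def compare_models():
    # Four classes mimic the pass / distinction / fail / withdraw outcomes.
    X, y = make_classification(n_samples=2000, n_features=20,
                               n_informative=10, n_classes=4,
                               random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                              random_state=0)
    models = {
        "DT": DecisionTreeClassifier(random_state=0),
        "RF": RandomForestClassifier(n_estimators=100, random_state=0),
        "GBDT": GradientBoostingClassifier(random_state=0),
    }
    # Fit each model and report held-out accuracy.
    return {name: accuracy_score(y_te, m.fit(X_tr, y_tr).predict(X_te))
            for name, m in models.items()}


if __name__ == "__main__":
    for name, acc in compare_models().items():
        print(f"{name}: {acc:.3f}")
```

On real data, the fitted models' `feature_importances_` attributes would support the kind of factor-importance analysis the study describes.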
Comparing the binary and non-binary results, the binary formulation, as expected, performed better. Even with their lower accuracy, the non-binary results are important because knowing the specific predicted outcome may change how a course tries to help a learner improve their chance of passing. In future tests, these findings will guide adding new variables and modifying some of the variables used to train the learning algorithms, which should make the predictions more accurate. They will also help the program predict trends over the span of a course, for example, when a learner may withdraw. Knowing these specifics about a user can improve their chance of passing the course rather than withdrawing. With this test and future tests, MOOCs can be improved so that more users can complete these courses to further their education and advance their careers.
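The binary formulation compared above collapses the four final outcomes into pass versus not pass. A minimal sketch of that mapping, assuming the outcome labels follow the names used in the abstract:

```python
# Hedged sketch: collapsing the four final outcomes into the binary
# pass / not-pass target. The label strings are assumptions based on
# the outcomes named in the abstract.
PASSING = {"Pass", "Distinction"}


def to_binary(outcome: str) -> int:
    """Return 1 for a passing outcome, 0 for fail or withdraw."""
    return 1 if outcome in PASSING else 0
```

Training a second set of models on this binary target gives the baseline against which the four-class results were compared.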
Funder Acknowledgement(s): This study was supported, in part, by a grant from Michigan State University's Summer Research Opportunities Program (SROP) and by The Pennsylvania State University's Millennium Scholars Program (MSP).
Faculty Advisor: Jiliang Tang, tangjili@msu.edu
Role: I worked on this project from the start. After my advisor gave me the data, the people I worked with advised me along the way to keep me on the right track. With their advice and guidance, I was able to optimize the algorithms, test the data, and interpret the results.