Discipline: Computer Sciences and Information Management
Subcategory: Computer Science & Information Systems
Alice Ngoc Lam - Kansas State University
Co-Author(s): William Hsu and Paula Mendez, Kansas State University, Manhattan, KS
Machine learning is used across many industries to find patterns and trends in data. Put simply, machine learning produces models based on past experience that enable improved performance on future instances of a task, such as classification, prediction, or pattern recognition. Here, the experience consists of historical sensor data and the task is prediction of animal disease transmission. In this project, the input consists of second-by-second RFID proximity data among susceptible bovine specimens from an experimental herd of 70 cattle, along with their daily temperature and isolation history. The goal of this work was to analyze the proximity data and prepare it as training data for supervised classification learning, enabling disease transmission and propagation models to be built. Python was used to implement the algorithm; it was chosen for its range of functionality and its tool packages for development. Most of the project was spent on data preparation, a crucial step that had to be completed before the machine learning algorithm could be applied. The data was collected from the Beef Cattle Institute (BCI), hosted on a relational (SQL) database server, and accessible both through database clients such as MySQL and through remote file access over secure shell (SSH). Two database queries were implemented to produce alternative training data: one consisting of a ‘Group-By’ count of tagged cattle that came within a specified radius threshold of the specimen (candidates for exposure) over an 86,400-second (one-day) interval, and another consisting of the actual list (bit vector) of those cattle. If time allows, an additional classifier, logistic regression, a machine learning model that describes the relationship between one variable and another, will be applied to the training data. The algorithm derived from the BCI data will also be applied in a demo on the Edison, a tiny wearable computer.
An accurate algorithm, applied to data collected from a smart sensor on the Edison, should produce the anticipated predictions. Further research will apply reinforcement learning, a different machine learning method, to the algorithm to see whether its accuracy and learning rate improve.
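The two alternative training-data queries described in the abstract can be illustrated with a minimal sketch. It is not the project's actual code: the table name `proximity`, its columns, the toy rows, and the helper `contact_bit_vectors` are all hypothetical, and an in-memory SQLite database stands in for the MySQL server so the example runs on its own. Only the one-day interval and the radius threshold come from the text.

```python
import sqlite3

HERD_SIZE = 5          # toy herd for illustration; the experimental herd had 70 cattle
RADIUS_M = 0.09        # radius threshold in meters, as given in the role statement
DAY_SECONDS = 86_400   # one-day aggregation interval

# Hypothetical schema: one row per second in which a pair of tags was read.
# Columns: (epoch_second, specimen_tag, other_tag, distance_m)
rows = [
    (10, 1, 2, 0.05),
    (11, 1, 2, 0.07),
    (500, 1, 3, 0.08),
    (600, 1, 4, 0.50),      # outside the threshold, so it is ignored
    (90_000, 1, 2, 0.04),   # falls on the second day (day index 1)
    (90_001, 1, 5, 0.03),
    (90_002, 2, 3, 0.02),
]

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL server
conn.execute(
    "CREATE TABLE proximity (ts INTEGER, specimen INTEGER, other INTEGER, dist REAL)"
)
conn.executemany("INSERT INTO proximity VALUES (?, ?, ?, ?)", rows)

# Query 1: 'Group-By' count of distinct cattle within the radius, per specimen per day.
count_sql = """
SELECT specimen, ts / ? AS day, COUNT(DISTINCT other) AS n_contacts
FROM proximity
WHERE dist <= ?
GROUP BY specimen, day
ORDER BY specimen, day
"""
daily_counts = conn.execute(count_sql, (DAY_SECONDS, RADIUS_M)).fetchall()


def contact_bit_vectors(conn, herd_size, radius_m, day_seconds):
    """Query 2: per specimen and day, a bit vector marking which cattle came in range.

    Assumes tags are numbered 1..herd_size, so tag k maps to position k - 1.
    """
    sql = """
    SELECT DISTINCT specimen, ts / ? AS day, other
    FROM proximity
    WHERE dist <= ?
    """
    vectors = {}
    for specimen, day, other in conn.execute(sql, (day_seconds, radius_m)):
        bits = vectors.setdefault((specimen, day), [0] * herd_size)
        bits[other - 1] = 1
    return vectors


vectors = contact_bit_vectors(conn, HERD_SIZE, RADIUS_M, DAY_SECONDS)

for specimen, day, n_contacts in daily_counts:
    print(specimen, day, n_contacts, vectors[(specimen, day)])
```

In this sketch the count is simply the number of set bits in the corresponding vector, so the bit vector is the richer of the two representations: it keeps the identity of each exposure candidate, while the count keeps only how many there were.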
Funder Acknowledgement(s): This work was supported by the National Science Foundation grant No. 1305059 (KS-LSAMP). Additional research was made possible by the Knowledge Discoveries of Databases Lab.
Faculty Advisor: William H. Hsu, bhsu@ksu.edu
Role: I worked with the collected data, which was hosted on a MySQL database server. Most of my time was spent writing queries that produced a ‘Group-By’ count of pairs of tagged cattle that came within a specified radius threshold, which the Beef Cattle Institute provided as 0.09 meters. To narrow the results to more likely candidates, a thirty-second window was applied to aggregate the number of pairs in contact for at least one second. The query results listed all cows for a selected day, along with the count of cows that came within the threshold. Machine learning applications were introduced later.
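One possible reading of the thirty-second windowing step is sketched below. This is an interpretation, not the original query: it assumes that a pair counts once per thirty-second window if it was within the threshold for at least one second of that window. The record layout and the function name `windowed_contacts` are hypothetical.

```python
from collections import defaultdict

WINDOW_SECONDS = 30   # window length from the role statement
RADIUS_M = 0.09       # radius threshold in meters, provided by the BCI


def windowed_contacts(readings, window=WINDOW_SECONDS, radius=RADIUS_M):
    """Count, per unordered pair, the windows containing at least one second of contact.

    Each reading is a hypothetical tuple (epoch_second, tag_a, tag_b, distance_m).
    """
    windows_seen = defaultdict(set)
    for ts, tag_a, tag_b, dist in readings:
        if dist <= radius:
            pair = (min(tag_a, tag_b), max(tag_a, tag_b))  # treat (a, b) and (b, a) alike
            windows_seen[pair].add(ts // window)           # index of the window holding ts
    return {pair: len(indices) for pair, indices in windows_seen.items()}


readings = [
    (0, 1, 2, 0.05),
    (1, 2, 1, 0.06),    # same pair, same window as the previous reading
    (29, 1, 2, 0.08),   # still the first window
    (30, 1, 2, 0.04),   # start of the second window
    (45, 1, 3, 0.20),   # too far apart, ignored
    (95, 3, 1, 0.09),   # exactly at the threshold, counted
]

contacts = windowed_contacts(readings)
print(contacts)
```

Collapsing the per-second readings into windows keeps a pair that stood together for a long stretch from dominating the counts through sheer repetition, which is one plausible motivation for the aggregation described above.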