Discipline: Computer Sciences and Information Management
Subcategory: Computer Science & Information Systems
Georgiana Wright - Alabama Agricultural and Mechanical University
Co-Author(s): Angelina Uno-Antonison, Worthy Lab, HudsonAlpha Institute for Biotechnology, Huntsville, AL; Marius Schamschula, Alabama Agricultural and Mechanical University, Huntsville, AL
Approximately 41,000 women in the US are expected to die from breast cancer in 2018. Mammograms are needed to detect breast cancer, yet a mammogram from a radiologist costs $730-$1,000. This cost exceeds the $50 Medicare reimbursement, which can place a financial burden on patients and lead them to decline or delay proper treatment to protect their finances. To reduce this problem, we created a machine learning tool with image recognition features that reads mammograms, with the goal of classifying them as normal or abnormal as well as a trained radiologist would. The prototype was written in Python, a high-level, general-purpose programming language created by Guido van Rossum. PyCharm was used as the development environment. The data came from the Mammographic Image Analysis Society (MIAS) Mini-Mammographic Database, which was created by an organization of UK research groups. The mammograms were drawn from the UK National Breast Screening Programme. The database contains 322 mammogram images and includes radiologists' markings of the locations of abnormalities within the images. The images were downloaded and stored in the PyCharm project. The program then associates each mammogram with its classification and feeds the labeled images into the machine learning algorithm. Training on the database images yields a model that can predict abnormalities in a mammogram. The classifications come from the database's annotation file. The 1st column of the file is the MIAS database reference number. The 2nd column is the type of tissue within the mammogram: F is for Fatty, G is for Fatty-glandular, and D is for Dense-glandular. The 3rd column is the class of abnormality suggested by the radiologist: CALC is for Calcification, CIRC is for Well-defined masses, SPIC is for Spiculated masses, MISC is for Other, ill-defined masses, ARCH is for Architectural distortion, ASYM is for Asymmetry, and NORM is for Normal.
The 4th column is the severity of the abnormality: B is for Benign, M is for Malignant. The 5th and 6th columns are the x,y image coordinates of the centre of the abnormality. The 7th column is the radius, in pixels, of a circle enclosing the abnormality. The model was trained on every image in the dataset. The images were prepared to fit these categories during the preprocessing stage. Training ran for 7 hours on the Google Kubernetes system and completed 345 of the planned 30,000 iterations. The results were limited by time and Kubernetes capabilities. In future research, I will build on these results by limiting iteration time and finding mammogram images from a different database to further train the model. References: Grgic, Mislav, et al. ‘Databases.’ Mammographic Image Analysis Homepage, University of Zagreb. Suckling, J., et al. ‘The Mammographic Image Analysis Society Digital Mammogram Database.’ Excerpta Medica.
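The seven-column annotation layout described above can be illustrated with a minimal Python sketch. It is not the code used in the project. It assumes the annotation file is plain text with whitespace-separated fields, and that normal images carry only the first three columns. The names MiasRecord and parse_annotation_line, and the sample lines with reference numbers mdb900 and mdb901, are hypothetical and were chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

# Code tables taken from the column descriptions in the abstract.
TISSUE = {"F": "Fatty", "G": "Fatty-glandular", "D": "Dense-glandular"}
ABNORMALITY = {
    "CALC": "Calcification",
    "CIRC": "Well-defined masses",
    "SPIC": "Spiculated masses",
    "MISC": "Other, ill-defined masses",
    "ARCH": "Architectural distortion",
    "ASYM": "Asymmetry",
    "NORM": "Normal",
}
SEVERITY = {"B": "Benign", "M": "Malignant"}


@dataclass
class MiasRecord:
    """One annotation row (hypothetical container, not part of any library)."""
    reference: str
    tissue: str
    abnormality: str
    severity: Optional[str] = None
    x: Optional[int] = None
    y: Optional[int] = None
    radius: Optional[int] = None

    @property
    def is_abnormal(self) -> bool:
        # Binary label for a normal-versus-abnormal classifier.
        return self.abnormality != "NORM"


def parse_annotation_line(line: str) -> MiasRecord:
    """Parse one whitespace-separated annotation line (assumed layout)."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 columns, got {len(fields)}")
    reference, tissue, abnormality = fields[:3]
    if tissue not in TISSUE:
        raise ValueError(f"unknown tissue code: {tissue}")
    if abnormality not in ABNORMALITY:
        raise ValueError(f"unknown abnormality code: {abnormality}")
    record = MiasRecord(reference, tissue, abnormality)
    # Columns 4 to 7 are treated as optional, since normal images have no lesion.
    if len(fields) >= 4:
        if fields[3] not in SEVERITY:
            raise ValueError(f"unknown severity code: {fields[3]}")
        record.severity = fields[3]
    if len(fields) >= 7:
        record.x, record.y, record.radius = (int(v) for v in fields[4:7])
    return record


if __name__ == "__main__":
    # Made-up sample lines, not copied from the database.
    for sample in ("mdb900 G CIRC B 100 200 30", "mdb901 D NORM"):
        rec = parse_annotation_line(sample)
        print(rec.reference, ABNORMALITY[rec.abnormality], rec.is_abnormal)
```

Each parsed record could then be paired with its image file during preprocessing, with is_abnormal serving as the training label.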
Funder Acknowledgement(s): I thank Angelina Uno-Antonison from the HudsonAlpha Institute for Biotechnology for providing logistical and technical support. Funding was provided by an NSF ASSURE #1436572 / HBCU-UP grant to M. Schamschula.
Faculty Advisor: Marius Schamschula, firstname.lastname@example.org
Role: Responsible for investigating the different databases and finding a suitable source for training the model. Wrote the software and formatted the image data. Conducted the primary investigation into prototyping the application locally with DeepDetect.