Discipline: Physics
Subcategory: Computer Science & Information Systems
Session: 2
Tracy Edwards - Hampton University
Co-Author(s): Justin Solomon, Duke University, Raleigh, NC USA; Aiping Ding, Duke University, Raleigh, NC USA; Ehsan Samei, Duke University, Raleigh, NC USA
Hospitals are required to monitor and track the radiation dose used for computed tomography (CT) examinations. Individual CT scanners send this information to a dose monitoring database after every patient is scanned. These raw data are potentially very valuable for understanding how imaging examinations are being performed in the hospital and for ensuring optimal patient safety. However, the data are often difficult to analyze because CT manufacturers organize and report radiation dose information inconsistently and because imaging data in a real-world radiology department are inherently complex. With over 100,000 examinations performed every year at some hospitals, there is a need for automated solutions to help clean up such data. Therefore, the objective of this study was to create a machine learning algorithm that can automatically categorize raw radiation dose data from a radiology department at a major academic hospital. Raw radiation dose data from 65,356 CT examinations were collected from the radiology department at Duke University Hospital. The data were organized in tabular format, with each row corresponding to a radiation event and each column corresponding to either a text-based (Study Description, Institution Name, Model Station Name) or numerical (CT Dose Index, Dose Length Product) descriptor of the radiation event. The goal was to train a model that could predict the category of each radiation event (scout, contrast timing, or diagnostic) based on these predictor columns. A decision-tree model was trained using the text processing tools in the Natural Language Toolkit (NLTK) and the machine learning tools in the Scikit-learn Python package. Text-based columns of interest that contributed to predicting the scan purpose were vectorized (converted to numerical values) using a Word2Vec word embedding algorithm. Once vectorized, they were included with CT Dose Index and Dose Length Product as predictors of the scan purpose. The training data, comprising 6,000 cases, were manually labeled under the guidance of an expert clinical medical physicist. Once the model has learned which scan purpose label is associated with each vectorized column, it can make predictions for data it has not seen before. The model achieved a 99% prediction accuracy, demonstrating that such an approach is potentially valuable in helping to categorize and eventually analyze radiation dose data for the ultimate benefit of patients. Future work will focus on applying similar methods to better clean up and categorize other aspects of the radiation dose data.
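The workflow described above (vectorize the text columns, append CT Dose Index and Dose Length Product, then train a decision tree on labeled rows) can be illustrated with a minimal, standard-library-only sketch. It is not the study's implementation: simple word counts stand in for the Word2Vec embeddings, a small hand-written Gini-based tree stands in for the Scikit-learn model, and every row, value, and function name below is a hypothetical example chosen for illustration.

```python
from collections import Counter

# Hypothetical toy rows: (study description, CT Dose Index, Dose Length Product, label).
# The values are invented for illustration and are not taken from the study data.
ROWS = [
    ("chest scout topogram", 0.1, 5.0, "scout"),
    ("abdomen scout localizer", 0.2, 8.0, "scout"),
    ("bolus tracking monitoring", 2.0, 2.0, "contrast timing"),
    ("test bolus premonitoring", 1.5, 1.5, "contrast timing"),
    ("chest helical with contrast", 8.0, 300.0, "diagnostic"),
    ("abdomen pelvis helical", 12.0, 600.0, "diagnostic"),
]

def build_vocab(texts):
    """Collect the sorted set of lowercase words seen in the training text."""
    return sorted({w for t in texts for w in t.lower().split()})

def featurize(text, ctdi, dlp, vocab):
    """Word counts (a stand-in for Word2Vec vectors) followed by the two dose values."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab] + [ctdi, dlp]

def gini(labels):
    """Gini impurity of a non-empty collection of labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Return (impurity, feature index, threshold) of the best split, or None."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between two distinct observed values
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best

def fit(X, y):
    """Grow a tree: a leaf is a label string, a node is (feature, threshold, left, right)."""
    if len(set(y)) == 1:
        return y[0]
    split = best_split(X, y)
    if split is None:  # identical rows with mixed labels: fall back to the majority label
        return Counter(y).most_common(1)[0][0]
    _, j, t = split
    left = [(r, lab) for r, lab in zip(X, y) if r[j] <= t]
    right = [(r, lab) for r, lab in zip(X, y) if r[j] > t]
    return (j, t, fit(*zip(*left)), fit(*zip(*right)))

def predict(tree, x):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree

vocab = build_vocab(r[0] for r in ROWS)
X = [featurize(r[0], r[1], r[2], vocab) for r in ROWS]
y = [r[3] for r in ROWS]
tree = fit(X, y)

# Classify a radiation event that was not part of the labeled rows.
print(predict(tree, featurize("chest scout", 0.15, 6.0, vocab)))
```

Words that never appeared in the labeled rows are simply ignored by this count-based vectorizer, which is one practical reason an embedding such as Word2Vec may generalize better to unfamiliar study descriptions.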
Funder Acknowledgement(s): Justin Solomon
Faculty Advisor: Justin Solomon, justin.solomon@duke.edu
Role: I manually labeled the training data, which included 6,000 cases, under the guidance of an expert clinical medical physicist.