Discipline: Computer Sciences and Information Management
Subcategory: Computer Science & Information Systems
Session: 2
Room: Exhibit Hall
Marilyn Lionts - University of Texas at Austin
Co-Author(s): Luca Bonomi, Vanderbilt University Medical Center, TN
Introduction: Survival analysis is widely used with clinical data to estimate the time until a given event occurs. These analyses can provide useful insights into the spread of diseases and guide interventions. Because health data are sensitive, sharing them raises significant privacy concerns. One method for protecting privacy, called differential privacy, adds a small amount of "noise" to obscure the real data. However, when longitudinal analyses are performed across multiple sites, the accumulated noise may overly perturb the results. New methods are therefore needed to balance privacy and usability. In this project, we combine differential privacy with homomorphic encryption to develop a new privacy solution that provides high utility.
Methods: Our solution improves usability by perturbing the data at each site with a small amount of noise while ensuring that the overall results satisfy differential privacy. In our local perturbation, we compare the current data to the previous release to further reduce the noise injected. Finally, we use a homomorphic secret-sharing protocol to aggregate the data for each temporal release.
In our evaluations, we artificially split a publicly available COVID-19 dataset into 2, 4, and 8 equally sized parties to simulate aggregation across varying numbers of institutions. We test releases every 4 days, every 7 days, and every 31 days. As usability measures, we consider the mean absolute error (MAE) of the survival curves and the absolute error of the number of patients.
Results: Compared to the existing differential privacy aggregation method, we find lower MAE across all simulations. We find lower absolute error in all simulations after 40 days of data release, with notably lower absolute error in the 8-party simulation. Our method demonstrates improved accuracy with rigorous privacy. For example, in the 4-party simulation the MAE is reduced from 0.51 to 0.28.
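The site-level perturbation and secure aggregation described in the Methods can be illustrated with a minimal sketch. It is not the implementation used in this project: the Gaussian noise split across sites, the additive secret-sharing scheme, the modulus, and all function names (`local_perturb`, `make_shares`, `secure_sum`) are assumptions chosen for illustration, and the sketch omits the comparison with the previous release.

```python
import random
import secrets

# Hypothetical modulus for the additive shares (an assumption, not from the abstract).
MODULUS = 2**61 - 1


def local_perturb(count, total_sigma, n_sites, rng):
    """Perturb one site's count with a share of the total noise.

    Assumption: Gaussian noise whose variance is divided evenly among sites,
    so the sum of the site noises has standard deviation total_sigma.
    """
    site_sigma = total_sigma / (n_sites ** 0.5)
    return round(count + rng.gauss(0.0, site_sigma))


def make_shares(value, n_shares):
    """Split an integer into additive shares that sum to value modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_shares - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(values):
    """Aggregate site values so that only their total is reconstructed."""
    n = len(values)
    all_shares = [make_shares(v, n) for v in values]
    # Party j adds up the j-th share received from every site.
    partial_sums = [sum(s[j] for s in all_shares) % MODULUS for j in range(n)]
    total = sum(partial_sums) % MODULUS
    # Map the result back to a signed integer (noisy counts may be negative).
    return total - MODULUS if total > MODULUS // 2 else total


if __name__ == "__main__":
    rng = random.Random(0)
    site_counts = [120, 95, 143, 110]  # made-up counts for a 4-party release
    noisy = [
        local_perturb(c, total_sigma=4.0, n_sites=len(site_counts), rng=rng)
        for c in site_counts
    ]
    print("aggregated noisy count:", secure_sum(noisy))
```

In this sketch no party sees another site's noisy count, only random-looking shares, and the reconstructed total carries the combined noise of all sites rather than a full dose of noise from each one.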
Discussion: Overall, this technique can allow more institutions to share real-time data, which may improve patient care. In the future, we will explore more domain-specific utility measures, such as the log-rank test, to assess the usability of the proposed methods.
References:
Cynthia Dwork and Aaron Roth. 2014. The Algorithmic Foundations of Differential Privacy. Found. Trends Theor. Comput. Sci. 9, 3–4 (August 2014), 211–407. https://doi.org/10.1561/0400000042
Froelicher, D., Troncoso-Pastoriza, J.R., Raisaro, J.L. et al. Truly privacy-preserving federated analytics for precision medicine with multiparty homomorphic encryption. Nat Commun 12, 5910 (2021). https://doi.org/10.1038/s41467-021-25972-y
Luca Bonomi, Xiaoqian Jiang, Lucila Ohno-Machado, Protecting patient privacy in survival analyses, Journal of the American Medical Informatics Association, Volume 27, Issue 3, March 2020, Pages 366–375, https://doi.org/10.1093/jamia/ocz195
Funder Acknowledgement(s): I thank K. Unertl, A. Becker, M. Gomez, R. Jenkins, and the Vanderbilt Medical Center for support during this project. Funding was provided by NSF REU Site Award #2050895.
Faculty Advisor: Luca Bonomi, luca.bonomi@vumc.org
Role: I read current literature on differential privacy and discussed with my mentor specific techniques to include in the new method. I was most involved with the dynamic aspect of the dataset, and I made suggestions on the implementation of the sparse vector method. I also wrote some of the implementation code, especially the portion that evaluates and tests the method. Examples include the multi-party simulation and statistical evaluations such as mean absolute error and KL-divergence.
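The two evaluation measures named above can be sketched as follows. This is an illustrative sketch rather than the project's evaluation code: the function names, the normalization of the inputs, and the `eps` floor used to avoid division by zero are assumptions.

```python
import math


def mean_absolute_error(true_curve, private_curve):
    """Average absolute gap between two survival curves sampled at the same time points."""
    if len(true_curve) != len(private_curve):
        raise ValueError("curves must have the same number of points")
    gaps = [abs(t - p) for t, p in zip(true_curve, private_curve)]
    return sum(gaps) / len(gaps)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as non-negative weights.

    Both inputs are normalized to sum to 1. The eps floor on q is an
    assumption added here so that zero entries in q do not divide by zero.
    """
    p_total, q_total = sum(p), sum(q)
    result = 0.0
    for p_i, q_i in zip(p, q):
        p_i = p_i / p_total
        q_i = max(q_i / q_total, eps)
        if p_i > 0:  # terms with p_i == 0 contribute nothing
            result += p_i * math.log(p_i / q_i)
    return result
```

Both measures are zero when the private release matches the true data exactly and grow as the two diverge.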