Discipline: Computer Sciences and Information Management
Subcategory: Computer Engineering
Session: 4
Camille Harris - University of California Berkeley
A topic of growing interest in computer science is machine learning algorithms that produce harmful results for marginalized communities such as women and racial minorities. The training and testing sets used to build these algorithms often underrepresent or omit such groups, causing these disparities to arise. Many public and private entities adopt software hoping to avoid the implicit bias of humans; however, many algorithms perpetuate bias instead. In response, researchers have developed over 30 statistical metrics to measure algorithmic bias, and these metrics form the basis of multiple software tools that promise to determine the fairness of an algorithm. This research evaluates the comparative effectiveness of three existing open-source tools (FairTest, Fairness Measures, and Aequitas) in 1) identifying bias according to the statistical metrics they embed, and 2) evaluating the tools' proposed solutions for mitigating bias. We ran these tools on profiles of individuals from COMPAS (Correctional Offender Management Profiling for Alternative Sanctions), a software tool made by Northpointe, Inc. that uses machine learning to assess the recidivism risk of prisoners. COMPAS was selected because its data are readily available and bias in the algorithm is widely debated: it violates many of the metrics with respect to racial groups, yet because the metrics are mutually incompatible, satisfying all of them simultaneously is impossible. The data therefore provided useful comparisons, since the algorithm is known to violate some of the statistical metrics and to satisfy others, making it easy to judge the comprehensiveness of the tools. After preprocessing the data and running it through each tool, we found that none of the tools was comprehensive, with each covering only 3-4 metrics. FairTest produced an error due to an inability to process the different categories in this dataset and therefore failed to produce results. The remaining tools presented results inconsistently: Aequitas gave binary results indicating whether or not a metric was violated, while Fairness Measures gave the numerical value of each metric, leaving conclusions about fairness to the user. Lastly, although Aequitas offered guidance on weighing the importance of each metric given the context of use, none of the tools provided bias mitigation strategies to address identified violations. Only roughly half of the 32 metrics are used across the three tools, so even using all three will not provide a comprehensive evaluation of the fairness of an algorithm; still, each test provides some information about the existence of bias. Future work in this area may include the development of a more comprehensive analysis tool that evaluates algorithms with respect to all metrics and informs users of how context relates to the metrics. Tools for placing these statistical metrics in context are essential, as it is impossible to satisfy all metrics and fairness is not a property of an algorithm in isolation.
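To make concrete the kind of group-level checks these tools automate, the following sketch computes two commonly cited metrics, statistical parity and false positive rate parity, on the public COMPAS data using pandas. The file name, the column names (race, decile_score, two_year_recid), and the cutoff for treating a score as a "high risk" prediction are assumptions drawn from the publicly released COMPAS export, not a description of the exact procedure used in this study.

    import pandas as pd

    # Assumed columns from the public COMPAS release:
    #   race            - protected attribute
    #   decile_score    - COMPAS risk score (1-10)
    #   two_year_recid  - observed recidivism within two years (0/1)
    df = pd.read_csv("compas-scores-two-years.csv")

    # Treat decile scores of 5 and above as a "high risk" prediction
    # (an illustrative cutoff, not one mandated by any of the tools).
    df["predicted_high_risk"] = (df["decile_score"] >= 5).astype(int)

    # Statistical parity: rate of high-risk predictions per racial group.
    positive_rate = df.groupby("race")["predicted_high_risk"].mean()

    # False positive rate parity: among people who did not recidivate,
    # how often were they nonetheless labeled high risk, per group?
    non_recid = df[df["two_year_recid"] == 0]
    false_positive_rate = non_recid.groupby("race")["predicted_high_risk"].mean()

    print("Positive prediction rate by race:")
    print(positive_rate)
    print("False positive rate by race:")
    print(false_positive_rate)

Comparing such per-group rates is the kind of calculation the tools package behind their metric reports; when base rates differ across groups, equalizing one of these quantities generally forces another out of balance, which is the mutual incompatibility noted above.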
Funder Acknowledgement(s): This research was supported by the University of California New Experiences for Research and Diversity in Science (Cal NERDS) program, which provided support through the NSF-funded LSAMP program and the University of California Leadership Excellence through Advanced Degrees (UC LEADS) program.
Faculty Advisor: Deirdre Mulligan, dkm@ischool.berkeley.edu
Role: For this project, my advisers gave me guidance on the importance of the question my research seeks to answer and how I might approach it, and from there we worked together to formulate the research project. Once the project was defined, I carried out the work in its entirety: obtaining the COMPAS data and pre-processing it appropriately for each tool; downloading and running the tools on the data; debugging the software or adjusting the dataset when needed to obtain results; and comparing the results of each of these fairness tests.
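As one concrete piece of that pre-processing, the sketch below shows how the raw COMPAS export might be reshaped into the flat table layout that Aequitas's documentation describes, with a binary 'score' column, a 'label_value' column, and categorical attribute columns. The input file name, the raw column names, the score cutoff, and the exact expected layout (which may vary across tool versions) are assumptions for illustration rather than the precise pre-processing performed in the project.

    import pandas as pd

    # Assumed raw columns from the public COMPAS export.
    raw = pd.read_csv("compas-scores-two-years.csv")

    # Aequitas-style input: one row per individual, a binary predicted
    # 'score', the observed 'label_value', and attribute columns.
    prepared = pd.DataFrame({
        "score": (raw["decile_score"] >= 5).astype(int),  # illustrative cutoff
        "label_value": raw["two_year_recid"],
        "race": raw["race"],
        "sex": raw["sex"],
    })

    prepared.to_csv("compas_for_aequitas.csv", index=False)

Each tool expects its own input layout, so a similar reshaping step was needed for each of the three tools.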