Discipline: Computer Sciences and Information Management
Subcategory: Cancer Research
Tracy S. Edwards - Hampton University
Co-Author(s): Luisel Ricks-Santi, Hampton University, Hampton, VA; J. Tyson McDonald, Hampton University, Hampton, VA
The Cancer Genome Atlas (TCGA) has acquired genetic sequencing data from several hundred women with breast cancer. With these data, we were able to assign genotypes to a list of prioritized genes that we hypothesize are associated with breast cancer status and with increased breast cancer risk in women of African descent. In our project, we used computational analysis methods and bioinformatics techniques to determine which genetic variants were associated with breast cancer status as a function of genetic ancestry (African versus Caucasian). The aim of this study was to use computational methods to compare the number of potentially damaging variants in cases (women with breast cancer in the TCGA database) and in controls (participants from the HapMap project). Our hypothesis is that there are ancestry-specific single nucleotide polymorphism (SNP) profiles that differ between cancer cases and controls and that may increase predisposition to the development of breast cancer. Using TCGA (cases) and HapMap (controls), we downloaded the genetic sequencing data and genotype data from a preliminary set of 246 patients and participants with African or Caucasian ancestry. We have also curated a list of potentially damaging genetic variants (>2,000). To identify genetic variants in a subset of this list, we selected 83 DNA repair genes and processed the TCGA data files using a GATK pipeline. To filter for variants in targeted genomic regions, we used the GATK PrintReads tool. Next, the filtered data files were processed with the GATK HaplotypeCaller tool to find genetic variants, resulting in a final pipeline output file. We have current results on 7 genetic variants in TCGA patients vs. controls. The association between disease status (cases vs. controls) and each variant was analyzed statistically. Odds ratios with 95% confidence intervals were calculated as a function of race. The combined effects of multiple variants were also examined.
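The two GATK steps described above (restricting reads to targeted regions, then calling variants) can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes GATK4-style command-line syntax (`gatk PrintReads`, `gatk HaplotypeCaller`), which the abstract does not specify, and the function name, file names, and output naming scheme are hypothetical. The function only builds the command lines; it does not run them.

```python
from pathlib import Path


def build_gatk_commands(bam, reference, intervals, out_dir):
    """Build command lines for a two-step GATK sketch (assumed GATK4 syntax).

    bam       -- aligned reads for one sample (hypothetical path)
    reference -- reference genome FASTA (hypothetical path)
    intervals -- interval file covering the targeted genes (hypothetical path)
    out_dir   -- directory for the output files (hypothetical path)
    """
    out_dir = Path(out_dir)
    sample = Path(bam).stem
    # Hypothetical naming scheme for the intermediate and final files.
    filtered_bam = out_dir / f"{sample}.targets.bam"
    vcf = out_dir / f"{sample}.targets.vcf.gz"

    # Step 1: keep only reads that overlap the targeted genomic regions.
    print_reads = [
        "gatk", "PrintReads",
        "-I", str(bam),
        "-L", str(intervals),
        "-O", str(filtered_bam),
    ]
    # Step 2: call variants on the filtered reads.
    haplotype_caller = [
        "gatk", "HaplotypeCaller",
        "-R", str(reference),
        "-I", str(filtered_bam),
        "-L", str(intervals),
        "-O", str(vcf),
    ]
    return [print_reads, haplotype_caller]


if __name__ == "__main__":
    for cmd in build_gatk_commands(
        "sample1.bam", "reference.fasta", "dna_repair_genes.bed", "out"
    ):
        print(" ".join(cmd))
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` on a machine where GATK is installed. Building the commands separately from running them makes it easier to review the exact arguments for each sample first.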
There were 700 controls and 246 cases. Four of seven SNPs were associated with case status. For SNPs in the MUS81, GEN1 and RNF168 genes, the variant was associated with being a case (p<0.035), and for a SNP in the MLH1 gene, the variant was associated with risk (p=0.02). When stratified by ancestry, women of African descent had a higher risk for breast cancer when they carried the MUS81 variant (p<0.0001). Haplotype analysis determined that, in combination, these variants were also associated with a significant risk for breast cancer (p<0.0095), and in women of African descent this haplotype carried a much higher risk (p<0.0001). In summary, 4 of 7 genetic variants were associated with breast cancer, and these results validate our computational pipeline. Future work will include a complete analysis of 83 genetic variants using additional cases from the TCGA database. Our data can potentially be used to identify more genetic variants associated with breast cancer risk.
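As an illustration of the odds ratio with a 95% confidence interval mentioned in the methods, the sketch below applies the standard log (Woolf) method to a 2x2 table of carrier status by case status. The abstract does not state which method or software was used, so this is an assumption, and the counts in the usage example are made up for illustration and are not study data.

```python
import math


def odds_ratio_ci(case_carriers, case_noncarriers,
                  control_carriers, control_noncarriers, z=1.96):
    """Return (odds ratio, lower bound, upper bound) for a 2x2 table.

    Uses the log (Woolf) method; z=1.96 gives an approximate 95% interval.
    """
    a, b = case_carriers, case_noncarriers
    c, d = control_carriers, control_noncarriers
    if min(a, b, c, d) <= 0:
        # The log method is undefined when any cell is empty.
        raise ValueError("all four cell counts must be positive")

    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Made-up counts for illustration only (not from the study).
    or_, lo, hi = odds_ratio_ci(20, 80, 10, 90)
    print(f"OR = {or_:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

To stratify by ancestry, the same function could be applied to a separate 2x2 table for each ancestry group.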
Funder Acknowledgement(s): Hampton University
Faculty Advisor: J. Tyson McDonald, JOHN.MCDONALD@hamptonu.edu
Role: As a student researcher, I was responsible for identifying and downloading patient files, running the GATK pipeline needed to extract data, and maintaining data file organization.