Emerging Researchers National (ERN) Conference


A Bioinformatics Approach to Classify Viruses Using a Decision Tree Model

Undergraduate #43
Discipline: Computer Sciences and Information Management
Subcategory: Computer Science & Information Systems

Samuel Liburd Jr. - University of the Virgin Islands


Viruses are among the most efficient agents of death and disease, killing millions of people worldwide and mutating rapidly. To identify and understand viruses, a classification system was created based on features such as virus size, shape, genome structure, and mode of replication. To better understand this system, I hypothesized that viruses can be classified into their biological families using genomic features and machine learning techniques. To test this, I analysed 511 (+) ssRNA virus genomes for the genetic characteristics that distinguish them. The six virus families to be classified were Flaviviridae, Potyviridae, Betaflexiviridae, Virgaviridae, Picornaviridae, and Tombusviridae. Based on my literature review, I wrote a Python script that extracted 12 different features for the classification task: genome length; adenine, guanine, cytosine, and thymine counts; the number of start codons; G-C and A-T percentages; host organisms; the number of proteins encoded; and the number, if any, of segments in the genome. The relevance of these attributes was then ranked using the Correlation-based Feature Subset Eval and Best First algorithms available in the data mining package Weka. The most relevant subset of attributes (genome length, A, C, and G counts, G-C percentage, host organism, and number of proteins encoded) was selected and used with the C4.5 classification algorithm. The model was trained on 66% of the genomes to build a decision tree and tested on the remaining genomes; the results showed that 99.4% of these held-out viruses were classified correctly. This accuracy supports my initial hypothesis that viruses can be classified using machine learning techniques and genome-based features.
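The sequence-derived features listed above can be illustrated with a short sketch. This is not the author's script, which is not shown in the abstract. The function name `extract_features` and the dictionary keys are hypothetical. It assumes that start codons are counted as every occurrence of `ATG` in the given strand and that any RNA `U` is treated as `T`. Host organism, protein count, and segment count come from annotation rather than from the sequence, so they are left out.

```python
from collections import Counter


def extract_features(sequence: str) -> dict:
    """Compute simple composition features for one nucleotide sequence.

    Hypothetical helper for illustration only; it covers just the
    features that can be derived from the raw sequence string.
    """
    # Normalise case and map RNA 'U' to 'T' so counts match cDNA-style records.
    seq = sequence.upper().replace("U", "T")
    counts = Counter(seq)
    length = len(seq)
    a, c, g, t = (counts[base] for base in "ACGT")
    return {
        "genome_length": length,
        "a_count": a,
        "c_count": c,
        "g_count": g,
        "t_count": t,
        # Assumption: every 'ATG' in this strand counts, in any reading frame.
        "start_codons": seq.count("ATG"),
        # Percentages are relative to total length; guard against empty input.
        "gc_percent": 100.0 * (g + c) / length if length else 0.0,
        "at_percent": 100.0 * (a + t) / length if length else 0.0,
    }


if __name__ == "__main__":
    # Toy sequence, not a real viral genome.
    print(extract_features("ATGCGATGAA"))
```

In practice the sequence string would be read from a FASTA record, and one such feature dictionary per genome would form a row of the table passed to the classifier.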
In the future, I plan to expand this approach using machine learning techniques such as support vector machines and artificial neural networks, which could serve as powerful tools to monitor and keep track of changes in viral genomes.
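The arithmetic behind the evaluation reported in the abstract (511 genomes, 66% used for training, 99.4% accuracy on the rest) can be sketched as follows. The helper names are hypothetical. The sketch assumes the training-set size is rounded to the nearest whole genome and that the reported accuracy is rounded to one decimal place; the abstract confirms neither.

```python
def percentage_split(n_instances: int, train_percent: float) -> tuple:
    """Return (n_train, n_test) for a simple percentage split.

    Assumption: the training size is rounded to the nearest integer.
    """
    n_train = round(n_instances * train_percent / 100)
    return n_train, n_instances - n_train


def accuracy_percent(n_correct: int, n_total: int) -> float:
    """Share of correctly classified test instances, in percent."""
    return 100.0 * n_correct / n_total


def consistent_correct_counts(n_test: int, reported: float, ndigits: int = 1) -> list:
    """List the correct-prediction counts that round to the reported accuracy."""
    return [
        k for k in range(n_test + 1)
        if round(accuracy_percent(k, n_test), ndigits) == reported
    ]


if __name__ == "__main__":
    n_train, n_test = percentage_split(511, 66)
    print(n_train, n_test)
    print(consistent_correct_counts(n_test, 99.4))
```

Under these assumptions the split is 337 training and 174 test genomes, and the only count that rounds to 99.4% is 173 correct out of 174, which would mean a single misclassified virus. This is an inference from the reported figures, not a result stated in the abstract.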
References: Cock, Peter. ‘Using FASTA Nucleotide Files In Biopython’. www2.warwick.ac.uk. N.p., 2016. Web. 3 Oct. 2016.
Gelderblom HR. Structure and Classification of Viruses. In: Baron S, editor. Medical Microbiology. 4th edition. Galveston (TX): University of Texas Medical Branch at Galveston; 1996. Chapter 41.

Funder Acknowledgement(s): UVI NSF/HBCU-UP SURE grant #1137472

Faculty Advisor: Marc Boumedine, mboumedine@gmail.com

Role: I conducted all of the research for this project.

This material is based upon work supported by the National Science Foundation (NSF) under Grant No. DUE-1930047. Any opinions, findings, interpretations, conclusions or recommendations expressed in this material are those of its authors and do not represent the views of the AAAS Board of Directors, the Council of AAAS, AAAS’ membership or the National Science Foundation.
