Emerging Researchers National (ERN) Conference


A Deep Learning Language Model for Software Bug Detection

Graduate #37
Discipline: Computer Sciences and Information Management
Subcategory: Computer Science & Information Systems
Session: 4
Room: Virginia B

John Heaps - University of Texas at San Antonio
Co-Author(s): Dr. Rocky Slavin, University of Texas at San Antonio, San Antonio, TX; Dr. Xiaoyin Wang, University of Texas at San Antonio, San Antonio, TX; Dr. Jianwei Niu, University of Texas at San Antonio, San Antonio, TX.



Software bugs can lead to development delays, high costs, and security risks for all stakeholders. In 2018, software bugs cost the world economy over $1.7 trillion and affected over 3.7 billion people. Currently, most bug detection tools use static analysis techniques to detect software bugs. However, static analysis techniques have several limitations: code patterns or specifications must be manually defined, the analysis is conservative, and it does not always scale. Deep learning models and techniques have been shown to overcome similar limitations in the past, and so may be successful in mitigating these limitations as well. However, there are many obstacles to applying deep learning to code: the complex syntactic structure of code, the constant definition of new methods and variables, the lack of a well-curated dataset for learning, and the data sparsity problem.

Deep learning on code works by creating a vector representation for every code element in a vocabulary to represent a language model. The current language model is a statistical model based on the probability of occurrence of code elements. However, such a language model does not represent the bug detection problem well, where it is more important to model the meaning and logic behind code elements. To address this, we define a behavioral language model for code that learns each code element using only the code elements that affect its behavior.

The two main evaluations performed to determine vector representation quality are perplexity (or entropy) and an exploratory analysis of the vector space, where better clustering of similar code elements indicates better vector representations. The vector representations produced by our model achieved, to our knowledge, quality similar to the state of the art, with a perplexity of 6.45, while our model is significantly smaller and far simpler than those in the current literature. Exploratory analysis showed many good clusters of similar code elements, but some code elements did not cluster properly. Further, the model is unable to mitigate all of the above limitations; the worst of these are the data sparsity problem and the handling of newly encountered code elements, which significantly impede its application to code analysis. In future work, we plan to implement a different semantic language model based on code element definitions, which has the potential to mitigate all of the above limitations and allow it to be feasibly applied to bug detection analysis.

References:
Abram Hindle, Earl T. Barr, Zhendong Su, Mark Gabel, and Premkumar Devanbu. On the naturalness of software. In 2012 34th International Conference on Software Engineering, pages 837–847. IEEE, 2012.
John Heaps, Xiaoyin Wang, Travis Breaux, and Jianwei Niu. Toward detection of access control models from source code via word embedding. In Proceedings of the 24th ACM Symposium on Access Control Models and Technologies, pages 103–112. ACM, 2019.
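As background for the vector-representation idea described above, the following is a minimal sketch of how code elements can be mapped to vectors with a generic word-embedding model, i.e., a statistical co-occurrence baseline, not the behavioral language model the abstract describes. The toy corpus, token names, and gensim 4.x parameter values are illustrative assumptions.

    # Minimal sketch: vector representations for code elements via word embeddings.
    # This is a generic co-occurrence baseline, NOT the behavioral language model
    # described in the abstract; the toy corpus and token names are assumptions.
    from gensim.models import Word2Vec

    # Each "sentence" is a tokenized method body; each token is a code element.
    corpus = [
        ["if", "buf", "==", "null", "return"],
        ["for", "i", "<", "buf", ".", "length", "buf", "[", "i", "]", "=", "0"],
        ["list", ".", "add", "(", "item", ")", "return", "list", ".", "size"],
    ]

    model = Word2Vec(sentences=corpus, vector_size=50, window=5,
                     min_count=1, sg=1, epochs=50)

    vec = model.wv["buf"]                     # learned vector for the code element "buf"
    neighbors = model.wv.most_similar("buf")  # exploratory analysis: nearest code elements
    print(vec.shape, neighbors[:3])

The most_similar call illustrates the exploratory analysis mentioned in the abstract: if elements with similar roles (e.g., buffer variables) appear as each other's nearest neighbors, the vectors are clustering as intended.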
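The perplexity of 6.45 reported above is, in the usual formulation, the exponential of the average negative log-probability that the language model assigns to held-out code elements. A self-contained illustration of that computation follows; the probabilities are made-up placeholders, not the model's actual outputs.

    import math

    def perplexity(probs):
        # Perplexity = exp(-(1/N) * sum(log p_i)), where p_i is the probability
        # the language model assigns to the i-th held-out code element.
        return math.exp(-sum(math.log(p) for p in probs) / len(probs))

    # Placeholder probabilities for five held-out code elements (illustrative only).
    print(perplexity([0.20, 0.10, 0.25, 0.15, 0.12]))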

Funder Acknowledgement(s): This research was funded, in part, by the CREST Center for Security and Privacy Enhanced Cloud Computing (C-SPECC) through the National Science Foundation (NSF) (Grant #1736209).

Faculty Advisor: Dr. Jianwei Niu, jianwei.niu@utsa.edu

Role: I was heavily involved in every aspect of the research, including the definition of the new language model, the implementation of the deep learning model, the collection of training data, and the evaluation of results.
