ERN: Emerging Researchers National Conference in STEM


Errors in Automatic Speech Recognition

Undergraduate #233
Discipline: Computer Sciences and Information Management
Subcategory: Computer Science & Information Systems

Benjamin Kassman - Lewis and Clark College
Co-Author(s): Catherine Seita, Cornell University, NY; Jason Antal, Gallaudet University, DC; Jaron Rekhop, Gallaudet University, DC



Automatic Speech Recognition (ASR) software transcribes spoken content into text, a service that may benefit people who identify as d/Deaf or Hard of Hearing (DHH). Text-based communication such as ASR transcription is readily accessible to DHH individuals in the workplace. Despite the advances that ASR technology has made in recent years, the transcriptions it produces are not always accurate. Our aim in this study was to work within the limitations of current ASR techniques rather than to develop new ones.
We studied the types of errors that occur in ASR transcriptions and tested our hypothesis that transcription errors can be accurately predicted by examining linguistic factors and speaker tendencies. We also hoped to identify, through our analyses, which specific factors contributed most to transcription error. We focused on errors that occurred during sessions of an experiment in which hearing participants used an ASR app to interact with DHH individuals. The ASR software assigned a confidence score to each word it produced, and an app that we developed displayed the ASR output and underlined each word with a score below 75% to indicate that the word was likely transcribed incorrectly.
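To make the display logic concrete, here is a minimal sketch of confidence-based flagging in Python. The word/confidence structure and the underscore-as-underline rendering are assumptions made for illustration; the abstract does not describe the app's actual API or data format.

```python
# Hypothetical word-confidence records; the real app's ASR output format is
# not described in the abstract, so the structure here is illustrative only.
CONFIDENCE_THRESHOLD = 0.75

def flag_low_confidence(words):
    """Render an ASR transcript, marking words below the confidence threshold.

    `words` is a list of (token, confidence) pairs; low-confidence tokens are
    wrapped in underscores to stand in for the app's on-screen underlining.
    """
    rendered = []
    for token, confidence in words:
        if confidence < CONFIDENCE_THRESHOLD:
            rendered.append(f"_{token}_")  # flag: likely mistranscription
        else:
            rendered.append(token)
    return " ".join(rendered)

if __name__ == "__main__":
    sample = [("please", 0.93), ("meet", 0.88), ("at", 0.95), ("noon", 0.62)]
    print(flag_low_confidence(sample))  # -> please meet at _noon_
```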
Across 9 experimental sessions with 12 hearing participants in total, we collected 3,100 ASR-produced words, which we compared with reference words obtained by transcribing the audio. This allowed us to produce a rich dataset with metadata on each word. We found that word pairs with phonetic and lexical-stress similarity were more likely to be produced incorrectly by the ASR system. In addition, there was a correlation between the part of speech of an ASR-produced word and the word error rate for that part of speech, and a correlation between a speaker’s native language and transcription accuracy.
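The abstract does not specify how ASR output was matched to the reference transcription; a standard word-level edit-distance alignment, as used in word-error-rate computation, is one plausible approach and is sketched below purely for illustration.

```python
# Hedged sketch: comparing an ASR hypothesis to a reference transcript with a
# word-level edit distance, the basis of the standard word error rate (WER).
def edit_distance(reference, hypothesis):
    """Minimum number of substitutions, insertions, and deletions between word lists."""
    n, m = len(reference), len(hypothesis)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # match or substitution
    return dp[n][m]

def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance divided by the number of reference words."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)

if __name__ == "__main__":
    ref = "please meet me at noon tomorrow".split()
    hyp = "please meet me at new tomorrow".split()
    print(f"WER: {word_error_rate(ref, hyp):.2f}")  # one substitution out of six words
```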
Using these data, we confirmed that our app’s confidence rating was an accurate indicator of whether a word would be mistranscribed. High confidence was defined as a score above 75%. Of 3,108 spoken words, 74.5% were transcribed correctly with high confidence, 15.3% were correct but low confidence, 3.1% were incorrect but high confidence, and 6.7% were incorrect and low confidence. For future work, it would be useful to also collect prosodic information, such as rate of speech and inflection, for each word in the dataset to better understand how these factors indicate transcription errors.
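The reported percentages amount to a 2x2 breakdown of correctness against confidence. A small sketch of that tally follows, using a hypothetical list of per-word labels; the study's actual metadata format is not given in the abstract.

```python
from collections import Counter

def confidence_breakdown(records, threshold=0.75):
    """Tally words into the four correctness-by-confidence cells.

    `records` is a hypothetical list of (is_correct, confidence) pairs.
    Returns each cell's share of the total, e.g. ('correct', 'high'): 0.745.
    """
    counts = Counter()
    for is_correct, confidence in records:
        correctness = "correct" if is_correct else "incorrect"
        band = "high" if confidence >= threshold else "low"
        counts[(correctness, band)] += 1
    total = sum(counts.values())
    return {cell: count / total for cell, count in counts.items()}

if __name__ == "__main__":
    sample = [(True, 0.92), (True, 0.60), (False, 0.81), (False, 0.40)]
    print(confidence_breakdown(sample))
```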


Funder Acknowledgement(s): This work has been generously supported by an NSF REU Site Grant (#1460894) awarded to Dr. Raja Kushalnagar, PI.

Faculty Advisor: Matt Huenerfauth, matt.huenerfauth@rit.edu

Role: I helped design the test sessions for the app and created the scenarios that the participants worked through. I collected the data, setting up video recording sessions to test the ASR app and managing the app’s API while the experiment ran. I processed the data from the video sessions, transcribing the audio into text with a Python program and analyzing and captioning the video for our Deaf and Hard of Hearing researchers. Finally, I analyzed the data and tested our hypotheses about the relationship between certain metadata and transcription error rate.
