Discipline: Technology and Engineering
Subcategory: Biomedical Engineering
Janice Nguyen - California State University, Los Angeles
American Sign Language (ASL) is used by the deaf and hard of hearing community as its primary mode of communication. However, deaf and hard of hearing individuals are often at risk of receiving inadequate treatment, especially in healthcare, due to communication barriers in a hearing-centric society. To address this barrier, hospitals often opt for video remote interpreting services, which suffer from problems such as poor video call quality and, in more crowded hospitals, a lack of space for the patient to sign and see the interpreter. This is especially problematic in emergencies where immediate care is needed. In an effort to alleviate this problem, research on computer-vision-based, real-time ASL interpreting models is ongoing. ASL has its own unique linguistic structure and rules that dictate the meanings of individual signs and how sentences are constructed. For instance, a sign's meaning is determined by five parameters: hand shape, palm orientation, facial expression, movement, and location. However, most interpreting models are hand-shape based and do not integrate facial cues, which are especially crucial for distinguishing sentence types (e.g., questions versus statements) that share the same sign structure. Thus, we hypothesize that integrating facial cues into computer-vision-based ASL interpreting models has the potential to improve their performance and reliability.

We introduce a new facial-expression-based classification model that can be used to improve ASL interpreting models. The model uses the Dlib library's pre-trained facial landmark functions to detect and extract facial points. These points are extracted from the midpoint frame of videos of subjects signing complete sentences and are used to train a Random Forest regression tree model. The trained model is then tested on frames from a separate set of signing videos, classifying each frame by sentence type (statement versus question), and achieves an accuracy of 77%. Future uses of this model include pre-classifying videos of signers, making it easier to interpret sentences that share similar signs but use different facial features to indicate the type of sentence being signed. This pre-classification step can improve current ASL interpreting models by sorting sentences into types and then applying the interpreting model that performs best for each specific type of sentence.

References:
Tyler G. James, Kyle A. Coady, Jeanne-Marie R. Stacciarini, Michael M. McKee, David G. Phillips, David Maruca, and JeeWon Cheong. "They're Not Willing To Accommodate Deaf Patients": Communication Experiences of Deaf American Sign Language Users in the Emergency Department. Qualitative Health Research, 32(1):48–63, Jan. 2022.
Gérard Biau and Erwan Scornet. A random forest guided tour. TEST, 25(2):197–227, June 2016.
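To make the extraction step concrete, below is a minimal sketch of how the midpoint-frame facial-point extraction described in the abstract might look, assuming Dlib's standard 68-point shape predictor and OpenCV for video access; file names such as shape_predictor_68_face_landmarks.dat are illustrative assumptions, not the study's actual code.

# Illustrative sketch (not the study's code): grab the midpoint frame of a
# signing video and extract Dlib's 68 facial landmarks as a flat feature vector.
import cv2
import dlib
import numpy as np

# Dlib's standard frontal face detector and pre-trained 68-point landmark
# predictor; the .dat model file is distributed separately by Dlib.
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def midpoint_frame(video_path):
    """Return the middle frame of the video as a BGR image."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    cap.set(cv2.CAP_PROP_POS_FRAMES, total // 2)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise IOError(f"could not read midpoint frame of {video_path}")
    return frame

def facial_points(frame):
    """Extract 68 (x, y) landmark coordinates as a 136-element vector."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    if not faces:
        return None  # no face detected in this frame
    shape = predictor(gray, faces[0])
    return np.array([(p.x, p.y) for p in shape.parts()], dtype=float).ravel()

In such a pipeline, facial_points(midpoint_frame(path)) would be run over each training video to build the feature matrix for the Random Forest step.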
Funder Acknowledgement(s): CREST-CATSUS (Center for the Advancement of Sustainable Urban Systems)
Faculty Advisor: Y. Curtis Wang, firstname.lastname@example.org
Role: This research is part of my Master's thesis, so I have been the one to gather the data and develop the software to preprocess the data, run the facial point functions, and train and test the Random Forest regression model.
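For illustration only, the training and evaluation step could be sketched as follows, assuming scikit-learn's RandomForestRegressor with its continuous output thresholded at 0.5 to yield a sentence-type label; the data here are random placeholders standing in for the real landmark vectors and labels, and the hyperparameters are assumptions rather than the study's settings.

# Illustrative sketch (not the study's code): fit a Random Forest regression
# model on landmark vectors labeled 0 (statement) or 1 (question), then
# threshold its output at 0.5 to classify held-out frames.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 136))   # placeholder: 68 landmarks x 2 coordinates per frame
y = rng.integers(0, 2, size=200)  # placeholder: 0 = statement, 1 = question

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# The regressor outputs a value in [0, 1]; round at 0.5 to get a class label.
y_pred = (model.predict(X_test) >= 0.5).astype(int)
print(f"accuracy: {accuracy_score(y_test, y_pred):.2f}")

In the actual study, the test frames come from a separate set of signing videos rather than a random split, which is the evaluation under which the 77% accuracy is reported.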