Speaking with correct grammar is essential for conveying meaning and demonstrating a learner’s mastery of the English language. Even if you use advanced vocabulary, incorrect grammar can lower others’ perception of your communication skills and even lead to serious miscommunication. Consider a common mistake in which a speaker tries to say that they don’t care about something by saying “I could care less”. The correct phrase, however, is “I couldn’t care less”. Saying “I could care less” implies that the speaker cares somewhat and could potentially care even less, thereby negating the intent of the phrase.
While plenty of work has been done to automate evaluation of grammar quality in written text, to the best of our knowledge no one has yet attempted to automate evaluation of grammar sophistication and quality in speech.
Today we are releasing a breakthrough new capability in automatic spoken language assessment by adding a grammar quality score to the Speechace Spontaneous Speech Recognition API. The API accepts an audio sample as input, transcribes it, and produces a grammar score along with pronunciation, fluency and vocabulary scores for the speech sample. Our grammar score aligns with the IELTS spoken grammar score rubric and is very effective in evaluating spoken proficiency for the purpose of recruitment, test preparation and student placement.
To get a preview of our grammar API capabilities, please listen to the audio samples below, which our API graded at 5.9 and 7.4:
Speech sample with spoken grammar score of 5.9:
Speech sample with spoken grammar score of 7.4:
Challenges in scoring spoken grammar
Evaluating grammar quality in speech is incredibly challenging for the following reasons:
(a) The spoken form of a language follows different rules than its written form. Consider the following two conversations between John and Tom:
Tom: My little brother is a really good student.
John: Why do you say that?
Tom: Well, he is really smart, so he always gets good grades.
John: Maybe he gets good grades because he studies hard.
Tom: Didn’t know you used boiling water.
John: Don’t have to but it’s um …they reckon it’s um, quicker
As you probably guessed, the first excerpt is from an English textbook, while the second excerpt is from a real-life conversation. The rules observed in the two excerpts are so different that they almost seem to come from two different languages. Spoken language will appropriately contain fragments and slips which would normally be considered errors in written language.
(b) Grammatical sophistication is not demonstrated just by speaking in error-free language. Otherwise, a beginner who sticks to very basic grammar would be scored highly. In fact, even native speakers often make grammatical slips. Rather, it is the use of particular phrases and structures that signals the sophistication of the speaker. Therefore, a pure error checker fails to reliably distinguish speakers of different grammatical proficiency levels because it focuses only on mistakes.
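To make point (b) concrete, here is a toy Python sketch. It is not how Speechace computes its score: the word list, the function name subordinator_rate and the two sample transcripts are all assumptions made up for this illustration. Both transcripts are free of errors, so an error checker would rate them the same, yet a crude structural feature (the share of words that introduce a subordinate clause) tells them apart.

```python
# Toy illustration only: a hypothetical structural feature, not Speechace's method.
# The word list below is an assumption chosen for this example.
SUBORDINATORS = {"although", "because", "if", "unless", "whereas", "while"}


def subordinator_rate(transcript: str) -> float:
    """Return the fraction of words that are subordinating conjunctions."""
    words = [w.strip(".,!?;:").lower() for w in transcript.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    return sum(w in SUBORDINATORS for w in words) / len(words)


# Both samples are error free, but only the second uses subordinate clauses.
basic = "I like tea. It is hot. I drink it."
advanced = "Although I like tea, I drink it only if it is hot."

print(subordinator_rate(basic))     # no subordinators found
print(subordinator_rate(advanced))  # two subordinators out of twelve words
```

A real system would need far richer features than a single word list; the sketch only shows why counting mistakes alone cannot separate these two speakers.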
(c) Finally, speech recognition inaccuracies preclude simply transcribing the audio and applying off-the-shelf grammar evaluation tools for written text. While it is alluring to take an audio sample, transcribe it and produce a grammar score by passing the transcription to tools such as languagetool.org or Grammarly, reality is quite different. Since speech recognition engines are not perfect, the produced transcriptions almost always contain word errors, and therefore grammar checkers for written language cannot consume these transcripts to provide a reliable score.
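The effect described in (c) can be illustrated with word error rate, the standard measure of transcription accuracy (word-level edit distance divided by the number of words actually spoken). The sketch below is a minimal Python illustration; the function name and the misrecognized transcript are assumptions invented for the example, not output from any real recognizer.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # reference word missing from the transcript
                curr[j - 1] + 1,     # extra word inserted in the transcript
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# What the speaker said versus a hypothetical, slightly wrong transcript.
spoken = "he always gets good grades"
transcribed = "he always get good grades"

print(word_error_rate(spoken, transcribed))  # 0.2
```

Here a single misrecognized word gives a word error rate of 0.2 and turns a correct sentence into one with a subject-verb agreement error. A written-text grammar checker would then penalize the speaker for a mistake the recognizer made.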
To tackle the above complexities in evaluating grammar in speech, we have designed and trained novel machine learning algorithms specifically to solve this problem. We evaluated hundreds of grammatical features from native and non-native English speaker samples and identified the most reliable features that measure the variety, precision, complexity and accuracy of spoken grammar. We are optimistic that our new API will significantly reduce the burden of obtaining a reliable grammar score in spoken language assessment scenarios.