Today we are announcing the general availability of Speechace API version 5 in all regions. In v5, we have made significant updates to all of the underlying models that power Speechace. While Speechace is a single REST API, multiple models work together under the hood to process and serve each request.
In Automatic Speech Recognition, the acoustic model is the component responsible for transforming an audio signal into linguistic units such as phonemes. A better acoustic model detects the right signal across a wide range of speakers and recording conditions.
The scoring model is responsible for evaluating each utterance at the sentence, word, syllable, and phoneme level. The scoring model assigns a score at each level and identifies the degree of intelligibility of the speaker’s pronunciation.
The fidelity model classifies whether the speaker has faithfully attempted the expected utterance. Its output indicates whether the attempt is valid and should be scored, or whether the speaker should reattempt the utterance.
The fluency model evaluates the speaker’s overall speaking fluency. It produces IELTS and PTE speaking scores based on the speaker’s fluency and pronunciation, and it generates speaking fluency metrics such as articulation rate, pausing, and length of run.
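To make the fluency metrics above more concrete, here is a minimal Python sketch that derives articulation rate, pausing, and mean length of run from word-level timings. It uses common textbook definitions, not Speechace's actual implementation: the `Word` type, the `fluency_metrics` function, and the 0.25-second pause threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical word-level timing record (not a Speechace API type)."""
    text: str
    start: float     # start time in seconds
    end: float       # end time in seconds
    syllables: int   # number of syllables in the word


def fluency_metrics(words, pause_threshold=0.25):
    """Compute simple fluency metrics from time-aligned words.

    A silent gap between consecutive words counts as a pause when it
    lasts at least `pause_threshold` seconds (an assumed value).
    """
    if not words:
        raise ValueError("need at least one word")

    pauses = []                      # durations of detected pauses
    runs = []                        # syllable counts between pauses
    current_run = words[0].syllables

    for prev, cur in zip(words, words[1:]):
        gap = cur.start - prev.end
        if gap >= pause_threshold:
            pauses.append(gap)
            runs.append(current_run)
            current_run = cur.syllables
        else:
            current_run += cur.syllables
    runs.append(current_run)

    total_syllables = sum(w.syllables for w in words)
    total_time = words[-1].end - words[0].start
    speaking_time = total_time - sum(pauses)   # time excluding pauses
    if speaking_time <= 0:
        raise ValueError("speaking time must be positive")

    return {
        # syllables per second of actual speech, pauses excluded
        "articulation_rate": total_syllables / speaking_time,
        "pause_count": len(pauses),
        "total_pause_time": sum(pauses),
        # average number of syllables spoken between pauses
        "mean_length_of_run": total_syllables / len(runs),
    }


example = [
    Word("hello", 0.0, 0.5, 2),
    Word("world", 0.5, 1.0, 1),
    Word("again", 2.0, 2.5, 2),   # preceded by a one-second pause
]
metrics = fluency_metrics(example)
```

In this example, the one-second gap before "again" is the only pause, so the utterance splits into two runs of three and two syllables.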
How we evaluate Speechace:
At Speechace, we continuously evaluate the performance of each of these models using datasets and benchmarks that we maintain internally and regularly evolve as we observe field usage. Our datasets cover a wide range of speakers from different geographies and at every proficiency level. They also vary in the nature, length, and complexity of the audio recordings.
In Speechace v5, we have improved the performance of each of the above models over our target evaluation metrics.
Speechace v5 Improvements:
While the complete evaluation datasets are confidential to Speechace, we want to share some of the relative percentage improvements to give an idea of what we measure and what gets better with v5.
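For readers unfamiliar with the term, a relative improvement expresses the change in a metric as a percentage of its previous value. The sketch below shows one common way to compute it. The function name, the `lower_is_better` flag, and the sample numbers are hypothetical illustrations, not Speechace's evaluation code or results.

```python
def relative_improvement(baseline, new, lower_is_better=True):
    """Return the relative improvement of `new` over `baseline`, in percent.

    For error-style metrics (lower is better) a drop counts as improvement;
    for accuracy-style metrics (higher is better) a rise counts as improvement.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    change = (baseline - new) if lower_is_better else (new - baseline)
    return 100.0 * change / abs(baseline)


# Made-up values for illustration only:
# an error rate falling from 0.20 to 0.15 is a 25% relative improvement,
error_gain = relative_improvement(0.20, 0.15)
# and an agreement score rising from 0.80 to 0.88 is a 10% relative improvement.
agreement_gain = relative_improvement(0.80, 0.88, lower_is_better=False)
```

Note that a relative figure depends on the starting value: the same absolute change is a larger relative improvement when the baseline is small.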