Speechace v5 is available

Today we are announcing the general availability of Speechace API version 5 in all regions. In v5 we have made significant updates to all of the underlying models which power Speechace. While Speechace is a single REST API, there are multiple models at work under the hood to process and serve a single request.

Acoustic Model:

In Automatic Speech Recognition, the acoustic model is the component responsible for mapping an audio signal to linguistic units such as phonemes. A better acoustic model recognizes these units accurately across a wider range of speakers and recording conditions.

Scoring Model:

The scoring model is responsible for evaluating each utterance at the sentence, word, syllable, and phoneme level. The scoring model assigns a score at each level and identifies the degree of intelligibility of the speaker’s pronunciation.

Fidelity Model:

The fidelity model classifies whether the speaker has faithfully attempted the expected utterance, indicating if this is a valid attempt for which a score should be recorded or whether the speaker should reattempt the utterance.

Fluency Model:

The fluency model evaluates the speaker’s overall speaking fluency. It produces IELTS and PTE speaking scores based on the speaker’s fluency and pronunciation, and generates speaking fluency metrics such as articulation rate, pausing, and length of run.
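The fluency metrics named above have widely used definitions in second-language fluency research: articulation rate is syllables per second of speaking time with pauses excluded, and length of run is the number of syllables produced between pauses. The sketch below shows one way to compute them from word-level timestamps. It is not Speechace's implementation; the `TimedWord` input format, the `fluency_metrics` function, and the 0.25-second pause threshold are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TimedWord:
    """A recognized word with start/end times in seconds (hypothetical input format)."""
    text: str
    start: float
    end: float
    syllables: int


def fluency_metrics(words: List[TimedWord], pause_threshold: float = 0.25) -> Dict[str, float]:
    """Compute common fluency measures; gaps >= pause_threshold seconds count as pauses."""
    if not words:
        raise ValueError("need at least one word")
    total_syllables = sum(w.syllables for w in words)
    total_time = words[-1].end - words[0].start

    # Silent gaps between consecutive words; long enough gaps are pauses.
    gaps = [nxt.start - cur.end for cur, nxt in zip(words, words[1:])]
    pauses = [g for g in gaps if g >= pause_threshold]
    # Phonation time excludes pauses (shorter gaps remain part of speaking time).
    phonation_time = total_time - sum(pauses)

    # A "run" is a stretch of speech between pauses; track syllables per run.
    runs = [words[0].syllables]
    for gap, word in zip(gaps, words[1:]):
        if gap >= pause_threshold:
            runs.append(word.syllables)
        else:
            runs[-1] += word.syllables

    return {
        "speech_rate": total_syllables / total_time,            # syllables/s, pauses included
        "articulation_rate": total_syllables / phonation_time,  # syllables/s, pauses excluded
        "pause_count": len(pauses),
        "mean_pause_duration": sum(pauses) / len(pauses) if pauses else 0.0,
        "mean_length_of_run": total_syllables / len(runs),      # syllables per run
    }


# Hypothetical utterance with one 0.6 s pause between "like" and "reading".
words = [
    TimedWord("I", 0.0, 0.2, 1),
    TimedWord("really", 0.2, 0.6, 2),
    TimedWord("like", 0.6, 0.9, 1),
    TimedWord("reading", 1.5, 2.0, 2),
    TimedWord("books", 2.0, 2.5, 1),
]
print(fluency_metrics(words))
```

The pause threshold is a tuning choice: a lower threshold counts more gaps as pauses, which raises the articulation rate and shortens the mean length of run.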

How we evaluate Speechace:

At Speechace we continuously evaluate the performance of each of these models using datasets and benchmarks that we maintain internally and regularly update based on what we observe in field usage. Our datasets cover a wide range of speakers from different geographies and at every proficiency level. They also vary in the nature, length, and complexity of the audio recordings.

In Speechace v5, we have improved the performance of each of the above models over our target evaluation metrics.

Speechace v5 Improvements:

While our complete evaluation datasets are confidential, we want to share some of the relative improvements to give an idea of what we measure and what gets better with v5.

  • 12.3% relative increase in overall accuracy of word pronunciation scoring
  • 22.4% relative increase in the F1 Score for word level mispronunciation classification
  • 8.8% relative reduction (improvement) in IELTS/PTE score RMSE for spontaneous speaking activities
  • 14.3% relative reduction (improvement) in IELTS/PTE score RMSE for read-aloud speaking activities
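To make the metrics in the list above concrete, here is a minimal sketch of how a relative improvement, an F1 score for a binary "mispronounced" word label, and an RMSE against reference scores can be computed. The function names and the example numbers are hypothetical and are not drawn from Speechace's evaluation data. Because lower RMSE is better, a relative improvement in RMSE corresponds to a reduction in error, which is why `relative_improvement` takes a `lower_is_better` flag.

```python
import math
from typing import Sequence


def relative_improvement(old: float, new: float, lower_is_better: bool = False) -> float:
    """Relative change from `old` to `new`, signed so that a positive value means "better"."""
    change = (old - new) if lower_is_better else (new - old)
    return change / old


def f1_score(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """F1 for a binary label where True means "word is mispronounced"."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)      # correctly flagged mispronunciations
    fp = sum(1 for t, p in pairs if not t and p)  # correct words flagged by mistake
    fn = sum(1 for t, p in pairs if t and not p)  # mispronunciations that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def rmse(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Root-mean-square error between predicted and reference scores (e.g. IELTS bands)."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical example values, for illustration only.
print(f1_score([True, False, True, True, False], [True, True, False, True, False]))  # approximately 0.667
print(rmse([6.5, 7.0, 5.5], [6.0, 7.0, 6.5]))
print(relative_improvement(old=1.0, new=0.9, lower_is_better=True))  # about 0.1, a 10% relative reduction
```

F1 here is computed for the "mispronounced" class only; macro or weighted averaging over both classes would give different numbers.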

Finally, let’s look at (or rather, listen to) a couple of examples that highlight some of these improvements. While it is always easy to cherry-pick a few good examples, we chose two that are representative of the general improvements we have seen with v5.

1. Robustness to fillers and insertions

One challenging case is a user utterance that contains additional speech unrelated to the answer, which can disrupt alignment and scoring. This is common in activities designed to elicit a spontaneous response. We constantly work to improve our robustness to such events, because not every recording will be perfectly clean from start to finish.

In this particular case, v5 is robust to the fillers and insertions in the first 5 seconds and correctly aligns and scores the utterance despite them.

2. Improved scoring quality

In the next example, we see a pure scoring improvement on an utterance whose recording was slightly cut off.

Here the difference is more subtle: v5 correctly scores the utterance despite the slight cutoff at the beginning and the only fair overall quality of the recording.

As with every version of Speechace, v5 was a long time in the making and went through a private beta before its general release. We are grateful to the customers who participated in the beta, and we look forward to hearing your feedback and thoughts after using v5.

You can start using the Speechace v5 API today at https://docs.speechace.com

All the best,

The Speechace team
