Last week, the surprise sale of Soapbox Labs sent a flurry of eLearning providers looking for alternatives to the Soapbox Labs API. In this post, we share some details on how the Speechace API can be used as a Voice AI platform for kids and as a replacement for Soapbox Labs.
TL;DR
Yes, the Speechace API can be, and has been, widely used in K-12 activities for kids (as in this social robot built by the MIT Media Lab). While the Speechace models are general and not trained exclusively on kids' speech, they are large and broad in nature and perform well with kids from Kindergarten age and up, in both early literacy and second language learning contexts.
In fact, the Speechace API is in use by several large K-12 Education Publishers and eLearning providers in countries including Brazil, India, Germany, Vietnam, the United States, and many others.
Speechace can be used to support a variety of kids Voice AI use cases (more on these below):
- Phonics Reading
- Sight words
- Voice-based MCQ
- Oral Reading Fluency (ORF)
- English Language Learning (ELL)
Does Speechace offer specific or custom models for kids?
We serve our K-12 use cases from the same general model. Speechace has always been open to the idea of custom or use-case-specific models, but we start by testing how well such requirements can be addressed by the general model. So far, the general model has performed well for kids use cases. Just as important, each successive version of the general model has demonstrated improved accuracy for kids use cases.
The advantage of a general model is that it generalizes better across a variety of use cases and demographics and is less prone to over-fitting. It also allows Speechace to move faster, continuously investing in and releasing improvements that benefit all use cases. Our general model is on its 9th major generation. Our most recent update involved a 10x increase in acoustic model size and a major re-architecture to take advantage of new GPU capabilities. This yields accuracy and performance gains that become instantly available to all customers and use cases.
How well does Speechace work on kids?
Let’s talk metrics. First, let’s qualify the dataset we will report metrics on. It’s easy to show great-looking results on a small, trivial dataset that is thin on pronunciation errors and recorded under pristine conditions by mostly native speakers. The real world, of course, is anything but that.
Our dataset:
- Contains over 5,000 items
- Was recorded from over 350 different child speakers
- Is balanced between Label 1 (Correct) items and Label 0 (Incorrect) items
- Comes from a real-world production environment with noise, background talk, insertions, repetitions, interruptions, and everything else you might expect when a young child does their homework with family, siblings, and life happening around them
- Contains both long and short utterances, not just single words
- Is a holdout dataset never seen or used in training
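To make the role of the balanced labels concrete, here is a minimal sketch of how a binary-labeled holdout set like this could be scored. The function names, the toy data, and the choice of accuracy, precision, and recall are illustrative assumptions. This is not Speechace's evaluation code, and the numbers are made up.

```python
from collections import Counter


def label_balance(labels):
    """Return the share of Label 0 and Label 1 items in the dataset."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return {label: counts[label] / len(labels) for label in (0, 1)}


def binary_metrics(labels, predictions):
    """Compute accuracy, precision, and recall, treating Label 1 (Correct) as positive."""
    if not labels or len(labels) != len(predictions):
        raise ValueError("labels and predictions must be non-empty and the same length")
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # correct, scored correct
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)  # incorrect, scored incorrect
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # incorrect, scored correct
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # correct, scored incorrect
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical toy data: human labels vs. a model's accept/reject decisions.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 0, 0, 0, 1]

balance = label_balance(labels)
metrics = binary_metrics(labels, predictions)
print(balance)
print(metrics)
```

On a balanced set, a model that marks every item as Correct scores only 50% accuracy, so a high accuracy figure cannot come from simply accepting everything.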