Since the very inception of Speechace, customers have pressed us to develop a speaking test that automatically evaluates a user’s spoken language proficiency. Over the past year our spoken language assessment API has grown to score a comprehensive set of language skills, and we were finally in a position to develop the end-to-end Speechace Speaking Test.
To build out our test, we set a simple goal:
Automatically evaluate a user’s spoken language skills through a short and engaging 1-on-1 conversation, just as a qualified human evaluator would.
As we explored designs to achieve our goal, we observed a few irksome shortcomings in existing tests. To overcome these shortcomings we adopted the following set of principles:
1. Engage the user
Most in-market tests tend to be long and take a monotonous point-and-click approach. These tests prompt the user to speak through a sequence of utterances, at the end of which a score is provided. We felt that this click-and-record approach is outmoded and that there is room to reimagine the testing experience to make it engaging, realistic, and short. We committed to building a test that users enjoy.
2. Provide comprehensive feedback
While many automatic speaking tests provide feedback on pronunciation and fluency, we opted to provide feedback on the entire range of language skills, i.e. pronunciation, fluency, vocabulary, grammar, and coherence, just as a qualified human evaluator would.
3. Assess spontaneous speech
We noticed that existing tests have only read-aloud, listen-and-repeat, or summarize-style prompts, but these activities do not measure how well a user can speak in real-world interactions. Therefore we chose to include only spontaneous speaking activities in our test. We believe that a user’s spontaneity in speaking truly reflects their ability to think critically and respond under pressure in a second language.
4. Allow complete customization
Most in-market tests assess candidates with a fixed-length standardized test. We felt this approach is too restrictive and opted to develop a fully customizable test. We decided that customers must have the freedom to design their own question prompts, make the test longer or shorter, and apply their own branding.
5. Generate a transparent assessment report
Most automatic speaking tests are a black box and provide no explanation of why a particular score was given. This is frustrating to both test takers and the institutions that sponsor the test. The test taker doesn’t know exactly why they got a particular score. The institution feels shortchanged because it cannot fully trust the test, especially when it sees false negatives. Therefore explainability in scoring became one of our prime objectives.
With an eye towards the above principles, we present our take on assessing spoken language proficiency in the Speechace speaking test. Here are a few salient features of the test:
1. Engaging avatar based test examiners
We took advantage of some of the most cutting-edge developments in avatar technology to develop an avatar-led, virtual-interview-style test. We made sure that the avatars were realistic and had sufficient gender and ethnic diversity. Check out our test collection to review the types of avatars we used, or try out a specific test on college admissions. Note that the avatar videos can easily be replaced by real human videos for additional authenticity.
2. Holistic evaluation of speaking skills
We married some of the best work in speech technology with modern natural language processing techniques to develop algorithms that provide a holistic assessment of key speaking skills such as vocabulary, grammar, pronunciation, coherence, fluency, and the relevance of the response to the test prompt. All our algorithms are designed to be robust to transcription errors. Here is a sample score summary that the user sees after taking the test:
All responses are first passed through our custom-designed relevance algorithm, which rejects responses that are either irrelevant or insufficient. This ensures that no test taker can game the system. Note also that we provide an easy slider interface to convert the Speechace score to other standardized scores such as CEFR, TOEFL, PTE, IELTS, and TOEIC.
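Speechace has not published the internals of its relevance algorithm, but the general idea of filtering irrelevant or insufficient responses can be illustrated with a common technique: compare the response against the prompt using bag-of-words cosine similarity, with a minimum-length check for sufficiency. Everything in this sketch, including the tokenization and the thresholds, is an illustrative assumption rather than Speechace's actual method:

```python
import math
import re
from collections import Counter

def _vector(text: str) -> Counter:
    """Bag-of-words vector over lowercased word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def is_acceptable(prompt: str, response: str,
                  min_words: int = 20, min_similarity: float = 0.1) -> bool:
    """Reject a response that is too short or off-topic.

    The thresholds here are made-up illustrative values, not
    Speechace's actual cutoffs.
    """
    words = re.findall(r"[a-z']+", response.lower())
    if len(words) < min_words:
        return False  # insufficient response
    return cosine_similarity(_vector(prompt), _vector(response)) >= min_similarity
```

A production system would likely use semantic embeddings rather than raw word overlap, so that paraphrased but on-topic answers score highly; the bag-of-words version above is just the simplest way to show the shape of such a filter.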
3. Questions that make you think
Our test questions are designed to be contextual and make the candidate think. We provide 30 seconds to prepare for the question and then ask users to speak freely (spontaneously) for a minute. Based on our studies, we found that up to 3 such questions are sufficient to assess a candidate’s spoken language proficiency. Consider a speaking test we designed for recruiting call center workers. This test has the following 3 questions:
a. Why do you want to work in a call center?
b. How will you provide high quality customer service to callers?
c. How will you handle a call from an angry customer?
The beauty of such a test is that it not only measures a user’s speaking skills but also serves as a screening interview for domain knowledge and key job competencies.
4. A customizable domain specific speaking test
All aspects of our test are fully customizable. We can vary the type of avatars, the instructions spoken by the avatars, and the number of questions they ask. We have developed the infrastructure to quickly create custom tests tailored to the needs of our customers. As of today, we have tests on a wide range of topics including college admissions, call center recruiting, aviation recruiting, nursing, and more. With our test composition infrastructure we can create a test on any topic and of any length in a matter of minutes.
Furthermore, we can apply branding to the avatars to personalize the test for a school or corporation. Consider the test we created for recruiting British Airways cabin crew members. You will observe that not only do the avatars quiz users on topics relevant to being a cabin crew member, but they also include branding specific to British Airways.
5. A transparent interactive report
At the end of the test, the user receives a detailed report with overall descriptive feedback and targeted feedback for each question on the test. Check out a sample report for our candidate Jeevan Chopra:
The report provides feedback on the user’s response to each question, including the response transcript, the rate at which they speak, any hesitations or pauses during speaking, and a detailed pronunciation score. Additionally, the report provides full audio playback of the user’s response along with playback of pronunciations of individual words:
Note that we do not currently provide granular feedback on vocabulary and grammar, but we are working on adding these details and hope to include them in a later release of the test.
Test accuracy and validity
We evaluated the accuracy and validity of the test on tens of thousands of randomly selected IELTS exam candidates. We compared the IELTS-equivalent score generated by the test algorithms against blind human scores from professional examiners. In these comparisons, we observed a test-retest reliability of 0.82 (Pearson correlation coefficient), and our test algorithms were able to rate samples within 0.5 to 1.0 points of human examiners.
While the initial applications are focused on language proficiency assessment for school admissions, immigration, and contact center recruitment, we also see massive interest in general-purpose job screening. We will continue to post updates as additional use cases are highlighted by our customers.
As of now, we are deeply engaged with early adopters of our test and are listening to their feedback to iterate on the user interface. Simultaneously, we are updating our speech algorithms to make them more accurate and achieve even better inter-rater agreement with human raters.