Implementing Connected Speech Activities with Speechace API
Connected speech (or linking) is how native English speakers blend words into each other, giving spoken English its smooth, natural rhythm. Linking is key to sounding fluent.
For example, “Left Brain” sounds like “Lef Brain” (the ‘t’ is dropped because of the ‘b’ consonant in the next word).
However, non-native learners often struggle with linking because they tend to pronounce each word separately, making their speech robotic and unnatural.
This is where eLearning applications built with the Speechace API come in: they help learners practice using connected speech correctly. The API lets developers build smart, real-time feedback into their apps, and that feedback guides learners to speak more naturally, like a native speaker.
The Speechace API recognizes some of the most common linking types and provides a detailed phoneme-by-phoneme breakdown, which helps applications measure if a learner used connected speech or not. To teach connected speech, your app can feature specific practice activities. For instance, you can ask learners to say phrases like “left brain,” “The apple,” or “See us,” and use the Speechace API to assess if linking happened.
Based on this, you can show visual feedback, like a ✅ if linking is detected, or ❌ if it’s missing. You can also show learners what they said compared to what a native speaker would say.
Speechace is designed to infer phrases where certain types of linking could happen. When you send text and audio to the Speechace API, it returns a phoneme list for each word in the text and scores for how well the speakers pronounced each word, syllable, and phoneme.
For phrases where linking could happen, Speechace will choose the phoneme sequence based on what the language allows and on what the audio shows the speaker attempted.
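To make the returned phoneme lists easy to inspect, a small helper can flatten each word entry into its Arpabet sequence. The sketch below assumes you have already extracted the list of word entries from the API response and that each entry carries the `word` and `phone_score_list` fields shown in the samples later in this post. The function name `phoneme_sequences` and the sample data are illustrative and not part of the Speechace API.

```python
from typing import List, Tuple


def phoneme_sequences(word_entries: List[dict]) -> List[Tuple[str, List[str]]]:
    """Return (word, [PHONE, ...]) pairs, upper-cased to match Arpabet notation."""
    sequences = []
    for entry in word_entries:
        phones = [p["phone"].upper() for p in entry.get("phone_score_list", [])]
        sequences.append((entry["word"], phones))
    return sequences


# Illustrative word entry, trimmed to the fields used here (an assumption,
# not a complete API response).
sample_words = [
    {
        "word": "The",
        "quality_score": 99,
        "phone_score_list": [
            {"phone": "dh", "quality_score": 100},
            {"phone": "ah", "quality_score": 97.5},
        ],
    }
]

for word, phones in phoneme_sequences(sample_words):
    print(word, "/" + " ".join(phones) + "/")  # prints: The /DH AH/
```

Working with plain phoneme sequences keeps the linking checks independent of the quality scores, which measure clarity rather than which pronunciation was chosen.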
Let’s take the phrase “peanut butter and jelly”. The speaker can say it as:
“Peanut butter ‘n jelly”
or
“Peanut butter AND jelly”
In the first instance, the speaker links and reduces the “and” to “’n” in rapid speech. In the second, the speaker emphasizes the “and”, perhaps because they are angry or reminding someone not to forget to bring both the peanut butter AND the jelly.
In this case, Speechace will return the phoneme sequence, in Arpabet notation, that matches what the speaker actually said: a reduced form of “and” for the first rendition and the fully pronounced form for the second.
Developers can create practice activities that specifically target these linking patterns. Here are the linking types currently supported by Speechace:
/DH AH/ and /DH IY/ linking
/T/ and /D/ dropping
/AE N D/ to /AH N/ or /N/ reduction
/R/ /R/
/DH AH/ extreme reduction
/Y/ linking after “ee” sound
Let us look at a few common linking types and how developers can programmatically implement some of them:
Rule: “The” is pronounced /DH IY/ when followed by a vowel and /DH AH/ when followed by a consonant.
Let us take the example phrases “The Book” and “The Apple”, which are pronounced correctly as:
“The Book”: /DH AH/ /B UH K/
“The Apple”: /DH IY/ /AE P AH L/
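The rule above can be sketched as a small function that picks the expected form of “the” from the first phoneme of the following word. This is a minimal sketch, assuming the standard Arpabet vowel symbols; the names `ARPABET_VOWELS` and `expected_the` are illustrative and not part of the Speechace API.

```python
from typing import List

# Arpabet vowel symbols. Stress digits (as in "AE1") are stripped before lookup.
ARPABET_VOWELS = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
    "EY", "IH", "IY", "OW", "OY", "UH", "UW",
}


def expected_the(next_word_phones: List[str]) -> List[str]:
    """Return the expected phonemes for "the" given the next word's phonemes."""
    if not next_word_phones:
        raise ValueError("next_word_phones must contain at least one phoneme")
    first = next_word_phones[0].upper().rstrip("012")
    if first in ARPABET_VOWELS:
        return ["DH", "IY"]  # "the" before a vowel sound
    return ["DH", "AH"]      # "the" before a consonant sound


print(expected_the(["B", "UH", "K"]))        # "The Book"
print(expected_the(["AE", "P", "AH", "L"]))  # "The Apple"
```

Deciding on the first phoneme rather than the first letter matters for words whose spelling and sound disagree.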
To understand how Speechace responds to this linking type, we will compare the API responses of two audio samples of “The Book”:
Audio 1 with Correct Pattern: “The Book” was pronounced correctly, with /DH AH/ before a consonant.
Audio 2 with Incorrect Pattern: “The Book” was pronounced with incorrect linking.
Let us compare the phoneme list for “The” in both responses below.
Audio 1 (correct pattern):
{
  "word": "The",
  "quality_score": 99,
  "phone_score_list": [
    {
      "phone": "dh",
      "quality_score": 100,
      ...
    },
    {
      "phone": "ah",
      "quality_score": 97.5,
      ...
    }
  ]
}
Audio 2 (incorrect pattern):
{
  "word": "The",
  "quality_score": 89,
  "phone_score_list": [
    {
      "phone": "dh",
      "quality_score": 77,
      ...
    },
    {
      "phone": "iy",
      "quality_score": 99.4,
      ...
    }
  ]
}
In Audio 1, the phone_score_list for “The” correctly shows the phonemes /DH AH/. This aligns with the rule that “the” is pronounced as /DH AH/ when followed by a consonant, as in the phrase “The Book”.
In contrast, in Audio 2, the phone_score_list for “The” shows the phonemes /DH IY/. Although the quality_score for the /IY/ phoneme is high (99.4), indicating it was pronounced clearly, this particular pronunciation of “the” (/DH IY/) is incorrect when followed by a consonant sound like in “The Book.” Hence, the issue here is the choice of pronunciation for connected speech, not the clarity of the individual phoneme.
We will compare the API responses for two audio samples of “The Apple”:
Audio 1 with Correct Pattern: “The Apple” was pronounced correctly, with /DH IY/ before a vowel.
Audio 2 with Incorrect Pattern: “The Apple” was pronounced with incorrect linking.
Let us compare the phoneme list for “The” in both responses below.
Audio 1 (correct pattern):
{
  "word": "The",
  "quality_score": 90,
  "phone_score_list": [
    {
      "phone": "dh",
      "quality_score": 95,
      ...
    },
    {
      "phone": "iy",
      "quality_score": 84,
      ...
    }
  ]
}
Audio 2 (incorrect pattern):
{
  "word": "The",
  "quality_score": 96,
  "phone_score_list": [
    {
      "phone": "dh",
      "quality_score": 97,
      ...
    },
    {
      "phone": "ah",
      "quality_score": 96,
      ...
    }
  ]
}
In Audio 1, the phone_score_list for “The” correctly shows the phonemes /DH IY/. This aligns with the rule that “the” is pronounced as /DH IY/ when followed by a vowel sound, as in the phrase “The Apple”.
In contrast, in Audio 2, the phone_score_list for “The” shows the phonemes /DH AH/. Although the quality_score for the /AH/ phoneme is high (96), indicating it was pronounced clearly, this particular pronunciation of “the” (/DH AH/) is incorrect when followed by a vowel sound like in “The Apple.” The issue here is the choice of pronunciation for connected speech, not the clarity of the individual phoneme.
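One way to apply this comparison in an app is to store, for each practice phrase, which word to inspect and which phoneme sequence counts as correct linking, then compare that target with what the API returned. The sketch below is an illustration under stated assumptions: `LinkingTarget` and `evaluate_linking` are hypothetical names, and the word entries are assumed to have the shape shown in the samples above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LinkingTarget:
    word_index: int           # position of the word to inspect in the phrase
    target_phones: List[str]  # Arpabet sequence that counts as correct linking


def evaluate_linking(word_entries: List[dict], target: LinkingTarget) -> dict:
    """Compare the phonemes returned for one word against the activity's target."""
    entry = word_entries[target.word_index]
    said = [p["phone"].upper() for p in entry["phone_score_list"]]
    return {
        "word": entry["word"],
        "said": said,
        "target": target.target_phones,
        # Decided on phoneme identity only; quality_score is deliberately ignored.
        "linked": said == target.target_phones,
    }


# Activity definition for "The Apple": the first word should be /DH IY/.
the_apple = LinkingTarget(word_index=0, target_phones=["DH", "IY"])

# Audio 2 from the example above, trimmed to the fields used here.
audio_2_words = [
    {"word": "The", "quality_score": 96, "phone_score_list": [
        {"phone": "dh", "quality_score": 97},
        {"phone": "ah", "quality_score": 96},
    ]},
]

result = evaluate_linking(audio_2_words, the_apple)
print(result["linked"], result["said"])  # prints: False ['DH', 'AH']
```

Ignoring the quality scores in the `linked` decision reflects the point made above: a clearly pronounced phoneme can still be the wrong choice for connected speech.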
Rule: The word “and” is optionally reduced to /AE N/ or even just /N/ in natural connected speech, especially when it is unstressed and appears between two words.
Let us take the example of the phrase “Bread and Butter”, which is pronounced as /B R EH D/ /AE N/ /B AH T ER/ or /B R EH D/ /N/ /B AH T ER/ with reduction.
We will compare API responses for two audio samples of “Bread and Butter” to understand how the Speechace API would respond with the list of phonemes:
Audio 1 with reduction applied: “Bread and butter” was pronounced with “and” reduced to /AE N/.
Audio 2 without reduction: “Bread and butter” was pronounced with “and” fully pronounced as /AE N D/.
Let’s compare the phoneme list for “and” in both responses below.
Audio 1 (with reduction):
{
  "word": "and",
  "quality_score": 92,
  "phone_score_list": [
    {
      "phone": "ae",
      "quality_score": 94.5,
      ...
    },
    {
      "phone": "n",
      "quality_score": 89.875,
      ...
    }
  ]
}
Audio 2 (without reduction):
{
  "word": "and",
  "quality_score": 95,
  "phone_score_list": [
    {
      "phone": "ae",
      "quality_score": 100,
      ...
    },
    {
      "phone": "n",
      "quality_score": 100,
      ...
    },
    {
      "phone": "d",
      "quality_score": 85,
      ...
    }
  ]
}
In Audio 1, the phone_score_list for “and” contains the phonemes /AE/ and /N/. The absence of the /D/ phoneme indicates that the word “and” was correctly reduced to /AE N/.
In contrast, in Audio 2, the phone_score_list for “and” includes all three phonemes: /AE/, /N/, and /D/. This signifies that the word “and” was fully pronounced as /AE N D/. While the individual phonemes might have a high quality_score (e.g., 100 for /AE/), indicating clear pronunciation, this particular form is an acceptable but less common pronunciation in natural, fast speech compared to the reduced form.
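The distinction between the two responses can be sketched as a three-way classification of the phonemes returned for “and”. The function name `classify_and` and the label strings are illustrative assumptions; the reduced forms listed are the ones mentioned in this post (/AE N/, /AH N/, and /N/).

```python
FULL_FORM = ("AE", "N", "D")
REDUCED_FORMS = {("AE", "N"), ("AH", "N"), ("N",)}


def classify_and(entry: dict) -> str:
    """Label the returned pronunciation of "and" as 'full', 'reduced', or 'other'."""
    phones = tuple(p["phone"].upper() for p in entry["phone_score_list"])
    if phones == FULL_FORM:
        return "full"
    if phones in REDUCED_FORMS:
        return "reduced"
    return "other"


# Entries from the two examples above, trimmed to the fields used here.
audio_1 = {"word": "and", "quality_score": 92, "phone_score_list": [
    {"phone": "ae", "quality_score": 94.5},
    {"phone": "n", "quality_score": 89.875},
]}
audio_2 = {"word": "and", "quality_score": 95, "phone_score_list": [
    {"phone": "ae", "quality_score": 100},
    {"phone": "n", "quality_score": 100},
    {"phone": "d", "quality_score": 85},
]}

print(classify_and(audio_1))  # prints: reduced
print(classify_and(audio_2))  # prints: full
```

Because the full form is acceptable, an app might treat "full" as a prompt to try the reduced form rather than as an error.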
Rule: The /T/ and /D/ sounds often behave differently depending on the sound that follows them:
When a word ending in /T/ or /D/ is followed by a word starting with a vowel sound (including some words starting with ‘h’ like “hotel”), the /T/ or /D/ is typically linked clearly to the next word.
When a word ending in /T/ or /D/ is followed by a word starting with a consonant sound, the /T/ or /D/ may be dropped in casual, natural speech.
Let us take the example of the phrase “hot dog”, which is pronounced as /HH AA D AO G/ with the /T/ dropped. We will compare API responses for two audio samples:
Audio 1 with dropping applied: “Hot dog” was pronounced with the /T/ sound dropped.
Audio 2 without dropping: “Hot dog” was pronounced with the /T/ sound fully pronounced.
Audio 1 (with dropping):
{
  "word": "hot",
  "quality_score": 98,
  "phone_score_list": [
    {
      "phone": "hh",
      "quality_score": 100,
      ...
    },
    {
      "phone": "aa",
      "quality_score": 95,
      ...
    }
  ]
}
Audio 2 (without dropping):
{
  "word": "hot",
  "quality_score": 92,
  "phone_score_list": [
    {
      "phone": "hh",
      "quality_score": 100
    },
    {
      "phone": "aa",
      "quality_score": 98
    },
    {
      "phone": "t",
      "quality_score": 76
    }
  ]
}
In Audio 1, the phone_score_list for “Hot” contains the phonemes /HH/ and /AA/. The /T/ sound was dropped because of the following consonant sound in “dog”.
In contrast, in Audio 2, the phone_score_list for “Hot” includes all three phonemes: /HH/, /AA/, and /T/. This shows that the word “Hot” was fully pronounced as /HH AA T/. This is not incorrect, but for speakers practicing rapid speech with linking, it is a missed opportunity to demonstrate and use linking.
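Detecting the drop programmatically requires knowing the unlinked pronunciation to compare against. The sketch below assumes the app keeps its own small table of such pronunciations for its practice words; `CITATION_FORMS` and `final_stop_dropped` are hypothetical names, and the table is not something the Speechace API provides.

```python
from typing import Dict, List

# Hypothetical app-side table of unlinked pronunciations for practice words.
CITATION_FORMS: Dict[str, List[str]] = {
    "hot": ["HH", "AA", "T"],
}


def final_stop_dropped(entry: dict) -> bool:
    """True if the word's final /T/ or /D/ is absent from the returned phonemes."""
    canonical = CITATION_FORMS[entry["word"].lower()]
    said = [p["phone"].upper() for p in entry["phone_score_list"]]
    # Dropping means everything matches except the missing final stop.
    return canonical[-1] in ("T", "D") and said == canonical[:-1]


# Entries from the two examples above, trimmed to the fields used here.
audio_1 = {"word": "hot", "quality_score": 98, "phone_score_list": [
    {"phone": "hh", "quality_score": 100},
    {"phone": "aa", "quality_score": 95},
]}
audio_2 = {"word": "hot", "quality_score": 92, "phone_score_list": [
    {"phone": "hh", "quality_score": 100},
    {"phone": "aa", "quality_score": 98},
    {"phone": "t", "quality_score": 76},
]}

print(final_stop_dropped(audio_1))  # prints: True
print(final_stop_dropped(audio_2))  # prints: False
```

Requiring the rest of the word to match keeps other deviations from being mistaken for a dropped /T/ or /D/.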
After analyzing the Speechace API response, you can provide targeted feedback to learners using the following approach:
By following these guidelines, you can create a dynamic and effective feedback system for connected speech practice activities in your application.
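As one possible shape for that feedback, the ✅ / ❌ indicator and the "what you said versus what a native speaker would say" comparison described earlier can be produced from a detection result. This is a minimal sketch: `linking_feedback` is a hypothetical helper and the message wording is only a suggestion.

```python
from typing import List


def _as_arpabet(phones: List[str]) -> str:
    """Format a phoneme list as /P1 P2 .../ for display."""
    return "/" + " ".join(phones) + "/"


def linking_feedback(linked: bool, said: List[str], target: List[str]) -> str:
    """Build a short learner-facing message from a linking detection result."""
    if linked:
        return f"✅ Linking detected: you said {_as_arpabet(said)}."
    return (
        f"❌ Linking missing: you said {_as_arpabet(said)}; "
        f"a native speaker would typically say {_as_arpabet(target)}."
    )


print(linking_feedback(True, ["HH", "AA"], ["HH", "AA"]))
print(linking_feedback(False, ["DH", "AH"], ["DH", "IY"]))
```

For optional patterns such as the “and” reduction, an app may prefer a gentler message than ❌, since the full form is still acceptable.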
The Speechace API offers a powerful tool for developers to integrate sophisticated connected speech practice into their language learning applications. By leveraging its detailed phoneme-level analysis, you can accurately detect various linking and reduction phenomena, providing learners with precise, real-time feedback. This capability is crucial for guiding non-native speakers toward more natural and fluent English pronunciation.
If you’d like to explore how to enable Connected Speech practice in your application or have other linking types you’d like to see implemented with Speechace API, get in touch at contact@speechace.com. We often enable new features selectively for beta customers before making them open for everyone. We’d be glad to work with you and give you early access.
You can find all the necessary resources to start with the Speechace API at https://api-docs.speechace.com
All the best,
The Speechace Team