Computer-Aided Second Language Learning through Speech-based Human-Computer Interactions

Human-Computer Communications Laboratory

Department of Systems Engineering and Engineering Management

Chinese University of Hong Kong






    The aim of this work is to develop automatic tools for language learning. We are developing a mispronunciation detection system that highlights pronunciation errors produced by Cantonese (L1) learners of American English (L2). The target learners are adult native speakers of Cantonese who have studied English for some years. We observe that mispronunciations in L2 are mainly due to phonetic and phonotactic disparities between the two languages. Since some English phonemes are missing from the Cantonese inventory, Cantonese learners often substitute an English phoneme with a Cantonese one that has a similar place or manner of articulation. Such substitutions may cause confusion between English words and lead to misunderstanding. Fig. 1 compares the English and Cantonese phonemes.

(a)       A comparison between English and Cantonese consonants, with example words. Consonants highlighted in red are English consonants that do not exist in Cantonese. Learners are predicted to substitute them with Cantonese consonants that are similar in place and/or manner of articulation.

(b)       A comparison between English and Cantonese vowels, with example words. Vowels highlighted in red are not found in Cantonese and are predicted to be substituted by phonetically similar Cantonese vowels.

Fig. 1 The comparison between English and Cantonese phonemes.
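
    The substitution prediction described above can be illustrated with a minimal Python sketch. It is not the project's implementation: the feature tuples, the ARPAbet-style symbols, the tiny inventories and the function name `candidate_substitutes` are all assumptions chosen for illustration. The sketch ranks Cantonese consonants by how many articulatory features (place, manner) they share with an English consonant that Cantonese lacks.

```python
# Toy articulatory descriptions: symbol -> (place, manner).
# Illustrative subsets only, NOT the inventories shown in Fig. 1.
ENGLISH_ONLY = {
    "TH": ("dental", "fricative"),       # as in "thin"
    "V": ("labiodental", "fricative"),   # as in "van"
    "Z": ("alveolar", "fricative"),      # as in "zoo"
}
CANTONESE = {
    "F": ("labiodental", "fricative"),
    "S": ("alveolar", "fricative"),
    "T": ("alveolar", "plosive"),
    "W": ("labial-velar", "approximant"),
}


def candidate_substitutes(target, l2=ENGLISH_ONLY, l1=CANTONESE):
    """Rank L1 consonants sharing place and/or manner with an L2 consonant.

    Candidates that share both features come first; ties are broken
    alphabetically. Consonants sharing neither feature are dropped.
    """
    place, manner = l2[target]
    scored = []
    for symbol, (l1_place, l1_manner) in l1.items():
        shared = int(l1_place == place) + int(l1_manner == manner)
        if shared > 0:
            scored.append((-shared, symbol))
    return [symbol for _, symbol in sorted(scored)]


if __name__ == "__main__":
    for phone in ENGLISH_ONLY:
        print(phone, "->", candidate_substitutes(phone))
```

    A real system would use the full inventories of Fig. 1 and would likely weight features or rely on observed learner data; the simple shared-feature count here is only one possible heuristic.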


    A straightforward approach is to run phoneme-based recognition on the learners’ speech and then identify pronunciation errors in the output. However, the phone recognition error rate is typically much higher than the word error rate, even for native speakers, which makes it difficult to distinguish pronunciation errors from recognition errors. This means we cannot directly apply ASR to mispronunciation detection. Instead, we have developed a mispronunciation detection system that uses acoustic models together with an extended pronunciation dictionary, which lists possible erroneous pronunciation variants based on likely phonetic confusions. Given the known target phone sequence, the system runs forced alignment with the acoustic models and the extended dictionary to recognize the most likely phone sequence; mispronunciations are then detected automatically from this alignment. Fig. 2 shows an overview of our system.

Fig. 2 Overview of the ASR-based system for detecting and diagnosing second language learners’ mispronunciations.
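
    The role of the extended pronunciation dictionary can be sketched as follows. This is a simplified illustration under stated assumptions, not the system in Fig. 2: the confusion table `CONFUSIONS`, the ARPAbet-style phone symbols, and the helpers `extend_pronunciations` and `diagnose` are hypothetical, and the acoustic scoring that forced alignment would perform is left out. The recognized sequence is simply assumed to be the variant the aligner selected.

```python
from itertools import product

# Hypothetical phone-level confusions (canonical phone -> possible substitutes).
CONFUSIONS = {"TH": ["F"], "V": ["F", "W"], "Z": ["S"]}


def extend_pronunciations(canonical, confusions=CONFUSIONS):
    """Return the canonical pronunciation plus every substitution variant.

    The canonical sequence is always the first entry of the result.
    """
    options = [[phone] + confusions.get(phone, []) for phone in canonical]
    return [list(variant) for variant in product(*options)]


def diagnose(canonical, recognized):
    """List (position, expected phone, produced phone) for each mismatch.

    Only substitutions are modelled, so both sequences must have the
    same length; insertions and deletions are out of scope here.
    """
    if len(canonical) != len(recognized):
        raise ValueError("sequences must have the same length")
    return [
        (i, expected, produced)
        for i, (expected, produced) in enumerate(zip(canonical, recognized))
        if expected != produced
    ]


if __name__ == "__main__":
    thin = ["TH", "IH", "N"]
    variants = extend_pronunciations(thin)
    print(variants)                 # entries the aligner may choose from
    # Assume forced alignment picked the second variant for this utterance.
    print(diagnose(thin, variants[1]))
```

    Restricting the recognizer to these dictionary variants is what separates this approach from free phone recognition: the aligner can only choose between the canonical form and anticipated errors, so any mismatch it reports corresponds to a predicted confusion.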


    For further details, please refer to our publications.