Ahmed Ali

    Auditory-Based Acoustic-Phonetic Signal Processing for Robust Automatic Recognition of Speaker-Independent Continuous Speech

    Automatic speech recognition (ASR) has been an area of intense research for more than four decades. Despite recent success in this field, current state-of-the-art ASR systems still fall far short of human performance. They require training when used by a new speaker, their performance deteriorates significantly in the presence of noise or any mismatch with the training environment, and they have a limited real vocabulary size (perplexity). These systems behave as a "black box" with minimal integration of acoustic-phonetic, psychological, and auditory knowledge. Part of the performance deficiency is attributed to the limited amount of acoustic-phonetic knowledge incorporated in these systems and to the limited understanding of the different sources of acoustic-phonetic variability in speech.

    To help solve these problems, we investigate the acoustic-phonetic characteristics of the basic building units of continuous speech (i.e., the phonemes). We use an auditory-based front-end processing system instead of the traditional front-end systems. The acoustic-phonetic features that prove rich in information content are extracted, and new algorithms are designed to use these features for phoneme recognition. These extraction and manipulation algorithms are very simple and parallel in nature. Applied to the fricative and stop consonants, which are the most difficult phonemes to recognize, this approach achieved a recognition accuracy of 90%-95% for continuous speech spoken by 22 speakers from 5 different dialect regions of American English. This represents a significant improvement over the 75%-80% rate previously achieved using traditional techniques.

    Using these feature extraction algorithms as a front-end for a state-of-the-art system, typically a Hidden Markov Model (HMM) system, is expected to improve robustness under adverse conditions. It will integrate more speech knowledge into the recognition process and combine both phoneme-level and word-level recognition. Moreover, implementing this front-end processing in hardware will take advantage of the parallel nature of the processing to obtain fast, real-time results.

    1. A. M. Abdelatty, H. Haddara, and H. F. Ragaie, "Analog Behavioral Modeling of Artificial Neural Networks", International Conference on Microelectronics (ICM), Turkey, 1994.

    2. A. M. Abdelatty and H. Haddara, "Automatic Generation of Analog Behavioral Models", IEEE International Conference on Electronics, Circuits and Systems (ICECS), 1994.


