Auditory-Based Acoustic-Phonetic Signal Processing for Robust
Automatic Recognition of Speaker-Independent Continuous Speech
Automatic speech recognition (ASR) has been an area of intense research
for more than four decades. Despite recent successes in this field,
current state-of-the-art ASR systems still fall well short of human
performance. They require training when used by a new speaker, their
performance deteriorates significantly in the presence of noise or any
mismatch with the training environment, and they have limited real
vocabulary size (perplexity). These systems behave as a "black box" with
minimal integration of acoustic-phonetic, psychological, and auditory
knowledge. Part of the deficiency in
performance is attributed to the limited amount of acoustic-phonetic
knowledge incorporated in those systems and the limited understanding of
the different sources of acoustic-phonetic variability of speech.
To help solve these problems, we investigate the acoustic-phonetic
characteristics of the basic building units of continuous speech (i.e.,
the phonemes). We use an auditory-based front-end processing system
instead of the traditional front-end systems. The acoustic-phonetic
features that prove to be rich in information content are extracted, and
new algorithms are designed to use these features for phoneme
recognition. These extraction and manipulation algorithms are very simple
and parallel in nature. Applied to the fricative and stop consonants,
which are the most difficult phonemes to recognize, this approach
achieved a recognition accuracy of 90%-95% for continuous speech spoken
by 22 speakers from 5 different dialect regions of American English. This
represents a significant improvement over the 75%-80% rate previously
achieved using traditional techniques.
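As a rough illustration of what an auditory-style, channel-parallel front end can look like, the sketch below spaces filter channels evenly on the ERB-rate scale and measures the energy of one speech frame at each channel's center frequency. It is not the front end used in this work, whose details are not given here. The ERB-rate spacing (Glasberg and Moore's formula), the Goertzel energy estimate, the channel count, the frequency range, and all function names are assumptions chosen for illustration only.

```python
import math

def erb_rate(f_hz):
    """ERB-rate scale (Glasberg and Moore): number of ERBs below f_hz."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)

def inverse_erb_rate(e):
    """Map an ERB-rate value back to a frequency in Hz."""
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

def center_frequencies(n_channels, f_low, f_high):
    """Center frequencies equally spaced on the ERB-rate scale (assumed layout)."""
    e_low, e_high = erb_rate(f_low), erb_rate(f_high)
    step = (e_high - e_low) / (n_channels - 1)
    return [inverse_erb_rate(e_low + i * step) for i in range(n_channels)]

def channel_energy(frame, f_hz, sample_rate):
    """Goertzel estimate of the squared spectral magnitude of frame at f_hz."""
    coeff = 2.0 * math.cos(2.0 * math.pi * f_hz / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in frame:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def frame_features(frame, sample_rate, n_channels=16, f_low=100.0, f_high=7200.0):
    """Per-channel energies for one frame.

    Each channel depends only on the input frame, so the channels are
    independent and could be computed in parallel (here: a plain loop).
    """
    freqs = center_frequencies(n_channels, f_low, f_high)
    energies = [channel_energy(frame, f, sample_rate) for f in freqs]
    return freqs, energies
```

For a frame containing a single tone at one of the center frequencies, the energy peaks in the corresponding channel. Any real feature set for fricatives or stops would be built on top of outputs like these and is not shown here.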
Using these feature extraction algorithms as a front-end for a
state-of-the-art system, which is typically a Hidden Markov Model (HMM)
system, is expected to improve robustness in the presence of adverse
conditions. It will integrate more speech knowledge into the recognition
process and combine both phoneme-level and word-level recognition.
Moreover, implementing this front-end processing in hardware will take
advantage of the parallel nature of the processing to obtain fast,
real-time results.
A summary of our results is available as a PDF file.
1. A. M. Abdelatty, H. Haddara and H. F. Ragaie, "Analog Behavioral
Modeling of Artificial Neural Networks", International Conference on
Microelectronics (ICM),
2. A. M. Abdelatty and H. Haddara, "Automatic Generation of Analog
Behavioral Models", IEEE International Conference on Electronics,
Circuits and Systems (ICECS),
(C) Copyright 1996-1998 by Ahmed Ali. All Rights Reserved.