Type of Document: Dissertation
Author: Huliehel, Fakhralden A.
URN: etd-06062008-162619
Title: An RBFN-based system for speaker-independent speech recognition
Degree: PhD
Department: Electrical Engineering
Advisory Committee (Advisor Name, Title):
VanLandingham, Hugh F., Committee Chair
Abbott, A. Lynn, Committee Member
Bay, John S., Committee Member
Beex, A. A. Louis, Committee Member
Palettas, Panickos N., Committee Member
Keywords
- voice-driven menu systems
Date of Defense: 1995-07-17
Availability: restricted
Abstract
A speaker-independent, isolated-word, small-vocabulary system is developed for applications such as voice-driven menu systems. The design of a cascade of recognition layers is presented. Several feature sets are compared. Phone recognition is performed using a radial basis function network (RBFN). Dynamic time warping (DTW) is used for word recognition. The TIMIT database is used to design and test the automatic speech recognition (ASR) system.
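As an illustration of the word-recognition stage, the following is a minimal sketch of DTW template matching. It assumes the phone recognizer emits one phone label per frame and that each vocabulary word has a single reference phone sequence; the names `dtw_distance`, `recognize`, and `mismatch`, the 0/1 local cost, and the example templates are hypothetical and are not taken from the dissertation.

```python
def dtw_distance(seq_a, seq_b, cost):
    """Classic DTW: accumulated cost of the best monotonic alignment."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # d[i][j] = best accumulated cost aligning seq_a[:i] with seq_b[:j]
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = cost(seq_a[i - 1], seq_b[j - 1])
            # Allowed steps: insertion, deletion, or diagonal match
            d[i][j] = local + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def recognize(observed, templates, cost):
    """Return the vocabulary word whose template is closest under DTW."""
    return min(templates, key=lambda w: dtw_distance(observed, templates[w], cost))


def mismatch(a, b):
    """Hypothetical local cost: 0 for identical phone labels, 1 otherwise."""
    return 0.0 if a == b else 1.0


# Hypothetical phone templates for two of the vocabulary words
templates = {"dark": ["d", "aa", "r", "k"], "suit": ["s", "uw", "t"]}
observed = ["d", "d", "aa", "aa", "r", "k"]  # repeated frames are absorbed by warping
best = recognize(observed, templates, mismatch)
```

Because DTW lets one template phone align with several observed frames, differences in speaking rate between speakers do not by themselves increase the distance.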
Several feature sets using mel-scale filter bank (MSFB), smoothed FFT, reflection coefficients (also called PARCORs), and cepstral features are extracted. The MSFBs outperform the other features considered in our study.
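A mel-scale filter bank can be sketched as a set of triangular filters whose edges are equally spaced on the mel scale. The example below is only an illustration: it assumes the commonly used mapping mel = 2595 log10(1 + f/700), a flat test spectrum, and hypothetical function names; the filter count, band limits, and exact mel formula used in the dissertation are not stated in this abstract.

```python
import math


def hz_to_mel(f_hz):
    """One common Hz-to-mel mapping (assumed, not taken from the source)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_filters, f_low, f_high):
    """Return n_filters + 2 edge frequencies (Hz), equally spaced in mel."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_filters + 2)]


def triangle_weight(f, left, center, right):
    """Weight of a triangular filter at frequency f; peak of 1.0 at center."""
    if f <= left or f >= right:
        return 0.0
    if f <= center:
        return (f - left) / (center - left)
    return (right - f) / (right - center)


def filterbank_energies(power_spectrum, sample_rate, n_filters):
    """Apply the filter bank to a one-sided power spectrum (0 .. Nyquist)."""
    n_bins = len(power_spectrum)
    nyquist = sample_rate / 2.0
    bin_hz = [i * nyquist / (n_bins - 1) for i in range(n_bins)]
    edges = mel_band_edges(n_filters, 0.0, nyquist)
    energies = []
    for k in range(n_filters):
        left, center, right = edges[k], edges[k + 1], edges[k + 2]
        energies.append(sum(triangle_weight(f, left, center, right) * p
                            for f, p in zip(bin_hz, power_spectrum)))
    return energies


# Flat spectrum with 129 bins at a 16 kHz sampling rate, 4 hypothetical filters
edges = mel_band_edges(4, 0.0, 8000.0)
energies = filterbank_energies([1.0] * 129, 16000, 4)
```

In practice the logarithm of each filter energy is usually taken before the values are passed to a classifier.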
Multilayer perceptrons (MLPs) and radial basis function networks (RBFNs) are considered for phoneme recognition. RBFNs are easier to train than MLPs, so RBFNs were selected to perform phoneme classification.
Four RBFNs are compared: RBFN type-I is a single-layer RBFN, RBFN type-II is a two-layer net where the second layer consists of a vector of weights, RBFN type-III is a two-layer net where the second layer is a linear layer, and RBFN type-IV is a two-layer net where the second layer is an RBFN. RBFN type-II outperforms the others at the phone level, where the phone recognition rate is about 44%.
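For orientation, a generic two-layer RBFN classifier can be sketched as a layer of Gaussian units followed by a linear output layer. This is an assumption-laden illustration of the general architecture, closest in spirit to the description of type-III; it does not reproduce the exact definition of any of the four types, and the Gaussian form, the function names, and the toy parameters are all hypothetical.

```python
import math


def rbf_activations(x, centers, widths):
    """Gaussian RBF layer: one activation per (center, width) pair."""
    acts = []
    for c, s in zip(centers, widths):
        sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        acts.append(math.exp(-sq_dist / (2.0 * s * s)))
    return acts


def linear_layer(acts, weights, biases):
    """Second layer: one weighted sum of RBF activations per output class."""
    return [sum(w * a for w, a in zip(row, acts)) + b
            for row, b in zip(weights, biases)]


def classify(x, centers, widths, weights, biases, labels):
    """Return the label of the output unit with the highest score."""
    scores = linear_layer(rbf_activations(x, centers, widths), weights, biases)
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best]


# Toy two-class setup with hypothetical 2-D features and phone labels
centers = [[0.0, 0.0], [1.0, 1.0]]
widths = [0.5, 0.5]
weights = [[1.0, 0.0], [0.0, 1.0]]
biases = [0.0, 0.0]
labels = ["aa", "s"]
predicted = classify([0.1, 0.0], centers, widths, weights, biases, labels)
```

Once the centers and widths are fixed, the second layer is linear in its weights, which is one reason such networks are typically easier to train than MLPs.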
Using clustering techniques, a suboptimal, iterative, and interactive algorithm is developed to train the radial basis functions (RBFs). An algorithm is developed to reduce segmentation errors in TIMIT. The TIMIT 60-phone set is reduced to a 33-phone set by merging similar phones.
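The abstract does not specify the clustering procedure used to train the RBFs. As a stand-in, the sketch below uses plain k-means, a common way to place RBF centers, and should not be read as the dissertation's algorithm. The function name `kmeans_centers`, the deterministic initialization from the first k points, and the toy data are assumptions made for illustration.

```python
def kmeans_centers(points, k, iterations=20):
    """Plain k-means; returns k cluster means usable as RBF centers."""
    # Deterministic initialization (assumption): use the first k points
    centers = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        # Assignment step: attach each point to its nearest center
        for p in points:
            nearest = min(
                range(k),
                key=lambda idx: sum((a - b) ** 2 for a, b in zip(p, centers[idx])),
            )
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members
        for j, members in enumerate(clusters):
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers


# Toy 2-D feature vectors forming two well-separated groups
points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0),
          (10.0, 11.0), (1.0, 0.0), (11.0, 10.0)]
rbf_centers = sorted(kmeans_centers(points, 2))
```

Each resulting center can serve as the center of one RBF unit, with the unit's width chosen afterwards, for example from the spread of the points assigned to that cluster.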
For 168 test speakers, an 84% recognition rate is achieved on a vocabulary of 11 words from the sentence SA1 ("she had your dark suit in greasy wash water all year") in TIMIT. For applications such as voice-driven menu systems, the vocabulary words can be selected to be separable and distinct. A 95% recognition rate is achieved when the confusable words in the 11-word vocabulary are excluded, leaving an 8-word vocabulary.
Real-time implementation of the proposed system can be achieved using a digital signal processor that can perform a multiplication within 100 ns.
Filename: LD5655.V856_1995.H855.pdf
Size: 5.35 Mb
Approximate Download Time (Hours:Minutes:Seconds):
28.8 Modem: 00:24:47
56K Modem: 00:12:44
ISDN (64 Kb): 00:11:09
ISDN (128 Kb): 00:05:34
Higher-speed Access: 00:00:28