Date of Award
Summer 1990
Document Type
Thesis
Degree Name
Master of Science (MS)
Department
Electrical & Computer Engineering
Program/Concentration
Electrical and Computer Engineering
Committee Director
Stephen A. Zahorian
Committee Member
Oscar Gonzalez
Committee Member
David Livingston
Call Number for Print
Special Collections LD4331.E55Q53
Abstract
Hidden Markov models (HMMs) have achieved considerable success in isolated-word, speaker-independent automatic speech recognition. However, the performance of an HMM algorithm is limited by its inability to discriminate between similar-sounding words. The problem arises because all differences between speech patterns are treated as equally important, so the algorithm is particularly susceptible to confusions caused by phonetically irrelevant differences. This thesis presents two types of preprocessing schemes as candidates for improving HMM performance. The aim is to maximize the differences between phonologically distinct speech sounds while minimizing the effect of variations in phonologically equivalent speech sounds. The preprocessors presented are a discrete cosine transformation (DCT) and a linear discriminant analysis (LDA) type transformation.
The HMM used in this investigation is a five-state, left-to-right structure. All the experiments were performed with either 30 or 99 highly confusable words from a CVC isolated-word database. Computations were performed on Sun workstations running UNIX. All words were hand-labeled into acoustic-phonetic segments. The DCT preprocessing, a block transform encoding with data-independent basis vectors, was not found to improve overall word recognition performance. In contrast, the LDA preprocessing method did improve HMM word recognition accuracy. The LDA basis vectors were computed from signal statistics so as to maximize the ratio of between-class to within-class variance of the phonetic class data. The LDA technique requires phonetically segmented data for training. In speaker-independent word recognition tests, i.e., with one set of speakers for training and another set of speakers for testing, the LDA method reduced HMM word errors by over 45%. The results show that discrimination between similar-sounding words can be greatly improved.
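The variance-ratio criterion behind the LDA basis vectors can be illustrated with a minimal sketch. This is not the thesis's implementation. It assumes NumPy, a feature matrix with one row per frame, one phonetic class label per frame, and the standard scatter-matrix formulation of LDA. The function names `scatter_matrices`, `lda_basis`, and `variance_ratio` are hypothetical.

```python
import numpy as np


def scatter_matrices(features, labels):
    """Return (within-class, between-class) scatter matrices."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    dim = features.shape[1]
    grand_mean = features.mean(axis=0)
    s_within = np.zeros((dim, dim))
    s_between = np.zeros((dim, dim))
    for cls in np.unique(labels):
        x = features[labels == cls]
        class_mean = x.mean(axis=0)
        centered = x - class_mean
        s_within += centered.T @ centered
        offset = (class_mean - grand_mean)[:, None]
        s_between += len(x) * (offset @ offset.T)
    return s_within, s_between


def lda_basis(features, labels, n_components):
    """Return basis vectors (as columns) and their between/within ratios.

    Solves S_b w = ratio * S_w w. Assumes S_w is positive definite,
    i.e. there are enough training frames relative to the feature dimension.
    """
    s_within, s_between = scatter_matrices(features, labels)
    # Whiten with the Cholesky factor S_w = L L^T, so the generalized
    # problem becomes an ordinary symmetric eigenproblem.
    chol_inv = np.linalg.inv(np.linalg.cholesky(s_within))
    whitened = chol_inv @ s_between @ chol_inv.T
    eigvals, eigvecs = np.linalg.eigh(whitened)
    order = np.argsort(eigvals)[::-1][:n_components]
    basis = chol_inv.T @ eigvecs[:, order]
    return basis, eigvals[order]


def variance_ratio(direction, s_within, s_between):
    """Between-to-within variance ratio of the data projected on a direction."""
    w = np.asarray(direction, dtype=float)
    return float(w @ s_between @ w) / float(w @ s_within @ w)
```

Features projected as `features @ basis` would then replace the raw features as input to the recognizer. Unlike the DCT, whose basis vectors are fixed in advance, this basis depends on the labeled training data, which is why phonetically segmented data is needed.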
The results of this research not only give additional insight into the basic operation of hidden Markov modeling for speech recognition, but could also potentially be applied to large-vocabulary, continuous-speech, speaker-independent speech recognition. They show that significant improvements in speech recognition system performance may be achieved through better acoustic-phonetic modeling.
Rights
In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/ This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
DOI
10.25777/05r3-zy80
Recommended Citation
Qian, Danming.
"Encoding Phonetic Knowledge for Use in Hidden Markov Models of Speech Recognition"
(1990). Master of Science (MS), Thesis, Electrical & Computer Engineering, Old Dominion University, DOI: 10.25777/05r3-zy80
https://digitalcommons.odu.edu/ece_etds/485
Included in
Computational Engineering Commons, Signal Processing Commons, Theory and Algorithms Commons