Date of Award

Summer 1990

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Electrical & Computer Engineering

Program/Concentration

Electrical and Computer Engineering

Committee Director

Stephen A. Zahorian

Committee Member

Oscar Gonzalez

Committee Member

David Livingston

Call Number for Print

Special Collections LD4331.E55Q53

Abstract

Hidden Markov models (HMMs) have achieved considerable success in isolated-word, speaker-independent automatic speech recognition. However, the performance of an HMM algorithm is limited by its inability to discriminate between similar-sounding words. The problem arises because all differences between speech patterns are treated as equally important. Thus the algorithm is particularly susceptible to confusions caused by phonetically irrelevant differences. This thesis presents two types of preprocessing schemes as candidates for improving HMM performance. The aim is to maximize the differences between phonologically distinct speech sounds while minimizing the effect of variations in phonologically equivalent speech sounds. The preprocessors presented are a discrete cosine transformation (DCT) and a linear discriminant analysis (LDA) type transformation.

The HMM used in this investigation is a five-state, left-to-right structure. All the experiments were performed with either 30 or 99 highly confusable words from a CVC isolated-word database. Computations were performed on Sun workstations running UNIX. All words were hand-labeled in acoustic-phonetic segments. The DCT preprocessing, in terms of a block transform encoding with data-independent basis vectors, was not found to be successful in improving overall word recognition performance. In contrast, the LDA preprocessing method did improve HMM word recognition accuracy. The LDA basis vectors were computed from signal statistics so as to maximize the ratio of between-class to within-class variance of the phonetic class data. The LDA technique requires phonetically segmented data for training. In speaker-independent word recognition tests, i.e., one set of speakers for training and another set of speakers for testing, the LDA method reduced HMM word errors by over 45%. Results show that discrimination between similar-sounding words can be greatly improved.
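
As an illustration only of the variance-ratio criterion described above, the following Python sketch computes a single Fisher discriminant direction for two classes of two-dimensional feature vectors. It is not the thesis's implementation, which this record does not give; the two-class restriction, the toy data, and all function names are assumptions made for the example.

```python
# Toy two-class Fisher discriminant in 2-D (illustrative assumption,
# not the multi-class phonetic setup used in the thesis).

def mean(vectors):
    """Component-wise mean of a list of 2-D points."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(2)]

def scatter(vectors, mu):
    """2x2 scatter matrix: sum over points of (x - mu)(x - mu)^T."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for v in vectors:
        d = [v[0] - mu[0], v[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(class_a, class_b):
    """Unit vector w proportional to Sw^{-1} (mu_a - mu_b)."""
    mu_a, mu_b = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, mu_a), scatter(class_b, mu_b)
    # Within-class scatter is the sum of the per-class scatters.
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu_a[0] - mu_b[0], mu_a[1] - mu_b[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
    return [w[0] / norm, w[1] / norm]

def fisher_ratio(w, class_a, class_b):
    """Between-class over within-class scatter of the 1-D projections."""
    pa = [w[0] * x + w[1] * y for x, y in class_a]
    pb = [w[0] * x + w[1] * y for x, y in class_b]
    ma, mb = sum(pa) / len(pa), sum(pb) / len(pb)
    within = sum((p - ma) ** 2 for p in pa) + sum((p - mb) ** 2 for p in pb)
    return (ma - mb) ** 2 / within

# Hypothetical data: the classes differ mainly along the second axis,
# while most of the (irrelevant) variation lies along the first axis.
class_a = [(0.0, 0.0), (2.0, 0.2), (4.0, 0.0), (6.0, 0.2)]
class_b = [(0.0, 1.0), (2.0, 1.2), (4.0, 1.0), (6.0, 1.2)]

w = fisher_direction(class_a, class_b)
ratio_w = fisher_ratio(w, class_a, class_b)
ratio_x = fisher_ratio([1.0, 0.0], class_a, class_b)
ratio_y = fisher_ratio([0.0, 1.0], class_a, class_b)
```

On this toy data the discriminant direction yields a higher between-to-within ratio than either raw feature axis, which is the property the LDA preprocessing exploits. A multi-class version would instead solve a generalized eigenvalue problem over the between-class and within-class scatter matrices.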

The results of the research conducted in this study not only give additional insights into the basic operation of hidden Markov modeling for speech recognition, but could also potentially be applied to large-vocabulary, continuous-speech, speaker-independent speech recognition. They show that significant improvements in speech recognition system performance may be achieved by better acoustic-phonetic modeling.

Rights

In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/ This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).

DOI

10.25777/05r3-zy80
