Date of Award

Spring 1998

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Electrical & Computer Engineering

Program/Concentration

Electrical Engineering

Committee Director

Stephen A. Zahorian

Committee Member

John W. Stoughton

Committee Member

Peter L. Silsbee

Call Number for Print

Special Collections LD4331.E55 W36

Abstract

This thesis develops, analyzes, and tests an approach for efficiently computing a compact spectral/temporal feature set that represents a segment of speech, with effective resolution depending on both frequency and time position within the segment. The goal of the method is to mimic the resolution properties of the human auditory system while using a computationally efficient FFT-based front end rather than a more complex auditory model. In particular, the method applies both frequency and time "warping" to FFT spectra to obtain good frequency resolution at low frequencies and good time resolution at high frequencies. Time resolution is also varied so that the center of the segment is better represented than the endpoints. The resolution is set by the choice of "warping" functions, which are controlled by a small number of parameters. The method was experimentally verified for both phonetic classification and isolated word recognition. Results of 81.2% for classification of the six stops /b, d, g, p, t, k/ are among the best reported in the literature. Finally, the ASM was implemented in a Visual Speech Display system for recognition of consonant-vowel-consonant (CVC) words in real time.
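
The frequency-warping idea in the abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the bilinear warping function, the parameter `alpha`, and the helper names `bilinear_warp` and `warped_cosine_features` are all assumptions chosen for illustration, and time warping is not shown. The sketch projects one spectrum onto cosine basis functions evaluated on a warped frequency axis, so that low frequencies occupy a larger share of the axis and are therefore resolved more finely.

```python
import math


def bilinear_warp(f, alpha=0.45):
    """Map normalized frequency f in [0, 1] (1 = Nyquist) to a warped value in [0, 1].

    Assumption: a bilinear warp with a single parameter alpha. For alpha > 0
    the low-frequency end of the axis is stretched.
    """
    return f + (2.0 / math.pi) * math.atan2(
        alpha * math.sin(math.pi * f),
        1.0 - alpha * math.cos(math.pi * f),
    )


def warped_cosine_features(spectrum, n_coeffs=4, alpha=0.45):
    """Project a (log-)magnitude spectrum onto cosines over the warped axis.

    spectrum: values sampled uniformly in frequency from 0 to Nyquist.
    Returns n_coeffs coefficients, each a midpoint-rule approximation of the
    integral of spectrum * cos(pi * k * g) over the warped frequency g.
    """
    n = len(spectrum)
    feats = []
    for k in range(n_coeffs):
        acc = 0.0
        for i, value in enumerate(spectrum):
            f_lo, f_hi = i / n, (i + 1) / n
            g_mid = bilinear_warp((f_lo + f_hi) / 2.0, alpha)
            # Width of this bin on the warped axis: wide bins get more weight.
            dg = bilinear_warp(f_hi, alpha) - bilinear_warp(f_lo, alpha)
            acc += value * math.cos(math.pi * k * g_mid) * dg
        feats.append(acc)
    return feats


if __name__ == "__main__":
    flat = [1.0] * 64  # hypothetical flat spectrum, for demonstration only
    print(warped_cosine_features(flat))
```

Under these assumptions, setting `alpha` to 0 reduces the warp to the identity, which gives an ordinary cosine expansion of the spectrum; larger values of `alpha` shift more of the basis functions' detail toward low frequencies.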

Rights

In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/ This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).

DOI

10.25777/v9my-5591
