Date of Award

Summer 2000

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Electrical/Computer Engineering

Committee Director

Stephen A. Zahorian

Committee Member

John W. Stoughton

Committee Member

Oscar R. Gonzalez

Committee Member

Wu Li

Abstract

This dissertation presents an investigation of non-uniform time sampling methods for spectral/temporal feature extraction in speech. Frame-based features were computed from an encoding of the global spectral shape using a Discrete Cosine Transform. In most current “standard” methods, trajectory (dynamic) features are determined from frame-based parameters using fixed time sampling, i.e., a fixed block length and a fixed block spacing. In this research, new methods are proposed and investigated in which the block length and/or the block spacing are variable. The idea was initially tested with HMM-based isolated word recognition, and a significant performance improvement resulted when the variable block length and variable block spacing methods were applied. An accuracy of 97.9% was obtained on an alphabet recognition task using the ISOLET database. This result is by far the highest reported in the literature. The variable block length method was then adapted to accommodate the complexity of continuous speech. Three methods were proposed, and each was tested with the TIMIT and NTIMIT databases using HMM recognizers. Phone recognition experiments were conducted using the standard 39-phone set. Parameters were tuned with monophone models using a simple HMM configuration. The methods were also evaluated with more complex models, such as models with more mixture components, models with a full covariance matrix, and right-context biphone models. Experimental results indicated that none of the proposed methods performs significantly better than the standard method. However, the best absolute result obtained with the proposed front end is comparable to those obtained with current state-of-the-art systems. Also, the performance achieved with monophone models compares favorably with that of many context-dependent systems, which are more complex.
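The difference between fixed and variable time sampling of frame-based DCT features can be sketched as follows. This is a minimal illustration only, assuming NumPy, an unnormalized DCT-II, and synthetic log-magnitude spectra. The function names, the parameter values (13 spectral coefficients, 3 temporal coefficients, 9-frame blocks every 2 frames), and the explicit start/length lists used for the variable case are all hypothetical; the dissertation's actual front end (for example any windowing or warping within a block, or how variable lengths are chosen) is not specified in this abstract.

```python
import numpy as np


def dct_basis(n_in: int, n_out: int) -> np.ndarray:
    """Unnormalized DCT-II basis: row k holds cos(pi * k * (i + 0.5) / n_in)."""
    i = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    return np.cos(np.pi * k * (i + 0.5) / n_in)


def frame_features(log_spectra: np.ndarray, n_dctc: int) -> np.ndarray:
    """Encode each frame's global spectral shape with a DCT over frequency.

    log_spectra: (n_frames, n_bins) -> returns (n_frames, n_dctc).
    """
    return log_spectra @ dct_basis(log_spectra.shape[1], n_dctc).T


def block_features(frames: np.ndarray, start: int, length: int, n_dcs: int) -> np.ndarray:
    """DCT over time of one block of frames, flattened to n_dctc * n_dcs values."""
    if start < 0 or start + length > frames.shape[0]:
        raise ValueError("block extends past the available frames")
    coeffs = dct_basis(length, n_dcs) @ frames[start:start + length]  # (n_dcs, n_dctc)
    return coeffs.T.ravel()  # grouped by spectral coefficient, n_dcs temporal terms each


def fixed_trajectory_features(frames, n_dcs, block_len, spacing):
    """Standard fixed time sampling: same block length, one block every `spacing` frames."""
    starts = range(0, frames.shape[0] - block_len + 1, spacing)
    return np.stack([block_features(frames, s, block_len, n_dcs) for s in starts])


def variable_trajectory_features(frames, n_dcs, starts, lengths):
    """Variable time sampling (hypothetical parameterization): each block has its
    own start frame and length, so both block length and block spacing can vary."""
    return np.stack([block_features(frames, s, n, n_dcs) for s, n in zip(starts, lengths)])


# Usage on synthetic data: 100 frames of 64-bin log-magnitude spectra.
rng = np.random.default_rng(0)
log_spec = rng.standard_normal((100, 64))
f = frame_features(log_spec, n_dctc=13)  # (100, 13)
fixed = fixed_trajectory_features(f, n_dcs=3, block_len=9, spacing=2)
variable = variable_trajectory_features(f, n_dcs=3, starts=[0, 10, 30], lengths=[20, 15, 40])
print(fixed.shape, variable.shape)  # (46, 39) (3, 39)
```

Passing evenly spaced starts with equal lengths to the variable version reproduces the fixed scheme, so in this sketch the variable sampling is a strict generalization of the standard one.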

DOI

10.25777/zkn6-a793

ISBN

9780599965584
