Date of Award

Summer 2005

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Electrical & Computer Engineering

Program/Concentration

Electrical Engineering

Committee Director

Stephen A. Zahorian

Committee Member

W. Steven Gray

Committee Member

Vijayan Asari

Call Number for Print

Special Collections LD4331.E55 P73 2005

Abstract

Automatic speech recognizers perform poorly when training and test data differ systematically in noise and channel characteristics. One manifestation of such differences is variation in the probability density functions (pdfs) between training and test features. Consequently, both automatic speech recognition and automatic speaker identification may be severely degraded. Previous attempts to minimize this problem include Cepstral Mean and Variance Normalization and transforming all speech features to a univariate Gaussian pdf. In this thesis, two techniques are presented for non-linearly scaling speech features to fit them to a target pdf: the first is based on the principles of histogram matching (an algorithm commonly employed in image contrast enhancement), and the second is based on principles of quantile-based Cumulative Distribution Function (CDF) matching for data drawn from different distributions. These methods can be used to compensate for the systematic marginal (i.e., each feature considered individually) differences between training and test features. For a more complete, multi-dimensional restoration of feature statistics, a linear (matrix) transformation is proposed that maps the noisy feature space to the corresponding clean space. The matrix used for this global transformation is learned in a least-squares sense from stereo training data, comprised of speech recorded simultaneously in clean and noisy conditions. We further propose a linear covariance normalization technique to compensate for differences in covariance properties between training and test data. Experimental results are given that illustrate the benefits of these algorithms for speech recognition and automatic speaker identification.
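As a rough illustration of the quantile-based CDF matching idea described in the abstract, the sketch below warps each feature dimension so that its empirical CDF matches a chosen target distribution (a standard Gaussian is assumed here as the target). This is not code from the thesis; the function name and the Gaussian target are illustrative assumptions.

    # Minimal sketch of quantile-based CDF matching (illustrative only).
    # Each feature dimension is non-linearly rescaled so that its empirical
    # CDF matches that of an assumed target distribution (unit Gaussian).
    import numpy as np
    from scipy.stats import norm

    def cdf_match_to_gaussian(features):
        """Warp each column of `features` (frames x dims) to a unit Gaussian
        by mapping empirical quantiles through the inverse Gaussian CDF."""
        n_frames, n_dims = features.shape
        warped = np.empty_like(features, dtype=float)
        for d in range(n_dims):
            # Rank of each frame's value within its dimension (1 .. n_frames).
            ranks = np.argsort(np.argsort(features[:, d])) + 1
            # Empirical CDF value in (0, 1), then inverse Gaussian CDF.
            ecdf = (ranks - 0.5) / n_frames
            warped[:, d] = norm.ppf(ecdf)
        return warped

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # Toy "cepstral" features with a skewed, shifted distribution.
        feats = rng.gamma(shape=2.0, scale=1.5, size=(1000, 13)) + 3.0
        warped = cdf_match_to_gaussian(feats)
        print("per-dim mean ~0:", np.round(warped.mean(axis=0)[:3], 3))
        print("per-dim std  ~1:", np.round(warped.std(axis=0)[:3], 3))

The multi-dimensional step mentioned in the abstract could, under similar assumptions, be sketched as an ordinary least-squares fit of a matrix W minimizing ||clean - noisy @ W||^2 over stereo (clean, noisy) feature pairs; the thesis itself should be consulted for the exact formulation.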

Rights

In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/ This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).

DOI

10.25777/w1s9-ht79
