Date of Award
Summer 2005
Document Type
Thesis
Degree Name
Master of Science (MS)
Department
Electrical & Computer Engineering
Program/Concentration
Electrical Engineering
Committee Director
Stephen A. Zahorian
Committee Member
W. Steven Gray
Committee Member
Vijayan Asari
Call Number for Print
Special Collections LD4331.E55 P73 2005
Abstract
Automatic speech recognizers perform poorly when training and test data differ systematically in noise and channel characteristics. One manifestation of such differences is variation in the probability density functions (pdfs) between training and test features. Consequently, both automatic speech recognition and automatic speaker identification may be severely degraded. Previous attempts to minimize this problem include Cepstral Mean and Variance Normalization and transforming all speech features to a univariate Gaussian pdf. In this thesis, two techniques are presented for non-linearly scaling speech features to fit them to a target pdf: the first is based on the principles of histogram matching (a commonly employed algorithm in image contrast enhancement applications), and the second is based on the principles of quantile-based Cumulative Density Function (CDF) matching for data drawn from different distributions. These methods can be used to compensate for the systematic marginal (i.e., each feature considered individually) differences between training and test features. For a more complete, multi-dimensional restoration of feature statistics, a linear (matrix) transformation is proposed, mapping the noisy feature space to the corresponding clean space. The matrix used for this global transformation is learned in a least squares sense from stereo training data, consisting of speech recorded simultaneously in clean and noisy conditions. We further propose a linear covariance normalization technique to compensate for differences in covariance properties between training and test data. Experimental results are given that illustrate the benefits of these algorithms for speech recognition and automatic speaker identification.
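As a rough illustration of two ideas named in the abstract, the sketch below shows quantile-based matching of a single feature to a target distribution, and a least-squares matrix learned from paired (stereo) noisy and clean frames. It is not the thesis's implementation: the function names `quantile_match` and `fit_linear_map`, the `n_quantiles` parameter, the piecewise-linear interpolation between quantiles, and the use of NumPy are all assumptions made for this example.

```python
import numpy as np


def quantile_match(test_feat, train_feat, n_quantiles=101):
    """Warp one feature dimension so its empirical CDF approximates the target's.

    Assumption: a piecewise-linear map between matching quantiles is used;
    the thesis may define the mapping differently.
    """
    probs = np.linspace(0.0, 1.0, n_quantiles)
    src_q = np.quantile(test_feat, probs)   # quantiles of the mismatched data
    tgt_q = np.quantile(train_feat, probs)  # quantiles of the target data
    # Monotone map: each source quantile is sent to the target quantile
    # at the same cumulative probability.
    return np.interp(test_feat, src_q, tgt_q)


def fit_linear_map(noisy, clean):
    """Least-squares matrix W such that noisy @ W approximates clean.

    Rows are time-aligned frames from stereo data; columns are features.
    """
    W, *_ = np.linalg.lstsq(noisy, clean, rcond=None)
    return W


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Marginal compensation: synthetic "test" feature is shifted and scaled.
    train = rng.normal(0.0, 1.0, 5000)
    test = 2.0 * rng.normal(0.0, 1.0, 5000) + 3.0
    mapped = quantile_match(test, train)
    print("mean before/after:", test.mean(), mapped.mean())

    # Global compensation: synthetic stereo frames related by a fixed matrix.
    noisy = rng.normal(size=(200, 3))
    clean = noisy @ np.array([[1.0, 0.2, 0.0],
                              [0.0, 0.9, 0.1],
                              [0.3, 0.0, 1.1]])
    W = fit_linear_map(noisy, clean)
    print("max reconstruction error:", np.abs(noisy @ W - clean).max())
```

In this sketch the quantile map is applied to each feature independently, which corresponds to the marginal compensation described above, while the matrix `W` acts on the whole feature vector at once.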
Rights
In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/ This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
DOI
10.25777/w1s9-ht79
Recommended Citation
Prasad, Saurabh. "Non-Linear and Linear Transformations of Features for Robust Speech Recognition and Speaker Identification" (2005). Master of Science (MS), Thesis, Electrical & Computer Engineering, Old Dominion University, DOI: 10.25777/w1s9-ht79
https://digitalcommons.odu.edu/ece_etds/475