Date of Award
Spring 2004
Document Type
Thesis
Degree Name
Master of Science (MS)
Department
Electrical & Computer Engineering
Program/Concentration
Electrical Engineering
Committee Director
Stephen A. Zahorian
Committee Member
Vijayan K. Asari
Committee Member
Min Song
Call Number for Print
Special Collections LD4331.C65 W359 2004
Abstract
The context of this thesis is the improvement of automatic speech recognition (ASR) for use with digital libraries. First, commonly used multimedia file formats and codecs are surveyed with the objective of identifying those formats that preserve speech quality while keeping file sizes compact. The main contribution of the work is a new technique for speaker adaptation based on frequency scale modifications. The frequency scale is modified using a minimum mean square error matching of a spectral template for each speaker to a "typical speaker" spectral template. Each spectral template is computed from the average amplitude-normalized spectra of several seconds of the voiced portions of an utterance by a speaker. The advantages of the new technique include the relatively small amount of speech needed to form each spectral template, the text independence of the method, and the overall computational simplicity. Of the several parameters investigated for implementing the spectral matching, two, the low frequency limit and the high frequency limit, were found to be the most effective. In general, the improvements due to the speaker normalization were small. However, it was determined that the normalization could compensate for the primary differences between male and female speakers. Furthermore, adjusting the frequency scale parameters based on a neural network classifier resulted in large improvements in vowel classification accuracy, indicating that frequency scale modifications can be used to obtain better ASR performance.
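The template matching described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not specify the amplitude normalization, the form of the frequency warping, or the search procedure. The sketch assumes unit-mean normalization, a linear mapping of the speaker band [f_low, f_high] onto the reference band, and an exhaustive grid search. All function names, the synthetic spectrum, and the candidate grids are hypothetical.

```python
import numpy as np


def unit_mean(spectrum):
    """Scale a spectrum so its mean amplitude is 1 (assumed normalization)."""
    spectrum = np.asarray(spectrum, dtype=float)
    return spectrum / spectrum.mean(axis=-1, keepdims=True)


def spectral_template(voiced_frames):
    """Average the amplitude-normalized magnitude spectra of voiced frames.

    voiced_frames: array of shape (n_frames, n_bins).
    """
    return unit_mean(voiced_frames).mean(axis=0)


def warp_to_reference(template, freqs, f_low, f_high, ref_freqs):
    """Linearly map the speaker band [f_low, f_high] onto the reference grid."""
    scale = (f_high - f_low) / (ref_freqs[-1] - ref_freqs[0])
    source_freqs = f_low + (ref_freqs - ref_freqs[0]) * scale
    # Re-normalize inside the comparison band so only spectral shape matters.
    return unit_mean(np.interp(source_freqs, freqs, template))


def fit_frequency_limits(template, freqs, ref_template, ref_freqs,
                         low_grid, high_grid):
    """Grid-search the (f_low, f_high) pair with minimum mean square error."""
    best = (None, None, np.inf)
    for f_low in low_grid:
        for f_high in high_grid:
            if f_high <= f_low:
                continue
            warped = warp_to_reference(template, freqs, f_low, f_high, ref_freqs)
            mse = float(np.mean((warped - ref_template) ** 2))
            if mse < best[2]:
                best = (float(f_low), float(f_high), mse)
    return best


def synthetic_shape(f):
    """Made-up smooth spectrum with three formant-like peaks (illustration only)."""
    f = np.asarray(f, dtype=float)
    return (np.exp(-((f - 500) / 150) ** 2)
            + 0.6 * np.exp(-((f - 1500) / 200) ** 2)
            + 0.3 * np.exp(-((f - 2500) / 250) ** 2))


# Synthetic data: a "typical speaker" template on a 100-4000 Hz grid, and a
# speaker whose frequency axis is stretched by 10 %, observed at three gains.
freqs = np.linspace(0.0, 5000.0, 501)
ref_freqs = np.linspace(100.0, 4000.0, 200)
ref_template = unit_mean(synthetic_shape(ref_freqs))
gains = np.array([[0.5], [1.0], [2.0]])
frames = gains * synthetic_shape(freqs / 1.1)
template = spectral_template(frames)

f_low, f_high, mse = fit_frequency_limits(
    template, freqs, ref_template, ref_freqs,
    low_grid=[90, 100, 110, 120], high_grid=[4000, 4200, 4400, 4600])
print(f_low, f_high, mse)
```

Because each frame is normalized before averaging, the template in this sketch does not depend on recording gain, and because no transcript is involved, the fit is text independent in the sense the abstract describes. A grid search is used here only for clarity; any minimizer of the same error would serve.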
Rights
In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/ This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
DOI
10.25777/cxva-sk36
Recommended Citation
Wang, Wei. "Speaker Normalization for Improved Automatic Speech Recognition for Digital Libraries" (2004). Master of Science (MS), Thesis, Electrical & Computer Engineering, Old Dominion University, DOI: 10.25777/cxva-sk36
https://digitalcommons.odu.edu/ece_etds/558
Included in
Computational Engineering Commons, Computer Sciences Commons, Electrical and Computer Engineering Commons