Date of Award

Spring 2004

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Electrical & Computer Engineering

Program/Concentration

Electrical Engineering

Committee Director

Stephen A. Zahorian

Committee Member

Vijayan K. Asari

Committee Member

Min Song

Call Number for Print

Special Collections LD4331.C65 W359 2004

Abstract

The context of this thesis work is the improvement of automatic speech recognition (ASR) for use with digital libraries. First, commonly used multimedia file formats and codecs are surveyed with the objective of identifying those formats that preserve speech quality while keeping file sizes compact. The main contribution of the work is a new technique for speaker adaptation based on frequency scale modifications. The frequency scale is modified using a minimum mean square error matching of a spectral template for each speaker to a "typical speaker" spectral template. Each spectral template is computed from the average amplitude-normalized spectra of several seconds of the voiced portions of an utterance of a speaker. The advantages of the new technique include the relatively small amount of speech needed to form each spectral template, the text independence of the method, and the overall computational simplicity. Of several parameters investigated for implementing the spectral matching, two parameters, the low frequency limit and high frequency limit, were found to be the most effective. Generally, the improvements due to the speaker normalization were small. However, it was determined that the normalization could compensate for the primary differences between male and female speakers. Furthermore, adjustment of the frequency scale parameters based on a neural network classifier resulted in large improvements in vowel classification accuracy, thus indicating that frequency scale modifications can be used to obtain better ASR performance.
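
The template matching described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not give the warping function, the normalization, or the search procedure, so the sketch assumes normalization by the sum of each frame's spectrum, a linear resampling of the speaker template between a low and a high frequency limit, and an exhaustive grid search over candidate limits. The function names (spectral_template, warp, fit_limits) and all numeric values are hypothetical.

```python
import numpy as np

def spectral_template(frames):
    """Average of amplitude-normalized magnitude spectra.

    frames: 2-D array, one magnitude spectrum per voiced frame.
    Assumption: each frame is normalized by the sum of its amplitudes.
    """
    frames = np.asarray(frames, dtype=float)
    norms = frames.sum(axis=1, keepdims=True)
    return (frames / norms).mean(axis=0)

def warp(template, freqs, f_low, f_high):
    """Resample the template over [f_low, f_high] onto len(freqs) points.

    Assumption: the frequency scale modification is a linear map
    controlled only by the two limits.
    """
    src = np.linspace(f_low, f_high, len(freqs))
    return np.interp(src, freqs, template)

def fit_limits(speaker, reference, freqs, lows, highs):
    """Grid search for the limits that minimize the mean square error
    between the warped speaker template and the reference template.

    Returns (error, f_low, f_high) for the best candidate pair.
    """
    best = None
    for lo in lows:
        for hi in highs:
            err = np.mean((warp(speaker, freqs, lo, hi) - reference) ** 2)
            if best is None or err < best[0]:
                best = (err, lo, hi)
    return best

# Synthetic usage: the "typical speaker" template is built by warping
# the speaker template with known limits, then those limits are recovered.
freqs = np.linspace(0.0, 5000.0, 501)
speaker = np.exp(-((freqs - 1000.0) / 300.0) ** 2)
reference = warp(speaker, freqs, 200, 4500)
err, f_low, f_high = fit_limits(
    speaker, reference, freqs,
    lows=[0, 100, 200, 300], highs=[4000, 4500, 5000],
)
```

In this synthetic case the search recovers the limits used to build the reference template. With real speech, the templates would come from several seconds of voiced frames per speaker, and the candidate limits would need to be chosen for the analysis bandwidth in use.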

Rights

In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/ This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).

DOI

10.25777/cxva-sk36

Recommended Citation

Wang, Wei. "Speaker Normalization for Improved Automatic Speech Recognition for Digital Libraries" (2004). Master of Science (MS), Thesis, Electrical & Computer Engineering, Old Dominion University, DOI: 10.25777/cxva-sk36
https://digitalcommons.odu.edu/ece_etds/558

Included in

Computational Engineering Commons, Computer Sciences Commons, Electrical and Computer Engineering Commons

Electrical & Computer Engineering Theses & Dissertations

Speaker Normalization for Improved Automatic Speech Recognition for Digital Libraries
