Date of Award

Summer 2006

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Electrical & Computer Engineering

Program/Concentration

Electrical and Computer Engineering

Committee Director

Stephen A. Zahorian

Committee Director

Vijayan K. Asari

Committee Member

Oscar R. Gonzalez

Call Number for Print

LD4331.E55 C353 2006

Abstract

Automatic Speaker Recognition is the process of automatically recognizing who is speaking on the basis of individual information contained in speech signals. It makes it possible to use a speaker's voice to verify their identity and to control access to services such as voice dialing, banking by telephone, telephone shopping, database access services, information services, voice mail, security control for confidential information areas, and remote access to computers.

In this thesis, Gaussian Mixture Models (GMMs) and Neural Networks (NNs) for Automatic Speaker Identification are presented. Algorithms for Speaker Identification using GMMs were developed, using both full and diagonal covariance matrices, and were tested on the NTIMIT and SPIDRE databases. Experiments were also conducted using an existing neural network, the Binary-Pair Partitioned network, on the NTIMIT and SPIDRE databases.
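As a rough illustration of GMM-based identification, the sketch below scores a sequence of feature frames against one diagonal-covariance GMM per speaker and picks the speaker with the highest total log-likelihood. The function names (`diag_gmm_loglik`, `identify`), the parameter layout, and the use of NumPy are assumptions made for illustration, not the thesis's implementation. A full-covariance variant would replace the per-dimension variances with one covariance matrix per mixture component.

```python
import numpy as np


def diag_gmm_loglik(frames, weights, means, variances):
    """Total log-likelihood of frames under a diagonal-covariance GMM.

    frames:    (T, D) array of feature vectors (hypothetical features).
    weights:   (M,) mixture weights that sum to 1.
    means:     (M, D) component means.
    variances: (M, D) per-dimension component variances.
    """
    frames = np.asarray(frames, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)

    diff = frames[:, None, :] - means[None, :, :]                   # (T, M, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (M,)
    log_comp = log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2)  # (T, M)
    log_weighted = log_comp + np.log(weights)                       # (T, M)

    # Log-sum-exp over components, shifted by the row maximum for stability.
    peak = log_weighted.max(axis=1, keepdims=True)
    per_frame = peak[:, 0] + np.log(np.exp(log_weighted - peak).sum(axis=1))
    return float(per_frame.sum())


def identify(frames, speaker_models):
    """Return (best_speaker, scores) for a dict of name -> (weights, means, variances)."""
    scores = {
        name: diag_gmm_loglik(frames, *params)
        for name, params in speaker_models.items()
    }
    return max(scores, key=scores.get), scores
```

Calling `identify(frames, models)` with a dictionary of per-speaker parameters returns the name of the highest-scoring speaker together with every speaker's log-likelihood, which is convenient when the scores are later combined with another classifier.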

GMMs are trained with the maximum likelihood approach to give good models for each speaker. In contrast, NNs are trained to discriminate between speakers but form no explicit models for each speaker. It is conjectured that an appropriate combination of maximum likelihood and discriminative training, using both GMMs and NNs, should give a better recognition rate than either method alone. Thus, fusion approaches that combine GMMs and NNs for Speaker Identification are proposed and tested for various cases. The best results obtained with each method were compared across all the cases that were evaluated.
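The abstract does not say how the GMM and NN outputs are fused, so the following is only a minimal sketch of one common option: a weighted sum of per-speaker scores after normalization. The function names (`z_normalize`, `fuse_and_identify`), the weight `alpha`, and the z-score normalization are all hypothetical choices, not the thesis's method.

```python
import numpy as np


def z_normalize(scores):
    """Shift scores to zero mean and scale to unit standard deviation."""
    scores = np.asarray(scores, dtype=float)
    spread = scores.std()
    if spread == 0.0:
        # All scores equal: no speaker is preferred by this classifier.
        return np.zeros_like(scores)
    return (scores - scores.mean()) / spread


def fuse_and_identify(gmm_loglik, nn_score, alpha=0.5):
    """Weighted-sum fusion of two per-speaker score dictionaries.

    gmm_loglik: dict of speaker name -> GMM log-likelihood (assumed input).
    nn_score:   dict of speaker name -> NN score (assumed input).
    alpha:      weight on the GMM scores; 1 - alpha goes to the NN scores.
    """
    names = sorted(gmm_loglik)
    gmm_part = z_normalize([gmm_loglik[name] for name in names])
    nn_part = z_normalize([nn_score[name] for name in names])
    fused = alpha * gmm_part + (1.0 - alpha) * nn_part
    best = names[int(np.argmax(fused))]
    return best, dict(zip(names, fused.tolist()))
```

Normalizing first matters because log-likelihoods and network outputs live on very different scales. With `alpha=1.0` the decision reduces to the GMM alone, and with `alpha=0.0` to the NN alone, so the same function can reproduce both single-method baselines.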

Comparing the results obtained with GMMs alone, NNs alone, and the combined GMM/NN approaches showed that GMMs alone yielded the best recognition accuracy. The best accuracy on the NTIMIT database was 89.2% on the test data, and the best accuracy on the SPIDRE database was 86.7% on the test data, both obtained using GMMs alone.

Rights

In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/ This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).

DOI

10.25777/9b7q-mw34

Recommended Citation

Chalkapally, Usha G. "Gaussian Mixture Models and Neural Networks for Automatic Speaker Identification" (2006). Master of Science (MS), Thesis, Electrical & Computer Engineering, Old Dominion University, DOI: 10.25777/9b7q-mw34
https://digitalcommons.odu.edu/ece_etds/304

Included in

Computer Engineering Commons, Databases and Information Systems Commons, Signal Processing Commons, Speech and Hearing Science Commons

Electrical & Computer Engineering Theses & Dissertations

Gaussian Mixture Models and Neural Networks for Automatic Speaker Identification
