Date of Award
Summer 2006
Document Type
Thesis
Degree Name
Master of Science (MS)
Department
Electrical & Computer Engineering
Program/Concentration
Electrical and Computer Engineering
Committee Director
Stephen A. Zahorian
Committee Director
Vijayan K. Asari
Committee Member
Oscar R. Gonzalez
Call Number for Print
LD4331.E55 C353 2006
Abstract
Automatic Speaker Recognition is the process of automatically recognizing who is speaking on the basis of individual information contained in speech signals. It makes it possible to use a speaker's voice to verify their identity and to control access to services such as voice dialing, banking by telephone, telephone shopping, database access services, information services, voice mail, security control for confidential information areas, and remote access to computers.
In this thesis, the techniques of Gaussian Mixture Models (GMMs) and Neural Networks (NNs) for Automatic Speaker Identification are presented. Algorithms for Speaker Identification using Gaussian Mixture Models were developed, using both full covariance matrices and diagonal covariance matrices, and were tested on the NTIMIT and SPIDRE databases. Experiments were also conducted on the NTIMIT and SPIDRE databases using an existing Binary-Pair Partitioned neural network.
GMMs are trained with the maximum likelihood approach to give good models for each speaker. In contrast, NNs are trained to discriminate between speakers but form no explicit models for each speaker. It is conjectured that an appropriate combination of maximum likelihood and discriminative training, using both GMMs and NNs, should give a better recognition rate than either method alone. Thus, fusion approaches that combine the Gaussian Mixture Models and neural networks for Speaker ID are proposed and tested for various cases. The best results obtained with each method were compared for all the cases that were evaluated.
Comparing the results obtained with GMMs alone, NNs alone, and the combined GMM/NN approach showed that GMMs alone yielded the best recognition accuracy. The best accuracy using the NTIMIT database was 89.2% on the test data, and the best accuracy using the SPIDRE database was 86.7% on the test data, both obtained using GMMs alone.
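As a rough illustration of the maximum likelihood identification and score fusion described in the abstract, the following is a minimal sketch, not the thesis implementation. It assumes diagonal covariance matrices, already-trained mixture parameters, and NN outputs that behave like per-speaker posteriors. The function names, the softmax normalization, and the weight `alpha` are hypothetical choices made only for this example; the abstract does not say which fusion rule was actually used.

```python
import math

def log_gmm_diag(x, weights, means, variances):
    """Log-likelihood of one feature vector x under a diagonal-covariance GMM."""
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        # Log of the Gaussian normalization constant for this component.
        log_det = sum(math.log(2.0 * math.pi * v) for v in var)
        # Squared Mahalanobis distance with a diagonal covariance.
        maha = sum((xi - mi) ** 2 / v for xi, mi, v in zip(x, mu, var))
        log_terms.append(math.log(w) - 0.5 * (log_det + maha))
    # Log-sum-exp over components for numerical stability.
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))

def utterance_score(frames, gmm):
    """Average per-frame log-likelihood of an utterance under one speaker's GMM."""
    weights, means, variances = gmm
    return sum(log_gmm_diag(x, weights, means, variances) for x in frames) / len(frames)

def identify(frames, speaker_gmms):
    """Return the speaker whose GMM gives the highest average log-likelihood."""
    scores = {name: utterance_score(frames, gmm) for name, gmm in speaker_gmms.items()}
    return max(scores, key=scores.get), scores

def softmax(scores):
    """Map log-likelihood scores to values that sum to 1 (assumed normalization)."""
    m = max(scores.values())
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

def fuse(gmm_scores, nn_scores, alpha=0.5):
    """Hypothetical fusion: weighted sum of normalized GMM scores and NN scores."""
    gmm_post = softmax(gmm_scores)
    return {k: alpha * gmm_post[k] + (1.0 - alpha) * nn_scores[k] for k in gmm_post}
```

Each speaker model here is a tuple `(weights, means, variances)`. Calling `identify(frames, speaker_gmms)` returns the best-scoring speaker along with the per-speaker scores, which can then be passed to `fuse` together with NN scores keyed by the same speaker names.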
Rights
In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/ This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
DOI
10.25777/9b7q-mw34
Recommended Citation
Chalkapally, Usha G.
"Gaussian Mixture Models and Neural Networks for Automatic Speaker Identification"
(2006). Master of Science (MS), Thesis, Electrical & Computer Engineering, Old Dominion University, DOI: 10.25777/9b7q-mw34
https://digitalcommons.odu.edu/ece_etds/304
Included in
Computer Engineering Commons, Databases and Information Systems Commons, Signal Processing Commons, Speech and Hearing Science Commons