Date of Award
Spring 1996
Document Type
Thesis
Degree Name
Master of Science (MS)
Department
Electrical & Computer Engineering
Program/Concentration
Electrical Engineering
Committee Director
Peter L. Silsbee
Committee Member
Stephen A. Zahorian
Committee Member
Martin D. Meyer
Call Number for Print
Special Collections LD4331.E55 S82
Abstract
An audiovisual semi-continuous hidden Markov model (HMM)-based Automatic Speech Recognition (ASR) system and an improved method of integrating audio and visual information in an audiovisual discrete HMM-based ASR system are investigated.
In the audiovisual discrete HMM, an adaptive integration formulation is employed that incorporates the integration into the HMM at a pre-categorical stage. A visual weighting parameter is determined automatically, allowing the relative contributions of audio and visual information to be adjusted adaptively. With the adaptive weight, accuracy increased by 13% compared to the same model without it.
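As an illustration only, a weighted pre-categorical combination of the two streams might be sketched as below. The abstract does not give the integration formula, so the log-linear form, the function name `combined_log_prob`, and the convention that `visual_weight` lies in [0, 1] are all assumptions made for this sketch, not the method of the thesis.

```python
import math


def combined_log_prob(log_b_audio, log_b_visual, visual_weight):
    """Combine per-state audio and visual log output probabilities.

    Assumed form (not taken from the thesis):
        log b(o_a, o_v) = (1 - w) * log b_a(o_a) + w * log b_v(o_v)
    where w is the visual weight. w = 0 ignores the visual stream,
    w = 1 ignores the audio stream.
    """
    if not 0.0 <= visual_weight <= 1.0:
        raise ValueError("visual_weight must lie in [0, 1]")
    return (1.0 - visual_weight) * log_b_audio + visual_weight * log_b_visual


# Example: the same pair of stream probabilities under three weights.
log_audio = math.log(0.5)
log_visual = math.log(0.1)
scores = [combined_log_prob(log_audio, log_visual, w) for w in (0.0, 0.5, 1.0)]
```

Under this assumed form, an adaptive scheme would choose `visual_weight` per utterance or per noise condition, leaning on the visual stream as the audio degrades.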
The semi-continuous HMM is a class of models that includes both discrete and continuous mixture HMMs as special cases and unifies vector quantization (VQ), the discrete HMM, and the continuous mixture HMM. It reduces the vector quantization distortion of discrete HMMs by using continuous output probability density functions represented by a combination of the discrete output probabilities of the model and the continuous Gaussian probability density functions (pdfs) associated with each VQ symbol. The parameters of the vector quantization codebook and the HMM can be optimized together to achieve a unified modeling approach. Experimental results show that semi-continuous HMMs can significantly improve recognition performance in relatively low signal-to-noise ratio environments, with accuracy increasing by 9% on average. Modified Gaussian pdfs, which are Gaussian within two standard deviations but have "heavier" Laplacian tails, are used for the first time in the classification procedure of the semi-continuous HMM in this thesis. They are an efficient way to compensate for inadequate training of Gaussian pdfs or for differences between testing and training environments.
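A minimal one-dimensional sketch of these two ideas follows. It assumes scalar features, and it assumes the exponential tail is joined to the Gaussian core at two standard deviations so that both value and slope match; the abstract does not specify how the tails are attached or normalized, so `modified_gaussian_pdf`, `semi_continuous_emission`, and the normalizer are hypothetical choices for illustration.

```python
import math

# Switch point between Gaussian core and exponential tail, in standard
# deviations. The value 2 comes from the abstract; the matching rule is assumed.
_TAIL_START = 2.0

# Normalizer of the unnormalized shape below (assumed construction):
# Gaussian mass inside +/- c, plus the mass of the two exponential tails.
_NORM = (math.sqrt(2.0 * math.pi) * math.erf(_TAIL_START / math.sqrt(2.0))
         + (2.0 / _TAIL_START) * math.exp(-0.5 * _TAIL_START ** 2))


def modified_gaussian_pdf(x, mean, std):
    """Gaussian within two standard deviations, Laplacian-style tails beyond."""
    z = abs(x - mean) / std
    if z <= _TAIL_START:
        shape = math.exp(-0.5 * z * z)
    else:
        # Exponential tail matched in value and slope at z = _TAIL_START.
        shape = math.exp(0.5 * _TAIL_START ** 2 - _TAIL_START * z)
    return shape / (_NORM * std)


def semi_continuous_emission(x, discrete_probs, means, stds,
                             pdf=modified_gaussian_pdf):
    """Output density of one HMM state in a semi-continuous model.

    b_j(x) = sum_k b_j(k) * f(x | codeword k), where b_j(k) are the state's
    discrete output probabilities and f is the pdf attached to VQ symbol k.
    """
    return sum(b * pdf(x, m, s) for b, m, s in zip(discrete_probs, means, stds))


# Example: a state that favors codeword 0 over codeword 1.
density = semi_continuous_emission(0.3, [0.8, 0.2], [0.0, 3.0], [1.0, 1.0])
```

Because the codebook pdfs are shared across states in this sketch, only the discrete weights differ from state to state, which is what lets the codebook and the HMM parameters be re-estimated together.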
Rights
In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/ This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
DOI
10.25777/0jjj-1y69
Recommended Citation
Su, Qin. "Adaptive Integration of Audio and Visual Information Using Discrete and Semi-Continuous Hidden Markov Models in Audiovisual Automatic Speech Recognition" (1996). Master of Science (MS), Thesis, Electrical & Computer Engineering, Old Dominion University, DOI: 10.25777/0jjj-1y69
https://digitalcommons.odu.edu/ece_etds/532
Included in
Probability Commons, Signal Processing Commons, Systems and Communications Commons, Theory and Algorithms Commons