Date of Award

Summer 1985

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Electrical & Computer Engineering

Program/Concentration

Electrical Engineering

Committee Director

Stephen A. Zahorian

Committee Member

Jack Stoughton

Committee Member

Sharad V. Kanetkar

Call Number for Print

Special Collections LD4331.E55J33

Abstract

The objective of this research was to develop a transformation for mapping speech parameters to color parameters. This transformation is performed in real time, and the resulting color parameters are continuously displayed on a color monitor. This visual speech display is intended for use as a speech articulation training aid for the deaf. The conversion of acoustic speech signals into speech parameters was accomplished using special-purpose electronics. The real-time conversion of speech parameters to display parameters was controlled by an 8086/8088 microprocessor operating in an S-100 bus structure. The coefficients of the Karhunen-Loeve series expansion of speech power spectra were used to encode speech into a set of parameters called principal components. Each principal component is obtained as a linear combination of 16 spectral band energies. The focus of this research was to optimize the method for computing principal components for use with the visual speech display and to determine an optimal transformation from principal components to color parameters.
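As an illustration of how each principal component arises as a linear combination of 16 spectral band energies, the following is a minimal sketch and not the thesis implementation. It assumes the basis vectors are the leading eigenvectors of the covariance matrix of the band energies, found here by power iteration with deflation. The frames are synthetic, and all function names are hypothetical.

```python
import math
import random

NUM_BANDS = 16  # the 16 spectral band energies named in the abstract


def mean_vector(frames):
    return [sum(f[k] for f in frames) / len(frames) for k in range(NUM_BANDS)]


def covariance(frames, mean):
    cov = [[0.0] * NUM_BANDS for _ in range(NUM_BANDS)]
    for f in frames:
        c = [f[k] - mean[k] for k in range(NUM_BANDS)]
        for i in range(NUM_BANDS):
            for j in range(NUM_BANDS):
                cov[i][j] += c[i] * c[j] / len(frames)
    return cov


def mat_vec(m, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def leading_eigenpair(m, iterations=500):
    # Power iteration from a fixed pseudo-random start vector.
    rng = random.Random(1)
    v = [rng.gauss(0.0, 1.0) for _ in range(len(m))]
    for _ in range(iterations):
        w = mat_vec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(m, v)))
    return v, eigenvalue


def basis_vectors(cov, count):
    # Deflation: remove each found component before searching for the next.
    work = [row[:] for row in cov]
    basis = []
    for _ in range(count):
        v, lam = leading_eigenpair(work)
        basis.append(v)
        for i in range(NUM_BANDS):
            for j in range(NUM_BANDS):
                work[i][j] -= lam * v[i] * v[j]
    return basis


def principal_components(frame, mean, basis):
    # Each component is a weighted sum of the 16 mean-removed band energies.
    return [sum(b[k] * (frame[k] - mean[k]) for k in range(NUM_BANDS))
            for b in basis]


# Synthetic frames (an assumption): three spectral shapes with unequal
# variances plus a small amount of noise around a constant level.
rng = random.Random(0)
patterns = [[math.cos(math.pi * m * (k + 0.5) / NUM_BANDS)
             for k in range(NUM_BANDS)] for m in (1, 2, 3)]
frames = []
for _ in range(300):
    weights = [rng.gauss(0.0, s) for s in (3.0, 2.0, 1.0)]
    frames.append([20.0 + sum(w * p[k] for w, p in zip(weights, patterns))
                   + rng.gauss(0.0, 0.1) for k in range(NUM_BANDS)])

mean = mean_vector(frames)
basis = basis_vectors(covariance(frames, mean), count=3)
components = principal_components(frames[0], mean, basis)
```

The number of components retained (three here) is an assumption made for illustration; the abstract does not state how many were used.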

A series of experiments was completed to determine the principal-components basis vectors for both non-normalized and amplitude-normalized speech spectra. These basis vectors, determined from the statistical properties of the continuous speech of both male and female speakers, were found to be relatively speaker independent. In order to restrict the scope of the research to a specific objective, the transformation of speech parameters to color parameters was optimized for vowels. Clustering experiments on vowels in principal-components spaces showed that vowels are more tightly clustered when level-normalized spectral band energies are used to compute the principal-components parameters. However, implementation of a set of level-normalized spectral band energies was not feasible with the available hardware because of the requirements for real-time operation. Therefore, the transformation from vowels to colors was based on the principal-components parameters obtained from non-normalized spectral band energies, although better results are expected if level-normalized spectral band energies are used to calculate the principal components.
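The effect of level normalization can be illustrated with a short sketch. The abstract does not state how the spectral band energies were normalized; dividing each band energy by the total energy of the frame is an assumption made here, and the function name and sample values are hypothetical. Under that assumption, the same spectral shape spoken at two loudness levels maps to the same normalized vector, which is one plausible reason vowels would cluster more tightly.

```python
def level_normalize(band_energies):
    """Scale a frame so its band energies sum to 1 (assumed normalization)."""
    total = sum(band_energies)
    if total <= 0.0:
        # A silent frame carries no spectral shape; return zeros.
        return [0.0] * len(band_energies)
    return [e / total for e in band_energies]


# Hypothetical 16-band frame and a louder copy of the same spectral shape.
quiet = [1.0, 4.0, 9.0, 4.0, 1.0, 0.5, 0.5, 0.25] + [0.125] * 8
loud = [4.0 * e for e in quiet]

quiet_norm = level_normalize(quiet)
loud_norm = level_normalize(loud)
```

Both normalized frames agree band for band, whereas the raw frames differ by a factor of four in every band.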

A linear transformation was determined such that the three widely separated vowels /a/, as in "hod", /i/, as in "heed", and /u/, as in "who'd", result in the three widely separated colors red, green, and blue, respectively. A real-time flow-mode display of color patterns derived from speech sounds was implemented. A preliminary evaluation of the display indicates that many vowel sounds can be reliably identified from their visual display. Although separate transformations can be used for different speakers, a single fixed transformation appears adequate for males, females, and children.
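One way such a linear transformation could be determined is sketched below, under the assumption that each vowel is represented by three principal components. If the columns of a 3x3 matrix P hold the mean component vectors of /a/, /i/, and /u/, then the inverse of P maps those vowels to (1, 0, 0), (0, 1, 0), and (0, 0, 1), that is, to red, green, and blue. The vowel coordinates and function names are hypothetical, and clamping to the range 0 to 1 is an added assumption for display purposes.

```python
def invert_3x3(m):
    """Invert a 3x3 matrix using the adjugate and determinant."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("vowel vectors are linearly dependent")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def solve_color_transform(vowel_a, vowel_i, vowel_u):
    # Columns of P are the three vowel vectors; T = P^-1 sends them to R, G, B.
    p = [[vowel_a[r], vowel_i[r], vowel_u[r]] for r in range(3)]
    return invert_3x3(p)


def to_color(transform, components):
    rgb = [sum(row[k] * components[k] for k in range(3)) for row in transform]
    # Clamp to the displayable range (an assumption, not from the thesis).
    return [min(1.0, max(0.0, x)) for x in rgb]


# Hypothetical mean principal-component vectors for the three vowels.
vowel_a = [2.0, 0.5, -0.3]   # /a/ as in "hod"
vowel_i = [-1.0, 1.5, 0.2]   # /i/ as in "heed"
vowel_u = [-0.8, -1.2, 0.9]  # /u/ as in "who'd"

transform = solve_color_transform(vowel_a, vowel_i, vowel_u)
color_a = to_color(transform, vowel_a)
color_i = to_color(transform, vowel_i)
color_u = to_color(transform, vowel_u)
```

Other vowels fall between the three anchor points and would therefore be displayed as mixtures of the three primary colors.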

Rights

In Copyright. URI: http://rightsstatements.org/vocab/InC/1.0/ This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).

DOI

10.25777/rszk-8x55
