Statistical optimization of acoustic models for large vocabulary speech recognition

Hu, Rusheng, 1971-

Hu, Rusheng, 1971-

View/Open

public.pdf (2.065Kb)

short.pdf (12.81Kb)

research.pdf (1.041Mb)

Date

2006

Format

Thesis

Metadata

[+] Show full item record

Abstract

This dissertation investigates optimization of acoustic models in speech recognition. Two new optimization methods are proposed for phonetic decision tree (PDT) search and Hidden Markov modeling (HMM)-- the knowledge-based adaptive PDT algorithm and the HMM gradient boosting algorithm. Investigations are conducted to applying both methods to improve word error rate of the state-of-the-art speech recognition system. However, these two methods are developed in a general machine learning background and their applications are not limited to speech recognition. The HMM gradient boosting method is based on a function approximation scheme from the perspective of optimization in function space rather than the parameter space, based on the fact that the Gaussian mixture model in each HMM state is an additive model of homogeneous functions (Gaussians). It provides a new scheme which can jointly optimize model structure and parameters. Experiments are conducted on the World Street Journal (WSJ) task and good improvements on word error rate are observed. The knowledge-based adaptive PDT algorithm is developed under a trend toward knowledge-based systems and aims at optimizing the mapping from contextual phones to articulatory states by maximizing implicit usage of the phonological and phonetic information, which is presumed to be contained in large data corpus. A computational efficient algorithm is developed to incorporate this prior knowledge in PDT construction. This algorithm is evaluated on the Telehealth conversational speech recognition and significant improvement on system performance is achieved.

URI

https://doi.org/10.32469/10355/4329
https://hdl.handle.net/10355/4329

Degree

Ph. D.

Thesis Department

Computer science (MU)

Rights

OpenAccess.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License. Copyright held by author.