Statistical optimization of acoustic models for large vocabulary speech recognition

Please use this identifier to cite or link to this item: http://hdl.handle.net/10355/4329


Title: Statistical optimization of acoustic models for large vocabulary speech recognition
Author: Hu, Rusheng, 1971-
Keywords: phonetic decision tree
Date: 2006
Publisher: University of Missouri--Columbia
Abstract: This dissertation investigates the optimization of acoustic models in speech recognition. Two new optimization methods are proposed for phonetic decision tree (PDT) search and hidden Markov modeling (HMM): the knowledge-based adaptive PDT algorithm and the HMM gradient boosting algorithm. Both methods are applied to improve the word error rate of a state-of-the-art speech recognition system. Although they were developed in a general machine learning setting, their applications are not limited to speech recognition. The HMM gradient boosting method is a function approximation scheme that optimizes in function space rather than in parameter space, exploiting the fact that the Gaussian mixture model in each HMM state is an additive model of homogeneous functions (Gaussians). It provides a new scheme that can jointly optimize model structure and parameters. Experiments on the Wall Street Journal (WSJ) task show good improvements in word error rate. The knowledge-based adaptive PDT algorithm follows the trend toward knowledge-based systems and aims to optimize the mapping from contextual phones to articulatory states by making maximal implicit use of the phonological and phonetic information presumed to be contained in a large data corpus. A computationally efficient algorithm is developed to incorporate this prior knowledge into PDT construction. The algorithm is evaluated on the Telehealth conversational speech recognition task, where it achieves a significant improvement in system performance.
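The abstract's central observation is that the Gaussian mixture in each HMM state is an additive model, so it can be grown by gradient steps in function space rather than by adjusting parameters alone. The sketch below illustrates that idea with a generic functional-gradient ("boosted density estimation") step; it is not the dissertation's HMM gradient boosting algorithm. Assumptions, all hypothetical: 1-D data instead of acoustic feature vectors, no HMM state alignment, new components restricted to fixed-variance Gaussians centered on data points, and a fixed grid of mixing weights for the line search. The helper names `gauss`, `mixture_pdf`, `log_likelihood` and `boost_step` are illustrative only.

```python
import math
import random


def gauss(x, mu, var):
    """1-D Gaussian density N(x; mu, var)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, comps):
    """Additive model p(x) = sum_k w_k * N(x; mu_k, var_k)."""
    return sum(w * gauss(x, mu, var) for w, mu, var in comps)


def log_likelihood(data, comps):
    return sum(math.log(mixture_pdf(x, comps)) for x in data)


def boost_step(data, comps, var=0.5, alphas=(0.05, 0.1, 0.2, 0.3, 0.5)):
    """One functional-gradient step in density (function) space.

    Hypothetical helper. The derivative of sum_i log p(x_i) with respect to
    the value p(x_i) is 1 / p(x_i), so the new Gaussian g is the candidate
    (mean at a data point, fixed variance `var`) that maximises
    sum_i g(x_i) / p(x_i). Its weight alpha is then chosen by a line search
    on the log-likelihood of p_new = (1 - alpha) * p_old + alpha * g.
    """
    inv_p = [1.0 / mixture_pdf(x, comps) for x in data]
    best_mu = max(
        data,
        key=lambda c: sum(gauss(x, c, var) * r for x, r in zip(data, inv_p)),
    )
    best, best_ll = comps, log_likelihood(data, comps)
    for a in alphas:
        # Rescale old weights and append the new component: weights still sum to 1.
        cand = [((1.0 - a) * w, m, v) for w, m, v in comps] + [(a, best_mu, var)]
        ll = log_likelihood(data, cand)
        if ll > best_ll:
            best, best_ll = cand, ll
    return best  # unchanged if no alpha improved the likelihood


if __name__ == "__main__":
    random.seed(0)
    data = [random.gauss(-2.0, 0.5) for _ in range(200)] + \
           [random.gauss(3.0, 0.7) for _ in range(200)]
    mean = sum(data) / len(data)
    var0 = sum((x - mean) ** 2 for x in data) / len(data)
    model = [(1.0, mean, var0)]  # start from a single maximum-likelihood Gaussian
    for step in range(3):
        model = boost_step(data, model)
        print(step, len(model), round(log_likelihood(data, model), 1))
```

Each accepted step adds one component (a structural change) and rescales the existing weights (a parameter change), which is one simple way to see how structure and parameters can move together. A practical system could follow each step with EM re-estimation of all components; this sketch omits that.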
URI: http://hdl.handle.net/10355/4329
Other Identifiers: HuR-112706-D5446