dc.contributor.advisor | Zhao, Yunxin | eng |
dc.contributor.author | Zhang, Xiaojia, 1977- | eng |
dc.date.issued | 2005 | eng |
dc.date.submitted | 2005 Fall | eng |
dc.description | The entire dissertation/thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file (which also appears in the research.pdf); a non-technical general description, or public abstract, appears in the public.pdf file. | eng |
dc.description | Title from title screen of research.pdf file viewed on (January 11, 2007) | eng |
dc.description | Includes bibliographical references. | eng |
dc.description | Vita. | eng |
dc.description | Thesis (M.S.) University of Missouri-Columbia 2005. | eng |
dc.description | Dissertations, Academic -- University of Missouri--Columbia -- Computer science. | eng |
dc.description.abstract | Standard statistical n-gram language models play a critical and indispensable role in automatic speech recognition (ASR) applications. Though helpful to ASR, they suffer from a practical problem when there is insufficient in-domain training data, that is, data that come from the same or similar sources as the task text. To improve language model performance, various datasets need to be used to supplement the in-domain training data. This thesis investigates effective approaches to language modeling for telehealth, which consists of doctor-patient conversational speech in a medical specialty domain. Efforts were made to collect and analyze various datasets for training, as well as to find a method for modeling the target language. By effectively defining word classes, and by combining class and word trigram language models trained separately on in-domain and out-of-domain datasets, large improvements in perplexity reduction were achieved over a baseline word trigram language model that simply interpolates word trigram models trained on different data sources. | eng |
dc.identifier.merlin | b57501750 | eng |
dc.identifier.uri | http://hdl.handle.net/10355/4245 | |
dc.language | English | eng |
dc.publisher | University of Missouri--Columbia | eng |
dc.relation.ispartofcommunity | University of Missouri--Columbia. Graduate School. Theses and Dissertations | eng |
dc.subject.lcsh | Automatic speech recognition | eng |
dc.subject.lcsh | Medical telematics -- Mathematical models | eng |
dc.title | Language modeling for automatic speech recognition in telehealth | eng |
dc.type | Thesis | eng |
thesis.degree.discipline | Computer science (MU) | eng |
thesis.degree.grantor | University of Missouri--Columbia | eng |
thesis.degree.level | Masters | eng |
thesis.degree.name | M.S. | eng |