
dc.contributor.advisor: Zhao, Yunxin [eng]
dc.contributor.author: Zhang, Xiaojia, 1977- [eng]
dc.date.issued: 2005 [eng]
dc.date.submitted: 2005 Fall [eng]
dc.description: The entire dissertation/thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file (which also appears in the research.pdf); a non-technical general description, or public abstract, appears in the public.pdf file. [eng]
dc.description: Title from title screen of research.pdf file viewed on (January 11, 2007) [eng]
dc.description: Includes bibliographical references. [eng]
dc.description: Vita. [eng]
dc.description: Thesis (M.S.) University of Missouri-Columbia 2005. [eng]
dc.description: Dissertations, Academic -- University of Missouri--Columbia -- Computer science. [eng]
dc.description.abstract: Standard statistical n-gram language models play a critical and indispensable role in automatic speech recognition (ASR) applications. Though helpful to ASR, they suffer from a practical problem when there is insufficient in-domain training data, that is, data from the same or similar sources as the task text. To improve language model performance, various datasets need to be used to supplement the in-domain training data. This thesis investigates effective approaches to language modeling for telehealth, which consists of doctor-patient conversational speech in a medical specialty domain. Efforts were made to collect and analyze various datasets for training, as well as to find a method for modeling the target language. By effectively defining word classes, and by combining class and word trigram language models trained separately from in-domain and out-of-domain datasets, large reductions in perplexity were achieved over a baseline word trigram language model that simply interpolates word trigram models trained from different data sources. [eng]
dc.identifier.merlin: b57501750 [eng]
dc.identifier.uri: http://hdl.handle.net/10355/4245
dc.language: English [eng]
dc.publisher: University of Missouri--Columbia [eng]
dc.relation.ispartofcommunity: University of Missouri--Columbia. Graduate School. Theses and Dissertations [eng]
dc.subject.lcsh: Automatic speech recognition [eng]
dc.subject.lcsh: Medical telematics -- Mathematical models [eng]
dc.title: Language modeling for automatic speech recognition in telehealth [eng]
dc.type: Thesis [eng]
thesis.degree.discipline: Computer science (MU) [eng]
thesis.degree.grantor: University of Missouri--Columbia [eng]
thesis.degree.level: Masters [eng]
thesis.degree.name: M.S. [eng]

