    Selecting data for multilingual multi-domain neural machine translation on low resource languages

    Graham, Heather Mackenzie
    View/Open
    [PDF] GrahamHeatherRelease.pdf (6.599Mb)
    Date
    2020
    Format
    Thesis
    Abstract
    [ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT REQUEST OF AUTHOR.] While machine translation has achieved impressive results on the world's most widely spoken languages, thousands of languages lack the quantity of data necessary to train a state-of-the-art system. We propose a technique to identify the best available datasets for augmentation in many-to-one multilingual neural machine translation systems by quantifying the factors that most affect translation performance: dataset domain, relation between source-side languages, translation quality, and dataset size. Previous research has considered these factors qualitatively and in isolation from each other, but selecting an augmenting dataset from various possibilities requires a quantitative synthesis of all of them. We evaluate a number of techniques to measure each factor and learn a system that combines them. The focus is on the Luyia languages of western Kenya as a case study for an extreme low-resource scenario, but the application of these techniques to similar languages is also explored.
    URI
    https://hdl.handle.net/10355/78189
    Degree
    M.S.
    Thesis Department
    Computer science (MU)
    Rights
    Access to files is limited to the campuses of the University of Missouri
    This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License. Copyright held by author.
    Collections
    • Computer Science electronic theses and dissertations (MU)
    • 2020 MU theses - Access restricted to UM

    hosted by University of Missouri Library Systems
     

     

