    Semantic Frameworks for Document and Ontology Clustering

    Tong, Tuanjie
    View/Open
    [PDF] TongSemFraDocOnt.pdf (3.522Mb)
    Date
    2011-01-20
    2010
    Fall 2010
    Format
    Thesis
    Abstract
    The Internet has made it possible, in principle, for scientists to quickly find research papers of interest. In practice, the overwhelming volume of publications makes this a time-consuming task. It is therefore important to develop efficient ways to identify related publications. Clustering, a technique used in many fields, is one way to facilitate this. Ontologies can also help in addressing the problem of finding related entities, including research publications. However, the development of new clustering methods has focused mainly on the algorithm per se, with relatively less emphasis on feature selection and similarity measures. These choices can significantly impact both the accuracy and the runtime of clustering. Also, to fully realize the high-resolution searches that ontologies can make possible, an important first step is to find automatic ways to cluster related ontologies.

    The major contribution of this dissertation is an innovative semantic framework for document clustering, called Citonomy, a dynamic approach that (1) exploits citation semantics of scientific documents, (2) deals with evolving datasets of documents, and (3) addresses the interplay between algorithms, feature selection, and similarity measures in an integrated manner. This improves accuracy and runtime performance over existing clustering algorithms. As the first step in Citonomy, we propose a new approach to extract and build a model for citation semantics. Both subjective and objective evaluations show the effectiveness of this model in extracting citation semantics. For the clustering stage, the Citonomy framework offers two approaches: (1) CS-VS: Combining Citation Semantics and VSM (Vector Space Model) Measures and (2) CS2CS: From Citation Semantics to Cluster Semantics. CS2CS is a document clustering algorithm with a 3-level feature selection process. It improves on CS-VS in several respects: (i) eliminating the need for a training step, (ii) introducing an advanced feature selection mechanism, and (iii) supporting dynamic and adaptive clustering of new datasets.

    Compared to traditional document clustering, CS-VS and CS2CS significantly improve the accuracy of clustering, by 5-15% on average in terms of the F-Measure. CS2CS is a linear clustering algorithm that is faster than the common document clustering algorithms K-Means and K-Medoids. In addition, it overcomes a major drawback of the K-Means and K-Medoids algorithms in that the number of clusters can be determined dynamically by splitting and merging clusters. Fuzzy clustering with this approach has also been investigated. The related problem of ontology clustering is also addressed in this dissertation. Another semantic framework, InterOBO, has been designed for ontology clustering, and a prototype has been developed to demonstrate its potential use. The Open Biomedical Ontologies (OBOs) are used as a case study to illustrate the clustering technique used to identify common concepts and links. Detailed experimental results on different datasets are given to show the merits of the proposed clustering algorithms.
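    The abstract reports accuracy gains in terms of the F-Measure but does not say which variant is used. As a rough illustration only, the sketch below computes one common class-weighted form of the clustering F-measure: for each reference class, take the best F-score over all clusters, then average those scores weighted by class size. The function name `clustering_f_measure` and its arguments are hypothetical and are not taken from the dissertation.

```python
from collections import Counter


def clustering_f_measure(true_labels, cluster_ids):
    """Class-weighted F-measure of a flat clustering against reference classes.

    Assumption: this is one widely used variant; the dissertation may
    define its F-Measure differently.
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("label lists must have the same length")
    n = len(true_labels)
    if n == 0:
        raise ValueError("need at least one document")

    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_ids)
    # Number of documents shared by each (class, cluster) pair.
    overlap = Counter(zip(true_labels, cluster_ids))

    total = 0.0
    for cls, cls_size in class_sizes.items():
        best = 0.0
        for clu, clu_size in cluster_sizes.items():
            common = overlap[(cls, clu)]
            if common == 0:
                continue
            precision = common / clu_size
            recall = common / cls_size
            f_score = 2 * precision * recall / (precision + recall)
            best = max(best, f_score)
        # Weight each class by its share of the documents.
        total += (cls_size / n) * best
    return total
```

    Under this definition, a clustering that reproduces the reference classes exactly scores 1.0, and placing every document in a single cluster is penalized through low precision.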
    Table of Contents
    Abstract -- List of Illustrations -- List of Tables -- Acknowledgments -- Introduction -- Review of Literature -- Overall Framework - Citonomy -- CS-VS - Combining Citation Semantics and VSM Measures -- CS2CS - From Citation Semantics to Cluster Semantics -- InterOBO: A Framework for Knowledge Sharing in Biomedical Domain -- Experimental Results and Discussion -- Summary and Future Work -- Appendix -- Reference List -- Vita.
    URI
    http://hdl.handle.net/10355/9618
    Degree
    Ph.D.
    Thesis Department
    Computer Science (UMKC)
    Telecommunications and Computer Networking (UMKC)
    Collections
    • 2010 UMKC Dissertations - Freely Available Online
    • Computer Science and Electrical Engineering Electronic Theses and Dissertations (UMKC)

    If you encounter harmful or offensive content or language on this site please email us at harmfulcontent@umkc.edu. To learn more read our Harmful Content in Library and Archives Collections Policy.

    Send Feedback
    hosted by University of Missouri Library Systems
     

     

