dc.contributor.advisor: Lee, Yugyung, 1960- [eng]
dc.contributor.author: Tong, Tuanjie [eng]
dc.date.issued: 2011-01-20 [eng]
dc.date.issued: 2010 [eng]
dc.date.issued: Fall 2010 [eng]
dc.description: Title from PDF of title page, viewed on January 20, 2011. [eng]
dc.description: Dissertation advisor: Yugyung Lee. [eng]
dc.description: Vita. [eng]
dc.description: Includes bibliographic references (pages 194-202). [eng]
dc.description: Dissertation (Ph.D.)--School of Computing and Engineering. University of Missouri--Kansas City, 2010. [eng]
dc.description.abstract: The Internet has made it possible, in principle, for scientists to quickly find research papers of interest. In practice, the overwhelming volume of publications makes this a time-consuming task. It is, therefore, important to develop efficient ways to identify related publications. Clustering, a technique used in many fields, is one way to facilitate this. Ontologies can also help in finding related entities, including research publications. However, the development of new clustering methods has focused mainly on the algorithm per se, with relatively less emphasis on feature selection and similarity measures. The latter can significantly affect both the accuracy and the runtime of clustering. Also, to fully realize the high-resolution searches that ontologies make possible, an important first step is to find automatic ways to cluster related ontologies. The major contribution of this dissertation is an innovative semantic framework for document clustering, called Citonomy, a dynamic approach that (1) exploits the citation semantics of scientific documents, (2) handles evolving datasets of documents, and (3) addresses the interplay between algorithms, feature selection, and similarity measures in an integrated manner. This improves accuracy and runtime performance over existing clustering algorithms. As the first step in Citonomy, we propose a new approach to extract and build a model for citation semantics. Both subjective and objective evaluations demonstrate the effectiveness of this model in extracting citation semantics. For the clustering stage, the Citonomy framework offers two approaches: (1) CS-VS: Combining Citation Semantics and VSM (Vector Space Model) Measures and (2) CS2CS: From Citation Semantics to Cluster Semantics. CS2CS is a document clustering algorithm with a three-level feature selection process.
It improves on CS-VS in several respects: i) it eliminates the need for a training step, ii) it introduces an advanced feature selection mechanism, and iii) it clusters new datasets dynamically and adaptively. Compared to traditional document clustering, CS-VS and CS2CS significantly improve clustering accuracy, by 5-15% on average in terms of the F-Measure. CS2CS is a linear clustering algorithm that is faster than the common document clustering algorithms K-Means and K-Medoids. In addition, it overcomes a major drawback of the K-Means/K-Medoids algorithms: the number of clusters can be determined dynamically by splitting and merging clusters. Fuzzy clustering with this approach has also been investigated. The related problem of ontology clustering is also addressed in this dissertation. Another semantic framework, InterOBO, has been designed for ontology clustering, and a prototype demonstrating its potential use has been developed. The Open Biomedical Ontologies (OBOs) are used as a case study to illustrate the clustering technique used to identify common concepts and links. Detailed experimental results on different datasets are given to show the merits of the proposed clustering algorithms. [eng]
dc.description.tableofcontents: Abstract -- List of Illustrations -- List of Tables -- Acknowledgments -- Introduction -- Review of Literature -- Overall Framework - Citonomy -- CS-VS - Combining Citation Semantics and VSM Measures -- CS2CS - From Citation Semantics to Cluster Semantics -- InterOBO: A Framework for Knowledge Sharing in Biomedical Domain -- Experimental Results and Discussion -- Summary and Future Work -- Appendix -- Reference List -- Vita. [eng]
dc.format.extent: xiv, 203 pages [eng]
dc.identifier.uri: http://hdl.handle.net/10355/9618 [eng]
dc.publisher: University of Missouri--Kansas City [eng]
dc.subject: Citation semantics [eng]
dc.subject.lcsh: Semantics -- Data processing [eng]
dc.subject.lcsh: Document clustering [eng]
dc.subject.lcsh: Ontologies (Information retrieval) [eng]
dc.subject.lcsh: Semantic integration (Computer systems) [eng]
dc.subject.lcsh: Semantic networks (Information theory) [eng]
dc.subject.lcsh: Cluster analysis -- Data processing [eng]
dc.subject.other: Dissertation -- University of Missouri--Kansas City -- Computer science. [eng]
dc.title: Semantic Frameworks for Document and Ontology Clustering [eng]
dc.type: Thesis [eng]
thesis.degree.discipline: Computer Science (UMKC) [eng]
thesis.degree.discipline: Telecommunications and Computer Networking (UMKC) [eng]
thesis.degree.grantor: University of Missouri--Kansas City [eng]
thesis.degree.level: Doctoral [eng]
thesis.degree.name: Ph.D. [eng]
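The abstract reports accuracy gains "in terms of the F-Measure." This record does not include the dissertation's evaluation code. The sketch below therefore assumes the class-weighted clustering F-Measure that is common in the document-clustering literature. For each true class it takes the best F score over all clusters, then weights that score by the class's share of documents. The function name `clustering_f_measure` is hypothetical.

```python
from collections import Counter


def clustering_f_measure(labels_true, labels_pred):
    """Class-weighted clustering F-Measure (assumed definition, not taken from the dissertation).

    For every true class i, compute F(i, j) against every cluster j,
    keep the best one, and weight it by n_i / n.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have equal length")
    n = len(labels_true)
    if n == 0:
        return 0.0
    class_sizes = Counter(labels_true)                # n_i
    cluster_sizes = Counter(labels_pred)              # n_j
    joint = Counter(zip(labels_true, labels_pred))    # n_ij

    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = joint.get((cls, clu), 0)
            if n_ij == 0:
                continue  # no overlap means F = 0, so this pair cannot improve `best`
            precision = n_ij / n_j
            recall = n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_i / n) * best
    return total


# Two true classes of three documents each. One "a" document is misplaced into the "b" cluster.
score = clustering_f_measure(["a", "a", "a", "b", "b", "b"], [0, 0, 1, 1, 1, 1])
print(round(score, 4))  # 0.8286 (exactly 29/35)
```

On this definition, a perfect clustering scores 1.0. A "5-15%" gain would then mean a difference of that size in this averaged score, assuming the dissertation used the same variant.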
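CS-VS is described as combining citation semantics with VSM (Vector Space Model) measures. This record does not say how the two are modeled or weighted, so the sketch below is only an illustration under explicit assumptions. Text similarity is plain cosine similarity over sparse term-weight vectors. A simple reference-overlap (Jaccard) score stands in for citation semantics; it is not the dissertation's citation-semantics model. A hypothetical weight `alpha` blends the two scores linearly. The names `cosine`, `citation_overlap`, `combined_similarity`, and `alpha` are illustrative only.

```python
import math


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """VSM cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def citation_overlap(refs_a: set[str], refs_b: set[str]) -> float:
    """Hypothetical stand-in for citation semantics: Jaccard overlap of cited references."""
    union = refs_a | refs_b
    return len(refs_a & refs_b) / len(union) if union else 0.0


def combined_similarity(vec_a, vec_b, refs_a, refs_b, alpha=0.5):
    """Linear blend of text and citation similarity; `alpha` is an assumed weight."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * cosine(vec_a, vec_b) + (1 - alpha) * citation_overlap(refs_a, refs_b)


doc_a = {"ontology": 1.0, "cluster": 1.0}
doc_b = {"cluster": 1.0, "citation": 1.0}
sim = combined_similarity(doc_a, doc_b, {"r1", "r2"}, {"r2", "r3"})
print(round(sim, 4))  # cosine 0.5, overlap 1/3, blended at alpha=0.5 gives 0.4167
```

A linear blend keeps both signals on the same [0, 1] scale, so the result can be used in any similarity-based clusterer. Choosing the weight, or replacing the overlap score with a richer citation-context model, is exactly the kind of design decision the abstract says the dissertation treats in an integrated way.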

