Ontology-based methods for disease similarity estimation and drug repositioning
Abstract
Human genome sequencing and new techniques for generating biological data have provided an opportunity to uncover the mechanisms of human disease. Using gene-disease data, recent research has increasingly shown that many seemingly dissimilar diseases share similar or common molecular mechanisms. Understanding the similarity between diseases aids early diagnosis and the development of new drugs. The growing collection of gene-function and gene-disease data has created a need for formal knowledge representation from which information can be extracted. Ontologies have been successfully applied to represent such knowledge, and data mining techniques have been applied to them to extract information. Informatics methods can be used with ontologies to find similarities between diseases, which can yield insight into their causes. This could lead to therapies that address the causes of diseases rather than merely treating their symptoms.

Estimating disease similarity solely on the basis of shared genes can be misleading, because different combinations of genes may be associated with similar diseases, especially complex diseases. This deficiency can potentially be overcome by looking for common or similar biological processes rather than only explicit gene matches between diseases. Using semantic similarity between biological processes to estimate disease similarity could improve the identification and characterization of disease similarity, in addition to identifying novel biological processes involved in the diseases.

Also, if diseases have similar molecular mechanisms, then drugs currently in use could potentially be used against diseases beyond their original indications. This could greatly benefit patients with diseases that lack adequate therapies, especially people with rare diseases. It could also substantially reduce healthcare costs, since developing new drugs is far more expensive than reusing existing ones.
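The argument that shared processes can reveal similarity that shared genes miss can be illustrated with a toy example. Two hypothetical diseases share no genes, yet their genes are annotated to overlapping biological processes. This simplified sketch uses exact set overlap (the Jaccard index) on gene sets and on process sets; it does not implement the semantic similarity between processes described here, and all disease, gene, and process names are invented for illustration.

```python
# Hypothetical toy data: gene sets for two diseases.
disease_genes = {
    "disease_A": {"G1", "G2"},
    "disease_B": {"G3", "G4"},
}
# Hypothetical gene -> biological process annotations.
gene_processes = {
    "G1": {"apoptotic process"},
    "G2": {"inflammatory response"},
    "G3": {"apoptotic process"},
    "G4": {"inflammatory response", "cell migration"},
}

def jaccard(a: set, b: set) -> float:
    """Jaccard index |a & b| / |a | b|; returns 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def processes_of(disease: str) -> set:
    """Union of the processes annotated to any gene associated with the disease."""
    return set().union(*(gene_processes.get(g, set()) for g in disease_genes[disease]))

# Shared genes: none. Shared processes: two of three.
gene_sim = jaccard(disease_genes["disease_A"], disease_genes["disease_B"])
process_sim = jaccard(processes_of("disease_A"), processes_of("disease_B"))
print(f"gene overlap: {gene_sim:.2f}, process overlap: {process_sim:.2f}")
```

A gene-only comparison scores these diseases as unrelated, while the process-level comparison does not. Replacing exact process matching with a semantic similarity between process terms would additionally credit processes that are related but not identical.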
In this research, we present functions to measure similarity between terms in an ontology, and between entities annotated with terms drawn from that ontology, based on both co-occurrence and information content. The new similarity measure is shown to outperform existing methods when evaluated using biological pathways. The measure is then used to estimate similarity among diseases from the biological processes involved in them, and it is evaluated against a manually curated dataset and external datasets of known disease similarities. Further, we use ontologies to encode diseases, drugs and biological processes, and we demonstrate a method that uses a network-based algorithm to combine biological data about diseases with drug information to find new uses for existing drugs. The effectiveness of the method is demonstrated by comparing the predicted new disease-drug pairs with existing clinical trials of drugs.
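The abstract does not give the formula for the proposed measure, which combines co-occurrence with information content. As a point of reference only, this sketch implements a standard information-content approach: Resnik similarity between two ontology terms, and a best-match-average aggregation for entities annotated with sets of terms. It is a swapped-in textbook technique, not the thesis's measure: the co-occurrence component is omitted, and the toy ontology, annotations, and names (`PARENTS`, `ANNOTATIONS`, `resnik`, `best_match_average`) are hypothetical.

```python
import math

# Hypothetical toy ontology: term -> set of is_a parents (a DAG rooted at "root").
PARENTS = {
    "root": set(),
    "metabolism": {"root"},
    "signaling": {"root"},
    "lipid_metabolism": {"metabolism"},
    "glucose_metabolism": {"metabolism"},
    "insulin_signaling": {"signaling", "glucose_metabolism"},
}

# Hypothetical annotations: entity (e.g. a disease) -> directly annotated terms.
ANNOTATIONS = {
    "disease_A": {"lipid_metabolism", "insulin_signaling"},
    "disease_B": {"glucose_metabolism"},
    "disease_C": {"signaling"},
}

def ancestors(term):
    """Return the term together with all of its ancestors in the DAG."""
    result, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result

def information_content():
    """IC(t) = -log p(t), where p(t) is the share of entities annotated to t or a descendant."""
    counts = {t: 0 for t in PARENTS}
    for terms in ANNOTATIONS.values():
        # Propagate annotations upward; each term counts at most once per entity.
        for t in set().union(*(ancestors(x) for x in terms)):
            counts[t] += 1
    n = len(ANNOTATIONS)
    return {t: -math.log(c / n) for t, c in counts.items() if c > 0}

IC = information_content()

def resnik(t1, t2):
    """Resnik similarity: IC of the most informative common ancestor.
    Terms with no annotations are treated as uninformative (IC 0.0), a simplification."""
    common = ancestors(t1) & ancestors(t2)  # always contains "root"
    return max(IC.get(t, 0.0) for t in common)

def best_match_average(e1, e2):
    """Entity similarity: average over both entities of each term's best match in the other."""
    a, b = ANNOTATIONS[e1], ANNOTATIONS[e2]
    best_a = [max(resnik(x, y) for y in b) for x in a]
    best_b = [max(resnik(y, x) for x in a) for y in b]
    return (sum(best_a) + sum(best_b)) / (len(a) + len(b))

for pair in [("disease_A", "disease_B"), ("disease_A", "disease_C"), ("disease_B", "disease_C")]:
    print(pair, round(best_match_average(*pair), 3))
```

Best-match average is one common way to lift a term-level score to annotated entities; alternatives such as taking the maximum or averaging all term pairs trade off sensitivity to a single strong match against dilution by unrelated annotations.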
Table of Contents
Introduction and motivation
Ontologies in biomedical domain
Methods to compute ontological similarity
Proposed approach for ontological term similarity
Augmentation of vocabulary and annotation in ontologies
Estimation of disease similarity
Use of ontologies for drug repositioning
Future directions: perspective from the pharmaceutical industry
Appendix 1. Table for the ontological similarity scores
Appendix 2. Test set of 200 records for evaluating mapping of disease text to Disease Ontology
Appendix 3. Curated set of disease similarities used as the benchmark set
Appendix 4. F-scores for different combinations of Score-Pvalues and GO-Process-Pvalues for PSB estimates of disease similarity
Appendix 5. Test set formed from opinions of medical residents (http://rxinformatics.umn.edu/SemanticRelatednessResources.html)
Appendix 6. Drug repositioning candidates
Degree
Ph.D.