Improving disease gene prioritization using the semantic similarity of Gene Ontology terms

Bioinformatics. 2010 Sep 15;26(18):i561-7. doi: 10.1093/bioinformatics/btq384.

Abstract

Motivation: Many hereditary human diseases are polygenic, resulting from sequence alterations in multiple genes. Genomic linkage and association studies are commonly performed to identify disease-related genes. Such studies often yield lists of up to several hundred candidate genes, which must then be prioritized and validated. Recent studies found that genes involved in phenotypically similar diseases are often functionally related at the molecular level.

Results: Here, we introduce MedSim, a novel approach for ranking candidate genes for a particular disease based on functional comparisons involving the Gene Ontology. MedSim uses functional annotations of known disease genes to assess both the similarity of diseases and the disease relevance of candidate genes. We benchmarked our approach with genes known to be involved in 99 diseases taken from the OMIM database. Using artificial quantitative trait loci, MedSim achieved excellent performance, with an area under the ROC curve of up to 0.90 and a sensitivity of over 70% at 90% specificity when classifying gene products according to their disease relatedness. This performance is comparable to, or even better than, that of related methods in the field, although MedSim uses less information, which is therefore more easily accessible.
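
The general idea of ranking candidates by functional similarity to known disease genes, and of scoring the ranking with ROC-based measures, can be illustrated with a minimal sketch. It is not the MedSim implementation: the abstract does not specify the semantic similarity measure, so a plain Jaccard overlap of GO term sets is swapped in as a stand-in, and all function names, gene names, and GO identifiers below are hypothetical placeholders.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two GO term sets (stand-in for a semantic similarity measure)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def score_candidate(candidate_terms: Set[str],
                    disease_gene_terms: List[Set[str]]) -> float:
    """Best similarity between a candidate and any known disease gene."""
    return max((jaccard(candidate_terms, t) for t in disease_gene_terms),
               default=0.0)


def rank_candidates(candidates: Dict[str, Set[str]],
                    disease_gene_terms: List[Set[str]]) -> List[Tuple[str, float]]:
    """Sort candidate genes by descending similarity score."""
    scored = [(gene, score_candidate(terms, disease_gene_terms))
              for gene, terms in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """AUC as the probability that a positive outscores a negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_at_specificity(scores: List[float], labels: List[int],
                               specificity: float = 0.9) -> float:
    """Highest sensitivity over thresholds that reach the target specificity."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    best = 0.0
    for threshold in sorted(set(scores)):
        # A gene is predicted disease-related when its score >= threshold.
        spec = sum(1 for n in neg if n < threshold) / len(neg)
        if spec >= specificity:
            sens = sum(1 for p in pos if p >= threshold) / len(pos)
            best = max(best, sens)
    return best


# Toy data: placeholder GO identifiers, not real annotations.
known_disease_genes = [{"GO:A", "GO:B"}, {"GO:B", "GO:C", "GO:F"}]
candidates = {
    "geneA": {"GO:A", "GO:B", "GO:C"},
    "geneB": {"GO:C", "GO:D"},
    "geneC": {"GO:E"},
}
ranking = rank_candidates(candidates, known_disease_genes)
```

In this sketch a candidate is scored by its best match to any known disease gene; averaging over the known genes would be an equally plausible choice, and the abstract does not say which aggregation MedSim uses.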

Availability: MedSim is offered as part of our FunSimMat web service (http://www.funsimmat.de).

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Benchmarking
  • Computational Biology / methods
  • Databases, Genetic
  • Genes
  • Genetic Diseases, Inborn / genetics*
  • Humans
  • Mice
  • Multifactorial Inheritance
  • Proteins / genetics
  • Semantics
  • Software*

Substances

  • Proteins