"PubGene" is the result of a study into whether data on gene-gene interactions could be "mined" from gene names found in journal article abstracts in the Medline literature database. The basic idea is that if two or more genes are mentioned in the same article (in practice, in its abstract or keyword list), there is a high probability that they have a meaningful biological interaction. The resulting database of connections was tested against several experimental data sets: it correctly predicted 51% of the interactions contained in the Database of Interacting Proteins and 45% of the interactions contained in Online Mendelian Inheritance in Man. Although these rates may not seem very reliable in practical terms, statistically the result is very noteworthy. The PubGene data was also compared to gene expression data from DNA microarray experiments, and correctly predicted a number of gene interactions from different human
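The co-occurrence idea and the evaluation against known interactions can be sketched in a few lines. This is a minimal illustration under stated assumptions, not PubGene's actual pipeline: it assumes a plain, case-sensitive whole-word match against a supplied list of gene names, and it measures "correctly predicted" as the fraction of known interacting pairs that also appear as co-occurrence edges. The function names, gene list, abstracts and "known" pairs are all hypothetical toy data.

```python
import re
from itertools import combinations


def find_genes(text, gene_names):
    """Return the gene names that occur as whole words in text (case-sensitive)."""
    return {
        name
        for name in gene_names
        if re.search(r"\b" + re.escape(name) + r"\b", text)
    }


def cooccurrence_network(abstracts, gene_names):
    """Count, for each unordered gene pair, how many abstracts mention both genes."""
    edges = {}
    for text in abstracts:
        # Sort so each pair has one canonical orientation, e.g. ("MDM2", "TP53").
        genes = sorted(find_genes(text, gene_names))
        for pair in combinations(genes, 2):
            edges[pair] = edges.get(pair, 0) + 1
    return edges


def recall(edges, known_pairs):
    """Fraction of known interacting pairs that also appear as network edges."""
    known = {tuple(sorted(pair)) for pair in known_pairs}
    if not known:
        return 0.0
    return sum(1 for pair in known if pair in edges) / len(known)


# Toy data for illustration only (not taken from Medline or any real database).
GENES = ["TP53", "MDM2", "CDKN1A", "BRCA1"]
ABSTRACTS = [
    "TP53 is negatively regulated by MDM2.",
    "MDM2 levels rose while CDKN1A was unchanged.",
    "BRCA1 mutations were screened in a patient cohort.",
]
KNOWN = [("TP53", "MDM2"), ("BRCA1", "TP53")]

network = cooccurrence_network(ABSTRACTS, GENES)
print(network)                  # {('MDM2', 'TP53'): 1, ('CDKN1A', 'MDM2'): 1}
print(recall(network, KNOWN))   # 0.5
```

Counting how many abstracts share each pair, rather than recording a bare yes/no link, leaves room to weight or threshold connections later; the abstract that mentions only one gene contributes no edge.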
The authors of the study openly acknowledge that PubGene currently has limited practical use. Current sources of error include inconsistent gene nomenclature (the same name referring to different genes, and different names for the same gene) and the fact that not all Medline abstracts and keyword lists contain gene names.
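The nomenclature problem can be made concrete with a small alias table. The sketch below is a hypothetical illustration, not a method described in the study: it assumes a hand-made table mapping each alias to the canonical symbol(s) it may denote, merges synonyms onto one symbol, and sets aside aliases that map to more than one gene instead of guessing. The table entries and the `normalize_mentions` name are illustrative assumptions.

```python
import re

# Toy alias table, for illustration only: each alias maps to the canonical
# gene symbol(s) it may denote. An alias with more than one symbol is ambiguous.
ALIASES = {
    "TP53": {"TP53"},
    "p53": {"TP53"},
    "p21": {"CDKN1A", "HRAS"},
}


def normalize_mentions(text, aliases):
    """Split gene mentions in text into resolved canonical symbols and ambiguous aliases."""
    resolved, ambiguous = set(), set()
    for alias, symbols in aliases.items():
        if re.search(r"\b" + re.escape(alias) + r"\b", text):
            if len(symbols) == 1:
                resolved |= symbols   # synonyms collapse onto one canonical symbol
            else:
                ambiguous.add(alias)  # cannot be resolved without more context
    return resolved, ambiguous


print(normalize_mentions("TP53 and p53 were both detected.", ALIASES))
print(normalize_mentions("p53 induces p21 after DNA damage.", ALIASES))
```

In the first call the two synonyms yield a single gene, so no spurious self-pair is created; in the second, the ambiguous alias is reported separately, since linking it to the wrong gene would add a false connection to the network.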
Masys, D.R. (2001) Linking microarray data to the literature. Nature Genetics 28(1): 9-10.
Jenssen, T.-K. et al. (2001) A literature network of human genes for high-throughput analysis of gene expression. Nature Genetics 28(1): 21-28.