Top: Table of Contents | Next: Simple String Comparison by Hamming Distance

I-A. Background: Genomics vs. Proteomics

Genomics: "The study of an organism's genome and the use of its genes" [1]

Proteomics: "The study of the full set of proteins encoded by a genome" [2]

In terms of data analysis, the two fields share some problems, such as string comparison. A gene or protein can be represented on a computer as a string of characters, with each character standing for one chemical component of the actual chain (bases in the case of genes, amino acids in the case of proteins).
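As a minimal sketch of this string representation (the variable names, the helper is_valid, and the two sequence fragments are made up for illustration; the protein alphabet assumes the usual one-letter codes for the twenty standard amino acids):

```python
# Alphabets: the four DNA bases and the twenty standard amino acids
# (one-letter codes). These names are illustrative, not from any library.
DNA_ALPHABET = set("ACGT")
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")


def is_valid(sequence, alphabet):
    """Return True if every character of sequence belongs to alphabet."""
    return all(ch in alphabet for ch in sequence)


# Hypothetical fragments, stored as ordinary strings.
gene_fragment = "ATGGCCATTGTA"
protein_fragment = "MAIV"

print(is_valid(gene_fragment, DNA_ALPHABET))         # a DNA string over A, C, G, T
print(is_valid(protein_fragment, PROTEIN_ALPHABET))  # a protein string over 20 letters
print(is_valid(protein_fragment, DNA_ALPHABET))      # "M" is not a DNA base
```

Because both kinds of sequence are plain strings, the same comparison routines can in principle be applied to either; only the alphabet differs.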

However, proteomics is far more complex, for several reasons. For starters, the alphabet used to represent the bases in a gene contains four characters (A, G, C, T), while the alphabet used to represent the amino acids in a protein contains twenty. Also, a single gene can produce a variety of different proteins. These proteins can in turn be altered by the state of the organism and the environment in which they are expressed. So for any genome, there is a far larger and more complex proteome.
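To get a rough feel for the alphabet-size difference alone, one can count how many distinct strings of a given length each alphabet allows. This is only an illustrative sketch (the function name count_sequences is made up here), and it says nothing about the other sources of complexity mentioned above:

```python
def count_sequences(alphabet_size, length):
    """Number of distinct strings of the given length over an alphabet."""
    return alphabet_size ** length


# Compare the 4-letter DNA alphabet with the 20-letter protein alphabet.
for n in (1, 5, 10):
    dna_count = count_sequences(4, n)
    protein_count = count_sequences(20, n)
    print(n, dna_count, protein_count)
```

At length 10 there are already about a million possible DNA strings (4^10 = 1,048,576) but about ten trillion possible protein strings (20^10 = 10,240,000,000,000).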

The result is that some methods that are efficient on the simpler genome problems become inefficient on the more complex proteome problems.

