Previous: Mapping Proteins to N-Length Binary Strings | Top: Table of Contents | Next: Problem Model

I-E. Problem of Distortion

For larger alphabets, distortion probably cannot be avoided entirely, especially when the score matrix in use lacks certain mathematical properties. To some extent, then, we must accept distortion and manage it.

Distortion is often reduced by mapping to very long binary strings. The more bits we allow for each encoding, the more freedom we have to balance the distances between all the different pairs. However, longer encodings cost more: each encoding takes more memory to store, and the Hamming distance between two encodings takes more time to compute.
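To make the trade-off concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than the model used in this document: the three-letter alphabet, the target dissimilarities, the two encodings, and in particular the distortion measure (the largest Hamming-to-target ratio divided by the smallest, so that 1.0 means every pair is scaled identically).

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("encodings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def distortion(codes: dict, target: dict) -> float:
    """Hypothetical distortion measure (an assumption for illustration).

    For every pair of letters, divide the Hamming distance between their
    encodings by the target dissimilarity, then return the largest ratio
    divided by the smallest. Assumes all encodings are distinct and all
    target values are positive, so no ratio is zero.
    """
    ratios = [
        hamming(codes[p], codes[q]) / target[frozenset((p, q))]
        for p, q in combinations(sorted(codes), 2)
    ]
    return max(ratios) / min(ratios)


# Made-up target dissimilarities for a toy alphabet {A, B, C}.
target = {frozenset("AB"): 1, frozenset("AC"): 2, frozenset("BC"): 2}

# A 2-bit encoding and a 5-bit encoding of the same alphabet.
short_codes = {"A": "00", "B": "01", "C": "11"}
long_codes = {"A": "00000", "B": "11000", "C": "10111"}

print(distortion(short_codes, target))  # 2.0
print(distortion(long_codes, target))   # 1.0
```

In this toy case the 5-bit encoding reproduces the target dissimilarities exactly (up to a common factor of 2), while the 2-bit encoding does not. The price is 2.5 times as many bits to store and compare per letter.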

We wish to study both the theoretical minimum distortion (the distortion achievable when encodings may be arbitrarily long) and the trade-off between encoding length and distortion.

