
I-F. Problem Model

We present the following Problem Model. We use the n = 4 character gene alphabet as a simple example, but the same model applies to the n = 20 character protein alphabet.

Assume that each of our n symbols maps to a binary string of length N under some encoding scheme that we do not yet know.

So we have something like:

A = 01101001010...
G = 10101010101...
C = 11010100101...
T = 00101010010...

Each of these binary strings has length N.

Note: n and N are two different values. n = 4 since we have 4 characters in our real gene strings, and N is an unknown and arbitrary number.

We then place these unknown encodings as the columns of an N x n matrix (N rows, n columns), so our matrix would look something like:

A G C T
0 1 1 0
1 0 1 0
1 1 0 1
0 0 1 0
1 1 0 1
0 0 1 0
. . . .
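
The construction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the strings are the visible prefixes of the example encodings (the true encodings and the true N are unknown), and the names `encodings`, `symbols`, and `build_matrix` are hypothetical, not part of the original model.

```python
# Illustrative only: these are the visible prefixes of the example encodings.
# The real encodings, and the real length N, are unknown.
encodings = {
    "A": "01101001010",
    "G": "10101010101",
    "C": "11010100101",
    "T": "00101010010",
}
symbols = ["A", "G", "C", "T"]  # column order used in the text


def build_matrix(encodings, symbols):
    """Return an N x n list of rows, with one column per symbol."""
    lengths = {len(encodings[s]) for s in symbols}
    if len(lengths) != 1:
        raise ValueError("all encodings must have the same length N")
    big_n = lengths.pop()
    # Row i collects bit i of every symbol's encoding.
    return [[int(encodings[s][i]) for s in symbols] for i in range(big_n)]


matrix = build_matrix(encodings, symbols)

print(" ".join(symbols))
for row in matrix:
    print(" ".join(str(bit) for bit in row))
```

Reading down any column of the printed matrix recovers that symbol's encoding, and reading across gives one row of n bits.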

We can now say that any particular row is some combination of 4 binary values, so there are 2^4 = 16 possible distinct rows. This is fine for now, but in the larger problem we want to cut corners wherever we can.
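
As a quick check of the row count, the possible rows can be enumerated directly. This is a minimal sketch; the names `possible_rows` and `count_possible_rows` are hypothetical and chosen only for illustration.

```python
from itertools import product


def possible_rows(n):
    """Enumerate every distinct row of n binary values."""
    return list(product((0, 1), repeat=n))


def count_possible_rows(n):
    """Number of distinct binary rows of width n, i.e. 2^n."""
    return 2 ** n


rows = possible_rows(4)
print(len(rows))                 # number of distinct rows for the gene alphabet
print(count_possible_rows(20))   # same count for the protein alphabet
```

For n = 4 the enumeration is trivial, but the count doubles with every added symbol: for the n = 20 protein alphabet it is already 2^20 = 1,048,576.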

