
Previous: Reduction of Problem Size | Top: Table of Contents | Next: Solving For Weights and Minimum Theoretical Distortion
I-H. Generating a Weighted Partition Matrix
Now we will generate a partition matrix with the following properties:
- There is a column for each possible significant row in our first matrix. So there is a column for
0001,
a column for
0010,
etc. up to
0111.
This means that our matrix has a width of 2n-1 - 1. This is only 7 on our simple problem, but on
the protein problem this is still huge.
- There is a row for each unique pair of symbols in our alphabet.
For the simple problem, the possible pairs are: (A,G),
(A,C),
(A,T),
(G,C),
(G,T),
(C,T).
We ignore pairs such as (C,A)
because we consider it the same as (A,C)
which is already represented. In general, we have nC2 rows.
- For any given cell, we look at the partition (or slice of the encoding) associated with that column
and the pair associated with that row. If the pair has two different values under that column, then we say
that column contributes to the hamming distance of that pair, and place a 1 in the cell.
Otherwise, we place a 0.
We will fill in the matrix for our simple example:
| |
Possible Encoding Cross-sections a.k.a. Set Partitions |
| Pairs |
0001 |
0010 |
0011 |
0100 |
0101 |
0110 |
0111 |
| (A,G) |
0 |
0 |
0 |
1 |
1 |
1 |
1 |
| (A,C) |
0 |
1 |
1 |
0 |
0 |
1 |
1 |
| (A,T) |
1 |
0 |
1 |
0 |
1 |
0 |
1 |
| (G,C) |
0 |
1 |
1 |
1 |
1 |
0 |
0 |
| (G,T) |
1 |
0 |
1 |
1 |
0 |
1 |
0 |
| (C,T) |
1 |
1 |
0 |
0 |
1 |
1 |
0 |
So for the top left cell, we are looking at the pair (A,G)
and the partition {A,G,C}
and {T}.
Under that partition, A and G
are in the same set and therefore this partition does not contribute to the hamming distance between
A and G,
so we assign a 0. We fill in the rest of the matrix in the same manner.
Previous: Reduction of Problem Size | Top: Table of Contents | Next: Solving For Weights and Minimum Theoretical Distortion