
The only truly important property of these encodings is the hamming distance between them. So for any given row we can swap all entries at once and not effect the hamming distance.
For example, if we change the first row from 0110 to 1001 we have not changed any hamming distances. All columns that were different in that row are still different, and all those that were the same are still the same.
With this ability in mind, we will swap all rows where the first column is a 1. We have not altered any hamming distances, but we now can guarantee that the entire first column is 0's.
Our original matrix:
| A | G | C | T |
| 0 | 1 | 1 | 0 |
| 1 | 0 | 1 | 0 |
| 1 | 1 | 0 | 1 |
| 0 | 0 | 1 | 0 |
| 1 | 1 | 0 | 1 |
| 0 | 0 | 1 | 0 |
| . . . |
. . . |
. . . |
. . . |
Would become this:
| A | G | C | T |
| 0 | 1 | 1 | 0 |
| 0 | 1 | 0 | 1 |
| 0 | 0 | 1 | 0 |
| 0 | 0 | 1 | 0 |
| 0 | 0 | 1 | 0 |
| 0 | 0 | 1 | 0 |
| . . . |
. . . |
. . . |
. . . |
Now any particular row is a 0 and some combination of 3 binary values, so there are 2n-1 = 23 = 8 possible rows.
We can also ignore all rows that do not contribute to any hamming distances, which is any row where all the values are the same. Since our first column is all 0's, we know that the row 1111 does not exist, so the only row we are eliminating is 0000. We now have 2n-1 - 1 = 7 possible rows.
Furthermore, the order of the rows does not affect the hamming distance. Think back to our original encoding. We are free to swap the 4th and 5th bit of one encoding, as long as we do it to all the rest. So, we are therefore free to swap any two rows in our matrix. We could then rearrange the matrix to group all identical rows together.
We can now view the matrix as a set of 7 integers: the first integer is the number of times the row 0001 occurs, the second integer is the number of times the row 0010 occurs, and so on. We have now reduced an arbitrary length encoding scheme to a vector of 2n-1 - 1 whole numbers (since negative values wouldn't make any sense).
We can also view each of our possible rows as a unique partition of the set {A,G,C,T}. The row 0001 would partition it into: {A,G,C} and {T}. The row 0110 would partition it into {A,T} and {G,C}.
In this way we can see that half the possible partitions are redundant. i.e. 0111 represents {A} and {G,C,T}. 1000 also represents {A} and {G,C,T}. Also, we are not interested in a partition that results in one set: 1111 or 0000 would represent {A,G,C,T}. So looking at it this way, we still have reduced our possibilities from 2n down to 2n-1 - 1.