Previous: Solving For Weights and Minimum Theoretical Distortion | Top: Table of Contents | Next: Perturbed Data Sets

II-A. Interpretation of Results

The resulting solution from CPLEX gives us two important things:

The x values that are returned will most likely not be integral. To remedy this, we normalize them into probability values, stored in a vector p. We define these as follows:

Let X be the sum of all xi values in our vector x.

pi = xi / X     for all indices i.
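The normalization above can be sketched in a few lines of Python. This is only an illustration: the function name normalize_weights and the sample weights are made up for the example and are not actual CPLEX output.

```python
def normalize_weights(x):
    """Return p with p[i] = x[i] / X, where X is the sum of all x[i]."""
    total = sum(x)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [xi / total for xi in x]


# Hypothetical weights chosen for illustration (not real solver output).
x = [1.5, 0.5, 3.0, 5.0]
p = normalize_weights(x)
print(p)
```

Because every p value is a weight divided by the total, the p values are non-negative and sum to 1, which is what lets us treat them as probabilities.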

We can use these p values as follows:

Suppose we want to create an encoding of our alphabet A, G, C, T so that each symbol maps to a binary string of some length N. The p values tell us with what probability each of the rows 0001, 0010, etc. should occur in our encoding.

So if p1 = 0.15, then for any single bit position (a slice) in our encoding scheme, there is a 15% chance that A, G, and C have 0 at that bit while T has 1, and so on for the other rows.

With these probability values, we can experiment with randomly generated encoding schemes of any length to observe how long we need our encoded strings to be before we approach the minimum possible distortion.
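One way to generate such a random encoding is sketched below in Python. It assumes that each bit position is drawn independently according to p, and that each row lists the bits for A, G, C, T in that order. The function name random_encoding and the probabilities in probs are hypothetical, chosen only to make the example run.

```python
import random

SYMBOLS = "AGCT"  # assumed order of the bits within each row


def random_encoding(row_probs, n, rng=None):
    """Draw n bit positions independently and return {symbol: n-bit string}."""
    rng = rng or random.Random()
    rows = list(row_probs)
    weights = [row_probs[row] for row in rows]
    # Each draw picks one row, i.e. the bits of all four symbols at one position.
    slices = rng.choices(rows, weights=weights, k=n)
    # Bit j of every drawn row belongs to symbol j.
    return {sym: "".join(s[j] for s in slices) for j, sym in enumerate(SYMBOLS)}


# Made-up probabilities for illustration only (they sum to 1).
probs = {"0001": 0.15, "0010": 0.05, "0011": 0.30, "0101": 0.50}
enc = random_encoding(probs, 8, random.Random(0))
for sym in SYMBOLS:
    print(sym, enc[sym])
```

Calling random_encoding with increasing values of n produces longer encodings whose distortion can then be compared against the theoretical minimum. The distortion measurement itself is not shown here.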

