Results
Towards the end of the REU program, I began testing the algorithm. These tests
were minimal, and I am in the process of conducting more extensive tests to
obtain more reliable results. In this section I will present my findings thus far.
I was interested in exploring three properties of the algorithm. First, I
measured the actual runtime. This can indicate whether the theoretical bound
does in fact hold and how large the associated constants are. Next, and more
importantly, I calculated the approximation factor—how large and how variable
it is. Finally, I checked the reliability of the algorithm’s results. Would
the algorithm detect that a certain pair of strings was less similar than
another?
When string size was not a factor, tests were conducted on a set of large strings (30 kb) and a set of small strings (2 kb). It was hypothesized that
the performance of the algorithm would be better on the larger strings.
Note that the “actual edit distance” is only a very close estimation of this
value. (After all, if we could find the actual edit distance with moves, the
algorithm would be much less impressive!) This estimate was found during the
creation of the two input strings. A certain number of edit operations was
performed on a given string to obtain the second string. This number,
assuming it was significantly smaller than the length of the string,
is a reasonable estimate for the actual edit distance with moves. Test
strings were taken from literary works found at the
Project Gutenberg
website.
Run time
Approximation Factor
Relative Accuracy
|