Student: Shiri Azenkot
School: Pomona College
Email: shiri.azenkot@pomona.edu
shiriaz@dimax.rutgers.edu
Project Name: String Edit Distance with Moves
Research Advisor: Dr. Graham Cormode, DIMACS Postdoc
Rutgers University




Results

Towards the end of the REU program, I began testing the algorithm. These tests were minimal, and I am in the process of conducting more extensive tests to obtain more reliable results. In this section I will present my findings thus far.

I was interested in exploring three properties of the algorithm. First, I measured the actual runtime. This can indicate whether the theoretical bound does in fact hold and how large the associated constants are. Next, and more importantly, I calculated the approximation factor—how large and how variable it is. Finally, I checked the reliability of the algorithm’s results. Would the algorithm detect that a certain pair of strings was less similar than another?

When string size was not a factor, tests were conducted on a set of large strings (30 kb) and a set of small strings (2 kb). It was hypothesized that the performance of the algorithm would be better on the larger strings.

Note that the “actual edit distance” is only a very close estimation of this value. (After all, if we could find the actual edit distance with moves, the algorithm would be much less impressive!) This estimate was found during the creation of the two input strings. A certain number of edit operations was performed on a given string to obtain the second string. This number, assuming it was significantly smaller than the length of the string, is a reasonable estimate for the actual edit distance with moves. Test strings were taken from literary works found at the Project Gutenberg website.

Run time

Could not load picture

Could not load picture

Approximation Factor

Could not load picture

Could not load picture

Relative Accuracy

Could not load picture

Could not load picture