| Student: | Shiri Azenkot |
| School: | Pomona College |
| Email: |
shiri.azenkot@pomona.edu
shiriaz@dimax.rutgers.edu |
| Project Name: | String Edit Distance with Moves |
| Research Advisor: |
Dr. Graham Cormode, DIMACS Postdoc
Rutgers University |
Algorithm Overview
I explored an algorithm that approximates the string edit distance with moves
in near linear time. My DIMACS research advisor, Dr. Graham Cormode,
introduced the algorithm in a paper he coauthored,
The String Edit Distance Matching Problem with Moves (2002).
The problem is NP-hard, and the algorithm yields a deterministic approximation
in time O(n log n log*n). In theory, the algorithm’s output is within a factor
of O(log n log*n) of the actual edit distance
with moves.
aaaaaaa is grouped as (aaa)(aa)(aa) In order to group the remaining nodes, the alphabet must first be reduced. This is done iteratively, until the alphabet consists of only three characters. The reduction method ensures that no two adjacent characters are equal. Landmarks are then set according to the locations of local maxima and local minima. Each landmark is grouped along with its adjacent nodes to form groups of 2 and 3. Example:
sandwich is grouped as
(san)(dw)(ich)
Each node in the resulting tree represents a substring, according to its descendant leaves. Example:
Comparing the parse trees
This is a complete parse tree, including the top and bottom levels.
Finally, the characteristic vectors of both trees are subtracted. The difference
is the approximation for the edit distance with moves.
|