This class provides an implementation of the Levenshtein distance algorithm,
which may be used to determine the minimum number of changes required to
transform one string into another. For the purpose of this algorithm, a
change is defined as replacing one character with another, inserting a new
character, or removing an existing character.
The basic algorithm works as follows for a source string S of length X and
a target string T of length Y:
- Create a matrix M with dimensions of X+1, Y+1.
- Fill the first row with sequentially-increasing values 0 through
X.
- Fill the first column with sequentially-increasing values 0 through
Y.
- Create a nested loop iterating over the characters in the strings. In
the outer loop, iterate through characters in S from 0 to X-1 using an
iteration counter I. In the inner loop, iterate through the characters
in T from 0 to Y-1 using an iterator counter J. Calculate the
following three things and place the smallest value in the matrix in
row I+1 column J+1:
- One more than the value in the matrix position immediately to the
left (i.e., 1 + M[I][J+1]).
- One more than the value in the matrix position immediately above
(i.e., 1 + M[I+1][J]).
- Define a value V to be zero if S[I] is the same as T[J], or one if
they are different. Add that value to the value in the matrix
position immediately above and to the left (i.e.,
V + M[I][J]).
- The Levenshtein distance value, which is the least number of changes
needed to transform the source string into the target string, will be
the value in the bottom right corner of the matrix (i.e.,
M[X][Y]).
Note that this is a completely "clean room" implementation, developed from a
description of the algorithm, rather than copying an existing implementation.
Doing it in this way eliminates copyright and licensing concerns associated
with using an existing implementation.