Edit Distance Implementation in Python 3

Nov 7, 2018

After taking a look at Jurafsky’s Stanford course “From Language to Information”, I was looking at various Python implementations of the Edit Distance algorithm. Some of them were really interesting, but I didn’t like some features they had or lacked. So I wrote my own implementation. I definitely don’t feel it’s the best someone can find, but:

  1. It calculates the whole Edit Distance table (D).
  2. It saves the operations which led to the D array (backtrace).

I know that calculating the whole D is not the best approach, but it’s useful for people who have just learned about Edit Distance and want to check their understanding. Or also if they want to check the correctness of their implementation.

Let’s go to the implementation!

First of all, I am talking about the Edit Distance algorithm with the Levenshtein operations. So, I am using the insertion, deletion and substitution operations between two words. Some other variants of the Edit Distance algorithm are:

Of course, dynamic programming has been used.

Levenshtein (Levenshtein, 1966) initially proposed that each of the three operations has a cost of 1, and later a variation which doesn’t allow substitutions. In this case, the substitution operation can be kept, but with a cost of 2, since it is the result of the other two operations, each of which has a cost of 1. In my implementation, the user can decide whether the substitution penalty will be equal to 1 (default) or something else.

In the lecture notes of the course, a confusion matrix for spelling errors was given, which isn’t included in the book. According to this matrix, some letters substitution shouldn’t always have the same cost, but it depends on the compared characters. I am planning to add this in my implementation later.



Levenshtein, V. I. (1966). Binary codes capable of correcting deletions, insertions, and reversals. Cybernetics and Control Theory, 10 (8), 707–710. Original in Doklady Akademii Nauk SSSR 163(4): 845–848 (1965).

Jurafsky, D. A., and Martin, J. H. (2018). Speech and Language Processing. Unpublished.