A Simple Example of Phonetic Similarity vs. Text Similarity

The following strings sound the same, so their phonetic similarity is 1.

(a) r u?
(b) ar u?
(c) are u?
(d) r you?
(e) ar you?
(f) are you?

However, their Levenshtein Distance (LD) and Levenshtein Similarity (LS) tell a different story. Comparing (a) with each of the other strings gives:

LD(a, b) = 1; LS(a, b) = 0.5
LD(a, c) = 2; LS(a, c) = 0.33
LD(a, d) = 2; LS(a, d) = 0.33
LD(a, e) = 3; LS(a, e) = 0.25
LD(a, f) = 4; LS(a, f) = 0.2
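
A minimal Python sketch for reproducing these numbers follows. It uses the classic dynamic-programming edit distance, where each insertion, deletion, or substitution costs 1. The helper names levenshtein and levenshtein_similarity are hypothetical and are not part of the minerazzi tool. The formula LS = 1 / (1 + LD) is an assumption inferred from the values above (1/2, 1/3, 1/4, 1/5); the tool's own definition may differ.

```python
def levenshtein(s: str, t: str) -> int:
    """Edit distance where insertions, deletions and substitutions each cost 1."""
    # prev[j] = distance between the first i-1 chars of s and the first j chars of t
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to the empty string
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def levenshtein_similarity(s: str, t: str) -> float:
    # Assumption: LS = 1 / (1 + LD), inferred from the table above
    return 1.0 / (1.0 + levenshtein(s, t))


strings = {
    "a": "r u?",
    "b": "ar u?",
    "c": "are u?",
    "d": "r you?",
    "e": "ar you?",
    "f": "are you?",
}

# Compare (a) with each of the other strings, rounding LS to two decimals
for key in "bcdef":
    ld = levenshtein(strings["a"], strings[key])
    ls = levenshtein_similarity(strings["a"], strings[key])
    print(f"LD(a, {key}) = {ld}; LS(a, {key}) = {round(ls, 2)}")
```

Note that spaces and the question mark count as characters, so "r u?" has length 4 and turning it into "are you?" needs exactly four insertions (a, e, y, o).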

Can you find LD and LS results for other possible combinations?

For i = j, LD(i, j) = 0 and LS(i, j) = 1, so you may want to ignore this case.
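
One way to work through the remaining combinations is sketched below, again assuming LS = 1 / (1 + LD). Because the Levenshtein distance with unit costs is symmetric, LD(i, j) = LD(j, i), so each unordered pair only needs to be computed once. itertools.combinations yields each pair once and never pairs a string with itself, which skips the trivial i = j case. The block repeats the hypothetical levenshtein helper so it runs on its own.

```python
from itertools import combinations


def levenshtein(s: str, t: str) -> int:
    """Edit distance where insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


strings = {
    "a": "r u?",
    "b": "ar u?",
    "c": "are u?",
    "d": "r you?",
    "e": "ar you?",
    "f": "are you?",
}

# 6 strings give 6 * 5 / 2 = 15 unordered pairs with i != j
results = {}
for i, j in combinations(strings, 2):
    ld = levenshtein(strings[i], strings[j])
    results[(i, j)] = (ld, 1.0 / (1.0 + ld))  # assumed LS = 1 / (1 + LD)

for (i, j), (ld, ls) in results.items():
    print(f"LD({i}, {j}) = {ld}; LS({i}, {j}) = {round(ls, 2)}")
```

The first five lines printed match the table above; the other ten answer the question for the remaining pairs.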

Note: LD and LS results were computed with our tool at http://www.minerazzi.com/tools/levenshtein/levenshtein-distance-calculator.php
