Binary Similarity Calculator

In my previous post I mentioned the Hamming Distance. I was thinking in adding to my Levenshtein Edit Distance calculator the ability to compute this statistic, but I changed my mind.

Rather, why not design a whole new tool that computes, in addition to the Hamming Distance, other coefficients?

So,…

Welcome to the Binary Similary Calculator and Vector Analyzer.

This is a new tool I’ve built over the weekend. It will be uploaded in the next few days.

The tool compares the similarity of a pair of strings. These should consist of binary characters and be of the same length. Since strings are treated as vectors, the tool also works as a vector analyzer; hence, its name.

The following similarity measures are computed from a contingency table of positive and negative matches and mismatches: Sokal-Michener (i.e., Simple Matching), Jaccard, Russell-Rao, Hamann, Sorensen (i.e., Dice), antiDice, Sneath-Sokal, Rodger-Tanimoto, Ochiai, Yule, Anderberg, Kulczynski, Pearson, and Gower2.

The tool also computes Dot Products and Cosine Coefficients.

Although a distance metric, the Hamming Distance (number of positive and negative mismatches) has been included because of its close relation with some of the aforementioned similarity measures.