Which definition of the Hamming distance is preferred?


In this exercise, we learn that SciPy's implementation of the Hamming distance differs from the one presented earlier in the lesson: it divides the number of differences by the number of dimensions. Should we use one implementation over the other, or do they have different applications?


If we’re only interested in the number of differences, then we would prefer the implementation presented earlier in this lesson, that is, HammingDistance = # of differences. Sometimes, though, we want to know what proportion of one data item differs from another. In that case, SciPy's implementation, HammingDistance = (# of differences)/(# of dimensions), is more helpful, since it always returns a value between 0 and 1. It’s simple to switch between the two: multiply SciPy's result by the number of dimensions to get the count, or divide the count by the number of dimensions to get the proportion. So we don’t need to pick one implementation over the other. We simply need to be aware of which version is implemented in the software that we’re using.
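
As a rough illustration of the two definitions and how to convert between them, here is a minimal sketch in plain Python. The function names `hamming_count` and `hamming_proportion` and the sample points are made up for this example; they are not taken from the lesson or from SciPy. The second function follows the normalized definition described above for SciPy's version.

```python
def hamming_count(a, b):
    """Number of positions at which a and b differ (the lesson's definition)."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same number of dimensions")
    return sum(x != y for x, y in zip(a, b))


def hamming_proportion(a, b):
    """Fraction of positions that differ (the normalized definition)."""
    return hamming_count(a, b) / len(a)


# Hypothetical sample points with four dimensions each.
pt1 = [1, 2, 3, 4]
pt2 = [1, 0, 3, 0]

count = hamming_count(pt1, pt2)            # positions 1 and 3 differ
proportion = hamming_proportion(pt1, pt2)  # count / number of dimensions

# Converting between the two only requires the number of dimensions.
assert proportion * len(pt1) == count
```

Here the two points differ in 2 of their 4 dimensions, so the count is 2 and the proportion is 0.5. If you use a library function instead, a quick check like this on a small example is an easy way to confirm which definition it returns.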