Application of the Hamming distance in data science


In this lesson, we’re introduced to the Hamming distance, with spell checking as an example application. Could you present an application of the Hamming distance specific to data science?


In data science, our data isn’t always numerical. Sometimes it is categorical, for example when we label data items or divide data into named groups. It is often useful to have a way of assigning a numerical value to the relationship between two data items that may carry different category labels. One simple way of doing this is to compare the two items attribute by attribute and add 1 to a running count for every attribute where their labels disagree. When we organize our data labels consistently, so that every item lists its labels for the same attributes in the same order, this count is exactly the Hamming distance: the number of positions at which the two label sequences differ.
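As a minimal sketch, the comparison described above can be written in a few lines of Python. The function name `hamming_distance` and the example records (three attributes in the fixed order color, size, material) are hypothetical, chosen only for illustration; the one real assumption is that both items list their labels in the same attribute order.

```python
def hamming_distance(a, b):
    """Count the positions where two equal-length label sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    # Each mismatched position contributes 1 to the total.
    return sum(x != y for x, y in zip(a, b))


# Hypothetical records: labels always stored in the order (color, size, material).
item_a = ("red", "small", "cotton")
item_b = ("red", "large", "wool")

print(hamming_distance(item_a, item_b))  # 2 (size and material differ)
```

Raising an error on unequal lengths is a design choice: the Hamming distance is only defined for sequences of the same length, so a length mismatch usually signals that the labels were not organized consistently.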

As long as the labels are chosen and ordered consistently, the number you get from this computation is a simple measure of how similar two items are: a distance of 0 means the items share every label, and larger values mean they disagree on more attributes.
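One way to put that measure to work is to compare every pair of items and, for a given item, look up its most similar neighbor. The sketch below assumes a small, made-up set of customer records labeled by (region, plan, device, channel); the names `customers`, `pairwise`, and `most_similar` are hypothetical and only illustrate the idea.

```python
from itertools import combinations


def hamming_distance(a, b):
    """Count the positions where two equal-length label sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical categorical records, labels in the order (region, plan, device, channel).
customers = {
    "c1": ("north", "basic", "mobile", "email"),
    "c2": ("north", "basic", "desktop", "email"),
    "c3": ("south", "premium", "desktop", "phone"),
}

# Distance for every unordered pair of customers.
pairwise = {
    (p, q): hamming_distance(customers[p], customers[q])
    for p, q in combinations(customers, 2)
}


def most_similar(target, records):
    """Return the key of the record with the smallest Hamming distance to `target`."""
    return min(
        (key for key in records if key != target),
        key=lambda key: hamming_distance(records[target], records[key]),
    )


# Print pairs from most to least similar.
for pair, d in sorted(pairwise.items(), key=lambda kv: kv[1]):
    print(pair, d)
# ('c1', 'c2') 1
# ('c2', 'c3') 3
# ('c1', 'c3') 4

print(most_similar("c3", customers))  # c2
```

Because the raw count grows with the number of attributes, you may prefer to divide it by the sequence length when comparing datasets with different numbers of labels; that normalization is an optional choice, not part of the Hamming distance itself.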