What's the difference between a Python dictionary and a Python Counter?

Question

In this exercise, we see the use of the Python Counter data structure to store strings as keys and numbers as values. We would usually use a dictionary to do this. What’s the difference between a dict and a Counter? When should we choose to use a Counter?

Answer

A Counter is a subclass of dict that is specialized for tallying: the keys are the things being counted and the values are their counts. Because it is a dict, it supports everything a dictionary does, so choosing it costs you no flexibility. It is also not faster than a plain dict for basic lookups. What it adds is convenience for counting: a missing key reads as 0 instead of raising a KeyError, you can build a complete tally in one step by passing an iterable to the constructor, and it provides extra methods for working with counts. That makes it a great choice for this exercise, where we want to examine many long strings and tally how many times we see each unique word. One of the most useful of these extra methods is

  • most_common([n])
    Returns a list of the n most common elements and their counts, ordered from the most common to the least. If n is omitted or None, all elements are returned.
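A quick sketch of most_common on a made-up word list (the sentence and variable names are only example data): with an argument it returns the top n (element, count) pairs, and with no argument it returns every element. Elements with equal counts come out in the order they were first encountered.

```python
from collections import Counter

# Tally the words of a short, made-up sentence (example data only).
words = "the cat and the hat and the bat".split()
counts = Counter(words)

# With n: only the n most common (element, count) pairs, highest count first.
top_two = counts.most_common(2)

# Without n: every element, ordered from most to least common.
everything = counts.most_common()

print(top_two)     # [('the', 3), ('and', 2)]
print(everything)  # [('the', 3), ('and', 2), ('cat', 1), ('hat', 1), ('bat', 1)]
```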

So if your application only needs tallying, the Counter class from the collections module is worth a look: it does the same job as a dictionary with less code.
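To see that Counter adds convenience rather than restricting the dictionary, here is a minimal sketch that tallies the same made-up list with a plain dict and with a Counter. isinstance confirms that Counter is a dict subclass, and reading a missing key returns 0 without inserting that key.

```python
from collections import Counter

words = ["apple", "pear", "apple", "fig", "apple", "pear"]  # example data

# Plain dict: each key needs a starting value before it can be incremented.
dict_counts = {}
for w in words:
    dict_counts[w] = dict_counts.get(w, 0) + 1

# Counter: a missing key reads as 0, so the increment works straight away.
counter_counts = Counter()
for w in words:
    counter_counts[w] += 1

# ...or simply pass the iterable to the constructor.
same_counts = Counter(words)

print(isinstance(counter_counts, dict))     # True: Counter is a dict subclass
print(dict(counter_counts) == dict_counts)  # True: same key/value pairs
print(counter_counts["banana"])             # 0, and "banana" is not added
print("banana" in counter_counts)           # False
```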


The Counter class is really great; I actually use it more often than a normal dictionary when solving interview questions. It also supports operations not available for normal dictionaries (addition, subtraction, intersection and union). As the name suggests, it is meant for ‘counting’ dictionaries, where the values are integer counts. There is no need to initialize keys as with normal dictionaries: when a key is not in the Counter, you can still do counter[key] += 1. Just look at this beauty, an example of using regex and Counter together (from the Python documentation):

```python
import re
from collections import Counter

# Read the whole file, lowercase it, and split it into words
with open('hamlet.txt') as f:
    words = re.findall(r'\w+', f.read().lower())
print(Counter(words).most_common(10))
```

You can replace ‘hamlet.txt’ with any text file to find the 10 most common words in it.
Another example, this time directly related to the lesson’s subject: suppose we have a list of reviews (a list of strings) named reviews.

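```python
import re
from collections import Counter

# reviews is assumed to be an existing list of strings
cnt = Counter()
for sentence in reviews:
    words = re.findall(r'\w+', sentence.lower())
    cnt.update(words)  # add this sentence's word counts to the running total
```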

The code above counts the occurrences of every word across all the reviews. Great, isn’t it?
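The arithmetic operations mentioned earlier can be sketched on two small, made-up Counters. Note that the binary operators keep only positive counts, so subtraction silently drops anything that falls to zero or below; the subtract() method instead updates in place and keeps zero and negative counts.

```python
from collections import Counter

# Example counts (made-up data)
a = Counter(apple=3, pear=1)
b = Counter(apple=1, pear=2, fig=1)

total = a + b     # add counts: apple 4, pear 3, fig 1
diff = a - b      # subtract, dropping results <= 0: apple 2
common = a & b    # intersection, min of each count: apple 1, pear 1
either = a | b    # union, max of each count: apple 3, pear 2, fig 1

# subtract() modifies the Counter in place and keeps zero/negative counts
kept = a.copy()
kept.subtract(b)  # apple 2, pear -1, fig -1

print(total, diff, common, either, kept, sep="\n")
```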
I highly recommend familiarizing yourself with the documentation for the Counter, deque and defaultdict classes from the collections module, and for the heapq module from the standard library. They come in handy for solving interview questions.
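As a small illustration of how these tools relate, here is a sketch (with made-up data) that tallies words with a defaultdict(int) and then finds the top two entries with heapq.nlargest, next to the one-line Counter equivalent. In CPython, most_common(n) is itself implemented with heapq.nlargest, so both give the same result, including the order of ties.

```python
import heapq
from collections import Counter, defaultdict

# Example data: a short, made-up list of words
words = "to be or not to be that is the question".split()

# defaultdict(int) also tallies without initialising keys...
dd = defaultdict(int)
for w in words:
    dd[w] += 1

# ...but it has no most_common(), so the top n is found by hand
# with heapq.nlargest over the (word, count) pairs.
top_dd = heapq.nlargest(2, dd.items(), key=lambda pair: pair[1])

# Counter does the same in one call.
top_counter = Counter(words).most_common(2)

print(top_dd)       # [('to', 2), ('be', 2)]
print(top_counter)  # [('to', 2), ('be', 2)]
```

One difference worth knowing: merely reading a missing key from a defaultdict inserts it with the value 0, whereas a Counter returns 0 without adding the key.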
