28 Aug 2013

Python Collection: Counter Class

Conceptually, the Counter class represents the multiset of mathematics. A multiset is a set that allows a member (element) to appear more than once, like [a,a,a,b,b,b,b,c,c].
The number of times an element appears is called the multiplicity of that element.
Where can a multiset be useful?
The prime factorization of 144 is [2,2,2,2,3,3], which is a multiset!
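
As a rough sketch of that idea, the prime factors of a number can be collected into a Counter, so that each prime is a key and its multiplicity is the value. The helper name prime_factors is my own and is not part of the collections module; it uses plain trial division and is only meant for small numbers.

```python
from collections import Counter

def prime_factors(n):
    """Return the prime factors of n as a multiset (hypothetical helper)."""
    factors = Counter()
    divisor = 2
    # Trial division: strip out each divisor as many times as it divides n.
    while divisor * divisor <= n:
        while n % divisor == 0:
            factors[divisor] += 1
            n //= divisor
        divisor += 1
    # Whatever is left above 1 is itself a prime factor.
    if n > 1:
        factors[n] += 1
    return factors

factors_144 = prime_factors(144)
print(factors_144)                     # Counter({2: 4, 3: 2})
print(sorted(factors_144.elements()))  # [2, 2, 2, 2, 3, 3]
```

Here the multiplicity of 2 is 4 and the multiplicity of 3 is 2.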

In Python terms
Counter is a subclass of dict that stores each element as a key and its count as the value. It is an unordered collection type.
This class supports mathematical operations such as addition, subtraction, union and intersection between two Counter objects. To use it, import the Counter class from the collections module.
In this post I'm going to work through the basic operations of this class.
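
A small sketch of the four mathematical operations mentioned above, using two made-up counters a and b (the names and counts are only for illustration):

```python
from collections import Counter

a = Counter(x=3, y=1)
b = Counter(x=1, y=2, z=4)

total = a + b         # addition: counts are summed
difference = a - b    # subtraction: only positive counts are kept
union = a | b         # union: the maximum of each count
intersection = a & b  # intersection: the minimum of each count

print(total == Counter(x=4, y=3, z=4))   # True
print(difference == Counter(x=2))        # True
print(union == Counter(x=3, y=2, z=4))   # True
print(intersection == Counter(x=1, y=1)) # True
```

Note that the subtraction drops 'y' and 'z' because their results would be zero or negative.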

Use case 1:
Let's say we have a list ['red', 'blue', 'red', 'green', 'blue', 'blue'] and we want to find the number of occurrences of each element.
>>> from collections import Counter
>>> color_list = ['red', 'blue', 'red', 'green', 'blue', 'blue']
>>> cnt = Counter()
>>> for word in color_list:
...     cnt[word] += 1
>>> cnt
Counter({'blue': 3, 'red': 2, 'green': 1})
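
The explicit loop is not strictly needed. As a shorter sketch of the same use case, the list can be passed straight to the Counter constructor, which counts the elements of any iterable:

```python
from collections import Counter

color_list = ['red', 'blue', 'red', 'green', 'blue', 'blue']

# Counter accepts an iterable and counts its elements directly.
cnt = Counter(color_list)
print(cnt)  # Counter({'blue': 3, 'red': 2, 'green': 1})
```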




creating the counter object :
>>> from collections import Counter             # importing Counter
>>> c = Counter(a=3, b=2, c=5)

printing counter object :
>>> c
Counter({'c': 5, 'a': 3, 'b': 2})
>>> c.items()
[('a', 3), ('c', 5), ('b', 2)]

list all elements in a counter object :
>>> list(c.elements())
['a', 'a', 'a', 'c', 'c', 'c', 'c', 'c', 'b', 'b']

finding the count for an element :
>>> c['a']
3
One interesting thing here is what happens if we look up an element that is not in c, say c['k']:
>>> c['k']
0         # returns zero instead of raising KeyError, which is what a plain dictionary would do
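
To make the difference from a plain dictionary concrete, here is a small sketch that compares the two side by side. The variable name plain is only for illustration.

```python
from collections import Counter

c = Counter(a=3, b=2, c=5)
plain = dict(c)  # an ordinary dict with the same keys and values

missing_count = c['k']      # a Counter returns 0 for a missing key
still_missing = 'k' in c    # the lookup does not add the key

try:
    plain['k']
    raised = False
except KeyError:
    raised = True           # a plain dict raises KeyError

print(missing_count)        # 0
print(still_missing)        # False
print(raised)               # True
print(plain.get('k', 0))    # 0, the usual dict workaround
```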

get all the keys and values in the counter object :
>>> c.keys()
['a', 'c', 'b']
>>> c.values()
[3, 5, 2]

finding the most common elements and their counts using the most_common(n) method, where n is the number of elements to return :
>>> c.most_common(1)   # n is 1
[('c', 5)]
and
>>> c.most_common(2)   # n is 2
[('c', 5), ('a', 3)]

updating the counter object : 
>>> c2 = {'a': 5}
>>> c.update(c2)
>>> c
Counter({'a': 8, 'c': 5, 'b': 2})    # the count of 'a' is updated to 3+5=8; update adds to counts instead of replacing them
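
update also accepts a plain iterable, and there is a matching subtract method. A short sketch with the same starting counter (the extra values are made up for illustration):

```python
from collections import Counter

c = Counter(a=3, b=2, c=5)

c.update({'a': 5})          # mapping: adds 5 to the count of 'a'
c.update(['b', 'b', 'k'])   # iterable: each occurrence adds 1
c.subtract({'c': 6})        # subtract keeps zero and negative counts

print(c['a'])  # 8
print(c['b'])  # 4
print(c['k'])  # 1
print(c['c'])  # -1
```

Unlike the - operator, subtract changes the counter in place and does not drop counts that fall to zero or below.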

Use case 2 : Find out the most common words in a file 
>>> import re
>>> words = re.findall(r'\w+', open('hamlet.txt').read().lower())
>>> Counter(words).most_common(10)
[('the', 1143), ('and', 966), ('to', 762), ('of', 669), ('i', 631),
 ('you', 554),  ('a', 546), ('my', 514), ('hamlet', 471), ('in', 451)]
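
If hamlet.txt is not at hand, the same approach can be tried on any string. A minimal sketch with a made-up sample text:

```python
import re
from collections import Counter

# Sample text standing in for the contents of a file.
text = "To be, or not to be, that is the question. To die, to sleep."

# Lowercase the text and split it into words, as in the example above.
words = re.findall(r'\w+', text.lower())
top = Counter(words).most_common(2)
print(top)  # [('to', 4), ('be', 2)]
```

When reading from a real file, wrapping the open call in a with statement makes sure the file is closed afterwards.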
