End Point

Welcome to End Point's blog

Ongoing observations by End Point people.

Dictionary Comprehensions in Python

Python has many features that remain unknown to a lot of programmers. I recently discovered dictionary comprehensions, a nice feature that can make code much cleaner. Let me start with a short introduction to list comprehensions and to removing duplicates from collections.

List Comprehensions

List comprehensions are a much simpler way of creating lists. This feature is widely used: I have seen it in many examples and in the source code of many libraries.

Imagine you have a function which returns a sequence of data. A good example is the xrange(start, end) function, which gives you all integers within the range [start, end). It is lazy: it doesn't build the whole list of numbers at once, but produces the next number each time you ask for one, for example on each pass of a for loop.
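The lazy behaviour can be sketched by stepping through a range by hand. This is only an illustration: it uses range so that it also runs on Python 3, where xrange no longer exists and range is lazy in the same way. On Python 2 you would substitute xrange to get the laziness. The names lazy_numbers and iterator are made up for the example.

```python
# range is used so the sketch runs on Python 3; on Python 2 use xrange
# to get the lazy behaviour described in the text.
lazy_numbers = range(1, 4)  # covers [1, 4), i.e. 1, 2, 3

# A for loop does this behind the scenes: it asks for an iterator,
# then requests one number at a time.
iterator = iter(lazy_numbers)
first = next(iterator)   # 1
second = next(iterator)  # 2
third = next(iterator)   # 3

# Asking again after the range is exhausted raises StopIteration,
# which is how a for loop knows when to stop.
try:
    next(iterator)
    exhausted = False
except StopIteration:
    exhausted = True
```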

Getting all numbers from the range [1, 10] using this function can be done like this:

numbers = []
for i in xrange(1, 11):
    numbers.append(i)

If you want to get only the even numbers, then you can write:

numbers = []
for i in xrange(1, 11):
    if i % 2 == 0:
        numbers.append(i)

List comprehensions can make the code much simpler. The expression below evaluates to a list:

[ expression for item in list if conditional ]

The first example can then be written as:

numbers = [i for i in xrange(1, 11)]

and the second:

numbers = [i for i in xrange(1, 11) if i % 2 == 0]

Of course, this syntax can look a little strange at first, but you get used to it, and then the code becomes much simpler.
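The expression part does not have to be the item itself. As a small sketch (using range instead of xrange so that it runs on both Python 2 and Python 3; the name even_squares is made up for the example), here are the squares of the even numbers from [1, 10]:

```python
# The expression (i * i) is evaluated only for items that pass the condition.
even_squares = [i * i for i in range(1, 11) if i % 2 == 0]
print(even_squares)  # [4, 16, 36, 64, 100]
```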

Removing Duplicates

Another common operation on collections is removing duplicates. Again, there are plenty of ways to do it.

Consider a collection like this:

numbers = [i for i in xrange(1,11)] + [i for i in xrange(1,6)]

The most complicated way of removing duplicates I've ever seen was:

unique_numbers = []
for n in numbers:
    if n not in unique_numbers:
        unique_numbers.append(n)

Of course it works, but there is a much easier way. You can use a standard type like set. A set cannot contain duplicates, so converting a list to a set removes them all. Keep in mind that a set is unordered, so unlike the loop above this does not guarantee the original order of the elements. The result is also a set, not a list, so if you need a list, convert it back:

unique_numbers = list(set(numbers))
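If the order of the elements matters, a middle ground between the two versions above is to keep the loop but use a set for the membership test, since a set on its own does not guarantee any particular order. This is a sketch only: the function name unique_in_order is made up for the example, and it assumes the items are hashable.

```python
def unique_in_order(items):
    """Return the items without duplicates, keeping first occurrences in order."""
    seen = set()   # fast membership test
    result = []    # preserves the original order
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

print(unique_in_order([3, 1, 3, 2, 1]))  # [3, 1, 2]
```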

Removing Object Duplicates

With objects or dictionaries, the situation is a little different. You can have a list of dictionaries where just one field is used for identity. It can look like this:

data = [
  {'id': 10, 'data': '...'},
  {'id': 11, 'data': '...'},
  {'id': 12, 'data': '...'},
  {'id': 10, 'data': '...'},
  {'id': 11, 'data': '...'},
]

Removing duplicates, again, can be done using more or less code. Less is better, of course. With more code it can be:

unique_data = []
for d in data:
    data_exists = False
    for ud in unique_data:
        if ud['id'] == d['id']:
            data_exists = True
            break
    if not data_exists:
        unique_data.append(d)

The same can be done using a thing I discovered a couple of days ago: the dictionary comprehension. It has a syntax similar to the list comprehension, but it evaluates to a dictionary:

{ key:value for item in list if conditional }
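As a small illustration of this syntax, here is a sketch that maps each even number from [1, 10] to its square. It needs Python 2.7 or newer, uses range instead of xrange so that it also runs on Python 3, and the name squares_by_number is made up for the example:

```python
# key is n, value is n * n, and the condition filters out the odd numbers.
squares_by_number = {n: n * n for n in range(1, 11) if n % 2 == 0}

print(squares_by_number[4])       # 16
print(sorted(squares_by_number))  # [2, 4, 6, 8, 10]
```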

A dictionary comprehension can be used to build a list without duplicates, using a custom field as the identity:

{ d['id']:d for d in data }.values()

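The above code creates a dictionary whose key is the field I want to use for uniqueness, with the whole original dictionary as the value. A dictionary holds only one entry per key, so when several items share the same id, the last one in the list wins (the longer loop above keeps the first one instead). The values() method is then used to get only the values, as I don't need the key:value mappings any more. In Python 2 it returns a list, and the order of that list is not guaranteed.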
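To see which of the duplicates survives, here is a runnable sketch with distinguishable payloads. The sample values and the helper sorted_by_id are made up for the example, and sorted() is only there so that the result is a predictable list on both Python 2 and Python 3 (on Python 3, values() returns a view rather than a list):

```python
data = [
    {'id': 10, 'data': 'first 10'},
    {'id': 11, 'data': 'first 11'},
    {'id': 10, 'data': 'second 10'},
]

# Later items overwrite earlier ones that share the same id.
keep_last = {d['id']: d for d in data}

# Walking the list backwards makes the first occurrence win instead.
keep_first = {d['id']: d for d in reversed(data)}

def sorted_by_id(mapping):
    # Sort so the output order does not depend on dictionary ordering.
    return sorted(mapping.values(), key=lambda d: d['id'])

print([d['data'] for d in sorted_by_id(keep_last)])   # ['second 10', 'first 11']
print([d['data'] for d in sorted_by_id(keep_first)])  # ['first 10', 'first 11']
```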

3 comments:

Sebastian Raschka said...

Nice article! It turns out that comprehensions are also faster (about 1.2x). If you are interested in some benchmarks, I've put them into an IPython notebook: http://nbviewer.ipython.org/github/rasbt/python_reference/blob/master/benchmarks/timeit_tests.ipynb?create=1#comprehensions

Viktor Kharkovets said...

>> numbers = [i for i in xrange(1, 11) if i % 2 == 0]
I think that it's really important to create more realistic (or more specific) cases for such articles; otherwise some Python novices will never know about
xrange(2, 11, 2)

crunchy_karma said...

But be aware that dict comprehensions work only in Python 2.7+.
In Python 2.6 and below you can replace them with:
dict((key, value) for item in list if condition)