Welcome to End Point’s blog

Ongoing observations by End Point people

Dictionary Comprehensions in Python

Python has many features that a lot of programmers never discover.

List Comprehensions

List comprehensions are a much simpler way of creating lists. This feature is widely used: I have seen it in many examples and in the source code of many libraries.

Imagine you have a function which returns a sequence of data. A good example of this is the xrange(start, end) function, which produces all integers within the range [start, end), so it excludes the end. The result is lazy: it doesn't build all the numbers in memory at once, but hands out the next number each time the loop asks for one. (xrange exists in Python 2; in Python 3 the built-in range behaves this way.)
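The lazy behaviour described above can be observed directly. The following is a minimal sketch that assumes Python 3, where range stands in for xrange; the variable names are only illustrative.

```python
# Python 3 sketch: range() plays the role of Python 2's xrange().
lazy_numbers = range(1, 11)   # no list is built at this point

# Pull values out one at a time, the way a for loop does internally.
iterator = iter(lazy_numbers)
first = next(iterator)
second = next(iterator)

# Materialise everything only when a real list is needed.
all_numbers = list(lazy_numbers)
last_value = all_numbers[-1]  # the end value 11 is excluded
```

Only the call to list() creates the full list; until then the numbers are produced on demand.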

Getting all numbers in the range [1, 10] using this function can be done like this:

numbers = []
for i in xrange(1, 11):
    numbers.append(i)

If you want to get only the even numbers, then you can write:

numbers = []
for i in xrange(1, 11):
    if i % 2 == 0:
        numbers.append(i)

List comprehensions can make the code much simpler.

The whole expression evaluates to a list, and the general syntax is:

[ expression for item in list if conditional ]

The first example can then be written as:

numbers = [i for i in xrange(1, 11)]

and the second:

numbers = [i for i in xrange(1, 11) if i % 2 == 0]

Of course this syntax can look a little strange at first, but you get used to it quickly, and then the code becomes much easier to read.
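To check that the loop and the comprehension really produce the same list, the two can be run side by side. This sketch assumes Python 3 (range instead of xrange); the squares_of_evens line is an extra illustration of the "expression" part of the syntax and is not from the original examples.

```python
# Loop version of the "even numbers" example.
evens_loop = []
for i in range(1, 11):
    if i % 2 == 0:
        evens_loop.append(i)

# Comprehension version: same items, one line.
evens_comp = [i for i in range(1, 11) if i % 2 == 0]

# The expression before "for" can transform each item as well.
squares_of_evens = [i * i for i in range(1, 11) if i % 2 == 0]
```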

Removing Duplicates

Another common operation on collections is removing duplicates. Again, there are plenty of ways to do it.

Consider a collection like this:

numbers = [i for i in xrange(1,11)] + [i for i in xrange(1,6)]

The most complicated way of removing duplicates I've ever seen was:

unique_numbers = []
for n in numbers:
    if n not in unique_numbers:
        unique_numbers.append(n)

Of course it works, but there is a much easier way: use a standard type like set. A set cannot contain duplicates, so converting a list to a set removes all of them. Keep in mind that a set does not guarantee the original order of the items. The result is a set, not a list, so if you want a list, convert it back:

unique_numbers = list(set(numbers))
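Since a set does not promise any particular order, it may help to see two order-aware variants. This is a sketch assuming Python 3.7 or newer, where dictionaries keep insertion order; the letters list is made-up sample data chosen so that the difference is visible.

```python
# Hypothetical sample data with duplicates, not in sorted order.
letters = ['b', 'a', 'b', 'c', 'a']

# Option 1: deduplicate with a set, then sort explicitly.
unique_sorted = sorted(set(letters))

# Option 2: keep the order of first appearance.
# dict.fromkeys() builds a dict whose keys are the items,
# and dict keys are unique and keep insertion order (Python 3.7+).
unique_in_order = list(dict.fromkeys(letters))
```

The first option is fine when sorted output is wanted anyway; the second keeps the items in the order they were first seen.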

Removing Object Duplicates

With objects or dictionaries, the situation is a little bit different. You can have a list of dictionaries where just one field is used for identity. It can look like this:

data = [
  {'id': 10, 'data': '...'},
  {'id': 11, 'data': '...'},
  {'id': 12, 'data': '...'},
  {'id': 10, 'data': '...'},
  {'id': 11, 'data': '...'},
]

Removing duplicates, again, can be done using more or less code. Less is better, of course. With more code it can be:

unique_data = []
for d in data:
    data_exists = False
    for ud in unique_data:
        if ud['id'] == d['id']:
            data_exists = True
            break
    if not data_exists:
        unique_data.append(d)

The same can be done using a feature I discovered a couple of days ago: the dictionary comprehension. Its syntax is similar to the list comprehension, but it evaluates to a dictionary:

{ key:value for item in list if conditional }
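As a small illustration of that syntax, here is a sketch that maps numbers to their squares. It assumes Python 3 (range), and the names squares and odd_squares are only examples.

```python
# key: the number, value: its square.
squares = {n: n * n for n in range(1, 6)}

# The optional condition filters items, just as in a list comprehension.
odd_squares = {n: n * n for n in range(1, 6) if n % 2 == 1}
```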

A dictionary comprehension can be used to build a list without duplicates, using a custom field for identity:

{ d['id']:d for d in data }.values()

The above code creates a dictionary whose key is the field I want to use for uniqueness and whose value is the whole original dictionary. The resulting dictionary contains only one entry for each key. The values() method is then used to get only the values, as I don't need the key:value mappings any more.
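One detail worth knowing: when several items share the same id, the dictionary comprehension keeps the last one, while the longer loop version keeps the first. The sketch below assumes Python 3.7 or newer (ordered dictionaries, and values() wrapped in list() because it returns a view there); the 'data' strings are invented placeholders so that the two results can be told apart.

```python
# Hypothetical sample data; the 'data' strings are made up for illustration.
data = [
    {'id': 10, 'data': 'first 10'},
    {'id': 11, 'data': 'first 11'},
    {'id': 12, 'data': 'only 12'},
    {'id': 10, 'data': 'second 10'},
    {'id': 11, 'data': 'second 11'},
]

# Dictionary comprehension: a later item overwrites an earlier one
# with the same id, so the last occurrence wins.
last_wins = list({d['id']: d for d in data}.values())

# If the first occurrence should win instead, setdefault() only
# stores an item when its id has not been seen yet.
first_seen = {}
for d in data:
    first_seen.setdefault(d['id'], d)
first_wins = list(first_seen.values())
```

Which variant is right depends on whether the newest or the oldest record for each id should survive.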

4 comments:

Sebastian Raschka said...

Nice article! It turns out that comprehensions are also faster (about 1.2x). If you are interested in some benchmarks, I've put them into an IPython notebook: http://nbviewer.ipython.org/github/rasbt/python_reference/blob/master/benchmarks/timeit_tests.ipynb?create=1#comprehensions

Alex Grönholm said...

What does this post have to do with Python, or dictionary comprehensions?

Jon Jensen said...

Alex, that's a very good question. Something is messed up with Blogger, which hosts our blog, because the content of the post does not currently match the URL! See this historical version:

https://web.archive.org/web/20140708200320/http://blog.endpoint.com/2014/04/dictionary-comprehensions-in-python.html

Szymon Guz said...

Alex, thank you for the great comment. I've already restored the previous content of this article.