How to unleash the power of Python sets

Sets in Python organize collections of unique objects. Learn how to take advantage of this powerful feature in your own code.

Of the major data types built into Python, the set is one of the least discussed, but also one of the most powerful. A Python set lets you create collections of objects where each object is unique to the collection, and it works with the speed and efficiency of Python’s dictionaries.

However, because Python’s sets are not as widely discussed as its lists or dictionaries, it’s easy to miss out on how sets can make your Python apps smarter and more elegant. Let’s fix that!

Python set basics

Sets are defined with a syntax that is reminiscent of Python’s dictionary type:

my_set = {1,2,3,4}

The fact that this looks a little like a dictionary is no accident. You can think of a set as a dictionary that stores only keys, no values. In fact, many of the mechanisms under Python’s hood for sets are built with the same code as for dictionaries.

You can also create a set with the set() built-in, which takes any iterable:

my_set = set([1,2,3,4])
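
One detail worth knowing when creating sets: empty braces produce a dictionary, not a set, so an empty set has to come from set(). The short sketch below illustrates this, along with set() accepting iterables other than lists. The variable names are chosen only for this example.

```python
# Empty braces create a dict; use set() for an empty set.
empty_braces = {}
empty_set = set()

print(type(empty_braces).__name__)  # dict
print(type(empty_set).__name__)     # set

# set() accepts any iterable, not only lists.
from_range = set(range(1, 5))
from_tuple = set((1, 2, 3, 4))
print(from_range == from_tuple == {1, 2, 3, 4})  # True
```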

Set members can be of any hashable type: basically, any object whose hash value is guaranteed not to change over its lifetime. Numbers, strings, and tuples of hashable items are all OK, as are instances of user-defined classes, which by default are hashed by identity. (Even if their properties change over time, the identity of the instance doesn’t.) Mutable built-in containers such as lists and dictionaries are not hashable. Again, this is exactly the same as how the keys work in Python’s dictionaries.

If you try to define a set with redundant members, the redundancies will be removed automatically, with previously defined members taking priority. For instance, if we defined my_set as {1,2,3,2,4,5}, the result would be {1,2,3,4,5}.
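
Here is a small sketch of what hashability and automatic deduplication look like in practice. The Point class is a hypothetical example written for this illustration, not part of any library.

```python
class Point:
    """Plain user-defined class; instances are hashable by identity by default."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
members = {1, "two", (3, 4), p}   # numbers, strings, tuples, instances: all fine
p.x = 99                          # changing an attribute does not change the hash
print(p in members)               # True

# Redundant members in a literal are dropped automatically.
print({1, 2, 3, 2, 4, 5} == {1, 2, 3, 4, 5})  # True

list_rejected = False
try:
    bad = {[1, 2], 3}             # lists are mutable and unhashable
except TypeError:
    list_rejected = True
print(list_rejected)              # True
```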

Uses for Python sets

One powerful and common use for sets is deduplicating the members of a collection or the output generated by an iterable. For instance, if you have a list, you can quickly deduplicate the list by making a set from its contents:

list_1 = [1,2,3,4,3,4,2,4,5,3]
set_1 = set(list_1)
# yields {1,2,3,4,5}

(Note that the original list is preserved.)

This is far faster than iterating through the list and testing for duplicates manually. You can also do this for any iterable, not just a list, although lists are a common source. If you do this with a string, for instance, you’ll get a set that contains all the unique characters in the string:

s1 = "Hello there"
set_1 = set(s1)
# yields {' ', 'r', 'l', 't', 'e', 'h', 'o', 'H'} (order may vary)

Note that this technique will work only if the objects in the list are all hashable. You’ll get a TypeError if you try to add an unhashable object. Also, there is no parameter you can pass that will ignore unhashable objects, so if you’re in doubt about what’s hashable or not, you’ll have to iterate through the collection and .add() each element manually, testing as you go.
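
The manual approach described above might look something like the following sketch. The sample data and the names hashable_only and skipped are made up for this example.

```python
mixed = [1, "a", [2, 3], (4, 5), {"k": "v"}, 1, "a"]

hashable_only = set()
skipped = []
for item in mixed:
    try:
        hashable_only.add(item)
    except TypeError:
        # Raised for unhashable objects such as lists and dicts.
        skipped.append(item)

print(hashable_only)  # contains 1, 'a' and (4, 5), in arbitrary order
print(skipped)        # [[2, 3], {'k': 'v'}]
```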

Another common use for sets is to quickly test for the presence of a small collection of objects within a larger collection, or vice versa, by way of the superset/subset methods described below.

Note that this works best when the larger of the two collections is something you can convert to a set once and then test against many times, because the overhead of converting a list to a set (especially a long list) might outstrip the performance gains from using sets in the first place. But on the whole, set membership testing is generally faster than iterating through objects and testing membership manually.
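
As a rough sketch of the "convert once, test many times" pattern, assume a large list of allowed IDs and a handful of incoming values to check. Both collections are hypothetical.

```python
allowed_ids = list(range(100_000))      # hypothetical large collection
allowed_lookup = set(allowed_ids)       # pay the conversion cost once

incoming = [5, 99_999, 100_000, -1]     # hypothetical values to check
known = [value for value in incoming if value in allowed_lookup]
print(known)                            # [5, 99999]

# Checking a small collection against the large one in a single step:
print({5, 99_999} <= allowed_lookup)    # True
```

If the large collection is only consulted once, a plain loop over the list may be just as fast, since building the set has to visit every element anyway.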

Adding and removing members of Python sets

If you want to add and remove members from sets, use the .add() and .remove() methods. For example, my_set.add(5) would update my_set to include 5, and my_set.remove(5) would remove 5 if it were present.

If you try to .remove() something from a set that isn’t there, you’ll get a KeyError, the same error raised when you reference a dictionary key that doesn’t exist. To remove something without the risk of raising an error if it isn’t there, use .discard() instead of .remove().
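
A brief sketch of the difference between the two methods, using an example set of integers:

```python
my_set = {1, 2, 3, 4}
my_set.add(5)
my_set.remove(5)          # fine: 5 is present

got_key_error = False
try:
    my_set.remove(5)      # 5 is gone now, so this raises KeyError
except KeyError:
    got_key_error = True
print(got_key_error)      # True

my_set.discard(5)         # no error, no effect
print(my_set == {1, 2, 3, 4})  # True
```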

To drop all elements from a set, you can use .clear(), or reassign the variable to an empty set:

my_set = set()
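
These two approaches are not quite interchangeable when another name refers to the same set. The sketch below, with an illustrative alias variable, shows that .clear() empties the object itself, while reassignment only rebinds one name.

```python
my_set = {1, 2, 3}
alias = my_set            # second name for the same set object

my_set.clear()            # empties the object itself
print(alias)              # set()

my_set = {1, 2, 3}
alias = my_set
my_set = set()            # rebinds my_set only; the old object is untouched
print(sorted(alias))      # [1, 2, 3]
```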

Unions and intersections with Python sets

Sets support a number of operations where you take two or more sets and generate new ones from them. A union of two sets combines the two into a single set, removing any duplicates:

set_1 = {1,2,3}
set_2 = {4,5,6}
set_3 = set_1.union(set_2)
# yields {1,2,3,4,5,6}

You can also use the pipe operator to perform a union:

set_3 = set_1 | set_2

Again, this is a handy way to perform deduplication across multiple collections of items.
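
For deduplicating across several collections at once, it may help to know that the .union() method accepts any number of iterables, while the | operator requires sets on both sides. A small sketch with made-up inputs:

```python
list_a = [1, 2, 3, 3]
tuple_b = (3, 4, 5)
range_c = range(5, 8)

# The method form accepts any number of iterables, not only sets.
combined = set(list_a).union(tuple_b, range_c)
print(combined == {1, 2, 3, 4, 5, 6, 7})   # True

# The | operator requires sets on both sides.
operator_rejected_list = False
try:
    {1, 2} | [3, 4]
except TypeError:
    operator_rejected_list = True
print(operator_rejected_list)              # True
```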

An intersection generates a new set from only the elements common to multiple sets:

set_1 = {1,2,3}
set_2 = {2,3,4}
set_3 = set_1.intersection(set_2)
# yields {2,3}

The & operator can also be used to perform an intersection:

set_3 = set_1 & set_2
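
Intersections also work across more than two sets. The following sketch uses hypothetical sets of language names to find what all three have in common:

```python
alice = {"python", "go", "rust"}
bob = {"python", "rust", "java"}
carol = {"rust", "python", "c"}

# Chain the operator, or pass several sets to the method.
shared = alice & bob & carol
print(shared == {"python", "rust"})                # True
print(alice.intersection(bob, carol) == shared)    # True
```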

Many set operations can be expressed with operators, which we’ll illustrate below.

Differences with Python sets

If you want to find out which members of one set are not in another, you can use the difference() method:

set_1 = {1,2,3}
set_2 = {4,5,6}
set_3 = set_1.difference(set_2)
# yields {1,2,3}
set_3 = set_1 - set_2
# different way to express same operation

One way to express this in English might be, “Create a new set that has everything in set 1 that isn’t in set 2.”

By contrast, if we used set_3 = set_2.difference(set_1), the results would be {4,5,6}.

Python sets also support symmetric difference operations. The symmetric difference returns elements that are in one set or the other, but not both:

set_1 = {1,2,3,4}
set_2 = {4,5,6,7}
set_3 = set_1.symmetric_difference(set_2)
# yields {1, 2, 3, 5, 6, 7}
set_3 = set_1 ^ set_2
# operator version
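
Differences are easier to see when the two sets overlap. The sketch below uses two illustrative sets, a and b, and also checks that the symmetric difference matches the union minus the intersection:

```python
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

only_a = a - b            # in a but not in b
only_b = b - a            # in b but not in a
either_not_both = a ^ b   # in exactly one of the two sets

print(only_a == {1, 2})                        # True
print(only_b == {5, 6})                        # True
print(either_not_both == only_a | only_b)      # True
print(either_not_both == (a | b) - (a & b))    # True
```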

Supersets and subsets in Python

You’re probably familiar by now with Python’s in operator, which you can use to search for the presence of a character in a string or an object in a list. Sets support in as well:

set_1 = {1,2,3,4}
1 in set_1 # this is True
5 in set_1 # this is False

What if you wanted to test for the presence of all the elements of one set inside another set? You can’t use in for that — Python will think you’re testing for the presence of the entire set object, not its individual elements. Fortunately, Python does provide ways to check such things with other set methods:

set_1 = {1,2,3,4}
set_2 = {1,2}
# Tests if all members of set_2 are in set_1:
set_2.issubset(set_1)
# Operator version:
set_2 <= set_1
# Tests if set_1 contains all members of set_2:
set_1.issuperset(set_2)
# Operator version:
set_1 >= set_2
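
To make the results concrete, here is a short sketch comparing in with the subset and superset tests. It also shows the strict < operator, which tests for a proper subset, and .isdisjoint(), which checks that two sets share no elements. The names big and small are illustrative.

```python
big = {1, 2, 3, 4}
small = {1, 2}

print(small in big)              # False: looks for the set itself as one element
print(small.issubset(big))       # True
print(big.issuperset(small))     # True
print(big <= big)                # True: every set is a subset of itself
print(big < big)                 # False: < tests for a proper subset
print(small.isdisjoint({7, 8}))  # True: no elements in common
```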

Set updates in Python

Up until now we’ve only explored how to generate new sets from unions, intersections, or differences of existing sets. Python also lets you update a set in place with the same operations:

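# In-place union of set_1 with set_2 (method form: set_1.update(set_2)):
set_1 |= set_2
# In-place intersection of set_1 with set_2 (method form: set_1.intersection_update(set_2)):
set_1 &= set_2
# In-place difference of set_1 with set_2 (method form: set_1.difference_update(set_2)):
set_1 -= set_2
# In-place symmetric difference of set_1 with set_2 (method form: set_1.symmetric_difference_update(set_2)):
set_1 ^= set_2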

In-place updates are handy when you’re dealing with a very large set, and you don’t want to create an entirely new instance of the set (with all the overhead that goes with such an operation). Instead, you can make the changes directly to the existing set, which is more efficient.
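
One way to see that an in-place update really does reuse the existing object is to keep a second reference to the set and compare identities. This is a small sketch with an illustrative alias variable:

```python
set_1 = {1, 2, 3}
set_2 = {3, 4}
alias = set_1                   # second reference to the same object

set_1 |= set_2                  # modifies the existing set
print(set_1 is alias)           # True
print(alias == {1, 2, 3, 4})    # True: the change is visible through alias

set_1 = set_1 | {5}             # builds a new set and rebinds the name
print(set_1 is alias)           # False
print(alias == {1, 2, 3, 4})    # True: the original object is unchanged
```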

Frozen sets in Python

I mentioned before how sets can only be made of things that are hashable. Since sets are mutable, they can’t themselves be used as set elements or dictionary keys. But there is a variety of set called the frozen set that isn’t mutable, and so can be used as a set element, as a dictionary key, or in any other context where you need a hashable type. To create a frozen set, just use frozenset() to generate one from an existing set or iterable:

set_1 = {1,2,3,4}
f_set = frozenset(set_1)
set_2 = {f_set,2,3,4}

Note that once you create a frozen set, it can’t be altered. The .add() and .remove() methods don’t exist on a frozen set, and neither do the in-place update methods such as .update(). You can still use a frozen set to generate unions, intersections, or differences, since those operations return a new object rather than changing the frozen set.
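
A common use for frozen sets is as dictionary keys where the order of the elements should not matter. The sketch below uses a hypothetical table of distances between pairs of points:

```python
# Hypothetical lookup table keyed by unordered pairs.
distances = {
    frozenset({"A", "B"}): 5,
    frozenset({"B", "C"}): 7,
}
print(distances[frozenset({"B", "A"})])   # 5: element order does not matter

f_set = frozenset({1, 2, 3})
add_failed = False
try:
    f_set.add(4)                          # frozenset has no add() method
except AttributeError:
    add_failed = True
print(add_failed)                         # True

# Operations that return a new object still work.
result = f_set & {2, 3, 4}
print(result == {2, 3})                   # True
print(type(result).__name__)              # frozenset
```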
