
How zip() Works in Python

By Uri Shamay May 10, 2021

Every now and then we get datasets that we need to combine. The zip() function in Python pairs up the items of the collections passed to it and returns an iterator of tuples.

But what is a tuple? Why do we need to combine datasets together, and how, exactly, do we use zip() in Python?

What is a tuple and why it matters

In Python, a tuple stores multiple items in a single variable. In a way, it is a list of whatever you put in it. Here is an example of what a tuple looks like:

 fruit_tuple = ("apple", "banana", "oranges")

But what’s so special about a Python tuple? The quick answer is that once a tuple is created, its items stay in the same order and cannot be changed, and duplicate values are allowed. In short, a tuple is immutable.

Each tuple item is indexed, with the first item index being [0].
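As a minimal sketch of indexing and immutability, the snippet below reuses fruit_tuple from the example above. The names first_fruit, last_fruit and assignment_failed are only illustrative.

```python
fruit_tuple = ("apple", "banana", "oranges")

# Indexing starts at 0; negative indexes count from the end.
first_fruit = fruit_tuple[0]    # "apple"
last_fruit = fruit_tuple[-1]    # "oranges"

# Tuples are immutable, so item assignment raises TypeError.
try:
    fruit_tuple[0] = "pear"
    assignment_failed = False
except TypeError:
    assignment_failed = True

print(first_fruit, last_fruit, assignment_failed)
```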

To find the length of a tuple, the function len() can be used. Here is an example:

 fruit_tuple = ("apple", "banana", "oranges")
 print(len(fruit_tuple))

It is also good to note that a tuple with a single item needs special care. When you only have one item, you need to add a trailing comma (,) after it. If not, Python will not recognize it as a tuple.

For example:

 not_tuple = ("apple")
 fruit_tuple = ("apple",)

If you do not have the extra comma for a single-item tuple, Python treats the parentheses as plain grouping, so the result is just a string or number, depending on the value you passed in.

For example:

 not_tuple = ("apple")
 fruit_tuple = ("apple",)

 print(type(not_tuple))
 print(type(fruit_tuple))

The above will return the following:

 <class 'str'>
 <class 'tuple'>

You can also convert a collection of data into a tuple by using the tuple() constructor. Here is an example of how to do it:

 fruit_tuple = tuple(("apple", "banana", "oranges"))

This is one way to ensure that your collection is definitely a tuple, and therefore immutable.
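The constructor is most useful when the data starts out as something other than a tuple. The sketch below assumes a hypothetical fruit_list as the starting point and shows that the resulting tuple is a separate, fixed copy.

```python
fruit_list = ["apple", "banana", "oranges"]  # a mutable list (hypothetical input)

# tuple() accepts any iterable and builds a new tuple from its items.
fruit_tuple = tuple(fruit_list)

# Changing the original list afterwards does not affect the tuple.
fruit_list.append("pear")

print(fruit_tuple)        # ('apple', 'banana', 'oranges')
print(type(fruit_tuple))  # <class 'tuple'>
```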

The relationship between tuples and zip() function in Python

So why are tuples a big deal, especially when it comes to Python’s zip() function?

This is because zip() combines collections of data by pairing up their items, and each combined item it produces is a tuple.

A tuple in Python is special because it differs from the other data types that store collections of data. There are four built-in collection data types in Python: list, set, dictionary and tuple.

  • A Python list is ordered and changeable, and allows duplicate data.
  • A Python set is unordered and unindexed, and does not allow duplicates.
  • A Python dictionary is ordered (as of Python 3.7) and changeable, but like a set, it doesn’t allow duplicate keys.

A Python tuple is the only data type out of the four that is unchangeable, ordered, and allows for duplicates. This means that whatever is contained within a tuple can be a single source of truth, due to its immutable nature.
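To make the comparison concrete, here is a small sketch that puts the same duplicated data into each of the four types. The variable names and prices are made up for illustration.

```python
fruit_list = ["apple", "banana", "apple"]       # ordered, changeable, duplicates kept
fruit_set = {"apple", "banana", "apple"}        # duplicates collapse, no indexing
fruit_prices = {"apple": 1.99, "apple": 2.49}   # a repeated key keeps only the last value
fruit_tuple = ("apple", "banana", "apple")      # ordered, unchangeable, duplicates kept

fruit_list.append("oranges")   # lists can grow in place

print(len(fruit_list))              # 4
print(len(fruit_set))               # 2
print(fruit_prices)                 # {'apple': 2.49}
print(fruit_tuple.count("apple"))   # 2
```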

When combined with zip(), this lets the developer build new sets of data without fear of side effects, because the tuples it produces cannot be modified further down the line.

How zip() function works in Python

So how exactly does the zip() function work in Python?

First, zip() takes iterables, such as tuples or lists, as parameters. Then it iterates over them in parallel, producing a new tuple for each position. In other words, zip() lets you step through multiple collections at once. Here is an example:

 fruit = ("apple", "banana", "oranges")
 price_list = (1.99, 2.99, 3.99)

 zip(fruit, price_list)

zip(fruit, price_list) returns a zip object, a lazy iterator whose printed form looks something like this:

 <zip object at 0x1928e98733>
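Because the zip object is an iterator, its tuples are produced on demand and can only be consumed once. The following sketch illustrates this with next() and list(). The names zipped, first_pair, remaining and second_pass are only for demonstration.

```python
fruit = ("apple", "banana", "oranges")
price_list = (1.99, 2.99, 3.99)

zipped = zip(fruit, price_list)

# next() pulls one tuple at a time from the iterator.
first_pair = next(zipped)    # ('apple', 1.99)

# list() consumes whatever is left.
remaining = list(zipped)     # [('banana', 2.99), ('oranges', 3.99)]

# The iterator is now exhausted, so a second pass yields nothing.
second_pass = list(zipped)   # []

print(first_pair, remaining, second_pass)
```

If you need to go over the pairs more than once, store list(zip(...)) in a variable instead of keeping the zip object itself.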

If you want to see the tuples that zip() produces, use the list() function. Here is an example of how to do so:

 fruit = ("apple", "banana", "oranges")
 price_list = (1.99, 2.99, 3.99)

 list(zip(fruit, price_list))

The above will return the following result:

 [('apple', 1.99), ('banana', 2.99), ('oranges', 3.99)]
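In practice, the zipped pairs are often consumed directly rather than converted to a list first. The sketch below shows two common patterns, a for loop that unpacks each tuple and a dictionary built with dict(). The labels and price_by_fruit names are illustrative.

```python
fruit = ("apple", "banana", "oranges")
price_list = (1.99, 2.99, 3.99)

# Unpack each tuple directly in the for statement.
labels = []
for name, price in zip(fruit, price_list):
    labels.append(f"{name}: ${price:.2f}")

# dict() accepts an iterable of (key, value) pairs, which is what zip() yields.
price_by_fruit = dict(zip(fruit, price_list))

print(labels)           # ['apple: $1.99', 'banana: $2.99', 'oranges: $3.99']
print(price_by_fruit)   # {'apple': 1.99, 'banana': 2.99, 'oranges': 3.99}
```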

These are the basics of how zip() works and what the returned tuples look like. But what if your collections don’t have the same length? For example:

 fruit = ("apple", "banana", "oranges")
 price_list = (1.99, 2.99, 3.99)
 brand = ("appleberry", "yellow")

 list(zip(fruit, price_list, brand))

By default, zip() stops as soon as the shortest collection runs out. So the above example will produce the following output:

 [('apple', 1.99, 'appleberry'), ('banana', 2.99, 'yellow')]

If you want to zip based on the longest collection, you will need to import zip_longest from itertools. Here is an example:

 from itertools import zip_longest

 fruit = ("apple", "banana", "oranges")
 price_list = (1.99, 2.99, 3.99)
 brand = ("appleberry", "yellow")

 list(zip_longest(fruit, price_list, brand))

zip_longest() produces as many tuples as there are items in the longest collection. None fills the positions where the shorter collections have no data.

Here is an example of the output based on the above usage of zip_longest().

 [('apple', 1.99, 'appleberry'), ('banana', 2.99, 'yellow'), ('oranges', 3.99, None)]

If you don’t want the empty value to be None, you can set a fill value inside zip_longest() like this:

 from itertools import zip_longest

 fruit = ("apple", "banana", "oranges")
 price_list = (1.99, 2.99, 3.99)
 brand = ("appleberry", "yellow")

 list(zip_longest(fruit, price_list, brand, fillvalue="budgets"))

This will return the following:

 [('apple', 1.99, 'appleberry'), ('banana', 2.99, 'yellow'), ('oranges', 3.99, 'budgets')]

What happens if you only pass one collection into zip()? You will still get the same format of data returned, but as single-item tuples.

 fruit = ("apple", "banana", "oranges")

 list(zip(fruit))

The above example will return the following:

 [('apple',), ('banana',), ('oranges',)]

This is important to note because each value is still wrapped in a tuple, so you need a second index to reach it.

For example, if you called the following:

 list(zip(fruit))[0]
 list(zip(fruit))[1]
 list(zip(fruit))[2]

This will return:

 ('apple',)
 ('banana',)
 ('oranges',)

If you want the actual value, you’ll need to go into the object one more time like this:

 list(zip(fruit))[0][0]
 list(zip(fruit))[1][0]
 list(zip(fruit))[2][0]

The above will return the following:

 apple
 banana
 oranges

What happens if you call zip() with no arguments?

 list(zip())

Then list() will give you an empty list like this:

 []

What if you want to sort your zipped object? A zip object has no .sort() method, so convert it to a list first. .sort() then orders the tuples by their first item, using the later items to break ties. For example:

 fruit = ("peaches", "banana", "oranges")
 price_list = (5.99, 8.99, 3.99)

 set_1 = list(zip(fruit, price_list))
 set_1.sort()

After sorting, set_1 contains the following:

 [('banana', 8.99), ('oranges', 3.99), ('peaches', 5.99)]

However, if we put price_list first, the result is sorted by price instead:

 fruit = ("peaches", "banana", "oranges")
 price_list = (5.99, 8.99, 3.99)

 set_2 = list(zip(price_list, fruit))
 set_2.sort()

After sorting, set_2 contains the following:

 [(3.99, 'oranges'), (5.99, 'peaches'), (8.99, 'banana')]
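Reordering the arguments is not the only option. As a sketch, you can keep the fruit first and pass a key function to sorted(), here using itemgetter from the standard operator module. The by_price and by_price_desc names are illustrative.

```python
from operator import itemgetter

fruit = ("peaches", "banana", "oranges")
price_list = (5.99, 8.99, 3.99)

pairs = list(zip(fruit, price_list))

# sorted() returns a new list; itemgetter(1) sorts by the second item (the price).
by_price = sorted(pairs, key=itemgetter(1))

# reverse=True gives the most expensive fruit first.
by_price_desc = sorted(pairs, key=itemgetter(1), reverse=True)

print(by_price)       # [('oranges', 3.99), ('peaches', 5.99), ('banana', 8.99)]
print(by_price_desc)  # [('banana', 8.99), ('peaches', 5.99), ('oranges', 3.99)]
```

Unlike .sort(), sorted() leaves the original pairs list untouched, which fits well with treating zipped data as read-only.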

Unzipping zip()

The zip() function in Python is the easiest way to iterate through several collections in parallel. It is also a minimal-fuss way to ensure that each new record you create is immutable, making it a trustworthy component of functional patterns.

To reverse zip(), that is, to unzip it, pass the zipped object back into zip() preceded by an asterisk (*). Here is an example:

 zipped_item = [('apple', 1.99, 'appleberry'), ('banana', 2.99, 'yellow'), ('oranges', 3.99, 'budgets')]

 fruit, price_list, brand = zip(*zipped_item)

This is equivalent to assigning the following:

 fruit = ("apple", "banana", "oranges")
 price_list = (1.99, 2.99, 3.99)
 brand = ("appleberry", "yellow", "budgets")
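The asterisk unpacks the list so that each tuple becomes a separate argument to zip(). The sketch below does a full round trip to show that unzipping gives back the original columns. The unzipped_fruit, unzipped_prices and editable_prices names are illustrative.

```python
fruit = ("apple", "banana", "oranges")
price_list = (1.99, 2.99, 3.99)

zipped_item = list(zip(fruit, price_list))

# zip(*zipped_item) is the same as
# zip(('apple', 1.99), ('banana', 2.99), ('oranges', 3.99)).
unzipped_fruit, unzipped_prices = zip(*zipped_item)

print(unzipped_fruit == fruit)        # True
print(unzipped_prices == price_list)  # True

# Unzipped columns are tuples; wrap one in list() if you need to modify it.
editable_prices = list(unzipped_prices)
editable_prices[0] = 2.49
print(editable_prices)                # [2.49, 2.99, 3.99]
```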

And that is basically Python’s zip() function in a nutshell.
