Take a moment and try Keboola for free.

  • Easy setup, no data storage required
  • Free forever for core features
  • Simple expansion with additional credits
cross-icon
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.

Join our newsletter

#noSpamWePromise
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.
cross-icon
Subscribe

Run your data operations on a single, unified platform.

  • Easy setup, no data storage required
  • Free forever for core features
  • Simple expansion with additional credits
cross-icon
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.
Community
An In-depth Tutorial on Data Structures in Python
Discover six fundamental data structures and their implementation in Python.

Pythonistas are well versed in the intricacies of Python data types, which come equipped with out-of-the-box built-in methods to store, manipulate, and reshape data at the rhythm of keyboard strokes.

But not many data scientists and data engineers fully unlock the potential of data structures in Python - the potential to write code that is easier to maintain, more efficient by orders of magnitude, and solves problems more elegantly.

In this article, we take an in-depth look at data structures in Python through multiple topics.

1. What are data structures (in Python)?

Data structures sit at the intersection between computer science and data science. They occupy the in-between world of highly performant machine code on one hand, and easy-to-comprehend abstracted functions on the other.

Data structures describe how data should be organized so that it solves a computational problem while making its storage (space) complexity and time complexity as optimized as possible.

We have touched on some data structures before - you may remember the dictionary in Python, which is highly performant (O(1) time complexity at retrieving values stored in a dictionary when given a key). 

For now, though, we will dive deeper into the classic and fundamental data structures:

  1. Stack
  2. Queue
  3. Map
  4. Array
  5. Graph
  6. Set

For each data type, we will do three things:

  1. Explain how it works.
  2. Showcase its importance and applications.
  3. Offer code examples for implementing it in the Python programming language.

Before we do that, we first need to clarify a murky boundary. How, if at all, do data structures differ from data types in Python?

1.1 How do Python data structures differ from data types?

In traditionally favored programming languages like Java and C++, the languages draw a strong distinction between Abstract Data Types (ADT) and Data Structures. 

In those strongly-typed languages, ADTs are an abstraction that saves us from writing computer code ourselves. ADTs offer groups of operations that you can perform over a data type. The Python equivalent would be the class methods over data types, such as round(float) or str.upper() or dict.keys().

On the other hand, data structures are the implementation of those ADTs. To keep things performant, strongly-typed languages build upon tried-and-tested data structures that offer the best possible methods for solving computational problems. An example of that would be the class dict() in Python, which is extremely efficient at reading operations for named key-value pairs.

But Python differs from those languages because it does not strictly distinguish between ADTs and data structures. In fact, everything is an object in Python and therefore inherits from the class that it descends from. This is why the number 1 is treated as True in Boolean logic - Boolean values are class descendants of the integer class.

So, why distinguish between data types and data structures in Python at all?

The main reason is performance - understanding how to implement a data structure can improve performance significantly. But there are other benefits to solving problems with data structures in Python...

2. What are the benefits of using data structures in Python?

There are multiple advantages to using data structures:

  1. Performance. Choosing the right data structure makes your code run more efficiently (aka takes less time, or less space, or both).
  2. Simplicity and maintainability. Data structures provide the backbone or main logic for an algorithmic solution, which makes the code easier to maintain and modify.
  3. Organization. Putting your logic on the level of a data structure means that your codebase is more organized instead of having lengthy helper function files and dispersed logic.
  4. Collaboration. Using both the right language and the correct implementation makes it easier to collaborate on projects which span multiple stages of the ETL pipeline, especially when working with coworkers who are more classically trained in computer science.

Let’s check how these benefits stem from actual data structure implementations.

3. Array

3.1 What Is An Array?

An array is a compact data structure that sequentially stores basic data types (integers, floats, etc.) of information by allocating sequential memory slots. It is the simplest way to store information on a computer. 

Technically, an array specifies that all array elements must be of the same data type.

Items in an array can be accessed instantly (O(1) time). 

3.2 How Is The Array Used?

Arrays are often used in numerical computation tasks or as a supporting data structure to implement other data structures (e.g. stacks, lists, and queues). When talking about numerical operations, arrays are the linear algebraic implementation that allows for vector and matrix operations or the building blocks of many machine-learning algorithms.

3.3 How To Implement The Array Data Structure In Python

Python has a numeric implementation of arrays within the array module. It allows users to specify the Python (or C) data type that the array holds at initiation. It can hold elements of floats, integers, or Unicode characters (16- or 32-bits, depending on the platform).

The Python array also comes with many in-built methods. These include append(item) for adding items to the end of the array, count(item) for counting the number of occurrences of an item within the array, insert(index, item) to insert an item at position index, remove(item) for item deletion, and more.

There are other more specialized implementations of arrays in Python, such as bytes objects (immutable sequence of single bytes), bytearray objects (mutable arrays of single bytes), str objects (immutable arrays of Unicode characters declared as single quotes, double quotes, or triple quotes), and tuple objects (check out our article on this immutable sequence).

When it comes to arrays, Python does implement this data structure as part of its Standard Library, but you are better off using the NumPy library instead. It is optimized for scientific and vectorized computing.

4. Stack (LIFO)

4.1 What Is a Stack?

A stack is a container of objects where item insertion and item deletion are carried out according to the LIFO principle. LIFO stands for ‘Last In First Out’ and it signifies the priority of items at insertion and removal.

You can think of a stack as a pile of plates. It would be impractical to pick up the bottom plate, so usually, we’d go for the one on top (the last one added to the stack). 

The stack needs to implement at least two operations:

  1. push(item): add the item to the top of the stack.
  2. pop(): remove the last added item from the stack.

Of course, the stack data structure can include other elements, such as a function that returns the number of elements in a stack, or one that checks whether a stack is empty.

4.2 How Is the Stack Used?

The stack has many uses in the world of computers:

  1. Depth-first search (DFS) algorithms are based on the stack data structure, and those algorithms are used across a variety of parser and runtime memory management algorithms.
  2. Stacks are used for scheduling routines and algorithms.
  3. On a higher level, stacks can be used as part of applications. For example, the “undo” button in a text editor checks the last inserted command, reverts it, and pops it from the stack.

4.3 How To Implement The Stack Data Structure In Python

The easiest way to implement a stack in Python is to use lists. Lists allow you to add new items to the existing stack with the append() method, and remove the last element added with the pop() method. 

If you need a refresher on how to initiate a list (spoiler alert: square brackets suffice) and other methods of working with lists, have a read of our article on built-in data types.

5. Queue (FIFO)

5.1 What Is a Queue?

Queues are also collections of items that hold elements, but you could say that they are the opposite of stacks. Queues collect elements according to the FIFO concept - ‘First in First Out’ - so the first element added to the queue is the first one to be deleted.

The queue needs to implement at least two operations:

  1. enqueue(item): add the item to the back of the queue.
  2. dequeue(): remove the item at the front of the queue.

You can think of a queue as a line of people waiting in front of a food truck. The first person in the queue is the first person to be served.

5.2 How Is a Queue Used?

The queue has many pragmatic applications:

  1. Breadth-first search (BFS) algorithms are based on the queue data structure. In turn, this is used by other tree or graph data structures.
  2. Message brokerage and distributed computing. Advanced software such as Kafka uses queues for message brokerage and distributed computing.
  3. Higher-level applications. Whenever the client produces many messages or events in a distributed or non-linear fashion, queues are set in place to process that information. Imagine a pizzeria that receives many orders. If the orders did not go in the same order as they were placed (e.g. they used LIFO or stacks), they would not have many happy customers.

5.3 How To Implement The Queue Data Structure In Python

Many programmers use lists to implement queues, since you can implement the FIFO principle with slicing. However, as is noted in the Python documentation:

“[L]ists are not efficient for this purpose. While appends and pops from the end of list are fast, doing inserts or pops from the beginning of a list is slow (because all of the other elements have to be shifted by one).“

Instead, use collections.deque, which was designed to have fast appends and pops from both ends. The module collections.deque could also be used for stacks but in this case, we can focus on deque methods append() and popleft(). The former adds a new item to the right (end) of the queue, while the latter returns and removes the item to the left (beginning) of the queue.

6. Map

6.1 What Is a Map?

The map data structure (also referred to as hash tables, lookup tables, hashmaps, or associative arrays) is a collection of named items. It is extremely efficient at item insertion, lookup, and item deletion.

However, this efficiency does come at a cost - it requires a lot more space than other data structures.

This is because the map is implemented as a key-value table, where each value has an associated unique key used for fast item retrieval.

Maps implement the following operations:

  1. set(key, value): adds a key-value mapping to the map.
  2. delete(key): removes the key and its associated value.
  3. get(key): retrieves the value associated with the given key.

6.2 How Is a Map Used?

Maps are extremely beneficial for any applications that rely heavily on fast data read-time and are not too affected by space constraints. This involves the majority of modern web apps, which return information efficiently. The map is also very popular among data scientists during data cleaning, where checking values against a key-value baseline is used to determine if data has been corrupted in transit or during collection and transformation.

6.3 How To Implement The Map Data Structure In Python

Maps are implemented as the built-in data type dict! The Python implementation requires you to pick an immutable object for a key (mutable objects such as lists cannot be transformed into reliable hashes since they could change with the hash) and it then constructs a hash table in the background.

If you need a refresher on how to initiate the dictionary in Python (spoiler alert: put the key-value pairs between curly braces) and other methods of working with dictionaries, check out our article on built-in data types. 

7. Graph

7.1 What Is a Graph?

Graphs arrange data as multiple nodes (or vertices in mathematics) connected with relationships (or edges in mathematics).

Graphs are structures used to model pairwise relations between objects. As an example, the node “Customer John Smith” could have a relation of “bought” with the node “Shampoo ‘Amazing Hair’”.

7.2 How Is a Graph Used?

Graphs are extensively used to model other networks that naturally occur in real life, such as social media connections, semantic linguistic analysis, website traffic and interlinkedness, the topology of atoms, etc. (and the list really goes on).

This is the most flexible data structure of all because one could construct every other data structure using graphs.

7.3 How To Implement The Graph Data Structure In Python

A simple way to implement a graph is with a Python dictionary, whereby the direction of a relationship goes from the key (source node) to the value (target node). This might be cumbersome, but it is rather efficient.

A more intuitive approach (and one widely used by data scientists and network engineers) is to use the NetworkX library. This library allows you to easily add and remove both nodes and edges and comes with a lot of out-of-the-box methods, such as ones for finding the shortest path between nodes, or measures of associativity and centrality within the graph.

8. Set

8.1 What Is a Set?

A set is an unordered collection of unique (non-duplicated) items. It is a data structure that mimics mathematical sets, so it also comes with mathematical operations, such as membership checks (is an item part of a set? A subset of another set? And so on...), and between-set operations (unions, intersections, etc.).

8.2 How Is a Set Used?

Sets are used for extremely efficient set operations. They are often used for validating data (e.g. as part of data cleaning or even client-facing applications, such as checking if the client credentials are in the database), finding differences between two datasets, joining unique values together (e.g. when constructing new datasets), and more.

8.3 How To Implement The Set Data Structure In Python

Sets have a native implementation in Python with the set object. Using the set object’s methods enables efficient and optimized out-of-the-box set operations with method calls.

9. Conclusion: Python Data Structures

In this tutorial, you have learned:

  1. The 6 fundamental data structures.
  2. The implementation of data structures in Python (with built-in modules or external libraries).
  3. The practical advantages of using a specific data structure.

What’s next?

Now it’s time to put your new-found knowledge to the test. Use the data structures to improve existing code or write new high-performance scripts.

If you liked this article, be sure to subscribe to our newsletter so that you’ll always be informed when new articles and guides come out. 



Download the files
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.