PostgreSQL (or Postgres) and MySQL are both relational database management systems (RDBMS for short). They are complex technological inventions designed to simplify your data operations across a wide variety of business use cases.
The “relational” part of the name refers to the database architecture, which structures data according to the relational data model. The relational model organizes data as tables of rows and columns. In non-tech terms: they’re like Excel but with added features… so more of an Excel on steroids.
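To make the "tables of rows and columns" idea concrete, here is a minimal sketch. It uses SQLite through Python's built-in `sqlite3` module purely as a lightweight stand-in for an RDBMS (it is neither Postgres nor MySQL), and the `orders` table and its contents are made up for illustration.

```python
import sqlite3

# In-memory SQLite database, used only as a small stand-in for an RDBMS.
conn = sqlite3.connect(":memory:")

# A table has named columns; every row supplies one value per column.
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("Alice", 42.50), ("Bob", 19.99), ("Alice", 7.25)],
)

# SQL groups and aggregates the rows: total spend per customer.
totals = conn.execute(
    "SELECT customer, SUM(total) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

The same `CREATE TABLE`, `INSERT`, and `SELECT ... GROUP BY` statements are basic SQL that would look very similar in Postgres or MySQL, although data type names differ between systems.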
From storing a digital representation of all your purchasing orders to analyzing performance metrics, RDBMS were developed to facilitate working with data at scale.
But not all RDBMS were created the same. Choosing the right database from the very beginning can prevent problems that arise later on, such as limited analytics and a lack of support.
To make your decision easier, we’ve provided a side-by-side comparison of two of the most popular databases - Postgres and MySQL - across a range of criteria:
Both Postgres and MySQL are relational database management systems (RDBMS), which means that they store data in the relational (tabular) model. An RDBMS is expected to perform a variety of functions:
People often try to compare MySQL and Postgres on their level of performance - in other words, how fast and how well they perform their functions.
For years, the conventional wisdom was that MySQL was better at read-heavy operations (e.g. a simple BI system for your e-commerce store), while Postgres shone in complex OLTP/OLAP systems but used more memory (Postgres starts a separate process for each client connection, and each process is allocated roughly 10MB).
However, with each new release, the read and write performance of the two database systems is becoming increasingly comparable.
One point of difference which remains noteworthy is the performance features that are unique to Postgres:
These (and other) features position Postgres as a better choice than MySQL for analytically heavy operations.
For most use cases, both MySQL and Postgres are going to be similar in performance.
PostgreSQL's performance advantages show most clearly in systems that expect heavy analytic workloads.
And, unless you have extreme data needs (on the level of Netflix, Facebook, and other tech giants), you will barely notice performance issues. If those arise, you’re better off with a completely different solution to meet your needs, such as developing your own query language to handle large volumes of data.
MySQL is more popular than PostgreSQL on a variety of metrics:
Nevertheless, in a shocking plot twist, developers love Postgres a lot more than they do MySQL.
How do we reconcile these differences?
The truth is, both databases are leaders in the wider world of relational databases. Postgres is relied upon by Netflix, Instagram, Spotify, Reddit, Twitch, and many more. MySQL is trusted by Uber, Verizon, NASA, Tesla, and other giants. No matter your database choice, you will join countless other market leaders in making that decision.
RDBMS do not hold prom king and queen elections (alas). So, why does popularity matter in the first place? It’s a great signal of market availability. The higher the popularity of an RDBMS, the higher your chances of finding (and hiring) developers and database administrators. This is especially important if you plan to extend your operations in the future.
Both MySQL and Postgres are popular open-source RDBMS, which is reflected in their vibrant, active open-source communities that offer support for free. Alongside its free version governed by the GNU General Public License, MySQL also offers paid-tier support, which can be attractive to enterprises. We can break down their support options into several categories:
d) Paid support: MySQL has the Oracle Partner Network, while Postgres has no officially affiliated companies
e) Database management tools: both RDBMS have at their disposal multiple cross-platform third-party tools to help you access and manage the database from a GUI instead of the command line. For instance, PostgreSQL is administered via pgAdmin, DBeaver, Navicat, DataGrip, …, while MySQL users can pick from a variety of choices such as MySQL Workbench, phpMyAdmin, HeidiSQL, …
Even though both Postgres and MySQL come with a vibrant community of volunteers and supporters, MySQL has a better paid-tier support structure, which is directly connected to Oracle. Additionally, MySQL has more Stack Overflow questions and answers, as well as a wider range of materials (books, documentation) to help you resolve your issues.
Sooner or later, you’re going to encounter problems with database administration or analytical queries. It’s best to prepare yourself beforehand by exploring your options. If you’re not very comfortable with RDBMS or lack the necessary experience, MySQL’s commercial license offered by the Oracle Corporation is the safer way to go.
Both MySQL and Postgres rely on SQL. In that respect, both databases are capable of analyzing a wide constellation of data and answering business questions across the board.
However, Postgres has developed more advanced features for analytics. These speed up and optimize analytical queries:
If your use case involves analytics - or is likely to involve more complex analytical queries in the foreseeable future - choose Postgres. MySQL largely sticks to standard SQL syntax, while PostgreSQL also supports very advanced custom-made procedures. Additionally, if you expect to be dealing with a lot of data cleaning in your pipelines, Postgres offers more out-of-the-box solutions for wrangling messy data. With its variety of analytic functions, it will speed up and optimize your analytical operations. This makes PostgreSQL an excellent choice for your data science and big data needs.
NoSQL refers to data models that do not follow the tabular (Excel-like) form of relational databases. This includes anything from XML and JSON to graphs and other wildly imaginative representations of data.
The challenge of modeling and storing NoSQL data is relatively new. Despite this, the demand for NoSQL databases is increasing due to the proliferation of non-structured data: social networks (which are graphs), Internet of Things logs, XML as a representation of web pages, geospatial data (coordinates) for geolocating specific events, and so on.
Both Postgres and MySQL support the storage of JSON objects (one of the most popular NoSQL data representations), but Postgres trumps MySQL with its additional NoSQL features:
If your use case depends heavily on NoSQL data, Postgres is the better choice.
When analyzing time series or working with event logs data, we often use timestamps in the form YYYY-MM-DD HH:MM:SS to determine exactly when an event was observed.
Postgres and MySQL treat timestamps very differently. MySQL converts TIMESTAMP values from the current session time zone to UTC before storing them, and converts them back on retrieval. Postgres offers the same functionality but also has the option to save timestamps with timezone as a native data type (`timestamp with time zone`).
For the majority of use cases, users only need to know when an event occurred, not in which timezone. But for time-sensitive operations (such as financial trading, digital advertising attributions, certain IoT applications, etc.) the ability to determine the time AND the timezone is crucial for core business operations. If you do work with time series, MySQL is simply not the right choice. Opt for Postgres instead.
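To see why the timezone matters, consider the following minimal sketch in plain Python. It does not touch either database; it only illustrates how the same YYYY-MM-DD HH:MM:SS reading can refer to two different moments. The event time and the two fixed UTC offsets are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical event: the same wall-clock reading logged by two servers.
naive = datetime(2024, 3, 1, 9, 30, 0)  # no timezone attached

# Fixed UTC offsets standing in for two server locations (assumed values).
new_york = timezone(timedelta(hours=-5))
prague = timezone(timedelta(hours=1))

event_ny = naive.replace(tzinfo=new_york)
event_prague = naive.replace(tzinfo=prague)

# Identical wall-clock text, but the two events are hours apart in absolute time.
gap = event_ny - event_prague
print(gap)

# Normalizing to UTC makes the two events directly comparable.
print(event_ny.astimezone(timezone.utc).isoformat())
print(event_prague.astimezone(timezone.utc).isoformat())
```

Without the offset, the two log entries would look identical, and there would be no way to tell which one happened first.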
ACID (atomicity, consistency, isolation, durability) is a standard set of properties for low-level operations, which guarantee that database transactions perform without errors or corruption. Whether your transaction involves writing something new into the database, updating existing records, deleting old records, or reading/retrieving existing information, you expect the database to do as instructed, that is, to create, read, update, or delete (CRUD) records. What could go wrong?
The non-ACID nightmare: Imagine your customer buys from your e-shop while at the same time, your data engineer runs an update on the database and migrates tables to a different location. Because the database did not comply with ACID standards, (only) part of the purchase details were written into the orders table before it was migrated. You’re now left with a customer who has paid for an order, but you’re missing their address and product list. How will you resolve that? And how will you even know which products are missing?
ACID ensures that data does not go missing or get corrupted.
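The "all or nothing" part of ACID (atomicity) can be sketched in a few lines. This example again uses SQLite via Python's built-in `sqlite3` module as a stand-in, not Postgres or MySQL, and the `orders` and `order_items` tables and the `place_order` helper are hypothetical names invented for this illustration.

```python
import sqlite3

# isolation_level=None lets us control transactions explicitly with BEGIN/COMMIT/ROLLBACK.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE order_items (order_id INTEGER NOT NULL, product TEXT NOT NULL)"
)


def place_order(customer, products):
    """Write the order and all of its items, or nothing at all."""
    conn.execute("BEGIN")
    try:
        cur = conn.execute("INSERT INTO orders (customer) VALUES (?)", (customer,))
        for product in products:
            # A None product violates NOT NULL and raises sqlite3.IntegrityError.
            conn.execute(
                "INSERT INTO order_items (order_id, product) VALUES (?, ?)",
                (cur.lastrowid, product),
            )
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        # Undo everything written since BEGIN, including the orders row.
        conn.execute("ROLLBACK")
        return False


ok = place_order("Alice", ["keyboard", "mouse"])
failed = place_order("Bob", ["monitor", None])  # second item is invalid

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
item_count = conn.execute("SELECT COUNT(*) FROM order_items").fetchone()[0]
print(ok, failed, order_count, item_count)
```

Because the failed order is rolled back as a whole, the database never ends up holding an order without its full product list, which is exactly the half-written state described in the nightmare scenario.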
Postgres is fully ACID-compliant, while MySQL is only compliant when running on the InnoDB storage engine (the default engine since MySQL 5.5). ACID compliance guarantees that databases behave as expected. If reliability is high on your “must-have” list, Postgres is the best choice.
In some use cases, you need to have exhaustive data quality. Purchase transactions are one such example since you’re required to keep a transaction record by law. For these cases, make sure to pick a database (storage engine) that is ACID compliant. If you don’t care about missing or corrupted data, and you just want to keep a partial
Both MySQL and Postgres are fantastic examples of relational database management systems, but with different comparative strengths.
MySQL provides better support and is backed by a larger pool of developers who know its workings inside out. Being slightly faster and scoring high on “ease of use”, it offers great value for low technical investment.
In comparison, Postgres is fully ACID-compliant and comes with out-of-the-box solutions for NoSQL, analytic, and date and time use cases. But this comes at a slightly lower performance level. Performance can be improved with additional tuning, such as multiple indexes, but this requires a higher level of expertise in databases.
In conclusion, there is no ideal solution. The choice of which is the better database depends on your specific use cases: MySQL is straightforward and fast, while Postgres is more powerful but also more complex.
At Keboola, we are an equal opportunity kind of platform. Whether you swing towards Postgres, have a crush on MySQL, or have a soft spot for a dozen other database solutions, we accommodate your taste.
Why? Because we strongly believe that there are different ways to solve the same problem, and sometimes you should pick multiple tools to reap the best of them all.
But choosing the right database is just one piece of the puzzle in the bigger data picture. Get started with building data pipelines or data science models at no cost.