Keboola operates at the intersection of high performance and usability. It is designed to be used by both engineers and non-technical domain experts, by offering low-code and no-code solutions that automate the heavy lifting behind ETL processes.
One of the largest libraries of pre-built components. Keboola components are pre-built modules that help you extract raw data and load it between multiple endpoints: relational databases (MySQL, Oracle, Postgres, SQL Server, …), SaaS apps (Salesforce, CRMs, Facebook Ads, …), files and unstructured data (JSON, Excel, XML, CSV), and cloud data warehouses (BigQuery, Snowflake, Amazon Redshift, Microsoft Azure).
Fully extensible. If there is no pre-built connector, you can use the Generic Extractor, which can collect data from any API-like source, or the Generic Writer to load data to any destination.
No-code and low-code transformations. In Keboola, you can use a fully flexible scripting transformation in Python, SQL, R, or Julia, use dbt transformations, or use pre-built no-code transformations. For example, you can remove duplicates from a dataset in a couple of clicks.
End-to-end automation. Data flows can be automated with Orchestrators and Webhooks. Every job is fully monitored, so you can always keep an eye on execution. And every ETL data process can easily be shared and reused to save you development time.
Easy to use. Keboola is all about democratization. It offers developer tools (CLI, CI/CD, git, IDEs, …) and no-code tools (visual builders, drag-and-drop graphical dashboards) to make the life of your users easier.
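To give a feel for what the scripted route looks like next to the couple-of-clicks one, here is a minimal Python sketch of a deduplication step. It is an illustration only: the column names and sample rows are hypothetical, and it works on in-memory CSV text rather than on Keboola's own input and output tables.

```python
import csv
import io


def remove_duplicates(rows, key_columns):
    """Keep the first row seen for each combination of key_columns."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical sample data; a real transformation would read a table instead.
RAW_CSV = """order_id,customer,amount
1,alice,10
2,bob,20
1,alice,10
3,alice,15
"""

rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
deduped = remove_duplicates(rows, ["order_id"])
print([row["order_id"] for row in deduped])
```

Choosing the key columns is the real design decision here: deduplicating on `order_id` keeps three rows, while deduplicating on `customer` would keep only two.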
Keboola is not great for real-time data flows. Keboola offers near real-time data integration: Orchestrators can trigger data extraction every minute, and webhooks can be used for almost instantaneous data collection from different sources. But Keboola is not a data streaming service and does not offer continuous data extraction.
Best for: Teams of technical data experts (scientists, engineers, analysts) and data-driven business experts who would like an all-in-one ETL solution.
Price: An always-free tier and a Pay As You Go tier. The first 3000 minutes are free every month for every client, irrespective of tier. Afterwards, you are charged only for what you use, at 14 cents per processing minute.
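The pay-as-you-go arithmetic can be sketched in a few lines of Python. This is a rough estimate only: it assumes the 3000 free minutes and the 14-cent rate quoted above are the only inputs, and it ignores any rounding rules, taxes, or plan-specific terms that an actual invoice might apply.

```python
FREE_MINUTES_PER_MONTH = 3000   # free allowance quoted in the article
PRICE_CENTS_PER_MINUTE = 14     # rate quoted in the article


def estimated_monthly_cost_usd(minutes_used):
    """Estimate the monthly bill in USD for a given number of processing minutes."""
    if minutes_used < 0:
        raise ValueError("minutes_used must be non-negative")
    billable_minutes = max(0, minutes_used - FREE_MINUTES_PER_MONTH)
    # Work in cents to avoid floating-point drift, then convert to dollars.
    return billable_minutes * PRICE_CENTS_PER_MINUTE / 100


for minutes in (2500, 3000, 5000):
    print(minutes, estimated_monthly_cost_usd(minutes))
```

Under these assumptions, a month with 5000 processing minutes has 2000 billable minutes, which comes to $280.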
“Instead of separately selecting, acquiring, configuring and integrating an endless list of technologies to build your data stack, Keboola gets you there in one platform.” Robert C., Head of Product at Gymbeam
Set up an ETL process in minutes and go from data to insights immediately. Start for free.
Integrate.io launched in 2022 when Xplenty, FlyData, Intermix.io, and Dreamfactory were consolidated to create the Integrate.io ETL platform.
Integrate.io is a low-code ETL solution that allows you to integrate data flows into your apps and data warehousing solutions via APIs, webhooks, or natively within their cloud-based platform.
Can handle ETL, ELT, and reverse ETL.
Offers no-code solutions.
Great technical support staff.
Expect a steep learning curve before you are comfortable with the product and all its features.
Cloud platform, no on-premise solution.
Cannot be used for data replication.
Lacks real-time data synchronization features.
Logs (especially when the ETL workflows fail) are extremely hard to read and debug.
Best for: The technically savvy engineer who needs an easy-to-code ETL platform, but does not want to dive deeper into the ETL process (and failures).
No transparent pricing: you’ll need to talk to sales before you can start using the product.
Integrate.io offers a 14-day free trial.
You pay for each additional connector/integrator, so expect costs to scale non-linearly as you grow.
Fivetran is a cloud-based ETL solution primarily focused on the extraction and loading part of the ELT process.
Visualizations for data lineage tracing throughout your data flows.
Offers functionality for database and SaaS data replication. Great for embedding customer data into your application.
The data replication capabilities are near real-time and are excellently executed.
Your first historical data load is free of charge, allowing a smooth transition to Fivetran.
If a connector is not covered by Fivetran out-of-the-box, coding your own connector will require extensive technical and cloud infrastructure knowledge.
No native transformation ability. Fivetran relies on dbt Core for transformation, and you’ll have to integrate dbt into your data architecture to transform data.
No on-premise solution, only cloud.
The pricing model is compound and not fully transparent. Fivetran charges based on Monthly Active Rows. The rows are counted multiple times: once when the data is extracted, again when it is used in transformations, and again when it is written into a destination.
Best for: The data engineer who wants to embed common SaaS data into their own application.
No free plan, but they do offer a 14-day free trial.
Their pricing model is a bit complex. Fivetran charges you for “Monthly Active Rows”, aka the number of data rows you process within a month.
Use their calculator to get an estimate, but expect to pay around $1,000/month as a rule of thumb.
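To make the idea of an "active row" more concrete, here is a simplified Python sketch that counts, per month, the distinct rows touched by at least one change. It is an illustration under loud assumptions: the change-event format and table names are hypothetical, and each row is counted once per month, which ignores any repeated counting across pipeline stages. Fivetran's actual billing rules may differ, so treat the calculator as the source of truth.

```python
from collections import defaultdict
from datetime import date


def monthly_active_rows(change_events):
    """Count distinct (table, primary_key) pairs changed in each calendar month.

    change_events is an iterable of (changed_on, table, primary_key) tuples.
    This event shape is an assumption made for the example.
    """
    active = defaultdict(set)
    for changed_on, table, primary_key in change_events:
        month = (changed_on.year, changed_on.month)
        active[month].add((table, primary_key))
    return {month: len(rows) for month, rows in active.items()}


# Hypothetical change log
events = [
    (date(2024, 1, 3), "orders", 1),
    (date(2024, 1, 9), "orders", 1),      # same row updated again in January
    (date(2024, 1, 9), "orders", 2),
    (date(2024, 1, 20), "customers", 1),
    (date(2024, 2, 1), "orders", 1),
]

mar = monthly_active_rows(events)
print(mar)
```

The point of the sketch is that a table with many updates to few rows is cheap under this kind of model, while a table where most rows change every month is expensive.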
Stitch is an open-source ETL tool that focuses on data extraction and loading data to a data warehouse or data lake. It was acquired by Talend and is integrated with many complementary paid and proprietary data services by Talend.
Can extract data from multiple sources. Sources not covered out-of-the-box by Stitch can still be extracted using the company’s extensibility framework.
Matillion ETL is a data integration tool that can build ETL data processes through a simple no-code/low-code drag-and-drop user interface (UI).
Intuitive drag-and-drop UI that helps you build ETL processes with low-code or no-code.
Single data operations scale well, with Change Data Capture (CDC) and batch processing built into data operations.
Full support for ETL, ELT, and reverse ETL. The number and types of connectors covered by Matillion are extensive enough to cover the vast majority of use cases.
Limited to data warehousing destinations. You can connect your business intelligence tool to the data warehouse of your choice, but Matillion will not ingest data directly into your data visualization software.
The no-code ETL, metadata management, and data lineage tracing features are locked behind the higher tier pricing.
Can have issues with scaling hardware infrastructure, especially EC2 instances at higher loads.
Pricing is compounded: you pay for Matillion and for the compute resources it uses to perform data operations on your cloud.
Users often report documentation can be stale, new versions of Matillion are rarely backward compatible (so you need to do a lot of maintenance when updating the software), and there is poor support for versioning (git).
Best for: Database engineers at startups, who would like to build a data model by integrating their various data sources into their data warehouse.
No free tier for the ETL tool; there is only a freemium model for data loading.
You pay for Matillion credits that are consumed when rows are being written, deleted, or updated or when you add the 6th user to your Matillion workspace.
Pricing starts at $2.00/credit.
Skyvia is a cloud-native data integration platform that taps into cloud-based features to offer virtually limitless scaling of ETL processes.
Scales easily with your cloud resources.
Low-code and no-code features help make Skyvia accessible to both your technical and not-so-technical coworkers.
Great support for Salesforce integrations.
They offer a free tier that is limited in functionality but can be used for cost-effective and simple ETL workflows.
Cloud platform, no on-premise solution. But they offer an on-premise data integration solution for ad hoc data extraction.
Their connector coverage is a weird mix of enterprise SaaS and startup SaaS offerings. Though Skyvia excels as a data integration tool for Salesforce, Excel, Google Analytics, and Google Sheets, their other source data endpoints are chosen seemingly at random.
Pricing increases steeply with data volume growth. Skyvia is not designed to scale with big data operations.
Best for: The cost-sensitive user who has simple data integration needs.
Offers a free tier which is limited to processing 10k monthly records, so you should think of it as a free trial rather than an actual free tier.
The lowest pricing plan starts at $15/month for up to 5 integrations. If you plan to add more data sources and destinations, expect to start at $79/month.
Runners-up for the best ETL tool
The 6 tools above were chosen with a specific goal: to automate end-to-end ETL workflows at the best performance-to-price tradeoff.
However, those are not the only tools on the market. Many great tools were not included because they are not interoperable (e.g. run just on one cloud provider), or are not fully capable of running ETL processes (e.g. are great for database replication, but not for extracting data from sources).
Here we take a look at the tools that did not make the final cut for best ETL tool:
Talend Open Studio - an open-source ETL tool that is fun to use, but requires a lot of tinkering and debugging.
Pentaho Data Integration (previously, Kettle) - a data integration tool, ideal for automating database administration tasks, but does not cover many popular data sources.
AWS Glue - a serverless data integration solution that can automate ETL, ELT, and reverse ETL, but is limited to AWS.
Informatica PowerCenter - an ETL tool like Keboola, but for huge enterprises (think IBM, Microsoft, etc.). High performance at the expense of a steep learning curve.
Oracle Data Integrator - a high-performance data integration platform geared towards extremely proficient data engineers at large enterprises who need to solve complex data streaming use cases.
Azure Data Factory - a fully managed, serverless data integration service provided by Microsoft Azure, that works best with the Azure cloud ecosystem.
With so many tools at your disposal, which one do you pick?
How to evaluate ETL tools?
To pick the best ETL tool for your company, evaluate it against these 6 criteria:
Data sources and destination coverage. ETL solutions differ in the number and type of connectors they offer. Prepare a list of all your data sources and destinations and match them against what the ETL tool offers. Make sure to check if the tool also offers universal connectors for sources and destinations not covered out of the box.
Data transformation abilities. Does the tool cover your data transformation needs? Can it extend to complex data transformations? Can it be used as low-code or no-code?
Target audience. Some tools are no-code, while others require you to know how to code in SQL, Python, Scala, Java, etc. Pick the right tool depending on who will use it (e.g., a data engineer vs. a business expert).
Pricing. Open-source ETL solutions are usually cheaper from the start (no set-up costs, no licensing fees), but are more expensive once you use them (maintenance, debugging, custom coding, …). Commercial ETL platforms are the other way around (the tool provider usually covers maintenance). Calculate the total cost of ownership to understand what pricing level is the best for your usage needs.
Ease of use. Intuitiveness will help you speed up the deployment of new ETL workflows. Also, check the tool for the little automations that ease the work surrounding ETL processes. Does it offer scheduling features? Collaboration and user-role handling? Versioning of data?
Support and documentation. When things go wrong, is there a strong support system, such as vendor-guaranteed SLAs for support? Or if the tool is open-source, is there a strong community of users who can answer your questions (e.g. on Stack Overflow)? Is there extensive documentation you can rely on?
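One way to turn the six criteria into a comparable number is a weighted scoring matrix. The Python sketch below is only an illustration: the weights, the 1 to 5 scores, and the tool names "Tool A" and "Tool B" are hypothetical placeholders that you would replace with your own assessment.

```python
# Hypothetical weights for the six criteria; adjust them to your priorities.
CRITERIA_WEIGHTS = {
    "coverage": 0.25,
    "transformations": 0.20,
    "audience_fit": 0.15,
    "pricing": 0.15,
    "ease_of_use": 0.15,
    "support": 0.10,
}


def weighted_score(scores, weights=CRITERIA_WEIGHTS):
    """Return the weighted average of per-criterion scores (e.g. on a 1-5 scale)."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total_weight


def rank_tools(candidates, weights=CRITERIA_WEIGHTS):
    """Sort candidate tools from highest to lowest weighted score."""
    scored = [(name, weighted_score(scores, weights)) for name, scores in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical assessments of two unnamed tools
candidates = {
    "Tool A": {"coverage": 5, "transformations": 4, "audience_fit": 4,
               "pricing": 3, "ease_of_use": 4, "support": 3},
    "Tool B": {"coverage": 3, "transformations": 5, "audience_fit": 3,
               "pricing": 4, "ease_of_use": 3, "support": 5},
}

ranking = rank_tools(candidates)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

The weights matter more than the arithmetic: a team of business experts might put most of the weight on ease of use, while a data engineering team might favor coverage and transformations.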
Pro tips for the data engineers
The 6 criteria cover the most common considerations when scouting for a new ETL platform. But there are other technical considerations to keep in mind:
Deployment. Can you run the ETL tool on-premise or is it exclusively cloud-based? If you have special constraints, the deployment location might be important.
Scalability. Can the tool scale and grow with your data needs to cover large volumes of data?
Tracing and monitoring. Metadata analytics, alerting systems, and monitoring of failed ETL workflows will be necessary at some point. Make sure your tool empowers you with all the observability you need.
Pick Keboola to set up scalable ETL processes in minutes
There are a lot of ETL tools on the market. But Keboola offers the best value for money.
Keboola has an always free tier that unlocks all the ETL, ELT, and reverse ETL features, without even swiping the credit card.
And you don’t have to worry about deployment or maintenance issues. Keboola takes care of all the heavy lifting in the background, so you can focus on more revenue-generating tasks.