
Figure 1: High level overview of Keboola environment and data flows

Keboola Overview

Keboola is built and operated on top of Amazon AWS and other third-party services, such as PaperTrail (logging), PagerDuty (alerting) and NewRelic (monitoring). Keboola inherits important security characteristics from Amazon for data storage, encryption, access control, archiving and more. The underlying Amazon security measures are described in detail in the Amazon whitepapers.

Data stored in Amazon Redshift can be encrypted using an HSM (hardware encryption). Data storage then complies with PCI DSS, SOX and HIPAA.

Access control within Keboola is based on Amazon IAM and uses two-phase authentication to Keboola. All communication goes through SSL connections calling REST APIs. A connection can be established only with a valid token encrypted with the Blowfish algorithm. Token authorization is performed by the Storage API, which is isolated from the rest of the infrastructure.

Keboola Architecture


Figure 2: Keboola architecture components

Security Architecture

Security is the key concern underpinning all fundamental concepts of the Keboola architecture. There is a strict separation and audit mechanism governing how data are uploaded and subsequently processed and stored.

Data can be uploaded to Keboola in two ways:

By the Uploader - Data prepared by the customer are stored in an export area (with security requirements specified and enforced by the customer). The Uploader is a customer-side component responsible for retrieving data from customer premises and uploading them securely via HTTPS through the REST API to Keboola storage in Amazon S3. All data are first imported to Amazon S3, where every customer has access to a dedicated encrypted S3 bucket. The customer is in full control of the upload functionality: what data are uploaded, when, and where is configured and initiated by the customer. The Uploader is available for the most common platforms (C#, PHP, Java, JavaScript, Ruby, Python), either as an executable or as source code to be scrutinized, built and deployed by the customer.

By an Extractor - A server-side component running in Keboola that remotely extracts data from various sources and uploads them via the same API as the Uploader. Extractors can be configured directly in the Keboola UI.

A data upload can be performed only with a valid token. For more, see Authentication & Authorization.
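
The token-protected upload described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Keboola's actual client: the endpoint URL, the header name `X-Example-Token` and the function `build_upload_request` are hypothetical. The request is only constructed, never sent.

```python
from urllib.parse import urlparse
from urllib.request import Request

# Hypothetical values for illustration only; the real endpoint and header
# name are not given in this document.
UPLOAD_URL = "https://storage.example.com/upload"
TOKEN_HEADER = "X-Example-Token"


def build_upload_request(token: str, payload: bytes, url: str = UPLOAD_URL) -> Request:
    """Build (but do not send) an HTTPS POST request that carries the token."""
    if urlparse(url).scheme != "https":
        raise ValueError("uploads must use HTTPS")
    if not token:
        raise ValueError("a valid token is required for every upload")
    request = Request(url, data=payload, method="POST")
    request.add_header(TOKEN_HEADER, token)
    request.add_header("Content-Type", "application/octet-stream")
    return request
```

Rejecting plain HTTP and empty tokens before the request is built mirrors the two rules stated in the text: uploads travel over HTTPS, and no upload is possible without a token.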

After a successful upload, a job is initiated via Amazon SQS. This job triggers an internal worker in an independent security group that picks up the data and stores them in Keboola. No Keboola component can be accessed directly from the internet.
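
The hand-off between upload and worker can be sketched as a simple queue-driven flow. The example below is an assumption-laden illustration: an in-memory `queue.Queue` stands in for Amazon SQS, a dictionary stands in for Keboola storage, and the function names and message fields are hypothetical.

```python
import json
import queue

# In-memory stand-ins: a local queue instead of Amazon SQS and a dict
# instead of the Keboola storage backend. All names are hypothetical.
job_queue = queue.Queue()
storage = {}


def enqueue_import_job(customer: str, s3_key: str) -> None:
    """Called after a successful upload: announce the new file as a job."""
    job_queue.put(json.dumps({"customer": customer, "s3_key": s3_key}))


def run_worker_once() -> bool:
    """Pick up one job, if any, and record the file as stored."""
    try:
        message = job_queue.get_nowait()
    except queue.Empty:
        return False
    job = json.loads(message)
    storage.setdefault(job["customer"], []).append(job["s3_key"])
    job_queue.task_done()
    return True
```

The uploader only ever writes a message to the queue; the worker is the only party that touches storage, which is the separation the paragraph above describes.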

Keboola is operated in two Amazon availability zones in a dedicated virtual private environment (Amazon VPC) with a hardened security framework for the Keboola application. Every Keboola component is located in a separate subnet with strict access control restrictions on protocol, service ports and source IP addresses, and is additionally separated into security groups within a subnet.
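
The restriction on protocol, service port and source IP address can be illustrated with a default-deny rule check. This is a sketch of the general technique, not Keboola's or Amazon's implementation; the `IngressRule` type, the `is_allowed` function and the example subnet are assumptions.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    protocol: str      # e.g. "tcp"
    port: int          # allowed destination port
    source_cidr: str   # allowed source network


def is_allowed(rules, protocol: str, port: int, source_ip: str) -> bool:
    """Default deny: permit traffic only if some rule matches all three fields."""
    address = ipaddress.ip_address(source_ip)
    return any(
        rule.protocol == protocol
        and rule.port == port
        and address in ipaddress.ip_network(rule.source_cidr)
        for rule in rules
    )


# Hypothetical rule set: only one internal subnet may reach the worker over HTTPS.
WORKER_RULES = [IngressRule("tcp", 443, "10.0.1.0/24")]
```

With this rule set, a request from `10.0.1.17` to TCP port 443 is accepted, while the same request to port 22, or from an address outside the subnet, is rejected.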

All data in transit and at rest are always encrypted. Data stored in Keboola reside in Amazon RDS, or optionally in Amazon Redshift. All data that pass through Amazon S3 (every import and export of data) are automatically archived to Amazon Glacier after a certain time period (180 days by default). All databases are backed up as well. In Amazon RDS and Amazon Redshift, a snapshot is taken several times a day (currently 8 times a day). Snapshots are stored for 14 days.
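
The archiving rule can be expressed as an S3 lifecycle configuration. The sketch below only builds the configuration as a dictionary, in the shape accepted by the `LifecycleConfiguration` parameter of boto3's `put_bucket_lifecycle_configuration`. It is not Keboola's actual configuration; the rule ID and the function name are hypothetical.

```python
ARCHIVE_AFTER_DAYS = 180  # default stated in the text


def glacier_lifecycle_configuration(days: int = ARCHIVE_AFTER_DAYS) -> dict:
    """Return an S3 lifecycle configuration that archives objects to Glacier."""
    if days <= 0:
        raise ValueError("days must be positive")
    return {
        "Rules": [
            {
                "ID": "archive-to-glacier",   # hypothetical rule name
                "Filter": {"Prefix": ""},      # empty prefix: apply to every object
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }
```

Because the transition is declared on the bucket, archiving needs no scheduled job on the application side; S3 applies the rule to each object once it reaches the configured age.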

Outgoing data are exported by writers - components responsible for uploading data to other data consumers. The upload is again performed via an HTTPS connection.

Recipes (third-party applications) run in the Keboola environment within Docker containers and share the same security boundaries.

Keboola development environment is fully independent of the production environment.


Figure 3: Keboola components & structure

Security Concerns

Physical Security

Keboola leverages the sophisticated AWS cloud security infrastructure, which has been architected to be one of the most flexible and secure cloud computing environments available today. Amazon AWS has been by far the number one cloud provider on the market for three consecutive years. Keboola runs in AWS’s highly secure data centers, which utilize state-of-the-art electronic surveillance and multi-factor access control systems. Data centers are staffed 24x7 by trained security guards, and access is authorized strictly on a least-privilege basis. All personnel must be screened when leaving areas that contain customer data. Environmental systems in the data centers are designed to minimize the impact of disruptions to operations, and multiple geographic regions and Availability Zones allow customers to remain resilient in the face of most failure modes, including natural disasters or system failures.

Availability & Failover

Keboola depends on the availability of Amazon services. For EC2, EBS, RDS and Redshift, Amazon commits to “use commercially reasonable efforts to make each available with a Monthly Uptime Percentage (defined below) of at least 99.95%”. The measured availability statistics are more telling: all services met 99.99+% availability.
Keboola operates in two availability zones and uses load balancing across them to increase performance and availability. If a component fails, load balancers route traffic to the other availability zone.
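
To make the uptime percentages concrete, they can be converted into the downtime they still permit. The helper below is a simple arithmetic sketch; the function name and the 30-day month are assumptions.

```python
def allowed_downtime_minutes(uptime_percent: float, days_in_month: int = 30) -> float:
    """Minutes of downtime per month that still satisfy the uptime percentage."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (100 - uptime_percent) / 100
```

For a 30-day month, 99.95% uptime leaves room for about 21.6 minutes of downtime, and 99.99% for about 4.3 minutes.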

Data Location

As with other AWS services, customers can choose the exact location (called a “region”) of their data, and Amazon guarantees that data never leave this location (they can only move between availability zones within the region). By default, this is set to AWS US-East for all customers.

Data security & Encryption

In transit
All transport channels use the HTTPS protocol with the latest AWS security policies.
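
On the client side, a strict HTTPS connection can be requested with Python's standard `ssl` module. This is a minimal sketch; the TLS 1.2 floor is an assumption made for the example, since the text does not state which protocol versions the AWS policies allow.

```python
import ssl


def strict_tls_context() -> ssl.SSLContext:
    """Client-side TLS context that verifies certificates and host names."""
    context = ssl.create_default_context()
    # Assumption for this sketch: refuse anything older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

The returned context can be passed to `urllib.request.urlopen(url, context=...)`, so that a connection to a server with an invalid certificate or an outdated protocol version fails instead of silently downgrading.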

At rest
All data in Keboola are encrypted, regardless of location (S3 storage, RDS or Redshift). Keys are stored and managed by the AWS Key Management Service; optionally, CloudHSM is available for an additional fee.

Customers can verify the grade of Keboola's connection security configuration:

Figure 4: Verification of connection security configuration

Authentication & Authorization

Access control within Keboola is based on Amazon IAM and uses two-phase authentication to Keboola. All communication goes through SSL connections calling REST APIs. A connection can be established only with a valid token, encrypted with the Blowfish algorithm and provided in an HTTP header. Token authorization is performed by the Storage API, which is isolated from the rest of the infrastructure. End users can access the user interface with login/password credentials.

Authorization model


The top-level access management entity is the Organization, which typically denotes a single customer. Administrators have the right to manage projects within the organization. Every user is assigned to a project; at the project level, all users are equal. On the technical level, access control is much more fine-grained. An administrator can create tokens that enable access only to a particular set of data (a bucket). You can also limit a token to read or write operations. Buckets are defined by the customer’s administrator.
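
The bucket-level token model can be sketched as a small permission check. The `Token` class and `is_permitted` function are hypothetical, and the rule that write access implies read access is an assumption of this sketch rather than something the text states.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    """Access token limited to specific buckets (names are hypothetical)."""
    description: str
    # Maps bucket name -> "read" or "write". Assumption: "write" implies "read".
    bucket_permissions: dict = field(default_factory=dict)


def is_permitted(token: Token, bucket: str, operation: str) -> bool:
    """Return True if the token may perform `operation` on `bucket`."""
    if operation not in ("read", "write"):
        raise ValueError(f"unknown operation: {operation}")
    granted = token.bucket_permissions.get(bucket)
    if granted is None:
        return False  # bucket not listed: no access at all
    return granted == "write" or operation == "read"
```

A token created as `Token("reporting", {"sales": "read"})` can read the `sales` bucket but can neither write to it nor see any other bucket.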

Password management
Keboola enforces the following password management rules:

Two-phase authentication via Google Auth is in development.

Separation of environments

Keboola uses Amazon AWS VPC, where all components are isolated in independent network subnets with their own ACLs and routing tables. See the schema in the chapter Security Architecture.
The development environment is fully independent of the production environment, down to the level of a different Amazon region.

Separation of roles

Separation of roles is strictly enforced in Keboola.

Client segregation

The following applies to ensure that client access to other client environments is segregated for processing and backup.

Data loss prevention

All data payloads are stored in Amazon S3 with 99.999999999% durability. See “Data Protection” in the Amazon documentation for more details, and the Amazon SLA for the service-level description.
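
The durability figure is easier to judge as an expected loss rate. The sketch below assumes the percentage is an annual, per-object durability and uses exact fractions to avoid floating-point rounding; the function name is hypothetical.

```python
from fractions import Fraction

DURABILITY_PERCENT = Fraction("99.999999999")  # figure quoted in the text


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected number of objects lost per year at the quoted durability."""
    loss_probability = 1 - DURABILITY_PERCENT / 100
    return object_count * loss_probability
```

For 10,000,000 stored objects this yields 1/10,000 of an object per year, that is, on average one lost object every 10,000 years.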

Data backup & archiving

Data backups are stored in Amazon S3, with archiving to Amazon Glacier handled by automatic S3 lifecycle management. Server instances are not backed up, as no data are stored in the application itself.

Backend databases are backed up by the automatic snapshot functionality. A snapshot of the data is taken approximately 8 times a day and is available for 14 days. The same applies to Amazon Redshift, the second supported data warehouse technology; Redshift allows access to backups via a standard ODBC driver. Another way of retrieving data from Redshift is to use the Keboola snapshotting API, which allows the customer to retrieve snapshots of Redshift tables from Amazon S3.
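
The snapshot figures translate directly into a recovery window. The following arithmetic sketch assumes that snapshots are evenly spaced over the day, which the text does not state; the function name is hypothetical.

```python
SNAPSHOTS_PER_DAY = 8   # approximate figure from the text
RETENTION_DAYS = 14     # from the text


def snapshot_schedule(per_day: int = SNAPSHOTS_PER_DAY,
                      retention_days: int = RETENTION_DAYS) -> dict:
    """Derive the snapshot interval and retained-snapshot count."""
    if per_day <= 0 or retention_days <= 0:
        raise ValueError("per_day and retention_days must be positive")
    return {
        # With even spacing, this is also the largest gap between two snapshots.
        "interval_hours": 24 / per_day,
        "snapshots_retained": per_day * retention_days,
    }
```

With the stated figures, a snapshot is taken every 3 hours, and 112 snapshots are available at any time.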

All backups are also encrypted at rest.

Note: as Keboola works only as a “tunnel” for data and there are no primary operational data (only data replicas flow through Keboola), Keboola does not require data archiving.

Physical media management

All data copies are handled by native Amazon AWS functions. Amazon AWS follows media sanitization guidelines (NIST 800-88 or DoD 5220.22-M): all physical devices are destroyed on Amazon premises, and no storage can leave Amazon premises. A detailed description can be found in the AWS Security Whitepaper, page 8, paragraph “Storage Device Decommissioning”.

Incident management

All incidents (including identified bugs) are reported immediately and are visible to all customers. Customers can register for an RSS feed or subscribe to updates by email.

Access Audit

Keboola is fully audited at the level of Amazon services. Every operation in Amazon services is executed via the Amazon API. All API calls (including any operation performed via the administration console of Amazon services) are logged in S3 and available to the customer. These audit logs cannot be altered by Keboola (any such activity would itself be logged in the audit trail by the Amazon service). Customers have a full overview of all operations performed on their data.

On the application level, Keboola logs all API calls as events in Elasticsearch and provides customers with full-text search capability. The audit trail is also available programmatically via an API.

Figure 5: Real screenshot from Amazon CloudTrail for Keboola


All Keboola components are continuously monitored. System availability is monitored by Pingdom. Application monitoring is provided by NewRelic; this includes all API calls, SLAs, and overall application performance monitoring. All logs are processed by PaperTrail, which enables full-text log analysis. All logs are stored for 14 days. Alerts are managed and escalated via PagerDuty.

Communication among all monitoring components is encrypted.

Contract termination & Data deletion

After cancellation of the contract, all customer projects are marked as deleted. From then on, the customer cannot access the data anymore. Standard lifecycle management keeps data for 30 days. After that, objects are moved to Amazon Glacier, where they stay archived indefinitely. All customer data can be removed fully and immediately upon customer request.
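
The stages described above can be summarized as a small state function. This is an illustrative sketch only: the state labels and the function name are hypothetical, and treating day 30 itself as still within the retention period is an assumption.

```python
from datetime import date, timedelta

RETENTION_DAYS_AFTER_CANCELLATION = 30  # from the text


def data_state(cancelled_on: date, today: date, purge_requested: bool = False) -> str:
    """Classify customer data relative to the contract cancellation date."""
    if today < cancelled_on:
        return "active"
    if purge_requested:
        return "removed"  # full removal upon customer request
    # Assumption: day 30 itself still counts as the retention period.
    if today - cancelled_on <= timedelta(days=RETENTION_DAYS_AFTER_CANCELLATION):
        return "marked deleted, retained"
    return "archived in Glacier"
```

For a contract cancelled on 1 January, the data are retained but inaccessible on 15 January and archived in Glacier by 1 March, unless the customer has requested removal.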

Certifications & Compliance

For the full, up-to-date list of certifications and compliance audit reports, see the AWS compliance documentation. For an overview of the Amazon services used, please refer to the chapter Security Architecture.