How To
July 29, 2021
Why you need metadata management and how to approach it
Learn everything about metadata and metadata management. Discover more about how to get started.

As your data operations evolve, they become messier. 

Diverse data sources, each with its own data model, multiple movements of data throughout your platform, and cobbled-together infrastructure that has grown more complex with every deployment make it hard to identify, trace, classify, and understand your data assets.

This can be as simple as an analyst spending hours trying to figure out where a data attribute in a table came from and whether it is trustworthy.

But it can also be as complex as a multi-month migration to the cloud that is constantly delayed by trying to figure out whether a data pipeline should be moved, refactored, or simply dropped, taxing your developers and pushing back your digital transformation deadline.

And the issues scale with the growth of data your organization processes.

Metadata management and a good metadata strategy can become invaluable allies within your enterprise architecture to combat the entropy of evolving data operations.

What is metadata?

Simply put, metadata is “data about data”. It gives the data its context: who interacted with it, what it is, where it lives, why it exists, when it changed, and how it got there.

For example:

  • What: the value of the field customer_acquisition_date in your table new_customers
  • Who: the data steward responsible for that field
  • Where: the origin of the field, which server hosts the data, ...
  • Why: additional business rules and context, such as why we use UTC dates instead of client timestamps
  • When: the time the value was created and last updated
  • How: the pipeline/script that drove data through transformations to its destination
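To make the six questions concrete, here is a minimal sketch of how such a metadata record could be modeled. The class name `FieldMetadata`, its attribute names, and every value except the table and field names are assumptions made for illustration, not a schema taken from any particular platform.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class FieldMetadata:
    """Hypothetical metadata record describing a single field."""
    table: str             # what: the table the field lives in
    field: str             # what: the field being described
    steward: str           # who: the person responsible for the field
    origin: str            # where: the source system the field came from
    business_rule: str     # why: context needed to interpret the value
    created_at: datetime   # when: the time the value was created
    updated_at: datetime   # when: the time the value was last updated
    pipeline: str          # how: the job or script that produced the field


record = FieldMetadata(
    table="new_customers",
    field="customer_acquisition_date",
    steward="jane.doe@example.com",   # assumed value
    origin="crm_export",              # assumed value
    business_rule="Dates are stored in UTC, not client timestamps.",
    created_at=datetime(2021, 7, 1, 8, 0, tzinfo=timezone.utc),
    updated_at=datetime(2021, 7, 29, 8, 0, tzinfo=timezone.utc),
    pipeline="daily_customer_load",   # assumed value
)

print(f"{record.table}.{record.field} is maintained by {record.steward}")
```

Keeping the record immutable (`frozen=True`) is one possible design choice: a change to the metadata then becomes a new record rather than a silent overwrite.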

A good data platform will automatically generate metadata alongside its main processes. During the ETL process, the core data is augmented with metadata captured at every operational step, namely during:

  • Data capture from raw data sources
  • Data transformation and cleaning
  • Data ingestion into your data warehouse
  • Data movement around different storage and compute assets
  • Data load into an OLAP database or data warehouse
  • And later on, at any data usage - all user interactions are augmented with metadata 
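One way to picture this automatic capture is a wrapper around each pipeline step that records operational metadata every time the step runs. The sketch below is only an illustration under assumptions: the decorator `with_metadata`, the in-memory `metadata_log`, and the two example steps are hypothetical and do not describe how any specific platform implements it.

```python
import time
from datetime import datetime, timezone
from functools import wraps

# Stand-in for a real metadata store (assumption: an in-memory list is enough here).
metadata_log = []


def with_metadata(step_name):
    """Wrap a pipeline step so that each run appends an operational metadata entry."""
    def decorator(func):
        @wraps(func)
        def wrapper(rows):
            started = datetime.now(timezone.utc)
            t0 = time.perf_counter()
            result = func(rows)
            metadata_log.append({
                "step": step_name,
                "started_at": started.isoformat(),
                "duration_s": time.perf_counter() - t0,
                "rows_in": len(rows),
                "rows_out": len(result),
            })
            return result
        return wrapper
    return decorator


@with_metadata("clean")
def drop_missing_emails(rows):
    # Cleaning step: discard rows without an email address.
    return [r for r in rows if r.get("email")]


@with_metadata("transform")
def normalize_emails(rows):
    # Transformation step: lowercase every email address.
    return [{**r, "email": r["email"].lower()} for r in rows]


raw = [{"email": "A@Example.com"}, {"email": None}, {"email": "b@example.com"}]
cleaned = normalize_emails(drop_missing_emails(raw))

for entry in metadata_log:
    print(entry["step"], entry["rows_in"], "->", entry["rows_out"])
```

The steps themselves stay unaware of the bookkeeping, which mirrors the idea that metadata is produced as a by-product of the usual data operations.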

Why is metadata important?

Metadata provides the context of who, what, where, why, when, and how. This context is the foundation that metadata management builds on.

A properly configured metadata system helps us answer important questions:

  • What data assets do we have?
  • Where did they originate from?
  • Where is the data now?
  • When was the data created? When was it last updated (aka how fresh/stale is it)?
  • How has the data changed through the updates?
  • Is the data of sensitive nature?
  • Who has access to the data? In what role/capacity?
  • Who is responsible for data management?
  • Etc. 
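As a rough illustration, two of the questions above (how stale is the data, and is it sensitive) become simple lookups once the metadata exists. The asset list, its keys, and the helper `stale_assets` are hypothetical names chosen for this sketch.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical metadata entries; names and values are made up for illustration.
assets = [
    {"name": "new_customers",
     "updated_at": datetime(2021, 7, 29, 6, 0, tzinfo=timezone.utc),
     "sensitive": True},
    {"name": "fx_rates",
     "updated_at": datetime(2021, 7, 20, 6, 0, tzinfo=timezone.utc),
     "sensitive": False},
]


def stale_assets(assets, now, max_age):
    """Return the names of assets whose last update is older than max_age."""
    return [a["name"] for a in assets if now - a["updated_at"] > max_age]


# A fixed "now" keeps the example reproducible.
now = datetime(2021, 7, 29, 12, 0, tzinfo=timezone.utc)
stale = stale_assets(assets, now, timedelta(days=1))
sensitive = [a["name"] for a in assets if a["sensitive"]]

print("Stale assets:", stale)
print("Sensitive assets:", sensitive)
```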

The questions asked bridge the different types of metadata: business metadata used to drive data governance and understanding, and technical metadata (also called operational metadata), used to drive engineering efficiency. 

Let’s dive deeper into the world of metadata management.

What is metadata management?

Metadata management refers to the system of administering metadata, that is, taking care of its production, capture, systematization, and ultimately its use throughout the enterprise.

Several stakeholders have skin in the game with metadata management:

  • Data stewards rely on metadata both to interpret data and to guide how data is handled and understood throughout the organization.
  • Data owners are responsible for the production and capture of metadata. They use it to guide architectural and engineering choices.
  • Data consumers, such as business users, use metadata to inform their data-driven decision-making. 

Let’s look at the benefits of metadata management within an organization.

Why is metadata management important?

Metadata management is the driver behind the crucial data capabilities of modern enterprises.

Data governance

Governance processes encompass all high-level decision processes surrounding data management. From establishing data lineage to securing data quality and accessibility, data governance establishes a set of rules and best practices for driving data standards within an organization.


A metadata management system lets you both build policies around metadata (e.g. establish who has access to certain data assets) and verify whether the governance processes you put in place are being adhered to.

Regulatory compliance

Regulations and standards such as GDPR, HIPAA, BCBS, or CCPA govern the allowed use of data and impose certain standards for handling sensitive data (such as personally identifiable information, PII).

Metadata management allows you to both control and retroactively inspect data assets for compliance. For example, if a customer requests all their data (as allowed by GDPR), the metadata management system lets you quickly locate that data throughout your organization and reply to the customer without manually tracing it through your ETL pipelines.
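A data access request like this could be served, in a simplified form, by a lookup over metadata that flags which tables hold personal data and which column identifies the customer. The catalog entries, the `contains_pii` and `customer_key` keys, and the helper `tables_to_search` are all assumptions made for this sketch.

```python
# Hypothetical catalog entries; table and column names are illustrative only.
catalog = [
    {"table": "new_customers", "contains_pii": True, "customer_key": "customer_id"},
    {"table": "orders", "contains_pii": True, "customer_key": "buyer_id"},
    {"table": "daily_revenue", "contains_pii": False, "customer_key": None},
]


def tables_to_search(catalog):
    """Return (table, key column) pairs that may hold a customer's personal data."""
    return [
        (entry["table"], entry["customer_key"])
        for entry in catalog
        if entry["contains_pii"] and entry["customer_key"]
    ]


locations = tables_to_search(catalog)

for table, key in locations:
    print(f"Look up the customer in {table} using {key}")
```

The lookup only tells you where to search; extracting the actual records would still be done against each listed table.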

Data catalog 

The data catalog, or data dictionary, is a business glossary containing the business definitions surrounding data.

For example, how do you define a new customer? In a subscription model, when a customer churns and later returns, do they count as a new customer or as an existing one?

Data catalogs bring clarity of interpretation and help prevent analyses built on a wrong reading of the data.

Data lineage 

Understanding where data comes from (data provenance) and how it moves through the workflows in your system is crucial for several capabilities, from tracing data for regulatory compliance to root cause analysis (more below).

Metadata management allows you to follow the data lineage of assets throughout your system. 

Root cause analysis

Data lineage is leveraged for error resolution via Root Cause Analysis (RCA). Consulting the map of the data journey helps you trace an error back along its pathway to the source and fix it there. This both speeds up error resolution and improves data quality.
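Tracing an error back to its source can be sketched as a walk over a lineage graph. The example below assumes lineage is stored as a mapping from each asset to its direct upstream inputs; the asset names and the helper `upstream_assets` are hypothetical.

```python
from collections import deque

# Hypothetical lineage: each asset maps to the assets it is built from.
lineage = {
    "revenue_dashboard": ["daily_revenue"],
    "daily_revenue": ["orders_clean", "fx_rates"],
    "orders_clean": ["orders_raw"],
    "orders_raw": [],
    "fx_rates": [],
}


def upstream_assets(asset, lineage):
    """Breadth-first walk that returns all upstream assets, nearest first."""
    seen = {asset}
    order = []
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for parent in lineage.get(current, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


candidates = upstream_assets("revenue_dashboard", lineage)
print(candidates)
```

If the dashboard shows a wrong number, the returned list gives the order in which to inspect candidate sources, starting with the closest one.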

Increased data quality

Understanding the business context of your data (data catalog), knowing where it came from (data lineage), resolving the issues surrounding it (root cause analysis), and constantly monitoring metadata for outliers and unexpected patterns together allow you to increase data quality and build trust throughout the system.
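Monitoring metadata for outliers can be as simple as comparing the latest value of an operational metric with its recent history. The sketch below uses a z-score on row counts per load; the metric, the threshold of 3 standard deviations, and the function name `is_outlier` are assumptions chosen for illustration, and real monitoring may need a more robust method.

```python
from statistics import mean, stdev


def is_outlier(history, latest, threshold=3.0):
    """Flag latest if it lies more than threshold standard deviations from the mean of history."""
    if len(history) < 2:
        return False  # not enough history to estimate the spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > threshold


# Hypothetical row counts captured as operational metadata for previous loads.
row_counts = [10_120, 9_980, 10_050, 10_200, 9_900]

print(is_outlier(row_counts, 10_100))
print(is_outlier(row_counts, 4_300))
```

A load of 10,100 rows sits well within the usual range, while a load of 4,300 rows is far outside it and would be worth investigating.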

Faster speed to insights

Metadata management systems offer you faster time to insights. A common bottleneck in analytics is understanding the data you are working with, which slows down the delivery of analyses.

Metadata removes this bottleneck. It fosters greater collaboration between departments (the analyst can inspect the metadata to determine the owner of a workflow that produced the data) and shortens the time it takes to understand data (for example, by inspecting its origins or the data catalog).

Altogether, faster insights bring quicker project delivery and faster iterations, which both drive company growth and free up analysts’ time for more revenue-generating work.

Start using your metadata: get there faster with Keboola

Keboola is an end-to-end data platform. It is built to automate the entire data integration pipeline: from collecting data, through transforming it, to storing it for analysis.

Though its central features are focused on automating data work, Keboola was also built as a metadata management system:

  1. Each data operation within the platform is tagged with operational metadata describing user activity, job activity, data flow, schema evolution, data pipeline performance, compliance with security rules, etc. The metadata is generated, captured, and processed automatically, so you only have to run your usual data operations, while Keboola creates your metadata management system.
  2. Keboola offers a sophisticated Data Catalog system that allows you to record the business logic behind data to foster cross-departmental understanding, and also to share data alongside the data catalog.

Take Keboola for a spin. Keboola has an always-free, no-questions-asked plan, so you can explore all the power Keboola has to offer. Feel free to give it a go or reach out to us if you have any questions.


Run a 100% data-driven business without any extra hassle.
Pay as you go, starting with our free tier.
