What is it, why do you need it, and how do you set it up?
Data governance is a set of rules, processes, role delegations, and responsibilities that clearly define data quality, accessibility, data security, and regulatory compliance for an organization’s enterprise data management.
You are most probably already applying some sort of data governance program by setting up rules for the management and acceptable use of data assets. For example, the mantra “Do not share our customers’ personal data outside of the company” is already a type of data governance initiative.
A data governance framework does the same, but more mindfully and explicitly. It defines, clearly and transparently for the entire company, the data governance strategy that applies to enterprise data throughout its lifecycle.
Setting up data governance for your enterprise offers multiple advantages:
What it means to have a successful data governance model will depend strongly on the regulatory requirements of your company.
A bank needing to adhere to Basel II will not have the same data governance framework as an e-commerce shop that needs to safeguard the data privacy of its customers under GDPR.
But irrespective of what regulations apply to your business processes, four pillars are part of a successful data governance framework.
Data security refers to all the preventive actions, guardrails, and corrective mechanisms set up to guarantee data protection.
There are two streams to data security.
First, your company needs to have a day-to-day framework of how to manage data security. This often falls in the hands of the IT department or data engineering department. They set up rules surrounding data access. For example, they decide who can view data sets, who has credentials to access data in a data storage like a data lake or data warehouse, and how to revoke access to the data sources.
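The day-to-day access rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `AccessPolicy` class, the user name, and the data set name are hypothetical, and a real setup would rely on the permission system of your data warehouse or data lake rather than an in-memory object.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class AccessPolicy:
    """Hypothetical in-memory record of which users may view which data sets."""

    grants: Dict[str, Set[str]] = field(default_factory=dict)

    def grant(self, user: str, dataset: str) -> None:
        # Give a user read access to a data set.
        self.grants.setdefault(dataset, set()).add(user)

    def revoke(self, user: str, dataset: str) -> None:
        # Remove access; does nothing if the user never had it.
        self.grants.get(dataset, set()).discard(user)

    def can_view(self, user: str, dataset: str) -> bool:
        # Access is denied by default unless it was explicitly granted.
        return user in self.grants.get(dataset, set())


policy = AccessPolicy()
policy.grant("analyst_anna", "warehouse.orders")
print(policy.can_view("analyst_anna", "warehouse.orders"))

policy.revoke("analyst_anna", "warehouse.orders")
print(policy.can_view("analyst_anna", "warehouse.orders"))
```

The deny-by-default check in `can_view` is a design choice of this sketch: nobody sees a data set unless someone deliberately granted access, which makes revoking access a single, auditable step.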
Second, there is a more nuanced and more regulated data governance practice for sensitive data. Data protection is especially concerned with protecting sensitive data as part of regulatory compliance (e.g. GDPR). A good data governance practice is to specify how an organization’s data can be used. For example, is customer profiling even allowed, or does it violate privacy rules?
Effective data governance frameworks clarify who is in charge of data assets and data governance processes.
At a minimum, a data governance framework needs to define who the data owners are and who the data stewards are.
Data ownership refers to all the technical responsibilities for the production of data, its quality, and integrity.
Data stewardship, on the other hand, refers to the business aspect of data governance. Data stewards act as translators and culture champions, helping non-technical subject matter experts access and understand data to improve their decision-making.
Depending on the size of your organization, you might need a more branched out data governance team, with a Master Data Management leadership role, to coordinate the data owners and data stewards across your business units.
The goal of data governance is to deliver high-quality data you can trust. High quality is both a technical and business term.
On one hand, your technical ecosystem of tools needs to comply with high data standards. Data lineage tracing, monitoring for errors, alerts, and automated error corrections are just some of the technical checks and balances implemented in systems that guarantee high-quality data.
The classical minimal standards test is whether the data that was extracted from the data sources complies with the data integrity and referential constraints of your data model.
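As a rough illustration of such a minimal test, the sketch below loads two tiny tables into an in-memory SQLite database and looks for orders that point at a customer who does not exist. The table names, columns, and sample rows are assumptions made up for this example, not part of any specific data model.

```python
import sqlite3

# Hypothetical extracted data: order 11 references a customer that is missing.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'a@example.com'), (2, 'b@example.com');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 3, 40.0);
    """
)


def find_orphan_orders(connection):
    """Return ids of orders whose customer_id has no matching customer row."""
    rows = connection.execute(
        """
        SELECT o.id
        FROM orders AS o
        LEFT JOIN customers AS c ON c.id = o.customer_id
        WHERE c.id IS NULL
        ORDER BY o.id
        """
    ).fetchall()
    return [row[0] for row in rows]


orphans = find_orphan_orders(conn)
if orphans:
    print(f"Referential integrity violated by orders: {orphans}")
```

In a pipeline, a non-empty result would typically stop the load or raise an alert before the data reaches analysts.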
More advanced companies also impose semantic tests to reflect the business rules within an organization. For example, they validate and sanitize business metrics before they are used in analytics to derive insights and make business decisions.
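A semantic test of this kind might look like the following sketch. The two business rules (revenue is never negative, refunds never exceed revenue for the same day) and the metric names are hypothetical and only stand in for whatever rules your organization actually defines.

```python
def check_daily_metrics(rows):
    """Return human-readable violations of two hypothetical business rules."""
    violations = []
    for row in rows:
        day = row["date"]
        if row["revenue"] < 0:
            # Rule 1 (assumed): revenue can never be negative.
            violations.append(f"{day}: negative revenue {row['revenue']}")
        elif row["refunds"] > row["revenue"]:
            # Rule 2 (assumed): refunds cannot exceed revenue for the same day.
            violations.append(
                f"{day}: refunds {row['refunds']} exceed revenue {row['revenue']}"
            )
    return violations


# Made-up sample metrics; the second and third rows each break one rule.
metrics = [
    {"date": "2024-01-01", "revenue": 1200.0, "refunds": 50.0},
    {"date": "2024-01-02", "revenue": -30.0, "refunds": 0.0},
    {"date": "2024-01-03", "revenue": 100.0, "refunds": 250.0},
]

for problem in check_daily_metrics(metrics):
    print(problem)
```

Unlike a referential check, these rules cannot be derived from the schema. They have to come from the people who know the business, which is where data stewards help.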
Unless all your people can tap into data to make better decisions, your data governance framework is missing something.
This is why data democratization is often one of the pillars of a good data governance strategy.
Sharing data and understanding the data you work with is paramount to making good decisions. This is why the best data governance frameworks plan for a data catalog - a source of data definitions. It acts as a self-serve business glossary that allows everyone to access and understand the data they work with.
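To make the idea of a data catalog concrete, here is a minimal sketch of a glossary lookup. The entry fields, the terms, and the team names are assumptions chosen for illustration. Dedicated catalog tools store far richer metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """One hypothetical glossary entry: a term, its meaning, and who to ask."""

    name: str
    definition: str
    owner: str    # technical contact responsible for producing the data
    steward: str  # business contact who helps others interpret the data


# Made-up example entries, keyed by lowercase term name.
CATALOG = {
    entry.name: entry
    for entry in [
        CatalogEntry(
            name="active_customer",
            definition="Customer with at least one paid order in the last 90 days",
            owner="data-engineering",
            steward="sales-operations",
        ),
        CatalogEntry(
            name="churn_rate",
            definition="Share of active customers lost during a calendar month",
            owner="data-engineering",
            steward="customer-success",
        ),
    ]
}


def describe(term: str) -> str:
    """Look up a term case-insensitively and return a readable description."""
    entry = CATALOG.get(term.lower())
    if entry is None:
        return f"No definition found for '{term}'."
    return (
        f"{entry.name}: {entry.definition} "
        f"(owner: {entry.owner}, steward: {entry.steward})"
    )


print(describe("Active_Customer"))
print(describe("net_promoter_score"))
```

Recording both an owner and a steward per term mirrors the split between technical and business responsibility described earlier.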
The four pillars of a data governance framework act as guidance on what you should include in your data governance plan.
But you have to look out for potential hurdles that might come your way.
Setting up data governance sounds good in theory, but it can easily go astray in practice.
There are two challenges you need to overcome before you can successfully deploy your data governance framework.
Let’s be honest: hardly anyone in your organization wants to optimize metadata management out of the blue, without it being tied to their business unit’s OKRs and KPIs.
The success of a data governance initiative will be highly correlated with its relevance to the organization’s needs.
Smart data teams tie the data governance efforts to existing processes, such as:
In other words, show how data governance fits into the wider data ecosystem of your organization.
Data governance programs are often destined to fail from the get-go because of the data team leading the data governance initiative.
In many organizations, the IT or data department tries to take over data management on its own, without buy-in from senior leadership such as the C-suite.
But without that buy-in, they are doomed to limited resources, work that is not aligned with company-wide goals, and priorities that often clash with other change management efforts within the organization.
As Takehiko (“Tak”) Nagumo, managing executive officer at Mitsubishi UFJ Research and Consulting (MURC) said:
“Just like any other important matters, we need the board’s backing on data. Data’s existed for a long time, of course, but at the same time, this is a relatively new area. So a clear understanding among the board is the starting point of everything. We provide our board educational sessions, our directors ask questions, and all that further deepens their understanding.”
To set yourself up for success from the get-go, it is crucial that:
One such tool is Keboola.
Keboola is an end-to-end data operations platform with built-in data governance tools that help you automate a lot of the heavy lifting when implementing your data governance policies:
But Keboola is not just a tool for automating data governance. It is designed to automate and speed up all data operations, from ETL pipeline construction and maintenance to deploying machine learning models in production.
We offer a no-questions-asked always-free tier. Try Keboola out and check for yourself what Keboola can do for you.