
How to use Root Cause Analysis to Improve Engineering

Learn more about root cause analysis and its role in data engineering.

How To
August 13, 2021

Modern engineering has revolutionized almost every complex human endeavor.

From lean manufacturing to global telecommunications, from software and IT that bring the world to our fingertips to medical devices that detect previously invisible diseases, there is hardly a human endeavor that engineering has not changed for the better. But engineers don’t only build the complex systems and tools that make the world go round. They’re also the first line of defense when things go south: swapping blown fuses in our electrical grids, replacing pumps to keep the machines running, and debugging software before critical data is lost to downtime.

And things go wrong more often than planned.

Root Cause Analysis (RCA) is one of the most useful problem-solving methods in the engineering toolbox. It is used to identify the root causes of failures within complex engineering systems and correct them.


How is RCA used in engineering?

Root Cause Analysis has three main use cases: 

1. Post-failure corrections. 

When things go wrong, RCA is used to identify the root cause of the problem that caused the accident. 

Within the accident analysis, RCA identifies the symptoms (the problem) and the causes that led to those symptoms. Engineering teams then recommend corrective actions that remove the causal source of the problem and prevent the issue from happening again.

For example, if a bug caused a database to crash and lose important information, the developer would not just bring the database back online, but would also remove the bug, thus preventing the database from crashing again in the future.

2. Improve existing systems. 

The RCA logic can be applied to existing systems to make them better. 

If a component of an engineering system is working suboptimally, RCA can be used to trace the sluggish performance back to its root, identifying potential causes of the suboptimal behavior so they can be improved upon.

This technique is often used in change analysis and risk management to determine what complex systems would look like under different hypothetical scenarios. A potential cause for change is identified, and RCA is deployed to check how that potential cause would affect the overall system.

3. Monitor existing systems. 

Root Cause Analysis is used during regular monitoring and quality control, to guarantee a high standard of operations. By tracing monitored events to their source, RCA helps engineers identify which monitored elements are performing suboptimally. 

So how does Root Cause Analysis look in practice?

The RCA process and methodology

The Root Cause Analysis technique goes through 7 steps.

Step 1: Problem definition - Determine the what

The Root Cause Analysis process does not start with the root cause of problems. It starts with the problem.

Understand and define the problem that caused the defect in your engineering system in detail.

A good problem definition has three components:

  1. Description of expected system behavior.
  2. Description of how the problem deviates from the expected behavior.
  3. Context - timeline of when the problem started, underlying machinery/technology of the problem... anything else that would help a stranger reproduce your problem.
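
As a minimal sketch, the three components above could be captured in a small record so that every incident is described the same way. The class name, field names, and example values below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ProblemDefinition:
    """The three components of a problem definition (names are illustrative)."""

    expected_behavior: str
    observed_deviation: str
    context: dict[str, str] = field(default_factory=dict)

    def summary(self) -> str:
        """Render the definition as text a stranger could use to reproduce it."""
        lines = [
            f"Expected: {self.expected_behavior}",
            f"Observed: {self.observed_deviation}",
        ]
        lines += [f"Context ({key}): {value}" for key, value in self.context.items()]
        return "\n".join(lines)


# Hypothetical values, invented for this sketch.
problem = ProblemDefinition(
    expected_behavior="The nightly load writes every source row to the warehouse.",
    observed_deviation="Some source rows are missing after the load.",
    context={"first seen": "after the last deployment", "system": "nightly load job"},
)
print(problem.summary())
```

Keeping the context as free-form key/value pairs is a design choice in this sketch: what counts as useful context differs from one system to the next.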

Step 2: 5-whys - Determine the cause

Ask yourself five times why the problem happened. The first “why” gets you the immediate cause of the problem.

The second “why” gets you the cause of the cause of the problem.

And so forth until you come to the root of the problem. 

Let’s look at a data example:
Problem definition: The machine learning algorithm used in production to predict the price of electricity your company is trading is outputting extreme outliers for the price of electricity. 

[Figure: 5-whys root cause analysis of the problem definition above]

The 5-whys system was developed by Sakichi Toyoda and is widely credited with contributing to the success of the Toyota Production System. As you can see from the above example, five whys are usually sufficient to get you to the root of the problem.
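
The questioning loop can be sketched in a few lines of Python. This is only an illustration: the function `five_whys` and the `answers` mapping are hypothetical, and the answers themselves are invented for this sketch. They are not the answers from the original figure.

```python
def five_whys(problem: str, caused_by: dict[str, str], max_whys: int = 5) -> list[str]:
    """Follow 'why?' answers from the problem towards its root cause.

    `caused_by` maps an effect to its immediate cause. The walk stops after
    `max_whys` questions, or earlier when no further cause is recorded.
    """
    chain = [problem]
    for _ in range(max_whys):
        cause = caused_by.get(chain[-1])
        if cause is None or cause in chain:  # no recorded cause, or a loop
            break
        chain.append(cause)
    return chain


# Hypothetical answers, invented for illustration only.
answers = {
    "model outputs extreme price outliers": "input features contain extreme values",
    "input features contain extreme values": "upstream feed changed its unit of measure",
    "upstream feed changed its unit of measure": "no schema check runs on ingestion",
}

chain = five_whys("model outputs extreme price outliers", answers)
for depth, step in enumerate(chain):
    label = "Problem" if depth == 0 else f"Why #{depth}"
    print(f"{label}: {step}")

root_cause = chain[-1]  # the last answer we could give
```

Note that the walk may end before the fifth question, as it does here: the number five is a rule of thumb, not a requirement.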

Step 3: Causal chain - Determine the how

If it is not clear already from the exercise of the 5-whys, establish a causal chain that links contiguous causes together from the root cause to the defined problem. 

This answers the question “How did the problem arise from the root cause?” by listing all the in-between contributors.
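
One way to check that a causal chain really is contiguous is to store each link as a (cause, effect) pair and walk from the root cause to the problem. The sketch below assumes each cause has a single effect. The function name and the example links, loosely based on the database crash mentioned earlier, are hypothetical.

```python
def causal_chain(links: list[tuple[str, str]], root: str, problem: str) -> list[str]:
    """Order (cause, effect) links into one contiguous path from root to problem."""
    effect_of = dict(links)  # assumption: each cause has exactly one effect
    path = [root]
    while path[-1] != problem:
        next_effect = effect_of.get(path[-1])
        if next_effect is None or next_effect in path:
            # A missing link (or a loop) means the chain is not contiguous.
            raise ValueError(f"chain is broken after: {path[-1]!r}")
        path.append(next_effect)
    return path


# Hypothetical links, listed out of order on purpose.
links = [
    ("database crashes", "important information is lost"),
    ("bug in the application", "database crashes"),
]
path = causal_chain(links, root="bug in the application", problem="important information is lost")
print(" -> ".join(path))
```

A `ValueError` here is useful feedback: it points at the exact step where an in-between contributor is still missing from the analysis.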

Step 4 (optional): Understand multiple causes

If you have multiple causal factors and pathways leading from the root to the final problem, visualize the multiple causes with an Ishikawa diagram or fishbone diagram.

Named after its creator, Kaoru Ishikawa, and nicknamed for its fishbone shape, the diagram represents multiple causes and their effect on the final problem as separate pathways:

[Figure: Ishikawa (fishbone) diagram]

Step 5 (optional): Prioritize which cause to solve

When you are dealing with multiple causes, it might be unclear which one is the root cause or how to prioritize the different causes for problem resolution.

An RCA tool that comes in handy for this task is the Pareto analysis. It estimates the contribution of each cause towards the final effect and ranks highest the cause whose resolution would solve the largest share of the problem.

For example, the Pareto analysis below shows that the damaged radiator pump was the highest contributing cause of engine overheating:

[Figure: Pareto analysis example]
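
A Pareto ranking is straightforward to compute once each cause has a measured contribution. The sketch below assumes the contribution is a simple count of incidents per cause. The function name and the counts are hypothetical and invented for illustration. They are not taken from the figure.

```python
def pareto_analysis(counts: dict[str, int]) -> list[tuple[str, int, float]]:
    """Rank causes by contribution.

    Returns rows of (cause, count, cumulative share in percent), largest first.
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("need at least one recorded occurrence")
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows, running = [], 0
    for cause, count in ranked:
        running += count
        rows.append((cause, count, round(100 * running / total, 1)))
    return rows


# Hypothetical incident counts per cause of engine overheating.
incidents = {
    "low coolant level": 25,
    "damaged radiator pump": 50,
    "blocked air intake": 10,
    "faulty thermostat": 15,
}

table = pareto_analysis(incidents)
for cause, count, cumulative in table:
    print(f"{cause:<24}{count:>4}{cumulative:>8}%")
```

The cumulative column shows how much of the problem is covered by fixing the top causes in order, which is what makes the ranking useful for prioritization.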



Step 6: Correct the root cause

Remove the root cause of your problem and make sure the cause does not repeat. 

This step is crucial for not just solving the problem, but also preventing it from reoccurring in the future. 

Step 7: Follow-up

To err is human, to err twice is to be an engineer. Our best efforts often fail due to unexamined assumptions or changing external circumstances. This is why it is important to revisit important problems and their root causes to make sure they have not reappeared (under different disguises). 

There are other root cause analysis techniques, such as fault tree analysis, failure mode and effects analysis (FMEA), and barrier analysis.

We do not cover these techniques here, but feel free to research them further on your own.

Shortcomings of RCA in engineering

Root Cause Analysis also has drawbacks:

  1. Lack of crucial data. Not all information is available when we look for the causes of problems. The faulty machines might have gone offline, or we might have failed to collect the data in the first place. This is why it is important to build engineering systems that extensively log metadata about their operational characteristics, so that data is available when the time comes to perform failure analysis.
  2. Hard-to-access data. Some engineering systems are notoriously harder to understand than others. A prime example is distributed systems in software engineering, where data collection and tracing are harder than in a non-distributed system running on a single core.
  3. Subjectivity. RCA is not a scientific method, but an engineering one. There is an art to determining the true cause of a problem, and two engineers might suggest two different causes for the same faulty process. Part of the difficulty is that multiple underlying causes affect each other. This is where expertise comes into play: understanding all the possible causes helps you direct RCA efforts towards the root cause, so that the problem definition is not just your subjective opinion but is based on years of understanding how multifaceted systems work.
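
To make the first point concrete, here is a minimal sketch of logging operational metadata with Python's standard `logging` module, one JSON object per line so the records are easy to query later. The `JsonFormatter` class, the `metadata` field, and the job values are assumptions made for this example, not part of any particular platform.

```python
import json
import logging
import time


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
            # `metadata` is a hypothetical field, passed via `extra=` below.
            "metadata": getattr(record, "metadata", {}),
        }
        return json.dumps(payload)


def build_logger(name: str = "pipeline") -> logging.Logger:
    """Create a logger that writes JSON lines to the console (stderr)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    return logger


logger = build_logger()
started = time.perf_counter()
rows_loaded = 1200  # placeholder for the real work of a job
logger.info(
    "load finished",
    extra={
        "metadata": {
            "job": "daily_prices",  # hypothetical job name
            "rows": rows_loaded,
            "seconds": round(time.perf_counter() - started, 3),
        }
    },
)
```

Recording row counts and durations for every run, not only for failures, is what later makes it possible to see when the behavior started to deviate.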


Use RCA to become a better data engineer

Even though Root Cause Analysis is used throughout engineering, data engineers tap into this problem-solving technique more often than others.

Data engineering is by its nature a multifaceted and continuously changing intricate system of co-dependent data pipelines. And the more data pipelines there are, the more they break.

Start your journey towards becoming a (better) data engineer with Keboola’s Data Engineer Certificate, learn how to put your RCA skills to the test, and develop a competitive engineering skillset that will help you stand out. 
