Discover what it takes to become a better data scientist and how we can support you each step of the way.
Being more productive than your super-competitive peer group is hard. Being 10 times more productive might sound like an impossibility, an exaggeration... or even a myth (unicorn, you say?).
A 10x data scientist is literally 10 times more productive than the average data scientist. The skillsets of these data scientists create better career opportunities, higher peer recognition, and more interesting projects to work on.
But the disproportionate gap between super performers and average Joes is one that is well known within the data community. Let’s explore how you can stand out from the crowd and become a top performer.
With the proliferation of data production, the increase in compute capacity and the competitive edge that data offers for innovation, data science arose to capitalize on the novel opportunity.
The core of data science consists of three pillars: math, computer science, and domain knowledge. A data scientist takes data (math and statistics), molds it at previously unimaginable speeds and with tailored techniques (computer science), and uses it to address business challenges that were once insurmountable (domain expertise).
But that seems like a wide surface area to cover. One can quickly spread themselves thin with their efforts to master all three pillars. So, by the pure interdisciplinary nature of data science, can one ever be a master - a 10x-er - or is one doomed to be a Jack of all trades?
Or as the famous quote puts it:
“A data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician.”
The truth is, real mastery is rare. So rare, in fact, that it’s highly unlikely that anyone can reach the top of a single discipline, let alone three.
This is also reflected in the industry's hiring choices and organizational structures for data scientists. The best players in the data science world acknowledge that all three areas are necessary for data science, but that a person will mainly specialize in one area:
The industry giants hire some of the best players - the 10x-ers - in the field. So, to 10x your data science skills, you do not need to master all three areas. Nonetheless, you do need to become a data science expert.
To improve your data science skills significantly compared to your peer group, you have to follow a certain path.
The foundations are necessary for understanding how to deliver your work as a data scientist. Review your linear algebra, statistical inference, calculus, algorithms, programming design patterns, and get knowledgeable about the domain that you are operating in.
Unless you are working as a machine-learning engineer, there is no need to overindulge in the theory behind it. For example, you need to know that not setting your max depth on decision trees can cause overfitting, but you do not need to know how to implement a different version of C4.5 as the algorithm behind the decision tree.
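To make the max depth point concrete, here is a minimal sketch, assuming a toy pure-Python tree and synthetic noisy data. Both are made up for illustration: this is not C4.5 and not any library's implementation, and names such as `build` and `make_data` are hypothetical. In day-to-day work you would more likely reach for a library parameter such as scikit-learn's `DecisionTreeClassifier(max_depth=...)`.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(rows):
    """Return the (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, gini([y for _, y in rows])
    for f in range(len(rows[0][0])):
        values = sorted({x[f] for x, _ in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for x, y in rows if x[f] <= t]
            right = [y for x, y in rows if x[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build(rows, max_depth=None, depth=0):
    """Grow a tree; max_depth=None keeps splitting until the leaves are pure."""
    labels = [y for _, y in rows]
    majority = int(sum(labels) * 2 >= len(labels))
    if len(set(labels)) == 1 or (max_depth is not None and depth >= max_depth):
        return majority
    split = best_split(rows)
    if split is None:
        return majority
    f, t = split
    left = [r for r in rows if r[0][f] <= t]
    right = [r for r in rows if r[0][f] > t]
    return (f, t, build(left, max_depth, depth + 1), build(right, max_depth, depth + 1))

def predict(tree, x):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

def accuracy(tree, rows):
    return sum(predict(tree, x) == y for x, y in rows) / len(rows)

def depth_of(tree):
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth_of(tree[2]), depth_of(tree[3]))

def make_data(n, rng, noise=0.2):
    """Synthetic data (assumption): label is x0 + x1 > 1, with some labels flipped."""
    rows = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        y = int(x[0] + x[1] > 1)
        if rng.random() < noise:
            y = 1 - y
        rows.append((x, y))
    return rows

rng = random.Random(0)
train, test = make_data(200, rng), make_data(200, rng)
deep = build(train)                  # no depth limit
shallow = build(train, max_depth=3)  # capped depth

for name, tree in [("unlimited", deep), ("max_depth=3", shallow)]:
    print(name, "depth:", depth_of(tree),
          "train acc:", accuracy(tree, train),
          "test acc:", accuracy(tree, test))
```

The unlimited tree keeps splitting until it has memorized every noisy training label, so its training accuracy says little about how it will do on new data. Comparing the train and test columns for the two trees is the quick check for overfitting.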
Solidify the foundations by putting them into practice. Start up that Jupyter Notebook and tackle data science projects. You need to develop enough muscle memory to know how to call the relevant libraries, set up a classifier or regression model and optimize it (the hyperparameters are not going to tune themselves) without turning to Google for help at each step.
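As one example of the tuning loop worth having in muscle memory, here is a rough sketch of a grid search with cross-validation. It assumes a hand-rolled k-nearest-neighbors classifier and synthetic data, both invented for illustration, and the helper names (`knn_predict`, `cv_accuracy`) are hypothetical. Libraries wrap this same pattern, for instance scikit-learn's `GridSearchCV`.

```python
import random
from collections import Counter

def make_data(n, rng, noise=0.15):
    """Synthetic 2-feature binary data with some flipped labels (assumption)."""
    rows = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        y = int(x[0] + x[1] > 1)
        if rng.random() < noise:
            y = 1 - y
        rows.append((x, y))
    return rows

def knn_predict(fit_rows, x, k):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    def sq_dist(row):
        return (row[0][0] - x[0]) ** 2 + (row[0][1] - x[1]) ** 2
    nearest = sorted(fit_rows, key=sq_dist)[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]

def cv_accuracy(rows, k, folds=5):
    """Mean validation accuracy over interleaved folds (every folds-th row is held out)."""
    scores = []
    for i in range(folds):
        val = rows[i::folds]
        fit = [r for j, r in enumerate(rows) if j % folds != i]
        hits = sum(knn_predict(fit, x, k) == y for x, y in val)
        scores.append(hits / len(val))
    return sum(scores) / folds

rng = random.Random(42)
data = make_data(150, rng)

# Candidate values for the hyperparameter k (odd, so binary votes cannot tie).
grid = [1, 3, 5, 9, 15, 25]
results = {k: cv_accuracy(data, k) for k in grid}
best_k = max(results, key=results.get)

for k in grid:
    print(f"k={k:>2}  mean CV accuracy={results[k]:.3f}")
print("best k:", best_k)
```

The point is the shape of the loop: pick candidate values, score each one on data the model did not see during fitting, and keep the best. Once you can write that from memory, swapping in a real model and a real dataset is the easy part.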
But once you’ve done your Kaggle exercises, it’s time to move on from guided learning. The tutorials, books, and lectures are a good companion on your path to 10x, but you need to carve your own way if you ever wish to be the leader in your peer group, not the follower.
What sets an expert (any expert, not just a data science expert) apart from a novice is the amount of practice that the individual has put into their craft.
“The master has failed more times than the beginner has even tried” - Stephen McCranie
This idea has been popularized in Malcolm Gladwell’s book Outliers: The Story of Success. Gladwell gives us the magic number of 10,000 hours. Every outstanding performer, from the Beatles to hockey players, puts in 10,000 hours of sweat before eventually achieving greatness.
But practice in and of itself is insufficient. As the 10x developer research shows, programmers with comparable mileage can be either 10x-ers or average Joes, even though they all had the same seven years of experience (7 years × 250 work days × 8 hours = 14,000 hours). Instead, what we need is to practice deliberately.
Deliberate practice involves consciously guiding your exercises to fill the gaps in your skillset. As a process, deliberate practice involves repeating these steps:
In a nutshell, deliberate practice means that you’ll be able to code out a solution to a data science problem. Then, with a fresh set of eyes, you will look at your solution, critique it, and recode it to improve.
Being a specialist means that you dig deeper than the majority of your peers in the field.
Specialists come in many different forms. You could specialize in the family of clustering algorithms for insight discovery, high-performance computing for programmatically speeding up machine learning algorithms, or be a domain expert for fraud detection approaches in banking transactions (among many others).
No matter what area you specialize in, being a specialist results in higher paychecks, more interesting work problems (often at the frontier of what is currently known), and a seat at the 10x table.
Specialization automatically makes you more productive than your peers in the specific field of your expertise. But it does come at a cost. Every hour that you devote to deliberately developing your area of expertise takes time away from practicing other areas, which is why you also need to lateralize.
Lateralization means acquiring the skills that are adjacent to your current know-how. A good way to think about lateralization is to look at your current specialty in comparison to other adjacent areas:
What are the benefits of lateralization?
The end goal of specialization and lateralization is to acquire a T-shaped skill profile to capitalize on the benefits of both. Concentrate on your areas of specialty (the vertical line in T) while keeping a wider overview of your field of work (the horizontal line in T).
Keboola was designed to set data practitioners on the path to success. The data operations platform automates the entire data process, from raw data collection to machine learning insights.
Doing data science within Keboola helps you to reach your stretch goals faster with its devoted features:
Getting started with Keboola is easy:
The first time you log into your free account, the platform will guide you through constructing your own ETL pipeline: from raw data to cleaned and analysis-ready data. Just follow the Guide Mode.
Pro tip: If you don’t have any extractors (raw data) which you can load in LESSON 2, pick a CSV file from a dataset in Kaggle. Below, you can see us loading the famous Iris dataset (you thought it would be the Titanic, didn’t you?).
Spinning up your Jupyter Notebook to start coding your data science project is as simple as creating a sandbox environment:
2. If you want, you can add additional transformations before you start coding:
3. Click on “Sandbox” in the right-hand panel.
4. An overlay will appear, where you can confirm that you want to create a Sandbox environment and load the data into it.
5. A new password-protected Sandbox will be created. Click on “Connect” to run it. A Jupyter Notebook will open in a new browser window, which is accessible using the password from the previous step.
6. Voila, you’re ready to start coding!
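A first notebook cell might look something like the sketch below, which reads a CSV table using only the Python standard library and counts the rows per class. Everything specific here is an assumption: the file name `iris.csv`, the column names, and the three sample rows are placeholders written to a temporary folder so the sketch runs anywhere. In your own Sandbox, point `path` at wherever the loaded table actually lives.

```python
import csv
import os
import tempfile
from collections import Counter

def load_table(path):
    """Read a CSV file with a header row into a list of dicts (values stay strings)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))

def class_counts(rows, label_column):
    """Count how many rows fall into each value of the label column."""
    return Counter(row[label_column] for row in rows)

# Stand-in file so the sketch is runnable outside a Sandbox (assumed column names).
sample = (
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,setosa\n"
    "7.0,3.2,4.7,1.4,versicolor\n"
    "6.3,3.3,6.0,2.5,virginica\n"
)
path = os.path.join(tempfile.mkdtemp(), "iris.csv")  # placeholder location
with open(path, "w", encoding="utf-8") as handle:
    handle.write(sample)

rows = load_table(path)
print("rows loaded:", len(rows))
print("columns:", list(rows[0].keys()))
print("rows per class:", dict(class_counts(rows, "species")))
```

Checking the row count, column names, and class balance before any modeling is a cheap way to confirm that the table arrived in the notebook the way you expected.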
Sharing your work with others is as simple as copying and pasting the URL for your Jupyter Notebook. As long as the notebook is running, people can access it (provided they have the password).