Getting Into "the Sexiest Job of the 21st Century" - The Ultimate Guide

Hop Dao
August 27, 2021

As companies and industries move online, data has overtaken oil as the world's most valuable commodity. It is hardly surprising that data scientist has been called "the sexiest job of the 21st century" and that data science is currently among the most in-demand skills.

In short, data scientists ask questions related to a fundamental business problem, then collect, organise and analyse raw data. They create and use algorithms to identify patterns and trends that help answer those questions.

But to succeed in the hottest profession of the new era, professionals need much more than an ordinary knowledge of programming. Let Mastt's data scientist Domenic Prestia walk you through some must-have skills for the job. Dom is working on the future of data science and data discovery at Mastt, applying machine learning, data analysis and big data management to bring never-before-possible insights to the construction industry.

Mathematics - Statistics, Calculus and Linear Algebra

A solid understanding of statistics is a fundamental requirement. Many subjects fall under the topic of statistics, and you should understand the basics (random variables, basic probability, probability distributions, etc.).

It allows you to explain and interpret data through concepts such as:

Exploratory Data Analysis
Descriptive Statistics
Hypothesis Testing
Experimental Design

Multi-variable Calculus and Linear Algebra are the pillars that machine learning is built on. Learning them will enhance your skills.
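To make two of the statistics concepts listed above more concrete, here is a minimal sketch of descriptive statistics and a simple hypothesis test using only Python's standard library. The two groups of numbers are made up for illustration, and the permutation test is just one of several ways to test a difference in means; the function names are hypothetical.

```python
import random
import statistics

# Hypothetical measurements for two groups (made-up numbers for illustration).
group_a = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3]
group_b = [12.9, 13.1, 12.7, 13.4, 12.8, 13.0]


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def permutation_p_value(a, b, n_permutations=2000, seed=0):
    """Estimate a two-sided p-value for the difference in means.

    Null hypothesis: both groups come from the same distribution, so
    shuffling the group labels should not change the difference much.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled_a = pooled[:len(a)]
        shuffled_b = pooled[len(a):]
        diff = abs(statistics.mean(shuffled_a) - statistics.mean(shuffled_b))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations


summary_a = describe(group_a)
summary_b = describe(group_b)
p_value = permutation_p_value(group_a, group_b)

print("Group A:", summary_a)
print("Group B:", summary_b)
print("Estimated p-value:", p_value)
```

A small p-value here suggests the gap between the two group means would be unlikely if the labels were interchangeable.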

Machine learning algorithms are basically available out of the box nowadays, and even fine-tuning them can be done with minimal knowledge of what is going on under the hood. So knowing Calculus and Linear Algebra may not be required to get started, but to get the most out of these techniques and be confident in your results, knowing these areas is a must.
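As an illustration of what "under the hood" can look like, the sketch below fits a straight line by gradient descent, which is where calculus shows up: the update uses the partial derivatives of the mean squared error. The data, learning rate and step count are assumptions chosen so that this toy example converges; they are not recommended defaults.

```python
# Made-up points lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current line against the data.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(error ** 2).
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Step against the gradient to reduce the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = fit_line(xs, ys)
print(f"slope={w:.4f}, intercept={b:.4f}")
```

Out-of-the-box libraries hide this loop, but knowing it is there helps when a model fails to converge or a learning rate needs adjusting.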

Programming

Essential to being a data scientist is putting your knowledge into practice. Being proficient in a language is all well and good, but practising good programming principles will keep your projects from becoming hot messes of bugs and failure.

Key design principles for data science are:
KISS (Keep It Simple, Stupid)
DRY (Don't Repeat Yourself)
Single Responsibility Principle
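A small sketch of how these principles might look in practice: each function does one job, and the shared logic is written once and reused for two different inputs. The function names and sample values are hypothetical.

```python
def parse_amount(text):
    """Convert a string like ' 1,200.50 ' to a float (one job: parsing)."""
    return float(text.strip().replace(",", ""))


def mean(values):
    """Average a non-empty list of numbers (one job: arithmetic)."""
    return sum(values) / len(values)


def average_amount(raw_values):
    """Compose the small helpers instead of repeating their logic inline."""
    return mean([parse_amount(v) for v in raw_values])


# Hypothetical raw inputs, as they might arrive from two different files.
costs = ["1,200.50", " 800 ", "999.50"]
hours = ["40", "37.5", " 42.5"]

print(average_amount(costs))
print(average_amount(hours))
```

If the parsing rules change, only `parse_amount` needs to be edited, and both calculations pick up the fix.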

Python is a general-purpose language that is easy to learn and has a lot of support for machine learning and data science.

R is another language used for machine learning and data analysis, but it is geared towards statistics rather than being general purpose like Python.

Julia is a relatively new general-purpose language like Python, and it can be significantly faster for numerical work.

SQL is essential for accessing databases.
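For a feel of how SQL and Python work together, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The `projects` table and its rows are invented for this example.

```python
import sqlite3

# In-memory database so the example leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (name TEXT, region TEXT, budget REAL)")
conn.executemany(
    "INSERT INTO projects VALUES (?, ?, ?)",
    [
        ("Bridge", "North", 120.0),
        ("Tunnel", "North", 80.0),
        ("School", "South", 50.0),
    ],
)

# Aggregate in the database rather than pulling every row into Python.
rows = conn.execute(
    "SELECT region, COUNT(*), SUM(budget) "
    "FROM projects GROUP BY region ORDER BY region"
).fetchall()
conn.close()

for region, project_count, total_budget in rows:
    print(region, project_count, total_budget)
```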

Data Wrangling and Data Manipulation

Data will always be messy, and cleaning it is essential before it can be useful in modelling. Cleaning can be as simple as correcting typos or fixing dates, or as hard as interpolating missing data.
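The sketch below shows two of these cleaning tasks in plain Python: normalising dates written in different formats, and filling gaps by linear interpolation. The accepted date formats and the sample readings are assumptions for illustration; real data usually needs a longer list of formats and more careful handling of gaps.

```python
from datetime import datetime


def parse_date(text):
    """Try a few known date formats and return an ISO date string."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y", "%d %b %Y"):
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognised date: {text!r}")


def interpolate_missing(values):
    """Fill None gaps by linear interpolation between known neighbours.

    Gaps at the very start or end are left as None, since there is
    nothing on one side to interpolate from.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


dates = [parse_date(d) for d in ["2021-08-27", "27/08/2021", " 27 Aug 2021"]]
readings = interpolate_missing([10.0, None, None, 16.0, None, 20.0])

print(dates)
print(readings)
```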

Data Visualisation

This skill is needed for exploring data.

Exploration allows you to familiarise yourself with the data before modelling or analysing it. How you decide to test or model the data is heavily influenced by what you find during exploration.

Data Visualisation is also critical for storytelling, as large amounts of data need to be transformed into something that is easy to comprehend. This helps stakeholders make data-driven decisions. For example, your audience may not understand p-values or correlation coefficients, so communicating results clearly is essential, and visualisation will help with this.

To master visualisation, you need to learn:
Basic chart types
Best visualisation tools for your chosen programming language
Which charts are useful for which scenario
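As a rough illustration of matching a chart to a scenario, a histogram is a common choice for seeing how a single numeric variable is distributed. The sketch below draws one as text so that it runs without any plotting library; in practice you would likely use the plotting tools of your chosen language. The durations are made-up values and `text_histogram` is a hypothetical helper.

```python
from collections import Counter


def text_histogram(values, bin_width):
    """Return histogram lines such as ' 10-19  | ####' for a list of numbers."""
    counts = Counter(int(v // bin_width) for v in values)
    lines = []
    for b in range(min(counts), max(counts) + 1):
        low = b * bin_width
        high = low + bin_width - 1
        # Missing bins count as zero, so empty ranges still get a row.
        lines.append(f"{low:>3}-{high:<3} | {'#' * counts[b]}")
    return lines


# Hypothetical task durations in days.
durations = [3, 7, 12, 14, 15, 18, 21, 22, 25, 41]
histogram_lines = text_histogram(durations, bin_width=10)

for line in histogram_lines:
    print(line)
```

Even this crude view shows where most values sit and that one value lies well away from the rest, which is the kind of observation that shapes later modelling choices.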

Machine Learning

You will need a complete understanding of fundamental ML algorithms, including:

Linear Regression and Logistic Regression
Decision Trees
Naive Bayes
Support Vector Machines
K-nearest Neighbours
K-means

You also need to know which algorithms are appropriate for which situations. The No Free Lunch Theorem states "All optimisation algorithms perform equally well when their performance is averaged across all possible problems". This implies there is no single best machine learning algorithm for predictive modelling problems, so having a wide breadth of experience with different algorithms will help you find the best possible solution for each problem.
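To show how compact one of the fundamental algorithms listed above can be, here is a minimal K-nearest Neighbours classifier written from scratch. The points, labels and choice of `k=3` are made up for illustration, and a library implementation would normally be preferable for real work.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up 2D points forming two loose clusters.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["small", "small", "small", "large", "large", "large"]

first = knn_predict(points, labels, (1.8, 1.6))
second = knn_predict(points, labels, (6.2, 6.4))
print(first, second)
```

This approach works well when nearby points tend to share a label, and poorly when they do not, which is one practical face of the "no single best algorithm" point.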

Source: Reddit

Interested in what we do at Mastt? Get in touch now via email at hello@mastt.com.au.
