Are you an aspiring data scientist looking for a structured learning path?
We’ve put together a complete data science roadmap to help you reach your goals. Through a series of milestones, this roadmap will serve as a step-by-step guide throughout your data science learning journey.
To be a data scientist, you need a mix of programming, analytical, and decision-making skills backed by business acumen. From programming for data science to building predictive models, this roadmap has you covered. We have categorized the skills, from beginner to advanced, and given each topic a timeline so that you can work through the foundational concepts at a steady pace.
In part 1 of the data science roadmap, you’ll find learning paths for Python, SQL, and Math for Data Science.
Let’s get started!
Milestone 1: Python Programming for Data Science
Over the past several years, Python has emerged as the go-to programming language for data science. Its beginner-friendly syntax has made it a favorite first programming language amongst aspiring developers.
Week 0: Python Fundamentals (Basic)
- Coding environment setup
- Running a Python script
- Variables, data types
- Reading inputs from user
- Conditionals
- Loops
Practice:
Build a number guessing game with the following functionality:
- Read user input
- Validate guess against the secret number (use conditionals)
- Use loops for controlling the number of guesses allowed
- Break out of the loop if the user guesses the number or the maximum number of guesses has been exhausted
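One possible shape for the game is sketched below. It is only an illustration: the function name `play_guessing_game` and the `read_guess` parameter are our own choices, and wrapping the loop in a function (a Week 2 topic) is done here so the game can be driven by scripted guesses instead of the keyboard.

```python
def play_guessing_game(secret, max_guesses=5, read_guess=input):
    """Return True if the player guesses `secret` within `max_guesses` tries."""
    for attempt in range(1, max_guesses + 1):
        raw = read_guess(f"Guess {attempt}/{max_guesses}: ")
        try:
            guess = int(raw)
        except ValueError:
            print("Please enter a whole number.")
            continue  # an invalid entry still uses up one attempt
        if guess == secret:
            print("Correct!")
            return True  # leave the loop as soon as the guess matches
        print("Too low." if guess < secret else "Too high.")
    print(f"Out of guesses. The number was {secret}.")
    return False


# Scripted guesses stand in for keyboard input in this demo.
scripted = iter(["10", "abc", "15", "13"])
won = play_guessing_game(13, read_guess=lambda prompt: next(scripted))
```

To play interactively, call `play_guessing_game(random.randint(1, 20))` after `import random`; the default `read_guess=input` then reads from the keyboard.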
Week 1: Python Built-In Data Structures (Basic)
Goal: Learn built-in data structures and methods for CRUD operations
- Mutable vs Immutable objects
- Lists
- Tuples
- Dictionaries
- List and Dictionary Comprehensions
- Strings
Key takeaway: Understand when to use which data structure
Practice:
- Create all the above data structures in Python.
- Use built-in methods specific to the particular data structure.
- Create new lists and dictionaries using the syntax for comprehensions.
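As a starting point for the practice above, here is a short sketch of create, update, and delete operations on each structure. The variable names and values are made up for illustration.

```python
# Lists are mutable: items can be added, changed, and removed in place.
scores = [72, 88, 95]
scores.append(64)        # create
scores[0] = 75           # update
scores.remove(88)        # delete

# Tuples are immutable: a good fit for fixed records such as coordinates.
point = (3, 4)

# Dictionaries map keys to values and are mutable.
ages = {"ana": 31, "ben": 25}
ages["cy"] = 40          # create
ages["ben"] += 1         # update
del ages["ana"]          # delete

# Strings are immutable too; their methods return new strings.
title = "data science".title()

# Comprehensions build new collections from existing ones.
squares = [n * n for n in range(5)]
over_30 = {name: age for name, age in ages.items() if age > 30}
```

Trying `point[0] = 5` raises a `TypeError`, which is a quick way to see the mutable vs immutable distinction for yourself.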
Week 2: Python Functions (Basic)
- Basic Python built-in functions
- User-defined Python functions
  - Defining and calling functions
  - Return value and multiple values
  - Functions with default arguments, variable number of arguments
  - Using command-line arguments
- Useful built-in functions such as range() and enumerate()
Practice:
Write a Python function that accepts a variable number of integer arguments and returns both their sum and the number of arguments passed in.
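A minimal sketch of one way to solve this exercise, assuming the name `sum_and_count` (our choice) and using `*args`-style parameters to collect the arguments into a tuple:

```python
def sum_and_count(*numbers):
    """Return the sum of the arguments and how many were passed in."""
    return sum(numbers), len(numbers)


# A function can return multiple values as a tuple, unpacked here.
total, count = sum_and_count(3, 5, 8)
```

Calling the function with no arguments returns `(0, 0)`, since `sum()` of an empty tuple is 0.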
Week 3: File I/O and Exception Handling (Intermediate)
- Working with files
- Read from and write to files
- Exception handling using try and except statements
Practice:
- Create and work with file objects
- Write to a text file
- Read text file
- For exception handling, practice handling the FileNotFoundError exception
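The sketch below puts these pieces together. The helper name `read_lines` and the file names are hypothetical, and the files live in a temporary directory so the example cleans up after itself.

```python
import os
import tempfile


def read_lines(path):
    """Return the lines of a text file, or an empty list if it is missing."""
    try:
        with open(path, encoding="utf-8") as f:
            return f.read().splitlines()
    except FileNotFoundError:
        print(f"No such file: {path}")
        return []


# Work in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as folder:
    notes_path = os.path.join(folder, "notes.txt")
    with open(notes_path, "w", encoding="utf-8") as f:
        f.write("first line\nsecond line\n")

    lines = read_lines(notes_path)
    missing = read_lines(os.path.join(folder, "missing.txt"))
```

Opening files in a `with` block closes them automatically, even when an exception is raised inside the block.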
Week 4: Web Scraping and Data Collection with Python (Intermediate)
- HTTP requests with urllib and the requests library
- Working with JSON and XML
- Parsing date formats
- Web scraping with BeautifulSoup
- [Good to have] Regular Expressions
- [Optional] Web scraping with Scrapy
Practice:
- Scrape a website of your choice, such as HackerNews or any developer-focused website. Be sure to verify from robots.txt that you have permission to scrape that site.
- Scrape and retrieve data from the website and parse it using the techniques you have learned this week.
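Before pointing a scraper at a live site, you can rehearse the JSON and date-parsing steps offline. In the sketch below, the payload is invented and stands in for the body of an HTTP response; the field names and the two date formats are assumptions, not the format of any real site.

```python
import json
from datetime import datetime

# Hypothetical payload, standing in for the body of an HTTP response.
payload = """
[
  {"title": "Post A", "points": 120, "posted": "2023-01-15T09:30:00"},
  {"title": "Post B", "points": 45, "posted": "15/01/2023 18:05"}
]
"""

# Date formats to try, in order; extend this list for the site you scrape.
DATE_FORMATS = ["%Y-%m-%dT%H:%M:%S", "%d/%m/%Y %H:%M"]


def parse_date(text):
    """Return a datetime for the first format that matches, else None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


posts = json.loads(payload)
for post in posts:
    post["posted"] = parse_date(post["posted"])

# With real datetime objects, sorting and comparing by time is straightforward.
latest = max(posts, key=lambda post: post["posted"])
```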
Week 5: Functional Programming (Intermediate)
- First class functions
- Higher order functions
- Lambda functions
- map(), filter(), reduce()
Practice:
- Take an example where you used list comprehension in week 1
- Try using list comprehension construct along with conditions
- Use map() and filter() to rewrite the list comprehension as needed
- Use lambdas inside the map() and filter() functions
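Here is a small sketch of that rewrite, using a made-up list of numbers. Note that `reduce()` lives in the `functools` module in Python 3.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5, 6]

# List comprehension with a condition: squares of the even numbers.
even_squares = [n * n for n in numbers if n % 2 == 0]

# The same result with filter() and map(), using lambdas.
even_squares_fp = list(
    map(lambda n: n * n, filter(lambda n: n % 2 == 0, numbers))
)

# reduce() folds a sequence into a single value, here the sum of the squares.
total = reduce(lambda acc, n: acc + n, even_squares_fp, 0)
```

Both `map()` and `filter()` return lazy iterators, which is why the result is wrapped in `list()`.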
Week 6: Object-Oriented Programming (Advanced)
- Classes and Instances
- Class and instance variables
- Class, instance, and static methods
- Inheritance
- Dunder Methods
- Dataclasses
Practice:
- Create an employee class to store details of employees in a fictional firm of your choice
- Apply the various OOP concepts that you have learned this week
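One way the exercise could look is sketched below. The class names, attributes, and raise rates are all invented for illustration; your own design may differ.

```python
from dataclasses import dataclass


class Employee:
    raise_rate = 1.05              # class variable, shared by all instances

    def __init__(self, name, salary):
        self.name = name           # instance variables
        self.salary = salary

    def apply_raise(self):         # instance method
        self.salary = round(self.salary * self.raise_rate)

    @classmethod
    def from_csv(cls, row):        # class method used as an alternative constructor
        name, salary = row.split(",")
        return cls(name.strip(), int(salary))

    @staticmethod
    def is_valid_salary(amount):   # static method: needs neither cls nor self
        return amount > 0

    def __repr__(self):            # dunder method
        return f"{type(self).__name__}({self.name!r}, {self.salary})"


class Manager(Employee):           # inheritance: overrides the class variable
    raise_rate = 1.10


@dataclass
class Office:                      # dataclass: __init__, __repr__, __eq__ generated
    city: str
    desks: int = 10


ana = Employee.from_csv("Ana, 50000")
ana.apply_raise()
ben = Manager("Ben", 80000)
ben.apply_raise()
```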
Week 7: Advanced Python (Advanced)
- Generators
- Decorators
- Context Managers
- The collections and itertools modules
Practice:
- Explore the collections and itertools modules
- Take a function of your choice and use a decorator to modify its behavior without changing the statements in the function’s body
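As a sketch of what this practice might produce, the example below defines a call-counting decorator, a generator, and uses one helper each from `collections` and `itertools`. The names `count_calls`, `greet`, and `evens` are hypothetical.

```python
import functools
from collections import Counter
from itertools import islice


def count_calls(func):
    """Decorator that records how many times `func` has been called."""
    @functools.wraps(func)  # keep the original name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


@count_calls
def greet(name):
    return f"Hello, {name}!"


def evens():
    """Generator that yields even numbers indefinitely."""
    n = 0
    while True:
        yield n
        n += 2


greet("Ana")
greet("Ben")

# islice() takes a finite slice of an infinite generator.
first_three = list(islice(evens(), 3))

# Counter tallies hashable items; most_common(1) returns the top entry.
top_letter = Counter("banana").most_common(1)
```

The body of `greet` is untouched; the decorator adds the counting behavior from the outside.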
Milestone 2: SQL for Data Science
After you’ve learned Python, your next milestone should be to gain a good understanding of SQL or Structured Query Language. Most interviews for data roles involve a programming round (preferably Python), followed by at least one round of SQL interviews.
Like Python, SQL is intuitive to learn and easy to understand, but cracking SQL interviews for data science roles requires constant practice to hone your problem-solving skills. It’s also a skill that’ll help you greatly in your day-to-day job.
In this section, we’ll give you a learning path for SQL—from beginner to advanced—along with interview questions or practice problems.
Practice is the secret key to getting better at SQL!
If you are looking for resources to learn one or more of the following concepts, check out the guide below.
Week 1: Basics of SQL (Beginner)
- SELECT statement
- Logical and comparison operators
- DISTINCT, WHERE clause
- ORDER BY, LIMIT
- LIKE, IN, BETWEEN
- IS NULL
Practice
Basic SELECT, HackerRank SQL Practice
Advanced SELECT, HackerRank SQL Practice
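If you want to try these clauses without installing a database, Python's built-in `sqlite3` module is enough. The sketch below uses an invented `employees` table in an in-memory database; SQLite's dialect may differ slightly from the one used on practice sites.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER, manager TEXT);
    INSERT INTO employees VALUES
        ('Ana', 'Data',  90000, NULL),
        ('Ben', 'Data',  70000, 'Ana'),
        ('Cy',  'Sales', 60000, 'Ana'),
        ('Dee', 'Sales', 65000, 'Cy');
""")

# WHERE with BETWEEN, IS NOT NULL, and LIKE, then ORDER BY and LIMIT.
rows = conn.execute("""
    SELECT name, salary
    FROM employees
    WHERE salary BETWEEN 60000 AND 80000
      AND manager IS NOT NULL
      AND name LIKE '%e%'
    ORDER BY salary DESC
    LIMIT 2
""").fetchall()

# DISTINCT removes duplicate values.
depts = conn.execute(
    "SELECT DISTINCT dept FROM employees ORDER BY dept"
).fetchall()
conn.close()
```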
Week 2: Aggregate Functions (Intermediate)
- Why do we need aggregate functions in SQL?
- COUNT
- SUM, AVG
- MIN, MAX
- GROUP BY, HAVING
Practice
Aggregation in SQL, HackerRank SQL Practice
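To see how GROUP BY and HAVING work together, here is a small sketch run through Python's `sqlite3` module. The `orders` table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount INTEGER);
    INSERT INTO orders VALUES
        ('ana', 20), ('ana', 40),
        ('ben', 15),
        ('cy', 10), ('cy', 30), ('cy', 50);
""")

# One output row per customer; HAVING filters groups after aggregation,
# whereas WHERE would filter individual rows before it.
summary = conn.execute("""
    SELECT customer,
           COUNT(*)    AS n_orders,
           SUM(amount) AS total,
           AVG(amount) AS average,
           MAX(amount) AS largest
    FROM orders
    GROUP BY customer
    HAVING COUNT(*) >= 2
    ORDER BY customer
""").fetchall()
conn.close()
```

The customer with a single order is dropped by the HAVING clause, which is the part interview questions often probe.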
Week 3: JOINs and UNIONs (Intermediate)
- Inner and Outer JOINs
- Left and right JOINs
- Self JOINs
- UNIONs
Practice
Basic JOIN, HackerRank SQL Practice
Advanced JOIN, HackerRank SQL Practice
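The difference between an inner and a left JOIN is easiest to see on a tiny example. The sketch below uses Python's `sqlite3` module with two invented tables, where one customer has no orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount INTEGER);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1, 20), (11, 1, 40), (12, 2, 15);
""")

# INNER JOIN keeps only customers that have at least one matching order.
inner = conn.execute("""
    SELECT c.name, o.amount
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()

# LEFT JOIN keeps every customer; COUNT(o.id) ignores the NULLs of non-matches.
left = conn.execute("""
    SELECT c.name, COUNT(o.id) AS n_orders
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

# UNION stacks two result sets and removes duplicate rows.
union = conn.execute("""
    SELECT name FROM customers WHERE id < 3
    UNION
    SELECT name FROM customers WHERE id > 1
    ORDER BY name
""").fetchall()
conn.close()
```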
Week 4: Window Functions (Advanced)
- Why do we need window functions?
- LAG, LEAD
- NTILE
- RANK, DENSE_RANK
Practice
Window Functions Practice Questions, StrataScratch
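Unlike GROUP BY, a window function computes a value across related rows while keeping every row in the output. The sketch below shows LAG, RANK, and DENSE_RANK on an invented `sales` table. It assumes the SQLite library bundled with your Python is version 3.25 or newer, which is when SQLite added window functions.

```python
import sqlite3

# Assumption: sqlite3.sqlite_version is 3.25 or newer (window function support).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (rep TEXT, month INTEGER, amount INTEGER);
    INSERT INTO sales VALUES
        ('ana', 1, 100), ('ana', 2, 150), ('ana', 3, 120),
        ('ben', 1, 150), ('ben', 2, 90);
""")

rows = conn.execute("""
    SELECT rep, month, amount,
           LAG(amount)  OVER (PARTITION BY rep ORDER BY month) AS prev_amount,
           RANK()       OVER (ORDER BY amount DESC)            AS rnk,
           DENSE_RANK() OVER (ORDER BY amount DESC)            AS dense_rnk
    FROM sales
    ORDER BY rep, month
""").fetchall()
conn.close()
```

The two sales of 150 tie for first place. RANK then skips to 3 for the next amount, while DENSE_RANK continues with 2.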
Week 5: Advanced SQL (Advanced)
- Working with dates
- SQL subqueries
- Common Table Expressions (CTE)
Practice
Subquery Expressions Practice Questions, StrataScratch
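The three topics above can be tried on one small table. This sketch again uses Python's `sqlite3` module with invented data; date functions such as `strftime` are SQLite-specific and are named differently in other databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER, hired TEXT);
    INSERT INTO employees VALUES
        ('Ana', 'Data',  90000, '2019-06-01'),
        ('Ben', 'Data',  70000, '2022-02-15'),
        ('Cy',  'Sales', 60000, '2021-11-30'),
        ('Dee', 'Sales', 65000, '2023-01-09');
""")

# Subquery: employees paid more than the company-wide average.
above_overall = conn.execute("""
    SELECT name FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees)
    ORDER BY name
""").fetchall()

# CTE: name an intermediate result, then join against it.
above_dept = conn.execute("""
    WITH dept_avg AS (
        SELECT dept, AVG(salary) AS avg_salary
        FROM employees
        GROUP BY dept
    )
    SELECT e.name
    FROM employees AS e
    JOIN dept_avg AS d ON d.dept = e.dept
    WHERE e.salary > d.avg_salary
    ORDER BY e.name
""").fetchall()

# Dates stored as ISO strings compare correctly; strftime extracts the year.
recent = conn.execute("""
    SELECT name, strftime('%Y', hired) AS year
    FROM employees
    WHERE hired >= '2022-01-01'
    ORDER BY hired
""").fetchall()
conn.close()
```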
Milestone 3: Math for Data Science
At this point, you’ve learned Python and SQL—the two key skills to get started with a career in data science. This section will be on Mathematics for Data Science. You don’t need a degree in mathematics to be a successful data scientist. But a fairly strong understanding of the following math fundamentals can help you become a better data scientist and better explain decisions.
A high-school level understanding of the math concepts will suffice. As a data scientist, you should always be willing to learn, explore and upskill yourself. You can always build on the foundations and learn advanced concepts as you progress in your data science career.
For each of the sections, we’ll also leave you with a few resources that can help you get a good grasp of the concepts within a minimal timeframe. We’ll also share a challenge application that you should try for yourself, putting your Python programming and math skills to the test!
Week 1: Linear Algebra
- Vectors and vector spaces
- Linear dependence and independence of vectors
- Bases of a vector space
- Determinant
- Matrix as a two-dimensional array of numbers
- Matrix subspaces and rank
- Significance of matrix factorization techniques
- Eigenvector Decomposition
- Singular Value Decomposition (SVD)
- Principal Component Analysis (PCA)
Resources
Essence of Linear Algebra by Grant Sanderson, 3Blue1Brown
Linear Algebra, Khan Academy
Application: Choose a sample dataset and perform dimensionality reduction using Principal Component Analysis (PCA).
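To see what PCA does under the hood, here is a deliberately small sketch that reduces two features to one using only the standard library. It relies on the closed-form eigenvalues of a symmetric 2x2 covariance matrix, so it does not generalize beyond two features; on a real dataset you would typically reach for NumPy or scikit-learn instead. The function name `pca_2d_to_1d` and the sample points are our own.

```python
import math


def pca_2d_to_1d(points):
    """Project 2-D points onto their first principal component.

    Assumes at least two points that are not all identical.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the 2x2 sample covariance matrix [[a, b], [b, c]].
    a = sum(x * x for x, _ in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)

    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form.
    top = (a + c) / 2 + math.hypot((a - c) / 2, b)

    # A matching eigenvector, normalized to unit length.
    if b != 0:
        vx, vy = b, top - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm

    # One coordinate per point along the principal direction.
    scores = [x * vx + y * vy for x, y in centered]
    explained = top / (a + c)  # share of the total variance that is kept
    return (vx, vy), scores, explained


# Points lying exactly on the line y = 2x lose nothing when reduced to 1-D.
direction, scores, explained = pca_2d_to_1d([(0, 0), (1, 2), (2, 4), (3, 6)])
```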
Week 2: Differential Calculus
- Understanding functions
- Computing derivatives of functions
- Rules of differentiation
- Local and global optima
- Chain rule of differentiation
- Partial derivatives
Resources
Essence of Calculus by Grant Sanderson, 3Blue1Brown
Differential Calculus, Khan Academy
Application: Understand how the gradient descent optimization algorithm works and implement it from scratch. Put your linear algebra and calculus skills to the test.
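A minimal from-scratch sketch is shown below: it fits a line y = w*x + b by repeatedly stepping against the partial derivatives of the mean squared error. The function name `fit_line`, the learning rate, the step count, and the data are all illustrative choices, not tuned values.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(error^2) w.r.t. w and b.
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # Step against the gradient, scaled by the learning rate.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Data generated from y = 2x + 1, so the fit should recover w = 2 and b = 1.
w, b = fit_line([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
```

Try raising the learning rate to 0.2 on this data: the updates overshoot and the parameters diverge, which is a useful thing to observe once.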
Week 3: Probability and Random Variables
- Basic principles of counting
- Permutation, Combination
- Set theory review: union, intersection, and more
- Sample space, event, probability of events
- Conditional Probability
- Bayes' Theorem
- Random variables (functions mapping from the sample space to the real line)
- Discrete and Continuous Random Variables
- Probability Mass Function for Discrete Random Variables
- Probability Density Function for Continuous Random Variables
- Expectation and Moments of a Random Variable
- Independent random variables
Resources
Statistics and Probability, Khan Academy
Application: Learn to generate and sample from probability distributions in Python: Use Python for Probability, CS109 @Stanford as the reference.
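As a warm-up, the standard library's `random` module can already sample from common distributions. The sketch below simulates a fair die (a discrete random variable) and draws from a normal distribution (a continuous one); the seed, sample size, and distribution parameters are arbitrary choices.

```python
import random

rng = random.Random(42)  # fixed seed so repeated runs give the same samples

# Discrete random variable: 10,000 rolls of a fair six-sided die.
rolls = [rng.randint(1, 6) for _ in range(10_000)]

# Empirical probability mass function: the share of rolls showing each face.
pmf = {face: rolls.count(face) / len(rolls) for face in range(1, 7)}

# The sample mean approximates the expectation E[X] = 3.5.
mean_roll = sum(rolls) / len(rolls)

# Continuous random variable: draws from a normal distribution
# with mean 170 and standard deviation 10.
heights = [rng.gauss(170, 10) for _ in range(10_000)]
mean_height = sum(heights) / len(heights)
```

Each entry of `pmf` should land close to 1/6, and the estimates tighten as the sample size grows.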
Week 4: Statistics
- Quantitative vs qualitative data
- Summary statistics: mean, median, and mode
- Measures of dispersion: Standard deviation, variance
- Understanding data distributions
- Law of large numbers
- Hypothesis testing
- p-value and significance
- Correlation Coefficient and its significance
Resources
Statistics and Probability, Khan Academy
AP Statistics, Khan Academy
Intro to Statistics, Udacity
Intro to Inferential Statistics, Udacity
Application: Choose any dataset from your domain of interest. Analyze the distribution of various features, compute summary statistics, and record your inferences in a report or Jupyter notebook.
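Before moving to a full dataset, you can check your understanding of the summary statistics on a handful of numbers with the built-in `statistics` module. The values below are made up, and the `pearson` helper is written by hand here so that each step of the correlation coefficient is visible.

```python
import statistics

sample = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(sample)
median = statistics.median(sample)
mode = statistics.mode(sample)
spread = statistics.pstdev(sample)  # population standard deviation


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)


# Hypothetical data: hours studied vs. test score.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]
r = pearson(hours, scores)
```

A value of `r` close to 1 indicates a strong positive linear relationship, though not necessarily a causal one.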
Check out part 2 of the data science roadmap: learn data analysis, visualization, and machine learning. Happy learning and coding!