# Data Science Roadmap, Part 1: Learn Python, SQL, and Math

Are you an aspiring data scientist looking for a structured learning path?

To be a data scientist, you need a mix of programming, analytical, and decision-making skills backed by business acumen. From programming for data science to building predictive models, this roadmap has you covered. We have categorized the skills from beginner to advanced and suggested a timeline for working on each tech stack until you understand its foundational concepts.

In Part 1 of the data science roadmap, you'll find learning paths for Python, SQL, and math for data science.

Let’s get started!

## Milestone 1: Python Programming for Data Science

Over the past several years, Python has emerged as the go-to programming language for data science. Its beginner-friendly syntax has made it a favorite first programming language amongst aspiring developers.

### Week 0: Python Fundamentals (Basic)

• Coding environment setup
• Running a Python script
• Variables, data types
• Conditionals
• Loops

Practice:

Build a number guessing game with the following functionality:

• Validate each guess against the secret number (use conditionals)
• Use a loop to control the number of guesses allowed
• Break out of the loop when the user guesses the number or exhausts the maximum number of guesses
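
Here is a minimal sketch of the game logic. To keep it easy to test, the guesses come from any iterable instead of directly from `input()`. The function names `play_guessing_game` and `play_interactive` are our own, not from any library.

```python
import random

def play_guessing_game(secret, guesses, max_guesses=5):
    """Check guesses against the secret; return attempts used, or None."""
    attempts = 0
    found = False
    for guess in guesses:
        attempts += 1
        if guess == secret:          # conditional: validate the guess
            found = True
            break                    # stop early: the user guessed it
        elif guess < secret:
            print("Too low.")
        else:
            print("Too high.")
        if attempts >= max_guesses:  # stop: guess limit exhausted
            break
    if found:
        print(f"Correct! You got it in {attempts} guesses.")
        return attempts
    print(f"Out of guesses. The number was {secret}.")
    return None

def play_interactive(max_guesses=5):
    """Play in the terminal against a random secret between 1 and 100."""
    secret = random.randint(1, 100)
    # A generator expression asks for input lazily, one guess at a time
    guesses = (int(input("Your guess: ")) for _ in range(max_guesses))
    return play_guessing_game(secret, guesses, max_guesses)
```

Call `play_interactive()` from a script to play in the terminal. Passing a list such as `[10, 50, 42]` to `play_guessing_game` lets you check the logic without typing.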

### Week 1: Python Built-In Data Structures (Basic)

Goal: Learn built-in data structures and methods for CRUD operations

• Mutable vs Immutable objects
• Lists
• Tuples
• Dictionaries
• List and Dictionary Comprehensions
• Strings

Key takeaway: Understand when to use which data structure

Practice:

• Create all the above data structures in Python.
• Use built-in methods specific to the particular data structure.
• Create new lists and dictionaries using the syntax for comprehensions.
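
The snippet below is a quick tour of create, read, update, and delete (CRUD) operations on the built-in data structures, plus comprehensions. The names and values are illustrative only.

```python
# Lists are mutable: they can be changed in place
scores = [88, 92, 79]
scores.append(95)        # create
first = scores[0]        # read
scores[1] = 90           # update
scores.remove(79)        # delete -> [88, 90, 95]

# Tuples are immutable: a good fit for fixed records such as coordinates
point = (3, 4)
tuple_is_immutable = False
try:
    point[0] = 10
except TypeError:
    tuple_is_immutable = True

# Dictionaries map keys to values
ages = {"alice": 31, "bob": 27}
ages["carol"] = 35       # create
ages["bob"] += 1         # update
del ages["alice"]        # delete -> {"bob": 28, "carol": 35}

# Comprehensions build new lists and dicts from existing iterables
squares = [n * n for n in range(5)]
even_squares = {n: n * n for n in range(10) if n % 2 == 0}

# Strings are immutable: methods return new strings
name = "  Data Science  "
clean = name.strip().lower().replace(" ", "_")
```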

### Week 2: Python Functions (Basic)

• Basic Python built-in functions
• User-defined Python functions:
  • Defining and calling functions
  • Return value and multiple values
  • Functions with default arguments and a variable number of arguments
  • Using command-line arguments
• Useful built-in functions such as `range()` and `enumerate()`

Practice:

Write a Python function that accepts a variable number of integer arguments and returns their sum and the number of arguments passed in.
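
One possible solution sketch uses `*numbers` to collect any number of positional arguments and returns two values as a tuple. The `describe` helper (a hypothetical name) shows a default argument, and `main` reads integers from the command line via `sys.argv`.

```python
import sys

def sum_and_count(*numbers):
    """Accept any number of integers; return (sum, count)."""
    return sum(numbers), len(numbers)   # two values come back as a tuple

def describe(total, count, label="Total"):
    # label is a default argument that callers may override
    return f"{label}: {total} across {count} values"

def main(argv):
    # argv holds command-line strings, for example ["3", "5", "7"]
    numbers = [int(arg) for arg in argv]
    total, count = sum_and_count(*numbers)   # unpack the list into arguments
    return describe(total, count)

if __name__ == "__main__":
    # Example: python sum_args.py 3 5 7
    print(main(sys.argv[1:]))
```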

### Week 3: File I/O and Exception Handling (Intermediate)

• Working with files
• Read from and write to files
• Exception handling using `try` and `except` statements

Practice:

• Create and work with file objects
• Write to a text file
• For exception handling, practice handling the `FileNotFoundError` exception
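
A small sketch of the exercises: `with open(...)` closes the file automatically, and `try`/`except FileNotFoundError` handles a missing file gracefully. The helper names are hypothetical, and a temporary directory keeps the example from touching your real files.

```python
import os
import tempfile

def write_lines(path, lines):
    # "w" mode creates the file, or overwrites it if it already exists
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")

def read_lines(path):
    try:
        with open(path, encoding="utf-8") as f:
            return [line.rstrip("\n") for line in f]
    except FileNotFoundError:
        print(f"{path} does not exist")
        return []

with tempfile.TemporaryDirectory() as tmp:
    notes = os.path.join(tmp, "notes.txt")
    write_lines(notes, ["first line", "second line"])
    contents = read_lines(notes)

# The temporary directory is gone now, so this triggers the except branch
missing = read_lines(notes)
```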

### Week 4: Web Scraping and Data Collection with Python (Intermediate)

• HTTP requests with `urllib` and the `requests` library
• Working with JSON and XML
• Parsing date formats
• Web scraping with `BeautifulSoup`
• [Good to have] Regular Expressions
• [Optional] Web scraping with Scrapy

Practice:

• Scrape a website of your choice, such as Hacker News or any developer-focused website. Before you start, check the site's robots.txt to confirm that you have permission to scrape it.
• Scrape and retrieve data from the website and parse it using the techniques you have learned this week.
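
Fetching a live page depends on network access and the site's rules, so this sketch stays offline. It checks hypothetical robots.txt rules with the standard library's `urllib.robotparser`, then parses a JSON payload, a date string, and a text snippet similar to what you might scrape. The URLs and payload are made up. In a real scraper you would fetch pages with something like `requests.get(url, timeout=10)` and parse HTML with `BeautifulSoup(html, "html.parser")`.

```python
import json
import re
import urllib.robotparser
from datetime import datetime

# 1. Respect robots.txt (rules parsed from a list here instead of fetched)
robots_lines = [
    "User-agent: *",
    "Disallow: /private/",
]
rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_lines)
can_fetch_home = rp.can_fetch("*", "https://example.com/")
can_fetch_private = rp.can_fetch("*", "https://example.com/private/data")

# 2. Parse a JSON response body into Python dicts and lists
payload = '{"title": "Show HN: My project", "posted": "2023-05-14T09:30:00"}'
item = json.loads(payload)

# 3. Convert a date string into a datetime using an explicit format
posted_at = datetime.strptime(item["posted"], "%Y-%m-%dT%H:%M:%S")

# 4. Pull a number out of scraped text with a regular expression
subtext = "142 points by alice 3 hours ago"
points = int(re.search(r"(\d+) points", subtext).group(1))
```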

### Week 5: Functional Programming (Intermediate)

• First-class functions
• Higher-order functions
• Lambda functions
• `map()`, `filter()`, and `functools.reduce()`

Practice:

• Take a list comprehension you wrote in Week 1
• Extend the comprehension with conditions
• Rewrite the comprehension using `map()` and `filter()`
• Use lambdas inside `map()` and `filter()`
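
As an illustration of the exercise, the same computation is written below as a conditional list comprehension and then with `filter()` and `map()` using lambdas. `compose` is a hypothetical helper that shows functions being passed around and returned as values.

```python
from functools import reduce

numbers = list(range(1, 11))

# List comprehension with a condition: squares of the even numbers
even_squares = [n * n for n in numbers if n % 2 == 0]

# The same result with filter() and map(), using lambdas
even_squares_fp = list(map(lambda n: n * n, filter(lambda n: n % 2 == 0, numbers)))

# reduce() folds a sequence into a single value (here, the product)
product = reduce(lambda acc, n: acc * n, numbers, 1)

# First-class and higher-order functions: take functions, return a function
def compose(f, g):
    return lambda x: f(g(x))

inc_then_double = compose(lambda x: 2 * x, lambda x: x + 1)
```

In everyday code, the comprehension is usually the more readable choice. `map()` and `filter()` are worth knowing because they return lazy iterators and appear often in existing codebases.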

### Week 6: Object-Oriented Programming (Advanced)

• Classes and Instances
• Class and instance variables
• Class, instance, and static methods
• Inheritance
• Dunder Methods
• Dataclasses

Practice:

• Create an employee class to store details of employees in a fictional firm of your choice
• Apply the various OOP concepts that you have learned this week
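
The sketch below shows one way to structure the exercise, with each OOP concept from the list marked in a comment. The firm name "Acme Analytics", the raise rates, and the `"name-salary"` record format are all made-up assumptions.

```python
from dataclasses import dataclass, field
from datetime import date

class Employee:
    company = "Acme Analytics"   # class variable shared by all instances
    raise_rate = 1.05

    def __init__(self, name, salary):
        self.name = name         # instance variables
        self.salary = salary

    def apply_raise(self):       # instance method
        self.salary = round(self.salary * self.raise_rate, 2)

    @classmethod
    def from_string(cls, record):
        # Alternative constructor for records like "Ada-50000"
        name, salary = record.split("-")
        return cls(name, float(salary))

    @staticmethod
    def is_workday(day):         # needs neither an instance nor the class
        return day.weekday() < 5

    def __repr__(self):          # dunder method used by repr() and print()
        return f"{type(self).__name__}({self.name!r}, {self.salary})"

class Manager(Employee):         # inheritance: reuses Employee's methods
    raise_rate = 1.10            # overrides the class variable

    def __init__(self, name, salary, reports=None):
        super().__init__(name, salary)
        self.reports = reports if reports is not None else []

@dataclass
class Department:                # dataclass generates __init__, __repr__, __eq__
    name: str
    members: list = field(default_factory=list)

ada = Employee.from_string("Ada-50000")
grace = Manager("Grace", 80000, reports=[ada])
ada.apply_raise()
grace.apply_raise()
analytics = Department("Analytics", members=[ada, grace])
print(ada, grace, Employee.is_workday(date(2024, 1, 6)))
```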

### Week 7: Generators, Decorators, and Context Managers (Advanced)

• Generators
• Decorators
• Context Managers
• Collections and Itertools module

Practice:

• Explore the `collections` and `itertools` modules
• Take a function of your choice and write a decorator that modifies its behavior without changing the statements in the function's body
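
Here is a compact sketch of these topics: a generator that yields values lazily, a timing decorator that changes behavior without editing the wrapped function's body, a context manager built with `contextlib.contextmanager`, and a few `collections` and `itertools` helpers. The function names `fibonacci`, `timed`, `slow_sum`, and `timer` are our own.

```python
import functools
import time
from collections import Counter, defaultdict
from contextlib import contextmanager
from itertools import chain, islice

# Generator: produces values on demand instead of building a full list
def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

first_ten = list(islice(fibonacci(), 10))

# Decorator: records run time without touching the function's body
def timed(func):
    @functools.wraps(func)       # keeps func's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        wrapper.last_elapsed = time.perf_counter() - start
        return result
    wrapper.last_elapsed = None
    return wrapper

@timed
def slow_sum(n):
    return sum(range(n))

# Context manager: setup before the block, teardown after it (even on error)
@contextmanager
def timer(label):
    start = time.perf_counter()
    try:
        yield
    finally:
        print(f"{label}: {time.perf_counter() - start:.4f}s")

with timer("slow_sum"):
    slow_sum(1_000_000)

# collections and itertools
word_counts = Counter("the cat and the hat".split())
by_length = defaultdict(list)
for word in ["data", "sql", "math", "python"]:
    by_length[len(word)].append(word)
flat = list(chain([1, 2], [3], [4, 5]))
```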

## Milestone 2: SQL for Data Science

After you've learned Python, your next milestone should be to gain a good understanding of SQL (Structured Query Language). Interviews for data roles typically include a programming round (often in Python), followed by at least one SQL round.

Like Python, SQL is intuitive to learn and easy to read. But cracking SQL interviews for data science roles takes consistent practice to hone your problem-solving skills, and those skills will also help you greatly in your day-to-day work.

In this section, we'll give you a learning path for SQL, from beginner to advanced, along with interview questions and practice problems.

Practice is the secret key to getting better at SQL!

If you are looking for resources to learn any of the following concepts, check out our guide.

### Week 1: Basics of SQL (Beginner)

• SELECT statement
• Logical and comparison operators
• DISTINCT, WHERE clause
• ORDER BY, LIMIT
• LIKE, IN, BETWEEN
• IS NULL
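
To try these clauses without installing a database server, you can run SQL through Python's built-in `sqlite3` module. The sketch below builds a small, hypothetical `employees` table in memory. SQL dialects differ slightly, so minor syntax details may vary in PostgreSQL or MySQL.

```python
import sqlite3

# A hypothetical employees table in an in-memory SQLite database
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER, manager TEXT);
    INSERT INTO employees VALUES
        ('Ada', 'Data', 120000, NULL),
        ('Alan', 'Data', 95000, 'Ada'),
        ('Grace', 'Eng', 135000, NULL),
        ('Barbara', 'Eng', 88000, 'Grace'),
        ('Edsger', 'Sales', 65000, NULL);
""")

def column(query):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(query)]

# WHERE with IN and a comparison operator, then ORDER BY and LIMIT
top_paid = conn.execute("""
    SELECT name, salary FROM employees
    WHERE dept IN ('Data', 'Eng') AND salary >= 90000
    ORDER BY salary DESC
    LIMIT 2
""").fetchall()

depts = column("SELECT DISTINCT dept FROM employees ORDER BY dept")
mid_band = column("SELECT name FROM employees WHERE salary BETWEEN 80000 AND 100000 ORDER BY name")
a_names = column("SELECT name FROM employees WHERE name LIKE 'A%' ORDER BY name")
no_manager = column("SELECT name FROM employees WHERE manager IS NULL ORDER BY name")
```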

Practice

Basic SELECT, HackerRank SQL Practice

### Week 2: Aggregate Functions (Intermediate)

• Why do we need aggregate functions in SQL?
• COUNT
• SUM, AVG
• MIN, MAX
• GROUP BY, HAVING
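
Aggregate functions collapse many rows into one summary value. `GROUP BY` produces one summary per group, and `HAVING` filters the groups after aggregation (whereas `WHERE` filters rows before it). The sketch below uses Python's `sqlite3` and a made-up table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('Ada', 'Data', 120000), ('Alan', 'Data', 90000), ('Katherine', 'Data', 60000),
        ('Grace', 'Eng', 130000), ('Linus', 'Eng', 110000),
        ('Edsger', 'Sales', 65000);
""")

# Without GROUP BY, an aggregate summarizes the whole table
headcount = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]

# One summary row per department; HAVING keeps departments with 2+ people
dept_stats = conn.execute("""
    SELECT dept, COUNT(*), SUM(salary), AVG(salary), MIN(salary), MAX(salary)
    FROM employees
    GROUP BY dept
    HAVING COUNT(*) >= 2
    ORDER BY dept
""").fetchall()
```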

Practice

Aggregation in SQL, HackerRank SQL Practice

### Week 3: JOINs and UNIONs (Intermediate)

• Inner and Outer JOINs
• Left and right JOINs
• Self JOINs
• UNIONs
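
The sketch below covers an inner join, a left join (used both to keep unmatched rows and to find rows with no match), a self join, and a `UNION`, all on hypothetical tables in Python's `sqlite3`. SQLite only supports `RIGHT JOIN` and `FULL OUTER JOIN` from version 3.39.0 onward, so the example sticks to `LEFT JOIN`. A right join is a left join with the table order swapped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employees (name TEXT, dept_id INTEGER, manager TEXT);
    INSERT INTO departments VALUES (1, 'Data'), (2, 'Eng'), (3, 'Sales');
    INSERT INTO employees VALUES
        ('Ada', 1, NULL), ('Alan', 1, 'Ada'),
        ('Grace', 2, NULL), ('Linus', 2, 'Grace'),
        ('Edsger', NULL, NULL);
""")

# INNER JOIN: only employees with a matching department
inner = conn.execute("""
    SELECT e.name, d.dept_name FROM employees e
    JOIN departments d ON e.dept_id = d.id ORDER BY e.name
""").fetchall()

# LEFT JOIN: every employee, with NULL (None) when there is no department
all_employees = conn.execute("""
    SELECT e.name, d.dept_name FROM employees e
    LEFT JOIN departments d ON e.dept_id = d.id ORDER BY e.name
""").fetchall()

# LEFT JOIN + IS NULL: departments that have no employees
empty_depts = [row[0] for row in conn.execute("""
    SELECT d.dept_name FROM departments d
    LEFT JOIN employees e ON e.dept_id = d.id
    WHERE e.name IS NULL ORDER BY d.dept_name
""")]

# Self join: match each employee to their manager in the same table
reporting = conn.execute("""
    SELECT e.name, m.name FROM employees e
    JOIN employees m ON e.manager = m.name ORDER BY e.name
""").fetchall()

# UNION combines two result sets and removes duplicates
union_names = [row[0] for row in conn.execute("""
    SELECT name FROM employees WHERE dept_id = 1
    UNION
    SELECT name FROM employees WHERE manager IS NULL
    ORDER BY name
""")]
```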

Practice

Basic JOIN, HackerRank SQL Practice

### Week 4: Window Functions (Advanced)

• Why do we need window functions?
• NTILE
• RANK, DENSE_RANK
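
Unlike `GROUP BY`, which collapses rows, a window function computes a value for every row over a "window" of related rows. The sketch below uses a made-up table with a salary tie so that the difference between `RANK` and `DENSE_RANK` is visible. It runs on Python's `sqlite3`, which needs SQLite 3.25 or newer for window functions (check `sqlite3.sqlite_version`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('Ada', 'Data', 120000), ('Grace', 'Eng', 120000),
        ('Alan', 'Data', 100000), ('Linus', 'Eng', 90000);
""")

ranked = conn.execute("""
    SELECT name,
           salary,
           RANK()       OVER (ORDER BY salary DESC) AS salary_rank,
           DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_salary_rank,
           NTILE(2)     OVER (ORDER BY salary DESC) AS salary_half,
           RANK()       OVER (PARTITION BY dept ORDER BY salary DESC) AS dept_rank
    FROM employees
    ORDER BY salary DESC, name
""").fetchall()
```

After the tie at the top, `RANK` jumps from 1 to 3, while `DENSE_RANK` continues with 2. `NTILE(2)` splits the ordered rows into two equal buckets, and `PARTITION BY` restarts the ranking within each department.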

Practice

Window Functions Practice Questions, StrataScratch

### Week 5: Dates, Subqueries, and CTEs (Advanced)

• Working with dates
• SQL subqueries
• Common Table Expressions (CTE)
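
This sketch shows date handling, a scalar subquery, and a common table expression on a hypothetical `orders` table in Python's `sqlite3`. SQLite stores dates as ISO-8601 text and provides `strftime()` and `julianday()`. Other databases use different date functions (for example, `DATE_TRUNC` in PostgreSQL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL, order_date TEXT);
    INSERT INTO orders VALUES
        (1, 'alice', 120.0, '2024-01-15'),
        (2, 'bob',    80.0, '2024-01-20'),
        (3, 'alice',  40.0, '2024-02-03'),
        (4, 'carol', 200.0, '2024-02-10'),
        (5, 'bob',    60.0, '2024-03-01');
""")

# Dates: group revenue by month, and measure the span between orders
monthly = conn.execute("""
    SELECT strftime('%Y-%m', order_date) AS month, SUM(amount)
    FROM orders GROUP BY month ORDER BY month
""").fetchall()
span_days = conn.execute(
    "SELECT julianday(MAX(order_date)) - julianday(MIN(order_date)) FROM orders"
).fetchone()[0]

# Subquery: orders larger than the average order
above_avg = [row[0] for row in conn.execute("""
    SELECT id FROM orders
    WHERE amount > (SELECT AVG(amount) FROM orders)
    ORDER BY id
""")]

# CTE: name an intermediate result, then query it like a table
big_customers = conn.execute("""
    WITH customer_totals AS (
        SELECT customer, SUM(amount) AS total
        FROM orders GROUP BY customer
    )
    SELECT customer, total FROM customer_totals
    WHERE total >= 150 ORDER BY total DESC
""").fetchall()
```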

Practice

Subquery Expressions Practice Questions, StrataScratch

## Milestone 3: Math for Data Science

At this point, you've learned Python and SQL, the two key skills you need to start a career in data science. This section covers math for data science. You don't need a degree in mathematics to be a successful data scientist, but a fairly strong understanding of the following fundamentals will make you a better data scientist and help you explain your modeling decisions.

To get started, a high-school-level understanding of these concepts will suffice. As a data scientist, you should always be willing to learn, explore, and upskill. You can build on these foundations and learn advanced concepts as your data science career progresses.

For each section, we'll also leave you with a few resources to help you grasp the concepts quickly, plus a challenge application to try for yourself that puts your Python programming and math skills to the test.

### Week 1: Linear Algebra

• Vectors and vector spaces
• Linear dependence and independence of vectors
• Bases of a vector space
• Determinants
• Matrix as a two-dimensional array of numbers
• Matrix subspaces and rank
• Significance of matrix factorization techniques
• Eigenvector Decomposition
• Singular Value Decomposition (SVD)
• Principal Component Analysis (PCA)

### Resources

Essence of Linear Algebra by Grant Sanderson, 3Blue1Brown

Application: Choose a sample dataset and perform dimensionality reduction using Principal Component Analysis (PCA).
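
A minimal PCA sketch with NumPy is shown below, assuming NumPy is installed. It centers the data, takes the singular value decomposition, and projects onto the top principal directions, which ties together the SVD and PCA items above. The dataset is synthetic: three features that mostly vary along one direction. For real projects, scikit-learn's `sklearn.decomposition.PCA` wraps the same idea.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components using SVD."""
    X_centered = X - X.mean(axis=0)            # PCA assumes centered data
    # Rows of Vt are the principal directions, sorted by singular value
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Variance captured by each direction, and its share of the total
    explained_variance = S ** 2 / (len(X) - 1)
    ratio = explained_variance / explained_variance.sum()
    return X_centered @ components.T, ratio[:n_components]

rng = np.random.default_rng(42)
t = rng.normal(size=200)
# Synthetic data: feature 2 is twice feature 1, feature 3 is small noise
X = np.column_stack([t, 2 * t, 0.1 * rng.normal(size=200)])

X_reduced, ratio = pca(X, n_components=2)
print(X_reduced.shape, ratio)
```

Because the first two features are perfectly correlated, almost all of the variance lands on the first component. That is the signal that the data can be reduced to fewer dimensions with little loss.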

### Week 2: Differential Calculus

• Understanding functions
• Computing derivatives of functions
• Rules of differentiation
• Local and global optima
• Chain rule of differentiation
• Partial derivatives

### Resources

Essence of Calculus by Grant Sanderson, 3Blue1Brown

Application: Understand how the gradient descent optimization algorithm works and implement it from scratch, putting your linear algebra and calculus skills to the test.
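
As a starting point, here is a sketch of batch gradient descent for linear regression, assuming NumPy is installed. For the mean squared error $L(w, b) = \frac{1}{n}\sum_i (x_i \cdot w + b - y_i)^2$, the partial derivatives are $\partial L/\partial w = \frac{2}{n} X^\top (Xw + b - y)$ and $\partial L/\partial b = \frac{2}{n}\sum_i (x_i \cdot w + b - y_i)$. Each step moves the parameters against the gradient. The learning rate, iteration count, and noiseless data $y = 2x + 1$ are illustrative choices.

```python
import numpy as np

def gradient_descent(X, y, lr=0.1, n_iters=1000):
    """Fit linear regression weights by minimizing mean squared error."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iters):
        error = X @ w + b - y                         # prediction minus target
        grad_w = (2 / n_samples) * (X.T @ error)      # dL/dw
        grad_b = (2 / n_samples) * error.sum()        # dL/db
        w -= lr * grad_w                              # step downhill
        b -= lr * grad_b
    return w, b

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(100, 1))
y = 2 * X[:, 0] + 1          # true parameters: w = 2, b = 1

w, b = gradient_descent(X, y, lr=0.5, n_iters=2000)
print(w, b)                  # should be close to [2.] and 1.0
```

If the learning rate is too large, the updates overshoot and diverge. If it is too small, convergence is slow. Try a few values to see both effects.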

### Week 3: Probability and Random Variables

• Basic principles of counting
• Permutation, Combination
• Set theory review: union, intersection, and more
• Sample space, event, probability of events
• Conditional Probability
• Bayes Theorem
• Random variables (functions mapping from the sample space to the real line)
• Discrete and Continuous Random Variables
• Probability Mass Function for Discrete Random Variables
• Probability Density Function for Continuous Random Variables
• Expectation and Moments of a Random Variable
• Independent random variables

### Resources

Python for Probability, CS109 @Stanford

Application: Learn to generate and sample from probability distributions in Python.
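
To get started, the sketch below uses only the standard library. It samples a discrete random variable (a fair die), estimates its probability mass function and expectation, samples a continuous one (a standard normal), and applies Bayes' theorem to made-up diagnostic-test numbers. The sample sizes, seed, and probabilities are illustrative assumptions.

```python
import random
import statistics
from collections import Counter

rng = random.Random(109)    # seeded so results are reproducible

# Discrete random variable: a fair six-sided die, E[X] = 3.5
rolls = [rng.randint(1, 6) for _ in range(100_000)]
counts = Counter(rolls)
pmf = {face: counts[face] / len(rolls) for face in range(1, 7)}  # each near 1/6

# Continuous random variable: normal with mean 0 and standard deviation 1
normal_samples = [rng.gauss(0, 1) for _ in range(100_000)]

# Bayes' theorem: P(disease | positive) = P(pos | disease) P(disease) / P(pos)
p_disease = 0.01
p_pos_given_disease = 0.95
p_pos_given_healthy = 0.05
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos

print(statistics.mean(rolls))                    # close to 3.5
print(statistics.mean(normal_samples), statistics.stdev(normal_samples))
print(round(p_disease_given_pos, 3))             # 0.161
```

The Bayes result is a classic surprise. Even with an accurate test, a positive result implies only about a 16% chance of disease when the condition is rare.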

### Week 4: Statistics

• Quantitative vs qualitative data
• Summary statistics: mean, median, and mode
• Measures of dispersion: Standard deviation, variance
• Understanding data distributions
• Law of large numbers
• Hypothesis testing
• p-value and significance
• Correlation Coefficient and its significance

### Resources

Intro to Statistics, Udacity

Intro to Inferential Statistics, Udacity

Application: Choose any dataset from your domain of interest. Analyze the distribution of various features, compute summary statistics, and record your inferences in a report or Jupyter notebook.
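
Here is a standard-library sketch of the core computations: summary statistics, a Pearson correlation coefficient, and a hypothesis test. The test shown is a permutation test for a difference in means, which is one of several possible tests. Libraries such as SciPy offer others, for example `scipy.stats.ttest_ind`. All data values are made up for illustration.

```python
import random
import statistics

heights = [160, 165, 170, 170, 175, 180, 185]
summary = {
    "mean": statistics.mean(heights),
    "median": statistics.median(heights),
    "mode": statistics.mode(heights),
    "stdev": statistics.stdev(heights),        # sample standard deviation
    "variance": statistics.variance(heights),  # sample variance
}

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def permutation_test(a, b, n_permutations=5000, seed=0):
    """Two-sided p-value for a difference in means, by shuffling group labels."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):]))
        if diff >= observed:     # as extreme as what we actually saw
            count += 1
    return (count + 1) / (n_permutations + 1)

group_a = [12, 14, 15, 16, 18, 17, 15, 16]
group_b = [20, 22, 21, 23, 24, 22, 21, 25]
p_value = permutation_test(group_a, group_b)
print(summary, pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), p_value)
```

A small p-value (commonly below 0.05) means a difference this large would rarely arise if group labels did not matter. That is evidence against the null hypothesis, not proof of it.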

Check out part 2 of the data science roadmap: learn data analysis, visualization, and machine learning. Happy learning and coding!