Python uses indentation (whitespace at the beginning of a line) to indicate the level of statements. In the following cell, 'if' and 'else' are at the same level, while each 'print' is indented one level deeper. Statements at the same level must use the same indentation.
student_number = input("Enter your student number:")
# input() returns a string, so compare against the string "0"
if student_number != "0":
    print("Welcome student {}".format(student_number))
else:
    print("Try again!")
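To see the indentation rule enforced, the sketch below asks Python's built-in compile() to parse two small snippets. The helper name has_valid_indentation is hypothetical, chosen only for illustration; a block whose lines use inconsistent indentation is rejected with an IndentationError before any of it runs.

```python
def has_valid_indentation(source):
    """Return True if the source parses, False if Python rejects its indentation (hypothetical helper)."""
    try:
        compile(source, "<example>", "exec")
    except IndentationError:
        return False
    return True

# Both print calls sit at the same level inside the if block
good_source = "if True:\n    print('a')\n    print('b')\n"
# The second print is indented by 2 spaces instead of 4, so it matches no outer level
bad_source = "if True:\n    print('a')\n  print('b')\n"

print(has_valid_indentation(good_source))  # True
print(has_valid_indentation(bad_source))   # False
```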
In Python, comments start with a hash '#' and extend to the end of the line. '#' can be at the beginning of the line or after code. Inside a string literal, '#' is just a regular character.
# This is code to print hello world!
print("Hello world!") # Print statement for hello world
print("# is not a comment in this case")
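Python 3 has three built-in numeric types: int (integers of unlimited size), float (floating-point numbers), and complex (complex numbers). The table below lists common numeric operations: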
| Operation | Result |
|---|---|
| x + y | Sum of x and y |
| x - y | Difference of x and y |
| x * y | Product of x and y |
| x / y | Quotient of x and y |
| x // y | Floored quotient of x and y |
| x % y | Remainder of x / y |
| abs(x) | Absolute value of x |
| int(x) | x converted to integer |
| float(x) | x converted to floating point |
| pow(x, y) | x to the power y |
| x ** y | x to the power y |
# Number examples
a = 5 + 8
print("Sum of int numbers: {} and the number type is {}".format(a, type(a)))
b = 5 + 2.3
print("Sum of int and float: {} and the number type is {}".format(b, type(b)))
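A quick way to check the table is to run each operation on a pair of sample values. The sketch below uses x = 17 and y = 5 (arbitrary choices) and also negates x to show what "floored" means for // and %:

```python
x, y = 17, 5

print(x + y)              # 22
print(x - y)              # 12
print(x * y)              # 85
print(x / y)              # 3.4   true division always returns a float
print(x // y)             # 3     floored quotient
print(x % y)              # 2     remainder
print(-x // y)            # -4    floors toward negative infinity, not toward zero
print(-x % y)             # 3     the remainder takes the sign of the divisor
print(abs(-x))            # 17
print(int(3.9))           # 3     int() truncates toward zero
print(float(x))           # 17.0
print(pow(x, 2), x ** 2)  # 289 289
```

Note that x // y and x % y always satisfy (x // y) * y + (x % y) == x, which is why the floored quotient and the remainder change together for negative operands.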
Like other programming languages, Python has rich features for string manipulation.
# Store strings in a variable
test_word = "hello world to everyone"
# Print the test_word value
print(test_word)
# Use [] to access a character of the string. The first character is at index 0.
print(test_word[0])
# Use the len() function to find the length of the string
print(len(test_word))
# Some examples of finding in strings
print(test_word.count('l')) # Count number of times l repeats in the string
print(test_word.find("o")) # Find letter 'o' in the string. Returns the position of first match.
print(test_word.count(' ')) # Count number of spaces in the string
print(test_word.upper()) # Change the string to uppercase
print(test_word.lower()) # Change the string to lowercase
print(test_word.replace("everyone","you")) # Replace word "everyone" with "you"
print(test_word.title()) # Change string to title format
print(test_word + "!!!") # Concatenate strings
print(":".join(test_word)) # Add ":" between each character
print("".join(reversed(test_word))) # Reverse the string
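Square brackets also accept a slice of the form [start:stop:step], which returns a new string. The stop position is excluded, negative positions count from the end, and a step of -1 walks backwards, which gives a shorter way to reverse a string than "".join(reversed(...)):

```python
test_word = "hello world to everyone"

print(test_word[0])        # 'h'         first character
print(test_word[-1])       # 'e'         last character
print(test_word[0:5])      # 'hello'     characters at positions 0 through 4
print(test_word[-8:])      # 'everyone'  the last 8 characters
print(test_word[::-1])     # the whole string reversed
print(test_word.split())   # ['hello', 'world', 'to', 'everyone']
```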
Python supports data types such as lists, tuples, dictionaries, and arrays.
A list is created by placing all the items (elements) inside square brackets [ ] separated by commas. A list can have any number of items, and they may be of different types (integer, float, strings, etc.).
# A Python list is similar to an array. You can create an empty list too.
my_list = []
first_list = [3, 5, 7, 10]
second_list = [1, 'python', 3]
# Nest multiple lists
nested_list = [first_list, second_list]
# Combine multiple lists
combined_list = first_list + second_list
# You can slice a list, just like strings
print(combined_list[0:3])
# Append a new entry to the list
combined_list.append(600)
# Remove the last entry from the list
combined_list.pop()
# Iterate the list
for item in combined_list:
    print(item)
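Because lists are mutable, methods such as append() and pop() change the list in place, while + and slicing build new lists. The sketch below reuses the two sample lists; the replacement value 4 is an arbitrary choice:

```python
first_list = [3, 5, 7, 10]
second_list = [1, 'python', 3]
combined_list = first_list + second_list   # [3, 5, 7, 10, 1, 'python', 3]

combined_list.append(600)        # append() modifies the list in place
removed = combined_list.pop()    # pop() removes and returns the last item (600)
combined_list[0] = 4             # Items can be reassigned by index

print(removed)                   # 600
print(combined_list)             # [4, 5, 7, 10, 1, 'python', 3]
print(combined_list[2:4])        # [7, 10]  slicing returns a new list
print(combined_list.count(3))    # 1
print('python' in combined_list) # True
print(first_list)                # [3, 5, 7, 10]  '+' built a new list, so this is unchanged
```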
A tuple is similar to a list, but it is written with parentheses ( ) instead of square brackets. The main difference is that a tuple is immutable, while a list is mutable.
my_tuple = (1, 2, 3, 4, 5)
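The difference in mutability shows up as soon as you try to change an item. A short sketch: indexing and slicing work as with lists, item assignment raises a TypeError, and unpacking splits a tuple into separate names:

```python
my_tuple = (1, 2, 3, 4, 5)

print(my_tuple[0], my_tuple[-1])  # 1 5     indexing works as with lists
print(my_tuple[1:3])              # (2, 3)  slicing returns a new tuple

# Tuples are immutable, so assigning to an item raises TypeError
try:
    my_tuple[0] = 10
    assignment_failed = False
except TypeError as err:
    assignment_failed = True
    print("TypeError:", err)

# Tuple unpacking: the starred name collects the remaining items into a list
first, second, *rest = my_tuple
print(first, second, rest)        # 1 2 [3, 4, 5]
```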
A dictionary is also known as an associative array. A dictionary consists of a collection of key-value pairs. Each key-value pair maps the key to its associated value.
desk_location = {'jack': 123, 'joe': 234, 'hary': 543}
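The key-value mapping can be read, updated, extended, and iterated directly. In the sketch below, the keys 'anna' and 'bob' and the desk numbers 125 and 321 are made-up values for illustration:

```python
desk_location = {'jack': 123, 'joe': 234, 'hary': 543}

print(desk_location['joe'])              # 234  look up a value by its key
desk_location['jack'] = 125              # Assigning to an existing key updates its value
desk_location['anna'] = 321              # Assigning to a new key adds a pair
print(desk_location.get('bob', 'none'))  # none  get() returns a default instead of raising KeyError
del desk_location['joe']                 # Remove a key-value pair

# Iterate over the key-value pairs
for name, desk in desk_location.items():
    print(name, desk)
```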
JSON is text written in JavaScript Object Notation. Python has a built-in package called json that can be used to work with JSON data.
import json
# Sample JSON data
x = '{"first_name":"Jane", "last_name":"Doe", "age":25, "city":"Chicago"}'
# Read JSON data
y = json.loads(x)
# The parsed result is a Python dictionary, so values are looked up by key
print("Employee name is "+ y["first_name"] + " " + y["last_name"])
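The json package also works in the other direction: json.dumps() turns a Python dictionary into a JSON string, and json.loads() turns that string back into an equal dictionary. A minimal round-trip sketch using the same sample record:

```python
import json

employee = {"first_name": "Jane", "last_name": "Doe", "age": 25, "city": "Chicago"}

# json.dumps() serializes the dictionary into a JSON-formatted string
json_text = json.dumps(employee)
print(json_text)

# json.loads() parses the string back into a dictionary with the same contents
restored = json.loads(json_text)
print(restored == employee)  # True
print(type(restored))        # <class 'dict'>
```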
If, Elif, Else statements: Python supports conditional statements like other programming languages. Python relies on indentation (whitespace at the beginning of the line) to define the scope of the code.
a = 22
b = 33
c = 100
# if ... else example
if a > b:
    print("a is greater than b")
else:
    print("b is greater than a")
# if ... elif ... else example
if a > b:
    print("a is greater than b")
elif b > c:
    print("b is greater than c")
else:
    print("b is greater than a and c is greater than b")
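In the first example, the else branch also runs when a and b are equal. Adding an elif separates that case, because Python checks the conditions in order and runs only the first branch whose condition is true. A sketch wrapped in a hypothetical helper named compare:

```python
def compare(a, b):
    """Describe how a relates to b (hypothetical helper for illustration)."""
    if a > b:
        return "a is greater than b"
    elif a < b:
        return "b is greater than a"
    else:
        # Reached only when neither condition above is true, i.e. a == b
        return "a and b are equal"

print(compare(22, 33))  # b is greater than a
print(compare(33, 22))  # a is greater than b
print(compare(5, 5))    # a and b are equal
```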
While loop: Executes a set of statements as long as a condition is true.
# Sample while loop
i = 1
while i < 10:
    print("count is " + str(i))
    i += 1
# Skip to the next iteration when x is 2; print a message once the condition is false
x = 0
while x < 5:
    x += 1
    if x == 2:
        continue
    print(x)
else:
    print("x is no longer less than 5")
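The else clause of a while loop runs only when the loop ends because its condition became false; leaving the loop with break skips it. The sketch below uses that behavior in a hypothetical helper, first_multiple, that searches for the first number divisible by a given divisor:

```python
def first_multiple(numbers, divisor):
    """Return the first item in numbers divisible by divisor, or None (hypothetical helper)."""
    i = 0
    while i < len(numbers):
        if numbers[i] % divisor == 0:
            result = numbers[i]
            break              # Leave the loop early; the else clause is skipped
        i += 1
    else:
        # Runs only when the condition became false without hitting break
        result = None
    return result

print(first_multiple([3, 5, 8, 9], 4))  # 8
print(first_multiple([1, 3, 5], 2))     # None
```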
For loop: A for loop in Python works like an iterator. It is used for iterating over any iterable, such as a list, tuple, dictionary, set, string, or range.
# Sample for loop examples
fruits = ["orange", "banana", "apple", "grape", "cherry"]
for fruit in fruits:
    print(fruit)
# Iterating a range: start at 1, stop before 10, step by 2
for x in range(1, 10, 2):
    print(x)
else:
    print("task complete")
# Nested loops over multiple lists
traffic_lights = ["red", "yellow", "green"]
action = ["stop", "slow down", "go"]
for light in traffic_lights:
    for task in action:
        print(light, task)
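The nested loop above prints every combination of light and action (nine lines). To pair items position by position instead, the built-in zip() walks both lists in step, and enumerate() supplies the index of each item. A short sketch:

```python
traffic_lights = ["red", "yellow", "green"]
action = ["stop", "slow down", "go"]

# zip() pairs the items that share the same position in each list
pairs = []
for light, task in zip(traffic_lights, action):
    print(light, task)
    pairs.append((light, task))

# enumerate() yields each item together with its index
for index, light in enumerate(traffic_lights):
    print(index, light)
```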
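The key function for working with files in Python is the open() function. It takes two main parameters: the filename and the mode.
There are four different modes for opening a file:
- "r" - Read: the default; opens a file for reading and raises an error if the file does not exist.
- "a" - Append: opens a file for appending and creates the file if it does not exist.
- "w" - Write: opens a file for writing, replacing any existing content, and creates the file if it does not exist.
- "x" - Create: creates the specified file and raises an error if the file already exists.
In addition, you can specify whether the file should be handled in text mode ("t", the default) or binary mode ("b").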
# Let's create a test text file
!echo "This is a test file with text in it. This is the first line." > test.txt
!echo "This is the second line." >> test.txt
!echo "This is the third line." >> test.txt
# Read file; the with statement closes the file automatically
with open('test.txt', 'r') as file:
    print(file.read())
# Read first 10 characters of the file
with open('test.txt', 'r') as file:
    print(file.read(10))
# Read one line from the file
with open('test.txt', 'r') as file:
    print(file.readline())
# Create new file
with open('test2.txt', 'w') as file:
    file.write("This is content in the new test2 file.")
# Read the content of the new file
with open('test2.txt', 'r') as file:
    print(file.read())
# Update file
with open('test2.txt', 'a') as file:
    file.write("\nThis is additional content in the new file.")
# Read the content of the updated file
with open('test2.txt', 'r') as file:
    print(file.read())
# Delete file
import os
file_names = ["test.txt", "test2.txt"]
for item in file_names:
    if os.path.exists(item):
        os.remove(item)
        print(f"File {item} removed successfully!")
    else:
        print(f"{item} file does not exist.")
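The cells above do not exercise the "x" (create) mode or binary mode. A self-contained sketch, assuming a temporary directory from the standard tempfile module so no files are left behind; the file name notes.txt is hypothetical:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "notes.txt")  # hypothetical file name

    with open(path, "x") as f:          # "x": create a new file for writing
        f.write("first line\n")

    try:
        with open(path, "x") as f:      # A second "x" open fails because the file now exists
            pass
        create_failed = False
    except FileExistsError:
        create_failed = True

    with open(path, "a") as f:          # "a": add to the end of the file
        f.write("second line\n")

    with open(path, "r") as f:          # "r": text mode returns str
        lines = f.read().splitlines()

    with open(path, "rb") as f:         # "rb": binary mode returns bytes
        raw = f.read()

print(create_failed)  # True
print(lines)          # ['first line', 'second line']
print(type(raw))      # <class 'bytes'>
```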
A function is a block of code that runs when it is called. You can pass data, known as parameters, into the function. In Python, a function is defined with the def keyword.
# Defining a function
def new_funct():
    print("A simple function")
# Calling the function
new_funct()
# Sample function with parameters
def param_funct(first_name):
    print(f"Employee name is {first_name}.")
# Calling the function with an argument
param_funct("Jane")
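Functions can also give parameters default values and send a result back with return instead of printing it. The sketch below uses a hypothetical function full_name; the default "Doe" and the sample names are illustrative only:

```python
def full_name(first_name, last_name="Doe"):
    """Return a formatted employee name; last_name falls back to a default (hypothetical example)."""
    return f"{first_name} {last_name}"

print(full_name("Jane"))                             # Jane Doe     default used
print(full_name("John", "Smith"))                    # John Smith   positional arguments
print(full_name(last_name="Lee", first_name="Ann"))  # Ann Lee      keyword arguments, any order
```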
Anonymous functions (lambda): A lambda is a small anonymous function. A lambda function can take any number of arguments but can contain only one expression.
# Sample lambda example
x = lambda y: y + 100
print(x(5))
x = lambda a, b: a*b/100
print(x(5, 6))
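Lambdas are most useful as short throwaway arguments to other functions, for example as the key for sorted() or the function passed to map(). A sketch reusing the desk_location dictionary from the dictionary section:

```python
desk_location = {'jack': 123, 'joe': 234, 'hary': 543}

# Sort the (name, desk) pairs by desk number, highest first
by_desk = sorted(desk_location.items(), key=lambda item: item[1], reverse=True)
print(by_desk)  # [('hary', 543), ('joe', 234), ('jack', 123)]

# Apply a lambda to every element with map()
plus_hundred = list(map(lambda y: y + 100, [1, 2, 3]))
print(plus_hundred)  # [101, 102, 103]
```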
import datetime
x = datetime.datetime.now()  # Current local date and time
print(x.strftime("%I:%M:%S %p"))  # 12-hour clock time with AM/PM
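Beyond formatting the current time, the datetime module supports date arithmetic with timedelta and parsing with strptime(), which uses the same format codes as strftime(). A sketch with a fixed, arbitrary timestamp so the output is reproducible:

```python
from datetime import datetime, timedelta

# A fixed timestamp keeps the output reproducible (the date itself is arbitrary)
start = datetime(2024, 1, 15, 14, 30, 0)
print(start.strftime("%Y-%m-%d %H:%M"))  # 2024-01-15 14:30

# Adding a timedelta shifts the timestamp, rolling over days and months as needed
later = start + timedelta(days=20, hours=12)
print(later.strftime("%Y-%m-%d %H:%M"))  # 2024-02-05 02:30

# strptime() parses a string using the same format codes that strftime() writes
parsed = datetime.strptime("2024-02-05 02:30", "%Y-%m-%d %H:%M")
print(parsed == later)                   # True
print(later - start)                     # 20 days, 12:00:00
```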
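NumPy is the fundamental package for scientific computing with Python. Among other things, it contains a powerful N-dimensional array object, sophisticated (broadcasting) functions, tools for integrating C/C++ and Fortran code, and useful linear algebra, Fourier transform, and random number capabilities.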
# Install NumPy using pip
!pip install numpy
# Import NumPy module
import numpy as np
# Create array
a = np.arange(15).reshape(3, 5)         # 3x5 array holding the values 0-14
b = np.zeros((3, 5))                    # 3x5 array of zeros
c = np.ones((2, 3, 4), dtype=np.int16)  # 2x3x4 array of ones with an explicit data type
d = np.ones((3, 5))                     # 3x5 array of ones
a.shape            # Array dimensions
len(b)             # Length of the first axis (number of rows)
c.ndim             # Number of array dimensions
a.size             # Number of array elements
b.dtype            # Data type of array elements
b.dtype.name       # Name of the data type
c.astype(float)    # Convert an array to a different type
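np.add(a, b)          # Element-wise addition
np.subtract(a, b)     # Element-wise subtraction
np.divide(a, d)       # Element-wise division
np.multiply(a, d)     # Element-wise multiplication
np.array_equal(a, b)  # Whole-array comparison: True only if shapes and all elements match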
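The arithmetic operators +, -, * and / are element-wise shorthands for these functions, and NumPy broadcasting lets arrays of compatible but different shapes combine. The sketch below adds a 1-D row (arbitrary values) to every row of the 3x5 array and contrasts element-wise comparison with np.array_equal:

```python
import numpy as np

a = np.arange(15).reshape(3, 5)  # 3x5 array holding 0-14
d = np.ones((3, 5))              # 3x5 array of ones

# Operators are element-wise shorthands for np.add / np.multiply
print(np.array_equal(a + d, np.add(a, d)))       # True
print(np.array_equal(a * d, np.multiply(a, d)))  # True

# Broadcasting: a 1-D row of length 5 is applied to each of the 3 rows of 'a'
row = np.array([10, 20, 30, 40, 50])
shifted = a + row
print(shifted[0])  # [10 21 32 43 54]

# Element-wise comparison returns a boolean array, unlike np.array_equal
mask = a > 7
print(mask.sum())  # 7  (the values 8 through 14)
```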
a.sum() # Array-wise sum
a.min() # Array-wise min value
a.mean() # Array-wise mean
a.max(axis=0)         # Max value of each column (reduces along axis 0, the rows)
np.std(a) # Standard deviation
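The axis argument is easy to misread: axis=0 reduces down the rows and produces one result per column, while axis=1 reduces across the columns and produces one result per row. A sketch with the same 3x5 array:

```python
import numpy as np

a = np.arange(15).reshape(3, 5)
# [[ 0  1  2  3  4]
#  [ 5  6  7  8  9]
#  [10 11 12 13 14]]

print(a.sum())         # 105               sum over every element
print(a.sum(axis=0))   # [15 18 21 24 27]  one total per column
print(a.sum(axis=1))   # [10 35 60]        one total per row
print(a.max(axis=0))   # [10 11 12 13 14]  largest value in each column
print(a.max(axis=1))   # [ 4  9 14]        largest value in each row
```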
a[1,2] # Select element of row 1 and column 2
a[0:2] # Select rows 0 and 1
a[:1] # Select all items at row 0
a[-1:] # Select all items from last row
a[a<2] # Select elements from 'a' that are less than 2
np.transpose(a) # Transpose array 'a'
a.ravel() # Flatten the array
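a.reshape(5, -1)                # Reshape to 5 rows without changing the data; -1 lets NumPy infer the column count
np.append(a, b)                 # Append items; without an axis argument, both arrays are flattened first
np.concatenate((a, d), axis=0)  # Concatenate arrays along the rows
np.vsplit(a, 3)                 # Split vertically (along the rows) into 3 equal arrays
np.hsplit(a, 5)                 # Split horizontally (along the columns) into 5 equal arrays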
Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
Pandas DataFrames are the most widely used in-memory representation of complex data collections within Python.
# Install pandas, xlrd, and openpyxl using pip
!pip install pandas
!pip install xlrd openpyxl
# Import NumPy and Pandas modules
import numpy as np
import pandas as pd
# Sample dataframe df
df = pd.DataFrame({'num_legs': [2, 4, np.nan, 0],
                   'num_wings': [2, 0, 0, 0],
                   'num_specimen_seen': [10, np.nan, 1, 8]},
                  index=['falcon', 'dog', 'spider', 'fish'])
df # Display dataframe df
# Another sample dataframe df1 - random NumPy data with a datetime index and labeled columns
dates = pd.date_range('20130101', periods=6)
df1 = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
df1 # Display dataframe df1
df1.head(2) # View top data
df1.tail(2) # View bottom data
df1.index # Display the index
df1.dtypes # Inspect datatypes
df1.describe() # Display quick statistics summary of data
df1.T # Transpose data
df1.sort_index(axis=1, ascending=False) # Sort by an axis
df1.sort_values(by='B') # Sort by values
df1['A'] # Select column A
df1[0:3] # Select rows at positions 0 to 2
df1['20130102':'20130104'] # Select rows by index label range (both ends included)
df1.loc[:, ['A', 'B']] # Select on multiple axes by label
df1.iloc[3] # Select the row at integer position 3
df1[df1 > 0] # Select values from a DataFrame where a boolean condition is met
df2 = df1.copy() # Copy the df1 dataset to df2
df2['E'] = ['one', 'one', 'two', 'three', 'four', 'three'] # Add column E with value
df2[df2['E'].isin(['two', 'four'])] # Use isin method for filtering
Pandas primarily uses the value np.nan to represent missing data. By default, it is excluded from computations.
df.dropna(how='any') # Drop any rows that have missing data
df.dropna(how='any', axis=1) # Drop any columns that have missing data
df.fillna(value=5) # Fill missing data with value 5
pd.isna(df) # Get a boolean mask showing where data is missing
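To see what "excluded from computations" means in practice, the sketch below computes statistics on the num_legs column, which holds 2, 4, NaN and 0. Reductions such as mean() and sum() skip NaN by default (skipna=True), and passing skipna=False lets the missing value propagate:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'num_legs': [2, 4, np.nan, 0],
                   'num_wings': [2, 0, 0, 0],
                   'num_specimen_seen': [10, np.nan, 1, 8]},
                  index=['falcon', 'dog', 'spider', 'fish'])

print(df['num_legs'].mean())              # 2.0  mean of 2, 4 and 0; NaN is skipped
print(df['num_legs'].sum())               # 6.0
print(df['num_legs'].count())             # 3    count() counts non-missing values only
print(df['num_legs'].mean(skipna=False))  # nan  the missing value propagates
```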
df.to_csv('foo.csv') # Write to CSV file
pd.read_csv('foo.csv') # Read from CSV file
df.to_excel('foo.xlsx', sheet_name='Sheet1') # Write to Microsoft Excel file
pd.read_excel('foo.xlsx', 'Sheet1', index_col=None, na_values=['NA']) # Read from Microsoft Excel file
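By default, to_csv() writes the index (here the animal names) as an unnamed first column, so reading the file back without further options returns it as an ordinary column called "Unnamed: 0". Passing index_col=0 restores it as the index. The sketch below assumes an in-memory io.StringIO buffer in place of foo.csv so that no file is written:

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame({'num_legs': [2, 4, np.nan, 0],
                   'num_wings': [2, 0, 0, 0],
                   'num_specimen_seen': [10, np.nan, 1, 8]},
                  index=['falcon', 'dog', 'spider', 'fish'])

csv_text = df.to_csv()  # With no path, to_csv() returns the CSV content as a string

without_index = pd.read_csv(io.StringIO(csv_text))
print(without_index.columns.tolist())  # ['Unnamed: 0', 'num_legs', 'num_wings', 'num_specimen_seen']

with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(with_index.index.tolist())       # ['falcon', 'dog', 'spider', 'fish']
```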
# Install Matplotlib using pip
!pip install matplotlib
from matplotlib import pyplot as plt # Import Matplotlib module
# Generate random time-series data
ts = pd.Series(np.random.randn(1000),index=pd.date_range('1/1/2000', periods=1000))
ts = ts.cumsum()
ts.plot() # Plot graph
# On a DataFrame, the plot() method is convenient to plot all of the columns with labels
df4 = pd.DataFrame(np.random.randn(1000, 4), index=ts.index, columns=['A', 'B', 'C', 'D'])
df4 = df4.cumsum()
df4.plot()  # Plot all four columns, labeled in the legend
plt.show()