Make easy things easy and hard things possible.
- A slogan of Perl (a language that predates Python)
While Python can be used to write very complicated programs, one of its strengths is that easy things are still easy. For example, Python can be a calculator.
1 + 2
3
12 * 4
48
Python allows you to comment your code -- to leave notes for yourself or others about the code.
Comments start with a # and are ignored by Python when it runs your code.
# The ** operator is exponentiation.
2 ** 3
8
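Python's calculator follows the usual order of operations. The short sketch below shows how parentheses and ** interact with + and *. The variable names are illustrative and not part of the lesson.

```python
# Multiplication happens before addition, as in ordinary arithmetic.
without_parens = 2 + 3 * 4         # 3 * 4 first, then add 2
with_parens = (2 + 3) * 4          # parentheses force the addition first

# Exponentiation binds tighter than multiplication.
power_then_multiply = 2 * 3 ** 2   # 3 ** 2 first, then times 2

print(without_parens)       # 14
print(with_parens)          # 20
print(power_then_multiply)  # 18
```

When in doubt, adding parentheses makes the intended order explicit.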
Once you start doing math, you may want to keep the values you calculate for later use.
Python allows you to do this with variables -- words that you choose to represent values you've stored.
# Place the result of "5 * 2" in a variable called "x".
x = 5 * 2
This process of storing something in a variable is often called variable assignment, or simply "assignment" for short. You can assign almost anything to a variable.
# "Assign" the value 42 to the variable "answer".
answer = 42
You can then use the stored values in new calculations.
answer + 5
47
ten = 10
eleven = 11
ten + eleven
21
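One detail worth seeing in action: a variable stores the result of a calculation, not the formula that produced it. The sketch below reuses ten and eleven from above. The name total is an illustrative addition.

```python
ten = 10
eleven = 11
total = ten + eleven   # total stores the result, 21, not the formula

ten = 100              # reassigning "ten" replaces its old value
print(total)           # 21 -- unchanged; total was computed earlier
print(ten + eleven)    # 111 -- a new calculation uses the new value
```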
Python lets you name your variables almost anything you want -- the main rules are that names must be composed of numbers, letters, and underscores, cannot begin with a number, and cannot be one of Python's reserved keywords (such as if or for).
It's a good idea to take advantage of this flexibility and name your variables with descriptions that help you remember what they contain.
For example, calling your variables x, y, and z is likely to lead to forgetting what you've stored where (unless you're working with coordinates, a domain where those names have meanings).
More descriptive names, like number_of_items or size_of_container, are better.
# Perfectly good variable name
my_3rd_favorite_number = 18
# Legal, but undescriptive, variable name
a = 7
# Illegal variable name -- it starts with a number
4_plus_1 = 4 + 1
Cell In[11], line 2
    4_plus_1 = 4 + 1
    ^
SyntaxError: invalid decimal literal
If you try to name a variable something illegal, Python will gently remind you to follow the rules with a SyntaxError and an arrow indicating the location of the error.
Caution!
Sometimes Python doesn't pinpoint the error very well, and the error will not be in the same place as the arrow.
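If you are unsure whether a name is legal, you can ask Python before using it. The sketch below uses the built-in string method isidentifier() and the standard-library keyword module. The variable names holding the results are illustrative.

```python
import keyword

# str.isidentifier() reports whether a string is a syntactically valid name.
good_name = 'my_3rd_favorite_number'.isidentifier()
starts_with_digit = '4_plus_1'.isidentifier()
has_spaces = 'number of items'.isidentifier()

print(good_name)          # True
print(starts_with_digit)  # False: starts with a number
print(has_spaces)         # False: contains spaces

# Reserved words pass isidentifier() but still cannot be variable names.
looks_valid = 'class'.isidentifier()
is_reserved = keyword.iskeyword('class')

print(looks_valid)   # True
print(is_reserved)   # True, so "class = 5" would be a SyntaxError
```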
Create two variables named width and height, and store 3840 and 2160 in them (respectively). Then fill in the blanks:
pixels = ___ * ___
Fortunately, Python can handle values beyond integers. It's happy to work with decimal numbers.
1 / 3
0.3333333333333333
1.5 * 1.5
2.25
In computer science lingo, decimal numbers are often called floating point numbers, or floats for short.
The name refers to how such numbers are stored by a computer internally, but you don't need to worry about that. Just be aware that many people on the internet and in the data science industry will speak in terms of "floats" and "ints" when they refer to numbers in Python.
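Two float behaviors tend to surprise newcomers, so here is a small sketch of them. It relies on the standard-library math module. The variable names are illustrative.

```python
import math

# "/" always returns a float, even when the result is a whole number.
quotient = 6 / 3
print(quotient)        # 2.0

# Floats are stored in binary, so some decimals are only approximate.
total = 0.1 + 0.2
print(total)           # 0.30000000000000004
print(total == 0.3)    # False

# math.isclose() compares floats with a small tolerance instead.
print(math.isclose(total, 0.3))  # True
```

For everyday calculations the tiny rounding error is harmless. It mainly matters when you test two floats for exact equality.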
Python also can work with text data, like words and sentences.
my_name = 'ethan'
my_hobbies = 'coding, reading, basketball'
In Python, these bits of text are called strings and are enclosed in quotation marks.
Both single quotes (') and double quotes (") are fine, but most Pythonistas use single quotes as a matter of convention.
Conveniently, Python lets you "add" strings together to compose longer strings.
'Monty' + ' ' + 'Python'
'Monty Python'
first_name = 'Guido'
last_name = 'van Rossum'
# Remember to add a space between words!
first_name + ' ' + last_name
'Guido van Rossum'
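Adding strings only works when both sides are strings. The sketch below shows what happens when a number is mixed in, plus two common ways around it. The variables name and number are illustrative, and try/except is used here only to show the error without stopping the program.

```python
name = 'Monty'
number = 18

# "+" cannot mix a string and an integer, so this raises a TypeError.
try:
    label = name + number
except TypeError as error:
    print('TypeError:', error)

# Converting the number with str() makes the concatenation work.
label = name + ' ' + str(number)
print(label)                 # Monty 18

# An f-string does the conversion for you.
f_label = f'{name} {number}'
print(f_label)               # Monty 18
```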
The last kind of value that we'll talk about is a boolean, or a True/False value.
Python recognizes the words True and False as keywords -- words that have an implicit meaning in the language.
That means you can assign them to variables as you can with other data types.
is_the_moon_made_of_cheese = False
is_this_the_best_python_class = True
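Booleans are often produced by comparisons rather than typed out by hand. A brief sketch, with illustrative variable names:

```python
seats = 55

# A comparison evaluates to a boolean, which can be stored like any value.
is_large_plane = seats > 100
print(is_large_plane)   # False

# Capitalization and quotes matter: True is a boolean, 'True' is a string.
same_thing = True == 'True'
print(same_thing)       # False
```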
Overwrite the first_name and last_name variables with your name, and run first_name + ' ' + last_name again -- make sure it produces what you expect!
So far we've worked with single values: numbers, strings, and booleans. But Python also supports more complex data types, sometimes called data structures.
The two most common data structures are lists and dictionaries.
As you might expect, a list is an ordered collection of things.
Lists are represented using brackets ([]).
# A list of integers
numbers = [1, 2, 3]
numbers
[1, 2, 3]
# A list of strings
strings = ['abc', 'def']
strings
['abc', 'def']
Lists are highly flexible. They can contain heterogeneous data (i.e. strings, booleans, and numbers can all be in the same list) and lists can even contain other lists!
combo = ['a', 'b', 3, 4]
combo_2 = [True, 'True', 1, 1.0]
# Note that the last element of the list is another list!
nested_list = [1, 2, 3, [4, 5]]
nested_list
[1, 2, 3, [4, 5]]
Individual elements of a list can be accessed by specifying a location in brackets. This is called indexing.
Beware: Python is zero-indexed, so the first element is element 0!
letters = ['a', 'b', 'c']
letters[0]
'a'
letters[2]
'c'
Specifying an invalid location will raise an error.
letters[4]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[24], line 1
----> 1 letters[4]

IndexError: list index out of range
Caution!
Many programming languages, including Python, are zero-indexed, so a list with 3 elements has valid locations [0, 1, 2]. But this means that there is no element at index 3 in a 3-element list! Trying to access it will cause an out-of-range error. This is a common mistake for those new to programming (and sometimes it bites the veterans too).
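To make the indexing rules concrete, the sketch below finds the last valid index with the built-in len() function, shows negative indexing, and reaches into the nested list from earlier. The name last_index is an illustrative addition.

```python
letters = ['a', 'b', 'c']

# len() gives the number of elements; the last valid index is one less.
last_index = len(letters) - 1
print(last_index)            # 2
print(letters[last_index])   # c

# Negative indexes count from the end of the list.
print(letters[-1])           # c

# Index a nested list twice: first the outer list, then the inner one.
nested_list = [1, 2, 3, [4, 5]]
print(nested_list[3])        # [4, 5]
print(nested_list[3][1])     # 5
```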
Not only can you read individual elements using indexing; you can also overwrite elements.
greek = ['alpha', 'beta', 'delta']
greek[2] = 'gamma'
greek
['alpha', 'beta', 'gamma']
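Overwriting only works for positions that already exist. As a small sketch, here is what happens when you assign past the end of the list, and how the list method append() adds a new element instead. The try/except is used only to show the error without stopping the program.

```python
greek = ['alpha', 'beta', 'gamma']

# Assigning to an index that does not exist yet raises an IndexError.
try:
    greek[3] = 'delta'
except IndexError as error:
    print('IndexError:', error)

# append() adds a new element to the end of the list instead.
greek.append('delta')
print(greek)   # ['alpha', 'beta', 'gamma', 'delta']
```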
Dictionaries are collections of key-value pairs. Think of a real dictionary -- you look up a word (a key), to find its definition (a value). Any given key can have only one value.
This concept has many names depending on language: map, associative array, dictionary, and more.
In Python, dictionaries are represented with curly braces. Colons separate a key from its value, and (like lists) commas delimit elements.
ethan = {'first_name': 'Ethan',
         'last_name': 'Swan',
         'alma_mater': 'Notre Dame',
         'employer': '84.51˚',
         'zip_code': 45208}
ethan
{'first_name': 'Ethan', 'last_name': 'Swan', 'alma_mater': 'Notre Dame', 'employer': '84.51˚', 'zip_code': 45208}
brad = {'first_name': 'Brad',
        'last_name': 'Boehmke',
        'alma_mater': 'NDSU',
        'employer': '84.51˚',
        'zip_code': 45385}
brad
{'first_name': 'Brad', 'last_name': 'Boehmke', 'alma_mater': 'NDSU', 'employer': '84.51˚', 'zip_code': 45385}
Values can be looked up and set by passing a key in brackets.
ethan['zip_code']
45208
ethan['employer']
'84.51˚'
ethan['employer'] = 'Eighty Four Fifty One'
ethan
{'first_name': 'Ethan', 'last_name': 'Swan', 'alma_mater': 'Notre Dame', 'employer': 'Eighty Four Fifty One', 'zip_code': 45208}
Dictionaries, like lists, are very flexible. Keys are generally strings (though some other types are allowed), and values can be anything -- including lists or other dictionaries!
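As a rough illustration of that flexibility, the sketch below stores a list and another dictionary as values, then looks inside them. The course dictionary and its contents are hypothetical and not part of the lesson's data.

```python
# Hypothetical record whose values include a list and a nested dictionary.
course = {'title': 'Intro to Python',
          'topics': ['variables', 'lists', 'dictionaries'],
          'instructor': {'first_name': 'Ethan', 'last_name': 'Swan'}}

# Chain brackets to reach into nested values.
print(course['topics'][0])                 # variables
print(course['instructor']['last_name'])   # Swan

# "in" checks whether a key exists; looking up a missing key with
# brackets would raise a KeyError.
print('title' in course)   # True
print('room' in course)    # False

# get() returns None (or a default you choose) instead of raising.
print(course.get('room'))             # None
print(course.get('room', 'unknown'))  # unknown
```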
New keys can be added to a dictionary by assigning to a key in brackets: ceos['Apple'] = 'Tim Cook'. Try to add a few more keys to this starter dictionary. For example, Bob Iger is the CEO of Disney.
ceos = {'Apple': 'Tim Cook',
        'Microsoft': 'Satya Nadella'}
Question
How might you approach #2 if you needed to look up both the CEO and the CFO?
What data structure would you use? There are several possible solutions.
In data science, the most important complex data structure is the DataFrame. DataFrames are a collection of tabular data -- you might think of them as tables or datasets, depending on your background.
Let's take a look at one.
# Don't worry about this "boilerplate" code for now.
import pandas as pd
planes = pd.read_csv('../data/planes.csv')
# Asking for the "head" of a DataFrame will show you the first 5 rows.
planes.head()
  | tailnum | year | type | manufacturer | model | engines | seats | speed | engine |
---|---|---|---|---|---|---|---|---|---|
0 | N10156 | 2004.0 | Fixed wing multi engine | EMBRAER | EMB-145XR | 2 | 55 | NaN | Turbo-fan |
1 | N102UW | 1998.0 | Fixed wing multi engine | AIRBUS INDUSTRIE | A320-214 | 2 | 182 | NaN | Turbo-fan |
2 | N103US | 1999.0 | Fixed wing multi engine | AIRBUS INDUSTRIE | A320-214 | 2 | 182 | NaN | Turbo-fan |
3 | N104UW | 1999.0 | Fixed wing multi engine | AIRBUS INDUSTRIE | A320-214 | 2 | 182 | NaN | Turbo-fan |
4 | N10575 | 2002.0 | Fixed wing multi engine | EMBRAER | EMB-145LR | 2 | 55 | NaN | Turbo-fan |
DataFrames have column names (tailnum, year, type, etc) and row indexes (the bold numbers on the left, starting at zero).
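To see column names and row indexes without needing the CSV file, here is a minimal sketch that builds a tiny DataFrame by hand. It assumes pandas is installed. The name small_planes is illustrative, and the values are copied from the first three rows of the table above.

```python
import pandas as pd

# Each dictionary key becomes a column name; each list becomes that column.
small_planes = pd.DataFrame({
    'tailnum': ['N10156', 'N102UW', 'N103US'],
    'year': [2004.0, 1998.0, 1999.0],
    'seats': [55, 182, 182],
})

print(list(small_planes.columns))  # ['tailnum', 'year', 'seats']
print(list(small_planes.index))    # [0, 1, 2]

# loc looks up a single value by row index and column name.
print(small_planes.loc[1, 'seats'])  # 182
```

Because no index was specified, pandas numbers the rows starting at zero, just like a list.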
The values (elements) within the DataFrame are the Python types we covered above: integers, floats, strings, and booleans.
Question
Which of these columns are strings?
Because DataFrames can hold almost any kind of data and support powerful data wrangling features, they have become the basic unit of data science work.
We will have a whole lesson later on DataFrames, so for now we'll move on.
How can you determine the type of a variable?
Pass it to the type function (we'll talk more about functions later).
x = 5
type(x)
int
type(planes)
pandas.core.frame.DataFrame
You can even pass values directly to the type function.
type(7.2)
float
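As a quick recap, the sketch below runs type on one value of each kind covered in this lesson, and adds the built-in isinstance() function, which answers a yes/no question about a value's type. Note that print() displays the longer form <class 'int'>, while a notebook cell shows just int.

```python
print(type(42))          # <class 'int'>
print(type(7.2))         # <class 'float'>
print(type('ethan'))     # <class 'str'>
print(type(True))        # <class 'bool'>
print(type([1, 2, 3]))   # <class 'list'>
print(type({'a': 1}))    # <class 'dict'>

# isinstance() checks whether a value is of a given type.
x = 5
print(isinstance(x, int))    # True
print(isinstance(x, float))  # False
```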
Are there any questions before we move on?