Python Fundamentals¶

Make easy things easy and hard things possible.

- A slogan of Perl (a scripting language that predates Python)

Applied Review¶

Python and Jupyter¶

  • Python is a flexible, general-purpose language that is popular in many fields, but particularly in data science.
  • Jupyter is an IDE, or Integrated Development Environment, that lets us view and run code in notebooks.
  • We are using Jupyter via Binder in this course.

Python at Its Simplest: Basic Data Types and Math¶

While Python can be used to write very complicated programs, one of its strengths is that easy things are still easy. For example, Python can be a calculator.

In [2]:
1 + 2
Out[2]:
3
In [3]:
12 * 4
Out[3]:
48

Python allows you to comment your code -- to leave notes for yourself or others about the code. Comments start with a # and are ignored by Python when it runs your code.

In [4]:
# The ** operator is exponentiation.
2 ** 3
Out[4]:
8
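
A few more calculator-style operations, as a small sketch. The numbers here are arbitrary, and the // and % operators are extras that this lesson doesn't otherwise cover.

```python
# Multiplication happens before addition, just like in math class.
without_parens = 2 + 3 * 4      # 14
# Parentheses change the order of operations.
with_parens = (2 + 3) * 4       # 20

# Floor division (//) throws away the remainder...
whole_part = 7 // 2             # 3
# ...and modulo (%) gives you only the remainder.
remainder = 7 % 2               # 1

print(without_parens, with_parens, whole_part, remainder)
```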

Once you start doing math, you may want to keep the values you calculate for later use.

Python allows you to do this with variables -- words that you choose to represent values you've stored.

In [5]:
# Place the result of "5 * 2" in a variable called "x".
x = 5 * 2

This process of storing something in a variable is often called variable assignment, or simply "assignment" for short. You can assign almost anything to a variable.

In [6]:
# "Assign" the value 42 to the variable "answer".
answer = 42

You can then use the stored values in new calculations.

In [7]:
answer + 5
Out[7]:
47
In [8]:
ten = 10
eleven = 11
ten + eleven
Out[8]:
21
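
One detail that can surprise newcomers: a variable holds the value it was given at assignment time, and it doesn't update itself when the variables it was calculated from change. The sketch below uses made-up variable names and numbers to show this.

```python
price_per_item = 20
number_of_items = 3
total_cost = price_per_item * number_of_items   # 60

# Reassigning a variable replaces its old value...
number_of_items = 4

# ...but total_cost was calculated earlier, so it still holds 60.
stale_total = total_cost

# To get the new total, run the calculation again.
total_cost = price_per_item * number_of_items   # 80

print(stale_total, total_cost)
```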

Python lets you name your variables almost anything you want -- the main rules are that names must be composed of numbers, letters, and underscores, cannot begin with a number, and cannot be one of Python's reserved keywords (such as for or True).

It's a good idea to take advantage of this flexibility and name your variables with descriptions that help you remember what they contain.

For example, calling your variables x, y, and z is likely to lead to forgetting what you've stored where (unless you're working with coordinates, a domain where those names have meanings).

More descriptive names, like number_of_items or size_of_container, are better.

In [9]:
# Perfectly good variable name
my_3rd_favorite_number = 18
In [10]:
# Legal, but undescriptive, variable name
a = 7
In [11]:
# Illegal variable name -- it starts with a number
4_plus_1 = 4 + 1
  Cell In[11], line 2
    4_plus_1 = 4 + 1
     ^
SyntaxError: invalid decimal literal

If you try to name a variable something illegal, Python will gently remind you to follow the rules with a SyntaxError and an arrow (the ^ character) indicating where it detected the problem.

Caution!

Sometimes Python doesn't pinpoint the error very well, and the error will not be in the same place as the arrow.
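
If you'd rather check a name without triggering a SyntaxError, the standard library can help. This is an optional sketch, not part of the lesson: str.isidentifier() tests the letters/numbers/underscores rule, and the keyword module reports words that Python reserves for itself (such as for), which also can't be used as variable names.

```python
import keyword

# Letters, numbers, and underscores, not starting with a number: fine.
good_name_ok = 'my_3rd_favorite_number'.isidentifier()   # True

# Starts with a number, so it breaks the rule.
bad_name_ok = '4_plus_1'.isidentifier()                  # False

# "for" looks like a legal name, but Python reserves it as a keyword.
for_is_reserved = keyword.iskeyword('for')               # True

print(good_name_ok, bad_name_ok, for_is_reserved)
```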

Your Turn¶

  1. 4K monitors, counterintuitively, typically have a resolution of 3840x2160. Create two variables, width and height, and store 3840 and 2160 in them (respectively).
  2. How many total pixels are in a display with this resolution? Hint: fill in the blanks with variable names: pixels = ___ * ___

Beyond Integers¶

Fortunately, Python can handle values beyond integers. It's happy to work with decimal numbers.

In [12]:
1 / 3
Out[12]:
0.3333333333333333
In [13]:
1.5 * 1.5
Out[13]:
2.25

In computer science lingo, decimal numbers are often called floating point numbers, or floats for short.

The name refers to how such numbers are stored by a computer internally, but you don't need to worry about that. Just be aware that many people on the internet and in the data science industry will speak in terms of "floats" and "ints" when they refer to numbers in Python.
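
One practical consequence of that internal storage is worth seeing once: some decimal values are only approximated, so float math can be off by a tiny amount. The sketch below is an optional aside; math.isclose is a standard library function this lesson doesn't otherwise use.

```python
import math

# 0.1 and 0.2 can't be stored exactly, so the sum is very slightly off.
total = 0.1 + 0.2
print(total)   # 0.30000000000000004

# An exact comparison fails because of that tiny difference...
exactly_equal = total == 0.3              # False

# ...so comparing floats "approximately" is usually safer.
close_enough = math.isclose(total, 0.3)   # True
```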

Python can also work with text data, like words and sentences.

In [14]:
my_name = 'ethan'
my_hobbies = 'coding, reading, basketball'

In Python, these bits of text are called strings and are enclosed in quotation marks. Both single quotes (') and double quotes (") work identically; this course uses single quotes as a matter of convention, and the important thing is to be consistent.
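
A quick sketch of how the two quote styles relate. The example strings are made up for illustration.

```python
single = 'ethan'
double = "ethan"

# The quote style isn't part of the string, so these two are the same value.
same_string = single == double   # True

# Double quotes are handy when the text itself contains an apostrophe.
sentence = "It's a good day to learn Python"

print(same_string)
print(sentence)
```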

Conveniently, Python lets you "add" strings together to compose longer strings.

In [15]:
'Monty' + ' ' + 'Python'
Out[15]:
'Monty Python'
In [16]:
first_name = 'Guido'
last_name = 'van Rossum'
# Remember to add a space between words!
first_name + ' ' + last_name
Out[16]:
'Guido van Rossum'

The last kind of value that we'll talk about is a boolean, or a True/False value.

Python recognizes the words True and False as keywords -- words that have a special, built-in meaning in the language.

You can assign them to variables just as you can with other data types.

In [17]:
is_the_moon_made_of_cheese = False
is_this_the_best_python_class = True
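
Booleans usually show up as the result of a comparison rather than being typed out by hand. The sketch below goes slightly beyond this lesson (comparison operators and and/not aren't covered here), and the seats value is just an example number.

```python
seats = 55

# Comparisons produce True or False.
has_many_seats = seats > 100     # False
is_exactly_55 = seats == 55      # True (note the double equals sign)

# Booleans can be combined with and, or, and not.
small_and_exact = (not has_many_seats) and is_exactly_55   # True

print(has_many_seats, is_exactly_55, small_and_exact)
```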

Your Turn¶

  1. Overwrite the first_name and last_name variables with your name, and run first_name + ' ' + last_name again -- make sure it produces what you expect!
  2. What happens when you try to add together two different kinds of values, like an integer and a string? Does this behavior make sense?

Lists and Dictionaries¶

So far we've worked with single values: numbers, strings, and booleans. But Python also supports more complex data types, sometimes called data structures.

The two most common data structures are lists and dictionaries.

Lists¶

As you might expect, a list is an ordered collection of things. Lists are represented using brackets ([]).

In [18]:
# A list of integers
numbers = [1, 2, 3]
numbers
Out[18]:
[1, 2, 3]
In [19]:
# A list of strings
strings = ['abc', 'def']
strings
Out[19]:
['abc', 'def']

Lists are highly flexible. They can contain heterogeneous data (i.e., strings, booleans, and numbers can all be in the same list), and lists can even contain other lists!

In [20]:
combo = ['a', 'b', 3, 4]
combo_2 = [True, 'True', 1, 1.0]
In [21]:
# Note that the last element of the list is another list!
nested_list = [1, 2, 3, [4, 5]]
nested_list
Out[21]:
[1, 2, 3, [4, 5]]

Individual elements of a list can be accessed by specifying a location in brackets. This is called indexing.

Beware: Python is zero-indexed, so the first element is element 0!

In [22]:
letters = ['a', 'b', 'c']
letters[0]
Out[22]:
'a'
In [23]:
letters[2]
Out[23]:
'c'

Specifying an invalid location will raise an error.

In [24]:
letters[4]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[24], line 1
----> 1 letters[4]

IndexError: list index out of range

Caution!

Most programming languages are zero-indexed, so a list with 3 elements has valid locations [0, 1, 2]. But this means that there is no element #3 in a 3-element list! Trying to access it will cause an out-of-range error. This is a common mistake for those new to programming (and sometimes it bites the veterans too).
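
To stay inside the valid range, it can help to ask Python how long the list is. The sketch below uses the built-in len function and negative indexes, neither of which this lesson covers, so treat it as an optional extra.

```python
letters = ['a', 'b', 'c']

# len() reports how many elements the list has.
number_of_letters = len(letters)            # 3

# The last valid location is always one less than the length.
last_valid_index = number_of_letters - 1    # 2
last_letter = letters[last_valid_index]     # 'c'

# Negative locations count backward from the end of the list.
also_last_letter = letters[-1]              # 'c'

# Indexing twice reaches inside a nested list.
nested_list = [1, 2, 3, [4, 5]]
first_inner_value = nested_list[3][0]       # 4
```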

Not only can you read individual elements using indexing; you can also overwrite elements.

In [25]:
greek = ['alpha', 'beta', 'delta']
greek[2] = 'gamma'
greek
Out[25]:
['alpha', 'beta', 'gamma']
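
Overwriting only works for locations that already exist. To make a list longer, one common approach is the list's append method, sketched here as an optional extra beyond what this lesson covers.

```python
greek = ['alpha', 'beta', 'gamma']

# greek[3] = 'delta' would raise an IndexError here,
# because location 3 doesn't exist yet.

# append() adds a new element to the end of the list instead.
greek.append('delta')

print(greek)   # ['alpha', 'beta', 'gamma', 'delta']
```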

Dictionaries¶

Dictionaries are collections of key-value pairs. Think of a real dictionary -- you look up a word (a key) to find its definition (a value). Any given key can have only one value.

This concept has many names depending on language: map, associative array, dictionary, and more.

In Python, dictionaries are represented with curly braces. Colons separate a key from its value, and (like lists) commas delimit elements.

In [26]:
ethan = {'first_name': 'Ethan',
         'last_name': 'Swan',
         'alma_mater': 'Notre Dame',
         'employer': '84.51˚',
         'zip_code': 45208}
ethan
Out[26]:
{'first_name': 'Ethan',
 'last_name': 'Swan',
 'alma_mater': 'Notre Dame',
 'employer': '84.51˚',
 'zip_code': 45208}
In [27]:
brad = {'first_name': 'Brad',
        'last_name': 'Boehmke',
        'alma_mater': 'NDSU',
        'employer': '84.51˚',
        'zip_code': 45385}
brad
Out[27]:
{'first_name': 'Brad',
 'last_name': 'Boehmke',
 'alma_mater': 'NDSU',
 'employer': '84.51˚',
 'zip_code': 45385}

Values can be looked up and set by passing a key in brackets.

In [28]:
ethan['zip_code']
Out[28]:
45208
In [29]:
ethan['employer']
Out[29]:
'84.51˚'
In [30]:
ethan['employer'] = 'Eighty Four Fifty One'
ethan
Out[30]:
{'first_name': 'Ethan',
 'last_name': 'Swan',
 'alma_mater': 'Notre Dame',
 'employer': 'Eighty Four Fifty One',
 'zip_code': 45208}
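
Setting a value with a key that isn't in the dictionary yet adds a new pair, while looking up a missing key raises an error. The sketch below uses a trimmed-down copy of the ethan dictionary so it runs on its own; the favorite_language and middle_name keys are made up for illustration, and the in check and .get method are extras beyond this lesson.

```python
# A shortened copy of the dictionary above.
ethan = {'first_name': 'Ethan', 'last_name': 'Swan', 'zip_code': 45208}

# Setting a key that doesn't exist yet adds a new key-value pair.
ethan['favorite_language'] = 'Python'

# Looking up a missing key with brackets raises a KeyError,
# so it can help to check for the key first with "in"...
has_middle_name = 'middle_name' in ethan            # False

# ...or to use .get(), which returns a fallback value instead of an error.
middle_name = ethan.get('middle_name', 'unknown')   # 'unknown'

print(ethan)
```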

Dictionaries, like lists, are very flexible. Keys are generally strings (though some other types, such as numbers, are also allowed), and values can be anything -- including lists or other dictionaries!
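
Here is a small sketch of that flexibility. The person described is hypothetical and every value is invented for illustration.

```python
# A dictionary whose values include a list and another dictionary.
person = {
    'first_name': 'Sam',
    'hobbies': ['coding', 'reading', 'basketball'],
    'address': {'city': 'Springfield', 'zip_code': 12345},
}

# Chain brackets to reach inside the nested values.
second_hobby = person['hobbies'][1]     # 'reading'
city = person['address']['city']        # 'Springfield'

# Keys don't have to be strings; integers work too.
squares = {1: 1, 2: 4, 3: 9}
three_squared = squares[3]              # 9
```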

Your Turn¶

  1. Create a list of the first 10 even numbers. Use indexing to find the 4th even number. Remember that the 4th element is at location 3 because of zero-indexing!
  2. Imagine you need a way to quickly determine a company's CEO given the company name. You could use a dictionary such that ceos['Apple'] returns 'Tim Cook'. Try to add a few more keys to this starter dictionary. For example, Bob Iger is the CEO of Disney.
ceos = {'Apple': 'Tim Cook',
        'Microsoft': 'Satya Nadella'}



Question

How might you approach #2 if you needed to look up both the CEO and the CFO?

What data structure would you use? There are several possible solutions.

DataFrames¶

In data science, the most important complex data structure is the DataFrame. A DataFrame is a collection of tabular data -- you might think of it as a table or a dataset, depending on your background.

Let's take a look at one.

In [31]:
# Don't worry about this "boilerplate" code for now.
import pandas as pd
planes = pd.read_csv('../data/planes.csv')
In [32]:
# Asking for the "head" of a DataFrame will show you the first 5 rows.
planes.head()
Out[32]:
   tailnum    year                     type      manufacturer      model  engines  seats  speed     engine
0   N10156  2004.0  Fixed wing multi engine           EMBRAER  EMB-145XR        2     55    NaN  Turbo-fan
1   N102UW  1998.0  Fixed wing multi engine  AIRBUS INDUSTRIE   A320-214        2    182    NaN  Turbo-fan
2   N103US  1999.0  Fixed wing multi engine  AIRBUS INDUSTRIE   A320-214        2    182    NaN  Turbo-fan
3   N104UW  1999.0  Fixed wing multi engine  AIRBUS INDUSTRIE   A320-214        2    182    NaN  Turbo-fan
4   N10575  2002.0  Fixed wing multi engine           EMBRAER  EMB-145LR        2     55    NaN  Turbo-fan

DataFrames have column names (tailnum, year, type, etc.) and row indexes (the numbers on the far left, starting at zero).
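
If it helps, you can picture rows and columns using the structures from earlier in this lesson: a list of dictionaries. This is only a mental model, not how pandas actually stores a DataFrame. The two rows below are copied by hand from the output above, with a few columns left out to keep things short.

```python
# Each row is a dictionary: column name -> value.
row_0 = {'tailnum': 'N10156', 'year': 2004.0,
         'manufacturer': 'EMBRAER', 'seats': 55}
row_1 = {'tailnum': 'N102UW', 'year': 1998.0,
         'manufacturer': 'AIRBUS INDUSTRIE', 'seats': 182}

# The table is a list of rows, so the list location acts like the row index.
rows = [row_0, row_1]

# "Row 1, column seats" becomes two lookups: one by location, one by key.
seats_in_row_1 = rows[1]['seats']        # 182
tailnum_in_row_0 = rows[0]['tailnum']    # 'N10156'
```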

The values (elements) within the DataFrame are the Python types we covered above: integers, floats, strings, and booleans.

Question

Which of these columns are strings?

Because DataFrames can hold almost any kind of data and support powerful data wrangling features, they have become the basic unit of data science work.

We will have a whole lesson later on DataFrames, so for now we'll move on.

Determining What Type of Data Structure Something Is¶

How can you determine the type of a variable? Pass it to the type function (we'll talk more about functions later).

In [34]:
x = 5
type(x)
Out[34]:
int
In [35]:
type(planes)
Out[35]:
pandas.core.frame.DataFrame

You can even pass values directly to the type function.

In [36]:
type(7.2)
Out[36]:
float
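
The same function works for every kind of value covered in this lesson. The sketch below also uses isinstance, a built-in function not covered here, which answers a yes/no question about a value's type.

```python
# type() works on strings, booleans, lists, and dictionaries too.
type_of_string = type('ethan')        # str
type_of_boolean = type(True)          # bool
type_of_list = type([1, 2, 3])        # list
type_of_dict = type({'a': 1})         # dict

# isinstance() checks whether a value is of a particular type.
is_a_float = isinstance(7.2, float)   # True
is_an_int = isinstance(7.2, int)      # False
```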

Questions¶

Are there any questions before we move on?