Python Fundamentals¶

Data Structures¶

Applied Review¶

Variables and Operators¶

  • Python comes with a variety of built-in simple types, like integers, floats, strings, and booleans
  • Operators like +, -, *, /, >, and < can be used with the above and work largely as you'd expect
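As a quick refresher, here is a minimal sketch of those operators at work. The variable names and values are made up for illustration. Two details tend to surprise newcomers: / always produces a float, and + joins strings rather than adding them.

```python
# Hypothetical values, chosen only for illustration
apples = 7
people = 2

total = apples + people       # integer addition
share = apples / people       # / always returns a float, even for integers
is_more = apples > people     # comparisons produce booleans

# + also works on strings, where it means concatenation
greeting = 'Hello, ' + 'world'

print(total, share, is_more, greeting)
```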

Lists and Dictionaries¶

So far we've worked with single values: numbers, strings, and booleans. But Python also supports more complex data types, sometimes called data structures.

The two most common data structures are lists and dictionaries.

Lists¶

As you might expect, a list is an ordered collection of things. Lists are written using square brackets ([]), with commas separating the elements.

In [17]:
# A list of integers
numbers = [1, 2, 3]
numbers
Out[17]:
[1, 2, 3]
In [18]:
# A list of strings
strings = ['abc', 'def']
strings
Out[18]:
['abc', 'def']

Lists are highly flexible. They can contain heterogeneous data (i.e. strings, booleans, and numbers can all be in the same list) and lists can even contain other lists!

In [19]:
combo = ['a', 'b', 3, 4]
combo_2 = [True, 'True', 1, 1.0]
In [20]:
# Note that the last element of the list is another list!
nested_list = [1, 2, 3, [4, 5]]
nested_list
Out[20]:
[1, 2, 3, [4, 5]]

Individual elements of a list can be accessed by specifying a location in brackets. This is called indexing.

Beware: Python is zero-indexed, so the first element is element 0!

In [21]:
letters = ['a', 'b', 'c']
letters[0]
Out[21]:
'a'
In [22]:
letters[2]
Out[22]:
'c'
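Indexing also reaches into nested lists: index once to get the inner list, then index again to get one of its elements. A small sketch, reusing the nested list from earlier:

```python
nested_list = [1, 2, 3, [4, 5]]

inner = nested_list[3]        # the inner list sits at location 3
last_inner = nested_list[3][1]  # index again to reach inside it

print(inner)
print(last_inner)
```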

Specifying an invalid location will raise an error.

In [23]:
letters[4]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[23], line 1
----> 1 letters[4]

IndexError: list index out of range

Caution!

Most programming languages are zero-indexed, so a list with 3 elements has valid locations [0, 1, 2]. This means there is no element #3 in a 3-element list, and trying to access it raises an out-of-range error (an IndexError in Python). This is a common mistake for those new to programming (and sometimes it bites the veterans too).
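One way to stay in range is to ask the list how long it is with the built-in len function. The sketch below is only an illustration; the index and result names are hypothetical.

```python
letters = ['a', 'b', 'c']

n = len(letters)          # number of elements
last = letters[n - 1]     # the last valid location is n - 1, not n

# Check a location before using it (index is a made-up example value)
index = 3
if 0 <= index < n:
    result = letters[index]
else:
    result = None         # location 3 does not exist in a 3-element list

print(n, last, result)
```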

Not only can you read individual elements using indexing; you can also overwrite elements.

In [28]:
greek = ['alpha', 'beta', 'delta']
greek[2] = 'gamma'
greek
Out[28]:
['alpha', 'beta', 'gamma']

If given a negative number as an index, Python counts backward from the end of the list.

In [29]:
greek
Out[29]:
['alpha', 'beta', 'gamma']
In [30]:
greek[-1]
Out[30]:
'gamma'
In [31]:
greek[-3]
Out[31]:
'alpha'
In [32]:
greek[-4]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[32], line 1
----> 1 greek[-4]

IndexError: list index out of range
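It can help to think of a negative index as shorthand: for a list of length n, location -k refers to the same element as location n - k. A brief sketch of that relationship:

```python
greek = ['alpha', 'beta', 'gamma']
n = len(greek)

# greek[-k] is the same element as greek[n - k]
same_last = greek[-1] == greek[n - 1]
same_first = greek[-3] == greek[n - 3]

# So the valid locations run from -n up to n - 1
print(same_last, same_first)
```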

Slicing¶

  • We've seen how we can index a list to read or write at a specific location
  • Lists also support slicing: reading or writing over a sequential range of indices
In [33]:
letters = [
    'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
    'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
]
In [43]:
# What are the first 10 letters?
letters[0:10]
Out[43]:
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
In [44]:
# 6th-10th letters
letters[5:10]
Out[44]:
['f', 'g', 'h', 'i', 'j']
In [54]:
# The last 5 letters
letters[21:26]
Out[54]:
['v', 'w', 'x', 'y', 'z']

You can omit either part of the slice index if you want your slice to go all the way to one end of the list.

In [46]:
letters[:2]
Out[46]:
['a', 'b']
In [50]:
letters[-2:]
Out[50]:
['y', 'z']
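Slices can be written to as well as read. The following sketch, using a shortened list for readability, assigns to a slice. Note that the replacement does not have to be the same length as the range it replaces.

```python
letters = ['a', 'b', 'c', 'd', 'e']

# Overwrite locations 1 and 2 in one step
letters[1:3] = ['B', 'C']
after_first = list(letters)   # keep a copy to inspect later

# The replacement may be longer or shorter; the list resizes to fit
letters[0:1] = ['x', 'y', 'z']

print(after_first)
print(letters)
```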

Caution!

The starting index of a slice is inclusive but the ending index is exclusive.

In [53]:
l = [0, 1, 2, 3, 4]
In [52]:
l[1:3]
Out[52]:
[1, 2]
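The inclusive-start, exclusive-end rule has two convenient consequences, sketched below for indices that fall within the list: a slice from start to stop contains stop - start elements, and splitting a list at any location k gives two pieces that fit back together with no overlap.

```python
numbers = [0, 1, 2, 3, 4]
start, stop = 1, 3

piece = numbers[start:stop]
length_matches = len(piece) == stop - start

# Splitting at k: the first piece ends where the second begins
k = 2
rejoined = numbers[:k] + numbers[k:]

print(piece, length_matches, rejoined == numbers)
```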

Dictionaries¶

Dictionaries are collections of key-value pairs. Think of a real dictionary -- you look up a word (a key) to find its definition (a value). Any given key can have only one value.

This concept has many names depending on language: map, associative array, dictionary, and more.

In Python, dictionaries are represented with curly braces. Colons separate a key from its value, and (like lists) commas delimit elements.

In [25]:
ethan = {
    'name': 'Ethan',
    'employer': 'ReviewTrackers',
    'number_of_pets': 0,
    'lives_in_ohio': False,
}
In [26]:
ethan
Out[26]:
{'name': 'Ethan',
 'employer': 'ReviewTrackers',
 'number_of_pets': 0,
 'lives_in_ohio': False}
In [27]:
gus = {
    'name': 'Gus',
    'employer': '84.51˚',
    'number_of_pets': 1,
    'lives_in_ohio': False,
}
jay = {
    'name': 'Jay',
    'employer': '84.51˚',
    'number_of_pets': 4,
    'lives_in_ohio': True,
}

Values can be looked up and set by passing a key in brackets.

In [28]:
ethan['number_of_pets']
Out[28]:
0
In [29]:
gus['employer']
Out[29]:
'84.51˚'
In [30]:
gus['employer'] = 'Eighty Four Fifty One'
gus
Out[30]:
{'name': 'Gus',
 'employer': 'Eighty Four Fifty One',
 'number_of_pets': 1,
 'lives_in_ohio': False}
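Looking up a key that does not exist raises a KeyError, much as an invalid list location raises an IndexError. When a missing key is a normal possibility, the dictionary's get method returns a default instead. A short sketch, with 'favorite_color' as a made-up key:

```python
gus = {'name': 'Gus', 'number_of_pets': 1}

# Bracket lookup raises KeyError when the key is absent
try:
    gus['favorite_color']
except KeyError as err:
    missing = err.args[0]     # the key that was not found

# .get() returns the second argument instead of raising
color = gus.get('favorite_color', 'unknown')
pets = gus.get('number_of_pets', 0)

print(missing, color, pets)
```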

Dictionaries, like lists, are very flexible. Keys are usually strings, though any hashable value (such as a number or a tuple) is allowed, and values can be anything -- including lists or other dictionaries!
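To make that flexibility concrete, here is a small sketch. The record and its contents are entirely hypothetical. It nests a list and a dictionary as values, uses tuples as keys, and shows that a list is rejected as a key.

```python
# Hypothetical record: values can be lists or other dictionaries
pet_owner = {
    'name': 'Ada',
    'pets': ['Rex', 'Milo'],
    'address': {'city': 'Springfield', 'state': 'OH'},
}
second_pet = pet_owner['pets'][1]         # index into the list value
state = pet_owner['address']['state']     # look up inside the inner dict

# Keys do not have to be strings; tuples and numbers work too
grid = {(0, 0): 'origin', (1, 2): 'point'}
origin = grid[(0, 0)]

# A list cannot be a key: Python raises TypeError
try:
    bad = {[0, 0]: 'origin'}
except TypeError:
    bad = None

print(second_pet, state, origin, bad)
```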

Your Turn¶

  1. Create a list of the first 10 even numbers. Use indexing to find the 4th even number. Remember that the 4th element is at location 3 because of zero-indexing!
  2. Imagine you need a way to quickly determine a company's CEO given the company name. You could use a dictionary such that ceos['Apple'] = 'Tim Cook'. Try to add a few more keys to this starter dictionary. For example, Bob Iger is the CEO of Disney.
ceos = {'Apple': 'Tim Cook',
        'Microsoft': 'Satya Nadella'}

How might you approach #2 if you needed to look up both the CEO and the CFO? What data structure would you use? There are several possible solutions.

Containers¶

  • Lists and dictionaries are both "containers"
  • In Python jargon, a container is a data structure that supports checking whether an element is "in" the container
  • This membership test is done using the in keyword

Using lists as containers checks the elements of the list...

In [1]:
l = ['a', 'e', 'i', 'o', 'u']
In [10]:
'u' in l
Out[10]:
True
In [3]:
'b' in l
Out[3]:
False

Using dictionaries as containers checks the keys, not the values...

In [7]:
d = {
    'ethan': 0,
    'gus': 1,
    'jay': 4,
}
In [8]:
'gus' in d
Out[8]:
True
In [9]:
'bob' in d
Out[9]:
False
In [12]:
'ethan' in d
Out[12]:
True
In [13]:
0 in d
Out[13]:
False
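If you do need to test the values, a dictionary's values method exposes them, and items exposes the key-value pairs. A minimal sketch:

```python
d = {'ethan': 0, 'gus': 1, 'jay': 4}

key_check = 0 in d                   # tests the keys only
value_check = 0 in d.values()        # tests the values
pair_check = ('gus', 1) in d.items() # tests (key, value) pairs

print(key_check, value_check, pair_check)
```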

Mutability¶

  • Some data structures in Python are mutable -- though not all of them
  • Mutability is the ability to modify a piece of data
  • Lists and dictionaries are both mutable
In [14]:
l = ['a', 'e', 'i', 'o', 'u']
In [15]:
l[0] = 'A'
In [16]:
l.append('y')
In [17]:
l
Out[17]:
['A', 'e', 'i', 'o', 'u', 'y']
In [ ]:
d = {
    'ethan': 0,
    'gus': 1,
    'jay': 4,
}
In [19]:
d['jay'] = 100
In [21]:
d['brad'] = 1
In [22]:
d
Out[22]:
{'ethan': 0, 'gus': 1, 'jay': 100, 'brad': 1}
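One practical consequence of mutability is worth a quick sketch: assigning a list to a second name does not copy it, so a change made through either name is visible through both. The names alias and snapshot below are hypothetical.

```python
vowels = ['a', 'e', 'i', 'o', 'u']

alias = vowels            # a second name for the SAME list
snapshot = list(vowels)   # a new, independent list with the same elements

alias.append('y')         # modifies the one list both names refer to

print(vowels)             # the change shows up here too
print(snapshot)           # the copy is unaffected
print(alias is vowels, snapshot is vowels)
```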

So what does immutability look like?

In [23]:
# Tuples, which we haven't discussed yet, are immutable
t = ('a', 'e', 'i', 'o', 'u')
In [24]:
t[0] = 'A'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[24], line 1
----> 1 t[0] = 'A'

TypeError: 'tuple' object does not support item assignment
In [25]:
t.append('y')
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[25], line 1
----> 1 t.append('y')

AttributeError: 'tuple' object has no attribute 'append'
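Immutable does not mean stuck: instead of changing the original, you build a new object from it. The sketch below does this for a tuple and also shows that strings, which we have been using all along, are immutable in the same way.

```python
t = ('a', 'e', 'i', 'o', 'u')

# Build new tuples rather than modifying t
with_y = t + ('y',)              # note the trailing comma in a 1-tuple
capitalized = ('A',) + t[1:]

# Strings reject item assignment too
s = 'hello'
try:
    s[0] = 'H'
except TypeError:
    s = 'H' + s[1:]              # create a new string instead

print(t)
print(with_y)
print(capitalized)
print(s)
```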

Determining What Type of Data Structure Something Is¶

How can you determine the type of a variable? Pass it to the type function (we'll talk more about functions later).

In [31]:
type("hello")
Out[31]:
str
In [32]:
x = 5
type(x)
Out[32]:
int
In [33]:
jay
Out[33]:
{'name': 'Jay',
 'employer': '84.51˚',
 'number_of_pets': 4,
 'lives_in_ohio': True}
In [34]:
type(jay)
Out[34]:
dict
In [35]:
type(jay['lives_in_ohio'])
Out[35]:
bool
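The same function works on the data structures from this section. The sketch below also uses the built-in isinstance, which is a common way to test a type in code. One quirk to note: empty curly braces create a dictionary, not a set.

```python
numbers = [1, 2, 3]

list_type = type(numbers)          # the type of the container
element_type = type(numbers[0])    # the type of one element
tuple_type = type((1, 2))
set_type = type({1, 2})
empty_braces_type = type({})       # empty braces make a dict, not a set

is_list = isinstance(numbers, list)

print(list_type, element_type, tuple_type, set_type, empty_braces_type)
print(is_list)
```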

Other Types¶

While we won't spend much time on these, there are many more types of data in Python.

tuple -- Like a list, but immutable. Good for storing data in which order is meaningful, like records of relational data.

In [36]:
# SELECT name, email, suite_number FROM office_managers;
mgrs = [
    ('W.B. Jones', 'wbj@wbjhvac.com', 110),
    ('Bob Vance', 'bob.vance@vancerefrigeration.com', 210),
    ('Michael Scott', 'mscott@dundermifflin.com', 200),
    ('Bill Cress', 'bill.cress@cresstools.com', 302),
    ('Paul Faust', 'pfaust@disasterkits.com', 310),
]
In [37]:
bob = mgrs[1]
In [38]:
bob[1]
Out[38]:
'bob.vance@vancerefrigeration.com'
In [39]:
bob[2] = 111
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[39], line 1
----> 1 bob[2] = 111

TypeError: 'tuple' object does not support item assignment
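Because each position in a record-style tuple has a fixed meaning, tuples are often unpacked into named variables. A minimal sketch using one of the records above; the variable names are chosen here for illustration:

```python
bob = ('Bob Vance', 'bob.vance@vancerefrigeration.com', 210)

# Unpack the three positions into three names in one step
name, email, suite_number = bob

# To "change" a field, build a new tuple; the original is untouched
moved = (name, email, 111)

print(name, suite_number)
print(moved)
print(bob)
```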

set -- An unordered collection of items in which each item appears at most once. Checking whether an item is present is very fast, even for large sets. Mutable.

In [40]:
primes = {2, 3, 5, 7}
primes
Out[40]:
{2, 3, 5, 7}
In [41]:
primes.add(11)
primes
Out[41]:
{2, 3, 5, 7, 11}
In [42]:
primes.add(5)
primes
Out[42]:
{2, 3, 5, 7, 11}
In [43]:
print(5 in primes)
print(9 in primes)
True
False
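Two common uses of sets are removing duplicates from a list and comparing groups of items. The sketch below uses made-up data and sorts the results before printing, since a set has no guaranteed order.

```python
# Hypothetical data containing repeats
rolls = [3, 5, 3, 2, 5, 5]
unique = set(rolls)            # duplicates collapse to one of each

primes = {2, 3, 5, 7}
evens = {2, 4, 6}
both = primes & evens          # intersection: items in both sets
either = primes | evens        # union: items in at least one set

print(sorted(unique))
print(sorted(both))
print(sorted(either))
```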
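  • bytes, bytearray -- Raw sequences of bytes that can be decoded into a string. A bytearray is mutable; bytes is not.
  • complex, Decimal, Fraction -- Numeric types for specialized use cases. complex is built in, while Decimal and Fraction come from the standard library's decimal and fractions modules. If you need these, you probably already know.
  • DataFrame -- A tabular dataset, provided by the pandas library rather than by Python itself. We'll cover this in later sections.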
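For the curious, here is a brief sketch of a few of these types using only the standard library. The example values are arbitrary, and this is a taste rather than a full tour.

```python
from decimal import Decimal
from fractions import Fraction

# bytes: encode a string to raw bytes, then decode it back
raw = 'café'.encode('utf-8')     # 5 bytes, because é takes two in UTF-8
text = raw.decode('utf-8')

# bytearray: like bytes, but individual bytes can be overwritten
buf = bytearray(raw)
buf[0] = ord('C')
capitalized = buf.decode('utf-8')

# Decimal and Fraction avoid binary floating-point rounding
float_sum = 0.1 + 0.2                        # not exactly 0.3
dec_sum = Decimal('0.1') + Decimal('0.2')    # exactly 0.3
half = Fraction(1, 3) + Fraction(1, 6)       # exactly 1/2

# complex: numbers with a real and an imaginary part
magnitude = abs(complex(3, 4))

print(len(raw), text, capitalized)
print(float_sum == 0.3, dec_sum, half, magnitude)
```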

Questions¶

Are there any questions before we move on?