Packages, Modules, & Libraries

The Python source distribution has long maintained the philosophy of "batteries included" -- having a rich and versatile standard library which is immediately available, without making the user download separate packages. This gives the Python language a head start in many projects.

- PEP 206

Applied Review

Python and Notebooks

  • We're working with Python in Notebooks, a common workflow in data science.
  • Notebooks are composed of cells, which can contain either code or markdown text.

Fundamentals

  • Python's common atomic, or basic, data types are:
    • Integers
    • Floats (decimals)
    • Strings
    • Booleans
  • These simple types can be combined to form more complex types, including:
    • Lists: Ordered collections
    • Dictionaries: Key-value pairs
    • DataFrames: Tabular datasets (from the pandas package)

Packages (aka Modules)

So far we've seen several data types that Python offers out of the box. However, to keep things organized, some Python functionality is stored in standalone packages, or libraries of code. The words "module," "package," and "library" are often used interchangeably; you will hear all three in discussions of Python.

If you want clearer definitions, the three can be thought of this way:

  • A package refers to source code that is bundled in a way that can be distributed to users
    • You can install packages using pip install package_name
  • A library refers to a package that has been installed to a centralized location on an operating system
    • You can view your libraries using pip list or conda list
  • A module refers to code that you can import from outside your current script or notebook
    • To use code from a module, you'll need to import module_name

For example, functionality related to the operating system -- such as creating files and folders -- is stored in a built-in library called os. To use the tools in os, we import the package.

In [5]:
import os

Once we import it, we gain access to everything inside it. With VS Code's autocomplete, we can see what's available.

In [ ]:
# Move your cursor to the end of the line below and press Tab.
# (This cell is only for autocomplete; running it as-is raises a SyntaxError.)
os.
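Outside an editor with autocomplete, the built-in dir() function lists the names defined inside a module. A minimal sketch of using it to explore os:

```python
import os

# dir() returns an alphabetically sorted list of the names inside a module
names = dir(os)

# Leave out "private" helper names, which start with an underscore
public_names = [name for name in names if not name.startswith("_")]

# mkdir creates a folder; getcwd returns the current working directory
print("mkdir" in public_names)   # True
print("getcwd" in public_names)  # True
print(public_names[:10])         # the first ten public names, alphabetically
```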

Some libraries, like os, are bundled with every Python installation; installing Python guarantees you'll have them. Collectively, this group of libraries is known as the standard library.

Other packages must be downloaded separately, either because

  • they aren't sufficiently popular to merit inclusion in the standard library
  • or they change too quickly for the maintainers of Python to keep up

One very commonly used data science package is pandas, whose name derives from "panel data." Since pandas is specific to data science and is still rapidly evolving, it is not part of the standard library.

Note: We'll cover pandas in more detail in later modules.

Packages like pandas are published on PyPI, the Python Package Index, which is where pip install downloads them from. Fortunately, since we are using a pre-built conda environment today, that has been handled for us and pandas is already installed.
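To check whether a package is installed in the current environment without actually importing it, importlib.util.find_spec() can help: it returns None when Python cannot locate a module with the given name. A minimal sketch; is_installed is a hypothetical helper name, not part of any library:

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the import system can locate a top-level module."""
    # find_spec returns a "module spec" if the module can be found, else None
    return importlib.util.find_spec(module_name) is not None


print(is_installed("os"))      # True: os is part of the standard library
print(is_installed("pandas"))  # True only in an environment where pandas is installed
```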

It's possible to import modules under an alias, or a nickname. The community has adopted certain conventions for aliases for common packages; while following them isn't mandatory, it's highly recommended, as it makes your code easier for others to understand.

pandas is conventionally imported under the alias pd.
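An alias is only a second name for the same module object; importing under an alias doesn't copy or change anything. A quick check with the standard library's math module (the alias m is just for illustration, not a community convention):

```python
import math
import math as m  # the same module, bound to a shorter name

# Both names refer to the very same module object
print(m is math)        # True
print(m.pi == math.pi)  # True
```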

In [6]:
import pandas as pd

Importing pandas gives us access to the DataFrame type, available as pd.DataFrame.

In [7]:
pd.DataFrame
Out[7]:
pandas.core.frame.DataFrame

Question:

What is the type of pd? Guess before you run the code below.

In [ ]:
type(pd)

Third-party packages unlock a huge range of functionality that isn't available in the standard library; much of Python's data science capability comes from a handful of packages outside it:

  • pandas (tabular data)
  • numpy (numerical computing)
  • scikit-learn (modeling)
  • scipy (scientific computing)
  • matplotlib (graphing)

We won't have time to touch on most of these in this training, but if you're interested in one, google it!

Your Turn

  1. Import the numpy library, listed above. Give it the alias "np".
  2. Using autocomplete, find the variable or function inside the numpy library whose name starts with "asco". Hint: remember to prefix the name with the library alias, e.g. np.asco.

Dot Notation with Modules

We've seen it a few times already, but now it's time to discuss it explicitly: things inside modules can be accessed with dot notation.

Dot notation looks like this:

pd.Series

or

import numpy as np
np.array

You can read this as "the array function within the numpy library."

Packages can contain pretty much anything that's legal in Python; if it's code, it can be in a package.

This flexibility is part of the reason that Python's package ecosystem is so expansive and powerful.
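To see that a module really can hold any kind of Python code, the sketch below writes a tiny hypothetical module, greetings.py, to a temporary folder and imports it. The module name and its contents are invented for illustration; the point is that its variable, function, and class are all reached with the same dot notation.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source code for a hypothetical module, written to disk purely for illustration
source = '''
GREETING = "Hello"  # a variable


def greet(name):  # a function
    return f"{GREETING}, {name}!"


class Counter:  # a class
    def __init__(self):
        self.count = 0
'''

tmp_dir = tempfile.mkdtemp()                      # a fresh, empty folder
Path(tmp_dir, "greetings.py").write_text(source)  # save the module file
sys.path.insert(0, tmp_dir)                       # let Python search that folder
importlib.invalidate_caches()                     # make sure the new file is noticed

greetings = importlib.import_module("greetings")  # similar to: import greetings

print(greetings.GREETING)         # Hello
print(greetings.greet("Ethan"))   # Hello, Ethan!
print(greetings.Counter().count)  # 0
```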

Objects and Dot Notation

Dot notation has another use -- accessing things inside of objects.

What's an object? For our purposes, it's a value that contains other data or functionality and exposes them to users.

For example, DataFrames are objects.

Note: We'll cover pandas and DataFrames in far more detail in later modules.

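In [9]:
df = pd.DataFrame({'first_name': ['Ethan', 'Brad', 'Jay', 'Gus'], 'last_name': ['Swan', 'Boehmke', 'Cunningham', 'Powers']})
df
Out[9]:
   first_name   last_name
0       Ethan        Swan
1        Brad     Boehmke
2         Jay  Cunningham
3         Gus      Powers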
In [10]:
df.shape
Out[10]:
(4, 2)
In [11]:
df.describe()
Out[11]:
        first_name  last_name
count            4          4
unique           4          4
top          Ethan       Swan
freq             1          1

You can see that DataFrames have a shape variable and a describe function inside of them, both accessible through dot notation.

Note: Variables inside an object are often called attributes, and functions inside objects are called methods.
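The same distinction shows up on Python's basic types, not just on DataFrames. A short sketch using a float and a string:

```python
price = 3.5
text = "hello world"

# Attributes hold data, so we access them without parentheses
print(price.real)          # 3.5

# Methods are functions, so we call them with parentheses
print(price.is_integer())  # False
print(text.upper())        # HELLO WORLD

# callable() reports whether something can be called like a function
print(callable(price.real))        # False
print(callable(price.is_integer))  # True
```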

Your Turn

Using the math library:

  1. Find a function that will help you to compute the square root of $14 \times 0.51$
  2. Find a function that will round a number up to the nearest integer. That is, f(3) => 3, but f(3.2) => 4 and f(3.7) => 4.

On Consistency and Language Design

One of the great things about Python is that its creators really cared about internal consistency.

What that means to us, as users, is that syntax is consistent and predictable -- even across uses that look different at first.

Dot notation reveals something kind of cool about Python: packages are just like other objects, and the variables inside them are just attributes and methods!

This standardization across packages and objects helps us remember a single, intuitive syntax that works for many different things.
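A quick way to see this consistency: a module has a type of its own, and the built-in getattr() and hasattr() functions work on a module exactly as they do on any other object. A sketch using the standard library's math module:

```python
import math
import types

# A module is an object with its own type...
print(type(math))                          # <class 'module'>
print(isinstance(math, types.ModuleType))  # True

# ...and the names inside it are ordinary attributes
print(getattr(math, "pi") == math.pi)      # True

text = "hello"
# getattr() looks up a method on a string the same way
print(getattr(text, "upper")())            # HELLO

# hasattr() works identically on modules and other objects
print(hasattr(math, "sqrt"))               # True
print(hasattr(text, "sqrt"))               # False
```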

Questions

Are there any questions before we move on?