Intermediate Python for Data Science¶

Gus Powers and Jay Cunningham

October and November 2023

Introductions¶

Gus Powers¶

Lead Data Scientist at 84.51°

  • Creating and maintaining data science tools for internal use
  • Python, Bash (shell), & R

Academic

  • BS, Chemistry, Thomas More College
  • MS, Chemistry, University of Cincinnati
  • MS, Business Analytics, University of Cincinnati

Contact

  • GitHub: augustopher
  • LinkedIn: August Powers
  • Email: guspowers0@gmail.com

Jay Cunningham¶

Lead Data Scientist at 84.51°

  • Researching and developing forecasting models
  • Machine learning, Python

Academic

  • BA, Mathematics, University of Kentucky
  • MA, Economics, University of North Carolina (Greensboro)

Contact

  • LinkedIn: Jay Cunningham
  • Email: james@notbadafterall.com

Your Turn¶

We'll go around the room. Please share:

  • Your name
  • Your job or field
  • How you hope to use Python in the future

Course¶

Defining Data Science¶

[Figure: data-science.png]

Data Science and Technology¶

[Figure: data-science-and-tech.png]

Applied Data Science¶

[Figure: applied-data-science.gif]

Course Objectives¶

The following are the primary learning objectives of this course:

  • Learn to use control flow and custom functions to work with data more efficiently.
  • Build awareness and basic skills in working with Python from the shell and its environments.
  • Gain exposure to Python's data science ecosystem and modeling via scikit-learn.
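As a rough preview of the first objective, the sketch below combines a conditional, a loop (written as a list comprehension), and a custom function. The function name, thresholds, and sample values are made up for illustration and are not taken from the course materials.

```python
def label_temperature(temp_f):
    """Return a coarse label for a temperature in degrees Fahrenheit."""
    # Conditions: pick a label based on where the value falls.
    if temp_f < 50:
        return "cold"
    elif temp_f < 80:
        return "mild"
    return "hot"


# Hypothetical sample data, used only for this illustration.
readings = [42, 67, 91]

# Iteration: apply the custom function to every reading.
labels = [label_temperature(t) for t in readings]
print(labels)
```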

Course Agenda¶

| Day | Topic | Time |
|-----|-------|------|
| 1 | Introductions | 12:45 - 1:00 |
|   | Setting the Stage | 1:00 - 1:30 |
|   | Conditions | 1:30 - 2:15 |
|   | Break | 2:15 - 2:30 |
|   | Iterations | 2:30 - 3:45 |
|   | Q&A | 3:45 - 4:15 |
| 2 | Q&A | 12:45 - 1:15 |
|   | Functions | 1:15 - 2:15 |
|   | Applying Functions to pandas Data Frames | 2:15 - 2:45 |
|   | Break | 2:45 - 3:00 |
|   | Case Study, pt. 1 | 3:00 - 4:00 |
|   | Q&A | 4:00 - 4:15 |
| 3 | Q&A | 12:45 - 1:15 |
|   | Case Study Review, pt. 1 | 1:15 - 1:45 |
|   | Python from the Shell | 1:45 - 2:45 |
|   | Break | 2:45 - 3:00 |
|   | Kernels and Environments | 3:00 - 3:45 |
|   | Python Data Science Ecosystem | 3:45 - 4:00 |
|   | Q&A | 4:00 - 4:15 |
| 4 | Q&A | 12:45 - 1:15 |
|   | Modeling with scikit-learn | 1:15 - 2:15 |
|   | Case Study, pt. 2 | 2:15 - 3:30 |
|   | Case Study Review, pt. 2 | 3:30 - 4:00 |
|   | Q&A | 4:00 - 4:15 |

Prerequisites¶

Python¶

  • If you're attending this class, it's assumed you're comfortable with the material covered in the Introduction to Python for Data Science class.
  • As a reminder, that course's objectives are:
    • Develop comprehensive skills in the importing/exporting, wrangling, aggregating and joining of data using Python.
    • Establish a mental model of the Python programming language to enable future self-learning.
    • Build awareness and basic skills in the core data science area of data visualization.

Jupyter¶

  • If you're attending this class, it's assumed you're comfortable with launching and using Python via Jupyter Notebooks.
  • Course materials (slides, case studies, etc.) will be in Jupyter Notebooks, but you're free to use your IDE of choice when completing exercises and case studies.

Technology Setup¶

Binder¶

  • This class is designed to be accessible through Binder -- a cloud-based JupyterLab hosting platform.
  • As a result, no setup is technically required on your part if you would like to use Binder.
  • However, Binder sessions are ephemeral and will not save your work.
    • You can download your notebooks if you want to keep them.
  • Thus, we recommend doing exercises and case studies in your own Python environment if possible.
    • This way you can save your work.

Anaconda¶

  • Anaconda is the easiest way to install Python 3 and Jupyter.
  • If you have not yet installed Anaconda, please follow the directions in the course README.
  • Be sure that all Python packages mentioned in the README are also installed: pandas, numpy, scikit-learn, and seaborn.
  • This Anaconda installation will not be able to natively display the course content as slides, but we recommend it for completing exercises and the case studies.
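One way to confirm that the packages named above are available is a quick check from a notebook cell or Python session. This is a minimal sketch rather than part of the course README; the helper name `missing_packages` is hypothetical. Note that scikit-learn is imported under the name `sklearn`.

```python
from importlib.util import find_spec

# Import names for the packages listed above; scikit-learn is
# imported as "sklearn", so that is the name to look up.
REQUIRED = ["pandas", "numpy", "sklearn", "seaborn"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

If anything is reported as missing, the installation directions in the course README are the place to go next.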

JupyterLab¶

  • JupyterLab is the application that lets us view and edit notebooks.
  • JupyterLab comes with Anaconda.

Course Materials¶

  • All of the material for this course can be reached from our GitHub repository.
  • This repository contains the slides, notebooks, and the training source code.
  • You can access this material either through Binder or by downloading it and opening it via Anaconda Navigator and JupyterLab.

Slides are Notebooks¶

  • I'll be showing the material in slide format.
  • These slides contain the same content as your notebooks, so you can follow along and run cells as we go.

Source Code¶

  • Source code for the training can be found on GitHub.
  • This repository is public so you can clone (download) and/or refer to the materials at any point in the future.

Questions¶

Are there any questions before moving on?