Introduction to Python for Data Science

Instructors

Brad Boehmke

Director, Data Science at 84.51°

  • Productionizing models and science solutions
  • R&D and prototyping new solutions
  • Python, R, & MLOps toolchain

Academic

  • BS, Kinesiology, North Dakota State University
  • MS, Cost Analytics, Air Force Institute of Technology
  • PhD, Logistics, Air Force Institute of Technology

Contact

  • Website: bradleyboehmke.github.io
  • GitHub: bradleyboehmke
  • Twitter: @bradleyboehmke
  • LinkedIn: Brad Boehmke, PhD
  • Email: bradleyboehmke@gmail.com

Ethan Swan

Senior Backend Engineer at ReviewTrackers

  • REST API development
  • Putting ML models in production
  • Python, Go, Ruby, & ReactJS (JavaScript)

Academic

  • BS, Computer Science, University of Notre Dame
  • MBA, Business Analytics, University of Notre Dame

Contact

  • Website: ethanswan.com
  • GitHub: eswan18
  • Twitter: @eswan18
  • LinkedIn: Ethan Swan
  • Email: ethanpswan@gmail.com

Jay Cunningham

Lead Data Scientist at 84.51°

  • Researching and developing forecasting models
  • Machine learning, Python

Academic

  • BA, Mathematics, University of Kentucky
  • MA, Economics, University of North Carolina (Greensboro)

Contact

  • LinkedIn: Jay Cunningham
  • Email: james@notbadafterall.com

Gus Powers

Lead Data Scientist at 84.51°

  • Creating and maintaining data science tools for internal use
  • Python, Bash (shell), & R

Academic

  • BS, Chemistry, Thomas More College
  • MS, Chemistry, University of Cincinnati
  • MS, Business Analytics, University of Cincinnati

Contact

  • GitHub: augustopher
  • LinkedIn: August Powers
  • Email: guspowers0@gmail.com

Around The Room

We'll go around the room. Please share:

  1. Your name
  2. Your job or field
  3. How you use Python now or would like to in the future

Course

Defining Data Science

[Figure: data-science.png]

Data Science and Technology

[Figure: data-science-and-tech.png]

Applied Data Science

[Figure: applied-data-science.gif]

Course Objectives

  1. Build familiarity with the Python language, so that students can solve simple problems with code.
  2. Gain experience with data wrangling tasks: importing/exporting, filtering & selecting, aggregating, and joining data.
  3. Establish a mental model of the Python data science ecosystem to enable learning beyond this course.
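
To make objective 2 concrete, here is a minimal sketch of the wrangling tasks on a tiny, made-up dataset. It uses only the Python standard library so it runs anywhere; the course itself covers these tasks with pandas, which expresses them far more compactly. The data, the column names, and the `read_rows` helper are all hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical data, inlined as CSV text so the example needs no files.
SALES_CSV = """store_id,product,units
1,apples,10
1,pears,4
2,apples,7
3,pears,2
"""
STORES_CSV = """store_id,city
1,Cincinnati
2,Dayton
"""


def read_rows(text):
    """Import: parse CSV text into a list of dicts (one per row)."""
    return list(csv.DictReader(io.StringIO(text)))


sales = read_rows(SALES_CSV)
stores = read_rows(STORES_CSV)

# Filter & select: keep only apple sales, and only two of the columns.
apple_sales = [
    {"store_id": row["store_id"], "units": int(row["units"])}
    for row in sales
    if row["product"] == "apples"
]

# Aggregate: total units sold per store across all products.
units_by_store = defaultdict(int)
for row in sales:
    units_by_store[row["store_id"]] += int(row["units"])

# Join: attach each store's city to its total. This is an inner join on
# store_id, so store 3, which has no city on record, is dropped.
city_by_store = {row["store_id"]: row["city"] for row in stores}
joined = [
    {"store_id": store_id, "city": city_by_store[store_id], "units": total}
    for store_id, total in units_by_store.items()
    if store_id in city_by_store
]

# Export: write the joined rows back out as CSV text.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["store_id", "city", "units"])
writer.writeheader()
writer.writerows(joined)
print(buffer.getvalue())
```

Note that every value read from CSV arrives as a string, which is why `units` is converted with `int` before any arithmetic.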

Course Agenda

Day    Topic                                  Time
Day 1  Introductions                          1:00 - 1:15
       Python, VSCode, & Notebooks            1:15 - 1:45
       Variables, Operators, & Types          1:45 - 2:45
       Break                                  2:45 - 3:00
       Data Structures                        3:00 - 3:30
       Conditional Statements                 3:30 - 4:00
       Q&A                                    4:00 - 4:30
Day 2  Q&A                                    12:30 - 1:00
       Iteration                              1:00 - 1:45
       Writing Functions                      1:45 - 2:30
       Break                                  2:30 - 2:45
       Case Study, pt. 1                      2:45 - 4:00
       Case Study Review; Q&A                 4:00 - 4:30
Day 3  Q&A                                    12:30 - 1:00
       Packages, Libraries, & Modules         1:00 - 2:00
       Importing Data                         2:00 - 2:30
       Break                                  2:30 - 2:45
       Intro to Pandas                        2:45 - 4:00
       Q&A                                    4:00 - 4:30
Day 4  Q&A                                    12:30 - 1:00
       Advanced Pandas                        1:00 - 1:45
       Python from the Shell                  1:45 - 2:30
       Case Study, pt. 2                      2:30 - 3:45
       Case Study Review; Concluding Remarks  3:45 - 4:30

Technologies

Python

  • Python is the programming language we'll be learning in this class.
  • We are using Python 3.10, but the examples also work on 3.11 (and should work on 3.12 once it is released).
  • We will spend a lot of time with the pandas third-party library.

VSCode

  • VSCode is the integrated development environment (IDE) we will be using.
  • This is where we will write and run our Python code.
  • You can use VSCode to edit both notebooks and regular Python scripts.
    • We'll do some of both.

Course Material

  • All of the material for this course can be reached from our GitHub repository.
  • You can download the material and open it on your own computer using VSCode.
  • We have an environment.yml file that can be used with the conda program to set up an environment with the right version of Python and supporting packages we'll use in this class.
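
After creating the environment, one quick sanity check is to confirm the interpreter version and that the key packages can be imported. The sketch below is not part of the course repository; the minimum version (3.10) and the package list (just pandas) are assumptions drawn from the Technologies notes above, and `check_environment` is a hypothetical helper.

```python
import sys
from importlib.util import find_spec

# Assumption: the course targets Python 3.10 or newer.
MIN_VERSION = (3, 10)
# Assumption: pandas is the only package checked here; extend as needed.
REQUIRED_PACKAGES = ["pandas"]


def check_environment(min_version=MIN_VERSION, packages=REQUIRED_PACKAGES):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if sys.version_info[:2] < min_version:
        found = ".".join(str(part) for part in sys.version_info[:3])
        wanted = ".".join(str(part) for part in min_version)
        problems.append(f"Python {wanted}+ expected, found {found}")
    for name in packages:
        # find_spec returns None when a top-level package cannot be located.
        if find_spec(name) is None:
            problems.append(f"package not installed: {name}")
    return problems


for problem in check_environment():
    print(problem)
```

If nothing is printed, the interpreter and packages checked here look ready for the class.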

Slides are notebooks

  • We will be teaching using slides.
  • These slides are created directly from the notebooks in the course repository, so you can follow along and run the code in your own notebook.
    • Almost everything will be the same between the notebooks and the slides.

Source Code

  • Source code for the training can be found on GitHub.
  • This repository is public so you can clone (download) and/or refer to the materials at any point in the future.

Questions

Are there any questions before moving on?