Python and Notebooks¶

Python¶

Python is...¶

  • a high-level, multi-paradigm, dynamically-typed programming language
  • a really good choice for almost any programming task
  • a very popular and effective choice for data science tasks
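As a minimal sketch of what "dynamically-typed" and "multi-paradigm" mean in practice, the snippet below rebinds one name to values of different types, then computes the same sum of squares in three styles. All names here (value, numbers, SquareSummer) are made up for illustration.

```python
# Dynamic typing: the same name can be rebound to values of different types.
value = 42
first_type = type(value).__name__    # "int"
value = "forty-two"
second_type = type(value).__name__   # "str"

numbers = [1, 2, 3]

# Procedural style: an explicit loop that accumulates a total.
total_loop = 0
for n in numbers:
    total_loop += n * n

# Functional style: map a function over the values, then sum the results.
total_functional = sum(map(lambda n: n * n, numbers))


# Object-oriented style: the data and the behavior live together in a class.
class SquareSummer:
    def __init__(self, values):
        self.values = values

    def total(self):
        return sum(v * v for v in self.values)


total_oo = SquareSummer(numbers).total()

print(first_type, second_type, total_loop, total_functional, total_oo)
```

All three totals come out the same; which style reads best depends on the task.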

According to Stack Overflow's 2022 Developer Survey, Python was the 4th most popular language among developers.

  • Two of the three ahead of it -- HTML/CSS and SQL -- are really domain-specific languages rather than general-purpose ones.

SO Developer Survey

Python in the Real World¶

Python is common in many domains:

  • Web development
  • Automation
  • Testing
  • Data science

Python is a very popular language (probably the most popular language) for data science, for a few reasons:

  • Python has a large, rich collection of third-party data wrangling, data analysis, and modeling tools.
    • Pandas is one such library, and we'll cover it in detail during this course.
  • Many deep learning frameworks, such as PyTorch and TensorFlow, have offered support for Python since their early days.
    • This is one of the reasons Python's popularity within data science has kept rising over the last few years.

Why are data scientists choosing Python?¶

  • It's a general-purpose language, so it's already used well beyond data science
  • Consistency across engineering and data science teams
  • Open-source and community support
  • Concise syntax, readability, and ease-of-use
  • Strength in numeric computations and cutting-edge data science libraries

VSCode¶


  • A language-agnostic integrated development environment (IDE)

  • The most popular IDE among developers, according to surveys such as Stack Overflow's Developer Survey

  • Supports notebook files, a common format for data scientists

IDE Popularity

Notebooks in VSCode¶

  • Python notebooks (files with the .ipynb extension) are designed to work with an application called Jupyter.
  • However, VSCode can be used instead, via the Jupyter extension, which is maintained by Microsoft itself.

    • If you're already familiar with VSCode, or planning to use Python beyond just notebooks, it's probably a better option than Jupyter.
  • You'll need a Python environment set up so that VSCode has somewhere to run the code.

    • We'll discuss more about Python environments later in the class.

Example Python Notebook
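Once a notebook is connected to a Python environment, a quick way to see which one is in use is to ask Python itself. The cell below is a minimal sketch that uses only the standard library; the variable names are illustrative.

```python
import sys

# Path of the Python interpreter that is running this code.
interpreter_path = sys.executable

# Version of that interpreter as "major.minor.micro".
python_version = ".".join(str(part) for part in sys.version_info[:3])

print(interpreter_path)
print(python_version)
```

If the printed path is not the environment you expected, the notebook is probably attached to a different interpreter.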

Course File Structure¶

  • VSCode shows a file browser in the left pane (you may have to click the Explorer icon, which looks like a stack of documents).

  • If you open the folder containing the course files, you'll be able to see how it's organized: we have folders such as scripts, notebooks, and .github.

    • However, the only one that matters to you is notebooks -- that's where the content of the training lives.
  • These slides, for example, were created from the notebooks/01-Python-and-Notebooks.ipynb file.

Python Notebooks¶

  • Notebooks let you write AND run Python code in the same place
  • By interleaving code and commentary about the code, notebooks provide excellent documentation for data science tasks

    • Commentary might be: the question being investigated, the reasons for a particular analytical technique, etc.
  • All notebook files have the extension .ipynb (short for "IPython notebook", where IPython stands for "Interactive Python")

You can create new notebooks by making a new file with a .ipynb extension.

When you open it, VSCode is smart enough to infer that it's a notebook based on that suffix.
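For the curious: a .ipynb file is a JSON document. The sketch below builds a very small notebook by hand and writes it to disk, assuming the version 4 notebook format. The file name my-first-notebook.ipynb is hypothetical, and the file goes to a temporary folder. In practice you would let VSCode create and edit the file for you.

```python
import json
import tempfile
from pathlib import Path

# Minimal notebook structure (format version 4.4): one Markdown cell, one code cell.
notebook = {
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# My first notebook"],
        },
        {
            "cell_type": "code",
            "execution_count": None,  # the cell has not been run yet
            "metadata": {},
            "outputs": [],
            "source": ["print(1 + 2)"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

# Hypothetical file name, written to a temporary folder to avoid clutter.
path = Path(tempfile.mkdtemp()) / "my-first-notebook.ipynb"
path.write_text(json.dumps(notebook, indent=1), encoding="utf-8")

# Reading the file back shows that it is ordinary JSON.
loaded = json.loads(path.read_text(encoding="utf-8"))
cell_types = [cell["cell_type"] for cell in loaded["cells"]]
print(path.suffix, cell_types)
```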

Notebook Cells¶

Notebooks are organized into cells, which are the core of a notebook:

Example Notebook Code Cell

  • In a notebook, all Python code is typed into and run from a code cell.
  • Commentary and supporting documentation can be written in Markdown cells, which can also contain LaTeX or HTML (less common).
  • In VSCode, you can add a new cell by hovering over an existing cell and clicking either the "+ Code" button or the "+ Markdown" button.
  • You can see what type of cell you're currently editing by looking in the bottom right corner of the cell, which will say "Markdown" or "Python".

    • You can click on this to toggle between them.

Code Cells¶

Code cells are meant for -- you guessed it -- running Python code. To do so:

  1. Click on a cell's input area
  2. Type Python code into the cell
  3. Press CTRL + RETURN to run the cell (or SHIFT + RETURN to run it and move on to the next cell)

If the last line of a code cell is an expression, its result will automatically be displayed below the cell.

In [1]:
a = 3
b = 4

a * b
Out[1]:
12
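Only the last expression in a cell is displayed automatically. The sketch below reuses a and b to show the difference: an expression in the middle of a cell is evaluated but not shown, while print() displays a value wherever it appears.

```python
a = 3
b = 4

# Not the last line, so a notebook evaluates this but does not display it.
a + b

# print() shows a value no matter where it appears in the cell.
print(a - b)

# The last line is an expression, so a notebook displays its result below the cell.
a * b
```

Run as a notebook cell, this shows the printed -1 followed by the cell's result, 12. The value of a + b never appears.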

Non-code Cells¶

There are a few kinds of non-code cells, but the most common is Markdown.

  • Markdown is a simple language for creating styled text using only plain text

  • These cells are meant for providing supporting commentary around the code cells

  • They also support:

    • LaTeX, for things like math formulas
    • Embedded images

Non-code cells are "rendered" (turned into formatted text) when you press CTRL + RETURN or SHIFT + RETURN.

Keyboard Shortcuts¶

When a cell is selected but you're not editing its text (press ESC to stop editing):

  • a -> create a new cell before the one that's selected

  • b -> create a new cell after the one that's selected

  • m -> switch a code cell into a markdown cell

  • y -> switch a markdown cell into a code cell (think Python)

Your Turn¶

  1. Create a new notebook file and open it.
  2. Create a new markdown cell. Write your name in it.
  3. Create a new code cell. Write print(1 + 2) and run it.

Questions¶

Are there any questions before moving on?