Applying Functions to DataFrames¶

Applied Review¶

Functions¶

  • Functions are blocks of Python code abstracted behind a single name.
  • Functions take inputs and produce outputs.
  • Using functions well means:
    • Naming your functions meaningfully
    • Creating docstrings for your functions
    • Commenting throughout -- as always!
  • Functions allow their arguments to be passed in by order or by name.
  • It's possible to create default values for function arguments using the argument=<value> syntax in the function definition.
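
As a quick refresher, the points above can be sketched in one small function. The name scale and its factor argument are hypothetical, chosen only for illustration:

```python
def scale(value, factor=2):
    '''Multiply value by factor (factor defaults to 2).'''
    # The default for factor is set with the argument=<value> syntax
    return value * factor

by_order = scale(5, 3)              # arguments passed by order
by_name = scale(factor=3, value=5)  # arguments passed by name
with_default = scale(5)             # factor falls back to its default of 2

print(by_order, by_name, with_default)
```

Passing arguments by name lets them appear in any order, which tends to help readability once a function takes more than one or two inputs.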

The Series.apply Method¶

Pandas Series objects have a method called apply that applies a function to each element of the Series.

Let's define a simple function that returns its input plus one.

In [2]:
def add_one(x):
    '''Adds 1 to the input.'''
    return x + 1

And then try applying it to a Series, s.

In [3]:
s = pd.Series([3, 2, 3, 9])
s
Out[3]:
0    3
1    2
2    3
3    9
dtype: int64
In [4]:
s.apply(add_one)
Out[4]:
0     4
1     3
2     4
3    10
dtype: int64

What happened? For each value in s, that value was passed into the add_one function and replaced with the result.
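
One way to picture this is to call the function on each value by hand and compare the result with apply. This is only a sketch of the idea, not how pandas implements apply internally:

```python
import pandas as pd

def add_one(x):
    '''Adds 1 to the input.'''
    return x + 1

s = pd.Series([3, 2, 3, 9])

# Call add_one on each value by hand, keeping the original index
manual = pd.Series([add_one(value) for value in s], index=s.index)

# Let apply do the same work
applied = s.apply(add_one)

# Both approaches give the same Series
print(manual.equals(applied))
```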

In [5]:
def make_it_a_sentence(x):
    string = 'The best number is ' + str(x)
    return string
In [6]:
s.apply(make_it_a_sentence)
Out[6]:
0    The best number is 3
1    The best number is 2
2    The best number is 3
3    The best number is 9
dtype: object

We can apply any function that takes one argument and returns one result.

Of course, we may encounter errors if we expect the wrong kind of argument -- e.g. if we write a function for numbers but apply it to a Series of strings.

In [8]:
s2 = pd.Series(['a', 'e', 'i', 'o', 'u'])
s2.apply(add_one)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/Users/guspowers/repos/uc/intermediate-python-datasci/notebooks/05-Applying_Functions_to_DataFrames.ipynb Cell 23 line 2
      1 s2 = pd.Series(['a', 'e', 'i', 'o', 'u'])
----> 2 s2.apply(add_one)

File /opt/homebrew/anaconda3/envs/uc-python/lib/python3.11/site-packages/pandas/core/series.py:4630, in Series.apply(self, func, convert_dtype, args, **kwargs)
   4520 def apply(
   4521     self,
   4522     func: AggFuncType,
   (...)
   4525     **kwargs,
   4526 ) -> DataFrame | Series:
   4527     """
   4528     Invoke function on values of Series.
   4529 
   (...)
   4628     dtype: float64
   4629     """
-> 4630     return SeriesApply(self, func, convert_dtype, args, kwargs).apply()

File /opt/homebrew/anaconda3/envs/uc-python/lib/python3.11/site-packages/pandas/core/apply.py:1025, in SeriesApply.apply(self)
   1022     return self.apply_str()
   1024 # self.f is Callable
-> 1025 return self.apply_standard()

File /opt/homebrew/anaconda3/envs/uc-python/lib/python3.11/site-packages/pandas/core/apply.py:1076, in SeriesApply.apply_standard(self)
   1074     else:
   1075         values = obj.astype(object)._values
-> 1076         mapped = lib.map_infer(
   1077             values,
   1078             f,
   1079             convert=self.convert_dtype,
   1080         )
   1082 if len(mapped) and isinstance(mapped[0], ABCSeries):
   1083     # GH#43986 Need to do list(mapped) in order to get treated as nested
   1084     #  See also GH#25959 regarding EA support
   1085     return obj._constructor_expanddim(list(mapped), index=obj.index)

File /opt/homebrew/anaconda3/envs/uc-python/lib/python3.11/site-packages/pandas/_libs/lib.pyx:2834, in pandas._libs.lib.map_infer()

/Users/guspowers/repos/uc/intermediate-python-datasci/notebooks/05-Applying_Functions_to_DataFrames.ipynb Cell 23 line 3
      1 def add_one(x):
      2     '''Adds 1 to the input.'''
----> 3     return x + 1

TypeError: can only concatenate str (not "int") to str
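
The last line of the traceback is the important part: add_one tried to evaluate 'a' + 1, which Python does not allow. A minimal sketch, assuming we only want to see the message rather than the full traceback, wraps the call in try/except:

```python
import pandas as pd

def add_one(x):
    '''Adds 1 to the input.'''
    return x + 1

s2 = pd.Series(['a', 'e', 'i', 'o', 'u'])

try:
    s2.apply(add_one)
except TypeError as err:
    # The error comes from add_one itself: a str and an int cannot be added
    message = str(err)
    print('add_one failed:', message)
```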

Our functions can perform more complex logic.

For example, perhaps you want to store the sign of each element (positive, negative, or zero).

In [9]:
def sign(x):
    '''Reduce an input number to its sign (1, -1, or 0).'''
    if x > 0:
        return 1
    elif x < 0:
        return -1
    else:
        # In this case x must be equal to 0
        return 0
In [11]:
s3
Out[11]:
0    13
1   -83
2   -64
3     0
4     4
5   -34
dtype: int64
In [12]:
s3.apply(sign)
Out[12]:
0    1
1   -1
2   -1
3    0
4    1
5   -1
dtype: int64
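
The cell that creates s3 is not shown above. A minimal sketch for reproducing the result, assuming s3 simply holds the six values displayed (the original may have been built differently, for example from random numbers):

```python
import pandas as pd

def sign(x):
    '''Reduce an input number to its sign (1, -1, or 0).'''
    if x > 0:
        return 1
    elif x < 0:
        return -1
    else:
        return 0

# Assumed definition: the values are copied from the output shown above
s3 = pd.Series([13, -83, -64, 0, 4, -34])

signs = s3.apply(sign)
print(signs.tolist())
```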

Your Turn¶

  1. Make a Series that is filled with the letters of your name. Enter the letters in lowercase.
  2. Write a function that transforms an input letter to uppercase.
  3. Apply your new function to the Series.

Additional Arguments to .apply¶

What if the function we pass to apply requires multiple arguments?

For example, the built-in pow function requires two arguments.

In [13]:
pow(2, 3)
Out[13]:
8

How would we raise all elements of our Series to the third power?

apply takes an argument called args for this purpose: additional positional arguments are passed in as a tuple or list, and each one is handed to the function after the Series element itself.

In [14]:
s
Out[14]:
0    3
1    2
2    3
3    9
dtype: int64
In [15]:
# Apply pow(x, 3) to each x in s
s.apply(pow, args=[3])
Out[15]:
0     27
1      8
2     27
3    729
dtype: int64

This is essentially just a more concise version of:

In [16]:
def raise_to_3(x):
    return pow(x, 3)
s.apply(raise_to_3)
Out[16]:
0     27
1      8
2     27
3    729
dtype: int64
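
Besides args, apply also forwards extra keyword arguments to the function, and a lambda is another common way to fix the second argument. The sketch below shows the three spellings side by side; the helper power and its exponent parameter are hypothetical names used only for illustration:

```python
import pandas as pd

s = pd.Series([3, 2, 3, 9])

def power(base, exponent=2):
    '''Hypothetical helper: raise base to exponent.'''
    return base ** exponent

# 1. Extra positional arguments go in args
cubed_args = s.apply(pow, args=(3,))

# 2. Extra keyword arguments are forwarded to the function by name
cubed_kwarg = s.apply(power, exponent=3)

# 3. A lambda fixes the second argument inline
cubed_lambda = s.apply(lambda x: pow(x, 3))

print(cubed_args.tolist(), cubed_kwarg.tolist(), cubed_lambda.tolist())
```

Which form to prefer is mostly a matter of readability; a named function with a docstring is usually the easiest to reuse.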

The DataFrame.apply method¶

Like Series, DataFrames have an apply method.

But in this case, apply applies a function to each row or column of the DataFrame, not each element.

Remember that DataFrame columns and rows are Series -- so the input to the function will be a Series!

Applying Functions to Columns¶

In [17]:
def maximum(column):
    '''Calculates the maximum value in a column'''
    return column.max()
In [19]:
df
Out[19]:
   digit_one  digit_two  digit_three
0          1          7           -7
1          2          1           -4
2          3          5            2
3          6          6           -1
In [20]:
df.apply(maximum)
Out[20]:
digit_one      6
digit_two      7
digit_three    2
dtype: int64

What happened here?

  • The maximum function was applied to each column.

  • The result for each column was a scalar (a single number).

  • All the results were combined into a single Series.
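
The cell that creates df is not shown above. A minimal sketch, assuming df was built directly from the values displayed, reproduces the column-wise result and compares it with the built-in DataFrame.max method:

```python
import pandas as pd

# Assumed definition: the values are copied from the output shown above
df = pd.DataFrame({
    'digit_one': [1, 2, 3, 6],
    'digit_two': [7, 1, 5, 6],
    'digit_three': [-7, -4, 2, -1],
})

def maximum(column):
    '''Calculates the maximum value in a column'''
    return column.max()

# Each column is passed to maximum as a Series; the scalars come back as a Series
col_max = df.apply(maximum)
print(col_max)

# For this particular function, the built-in method gives the same answer
print(col_max.equals(df.max()))
```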

Applying Functions to Rows¶

The apply method is very useful for rows, because individual elements of the row can be accessed by column name using bracket syntax.

By default, apply works on columns, but we can switch to rows with the axis=1 argument.

In [21]:
def formula(row):
    '''Applies a custom formula `a + (b / c)`'''
    result = row['digit_one'] + row['digit_two'] / row['digit_three']
    return result
In [22]:
df
Out[22]:
   digit_one  digit_two  digit_three
0          1          7           -7
1          2          1           -4
2          3          5            2
3          6          6           -1
In [23]:
df.apply(formula, axis=1)
Out[23]:
0    0.00
1    1.75
2    5.50
3    0.00
dtype: float64
  • The formula function was applied to each row and returned a scalar for each.
  • Those scalars were combined into a Series, which is our result.
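
To see what the function receives, it can help to pull out a single row and call formula on it directly. The sketch below assumes df holds the values displayed above (its defining cell is not shown) and cross-checks the result against the same arithmetic done on whole columns:

```python
import pandas as pd

# Assumed definition: the values are copied from the output shown above
df = pd.DataFrame({
    'digit_one': [1, 2, 3, 6],
    'digit_two': [7, 1, 5, 6],
    'digit_three': [-7, -4, 2, -1],
})

def formula(row):
    '''Applies a custom formula `a + (b / c)`'''
    return row['digit_one'] + row['digit_two'] / row['digit_three']

# A single row is a Series whose index is the column names
first_row = df.iloc[0]
first_result = formula(first_row)

# axis=1 passes every row to formula in turn
row_result = df.apply(formula, axis=1)

# The same arithmetic written on whole columns, as a cross-check
column_result = df['digit_one'] + df['digit_two'] / df['digit_three']

print(row_result.tolist())
```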

Applying Functions that Return Series¶

Our last two examples have applied a function to a DataFrame to produce a Series. But what if we wanted to get a DataFrame back instead?

If the applied function returns a Series, all of the resulting Series will be combined back into a DataFrame.

Let's take a look using an updated version of our add_one function.

In [24]:
def add_one(col):
    # Add one to each element of the input column
    new_col = col + 1
    # Return the updated column
    return new_col
In [25]:
df
Out[25]:
   digit_one  digit_two  digit_three
0          1          7           -7
1          2          1           -4
2          3          5            2
3          6          6           -1
In [26]:
df.apply(add_one)
Out[26]:
   digit_one  digit_two  digit_three
0          2          8           -6
1          3          2           -3
2          4          6            3
3          7          7            0
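
The same idea works row by row. In the sketch below, a hypothetical helper named summarize returns a two-element Series for each row, and apply with axis=1 stacks those Series into a DataFrame whose columns come from the returned index. It again assumes df holds the values displayed above:

```python
import pandas as pd

# Assumed definition: the values are copied from the output shown above
df = pd.DataFrame({
    'digit_one': [1, 2, 3, 6],
    'digit_two': [7, 1, 5, 6],
    'digit_three': [-7, -4, 2, -1],
})

def summarize(row):
    '''Hypothetical helper: return the row total and row maximum as a Series.'''
    return pd.Series({'total': row.sum(), 'largest': row.max()})

# Each row produces a Series; together they form a new DataFrame
summary = df.apply(summarize, axis=1)
print(summary)
```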

Your Turn¶

  1. What two types of Pandas objects support the apply method? How are their respective apply methods different?
  2. Create a Series with the values [3, 1, -4, 4, -9]. Apply the built-in Python function abs to your Series.
  3. Given the below DataFrame, write and apply a custom function to it to create a Series that looks like this:
0    Looking North means West is to your left
1    Looking East means North is to your left
2    Looking South means East is to your left
3    Looking West means South is to your left
dtype: object
In [27]:
directions = [('North', 'West'),
              ('East', 'North'),
              ('South', 'East'),
              ('West', 'South')]
directions = pd.DataFrame(directions, columns=['facing', 'leftward'])

Questions¶

Are there any questions before we move on?