argument=<value>
syntax in the function definition.

Series.apply Method

Pandas Series objects have a method called apply that applies a function to each element of the Series.
Let's define a simple function that returns its input plus one.
def add_one(x):
    '''Adds 1 to the input.'''
    return x + 1
And then try applying it to a Series, s.
s = pd.Series([3, 2, 3, 9])
s
0    3
1    2
2    3
3    9
dtype: int64
s.apply(add_one)
0     4
1     3
2     4
3    10
dtype: int64
What happened? Each value in s was passed into the add_one function, and the results were collected into a new Series. s itself is unchanged.
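One way to picture what apply does here is as a loop over the values that keeps the original index. The sketch below is only a mental model, not a description of how pandas implements apply; the names manual and applied are hypothetical.

```python
import pandas as pd

def add_one(x):
    '''Adds 1 to the input.'''
    return x + 1

s = pd.Series([3, 2, 3, 9])

# Call add_one on every value by hand, keeping the original index.
manual = pd.Series([add_one(value) for value in s], index=s.index)

# apply gives the same values as the hand-written loop.
applied = s.apply(add_one)
print(applied.equals(manual))

# The original Series is left untouched.
print(s.tolist())
```

If the results should replace the original values, assign them back, for example `s = s.apply(add_one)`.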
def make_it_a_sentence(x):
    string = 'The best number is ' + str(x)
    return string
s.apply(make_it_a_sentence)
0    The best number is 3
1    The best number is 2
2    The best number is 3
3    The best number is 9
dtype: object
We can apply any function that takes one argument and returns one result.
Of course, we may encounter errors if we expect the wrong kind of argument -- e.g. if we write a function for numbers but apply it to a Series of strings.
s2 = pd.Series(['a', 'e', 'i', 'o', 'u'])
s2.apply(add_one)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
      1 s2 = pd.Series(['a', 'e', 'i', 'o', 'u'])
----> 2 s2.apply(add_one)

(...)

      1 def add_one(x):
      2     '''Adds 1 to the input.'''
----> 3     return x + 1

TypeError: can only concatenate str (not "int") to str
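There are a couple of ways to guard against this kind of mismatch. The sketch below shows two options, under the assumption that skipping or reporting the failure is acceptable for your use case: checking the dtype first with `pd.api.types.is_numeric_dtype`, or attempting the call and catching the `TypeError`. The variable names result and message are hypothetical.

```python
import pandas as pd

def add_one(x):
    '''Adds 1 to the input.'''
    return x + 1

s2 = pd.Series(['a', 'e', 'i', 'o', 'u'])

# Option 1: only apply the numeric function when the Series is numeric.
if pd.api.types.is_numeric_dtype(s2):
    result = s2.apply(add_one)
else:
    result = None
    print('s2 is not numeric; skipping add_one')

# Option 2: try the call and handle the error raised inside add_one.
try:
    s2.apply(add_one)
except TypeError as err:
    message = str(err)
    print('apply failed:', message)
```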
Our functions can perform more complex logic.
For example, perhaps you want to store the sign of each element (positive, negative, or zero):
def sign(x):
    '''Reduce an input number to its sign (+, -, 0)'''
    if x > 0:
        return 1
    elif x < 0:
        return -1
    else:
        # In this case x must be equal to 0
        return 0
s3 = pd.Series([13, -83, -64, 0, 4, -34])
s3
0    13
1   -83
2   -64
3     0
4     4
5   -34
dtype: int64
s3.apply(sign)
0    1
1   -1
2   -1
3    0
4    1
5   -1
dtype: int64
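For a common operation like this one, a vectorized function may already exist. As a sketch, assuming NumPy is installed, `np.sign` produces the same values as the hand-written sign function on this data; the names by_apply and by_numpy are hypothetical.

```python
import numpy as np
import pandas as pd

def sign(x):
    '''Reduce an input number to its sign (+, -, 0)'''
    if x > 0:
        return 1
    elif x < 0:
        return -1
    else:
        return 0

s3 = pd.Series([13, -83, -64, 0, 4, -34])

# Element-by-element, using our own function.
by_apply = s3.apply(sign)

# All at once, using NumPy's built-in sign function.
by_numpy = np.sign(s3)

print(by_apply.tolist() == by_numpy.tolist())
```

Writing your own function is still the way to go when the logic is specific to your problem and no built-in equivalent exists.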
Additional arguments to .apply

What if the function we pass to apply requires multiple arguments?
For example, the built-in pow function requires two arguments.
pow(2, 3)
8
How would we raise all elements of our Series to the third power?
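apply takes an argument called args for this purpose -- additional positional arguments can be passed into it as a tuple or list. Each value of the Series is passed as the first argument, followed by the contents of args.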
s
0    3
1    2
2    3
3    9
dtype: int64
# Apply pow(x, 3) to each x in s
s.apply(pow, args=[3])
0     27
1      8
2     27
3    729
dtype: int64
This is essentially just a more concise version of:
def raise_to_3(x):
    return pow(x, 3)
s.apply(raise_to_3)
0     27
1      8
2     27
3    729
dtype: int64
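There are a few other ways to supply the extra argument. The sketch below uses a hypothetical helper, raise_to, to show that apply also forwards keyword arguments to the function, that a lambda can fix the extra argument inline, and that for simple arithmetic the `**` operator works on the whole Series at once.

```python
import pandas as pd

s = pd.Series([3, 2, 3, 9])

def raise_to(x, exponent):
    '''Hypothetical helper: raise x to the given exponent.'''
    return x ** exponent

# Extra positional arguments go in args.
with_args = s.apply(raise_to, args=(3,))

# Extra keyword arguments are passed through to the function.
with_kwargs = s.apply(raise_to, exponent=3)

# A lambda fixes the second argument of pow inline.
with_lambda = s.apply(lambda x: pow(x, 3))

# For plain arithmetic, operate on the whole Series directly.
vectorized = s ** 3

print(with_args.tolist())
```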
DataFrame.apply method

Like Series, DataFrames have an apply method.
But in this case, apply applies a function to each row or column of the DataFrame, not each element.
Remember that DataFrame columns and rows are Series -- so the input to the function will be a Series!
def maximum(column):
    '''Calculates the maximum value in a column'''
    return column.max()
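df = pd.DataFrame({'digit_one': [1, 2, 3, 6],
                   'digit_two': [7, 1, 5, 6],
                   'digit_three': [-7, -4, 2, -1]})
df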
| | digit_one | digit_two | digit_three |
|---|---|---|---|
| 0 | 1 | 7 | -7 |
| 1 | 2 | 1 | -4 |
| 2 | 3 | 5 | 2 |
| 3 | 6 | 6 | -1 |
df.apply(maximum)
digit_one      6
digit_two      7
digit_three    2
dtype: int64
What happened here?
The maximum function was applied to each column.
The result of each column was a scalar (a single number).
All the results were combined into a single Series.
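These steps can be checked directly. The sketch below rebuilds df from the values in the table above, records the type of each object that maximum receives (the seen_types list is a hypothetical addition for illustration), and compares the result with the built-in `df.max()`.

```python
import pandas as pd

df = pd.DataFrame({
    'digit_one': [1, 2, 3, 6],
    'digit_two': [7, 1, 5, 6],
    'digit_three': [-7, -4, 2, -1],
})

seen_types = []  # hypothetical: remembers what kind of object we were given

def maximum(column):
    '''Calculates the maximum value in a column'''
    seen_types.append(type(column).__name__)
    return column.max()

result = df.apply(maximum)

# Every input to maximum was a Series (one per column).
print(set(seen_types))

# The combined result matches the built-in column-wise maximum.
print(result.equals(df.max()))
```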
The apply method is very useful for rows, because individual elements of the row can be accessed using bracket syntax.
By default, though, apply works on columns -- we can switch to rows with the axis=1 argument.
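To see what the function receives under each setting, the sketch below uses a hypothetical helper, describe, that returns the index labels of its input. With the default axis=0 the input is a column, so its labels are the row labels; with axis=1 the input is a row, so its labels are the column names. The DataFrame is rebuilt from the values shown in the table above.

```python
import pandas as pd

df = pd.DataFrame({
    'digit_one': [1, 2, 3, 6],
    'digit_two': [7, 1, 5, 6],
    'digit_three': [-7, -4, 2, -1],
})

def describe(series):
    '''Hypothetical helper: list the labels of the Series it receives.'''
    return ', '.join(str(label) for label in series.index)

# Default (axis=0): the function is called once per column.
by_column = df.apply(describe)

# axis=1: the function is called once per row.
by_row = df.apply(describe, axis=1)

print(by_column)
print(by_row)
```

Because each row arrives as a Series labeled by column name, expressions such as `row['digit_one']` work inside a function applied with axis=1.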
def formula(row):
    '''Applies a custom formula `a + (b / c)`'''
    result = row['digit_one'] + row['digit_two'] / row['digit_three']
    return result
df
| | digit_one | digit_two | digit_three |
|---|---|---|---|
| 0 | 1 | 7 | -7 |
| 1 | 2 | 1 | -4 |
| 2 | 3 | 5 | 2 |
| 3 | 6 | 6 | -1 |
df.apply(formula, axis=1)
0    0.00
1    1.75
2    5.50
3    0.00
dtype: float64
The formula function was applied to each row and returned a scalar for each.

Our last two examples have applied a function to a DataFrame to produce a Series. But what if we wanted to get a DataFrame back instead?
If the applied function returns a Series, all of the resulting Series will be combined back into a DataFrame.
Let's take a look using an updated version of our add_one function.
def add_one(col):
    # Add one to each element of the input column
    new_col = col + 1
    # Return the updated column
    return new_col
df
| | digit_one | digit_two | digit_three |
|---|---|---|---|
| 0 | 1 | 7 | -7 |
| 1 | 2 | 1 | -4 |
| 2 | 3 | 5 | 2 |
| 3 | 6 | 6 | -1 |
df.apply(add_one)
| | digit_one | digit_two | digit_three |
|---|---|---|---|
| 0 | 2 | 8 | -6 |
| 1 | 3 | 2 | -3 |
| 2 | 4 | 6 | 3 |
| 3 | 7 | 7 | 0 |
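The same rule holds when applying across rows. As a sketch, the hypothetical helper summarize below returns a small Series for each row; with axis=1, the labels of that returned Series become the columns of the resulting DataFrame. The DataFrame is rebuilt from the values shown in the tables above.

```python
import pandas as pd

df = pd.DataFrame({
    'digit_one': [1, 2, 3, 6],
    'digit_two': [7, 1, 5, 6],
    'digit_three': [-7, -4, 2, -1],
})

def summarize(row):
    '''Hypothetical helper: smallest and largest value in a row.'''
    return pd.Series({'smallest': row.min(), 'largest': row.max()})

# One Series comes back per row; pandas stacks them into a DataFrame.
summary = df.apply(summarize, axis=1)
print(summary)
```

The result has one row per original row and one column per label in the returned Series.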
Which pandas objects have an apply method? How are their respective apply methods different?
Create a Series from the list [3, 1, -4, 4, -9]. Apply the built-in Python function abs to your Series.
Using the directions DataFrame defined below, use apply to create the following Series:
0 'Looking North means West is to your left',
1 'Looking East means North is to your left',
2 'Looking South means East is to your left',
3 'Looking West means South is to your left'
dtype: object
directions = [('North', 'West'),
              ('East', 'North'),
              ('South', 'East'),
              ('West', 'South')]
directions = pd.DataFrame(directions, columns=['facing', 'leftward'])
Are there any questions before we move on?