import numpy as np
Masking and aggregation
1 Aggregation
In this section, we introduce aggregation, which reduces an array (or each of its rows or columns) to a single summary value. We also cover a number of the vectorized aggregation functions that come with NumPy.
1.1 Aggregation functions
x = np.arange(12).reshape(3, 4)
x
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
np.sum(x)
66
np.sum(x, axis=0)
array([12, 15, 18, 21])
np.sum(x, axis=1)
array([ 6, 22, 38])
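The axis argument names the dimension that gets collapsed: axis=0 sums down the rows and leaves one value per column, and axis=1 sums across the columns and leaves one value per row. The following minimal sketch recomputes both results with plain Python loops and checks them against np.sum. The variable names are illustrative, and the loop version exists only for comparison because it is much slower than the vectorized call.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
n_rows, n_cols = x.shape

# Illustrative names: col_sums / row_sums are not NumPy functions.
# axis=0 collapses the rows: one sum per column.
col_sums = [int(sum(x[i, j] for i in range(n_rows))) for j in range(n_cols)]
# axis=1 collapses the columns: one sum per row.
row_sums = [int(sum(x[i, j] for j in range(n_cols))) for i in range(n_rows)]

# The explicit loops agree with the vectorized calls.
assert np.array_equal(np.sum(x, axis=0), col_sums)
assert np.array_equal(np.sum(x, axis=1), row_sums)
print(col_sums, row_sums)  # [12, 15, 18, 21] [6, 22, 38]
```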
1.2 Other aggregations
x.sum(axis=1)
array([ 6, 22, 38])
x.max()
11
x.max(axis=1)
array([ 3, 7, 11])
x.min(axis=1)
array([0, 4, 8])
x.prod(axis=1)
array([ 0, 840, 7920])
x.any(axis=1)
array([ True, True, True])
x.all(axis=1)
array([False, True, True])
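The aggregations above are available both as array methods (x.max(axis=1)) and as NumPy functions (np.max(x, axis=1)), and the two forms give the same result. The sketch below checks this. It also shows why x.all(axis=1) is False only for the first row: any and all treat zero as False and every other number as True, so the first row fails because it contains a 0. The name nonzero is only an illustrative variable.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

# The method form and the np.* function form compute the same aggregation.
assert np.array_equal(x.max(axis=1), np.max(x, axis=1))
assert np.array_equal(x.prod(axis=1), np.prod(x, axis=1))

# any/all treat nonzero values as True and zero as False, so they match
# the same reductions applied to the boolean array x != 0.
nonzero = x != 0  # illustrative variable name
assert np.array_equal(x.any(axis=1), nonzero.any(axis=1))
assert np.array_equal(x.all(axis=1), nonzero.all(axis=1))

# Row 0 contains a 0, which is why all() is False for that row only.
print(nonzero[0])
```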
1.3 Statistical metrics as aggregations
np.mean(x)
5.5
np.median(x)
5.5
np.percentile(x, 75)
8.25
np.std(x)
3.452052529534663
np.var(x)
11.916666666666666
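The statistical aggregations can be recomputed by hand from their definitions. The following minimal sketch does this for the 12 values of x. It assumes NumPy's defaults: np.var and np.std divide by N (ddof=0), and np.percentile uses linear interpolation between the two nearest sorted values. The helper variables (flat, pos, p75, and so on) are illustrative names.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
flat = np.sort(x, axis=None)  # all 12 values, flattened and sorted
n = flat.size

# Variance: mean squared deviation from the mean (divides by N, i.e. ddof=0).
mean = flat.sum() / n
var = ((flat - mean) ** 2).sum() / n
assert np.isclose(var, np.var(x))

# Standard deviation is the square root of the variance.
assert np.isclose(np.sqrt(var), np.std(x))

# Median of an even number of values: average of the two middle values (5 and 6).
median = (flat[n // 2 - 1] + flat[n // 2]) / 2
assert median == np.median(x)

# Linear-interpolation percentile: position q/100 * (n - 1) = 0.75 * 11 = 8.25,
# i.e. a quarter of the way from flat[8] = 8 to flat[9] = 9.
pos = 75 / 100 * (n - 1)
lo = int(np.floor(pos))
p75 = flat[lo] + (pos - lo) * (flat[lo + 1] - flat[lo])
assert np.isclose(p75, np.percentile(x, 75))
```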
1.4 Argmin and Argmax
x = np.random.uniform(0, 1, (3, 4)).round(2)  # unseeded, so the values differ on every run
x
array([[0.97, 0.18, 0.87, 0.09],
[0.52, 0.92, 0.12, 0.46],
[0.48, 0.09, 0.85, 0.4 ]])
x.argmin(axis=0)
array([2, 2, 1, 0])
x.argmin(axis=1)
array([3, 2, 1])
x.argmin()
3
np.argmax(x, axis=0)
array([0, 1, 0, 1])
np.argmax(x, axis=1)
array([0, 1, 2])
np.argmax(x)
0
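Without an axis argument, argmin and argmax return a position in the flattened array. For a tie, they return the first occurrence: 0.09 appears twice, and x.argmin() reports 3 rather than 9. The array above is random and unseeded, so this minimal sketch hard-codes its printed (rounded) values to make the indices reproducible. It uses np.unravel_index to turn a flat index back into (row, column) coordinates. It also shows that the per-row argmin indices can select the row minima. The variable names are illustrative.

```python
import numpy as np

# The rounded values printed above, hard-coded so the example is reproducible.
x = np.array([[0.97, 0.18, 0.87, 0.09],
              [0.52, 0.92, 0.12, 0.46],
              [0.48, 0.09, 0.85, 0.40]])

# Without an axis, argmin indexes into the flattened array.
flat_min = x.argmin()
# np.unravel_index converts the flat index into a (row, column) pair.
row, col = np.unravel_index(flat_min, x.shape)
print(f"flat index {flat_min} -> row {row}, column {col}")
assert x.flat[flat_min] == x.min()

# Per-row argmin indices pick out each row's minimum value.
rows = np.arange(x.shape[0])
assert np.array_equal(x[rows, x.argmin(axis=1)], x.min(axis=1))
```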
2 Selection using boolean arrays
To illustrate the concepts of boolean arrays and how to use them for selection, let’s consider an example.
Suppose we record the grades of five students in three subjects:
Name      Index  Math  CS  Biology
Jack      0      90    80  75
Jill      1      93    95  87
Joe       2      67    98  88
Jason     3      77    89  80
Jennifer  4      93    97  95
grades = np.array([
    [90, 80, 75],
    [93, 95, 87],
    [67, 98, 88],
    [77, 89, 80],
    [93, 97, 95],
])
names = np.array([
    'Jack',
    'Jill',
    'Joe',
    'Jason',
    'Jennifer',
])
A boolean array can be obtained with Python's comparison and logical operators, which NumPy overloads to act elementwise on arrays, together with a few NumPy functions:
== for equality
<, >, <= and >= for ordering comparisons
np.logical_not for negation
& for logical and, | for logical or
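Here is a minimal sketch of these operations on two small, made-up arrays, a and b. Each one returns a boolean array with one entry per element. The sketch also shows what happens when the parentheses around the comparisons are dropped. Python then parses the expression as a chained comparison and needs a single truth value for a whole array, so NumPy raises a ValueError.

```python
import numpy as np

# Made-up example arrays, for illustration only.
a = np.array([1, 5, 9])
b = np.array([1, 6, 8])

eq = a == b                         # elementwise equality
lt = a < b                          # elementwise ordering comparison
both = (a >= 5) & (b >= 5)          # elementwise "and"
either = (a >= 5) | (b >= 5)        # elementwise "or"
neither = np.logical_not(either)    # elementwise negation
print(eq, lt, both, either, neither)

# Without parentheses, a >= 5 & b >= 5 is parsed as a >= (5 & b) >= 5,
# a chained comparison that asks for the truth value of a whole array.
raised = False
try:
    unparenthesized = a >= 5 & b >= 5
except ValueError as err:
    raised = True
    print("ValueError:", err)
```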
# here are the math grades
grades[:, 0]
array([90, 93, 67, 77, 93])
# boolean mask of who got A+ in math
grades[:, 0] >= 90
array([ True, True, False, False, True])
# boolean mask can be used as a selection index
names[grades[:, 0] >= 90]
array(['Jack', 'Jill', 'Jennifer'], dtype='<U8')
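Boolean masks also combine naturally with the aggregations from the first section. Under sum and mean, True counts as 1 and False as 0. So summing a mask counts the matching students, and averaging it gives their fraction. The following minimal sketch uses the grades array defined above, and the variable names are illustrative.

```python
import numpy as np

grades = np.array([
    [90, 80, 75],
    [93, 95, 87],
    [67, 98, 88],
    [77, 89, 80],
    [93, 97, 95],
])

math_a_plus = grades[:, 0] >= 90   # illustrative name for the mask

# True counts as 1 and False as 0, so aggregations work directly on masks.
n_a_plus = math_a_plus.sum()       # how many students got A+ in math
share_a_plus = math_a_plus.mean()  # what fraction of students did
print(n_a_plus, share_a_plus)
```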
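We can combine masks with &, | and np.logical_not to express more complex selection conditions. Each comparison must be wrapped in parentheses, because & and | bind more tightly than comparison operators such as >=.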
# boolean mask for A+ in math and CS.
(grades[:, 0] >= 90) & (grades[:, 1] >= 90)
array([False, True, False, False, True])
names[(grades[:, 0] >= 90) & (grades[:, 1] >= 90)]
array(['Jill', 'Jennifer'], dtype='<U8')
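The | operator works the same way for "or" conditions. The following minimal sketch selects students with an A+ in math or in CS (or both). It then uses np.flatnonzero to get the row positions where the mask is True, which is useful when you need indices rather than the selected values. The mask name math_or_cs is illustrative.

```python
import numpy as np

grades = np.array([
    [90, 80, 75],
    [93, 95, 87],
    [67, 98, 88],
    [77, 89, 80],
    [93, 97, 95],
])
names = np.array(['Jack', 'Jill', 'Joe', 'Jason', 'Jennifer'])

# | is elementwise "or": A+ in math or in CS (or both).
math_or_cs = (grades[:, 0] >= 90) | (grades[:, 1] >= 90)
print(names[math_or_cs])

# np.flatnonzero returns the positions where the mask is True.
print(np.flatnonzero(math_or_cs))
```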
# boolean mask for A+ in math and CS, but not in biology
(grades[:, 0] >= 90) & (grades[:, 1] >= 90) & np.logical_not(grades[:, 2] >= 90)
array([False, True, False, False, False])
names[(grades[:, 0] >= 90) &
      (grades[:, 1] >= 90) &
      np.logical_not(grades[:, 2] >= 90)]
array(['Jill'], dtype='<U8')
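Two closing notes tie masking back to aggregation. First, on boolean arrays the ~ operator is elementwise negation and gives the same result as np.logical_not. Second, a mask can select whole rows of grades, and those rows can then be aggregated along an axis. The following minimal sketch shows both, computing the per-subject average of the students with an A+ in both math and CS. The variable names are illustrative.

```python
import numpy as np

grades = np.array([
    [90, 80, 75],
    [93, 95, 87],
    [67, 98, 88],
    [77, 89, 80],
    [93, 97, 95],
])
names = np.array(['Jack', 'Jill', 'Joe', 'Jason', 'Jennifer'])

math_cs_a_plus = (grades[:, 0] >= 90) & (grades[:, 1] >= 90)

# On boolean arrays, ~ is elementwise negation, like np.logical_not.
not_bio = ~(grades[:, 2] >= 90)
assert np.array_equal(not_bio, np.logical_not(grades[:, 2] >= 90))
print(names[math_cs_a_plus & not_bio])

# A mask can select whole rows, which can then be aggregated.
selected = grades[math_cs_a_plus]   # rows for Jill and Jennifer
print(selected.mean(axis=0))        # per-subject average of those students
```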