import numpy as np
from time import time
☑ Vectorized computation
1 Why not use loop
1.1 Arrays are mutable
= np.arange(5)
x_array x_array
array([0, 1, 2, 3, 4])
0] = 99
x_array[
x_array
array([99, 1, 2, 3, 4])
3:6] = [99, 99]
x_array[
x_array
array([99, 1, 2, 99, 99])
1.2 Loops are slow
= np.random.random((1000, 2000))
bigmatrix bigmatrix.size
2000000
#
# a loop based computation of sum of max of rows
#
def sum_of_max_loop(bigmatrix):
= bigmatrix.shape
m,n
= np.full(m, np.nan)
max_values
for row in range(m):
for col in range(n):
= bigmatrix[row, col]
v if col == 0:
= v
max_values[row] else:
= max(max_values[row], v)
max_values[row]
= 0
sum_of_max for row in range(m):
+= max_values[row]
sum_of_max
return sum_of_max
= time()
start = sum_of_max_loop(bigmatrix)
sum_of_max = time() - start
duration_loop
print("sum_of_max =", sum_of_max)
print("Took: %.4f seconds." % duration_loop)
sum_of_max = 999.466897192648
Took: 0.6748 seconds.
1.3 Vectorized computation
def sum_of_max_vec(bigmatrix):
= np.max(bigmatrix, axis=1)
max_values return np.sum(max_values)
= time()
start = sum_of_max_vec(bigmatrix)
sum_of_max = time() - start
duration_vec
print("sum_of_max =", sum_of_max)
print("Took: %.4f seconds." % duration_vec)
print("Speed-up factor: %f" % (duration_loop / duration_vec))
sum_of_max = 999.466897192648
Took: 0.0011 seconds.
Speed-up factor: 622.691529
2 UFunc: Universal Functions
2.1 Unary ufunc
We can think of arrays as a collection of values, organized into a multidimensional grid. Universal functions, aka ufunc, are functions that can be applied to the entire array effciently. When processing the entire array, highly optimized low-level execution plan is used, and thus the efficiency is magnitudes better than the equivalent loop-based computation.
A unary function is defined as a function with just a single input.
A unary ufunc is a function that processes an entire array by applying the unary function to each element of the array. The output is an array of the output values, organized in the same shape as the input array.
#
# An arbitrarily shaped NDArray
#
= np.random.uniform(-100, 100, (3, 2, 2))
x = x.round(2)
x x.shape
(3, 2, 2)
abs(x) np.
array([[[24.88, 27.58],
[ 0.34, 25.56]],
[[50.63, 21.67],
[71.52, 31.08]],
[[45.12, 15.64],
[85.27, 81.13]]])
/ 100 x
array([[[ 0.2488, 0.2758],
[-0.0034, -0.2556]],
[[ 0.5063, 0.2167],
[ 0.7152, -0.3108]],
[[ 0.4512, -0.1564],
[ 0.8527, -0.8113]]])
1 / x
array([[[ 0.04019293, 0.03625816],
[-2.94117647, -0.03912363]],
[[ 0.01975114, 0.04614675],
[ 0.0139821 , -0.03217503]],
[[ 0.02216312, -0.06393862],
[ 0.01172745, -0.0123259 ]]])
/ 100 * 2 * np.pi x
array([[[ 1.5632565 , 1.73290251],
[-0.02136283, -1.60598216]],
[[ 3.18117672, 1.36156626],
[ 4.49373413, -1.95281399]],
[[ 2.83497321, -0.98269018],
[ 5.35767211, -5.09754824]]])
np.sin(x)
array([[[-0.25005904, 0.63987368],
[-0.33348709, -0.41437756]],
[[ 0.35649858, 0.31565662],
[ 0.67179621, 0.32964406]],
[[ 0.90767182, -0.06791096],
[-0.43226075, 0.52388791]]])
2) np.power(x,
array([[[6.1901440e+02, 7.6065640e+02],
[1.1560000e-01, 6.5331360e+02]],
[[2.5633969e+03, 4.6958890e+02],
[5.1151104e+03, 9.6596640e+02]],
[[2.0358144e+03, 2.4460960e+02],
[7.2709729e+03, 6.5820769e+03]]])
abs(x)) np.log(np.
array([[[ 3.21406427, 3.31709087],
[-1.07880966, 3.24102863]],
[[ 3.92454429, 3.07592882],
[ 4.26997713, 3.43656453]],
[[ 3.80932561, 2.74983174],
[ 4.44582269, 4.39605281]]])
2.2 Binary ufunc
A binary function is a function with two parameters, \(f(x,y)\).
Binary ufunc is a universal function that processes two arrays, \(A\) and \(B\), in their entirety.
In the same way as unary ufunc, binary ufunc computes the output elements based on elements in the input arrays. This is done by pairing up the elements of \(A\) and \(B\) by their indexes. Thus, it is assumed that \(A\) and \(B\) have exactly the same shape.
The output array has the same shape as the input arrays.
Note:
The restriction of \(\mathrm{shape}(A) = \mathrm{shape}(B)\) can be relaxed under certain conditions, known as broadcasting.
= np.arange(1, 1+12).reshape(3,4)
A = np.arange(10, 10+12).reshape(3,4)
B
print("A =", A)
print("B =", B)
A = [[ 1 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]]
B = [[10 11 12 13]
[14 15 16 17]
[18 19 20 21]]
+ B A
array([[11, 13, 15, 17],
[19, 21, 23, 25],
[27, 29, 31, 33]])
/ A B
array([[10. , 5.5 , 4. , 3.25 ],
[ 2.8 , 2.5 , 2.28571429, 2.125 ],
[ 2. , 1.9 , 1.81818182, 1.75 ]])
np.divide(B, A)
array([[10. , 5.5 , 4. , 3.25 ],
[ 2.8 , 2.5 , 2.28571429, 2.125 ],
[ 2. , 1.9 , 1.81818182, 1.75 ]])
// A B
array([[10, 5, 4, 3],
[ 2, 2, 2, 2],
[ 2, 1, 1, 1]])
np.floor_divide(B, A)
array([[10, 5, 4, 3],
[ 2, 2, 2, 2],
[ 2, 1, 1, 1]])
** A B
array([[ 10, 121, 1728,
28561],
[ 537824, 11390625, 268435456,
6975757441],
[ 198359290368, 6131066257801, 204800000000000,
7355827511386641]])
np.power(B, A)
array([[ 10, 121, 1728,
28561],
[ 537824, 11390625, 268435456,
6975757441],
[ 198359290368, 6131066257801, 204800000000000,
7355827511386641]])
2.3 Broadcasting
Given a binary ufunc, \(F\). If we are to apply \(F\) to two arrays as \(F(A, B)\), it is required that \(A\) and \(B\) have the same shape.
So, what happens when
\[A.\mathrm{shape} \not= B.\mathrm{shape}\]
In general, this will lead to a runtime error.
We can manually adjust the shape using array transformation (e.g. repeat) to make the shapes identical.
Broadcasting is a mechanism to allow binary ufunc applied to certain cases in which the two input arrays differ in their shapes.
Rules of broadcasting
Any axis with size 1 will be automatically repeated to match the shapes.
If the number of axes of \(A\) and \(B\) do not agree, then the one with fewer axes will automatically have
np.newaxis
added.
A
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]])
= np.array([10, 20, 30, 40]).reshape(1, 4)
B B
array([[10, 20, 30, 40]])
= np.array([10, 20, 30]).reshape(3, 1)
C C
array([[10],
[20],
[30]])
+ B A
array([[11, 22, 33, 44],
[15, 26, 37, 48],
[19, 30, 41, 52]])
+ np.repeat(B, 3, axis=0) A
array([[11, 22, 33, 44],
[15, 26, 37, 48],
[19, 30, 41, 52]])
+ C A
array([[11, 12, 13, 14],
[25, 26, 27, 28],
[39, 40, 41, 42]])
+ np.repeat(C, 4, axis=1) A
array([[11, 12, 13, 14],
[25, 26, 27, 28],
[39, 40, 41, 42]])