import numpy as np
import matplotlib.pyplot as pl
from matplotlib_inline.backend_inline import set_matplotlib_formats
set_matplotlib_formats('svg')
Calculus
1 Derivatives
1.1 Functions
A function can take zero or more inputs.
A function can return a value of any dimension.
Some Definitions
Arity of a function
The number of arguments expected by a function is called its arity. A function \(f(x, y)\) has an arity of 2. Note that each argument can itself be a vector over multiple dimensions. A function can have arity zero, which means it returns a constant value.
- Scalar functions
- Potential field
- Vector field
def f1():
    return 3.1415

f1()

3.1415
def f2(x):
    return x**2 + 3*x + 15

f2(0.5)

16.75
def f3(x, y):
    return (x-1)**2 + (y-2)**2

f3(0.5, -1.5)

12.5
1.2 Derivatives
Given a function \(f:\mathbb{R}\to\mathbb{R}\), the derivative is given by:
\[ f'(x) = \lim_{h \rightarrow 0} \frac{f(x+h) - f(x)}{h} \]
There are several notations for derivatives:
\[ f'(x) = y' = \frac{dy}{dx} = \frac{df}{dx} = \frac{d}{dx} f(x) = Df(x) = D_x f(x) \]
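The limit definition suggests a direct numerical check. Below is a small sketch (not part of the original notes) that approximates \(f'(x)\) with a central difference, a symmetric variant of the definition; `numerical_derivative` is a hypothetical helper name, and `f2` is redefined so the snippet is self-contained.

```python
def numerical_derivative(f, x, h=1e-6):
    # Central difference: symmetric version of the limit definition,
    # (f(x+h) - f(x-h)) / 2h, which converges faster than the one-sided form.
    return (f(x + h) - f(x - h)) / (2 * h)

def f2(x):
    return x**2 + 3*x + 15

# The exact derivative of f2 is 2x + 3, so at x = 0.5 we expect 4.0.
print(numerical_derivative(f2, 0.5))
```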
1.3 Derivatives of elementary functions
- \(DC = 0\) where \(C\) is a constant,
- \(Dx^n = nx^{n-1}\). Note \(n\) need not be an integer.
- \(De^x = e^x\),
- \(D\ln(x) = 1/x\)
- \(D\sin(x) = \cos(x)\)
- \(D\cos(x) = -\sin(x)\)
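Each of these rules can be spot-checked numerically. The sketch below (an addition to the notes, with arbitrarily chosen test points) compares a central-difference estimate against each analytic rule.

```python
import numpy as np

h = 1e-6
x = 0.7

# D sin(x) = cos(x)
assert abs((np.sin(x + h) - np.sin(x - h)) / (2*h) - np.cos(x)) < 1e-8
# D e^x = e^x
assert abs((np.exp(x + h) - np.exp(x - h)) / (2*h) - np.exp(x)) < 1e-8
# D ln(x) = 1/x
assert abs((np.log(x + h) - np.log(x - h)) / (2*h) - 1/x) < 1e-8
# D x^n = n x^(n-1), with a non-integer n = 2.5
n = 2.5
assert abs(((x + h)**n - (x - h)**n) / (2*h) - n * x**(n - 1)) < 1e-8
print("all elementary rules check out")
```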
1.4 Derivatives of arithmetic combinations of functions
Scaling by constant: \(D(c\cdot f) = c\cdot(Df)\)
Addition: \(D(f + g) = Df + Dg\)
Multiplication: \(D(f\cdot g) = (Df)\cdot g + f\cdot(Dg)\)
Quotients:
\[ D\left[\frac{f}{g}\right] = \frac{g(x)\cdot Df(x) - f(x)\cdot Dg(x)}{g(x)^2} \]
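A quick numerical sanity check of the product and quotient rules (an illustrative sketch; `p`, `q`, and the helper `num` are made-up names, with `p` matching `f2` from earlier):

```python
h = 1e-6
x = 1.3

p  = lambda t: t**2 + 3*t + 15   # same polynomial as f2 above
dp = lambda t: 2*t + 3
q  = lambda t: t**3 + 1          # arbitrary second function
dq = lambda t: 3*t**2

prod = lambda t: p(t) * q(t)
quot = lambda t: p(t) / q(t)

def num(f, t):
    # central-difference estimate of f'(t)
    return (f(t + h) - f(t - h)) / (2 * h)

assert abs(num(prod, x) - (dp(x)*q(x) + p(x)*dq(x))) < 1e-5
assert abs(num(quot, x) - (q(x)*dp(x) - p(x)*dq(x)) / q(x)**2) < 1e-6
print("product and quotient rules check out")
```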
1.5 Derivatives of composition of functions
Composition: consider \(f(g(x))\). Let’s define:
- \(u = g(x)\)
- \(y = f(u)\)
Then, we have:
\[ \frac{dy}{dx} = \frac{dy}{du}\cdot\frac{du}{dx} \]
So, we have:
\[ D(f\circ g)(x) = (Df)(g(x))\cdot (Dg)(x) \]
This is known as the chain rule, and sometimes written as:
\[(f\circ g)'(x) = f'(g(x))\cdot g'(x)\]
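The chain rule can likewise be verified numerically. The sketch below (an addition, not from the notes) uses \(f = \sin\) as the outer function and a hypothetical inner function \(g(x) = x^2 + 1\):

```python
import math

h = 1e-6
x = 0.4

g  = lambda t: t**2 + 1    # hypothetical inner function
dg = lambda t: 2*t
f  = math.sin              # outer function
df = math.cos

comp = lambda t: f(g(t))

numeric = (comp(x + h) - comp(x - h)) / (2 * h)
chain   = df(g(x)) * dg(x)   # f'(g(x)) * g'(x)
assert abs(numeric - chain) < 1e-8
print("chain rule checks out")
```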
2 Understanding derivatives
2.1 What’s a derivative?
Think of the slope at each point as the direction to take along the \(x\)-axis in order to increase the value of \(f(x)\).
x = np.linspace(-10, 10)
y = f2(x)
pl.figure(figsize=(4,4)); pl.plot(x, y)
def f2_der(x):
    return 2*x + 3

y_der = f2_der(x)
pl.plot(x, y, '--', x, y_der);
#
# This is a line going through (x0, y0) with slope m
#
def line(x, x0, y0, m):
    b = y0 - m * x0
    return m * x + b

x = np.linspace(-10, 10)
y = f2(x)
x0, y0, m0 = -5, f2(-5), f2_der(-5)
x1, y1, m1 = 2.5, f2(2.5), f2_der(2.5)

pl.plot(x, y, '--',
        x, line(x, x0, y0, m0), '-',
        x, line(x, x1, y1, m1));
2.2 Partial Derivative
The definition:
\[\frac{\partial y}{\partial x_i} = \lim_{h \rightarrow 0} \frac{f(x_1, \ldots, x_{i-1}, x_i+h, x_{i+1}, \ldots, x_n) - f(x_1, \ldots, x_i, \ldots, x_n)}{h}\]
It is expressed by several equivalent notations:
\[\frac{\partial y}{\partial x_i} = \frac{\partial f}{\partial x_i} = f'_{x_i} = f'_i = D_i f = D_{x_i} f\]
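For the function \(f_3\) defined earlier, each partial derivative can be approximated by perturbing one argument while holding the other fixed; a minimal sketch (the variable names are illustrative):

```python
def f3(x, y):
    return (x - 1)**2 + (y - 2)**2

h = 1e-6
x, y = 0.5, -1.5

# Hold y fixed to vary x, and vice versa.
dfdx = (f3(x + h, y) - f3(x - h, y)) / (2 * h)
dfdy = (f3(x, y + h) - f3(x, y - h)) / (2 * h)

# Analytically: df/dx = 2(x-1) = -1,  df/dy = 2(y-2) = -7
print(dfdx, dfdy)
```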
2.3 Gradients
Suppose \(f:\mathbb{R}^n\to\mathbb{R}\) is a potential function over a vector space \(\mathbb{R}^n\).
We write the function as \(f(\mathbf{x})\), with input vector \(\mathbf{x} = (x_1, x_2, \dots, x_n)\). The gradient collects all the partial derivatives into a vector:
\[ \nabla f(\mathbf{x}) = \left[ \begin{array}{c} \frac{\partial f(\mathbf{x})}{\partial x_1} \\ \frac{\partial f(\mathbf{x})}{\partial x_2} \\ \vdots \\ \frac{\partial f(\mathbf{x})}{\partial x_n} \end{array} \right] \]
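This definition translates directly into a generic finite-difference gradient. The sketch below is an illustration, not part of the notes; the helper name `grad` is hypothetical.

```python
import numpy as np

def g(x, y):
    return (x - 1)**2 + (y - 2)**2

def grad(f, v, h=1e-6):
    # Assemble the gradient one partial derivative at a time,
    # perturbing a single coordinate per central difference.
    v = np.asarray(v, dtype=float)
    out = np.zeros_like(v)
    for i in range(v.size):
        e = np.zeros_like(v)
        e[i] = h
        out[i] = (f(*(v + e)) - f(*(v - e))) / (2 * h)
    return out

print(grad(g, [0.0, 0.0]))   # analytically: [2(x-1), 2(y-2)] = [-2, -4]
```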
2.4 Chain Rules
Recall the basic form of the chain rule for a single-variable function:
- \(y = f(u)\) where \(f:\mathbb{R}\to\mathbb{R}\)
- \(u = g(x)\) where \(g:\mathbb{R}\to\mathbb{R}\)
Then,
\[\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx} \]
Now the general form of the chain rule for multivariable functions:
- \(y = f(u_1, u_2, u_3, \dots, u_m)\)
- \(u_i = g_i(x_1, x_2, \dots, x_n)\)
Then,
\[\frac{\partial y}{\partial x_i} = \sum_{j=1}^m \frac{\partial y}{\partial u_j}\frac{\partial u_j}{\partial x_i} \]
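To see the summation at work, take a hypothetical example with \(m = 2\) intermediate variables and a single input \(x\):

```python
import math

h = 1e-6
x = 0.8

# y = u1*u2 with u1 = x^2 and u2 = sin(x)  (m = 2, n = 1)
u1 = lambda t: t**2
u2 = lambda t: math.sin(t)
y  = lambda t: u1(t) * u2(t)

# Sum over the intermediate variables:
#   dy/dx = (dy/du1)*du1/dx + (dy/du2)*du2/dx = u2*2x + u1*cos(x)
chain   = u2(x) * 2*x + u1(x) * math.cos(x)
numeric = (y(x + h) - y(x - h)) / (2 * h)
assert abs(numeric - chain) < 1e-8
print("multivariable chain rule checks out")
```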
2.5 Gradient of potential field
def g(x, y):
    return (x-1)**2 + (y-2)**2

# Let's plot this in 2D
def contour():
    xs = np.linspace(-4, 4, 100)
    ys = np.linspace(-4, 4, 100)
    xx, yy = np.meshgrid(xs, ys)
    z = g(xx, yy)
    pl.contour(xx, yy, z, levels=20);

pl.figure(figsize=(4,4))
contour()
from matplotlib import cm

def bowl(ax):
    xs = np.linspace(-4, 4, 100)
    ys = np.linspace(-4, 4, 100)
    xx, yy = np.meshgrid(xs, ys)
    z = g(xx, yy)
    ax.plot_surface(xx, yy, z, cmap=cm.coolwarm);

fig = pl.figure(figsize=(7,7))
ax = fig.add_subplot(projection='3d')
bowl(ax)
def g_der_x(x, y):
    return 2*(x-1)

def g_der_y(x, y):
    return 2*(y-2)

xs = np.linspace(-4, 4, 10)
ys = np.linspace(-4, 4, 10)
xx, yy = np.meshgrid(xs, ys)
z = g(xx, yy)

u = g_der_x(xx, yy)
v = g_der_y(xx, yy)

fig = pl.figure(figsize=(12,6))
fig.add_subplot(1, 2, 1)
contour(); pl.quiver(xx, yy, u, v)
ax = fig.add_subplot(1, 2, 2, projection='3d'); bowl(ax)
2.6 Interpretation of gradient
Let \(\mathbf{v} = \nabla f(\mathbf{x})\)
The direction of \(\mathbf{v}\) tells us the direction to take to maximize the change in \(f(\mathbf{x}+\Delta\mathbf{x})\).
The magnitude \(\|\mathbf{v}\|\) tells us how much change to expect from a step of length 1, i.e. \(\|\Delta\mathbf{x}\| = 1\), assuming the gradient at \(\mathbf{x}\) applies across the entire step.
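Both claims can be probed numerically for the bowl function \(g\) from above. The sketch below (an addition; the step size and the grid of trial directions are arbitrary choices) checks that the gradient direction increases \(g\) at least as much as any other unit direction, and that the per-unit-length change approximates \(\|\mathbf{v}\|\):

```python
import numpy as np

def g(x, y):
    return (x - 1)**2 + (y - 2)**2

x0 = np.array([0.0, 0.0])
v  = np.array([2*(x0[0] - 1), 2*(x0[1] - 2)])   # analytic gradient at x0

step = 0.01                                      # assumed small step length
unit = v / np.linalg.norm(v)

# Change in g along the gradient direction...
best = g(*(x0 + step * unit)) - g(*x0)

# ...is at least as large as along any other unit direction on a coarse grid.
for theta in np.linspace(0, 2*np.pi, 36, endpoint=False):
    d = np.array([np.cos(theta), np.sin(theta)])
    assert g(*(x0 + step * d)) - g(*x0) <= best + 1e-12

# Per unit length, the change approximates the gradient magnitude ||v||.
print(best / step, np.linalg.norm(v))
```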