import torch
import numpy as np
PyTorch
1 Introduction
PyTorch extends NumPy-style array programming with additional features (a quick sketch follows the list):
- Gradient management
- GPU acceleration (when CUDA-compatible cards are available)
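As a quick sketch of both points (an assumed example, not from the original notes; the variable names are illustrative):
w = torch.ones(3, requires_grad=True)            # gradient tracking requested for w
loss = (w ** 2).sum()
loss.backward()                                  # fills w.grad with 2 * w
device = "cuda" if torch.cuda.is_available() else "cpu"
z = torch.ones(3, device=device)                 # placed on a GPU when one is available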
2 NumPy compatibility
x_np = np.array([1., 2., 3.])
y_torch = torch.tensor([[3., 4., 5.],
                        [6., 7., 8.]])
print('x =', x_np)
print('y =', y_torch)
x = [1. 2. 3.]
y = tensor([[3., 4., 5.],
[6., 7., 8.]])
y_torch @ x_np
tensor([26., 44.], dtype=torch.float64)
We can convert between NumPy arrays and Torch tensors.
y_np = y_torch.numpy()
y_np
array([[3., 4., 5.],
[6., 7., 8.]], dtype=float32)
x_torch = torch.tensor(x_np)
x_torch
tensor([1., 2., 3.], dtype=torch.float64)
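One detail worth noting (an assumed aside): torch.tensor(x_np) copies the data, whereas torch.from_numpy shares memory with the NumPy array, so in-place changes are visible on both sides.
a = np.array([1., 2., 3.])
a_shared = torch.from_numpy(a)    # shares memory with a (no copy)
a_copy = torch.tensor(a)          # independent copy
a[0] = 100.
print(a_shared)                   # tensor([100., 2., 3.], dtype=torch.float64)
print(a_copy)                     # tensor([1., 2., 3.], dtype=torch.float64)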
Broadcasting works the same way for PyTorch tensors.
x_torch + y_torch
tensor([[ 4., 6., 8.],
[ 7., 9., 11.]], dtype=torch.float64)
x_np + y_np
array([[ 4., 6., 8.],
[ 7., 9., 11.]])
3 Scalar tensor
x = torch.tensor(3.1415, dtype=torch.float16)  # float16 rounds 3.1415 to the nearest representable value
x
tensor(3.1406, dtype=torch.float16)
x.shape
torch.Size([])
x.numpy()
array(3.14, dtype=float16)
x.item()
3.140625
4 Gradient computation with autograd
4.1 Computational graph
Consider the expressions:
u = f(x)
v = g(u)
We can express this as a computational graph:
\[ x \overset{f}{\longrightarrow} u \overset{g}{\longrightarrow} v \]
Often we are interested in the gradients of the computed values, \(\nabla u\) and \(\nabla v\). Computing them requires the partial derivatives of \(f(\cdot)\) and of the composition \(g(f(\cdot))\), respectively.
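Spelled out for the chain above, this is just the chain rule:
\[ \frac{\partial v}{\partial x} = \left.\frac{\partial g}{\partial u}\right|_{u = f(x)} \cdot \frac{\partial f}{\partial x} \]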
More complex computational graphs are possible:
      -- u --------- z
     /    \           \
x ---      --- + ------ * --> w
     \        /
      -- v ---
# x: (2, n)
def f(x):
    return x[0] ** 2 + 2 * (x[1] ** 2)

def g(x):
    return (3*(x**2) - 2*x + 10).mean()
x = torch.tensor([[1, 2, 3, 4], [-1, -2, -3, -4]], dtype=torch.float)
x.shape
torch.Size([2, 4])
u = f(x)
v = g(u)
v
tensor(2354.5000)
5 Autograd
If \(x\) is a tensor created with requires_grad=True, then gradients of downstream results with respect to \(x\) are collected in x.grad.
x = torch.tensor([[1, 2, 3, 4], [-1, -2, -3, -4]], dtype=torch.float, requires_grad=True)
x
tensor([[ 1., 2., 3., 4.],
[-1., -2., -3., -4.]], requires_grad=True)
v = g(f(x))
v
tensor(2354.5000, grad_fn=<MeanBackward0>)
v.backward()
x.grad
tensor([[ 8., 70., 240., 572.],
[ -16., -140., -480., -1144.]])
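As a sanity check (not part of the original notes; the helper variables are illustrative), the same numbers follow from applying the chain rule to f and g by hand:
x0, x1 = x.detach()[0], x.detach()[1]
u = x0 ** 2 + 2 * x1 ** 2                # forward pass through f
dv_du = (6 * u - 2) / u.numel()          # derivative of mean(3u^2 - 2u + 10) w.r.t. u
print(torch.stack([dv_du * 2 * x0,       # du/dx[0] = 2 * x[0]
                   dv_du * 4 * x1]))     # du/dx[1] = 4 * x[1]  -> matches x.grad above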
6 Gradient management at a glance
6.1 Zero gradient
x.grad
tensor([[   8.,   70.,  240.,  572.],
        [ -16., -140., -480., -1144.]])
x.grad.zero_()
x.grad
tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.]])
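Zeroing matters because gradients from successive backward calls are added into .grad rather than overwritten; a small assumed example:
t = torch.tensor([1., 2.], requires_grad=True)
for _ in range(2):
    (t ** 2).sum().backward()            # each call adds 2 * t into t.grad
print(t.grad)                            # tensor([4., 8.]) -- twice the single-call gradient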
6.2 Prevent multiple backprop
#
# Forward computation ONCE
#
v = g(f(x))
#
# Backward computation ONCE
#
v.backward()
#
# Cannot have another backward computation (by default)
#
reload(my)
try:
    v.backward()
except Exception as e:
    my.show(e)
Traceback (most recent call last):
  File "...", line 6, in <module>
    v.backward()
  File "/opt/miniconda3/lib/python3.10/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/opt/miniconda3/lib/python3.10/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
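As the error message suggests, a second backward pass works if the graph is retained; a minimal assumed example:
t = torch.tensor([1., 2.], requires_grad=True)
s = (t ** 2).sum()
s.backward(retain_graph=True)            # keep the saved intermediate buffers
s.backward()                             # a second backward now succeeds
print(t.grad)                            # tensor([4., 8.]) -- gradients accumulated twice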
6.3 Prevent recursive forward computation
x = torch.tensor([[1, 2, 3, 4], [-1, -2, -3, -4]], dtype=torch.float, requires_grad=True)
x
tensor([[ 1., 2., 3., 4.],
[-1., -2., -3., -4.]], requires_grad=True)
reload(my)
try:
    x.sub_(1)
except Exception as e:
    my.show(e)
Traceback (most recent call last):
  File "...", line 3, in <module>
    x.sub_(1)
RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
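For comparison (an assumed aside), the out-of-place version of the same update is allowed; it simply creates a new, non-leaf tensor:
y = x - 1                                # out-of-place: allowed, y is a fresh tensor
print(y.requires_grad, y.is_leaf)        # True False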
6.4 Prevent NumPy conversion for gradient-based tensors
x = torch.tensor([[1, 2, 3, 4], [-1, -2, -3, -4]], dtype=torch.float, requires_grad=True)
v = g(f(x))
v
tensor(2354.5000, grad_fn=<MeanBackward0>)
try:
    v.numpy()
except Exception as e:
    my.show(e)
Traceback (most recent call last):
  File "...", line 2, in <module>
    v.numpy()
RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.
6.5 Disable gradient tracking in context
#
# recursive computational graph
#
x = torch.tensor([[1, 2, 3, 4], [-1, -2, -3, -4]], dtype=torch.float, requires_grad=True)
x
tensor([[ 1., 2., 3., 4.],
[-1., -2., -3., -4.]], requires_grad=True)
with torch.no_grad():
    x.sub_(1)
x
tensor([[ 0., 1., 2., 3.],
[-2., -3., -4., -5.]], requires_grad=True)
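This is exactly the pattern of a hand-written gradient-descent step (a minimal assumed sketch; the optimizers used in the next section perform a comparable update internally):
p = torch.tensor([-5., -2.], requires_grad=True)
loss = (p ** 2).sum()
loss.backward()
with torch.no_grad():
    p -= 0.1 * p.grad                    # in-place update without extending the graph
p.grad.zero_()                           # clear the gradient before the next step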
6.6 Detaching nodes from the computational graph
x = torch.tensor([[1, 2, 3, 4], [-1, -2, -3, -4]], dtype=torch.float, requires_grad=True)
v = g(f(x))
v
tensor(2354.5000, grad_fn=<MeanBackward0>)
v.detach()
tensor(2354.5000)
v.detach().numpy()
array(2354.5, dtype=float32)
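Note (assumed illustration): detach returns a tensor with the same value but no autograd history, and leaves the original untouched:
w = v.detach()
print(w.requires_grad)                   # False -- w is cut off from the graph
print(v.requires_grad)                   # True  -- v itself still tracks gradients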
7 Optimization using PyTorch features
import torch.optim as optim
import torch.nn as nn
x = torch.tensor([-5, -2], dtype=torch.float, requires_grad=True)
optimizer = optim.SGD([x], lr=0.1)
for i in range(20):
    optimizer.zero_grad()
    loss = f(x)
    loss.backward()
    optimizer.step()
    print(loss.detach().numpy().round(2))
33.0
18.88
11.28
6.93
4.33
2.73
1.74
1.11
0.71
0.45
0.29
0.18
0.12
0.08
0.05
0.03
0.02
0.01
0.01
0.01
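torch.nn is imported above but not exercised by this loop. As a rough assumed sketch (model, data, and hyperparameters here are illustrative, not from the original notes), the same zero_grad / backward / step pattern also trains a small nn module:
torch.manual_seed(0)
model = nn.Linear(1, 1)
opt = optim.SGD(model.parameters(), lr=0.1)
xs = torch.linspace(-1, 1, 16).unsqueeze(1)
ys = 3 * xs + 0.5                              # synthetic linear targets
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(xs), ys)
    loss.backward()
    opt.step()
print(model.weight.item(), model.bias.item())  # approaches 3 and 0.5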