PyTorch

Author

Ken Pu

1 Introduction

PyTorch extends NumPy with additional features:

  1. Gradient management
  2. GPU acceleration (when CUDA-compatible cards are available); see the device check after the imports below
import torch
import numpy as np
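
A quick way to see whether GPU acceleration is available, and how a tensor is moved between devices. This is a minimal sketch; no CUDA card is assumed to be present.

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Tensors are created on the CPU by default and must be moved explicitly.
z = torch.ones(3, device=device)   # created directly on the chosen device
z = z.to('cpu')                    # move back to the CPU if needed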

2 NumPy compatibility

x_np = np.array([1., 2., 3.])
y_torch = torch.tensor([[3., 4., 5.],
                        [6., 7., 8.]])

print('x =', x_np)
print('y =', y_torch)
x = [1. 2. 3.]
y = tensor([[3., 4., 5.],
        [6., 7., 8.]])
y_torch @ x_np
tensor([26., 44.], dtype=torch.float64)

We can convert between NumPy arrays and Torch tensors.

y_np = y_torch.numpy()
y_np
array([[3., 4., 5.],
       [6., 7., 8.]], dtype=float32)
x_torch = torch.tensor(x_np)
x_torch
tensor([1., 2., 3.], dtype=torch.float64)

Broadcasting works the same way for PyTorch tensors.

x_torch + y_torch
tensor([[ 4.,  6.,  8.],
        [ 7.,  9., 11.]], dtype=torch.float64)
x_np + y_np
array([[ 4.,  6.,  8.],
       [ 7.,  9., 11.]])

3 Scalar tensor

x = torch.tensor(3.1415, dtype=torch.float16)
x
tensor(3.1406, dtype=torch.float16)
x.shape
torch.Size([])
x.numpy()
array(3.14, dtype=float16)
x.item()
3.140625
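
Note that 3.1415 is not exactly representable in float16. The spacing between adjacent float16 values in \([2, 4)\) is \(2^{-9} = 0.001953125\), so the value is rounded to the nearest representable number, \(1608 \times 2^{-9} = 3.140625\); this is why the tensor prints as 3.1406 and .item() returns 3.140625.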

4 Gradient computation with autograd

4.1 Computational graph

Consider the expressions:

u = f(x)
v = g(u)

We can express this as a computational graph:

\[ x \overset{f}{\longrightarrow} u \overset{g}{\longrightarrow} v \]

Often we are interested in the gradients of the computed values, \(\nabla u\) and \(\nabla v\). These require the partial derivatives of \(f(\cdot)\) and of the composition \(g(f(\cdot))\), respectively.
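
For this chain, the gradient of \(v\) with respect to \(x\) follows the chain rule:

\[ \frac{\partial v}{\partial x} = \frac{\partial v}{\partial u} \cdot \frac{\partial u}{\partial x} = g'(f(x)) \, f'(x) \]

Autograd records the graph during the forward pass and applies this rule backwards through it, node by node.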

More complex computational graphs are possible:

       -- u -- z
     /     \    \
x --        --- + -- * --> w
     \             /
       -- v ------
# x: (2, n)

def f(x):
    return x[0] ** 2 + 2 * (x[1] ** 2)

def g(x):
    return (3*(x**2) - 2*x + 10).mean()
x = torch.tensor([[1, 2, 3, 4], [-1, -2, -3, -4]], dtype=torch.float)
x.shape
torch.Size([2, 4])
u = f(x)
v = g(u)
v
tensor(2354.5000)
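
The value can be checked by hand: \(u = f(x) = (3, 12, 27, 48)\), and applying \(3u_i^2 - 2u_i + 10\) elementwise gives \((31, 418, 2143, 6826)\), so

\[ v = \tfrac{1}{4}(31 + 418 + 2143 + 6826) = \tfrac{9418}{4} = 2354.5 \]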

5 Autograd

If \(x\) is a tensor created with requires_grad=True, then PyTorch records every operation applied to it; after calling .backward() on a scalar result, the gradient with respect to \(x\) is accumulated in x.grad.

x = torch.tensor([[1, 2, 3, 4], [-1, -2, -3, -4]], dtype=torch.float, requires_grad=True)
x
tensor([[ 1.,  2.,  3.,  4.],
        [-1., -2., -3., -4.]], requires_grad=True)
v = g(f(x))
v
tensor(2354.5000, grad_fn=<MeanBackward0>)
v.backward()
x.grad
tensor([[    8.,    70.,   240.,   572.],
        [  -16.,  -140.,  -480., -1144.]])
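
This agrees with the analytic gradient. With \(u_i = x_{0i}^2 + 2 x_{1i}^2\) and \(v = \frac{1}{4}\sum_i (3u_i^2 - 2u_i + 10)\),

\[ \frac{\partial v}{\partial x_{0i}} = \frac{6u_i - 2}{4} \cdot 2x_{0i}, \qquad \frac{\partial v}{\partial x_{1i}} = \frac{6u_i - 2}{4} \cdot 4x_{1i} \]

For \(i = 0\): \(u_0 = 3\), so \(\partial v / \partial x_{00} = 4 \cdot 2 = 8\), matching the first entry of x.grad.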

6 Gradient management at a glance

6.1 Zero gradient

x.grad
tensor([[    8.,    70.,   240.,   572.],
        [  -16.,  -140.,  -480., -1144.]])
x.grad.zero_()
x.grad
tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.]])
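
Zeroing matters because .backward() adds to x.grad instead of overwriting it. A minimal sketch (the forward pass is re-run each time, since the graph is freed after backward):

v = g(f(x))
v.backward()     # x.grad now holds the gradient once
v = g(f(x))
v.backward()     # without zero_(), the new gradient is added: x.grad is doubled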

6.2 Prevent multiple backprop

#
# Forward computation ONCE
#
v = g(f(x))

#
# Backward computation ONCE
#
v.backward()
#
# Cannot have another backward computation (by default)
#
reload(my)
try:
    v.backward()
except Exception as e:
    my.show(e)
Traceback (most recent call last):

  File "...", line 6, in 
    v.backward()

  File "/opt/miniconda3/lib/python3.10/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(

  File "/opt/miniconda3/lib/python3.10/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass

RuntimeError: Trying to backward through the graph a second
time (or directly access saved tensors after they have
already been freed). Saved intermediate values of the graph
are freed when you call .backward() or autograd.grad().
Specify retain_graph=True if you need to backward through
the graph a second time or if you need to access saved
tensors after calling backward.
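
As the error message suggests, a second backward pass over the same graph is possible if the saved tensors are kept with retain_graph=True. A minimal sketch:

v = g(f(x))
v.backward(retain_graph=True)   # intermediate values are kept
v.backward()                    # second pass is allowed; gradients accumulate in x.grad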

6.3 Prevent recursive forward computation

x = torch.tensor([[1, 2, 3, 4], [-1, -2, -3, -4]], dtype=torch.float, requires_grad=True)
x
tensor([[ 1.,  2.,  3.,  4.],
        [-1., -2., -3., -4.]], requires_grad=True)
reload(my)
try:
    x.sub_(1)
except Exception as e:
    my.show(e)
Traceback (most recent call last):

  File "...", line 3, in 
    x.sub_(1)

RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
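
The out-of-place version is allowed: it creates a new node in the graph instead of overwriting the leaf.

y = x - 1     # new tensor that records the subtraction; x itself is unchanged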

6.4 Prevent NumPy conversion for gradient-based tensors

x = torch.tensor([[1, 2, 3, 4], [-1, -2, -3, -4]], dtype=torch.float, requires_grad=True)
v = g(f(x))
v
tensor(2354.5000, grad_fn=<MeanBackward0>)
try:
    v.numpy()
except Exception as e:
    my.show(e)
Traceback (most recent call last):

  File "...", line 2, in 
    v.numpy()

RuntimeError: Can't call numpy() on Tensor that requires
grad. Use tensor.detach().numpy() instead.

6.5 Disable gradient tracking in context

#
# recursive computational graph
#

x = torch.tensor([[1, 2, 3, 4], [-1, -2, -3, -4]], dtype=torch.float, requires_grad=True)
x
tensor([[ 1.,  2.,  3.,  4.],
        [-1., -2., -3., -4.]], requires_grad=True)
with torch.no_grad():
    x.sub_(1)

x
tensor([[ 0.,  1.,  2.,  3.],
        [-2., -3., -4., -5.]], requires_grad=True)
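
The same context manager is commonly used for evaluation: anything computed inside it is left out of the graph, so the result carries no grad_fn and can be converted to NumPy directly. A minimal sketch using the f and g defined earlier:

with torch.no_grad():
    v = g(f(x))

v.requires_grad     # False
v.numpy()           # allowed, since v is not attached to any graph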

6.6 Detaching nodes from computational graph

x = torch.tensor([[1, 2, 3, 4], [-1, -2, -3, -4]], dtype=torch.float, requires_grad=True)
v = g(f(x))
v
tensor(2354.5000, grad_fn=<MeanBackward0>)
v.detach()
tensor(2354.5000)
v.detach().numpy()
array(2354.5, dtype=float32)

7 Optimization using PyTorch features

import torch.optim as optim
import torch.nn as nn
x = torch.tensor([-5, -2], dtype=torch.float, requires_grad=True)
optimizer = optim.SGD([x], lr=0.1)

for i in range(20):
    optimizer.zero_grad()
    loss = f(x)
    loss.backward()
    optimizer.step()
    print(loss.detach().numpy().round(2))
33.0
18.88
11.28
6.93
4.33
2.73
1.74
1.11
0.71
0.45
0.29
0.18
0.12
0.08
0.05
0.03
0.02
0.01
0.01
0.01
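
The same pattern carries over to torch.nn modules (imported above but not used yet). Below is a minimal sketch fitting a one-parameter linear model to made-up data; the data, model, and learning rate are illustrative only:

# Hypothetical data following y = 2x + 1.
xs = torch.linspace(-1, 1, 20).reshape(-1, 1)
ys = 2 * xs + 1

model = nn.Linear(1, 1)                       # parameters are created with requires_grad=True
optimizer = optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

for i in range(100):
    optimizer.zero_grad()                     # same gradient management as above
    loss = loss_fn(model(xs), ys)
    loss.backward()
    optimizer.step()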