PyTorch

Author

Ken Pu

1 Introduction

PyTorch is NumPy extended with additional features:

  1. Gradient management
  2. GPU acceleration (when CUDA-compatible cards are available; see the sketch after the imports below)

import torch
import numpy as np
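
For GPU acceleration, here is a minimal sketch (an addition, assuming a CUDA-capable card; otherwise it falls back to the CPU):

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
a = torch.randn(3, 3, device=device)   # allocate directly on the device
b = torch.randn(3, 3).to(device)       # or move an existing tensor to it
c = a @ b                              # runs on the GPU when one is available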

2 NumPy compatibility

x_np = np.array([1., 2., 3.])
y_torch = torch.tensor([[3., 4., 5.],
                        [6., 7., 8.]])

print('x =', x_np)
print('y =', y_torch)
x = [1. 2. 3.]
y = tensor([[3., 4., 5.],
        [6., 7., 8.]])
y_torch @ x_np
tensor([26., 44.], dtype=torch.float64)

We can convert between NumPy arrays and PyTorch tensors.

y_np = y_torch.numpy()
y_np
array([[3., 4., 5.],
       [6., 7., 8.]], dtype=float32)
x_torch = torch.tensor(x_np)
x_torch
tensor([1., 2., 3.], dtype=torch.float64)
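
A side note added here: torch.from_numpy shares memory with the source array, whereas torch.tensor copies the data. A small sketch using a fresh array a_np (a hypothetical name):

a_np = np.array([1., 2., 3.])
a_shared = torch.from_numpy(a_np)   # shares memory with a_np
a_copy = torch.tensor(a_np)         # copies the data
a_np[0] = 100.
print(a_shared)                     # reflects the change to a_np
print(a_copy)                       # unchanged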

Broadcasting works the same way for PyTorch tensors.

x_torch + y_torch
tensor([[ 4.,  6.,  8.],
        [ 7.,  9., 11.]], dtype=torch.float64)
x_np + y_np
array([[ 4.,  6.,  8.],
       [ 7.,  9., 11.]])

3 Scalar tensor

x = torch.tensor(3.1415, dtype=torch.float16)
x
tensor(3.1406, dtype=torch.float16)
x.shape
torch.Size([])
x.numpy()
array(3.14, dtype=float16)
x.item()
3.140625

4 Gradient computation with autograd

4.1 Computational graph

Consider the expressions:

u = f(x)
v = g(u)

We can express this as a computational graph:

\[ x \overset{f}{\longrightarrow} u \overset{g}{\longrightarrow} v \]

Often we are interested in the gradients of the computed values, \(\nabla u\) and \(\nabla v\), with respect to \(x\). These require the partial derivatives of \(f(\cdot)\) and of the composition \(g(f(\cdot))\), respectively.
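
For the scalar chain above, the chain rule gives (a one-line restatement added for clarity):

\[ \frac{\partial v}{\partial x} = \frac{\partial v}{\partial u}\,\frac{\partial u}{\partial x} = g'(f(x))\, f'(x). \]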

More complex computational graphs are possible:

       -- u -- z
     /     \    \
x --        --- + -- * --> w
     \             /
       -- v ------
# x has shape (2, n): row 0 and row 1 each hold n values

def f(x):
    # u_i = x[0, i]**2 + 2 * x[1, i]**2  -- a vector of length n
    return x[0] ** 2 + 2 * (x[1] ** 2)

def g(x):
    # scalar: the mean of 3*x**2 - 2*x + 10 over all entries
    return (3*(x**2) - 2*x + 10).mean()
x = torch.tensor([[1, 2, 3, 4], [-1, -2, -3, -4]], dtype=torch.float)
x.shape
torch.Size([2, 4])
u = f(x)
v = g(u)
v
tensor(2354.5000)
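
As a quick check of this value (a worked example added here): \(u = f(x) = [1+2,\; 4+8,\; 9+18,\; 16+32] = [3, 12, 27, 48]\), and \(g\) averages \(3u^2 - 2u + 10 = [31, 418, 2143, 6826]\), so \(v = 9418/4 = 2354.5\).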

5 Autograd

If \(x\) is a tensor created with requires_grad=True, then PyTorch tracks the operations applied to \(x\), and a backward pass accumulates the gradient with respect to \(x\) in x.grad.

x = torch.tensor([[1, 2, 3, 4], [-1, -2, -3, -4]], dtype=torch.float, requires_grad=True)
x
tensor([[ 1.,  2.,  3.,  4.],
        [-1., -2., -3., -4.]], requires_grad=True)
v = g(f(x))
v
tensor(2354.5000, grad_fn=<MeanBackward0>)
v.backward()
x.grad
tensor([[    8.,    70.,   240.,   572.],
        [  -16.,  -140.,  -480., -1144.]])
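
To verify one entry (a worked check added here): since \(v\) averages 4 terms, \(\partial v/\partial u_i = (6u_i - 2)/4\), and \(\partial u_i/\partial x_{0,i} = 2x_{0,i}\), so \(\partial v/\partial x_{0,0} = \frac{6\cdot 3 - 2}{4}\cdot 2\cdot 1 = 8\), matching the first entry of x.grad. Likewise \(\partial u_i/\partial x_{1,i} = 4x_{1,i}\) gives \(4\cdot 4\cdot(-1) = -16\) for the entry below it.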

6 Gradient management at a glance

6.1 Zero gradient

x.grad
tensor([[    8.,    70.,   240.,   572.],
        [  -16.,  -140.,  -480., -1144.]])
x.grad.zero_()
x.grad
tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.]])
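
The zeroing matters because backward passes accumulate into .grad rather than overwrite it. A minimal sketch with a fresh tensor w (a hypothetical name):

w = torch.tensor([1., 2.], requires_grad=True)
(w ** 2).sum().backward()
print(w.grad)        # tensor([2., 4.])
(w ** 2).sum().backward()
print(w.grad)        # tensor([4., 8.]) -- accumulated, not replaced
w.grad.zero_()       # reset before the next backward pass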

6.2 Prevent multiple backprop

#
# Forward computation ONCE
#
v = g(f(x))

#
# Backward computation ONCE
#
v.backward()
#
# Cannot have another backward computation (by default)
#
reload(my)
try:
    v.backward()
except Exception as e:
    my.show(e)
Traceback (most recent call last):

  File "...", line 6, in 
    v.backward()

  File "/opt/miniconda3/lib/python3.10/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(

  File "/opt/miniconda3/lib/python3.10/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass

RuntimeError: Trying to backward through the graph a second
time (or directly access saved tensors after they have
already been freed). Saved intermediate values of the graph
are freed when you call .backward() or autograd.grad().
Specify retain_graph=True if you need to backward through
the graph a second time or if you need to access saved
tensors after calling backward.
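
As the error message suggests, a second backward pass works if the graph is retained. A minimal sketch with a fresh tensor w (a hypothetical name):

w = torch.tensor([3.], requires_grad=True)
loss = (w ** 2).sum()
loss.backward(retain_graph=True)   # keep the saved intermediate values
loss.backward()                    # a second pass now works; gradients accumulate
print(w.grad)                      # tensor([12.]) = 6 + 6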

6.3 Prevent recursive forward computation

x = torch.tensor([[1, 2, 3, 4], [-1, -2, -3, -4]], dtype=torch.float, requires_grad=True)
x
tensor([[ 1.,  2.,  3.,  4.],
        [-1., -2., -3., -4.]], requires_grad=True)
reload(my)
try:
    x.sub_(1)
except Exception as e:
    my.show(e)
Traceback (most recent call last):

  File "...", line 3, in 
    x.sub_(1)

RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.

6.4 Prevent NumPy conversion for gradient-tracking tensors

x = torch.tensor([[1, 2, 3, 4], [-1, -2, -3, -4]], dtype=torch.float, requires_grad=True)
v = g(f(x))
v
tensor(2354.5000, grad_fn=<MeanBackward0>)
try:
    v.numpy()
except Exception as e:
    my.show(e)
Traceback (most recent call last):

  File "...", line 2, in 
    v.numpy()

RuntimeError: Can't call numpy() on Tensor that requires
grad. Use tensor.detach().numpy() instead.

6.5 Disable gradient tracking in context

#
# recursive computational graph
#

x = torch.tensor([[1, 2, 3, 4], [-1, -2, -3, -4]], dtype=torch.float, requires_grad=True)
x
tensor([[ 1.,  2.,  3.,  4.],
        [-1., -2., -3., -4.]], requires_grad=True)
with torch.no_grad():
    x.sub_(1)

x
tensor([[ 0.,  1.,  2.,  3.],
        [-2., -3., -4., -5.]], requires_grad=True)
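
This is the pattern behind manual parameter updates: the update is an in-place operation on a gradient-tracked leaf, so it must run under no_grad. A minimal sketch of one gradient-descent step (an addition, reusing f from above):

x = torch.tensor([-5., -2.], requires_grad=True)
loss = f(x)                 # forward pass
loss.backward()             # populate x.grad
with torch.no_grad():
    x -= 0.1 * x.grad       # in-place update, outside the graph
x.grad.zero_()              # clear before the next iteration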

6.6 Detaching nodes from computational graph

x = torch.tensor([[1, 2, 3, 4], [-1, -2, -3, -4]], dtype=torch.float, requires_grad=True)
v = g(f(x))
v
tensor(2354.5000, grad_fn=<MeanBackward0>)
v.detach()
tensor(2354.5000)
v.detach().numpy()
array(2354.5, dtype=float32)

7 Optimization using PyTorch features

import torch.optim as optim
import torch.nn as nn
x = torch.tensor([-5, -2], dtype=torch.float, requires_grad=True)
optimizer = optim.SGD([x], lr=0.1)

for i in range(20):
    optimizer.zero_grad()
    loss = f(x)
    loss.backward()
    optimizer.step()
    print(loss.detach().numpy().round(2))
33.0
18.88
11.28
6.93
4.33
2.73
1.74
1.11
0.71
0.45
0.29
0.18
0.12
0.08
0.05
0.03
0.02
0.01
0.01
0.01
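
The loss approaches the minimum of \(f\) at the origin. As a quick follow-up check (not in the original), the final iterate can be inspected:

print(x.detach().numpy().round(3))   # close to [0., 0.] after 20 steps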