xs = [
{"name": "Joe", "grade": 80},
{"name": "Jack", "grade": 85},
{"name": "Jill", "grade": 90},
]
xs
[{'name': 'Joe', 'grade': 80},
{'name': 'Jack', 'grade': 85},
{'name': 'Jill', 'grade': 90}]
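The tuple below (a tensor and its shape) suggests a conversion cell along these lines (a sketch: the name grades and the 1-based index in the first column are assumptions inferred from the output):

import torch

# Hypothetical conversion: pair a 1-based student index with the grade.
grades = torch.tensor([[i + 1, x["grade"]] for i, x in enumerate(xs)],
                      dtype=torch.float)
grades, grades.shape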
(tensor([[ 1., 80.],
[ 2., 85.],
[ 3., 90.]]),
torch.Size([3, 2]))
We can express update_fn as a linear layer of the form
\[ s' = As + Bt + c \]
With \(s\in\mathbb{R}^2\) and \(t\in\mathbb{R}^2\), \[ \left[\begin{array}{c} s[0] + t[1] \\ s[1] + 1 \end{array}\right] = \left[\begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array}\right]s + \left[\begin{array}{cc} 0 & 1 \\ 0 & 0 \end{array}\right]t + \left[\begin{array}{c} 0 \\ 1 \end{array}\right] \]
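As a concrete sketch, we can realize this linear layer as a single nn.Linear acting on the concatenation of s and t, with weight [A | B] and bias c (one possible realization among several):

import torch
import torch.nn as nn

update_fn = nn.Linear(4, 2)  # input [s, t] in R^4, output s' in R^2
with torch.no_grad():
    # weight = [A | B], bias = c, for the example above
    update_fn.weight.copy_(torch.tensor([[1., 0., 0., 1.],
                                         [0., 1., 0., 0.]]))
    update_fn.bias.copy_(torch.tensor([0., 1.]))

s = torch.tensor([3., 5.])
t = torch.tensor([7., 11.])
update_fn(torch.cat([s, t]))  # tensor([14., 6.]) == [s[0] + t[1], s[1] + 1]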
Suppose we have a sequence of vectors:
\[\mathrm{seq}\in\mathbb{R}^{L\times n}\]
We want to design a neural network \(F\) that can accept sequences of different lengths \(L\) as input and produce a fixed-dimensional output:
\[ F(\mathrm{seq})\in\mathbb{R}^k \]
The design of \(F\) is based on the reduce operation:
\[ F(\mathrm{seq}) = \mathbf{REDUCE}(h, \mathrm{seq}, s_0) \]
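Here REDUCE folds the update function h over the sequence, threading the state through each step. In plain Python, the semantics is simply (a sketch of the semantics, not the PyTorch implementation):

def REDUCE(h, seq, s0):
    # s_{i+1} = h(s_i, seq[i]); return the final state.
    s = s0
    for t in seq:
        s = h(s, t)
    return s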
Let's implement the model as a PyTorch module.
This is the non-batch-processing version. Namely, input_sequence is a single sequence of shape (L, dim_input), where L is the length of the sequence, and init_state is a single vector of shape (dim_state,). There is no batch axis.
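Here is a minimal sketch of such a module; the interface matches the description above, while realizing the update as a single linear layer over the concatenated state and input, followed by the activation, is one natural choice:

import torch
import torch.nn as nn

class MyRNN(nn.Module):
    def __init__(self, dim_input, dim_state, activation_fn):
        super().__init__()
        # Linear update s' = activation(A s + B t + c), implemented as one
        # linear layer acting on the concatenation [s, t].
        self.linear = nn.Linear(dim_state + dim_input, dim_state)
        self.activation_fn = activation_fn

    def forward(self, input_sequence, init_state):
        # input_sequence: (L, dim_input); init_state: (dim_state,)
        state = init_state
        for t in input_sequence:
            state = self.activation_fn(self.linear(torch.cat([state, t])))
        return state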
Let's try it out with some random inputs.
Suppose we capture cursor movements on the screen and model them as variable-length sequences of 2D vectors. The learning task is to classify them into four known categories.
Given enough training data, we can use an RNN to perform the classification.
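The random inputs below can be generated along these lines (a sketch: dim_input = 2 matches the 2D vectors, dim_state = 10 matches the 10-dimensional state shown later, and the zero init_state is an assumption):

dim_input = 2    # cursor movements are 2D vectors
dim_state = 10   # inferred from the 10-dimensional state below
init_state = torch.zeros(dim_state)  # assumed zero initial state
seq_1 = torch.randn(5, dim_input)    # one random sequence of length 5
seq_1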
tensor([[ 0.7020, -0.7205],
[-0.2819, 0.5182],
[-0.3640, -0.9410],
[-1.1234, -0.8819],
[ 1.2646, 1.3496]])
rnn = MyRNN(
    dim_input=dim_input,
    dim_state=dim_state,
    activation_fn=nn.ReLU())
final_state = rnn(seq_1, init_state)
final_state
tensor([1.0819, 1.2778, 0.1784, 0.0000, 0.0000, 1.2444, 0.0000, 0.0000, 0.2301,
0.0000], grad_fn=<SqueezeBackward1>)
Since we want to do four-category classification, we add a logistic regression head on top of the final state to compute the logits.
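A sketch of such a head (the name head is illustrative; unsqueeze adds the batch axis implied by the 2D logits output below):

head = nn.Linear(dim_state, 4)           # map the final state to 4 logits
logits = head(final_state.unsqueeze(0))  # (1, dim_state) -> (1, 4)
logits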
tensor([[-0.1173, 0.8585, 0.4113, 0.3788]], grad_fn=<AddmmBackward0>)
Once sufficient training data has been collected, we can train a complete sequence classifier with an appropriate loss function, an optimizer, and the other standard elements of neural network training.
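A minimal sketch of such a training loop, assuming a collection training_data of (sequence, label) pairs; all names here are illustrative:

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(list(rnn.parameters()) + list(head.parameters()))

for seq, label in training_data:       # seq: (L, 2) tensor, label: class id tensor
    optimizer.zero_grad()
    state = rnn(seq, init_state)       # reduce the sequence to a state vector
    logits = head(state.unsqueeze(0))  # (1, 4) logits
    loss = loss_fn(logits, label.unsqueeze(0))
    loss.backward()
    optimizer.step()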
PyTorch provides a built-in RNN module, nn.RNN, which performs batch processing on sequences.
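A sketch of how it can be invoked (the name torch_rnn is illustrative, and nonlinearity='relu' is an assumption to match the ReLU used above):

torch_rnn = nn.RNN(
    input_size=dim_input,
    hidden_size=dim_state,
    nonlinearity='relu',
    batch_first=True)
all_states, final_state = torch_rnn(seq_1.unsqueeze(0))  # batch axis: (1, 5, 2)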
The first return value, all_states, is a tensor containing the state produced at every step of the reduce operation. Since the input sequence has length 5, all_states contains 5 state vectors; thus, its shape is (1, 5, 10).
In general, the shape of all_states is:

- (batch_size, length, dim_out) if we use batch_first=True;
- (length, batch_size, dim_out) if we use batch_first=False (the default).

The second return value, final_state, is the final state vector, which corresponds to the very last state vector in the all_states tensor. The shape of final_state is (num_layers, batch_size, dim_state). In our case, it is (1, 1, 10).
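We can check this by comparing the last state vector in all_states against final_state elementwise (a sketch of the comparison):

final_state[0] == all_states[:, -1, :]  # both of shape (1, 10)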
tensor([[True, True, True, True, True, True, True, True, True, True]])
This confirms that the last state vector in all_states is indeed the final state vector final_state.