# Difference between `output` and `hidden_state` in RNN
Tags: #RNN
- First, think of an RNN as a two-dimensional network: it can have multiple hidden layers, and it also unfolds across multiple time steps along the time dimension.
- `output` is the state of the **last hidden layer** at **every time step**.
- `hidden_state` is the state of **all hidden layers** at the **last time step**.
- `output` is often used as the input to attention in encoder-decoder architectures.
- `hidden_state` is often used to initialize the decoder's hidden state in encoder-decoder architectures (both return values are shown in the sketch below).
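A minimal PyTorch sketch of the two return values; the sizes are arbitrary, chosen only for illustration:

```python
import torch
import torch.nn as nn

# Arbitrary sizes, for illustration only.
seq_len, batch, input_size, hidden_size, num_layers = 5, 3, 10, 20, 2

rnn = nn.GRU(input_size, hidden_size, num_layers)  # sequence-first by default
x = torch.randn(seq_len, batch, input_size)

output, h_n = rnn(x)

# output: the last layer's hidden state at every time step.
print(output.shape)  # (seq_len, batch, hidden_size) -> torch.Size([5, 3, 20])

# h_n: every layer's hidden state at the last time step.
print(h_n.shape)     # (num_layers, batch, hidden_size) -> torch.Size([2, 3, 20])
```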
# Bidirectional Case
The `output` will give you the hidden layer outputs of the network for each time-step, but only for the final layer. This is useful in many applications, particularly encoder-decoders using attention. (These architectures build up a ‘context’ layer from all the hidden outputs, and it is extremely useful to have them sitting around as a self-contained unit.)
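As a rough illustration of why having all time steps sitting around helps, here is one common way attention consumes the encoder `output`. The dot-product scoring below is an assumption for the sketch, not the only choice:

```python
import torch

# Assumed shapes: enc_output from the encoder, dec_hidden for one decoder step.
seq_len, batch, hidden_size = 5, 3, 20
enc_output = torch.randn(seq_len, batch, hidden_size)
dec_hidden = torch.randn(batch, hidden_size)

# Score every encoder time step against the current decoder state (dot product).
scores = (enc_output * dec_hidden.unsqueeze(0)).sum(dim=-1)  # (seq_len, batch)
weights = torch.softmax(scores, dim=0)                       # normalize over time
context = (weights.unsqueeze(-1) * enc_output).sum(dim=0)    # (batch, hidden_size)
```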
The `h_n` will give you the hidden layer outputs for the last time-step only, but for all the layers. Therefore, if and only if you have a single-layer architecture, `h_n` is a strict subset of `output`. Otherwise, `output` and `h_n` intersect, but are not strict subsets of one another. (You will often want these, in an encoder-decoder model, from the encoder in order to jumpstart the decoder.)
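The single-layer claim is easy to check directly (a quick sketch using `nn.GRU`; note that for `nn.LSTM` the second return value is the tuple `(h_n, c_n)`):

```python
import torch
import torch.nn as nn

rnn = nn.GRU(input_size=10, hidden_size=20, num_layers=1)
x = torch.randn(5, 3, 10)
output, h_n = rnn(x)

# With a single unidirectional layer, the last time step of output equals h_n.
print(torch.allclose(output[-1], h_n[0]))  # True
```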
If you are using a bidirectional output and you want to actually verify that part of `h_n` is contained in `output` (and vice versa), you need to understand what PyTorch does behind the scenes in the organization of the inputs and outputs. Specifically, it concatenates a time-reversed input with the time-forward input and runs them together. This is literal. This means that the ‘forward’ output at time T is in the final position of the `output` tensor, sitting right next to the ‘reverse’ output at time 0; if you’re looking for the ‘reverse’ output at time T, it is in the first position.
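That layout can be verified by splitting the bidirectional `output` along its last dimension (forward features first, then reverse); a quick sketch:

```python
import torch
import torch.nn as nn

hidden_size = 20
rnn = nn.GRU(input_size=10, hidden_size=hidden_size, bidirectional=True)
x = torch.randn(5, 3, 10)
output, h_n = rnn(x)             # output: (5, 3, 40), h_n: (2, 3, 20)

fwd = output[..., :hidden_size]  # forward-direction features
rev = output[..., hidden_size:]  # reverse-direction features

# The forward final state sits at the LAST time step of output...
print(torch.allclose(fwd[-1], h_n[0]))  # True
# ...while the reverse final state sits at the FIRST time step.
print(torch.allclose(rev[0], h_n[1]))   # True
```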