
# Transformer

2022-04-27

Tags: #Transformer #Attention #DeepLearning

These notes follow the Transformer chapter of D2L (10.7).1

# Motivation

# Overall Architecture

# Encoder Block

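Each encoder block stacks two sublayers: multi-head self-attention, then a position-wise feed-forward network, each wrapped in a residual connection followed by layer normalization. Below is a minimal self-contained sketch (not D2L's exact code) built on PyTorch's nn.MultiheadAttention; the parameter names follow D2L's conventions, but the class itself is for illustration only.

```python
import torch
from torch import nn

class EncoderBlock(nn.Module):
    """Illustrative Transformer encoder block: self-attention + FFN,
    each followed by add & norm."""
    def __init__(self, num_hiddens, ffn_num_hiddens, num_heads, dropout):
        super().__init__()
        self.attention = nn.MultiheadAttention(
            num_hiddens, num_heads, dropout=dropout, batch_first=True)
        self.ln1 = nn.LayerNorm(num_hiddens)
        self.ffn = nn.Sequential(
            nn.Linear(num_hiddens, ffn_num_hiddens), nn.ReLU(),
            nn.Linear(ffn_num_hiddens, num_hiddens))
        self.ln2 = nn.LayerNorm(num_hiddens)
        self.dropout = nn.Dropout(dropout)

    def forward(self, X):
        # Sublayer 1: multi-head self-attention (queries = keys = values = X)
        Y, _ = self.attention(X, X, X, need_weights=False)
        X = self.ln1(X + self.dropout(Y))
        # Sublayer 2: position-wise feed-forward network
        return self.ln2(X + self.dropout(self.ffn(X)))
```

Since the output shape equals the input shape, blocks like this can be stacked to any depth.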

# Decoder Block

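Each decoder block adds a third sublayer: masked self-attention over the already-generated prefix, then encoder-decoder attention, then the position-wise FFN, each again with add & norm. A training-time sketch under the same assumptions as the encoder block above (D2L's version additionally caches per-step state for autoregressive prediction):

```python
import torch
from torch import nn

class DecoderBlock(nn.Module):
    """Illustrative Transformer decoder block: masked self-attention,
    encoder-decoder attention, and FFN, each followed by add & norm."""
    def __init__(self, num_hiddens, ffn_num_hiddens, num_heads, dropout):
        super().__init__()
        self.self_attention = nn.MultiheadAttention(
            num_hiddens, num_heads, dropout=dropout, batch_first=True)
        self.ln1 = nn.LayerNorm(num_hiddens)
        self.cross_attention = nn.MultiheadAttention(
            num_hiddens, num_heads, dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(num_hiddens)
        self.ffn = nn.Sequential(
            nn.Linear(num_hiddens, ffn_num_hiddens), nn.ReLU(),
            nn.Linear(ffn_num_hiddens, num_hiddens))
        self.ln3 = nn.LayerNorm(num_hiddens)
        self.dropout = nn.Dropout(dropout)

    def forward(self, X, enc_outputs):
        # Causal mask: True entries are masked out, so position i
        # can only attend to positions <= i
        seq_len = X.shape[1]
        mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool,
                                     device=X.device), diagonal=1)
        Y, _ = self.self_attention(X, X, X, attn_mask=mask, need_weights=False)
        X = self.ln1(X + self.dropout(Y))
        # Cross-attention: queries from the decoder, keys/values from the encoder
        Y, _ = self.cross_attention(X, enc_outputs, enc_outputs,
                                    need_weights=False)
        X = self.ln2(X + self.dropout(Y))
        return self.ln3(X + self.dropout(self.ffn(X)))
```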

# Attention: 3 different kinds

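The three kinds are: (1) encoder self-attention, where queries, keys, and values all come from the previous encoder layer; (2) decoder masked self-attention, where a causal mask stops each position from attending to later positions; and (3) encoder-decoder (cross) attention, where queries come from the decoder and keys/values come from the encoder output. The toy sketch below shows all three calls; the shapes are made up, and a real model uses a separate attention module for each role rather than reusing one as done here.

```python
import torch
from torch import nn

batch, src_len, tgt_len, d = 2, 5, 4, 32
attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
src = torch.randn(batch, src_len, d)  # encoder-side representations
tgt = torch.randn(batch, tgt_len, d)  # decoder-side representations

# (1) Encoder self-attention: queries, keys, values all from the source
enc_out, _ = attn(src, src, src)

# (2) Decoder masked self-attention: True entries are masked out,
#     so position i only attends to positions <= i
causal = torch.triu(torch.ones(tgt_len, tgt_len, dtype=torch.bool), diagonal=1)
dec_out, _ = attn(tgt, tgt, tgt, attn_mask=causal)

# (3) Encoder-decoder attention: queries from the decoder,
#     keys and values from the encoder output
cross_out, _ = attn(dec_out, enc_out, enc_out)
print(enc_out.shape, dec_out.shape, cross_out.shape)
```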

# Position-wise Feed-Forward Networks

```python
class PositionWiseFFN(nn.Module):
    """The position-wise feed-forward network."""
    def __init__(self, ffn_num_input, ffn_num_hiddens, ffn_num_outputs,
                 **kwargs):
        super(PositionWiseFFN, self).__init__(**kwargs)
        self.dense1 = nn.Linear(ffn_num_input, ffn_num_hiddens)
        self.relu = nn.ReLU()
        self.dense2 = nn.Linear(ffn_num_hiddens, ffn_num_outputs)

    def forward(self, X):
        # The same two-layer MLP is applied independently at every position
        return self.dense2(self.relu(self.dense1(X)))
```
```python
# The example in D2L
ffn_num_input, ffn_num_hiddens = 32, 64
```

The original paper's description of the FFN dimensions: each layer computes $\mathrm{FFN}(x) = \max(0,\, xW_1 + b_1)\,W_2 + b_2$, with input/output dimensionality $d_{\text{model}} = 512$ and inner-layer dimensionality $d_{ff} = 2048$.
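A quick shape check, mirroring D2L's example: the FFN transforms only the last (feature) dimension and leaves the batch and sequence dimensions untouched.

```python
import torch

ffn = PositionWiseFFN(4, 4, 8)
ffn.eval()
# (batch=2, steps=3, features=4) -> features become 8 at every position
print(ffn(torch.ones((2, 3, 4))).shape)  # torch.Size([2, 3, 8])
```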

# Add & Norm

# Residual Connection

# Layer Norm
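Layer normalization normalizes across the feature dimension within each example, while batch normalization normalizes each feature across the batch. Because LayerNorm does not depend on batch statistics, it behaves the same for variable-length sequences and at batch size 1, which is why Transformers use it. A small comparison, mirroring the one in D2L:

```python
import torch
from torch import nn

ln = nn.LayerNorm(2)
bn = nn.BatchNorm1d(2)
X = torch.tensor([[1, 2], [2, 3]], dtype=torch.float32)
# LayerNorm normalizes each row (per example, across features);
# BatchNorm normalizes each column (per feature, across the batch)
print('layer norm:', ln(X))
print('batch norm:', bn(X))
```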

# Implementation

```python
class AddNorm(nn.Module):
    """Layer normalization after a residual connection."""
    def __init__(self, normalized_shape, dropout, **kwargs):
        super(AddNorm, self).__init__(**kwargs)
        self.dropout = nn.Dropout(dropout)
        self.ln = nn.LayerNorm(normalized_shape)

    def forward(self, X, Y):
        # Dropout on the sublayer output Y, add the residual X, then LayerNorm
        return self.ln(self.dropout(Y) + X)
```
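The residual connection requires X and Y to have the same shape, and the output keeps that shape. A quick check in the spirit of D2L's example:

```python
import torch

add_norm = AddNorm([3, 4], 0.5)  # normalized_shape covers the last two dims
add_norm.eval()  # disable dropout for a deterministic check
print(add_norm(torch.ones((2, 3, 4)), torch.ones((2, 3, 4))).shape)
# torch.Size([2, 3, 4])
```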

# Input Preprocessing: Positional Encoding & Embedding

For the input we need to do two things: (1) map each token to an embedding vector, and (2) add positional encodings, since self-attention by itself is order-invariant.

# Implementation

```python
# Since positional encoding values are between -1 and 1, the embedding
# values are multiplied by the square root of the embedding dimension
# to rescale before they are summed up
X = self.pos_encoding(self.embedding(X) * math.sqrt(self.num_hiddens))
```
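Here pos_encoding is the sinusoidal PositionalEncoding defined in the previous D2L section; a sketch of it for completeness, assuming an even num_hiddens:

```python
import torch
from torch import nn

class PositionalEncoding(nn.Module):
    """Sinusoidal positional encoding (D2L-style; assumes even num_hiddens)."""
    def __init__(self, num_hiddens, dropout, max_len=1000):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        # P[0, i, 2j] = sin(i / 10000^(2j/d)), P[0, i, 2j+1] = cos(same)
        self.P = torch.zeros((1, max_len, num_hiddens))
        pos = torch.arange(max_len, dtype=torch.float32).reshape(-1, 1)
        div = torch.pow(10000, torch.arange(
            0, num_hiddens, 2, dtype=torch.float32) / num_hiddens)
        self.P[:, :, 0::2] = torch.sin(pos / div)
        self.P[:, :, 1::2] = torch.cos(pos / div)

    def forward(self, X):
        X = X + self.P[:, :X.shape[1], :].to(X.device)
        return self.dropout(X)
```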

# Putting it all together
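As a sketch of the assembly (the class name TinyTransformerEncoder is hypothetical, and D2L's full TransformerEncoder/TransformerDecoder additionally handle valid-length masking and per-step decoder state), the pieces above compose like this:

```python
import math
import torch
from torch import nn

class TinyTransformerEncoder(nn.Module):
    """Hypothetical minimal encoder built from the pieces sketched above:
    embedding + positional encoding + a stack of EncoderBlocks."""
    def __init__(self, vocab_size, num_hiddens, ffn_num_hiddens,
                 num_heads, num_layers, dropout):
        super().__init__()
        self.num_hiddens = num_hiddens
        self.embedding = nn.Embedding(vocab_size, num_hiddens)
        self.pos_encoding = PositionalEncoding(num_hiddens, dropout)
        self.blocks = nn.ModuleList([
            EncoderBlock(num_hiddens, ffn_num_hiddens, num_heads, dropout)
            for _ in range(num_layers)])

    def forward(self, X):
        # Rescale embeddings by sqrt(d) before adding positional encodings
        X = self.pos_encoding(self.embedding(X) * math.sqrt(self.num_hiddens))
        for blk in self.blocks:
            X = blk(X)
        return X

encoder = TinyTransformerEncoder(vocab_size=200, num_hiddens=32,
                                 ffn_num_hiddens=64, num_heads=4,
                                 num_layers=2, dropout=0.1)
tokens = torch.randint(0, 200, (2, 10))   # (batch, sequence length)
print(encoder(tokens).shape)              # torch.Size([2, 10, 32])
```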

More illustrations: see Jay Alammar's The Illustrated Transformer.4

# Further Development

For the original announcement and follow-up directions, see the Google AI Blog post introducing the Transformer.5


  1. 10.7. Transformer — Dive into Deep Learning 0.17.5 documentation ↩︎

  2. 10.7. Transformer — 动手学深度学习 2.0.0-beta0 documentation ↩︎

  3. 10.7. Transformer — 动手学深度学习 2.0.0-beta0 documentation ↩︎

  4. The Illustrated Transformer – Jay Alammar – Visualizing machine learning one concept at a time. ↩︎

  5. Google AI Blog: Transformer: A Novel Neural Network Architecture for Language Understanding ↩︎