# Part 22: Model Representation - Neural Network (ML, Andrew Ng)
Last updated: Sep 11, 2021
Tags: #NeuralNetwork #MachineLearning
# Hypothesis
The output of a single neuron (an activation unit) is given by:

$$h_\Theta(x) = a = g(x_0\theta_0 + x_1\theta_1 + \cdots + x_n\theta_n)$$

The argument of $g$ is a linear combination of the inputs, and $g(x)$ is the sigmoid function.
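As a quick illustration, here is a minimal NumPy sketch of this formula; the weights and inputs are made up for the example, and the input vector is assumed to already contain the bias term $x_0 = 1$:

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(theta, x):
    """Output of a single neuron: a = g(theta . x).

    x is assumed to already include the bias term x0 = 1.
    """
    return sigmoid(theta @ x)

# Hypothetical weights and inputs, purely to illustrate the formula.
x = np.array([1.0, 0.5, -1.2, 2.0])        # x0 = 1 (bias), x1..x3
theta = np.array([0.1, -0.4, 0.7, 0.05])   # theta0..theta3
print(neuron_output(theta, x))
```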
The structure of a neural network is shown below:

As the figure shows, each layer is made up of many nodes, and the layers are stacked one after another to form a network, giving a structure more expressive than the linear hypothesis.

A further question to think about: why must the network be organized layer by layer? Why not as a general graph?
The weights ($\theta$) feeding into each node form a row vector, and stacking these row vectors gives the weight matrix $\Theta^{(j)}$ of that layer.

Taking the figure above as an example (three inputs, one hidden layer of three units, and a single output unit):
$$
\begin{aligned}
a_{1}^{(2)} &= g\left(\Theta_{10}^{(1)} x_{0}+\Theta_{11}^{(1)} x_{1}+\Theta_{12}^{(1)} x_{2}+\Theta_{13}^{(1)} x_{3}\right) \\
a_{2}^{(2)} &= g\left(\Theta_{20}^{(1)} x_{0}+\Theta_{21}^{(1)} x_{1}+\Theta_{22}^{(1)} x_{2}+\Theta_{23}^{(1)} x_{3}\right) \\
a_{3}^{(2)} &= g\left(\Theta_{30}^{(1)} x_{0}+\Theta_{31}^{(1)} x_{1}+\Theta_{32}^{(1)} x_{2}+\Theta_{33}^{(1)} x_{3}\right) \\
h_{\Theta}(x) &= a_{1}^{(3)}=g\left(\Theta_{10}^{(2)} a_{0}^{(2)}+\Theta_{11}^{(2)} a_{1}^{(2)}+\Theta_{12}^{(2)} a_{2}^{(2)}+\Theta_{13}^{(2)} a_{3}^{(2)}\right)
\end{aligned}
$$
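A vectorized sketch of these four equations, assuming the figure's 3-3-1 layout; the weights here are random placeholders, used only to show how the matrices act on the activations:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Theta1, Theta2):
    """Forward propagation for the 3-3-1 example network.

    x      : shape (3,)   -- raw inputs x1..x3 (bias added here)
    Theta1 : shape (3, 4) -- weights from layer 1 to layer 2
    Theta2 : shape (1, 4) -- weights from layer 2 to layer 3
    """
    a1 = np.concatenate(([1.0], x))     # add bias unit x0 = 1
    a2 = sigmoid(Theta1 @ a1)           # a1^(2), a2^(2), a3^(2)
    a2 = np.concatenate(([1.0], a2))    # add bias unit a0^(2) = 1
    h = sigmoid(Theta2 @ a2)            # h_Theta(x) = a1^(3)
    return h

# Hypothetical random weights, only to check that the shapes line up.
rng = np.random.default_rng(0)
Theta1 = rng.normal(size=(3, 4))
Theta2 = rng.normal(size=(1, 4))
print(forward(np.array([0.2, -0.5, 1.0]), Theta1, Theta2))
```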
The dimensions of these weight matrices are determined as follows:

If the network has $s_j$ units in layer $j$ and $s_{j+1}$ units in layer $j+1$, then $\Theta^{(j)}$ will be of dimension $s_{j+1} \times (s_j + 1)$.
The $+1$ comes from the addition in $\Theta^{(j)}$ of the "bias nodes," $x_0$ and $\Theta_0^{(j)}$. In other words, the output nodes will not include the bias nodes, while the inputs will.
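A small sketch of this dimension rule; the layer sizes exclude bias units, so `[3, 3, 1]` corresponds to the example network above:

```python
def theta_dims(layer_sizes):
    """Dimensions of each Theta^(j): s_{j+1} x (s_j + 1).

    layer_sizes excludes bias units, e.g. [3, 3, 1] for the network above.
    """
    return [(layer_sizes[j + 1], layer_sizes[j] + 1)
            for j in range(len(layer_sizes) - 1)]

# For the 3-3-1 network: Theta^(1) is 3 x 4, Theta^(2) is 1 x 4.
print(theta_dims([3, 3, 1]))   # [(3, 4), (1, 4)]
```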