
# Model Representation - NN

2021-09-11

Tags: #NeuralNetwork #MachineLearning

# Hypothesis

The output of a single neuron (Neuron / Activation Unit) is given by:

$$h_\Theta(x) = a = g(x_0\theta_0 + x_1\theta_1 + \cdots + x_n\theta_n)$$

The argument of $g$ is a linear combination of the inputs ($g(z)$ is the Sigmoid function).
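As a minimal sketch of this computation (the weight and input values below are made up purely for illustration):

```python
import numpy as np

def sigmoid(z):
    """Logistic (Sigmoid) activation g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([-1.0, 0.5, 2.0])   # weights theta_0..theta_2 (illustrative values)
x = np.array([1.0, 3.0, -0.5])       # inputs, with x_0 = 1 as the bias input

a = sigmoid(theta @ x)               # a = g(theta^T x), the neuron's activation
print(a)
```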

The structure of a neural network is as follows: each layer (Layer) is composed of many nodes, and these layers of nodes stack one after another into a network, giving a structure far more expressive than the linear hypothesis.

A further question to ponder: why must the units be organized layer by layer? Why not an arbitrary graph structure?

The weights (Weights, i.e. $\theta$) feeding each node form a row vector, and all of these row vectors together make up the layer's weight matrix $\Theta^{(i)}$.

Taking the network above as an example (three inputs $x_1, x_2, x_3$ plus the bias $x_0$, one hidden layer of three units, and a single output unit):

$$
\begin{aligned}
a_{1}^{(2)} &= g\left(\Theta_{10}^{(1)} x_{0}+\Theta_{11}^{(1)} x_{1}+\Theta_{12}^{(1)} x_{2}+\Theta_{13}^{(1)} x_{3}\right) \\
a_{2}^{(2)} &= g\left(\Theta_{20}^{(1)} x_{0}+\Theta_{21}^{(1)} x_{1}+\Theta_{22}^{(1)} x_{2}+\Theta_{23}^{(1)} x_{3}\right) \\
a_{3}^{(2)} &= g\left(\Theta_{30}^{(1)} x_{0}+\Theta_{31}^{(1)} x_{1}+\Theta_{32}^{(1)} x_{2}+\Theta_{33}^{(1)} x_{3}\right) \\
h_{\Theta}(x) &= a_{1}^{(3)} = g\left(\Theta_{10}^{(2)} a_{0}^{(2)}+\Theta_{11}^{(2)} a_{1}^{(2)}+\Theta_{12}^{(2)} a_{2}^{(2)}+\Theta_{13}^{(2)} a_{3}^{(2)}\right)
\end{aligned}
$$
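A sketch of this forward pass in NumPy, assuming randomly initialized weights (the shapes follow the dimension rule discussed below: $\Theta^{(1)}$ is $3 \times 4$ and $\Theta^{(2)}$ is $1 \times 4$):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
Theta1 = rng.standard_normal((3, 4))  # layer 1 (3 units + bias) -> layer 2 (3 units)
Theta2 = rng.standard_normal((1, 4))  # layer 2 (3 units + bias) -> layer 3 (1 unit)

x = np.array([0.2, -1.0, 0.7])        # the three input features x_1..x_3

a1 = np.concatenate(([1.0], x))       # prepend bias unit x_0 = 1
a2 = sigmoid(Theta1 @ a1)             # hidden-layer activations a^{(2)}
a2 = np.concatenate(([1.0], a2))      # prepend bias unit a_0^{(2)} = 1
h = sigmoid(Theta2 @ a2)              # h_Theta(x) = a^{(3)}
print(h)
```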

The dimensions of these weight matrices are determined as follows:

If a network has $s_j$ units in layer $j$ and $s_{j+1}$ units in layer $j+1$, then $\Theta^{(j)}$ will be of dimension $s_{j+1} \times (s_j + 1)$.

The $+1$ comes from the addition in $\Theta^{(j)}$ of the "bias nodes," $x_0$ and $\Theta_0^{(j)}$. In other words, the output nodes will not include the bias nodes, while the inputs will.
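A tiny helper (hypothetical, just to make the rule concrete): a layer with 2 units feeding a layer with 4 units gives a $4 \times 3$ weight matrix.

```python
def theta_shape(s_j, s_next):
    """Dimension of Theta^{(j)} per the rule above: s_{j+1} x (s_j + 1)."""
    return (s_next, s_j + 1)

print(theta_shape(2, 4))  # (4, 3): 4 output units, 2 input units + 1 bias
print(theta_shape(3, 3))  # (3, 4): Theta^{(1)} of the network above
print(theta_shape(3, 1))  # (1, 4): Theta^{(2)} of the network above
```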