Cyan's Blog

Normal_Equation_Proof_2_Matrix_Method

Last updated Oct 15, 2022

# First, some matrix background: derivatives and properties of the trace

Matrix derivatives and the matrix trace are closely linked.

# Matrix derivatives

See the separate note on matrix derivatives.

# The matrix trace

See the separate note on properties of the matrix trace.

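# Some other properties needed in the proof

Combined with the definition of the matrix derivative, we also have the following property: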

$$\begin{aligned} \nabla_{A^T} f(A) &=\left[\begin{array}{ccc} \frac{\partial f}{\partial (A^T)_{11}} & \cdots & \frac{\partial f}{\partial (A^T)_{1 m}} \\ \vdots & \ddots & \vdots \\ \frac{\partial f}{\partial (A^T)_{n 1}} & \cdots & \frac{\partial f}{\partial (A^T)_{n m}} \end{array}\right]\\ &=\left[\begin{array}{ccc} \frac{\partial f}{\partial A_{11}} & \cdots & \frac{\partial f}{\partial A_{1 n}} \\ \vdots & \ddots & \vdots \\ \frac{\partial f}{\partial A_{m 1}} & \cdots & \frac{\partial f}{\partial A_{m n}} \end{array}\right]^T\\ &=\left(\nabla_{A} f(A)\right)^{T} \end{aligned}$$

Here $A$ is $m \times n$, so $A^T$ is $n \times m$ and $(A^T)_{ij}=A_{ji}$. The second equality is just this re-indexing of the partial derivatives.
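The transpose property can be sanity-checked numerically. The sketch below is only an illustration: the helper `numerical_gradient`, the test function `f`, and the weights `W` are made up for this check and are not part of the original notes. It estimates the gradient by central differences, once treating $A$ as the variable and once treating $A^T$ as the variable, and compares the two.

```python
import numpy as np

def numerical_gradient(f, M, eps=1e-6):
    """Central-difference estimate of the matrix of partials df/dM_ij."""
    grad = np.zeros(M.shape)
    for idx in np.ndindex(M.shape):
        step = np.zeros(M.shape)
        step[idx] = eps
        grad[idx] = (f(M + step) - f(M - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 2))   # A is m x n with m = 3, n = 2
W = rng.normal(size=(3, 2))   # arbitrary weights, used only to define f

def f(A):
    # An arbitrary smooth scalar function of A, chosen for illustration.
    return np.sum(W * A**2) + np.sum(A) ** 2

# Gradient with respect to A (shape m x n).
grad_A = numerical_gradient(f, A)

# Gradient with respect to A^T: the variable is M = A^T, so f is evaluated at M^T.
grad_AT = numerical_gradient(lambda M: f(M.T), A.T)

transpose_property_holds = np.allclose(grad_AT, grad_A.T, atol=1e-6)
print(grad_AT.shape, transpose_property_holds)
```

A finite-difference check like this does not prove the identity, but it catches index mix-ups quickly.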

# Now the proof

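A sum of squares can be written as an inner product, $z^{T} z=\sum_{i} z_{i}^{2}$, so:

$$ \begin{aligned} \frac{1}{2}(X \theta-\vec{y})^{T}(X \theta-\vec{y}) &=\frac{1}{2} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)^{2} \\ &=J(\theta) \end{aligned} $$

To minimize $J$, we take its gradient with respect to $\theta$. Applying the transpose property above to the trace identity $\nabla_{A} \operatorname{tr} A B A^{T} C=C A B+C^{T} A B^{T}$, we have:

$$ \nabla_{A^{T} } \operatorname{tr} A B A^{T} C=B^{T} A^{T} C^{T}+B A^{T} C $$

This will be used below. Taking the gradient:

$$\begin{aligned} \nabla_{\theta} J(\theta) &=\nabla_{\theta} \frac{1}{2}(X \theta-\vec{y})^{T}(X \theta-\vec{y}) \\ \text{(expand)}&=\frac{1}{2} \nabla_{\theta}\left(\theta^{T} X^{T} X \theta-\theta^{T} X^{T} \vec{y}-\vec{y}^{T} X \theta+\vec{y}^{T} \vec{y}\right) \\ \text{(the trace of a scalar is the scalar itself)}&=\frac{1}{2} \nabla_{\theta} \operatorname{tr}\left(\theta^{T} X^{T} X \theta-\theta^{T} X^{T} \vec{y}-\vec{y}^{T} X \theta+\vec{y}^{T} \vec{y}\right) \\ \text{(} \operatorname{tr} A=\operatorname{tr} A^{T} \text{; } \vec{y}^{T} \vec{y} \text{ does not depend on } \theta \text{)}&=\frac{1}{2} \nabla_{\theta}\left(\operatorname{tr} \theta^{T} X^{T} X \theta-2 \operatorname{tr} \vec{y}^{T} X \theta\right) \\ \text{(using the corollary above)}&=\frac{1}{2}\left(X^{T} X \theta+X^{T} X \theta-2 X^{T} \vec{y}\right) \\ &=X^{T} X \theta-X^{T} \vec{y} \end{aligned}$$

In the second-to-last step, the first term uses the corollary with $A^{T}=\theta$, $B=B^{T}=X^{T} X$ and $C=I$, and the second term uses $\nabla_{A} \operatorname{tr} A B=B^{T}$.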
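Both the trace identity and the final gradient formula can be checked numerically. The following is a minimal sketch on random data, not part of the original derivation: the helper `numerical_gradient`, the matrix sizes, and the random seed are all assumptions made for illustration. In the first check, `M` plays the role of $A^T$.

```python
import numpy as np

def numerical_gradient(f, V, eps=1e-6):
    """Central-difference estimate of the partials of scalar f w.r.t. each entry of V."""
    grad = np.zeros(V.shape)
    for idx in np.ndindex(V.shape):
        step = np.zeros(V.shape)
        step[idx] = eps
        grad[idx] = (f(V + step) - f(V - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(1)

# 1) Trace identity: with M = A^T (q x p), B (q x q), C (p x p),
#    tr(A B A^T C) = tr(M^T B M C) and the claimed gradient is B^T M C^T + B M C.
p, q = 4, 3
M = rng.normal(size=(q, p))
B = rng.normal(size=(q, q))
C = rng.normal(size=(p, p))
trace_numeric = numerical_gradient(lambda M: np.trace(M.T @ B @ M @ C), M)
trace_analytic = B.T @ M @ C.T + B @ M @ C
trace_identity_holds = np.allclose(trace_numeric, trace_analytic, atol=1e-6)

# 2) Gradient of the least-squares cost J(theta) = 1/2 (X theta - y)^T (X theta - y).
m, n = 6, 3
X = rng.normal(size=(m, n))
y = rng.normal(size=m)
theta = rng.normal(size=n)

def J(theta):
    residual = X @ theta - y
    return 0.5 * residual @ residual

grad_numeric = numerical_gradient(J, theta)
grad_analytic = X.T @ X @ theta - X.T @ y
gradient_matches = np.allclose(grad_numeric, grad_analytic, atol=1e-6)

print(trace_identity_holds, gradient_matches)
```

Because both functions are quadratic in their argument, central differences are exact up to rounding, so a tight tolerance is reasonable here.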

Setting the gradient to zero gives $X^{T} X \theta=X^{T} \vec{y}$, so, provided $X^{T} X$ is invertible, $\theta=(X^{T} X )^{-1}X^{T} \vec{y}$. Q.E.D.
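As a quick illustration of the result, the normal equation can be solved on synthetic data and compared against a library least-squares routine. This is a sketch under assumptions: the data, `true_theta`, and the noise level are invented for the example, and it assumes $X^T X$ is invertible (which holds for this random `X`). It calls `np.linalg.solve` on $X^T X \theta = X^T \vec{y}$ rather than forming the inverse explicitly, which is usually the numerically safer choice.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 50, 3

# Design matrix with an intercept column of ones (synthetic data for illustration).
X = np.column_stack([np.ones(m), rng.normal(size=(m, n - 1))])
true_theta = np.array([1.0, -2.0, 0.5])
y = X @ true_theta + 0.01 * rng.normal(size=m)

# Normal equation: solve X^T X theta = X^T y without computing the inverse.
theta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Reference solution from NumPy's least-squares solver.
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# The gradient X^T X theta - X^T y should vanish at the solution.
gradient_at_solution = X.T @ X @ theta_normal - X.T @ y

print(theta_normal)
print(np.allclose(theta_normal, theta_lstsq))
```

When $X^T X$ is singular or badly conditioned, `np.linalg.lstsq` (or a pseudoinverse) is the more robust route, since the closed form above no longer applies directly.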