D2L-5-拓展链式法则
# 拓展的求导链式法则
Tags: #Math #Derivative
- 从标量到向量, 不仅符号需要对应上, 矩阵的形状也需要对应上
$$\begin{align} &\frac{\partial y}{\partial \mathbf{x}}=\frac{\partial y}{\partial u} \frac{\partial u}{\partial \mathbf{x}}\\ &\small{(1, n)}\quad{(1,1)(1,n)} \end{align}$$
$$\begin{align} &\frac{\partial y}{\partial \mathbf{x}}=\frac{\partial y}{\partial \mathbf{u}} \frac{\partial \mathbf{u}}{\partial \mathbf{x}}\\ &\small{(1, n)}\quad{(1,k)(k,n)} \end{align}$$
$$\begin{align} &\frac{\partial \mathbf{y}}{\partial \mathbf{x}}=\frac{\partial \mathbf{y}}{\partial \mathbf{u}} \frac{\partial \mathbf{u}}{\partial \mathbf{x}}\\ &\small{(m, n)}\quad{(m,k)(k,n)} \end{align}$$
# 例子: 线性回归
# 单个样本点的损失
$$\mathbf{x}, \mathbf{w} \in \mathbb{R}^{n}, \quad y \in \mathbb{R},\quad z=(\langle\mathbf{x}, \mathbf{w}\rangle-y)^{2}$$
- 计算: $$\frac{\partial z}{\partial \mathbf{w}}$$
解:
- 利用链式法则:
首先进行变量替换:
- $a=\langle\mathbf{x}, \mathbf{w}\rangle$ , 为标量
- $b=\langle\mathbf{x}, \mathbf{w}\rangle-y=a-y$, 为标量
- $z=a^2$, 为标量
我们有: $$\begin{aligned} \frac{\partial z}{\partial \mathbf{w}} &=\frac{\partial z}{\partial b} \frac{\partial b}{\partial a} \frac{\partial a}{\partial \mathbf{w}} \\ &=\frac{\partial b^{2}}{\partial b} \frac{\partial (a-y)}{\partial a} \frac{\partial\langle\mathbf{x}, \mathbf{w}\rangle}{\partial \mathbf{w}} \\ (\because \langle\mathbf{x}, \mathbf{w}\rangle=\mathbf{x^T\cdot w})&=2 b \cdot 1 \cdot \mathbf{x}^{T} \\ &=2(\langle\mathbf{x}, \mathbf{w}\rangle-y) \mathbf{x}^{T} \end{aligned}$$
# n个样本点的损失
$$\mathbf{X} \in \mathbb{R}^{m \times n}, \quad \mathbf{w} \in \mathbb{R}^{n}, \quad \mathbf{y} \in \mathbb{R}^{m}$$ $$ z=|\mathbf{X} \mathbf{w}-\mathbf{y}|^{2} $$
- 计算: $$\frac{\partial z}{\partial \mathbf{w}}$$
解:
- 利用链式法则:
首先进行变量替换:
- $\mathbf a=\mathbf{Xw},\quad \mathbf{a} \in \mathbb{R}^{m}$
- $\mathbf b=\mathbf{a-y,\quad b} \in \mathbb{R}^{m}$
- $z=|\mathbf{b}|^2$, 为标量
我们有: $$\begin{aligned} \frac{\partial z}{\partial \mathbf{w}} &=\frac{\partial z}{\partial \mathbf{b}} \frac{\partial \mathbf{b}}{\partial \mathbf{a}} \frac{\partial \mathbf{a}}{\partial \mathbf{w}} \\ &=\frac{\partial|\mathbf{b}|^{2}}{\partial \mathbf{b}} \frac{\partial \mathbf{a}-\mathbf{y}}{\partial \mathbf{a}} \frac{\partial \mathbf{X} \mathbf{w}}{\partial \mathbf{w}} \\ &=2 \mathbf{b}^{T} \times \mathbf{I} \times \mathbf{X} \\ &=2(\mathbf{X} \mathbf{w}-\mathbf{y})^{T} \mathbf{X} \end{aligned}$$ 检查维度是否匹配: $$\begin{aligned} \frac{\partial z}{\partial \mathbf{w}} &=\frac{\partial z}{\partial \mathbf{b}} \frac{\partial \mathbf{b}}{\partial \mathbf{a}} \frac{\partial \mathbf{a}}{\partial \mathbf{w}} \\ \small{\frac{1\times 1}{n\times 1}}&=\small{\frac{1\times 1}{m\times 1}\frac{m\times 1}{m\times 1}\frac{m\times 1}{n\times 1}}\\small{1\times n}&=\small{(1\times m)\space (m\times m) (m\times n)}\end{aligned}$$