Cyan's Blog

Search

Search IconIcon to open search

D2L-4-矩阵求导

Last updated Feb 1, 2022 Edit Source

# 矩阵求导

2022-02-01

Tags: #Math #Matrix

# 从标量到向量

400 其中 $\Large{\frac{\partial y}{\partial x}, \frac{\partial \mathbf y}{\partial x}}$都很好理解, 尤其需要注意的是当求导的自变量$\mathbf x$为向量的时候, 为 $$\mathbf{x}=\left[\begin{array}{c} x_{1} \\ x_{2} \\ \vdots \\ x_{n} \end{array}\right] \quad \frac{\partial y}{\partial \mathbf{x}}=\left[\frac{\partial y}{\partial x_{1}}, \frac{\partial y}{\partial x_{2}}, \ldots, \frac{\partial y}{\partial x_{n}}\right]$$ 结果变成了一个行向量.

# Useful Results

# $\frac{\partial |\mathbf{x}|^2}{\partial \mathbf x}=2\mathbf{x^T}$

$$\mathbf{x}=\left[\begin{array}{c} x_{1} \x_{2} \\vdots \x_{n} \end{array}\right]\quad |\mathbf{x}|^2=\sum^n_1x_i^2$$ $$\begin{aligned} \frac{\partial|\mathbf{x}|^2}{\partial \mathbf{x}}&= \left[\frac{\partial x_i^2}{\partial x_{1}}, \frac{\partial x_i^2}{\partial x_{2}}, \ldots, \frac{\partial x_i^2}{\partial x_{n}}\right]\\ &=\left[2x_1, 2x_{2}, \ldots, 2x_{n}\right]\\ &=2\mathbf{x^T} \end{aligned}$$

# 进一步

而当$\mathbf{x, y}$都是向量的时候, 可以这样理解: $$\begin{aligned} &\mathbf{x}=\left[\begin{array}{c} x_{1} \\ x_{2} \\ \vdots \\ x_{n} \end{array}\right] \quad \mathbf{y}=\left[\begin{array}{c} y_{1} \\ y_{2} \\ \vdots \\ y_{m} \end{array}\right] \\ \end{aligned}$$ $$\frac{\partial \mathbf{y}}{\partial \mathbf{x}}= \left[\begin{array}{c} \frac{\partial y_{1}}{\partial \mathbf{x}} \\ \frac{\partial y_{2}}{\partial \mathbf{x}} \\ \vdots \\ \frac{\partial y_{m}}{\partial \mathbf{x}} \end{array}\right]=\begin{bmatrix} \frac{\partial y_{1}}{\partial x_{1}}& \frac{\partial y_{1}}{\partial x_{2}}& \ldots& \frac{\partial y_{1}}{\partial x_{n}} \\ \frac{\partial y_{2}}{\partial x_{1}}& \frac{\partial y_{2}}{\partial x_{2}}& \ldots& \frac{\partial y_{2}}{\partial x_{n}} \\ &&\vdots \\ \frac{\partial y_{m}}{\partial x_{1}}& \frac{\partial y_{m}}{\partial x_{2}}& \ldots& \frac{\partial y_{m}}{\partial x_{n}} \end{bmatrix}$$

也就是说: $$\mathbf{x} \in \mathbb{R}^{n}, \quad \mathbf{y} \in \mathbb{R}^{m}, \quad \frac{\partial \mathbf{y}}{\partial \mathbf{x}} \in \mathbb{R}^{m \times n}$$

# 直观含义

求导后得到梯度向量, 为增长最快的方向

# Useful Results

# $\frac{\partial \mathbf x}{\partial \mathbf x}=I$

$$\frac{\partial \mathbf{x}}{\partial \mathbf{x}}= \begin{bmatrix} \frac{\partial x_{1}}{\partial \mathbf{x}} \\ \frac{\partial x_{2}}{\partial \mathbf{x}} \\ \vdots \\ \frac{\partial x_{m}}{\partial \mathbf{x}} \end{bmatrix}=\begin{bmatrix} \frac{\partial x_{1}}{\partial x_{1}}& \frac{\partial x_{1}}{\partial x_{2}}& \ldots& \frac{\partial x_{1}}{\partial x_{n}} \\ \frac{\partial x_{2}}{\partial x_{1}}& \frac{\partial x_{2}}{\partial x_{2}}& \ldots& \frac{\partial x_{2}}{\partial x_{n}} \\ &&\vdots \\ \frac{\partial x_{m}}{\partial x_{1}}& \frac{\partial x_{m}}{\partial x_{2}}& \ldots& \frac{\partial x_{m}}{\partial x_{n}} \end{bmatrix}= \begin{bmatrix} 1 &0 &\ldots &0 \\ 0 &1 &\ldots &0 \\ \vdots &\vdots&\ddots &\vdots \\ 0 &0 &\ldots &1 \end{bmatrix} $$

# $\frac{\partial \mathbf{Ax}}{\partial \mathbf x}=\mathbf A$

$$\mathbf{x} \in \mathbb{R}^{n}, \quad \mathbf{A} \in \mathbb{R}^{m\times n}$$ 令$\mathbf r_i$代表矩阵$\mathbf A$的行向量, 用 $\langle \mathbf{a} , \mathbf{b} \rangle$表示内积, 第二行为了简化表示, 使用了 Einstein Notation:

$$\begin{aligned}\frac{\partial \mathbf{Ax}}{\partial \mathbf{x}}&= \begin{bmatrix} \frac{\partial \langle r_1 , \mathbf x\rangle}{\partial \mathbf{x}} \\ \frac{\partial \langle r_2 , \mathbf x\rangle}{\partial \mathbf{x}} \\ \vdots \\ \frac{\partial \langle r_m , \mathbf x\rangle}{\partial \mathbf{x}} \\ \end{bmatrix}\&= \large{\begin{bmatrix} \frac{\partial a_{1i}x_i}{\partial x_{1}}& \frac{\partial a_{1i}x_i}{\partial x_{2}}& \ldots& \frac{\partial a_{1i}x_i}{\partial x_{n}} \\ \frac{\partial a_{2i}x_i}{\partial x_{1}}& \frac{\partial a_{2i}x_i}{\partial x_{2}}& \ldots& \frac{\partial a_{2i}x_i}{\partial x_{n}} \\ &&\vdots \\ \frac{\partial a_{mi}x_i}{\partial x_{1}}& \frac{\partial a_{mi}x_i}{\partial x_{2}}& \ldots& \frac{\partial a_{mi}x_i}{\partial x_{n}} \end{bmatrix}}\&= \begin{bmatrix} a_{11} &a_{12} &\ldots &a_{1n} \\ a_{21} &a_{22} &\ldots &a_{2n} \\ \vdots &\vdots&\ddots &\vdots \\ a_{m1} &a_{m2} &\ldots &a_{mn} \end{bmatrix}=\mathbf{A} \end{aligned}$$

# $\frac{\partial \mathbf{x^{T}A}}{\partial \mathbf x}=\mathbf{A^T}$

$$\mathbf{x} \in \mathbb{R}^{n}, \quad \mathbf{A} \in \mathbb{R}^{\color{red}{n\times m}}$$ 令$\mathbf c_i$代表矩阵$\mathbf A$的列向量: $$\begin{aligned}\frac{\partial \mathbf{x^{T}A}}{\partial \mathbf{x}}&= \begin{bmatrix} \frac{\partial \langle c_1 , \mathbf x\rangle}{\partial \mathbf{x}} \\ \frac{\partial \langle c_2 , \mathbf x\rangle}{\partial \mathbf{x}} \\ \vdots \\ \frac{\partial \langle c_n , \mathbf x\rangle}{\partial \mathbf{x}} \\ \end{bmatrix}\&= \large{\begin{bmatrix} \frac{\partial a_{1i}x_i}{\partial x_{1}}& \frac{\partial a_{1i}x_i}{\partial x_{2}}& \ldots& \frac{\partial a_{1i}x_i}{\partial x_{n}} \\ \frac{\partial a_{2i}x_i}{\partial x_{1}}& \frac{\partial a_{2i}x_i}{\partial x_{2}}& \ldots& \frac{\partial a_{2i}x_i}{\partial x_{n}} \\ &&\vdots \\ \frac{\partial a_{mi}x_i}{\partial x_{1}}& \frac{\partial a_{mi}x_i}{\partial x_{2}}& \ldots& \frac{\partial a_{mi}x_i}{\partial x_{n}} \end{bmatrix}}\&= \begin{bmatrix} a_{11} &a_{12} &\ldots &a_{1n} \\ a_{21} &a_{22} &\ldots &a_{2n} \\ \vdots &\vdots&\ddots &\vdots \\ a_{m1} &a_{m2} &\ldots &a_{mn} \end{bmatrix}=\mathbf{A^T} \end{aligned}$$

# 特例: $\frac{\partial \mathbf{x^T}}{\partial \mathbf x}=\mathbf{I}$

$$\frac{\partial \mathbf{x^T}}{\partial \mathbf x}= \frac{\partial\mathbf{x^{T}I}}{\partial \mathbf x}=\mathbf{I^T}=\mathbf{I}$$

# 复合运算

# 从向量到矩阵