Cyan's Blog


Linear_Regression&Gradient_Descent

Last updated Oct 15, 2022

%%I originally wanted to write the most general case right away here, but it turned out hard to follow; Andrew Ng explains this much more clearly than I do%%

Applying the gradient descent method to our linear regression problem gives the update rule for the parameters of the Hypothesis function (i.e., the procedure for minimizing the Cost Function): $$\begin{align*} \text{repeat until convergence: } \lbrace & \newline \theta_0 := & \theta_0 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)}) \newline \theta_1 := & \theta_1 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m}\left((h_\theta(x^{(i)}) - y^{(i)}) x^{(i)}_1\right) \newline \rbrace& \end{align*}$$
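As a quick illustration, here is a minimal Python/NumPy sketch of this update loop; the function name, zero initialization, and fixed iteration count (standing in for "repeat until convergence") are my own illustrative choices:

```python
import numpy as np

def fit_line(x, y, alpha=0.01, n_iters=1000):
    """Gradient descent for one-variable linear regression, h_theta(x) = theta0 + theta1 * x.

    x, y: 1-D NumPy arrays holding the m data points; alpha: learning rate.
    """
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(n_iters):
        error = theta0 + theta1 * x - y     # h_theta(x^(i)) - y^(i), for all i at once
        grad0 = error.sum() / m             # (1/m) * sum_i (h_theta(x^(i)) - y^(i))
        grad1 = (error * x).sum() / m       # (1/m) * sum_i (h_theta(x^(i)) - y^(i)) * x^(i)_1
        # simultaneous update of both parameters
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1
```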

Below, we work through a more general form and explain the derivation in detail:

Here $\mathrm{m}$ is the number of data points, and $x^{(i)}, y^{(i)}$ is a single data point; note that $x^{(i)}$ is in general a vector. $\theta_{0}, \cdots ,\theta_{n}$ are the parameters of the Hypothesis, which are also the variables of the Cost Function.

Next we substitute the Cost Function $J\left(\theta_{0},\cdots ,\theta_{n}\right)$ into the gradient descent formula above to obtain the concrete gradient descent expression.
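For reference, the gradient descent formula referred to here is the standard simultaneous update, and the Cost Function assumed throughout this post is the squared-error cost with the conventional $\frac{1}{2m}$ factor (the same factor that appears in the decomposition below):

$$\theta_{j} := \theta_{j}-\alpha \frac{\partial}{\partial \theta_{j}} J\left(\theta_{0}, \cdots ,\theta_{n}\right), \qquad J\left(\theta_{0}, \cdots ,\theta_{n}\right)=\frac{1}{2 m} \sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)^{2}$$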

First comes the partial derivative with respect to $\theta_j$, i.e. $\frac{\partial}{\partial \theta_{j}} J(\theta)$. To keep the reasoning clear, we first compute the partial derivative of the squared error of a single data point, $\frac{\partial}{\partial \theta_{j}} K(\theta)$; by the linearity of differentiation, $\frac{\partial}{\partial \theta_{j}} J(\theta)=\frac{1}{2 m} \sum_{i=1}^{m} \frac{\partial}{\partial \theta_{j}} K(\theta)$:

$$ \begin{aligned} \frac{\partial}{\partial \theta_{j}} K(\theta) &=\frac{\partial}{\partial \theta_{j}}\left(\hat{y}-y\right)^{2} \\
&=\frac{\partial}{\partial \theta_{j}} \left(h_{\theta}(x)-y\right)^{2} \\
&= 2 \left(h_{\theta}(x)-y\right) \cdot \frac{\partial}{\partial \theta_{j}}\left(h_{\theta}(x)-y\right) \\
&=2\left(h_{\theta}(x)-y\right) \cdot \frac{\partial}{\partial \theta_{j}}\left(\sum_{k=0}^{n} \theta_{k} f_k(x)-y\right) \\
&=2\left(h_{\theta}(x)-y\right) f_{j}(x) \end{aligned} $$

Substituting all the sample points, we compute $\frac{\partial}{\partial \theta_{j}} J(\theta)$:

$$ \begin{aligned} \frac{\partial}{\partial \theta_{j}} J(\theta) &=\frac{1}{2 m} \sum_{i=1}^{m} \frac{\partial}{\partial \theta_{j}} K(\theta) \\
&=\frac{1}{2 m} \sum_{i=1}^{m} 2\left(h_{\theta}(x^{(i)})-y^{(i)}\right) f_{j}(x^{(i)}) \\
&=2\cdot\frac{1}{2}\cdot \frac{1}{m} \sum_{i=1}^{m} \left(h_{\theta}(x^{(i)})-y^{(i)}\right) f_{j}(x^{(i)}) \\
&=\frac{1}{m} \sum_{i=1}^{m} \left(h_{\theta}(x^{(i)})-y^{(i)}\right) f_{j}(x^{(i)}) \end{aligned} $$
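As a sanity check on this derivation, one can compare the analytic gradient against a finite-difference approximation of $J$. Below is a small Python sketch under the general form $h_\theta(x)=\sum_k \theta_k f_k(x)$; the matrix $F$ with $F_{ij}=f_j(x^{(i)})$ and all other names are illustrative assumptions, not from the post:

```python
import numpy as np

def cost(theta, F, y):
    """J(theta) = 1/(2m) * sum_i (h_theta(x^(i)) - y^(i))^2."""
    m = len(y)
    return np.sum((F @ theta - y) ** 2) / (2 * m)

def grad(theta, F, y):
    """Analytic gradient: j-th entry is (1/m) * sum_i (h_theta(x^(i)) - y^(i)) * f_j(x^(i))."""
    m = len(y)
    return F.T @ (F @ theta - y) / m

rng = np.random.default_rng(0)
F = np.column_stack([np.ones(20), rng.normal(size=(20, 2))])  # f_0 = 1 plus two features
y = rng.normal(size=20)
theta = rng.normal(size=3)

eps = 1e-6
numeric = np.array([
    (cost(theta + eps * np.eye(3)[j], F, y) - cost(theta - eps * np.eye(3)[j], F, y)) / (2 * eps)
    for j in range(3)
])
print(np.allclose(numeric, grad(theta, F, y)))  # True: the analytic gradient matches
```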

For our linear regression problem the basis functions are just the coordinates themselves, $f_j(x)=x_j$ (with $x_0=1$), so: $$\frac{\partial}{\partial \theta_{j}} J(\theta) =\frac{1}{m} \sum_{i=1}^{m} \left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_j^{(i)} $$ Multiplying this by the learning rate $\alpha$ gives the size of each change to the parameter $\theta$; subtracting it from the old parameter yields the new parameter.
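Putting everything together for the linear regression case, here is a sketch of the complete parameter update in Python/NumPy; the leading column of ones (so that $x_0=1$), the learning rate, iteration count, and zero initialization are illustrative choices rather than anything prescribed above:

```python
import numpy as np

def linear_regression_gd(X, y, alpha=0.01, n_iters=1000):
    """Gradient descent for h_theta(x) = theta^T x.

    X: (m, n+1) data matrix whose first column is all ones (so x_0 = 1)
    y: (m,) target vector
    """
    m, d = X.shape
    theta = np.zeros(d)
    for _ in range(n_iters):
        # j-th entry of grad: (1/m) * sum_i (h_theta(x^(i)) - y^(i)) * x_j^(i)
        grad = X.T @ (X @ theta - y) / m
        theta = theta - alpha * grad   # new theta = old theta - alpha * gradient
    return theta
```

Because every $\theta_j$ is decreased by $\alpha$ times its partial derivative in the same step, this is the same simultaneous update as the two-parameter rule at the top of the post, just written for an arbitrary number of features.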