
Part.20_Regularized_Logistic_Regression(ML_Andrew.Ng.)

Last updated Sep 11, 2021

# Regularized Logistic Regression

2021-09-11

Tags: #MachineLearning #LogisticRegression #Regularization

# Recap: Without Regularization

# Cost Function

The compact form: $$J(\theta) =-\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \log \left(h_{\theta}\left(x^{(i)}\right)\right)+\left(1-y^{(i)}\right) \log \left(1-h_{\theta}\left(x^{(i)}\right)\right)\right]$$ or, vectorized: $$\begin{aligned} h&=g(X \theta) \\ J(\theta)&=-\frac{1}{m} \cdot\left[y^{T} \log (h)+(1-y)^{T} \log (1-h)\right] \end{aligned}$$
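As a quick sanity check, here is a minimal NumPy sketch of the vectorized cost (the names `sigmoid` and `cost`, and the assumption that `X` already carries a leading column of ones, are mine, not from the course):

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^{-z})"""
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """J(theta) = -(1/m) * [y^T log(h) + (1-y)^T log(1-h)], with h = g(X theta)."""
    m = len(y)
    h = sigmoid(X @ theta)
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
```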

# Gradient Descent Formula

Taking the partial derivative gives: $$\frac{\partial}{\partial \theta_{j}} J(\theta) =\frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right) x^{(i)}_{j}$$
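The matching gradient, vectorized as $\frac{1}{m}X^{T}(h-y)$; a sketch under the same assumptions as above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient(theta, X, y):
    """(1/m) * sum_i (h(x^(i)) - y^(i)) * x^(i)_j, i.e. X^T (h - y) / m."""
    h = sigmoid(X @ theta)
    return X.T @ (h - y) / len(y)
```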


# Cost Function with Regularization

$$\begin{align} J(\theta) &=-\frac{1}{m} \sum_{i=1}^{m} \left[y^{(i)} \log \left(h\left(x^{(i)}\right)\right)+\left(1-y^{(i)}\right) \log \left(1-h\left(x^{(i)}\right)\right)\right] +\frac{\lambda}{2m}\sum^n_{j=1}\theta_j^2\\ &=P(\theta)+\frac{\lambda}{2m}\sum^n_{j=1}\theta_j^2 \end{align}$$ Note that the regularization term is added on: the minus sign in front of the original cost function is only there to "flip" the $\log$ function, and does not apply to the penalty. (The sum starts at $j=1$, so $\theta_0$ is not regularized.)
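A sketch of the regularized cost (`cost_reg` and `lam` are illustrative names; since the penalty sum starts at $j=1$, `theta[0]` is left out of it):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_reg(theta, X, y, lam):
    """P(theta) + (lam / 2m) * sum_{j>=1} theta_j^2; theta[0] is not penalized."""
    m = len(y)
    h = sigmoid(X @ theta)
    p = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m   # unregularized part P(theta)
    return p + lam / (2 * m) * np.sum(theta[1:] ** 2)    # quadratic penalty
```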

# Gradient Descent with Regularization

When computing the partial derivative, by linearity we only need to append the derivative of the regularization term: $$\begin{aligned} \frac{\partial}{\partial \theta_{j}} J(\theta) &=\frac{\partial}{\partial \theta_{j}}P(\theta)+ \frac{\lambda}{2m}\frac{\partial}{\partial \theta_{j}}\sum^n_{k=1}\theta_k^2\\ &=\frac{\partial}{\partial \theta_{j}}P(\theta)+ \frac{\lambda}{m}\theta_{j} \\ &=\frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right) x^{(i)}_{j} +\frac{\lambda}{m}\theta_{j} \end{aligned}$$
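In code, the penalty derivative is just an extra $\frac{\lambda}{m}\theta_j$ added to the components $j \geq 1$ (a sketch, same conventions as above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_reg(theta, X, y, lam):
    """Regularized gradient; theta_0 receives no penalty term."""
    m = len(y)
    grad = X.T @ (sigmoid(X @ theta) - y) / m   # dP/dtheta_j for every j
    grad[1:] += lam / m * theta[1:]             # add (lam/m) * theta_j for j >= 1
    return grad
```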

Substituting into the gradient-descent update rule: $$ \begin{aligned} &\text{Repeat}\ \{ \\ &\quad\theta_{0}:=\theta_{0}-\alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{0}^{(i)} \\ &\quad\theta_{j}:=\theta_{j}-\alpha\left[ \frac{1}{m} \sum_{i=1}^{m} \left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)}+\frac{\lambda}{m} \theta_{j}\right] \quad\quad j \in\{1,2, \ldots, n\} \\ &\} \end{aligned} $$ Expanding the brackets, the update for $\theta_j$ becomes: $$ \theta_{j}:=\theta_{j}\left(1-\alpha\frac{\lambda}{m}\right) -\alpha\frac{1}{m} \sum_{i=1}^{m} \left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)} \quad\quad j \in\{1,2, \ldots, n\} $$ Since $\left(1-\alpha\frac{\lambda}{m}\right)$ is less than $1$ (for $\alpha, \lambda > 0$), each update shrinks $\theta_j$ a little, while the second half of the formula is exactly the same as the unregularized update.
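Putting it together, a minimal batch gradient-descent loop written in the expanded "shrink, then step" form (the hyperparameters `alpha` and `iters` are illustrative defaults, not values from the course):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, lam, alpha=0.1, iters=1000):
    """Batch gradient descent with the weight-decay form of the regularized update."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = X.T @ (sigmoid(X @ theta) - y) / m   # unregularized gradient
        theta[0] -= alpha * grad[0]                  # theta_0: plain step, no shrinkage
        theta[1:] = (1 - alpha * lam / m) * theta[1:] - alpha * grad[1:]
    return theta
```

Every pass multiplies $\theta_j$ ($j \geq 1$) by $\left(1-\alpha\frac{\lambda}{m}\right)$ before taking the usual step, which is exactly the shrinkage described above.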

# Proof That the Regularized Cost Is Still Convex

The regularization term does not affect the convexity of the logistic regression cost function: $P(\theta)$ is convex, the quadratic penalty $\frac{\lambda}{2m}\sum^n_{j=1}\theta_j^2$ is convex, and a sum of convex functions is convex.
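Seen through second derivatives (a sketch; the first term is the standard Hessian of the logistic loss), both parts of the Hessian of $J$ are positive semi-definite, so their sum is as well:

$$\nabla^{2} J(\theta)=\underbrace{\frac{1}{m} \sum_{i=1}^{m} h_{\theta}\left(x^{(i)}\right)\left(1-h_{\theta}\left(x^{(i)}\right)\right) x^{(i)} x^{(i)T}}_{\nabla^{2} P(\theta)\,\succeq\, 0}+\underbrace{\frac{\lambda}{m} \operatorname{diag}(0,1, \ldots, 1)}_{\succeq\, 0} \succeq 0$$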