# Regularization Does Not Affect the Convexity of the Linear Regression Loss Function
Tags: #MachineLearning #Regularization #GradientDescent #LinearRegression #ConvexOptimization
- Question: After adding a regularization term, is the loss function still convex? Does gradient descent still work?
- Yes, it still works; the proof is as follows.
# First, how do we prove that a function is convex?
If $f$ is twice differentiable, $dom(f)$ is a convex set, and $\forall x\in dom(f),\ \nabla^2 f(x)\succeq 0$ (the Hessian is positive semidefinite), then $f$ is a convex function.[^1]
- If the second derivative (Hessian) is strictly positive everywhere, then $f$ is strictly convex; see the one-dimensional example below.
- $dom(f)$ denotes the domain of the function $f$.
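As a quick one-dimensional illustration of this criterion (a worked example added here for clarity, not part of the original note):

$$f(x)=x^{2}\ \Rightarrow\ f''(x)=2>0 \text{ for all } x, \text{ so } f \text{ is strictly convex;}\qquad g(x)=x^{3}\ \Rightarrow\ g''(x)=6x<0 \text{ for } x<0, \text{ so } g \text{ is not convex on } \mathbb{R}.$$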
# First we prove that $J(\theta)$ without a regularization term is convex
Here $h_{\theta}(x)=\theta^{\top}x$ is the linear hypothesis, and the unregularized cost is $J(\theta)=\frac{1}{2m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)^{2}$.
$$\begin{aligned} \frac{\partial}{\partial \theta_{j}} J(\theta)
&=\frac{1}{2 m} \sum_{i=1}^{m} \frac{\partial}{\partial \theta_{j}}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)^{2} \\
&=\frac{1}{2 m} \sum_{i=1}^{m} 2\left(h_{\theta}(x^{(i)})-y^{(i)}\right) \cdot \frac{\partial}{\partial \theta_{j}}\left(h_{\theta}(x^{(i)})-y^{(i)}\right) \\
&=\frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right) x_{j}^{(i)} \end{aligned}$$
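In vectorized form this gradient is a single matrix product. A minimal NumPy sketch (the synthetic `X`, `y`, `theta` here are illustrative assumptions, not from the original note):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 3
X = rng.normal(size=(m, n))   # rows are the samples x^{(i)}
y = rng.normal(size=m)
theta = rng.normal(size=n)

# j-th partial derivative: (1/m) * sum_i (h(x^{(i)}) - y^{(i)}) * x_j^{(i)};
# stacking over j gives the whole gradient as X^T (X theta - y) / m.
g = X.T @ (X @ theta - y) / m
print(g)
```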
$$\begin{aligned} \frac{\partial^{2}}{\partial \theta_{j}^{2}} J(\theta)
&=\frac{\partial}{\partial \theta_{j}}\left(\frac{\partial}{\partial \theta_{j}} J(\theta)\right) \\
&=\frac{1}{m} \sum_{i=1}^{m} \frac{\partial}{\partial \theta_{j}}\left[\left(h_{\theta}(x^{(i)})-y^{(i)}\right) x_{j}^{(i)}\right] \\
&=\frac{1}{m} \sum_{i=1}^{m}\left(x_{j}^{(i)}\right)^{2} \geqslant 0 \end{aligned}$$
The diagonal entries are non-negative. More generally, collecting the samples into a design matrix $X$ (with rows $x^{(i)\top}$), the full Hessian is $\nabla^{2} J(\theta)=\frac{1}{m} X^{\top} X$, which is positive semidefinite because $v^{\top} X^{\top} X v=\lVert Xv\rVert^{2}\geqslant 0$ for every $v$. Hence $J(\theta)$ is convex.
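As a numerical sanity check of the positive-semidefiniteness claim (a sketch using the same kind of synthetic data as above; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 3
X = rng.normal(size=(m, n))

# Hessian of the unregularized cost; note it depends on X only,
# not on theta or y.
H = (X.T @ X) / m

# A symmetric matrix is positive semidefinite iff all eigenvalues are >= 0.
eigvals = np.linalg.eigvalsh(H)
print(eigvals)
assert np.all(eigvals >= -1e-12)   # PSD up to floating-point tolerance
```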
# Now add the regularization term
The regularized cost is $J(\theta)=\frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)^{2}+\lambda\sum_{k=1}^{n}\theta_{k}^{2}\right]$.
$$\begin{aligned} \frac{\partial}{\partial \theta_{j}} J(\theta)
&=\frac{1}{2 m}\left[\sum_{i=1}^{m} \frac{\partial}{\partial \theta_{j}}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)^{2}+\lambda \frac{\partial}{\partial \theta_{j}} \sum_{k=1}^{n} \theta_{k}^{2}\right] \\
&=\frac{1}{2 m}\left[\sum_{i=1}^{m} 2\left(h_{\theta}(x^{(i)})-y^{(i)}\right) x_{j}^{(i)}+2 \lambda \theta_{j}\right] \\
&=\frac{\lambda}{m} \theta_{j}+\frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right) x_{j}^{(i)} \end{aligned}$$
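We can verify this derived gradient formula against central finite differences (a sketch with synthetic data; `lam` is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 3
X = rng.normal(size=(m, n))
y = rng.normal(size=m)
theta = rng.normal(size=n)
lam = 0.1                            # illustrative regularization strength

def cost(t):
    r = X @ t - y
    return (r @ r + lam * t @ t) / (2 * m)

# Analytic gradient from the derivation: (lam/m) * theta + X^T (X theta - y) / m.
g_analytic = (lam * theta + X.T @ (X @ theta - y)) / m

# Central finite differences as an independent check.
eps = 1e-6
g_numeric = np.array([
    (cost(theta + eps * e) - cost(theta - eps * e)) / (2 * eps)
    for e in np.eye(n)
])
assert np.allclose(g_analytic, g_numeric, atol=1e-6)
```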
$$\begin{aligned} \frac{\partial^{2}}{\partial \theta_{j}^{2}} J(\theta)
&=\frac{\partial}{\partial \theta_{j}}\left(\frac{\partial}{\partial \theta_{j}} J(\theta)\right) \\
&=\frac{\partial}{\partial \theta_{j}}\left(\frac{\lambda}{m} \theta_{j}+\frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right) x_{j}^{(i)}\right) \\
&=\frac{\lambda}{m}+\frac{1}{m} \sum_{i=1}^{m}\left(x_{j}^{(i)}\right)^{2} \end{aligned}$$
For $\lambda>0$ this is strictly positive; in matrix form the full Hessian is $\nabla^{2} J(\theta)=\frac{1}{m}\left(X^{\top} X+\lambda I\right)$, which is positive definite. By the theorem above, the regularized loss function is therefore (strictly) convex, so gradient descent still applies and now converges to a unique minimizer. QED.
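Again as a sketch (same synthetic setup as above; `lam` and the step size `alpha` are arbitrary illustrative choices): adding $\lambda I$ shifts every eigenvalue of the Hessian up by $\lambda/m$, making it positive definite, and plain gradient descent on the regularized cost then decreases monotonically toward the unique ridge solution:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 3
X = rng.normal(size=(m, n))
y = rng.normal(size=m)
lam = 0.1                                  # illustrative regularization strength

# Regularized Hessian: (X^T X + lam * I) / m -- positive definite for lam > 0.
H = (X.T @ X + lam * np.eye(n)) / m
assert np.all(np.linalg.eigvalsh(H) > 0)

def cost(t):
    r = X @ t - y
    return (r @ r + lam * t @ t) / (2 * m)

def grad(t):
    return (X.T @ (X @ t - y) + lam * t) / m

# Plain gradient descent: on a strictly convex quadratic it decreases
# monotonically and converges for a small enough step size.
theta, alpha = np.zeros(n), 0.1
costs = [cost(theta)]
for _ in range(200):
    theta = theta - alpha * grad(theta)
    costs.append(cost(theta))

assert all(b <= a + 1e-12 for a, b in zip(costs, costs[1:]))
print(theta)                               # approx. the closed-form ridge solution:
print(np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y))
```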
[^1]: For a more detailed derivation, see this Zhihu article: https://zhuanlan.zhihu.com/p/210252556