Part.29_Fisher_Linear_Discriminant(Pattern_Classification-Chapter_4)


# Fisher Linear Discriminant

2021-10-28

Tags: #MachineLearning #PatternClassification #Course #DimensionalityReduction

# Motivation

# Interlude - Linear Transformation & Dot Product

See the separate post: Dot_Product_and_Linear_Transformation-向量内积与线性变换

# Interlude - Covariance and Covariance Matrix

See the separate post: 协方差矩阵_Covariance_Matrix

# Intuition

https://sthalles.github.io/fisher-linear-discriminant/

If we simply project onto the direction of the line connecting the two sample means, we can see that the projected classes overlap heavily:

(figures: 1-D projections onto the mean-difference direction, showing substantial overlap between the two classes)

Fisher's method is based on the following intuition:

(figure: a good projection direction keeps the projected class means far apart while keeping the scatter of each projected class small)

So we construct the criterion function as follows:

(figure: the definition of the criterion function $J(\mathbf{w})$)

We need to find the $\mathbf{w}$ that maximizes $J(\mathbf{w})$, i.e., the optimal projection direction.

# Detailed Derivation

# Constructing the Criterion Function
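
The construction itself appears in the post's figures; as a minimal sketch of the standard textbook form (the notation $\mathcal{Y}_{i}$ for the set of projected samples of class $i$ is an assumption here): writing the projected class means and scatters as

$$\tilde{m}_{i}=\mathbf{w}^{T} \mathbf{m}_{i}, \qquad \tilde{s}_{i}^{2}=\sum_{y \in \mathcal{Y}_{i}}\left(y-\tilde{m}_{i}\right)^{2},$$

the criterion measures how far apart the projected means are relative to the total within-class scatter:

$$J(\mathbf{w})=\frac{\left|\tilde{m}_{1}-\tilde{m}_{2}\right|^{2}}{\tilde{s}_{1}^{2}+\tilde{s}_{2}^{2}}$$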

# ==Maximizing== the Criterion Function

# Factoring Out $\mathbf{w}$
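
The algebra here is also shown in figures; as a sketch of the standard rewriting: define the per-class, within-class, and between-class scatter matrices

$$\mathbf{S}_{i}=\sum_{\mathbf{x} \in \mathcal{D}_{i}}\left(\mathbf{x}-\mathbf{m}_{i}\right)\left(\mathbf{x}-\mathbf{m}_{i}\right)^{T}, \qquad \mathbf{S}_{W}=\mathbf{S}_{1}+\mathbf{S}_{2}, \qquad \mathbf{S}_{B}=\left(\mathbf{m}_{1}-\mathbf{m}_{2}\right)\left(\mathbf{m}_{1}-\mathbf{m}_{2}\right)^{T}.$$

Then $\tilde{s}_{1}^{2}+\tilde{s}_{2}^{2}=\mathbf{w}^{T} \mathbf{S}_{W} \mathbf{w}$ and $\left(\tilde{m}_{1}-\tilde{m}_{2}\right)^{2}=\mathbf{w}^{T} \mathbf{S}_{B} \mathbf{w}$, so the criterion becomes a generalized Rayleigh quotient with $\mathbf{w}$ factored out of the sums:

$$J(\mathbf{w})=\frac{\mathbf{w}^{T} \mathbf{S}_{B} \mathbf{w}}{\mathbf{w}^{T} \mathbf{S}_{W} \mathbf{w}}$$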

# About the Two Scatter Matrices

We call $\mathbf{S}_{W}$ the within-class scatter matrix. It is proportional to the sample covariance matrix for the pooled $d$-dimensional data. It is symmetric and positive semi-definite, and is usually non-singular if $n>d$. Likewise, $\mathbf{S}_{B}$ is called the between-class scatter matrix. It is also symmetric and positive semi-definite, but because it is the outer product of two vectors, its rank is at most one. In particular, for any $\mathbf{w}$, $\mathbf{S}_{B}\mathbf{w}$ is in the direction of $\mathbf{m}_{1}-\mathbf{m}_{2}$, and $\mathbf{S}_{B}$ is quite singular.
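
A quick numerical check of these properties (a sketch with made-up Gaussian data; the arrays `X1`, `X2` and their dimensions are hypothetical, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal(loc=[0.0, 0.0, 0.0], scale=1.0, size=(50, 3))  # class 1 samples, n1 x d
X2 = rng.normal(loc=[3.0, 1.0, 0.0], scale=1.0, size=(50, 3))  # class 2 samples, n2 x d
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)

# Within-class scatter: sum of the two per-class scatter matrices
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
# Between-class scatter: outer product of the mean difference with itself
S_B = np.outer(m1 - m2, m1 - m2)

print(np.linalg.matrix_rank(S_B))  # 1  -> rank at most one, hence singular for d > 1
print(np.linalg.matrix_rank(S_W))  # 3  -> typically full rank when n > d
```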

# Solving for $\mathbf{w}$

Solving for $\mathbf{w}$ uses the method of Lagrange multipliers. The idea: maximize $\mathbf{w}^{T}\mathbf{S}_{B}\mathbf{w}$ subject to the constraint $\mathbf{w}^{T}\mathbf{S}_{W}\mathbf{w}=c$; setting the gradient of the Lagrangian to zero gives the condition $$\mathbf{S}_{B} \mathbf{w}=\lambda \mathbf{S}_{W} \mathbf{w}$$ If $\mathbf{S}_{W}$ is non-singular, we can obtain a conventional eigenvalue problem by writing $$\mathbf{S}_{W}^{-1} \mathbf{S}_{B} \mathbf{w}=\lambda \mathbf{w}$$ In our particular case, it is unnecessary to solve for the eigenvalues and eigenvectors of $\mathbf{S}_{W}^{-1} \mathbf{S}_{B}$, because $\mathbf{S}_{B}\mathbf{w}$ is always in the direction of $\mathbf{m}_{1}-\mathbf{m}_{2}$. Since the scale factor for $\mathbf{w}$ is immaterial, we can immediately write the solution for the $\mathbf{w}$ that optimizes $J(\cdot)$:

$$\mathbf{w}=\mathbf{S}_{W}^{-1}\left(\mathbf{m}_{1}-\mathbf{m}_{2}\right)$$

Detailed steps
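
The step-by-step algebra is kept in the post's collapsed content; as an illustration of the closed-form solution above, here is a minimal NumPy sketch (the helper name `fisher_direction` is mine, not from the textbook):

```python
import numpy as np

def fisher_direction(X1, X2):
    """Fisher direction w = S_W^{-1} (m1 - m2) for two classes; rows of X1, X2 are samples."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    w = np.linalg.solve(S_W, m1 - m2)   # solve S_W w = m1 - m2 rather than forming S_W^{-1}
    return w / np.linalg.norm(w)        # the scale of w is immaterial, only the direction matters
```

Applied to the `X1`, `X2` arrays from the earlier sketch, projecting with `X1 @ w` and `X2 @ w` should give 1-D scores whose class means are well separated, which is exactly what the criterion asks for.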

# Extending the Method to the Multi-class Case

To extend to higher dimensions², we need to change the following: $\mathbf{S}_{W}$ and $\mathbf{S}_{B}$ are redefined as sums over all $N$ classes, and the single projection vector $\mathbf{w}$ becomes a projection matrix $\mathbf{W}$ (see the sketch below).

If $\mathbf{W}$ is viewed as a projection matrix, multi-class LDA projects the samples into an $(N-1)$-dimensional space, where $N-1$ is usually much smaller than the number of attributes the data originally has. This projection can therefore be used to reduce the dimensionality of the sample points, and since the projection makes use of the class labels, LDA is also commonly regarded as a classic supervised dimensionality reduction technique.³
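
As an illustration of this multi-class form, here is a NumPy sketch (not from the cited sources; `lda_projection` is a hypothetical helper, and it assumes $\mathbf{S}_{W}$ is non-singular so the generalized eigenproblem reduces to an ordinary one):

```python
import numpy as np

def lda_projection(X, y, n_components=None):
    """Project X (n x d) with class labels y onto at most C-1 discriminant directions."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes = np.unique(y)
    d = X.shape[1]
    m = X.mean(axis=0)                                    # overall mean
    S_W, S_B = np.zeros((d, d)), np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        S_W += (Xc - mc).T @ (Xc - mc)                    # within-class scatter
        S_B += len(Xc) * np.outer(mc - m, mc - m)         # between-class scatter
    # Generalized eigenproblem S_B w = lambda S_W w, via S_W^{-1} S_B (assumes S_W invertible)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    order = np.argsort(eigvals.real)[::-1]                # largest eigenvalues first
    k = n_components if n_components is not None else len(classes) - 1   # rank(S_B) <= C-1
    W = eigvecs.real[:, order[:k]]                        # projection matrix, d x k
    return X @ W
```

Keeping only the top $N-1$ eigenvectors matches the point above: the between-class scatter has rank at most $N-1$, so the projected space has at most $N-1$ useful dimensions.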


  1. https://en.wikipedia.org/wiki/Linear_discriminant_analysis

  2. https://sthalles.github.io/fisher-linear-discriminant/

  3. 周志华, 《机器学习》 (Zhou Zhihua, *Machine Learning*)