
# Johnson-Lindenstrauss Lemma

2021-12-03

Tags: #MachineLearning #Math

This is the first topic that Prof. 徐亦达 asked us to study.[^1]

# Study Materials


# Some random facts about the JL Lemma

  • The JL Lemma essentially says: “you give me the error you want, and I’ll give you a low-dimensional space that captures the distances up to that error.” It is a worst-case, pairwise guarantee: every pair of points has its distance preserved to within a $1\pm\epsilon$ factor (see the sketch after this list).
  • The SVD essentially promises: “you tell me what dimension you want to live in, and I’ll give you the best possible embedding,” where “best” is defined on average: the total error between true similarity and projected similarity is minimized.
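Below is a minimal numerical sketch of the first point, assuming `numpy` (the values of `m`, `n`, and `eps` are illustrative; the scaling $f(x)=Fx/\sqrt{d}$ matches the proof below): embed random points with a Gaussian matrix and measure the worst-case pairwise distortion.

```python
# Sketch: embed m points from R^n into R^d with a random Gaussian matrix
# and measure the worst-case pairwise distance distortion.
import numpy as np

rng = np.random.default_rng(0)
m, n, eps = 50, 1000, 0.5                  # illustrative values
d = int(np.ceil(4 * np.log(m) / eps**2))   # target dimension from the lemma

X = rng.normal(size=(m, n))                # m points in R^n
F = rng.normal(size=(d, n))                # i.i.d. N(0,1) entries
Y = X @ F.T / np.sqrt(d)                   # f(x) = Fx / sqrt(d)

worst = 0.0
for i in range(m):
    for j in range(i + 1, m):
        ratio = np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j])
        worst = max(worst, abs(ratio - 1))
print(f"d = {d}, worst pairwise distortion = {worst:.3f}, target eps = {eps}")
```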

# Proof

The idea of the proof is as follows:

# Deriving JL via the Union Bound

# Norm Preservation Property

$$\operatorname{Pr}\left[1-\epsilon \leq \frac{|f(v)|}{\sqrt{d}} \leq 1+\epsilon\right] \geq 1-2 / m^{3} .$$

Now apply this result to $v=x_{j}$ and to every difference $v=x_{j}-x_{j^{\prime}}$ (for $j \neq j^{\prime}$). Since there are $\binom{m}{2}=O(m^{2})$ pairs of vectors, a further union bound gives: the probability that any pair violates the inequality above is at most $2 / m$, i.e., the success probability is at least $1-2 / m$.

# Union Bound

Union Bound (Boole’s inequality)

For any events $A_{1}, A_{2}, \ldots, A_{n}$, we have $$ P\left(\bigcup_{i=1}^{n} A_{i}\right) \leq \sum_{i=1}^{n} P\left(A_{i}\right) $$

Written out, this is:

$$ \mathbb{P}\left(A_{1} \bigcup A_{2} \bigcup \cdots\right) \leq \mathbb{P}\left(A_{1}\right)+\mathbb{P}\left(A_{2}\right)+\cdots $$
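A quick Monte Carlo sanity check of the inequality (a sketch assuming `numpy`, with arbitrarily chosen overlapping events):

```python
# Monte Carlo check: P(A1 ∪ A2 ∪ A3) <= P(A1) + P(A2) + P(A3),
# even though the events here deliberately overlap.
import numpy as np

rng = np.random.default_rng(1)
u = rng.uniform(size=100_000)
A1, A2, A3 = u < 0.1, (u > 0.05) & (u < 0.2), u > 0.95
lhs = np.mean(A1 | A2 | A3)
rhs = np.mean(A1) + np.mean(A2) + np.mean(A3)
print(f"P(union) = {lhs:.3f} <= sum of P(Ai) = {rhs:.3f}")
```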

# Using the Union Bound

Note that there are $O\left(m^{2}\right)$ pairs of vectors $u, v$. By the union bound, $$ \begin{aligned} & \operatorname{Pr}\left(\exists\, u, v \text { such that the event }(1-\epsilon)|u-v| \leq|L(u-v)| \leq(1+\epsilon)|u-v| \text{ fails}\right) \\ \leq & \sum_{u, v} \operatorname{Pr}\left(\text {the event }(1-\epsilon)|u-v| \leq|L(u-v)| \leq(1+\epsilon)|u-v| \text{ fails}\right) \\ \leq & \space m^2\times\frac 2 {m^3} \\ =& \frac 2 {m} \end{aligned} $$ So the overall success probability is at least $1-2 / m$.
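A sketch of this guarantee in simulation (assuming `numpy`/`scipy`; the parameters are illustrative): with $d=4\ln(m)/\epsilon^{2}$, the fraction of random matrices for which *some* pair of points is distorted by more than $\epsilon$ should stay below $2/m$.

```python
# Estimate the failure probability: with d = 4 ln(m)/eps^2, the fraction of
# random matrices for which SOME pair of points is distorted by more than eps
# should be at most 2/m.
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(2)
m, n, eps = 20, 500, 0.5                   # illustrative values
d = int(np.ceil(4 * np.log(m) / eps**2))
X = rng.normal(size=(m, n))
true = pdist(X)                            # all m(m-1)/2 pairwise distances

trials, failures = 200, 0
for _ in range(trials):
    proj = pdist(X @ rng.normal(size=(d, n)).T / np.sqrt(d))
    if np.any(np.abs(proj / true - 1) > eps):
        failures += 1
print(f"empirical failure rate = {failures / trials:.3f}, bound 2/m = {2 / m:.3f}")
```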

# Prove the Norm Preservation Property

# $N(0, \sigma_X^2)+N(0, \sigma_Y^2)=N(0,\sigma_X^2+\sigma_Y^2)$

For independent $X \sim N(0, \sigma_X^2)$ and $Y \sim N(0, \sigma_Y^2)$, the sum satisfies $X+Y \sim N(0, \sigma_X^2+\sigma_Y^2)$; see the note on the Normal Distribution (Gaussian Distribution).

# Chi-Squared Distribution


The sum of squares of independent standard normal random variables follows a chi-squared distribution: if $Z_{1}, \ldots, Z_{n}$ are independent standard normals and

$$Y=Z_{1}^{2}+Z_{2}^{2}+\cdots+Z_{n}^{2},$$ then $Y$ is said to have a chi-squared distribution with $n$ degrees of freedom, written $$Y \sim \chi^{2}(n)$$

# Prove the Norm Preservation Lemma

$$\operatorname{Pr}\left[1-\epsilon \leq \frac{|f(v)|}{\sqrt{d}} \leq 1+\epsilon\right] \geq 1-\frac2 {m^{3}}$$

The statement is symmetric, so we prove only one half, namely that the right-hand inequality fails with probability at most $1/m^3$:

$$\operatorname{Pr}\left[\frac{|f(v)|}{\sqrt{d}}\geq 1+\epsilon\right] \leq \frac1 {m^{3}}$$

For convenience, we can square the quantity inside:

$$\operatorname{Pr}\left[|f(v)|^{2}>(1+\epsilon)^2 d\right]\leq1/m^3$$

$$|f(v)|^{2}=|Fv|^2=\sum_{i=1}^{d}\left(r_{i}^{T} v\right)^{2}$$

where $r_{i}^{T} v$ is the dot product of the $i$-th row of $F$ with the vector $v$, i.e., the $i$-th entry of the transformed vector $f(v)$. Call it $X_i$. Note that $X_i$ is a weighted sum of $n$ standard normal random variables, $X_i = v_1r_{i1}+v_2r_{i2}+\cdots+v_nr_{in}$, with weights given by the entries of $v$, so $X_i\sim N\left(0, \sum_{k} v_{k}^{2}\right)=N(0,1)$ (since $\sum_{k} v_{k}^{2}=|v|^2=1$).

So the expression above can be written as: $$|f(v)|^{2}=\sum_{i=1}^{d} X_{i}^{2}$$

It follows immediately that $|f(v)|^{2}\sim \chi^{2}(d)$, with $d$ degrees of freedom.
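A minimal simulation of this fact (assuming `numpy`/`scipy`; the parameters are illustrative): for a unit vector $v$ and a Gaussian matrix $F$, samples of $|f(v)|^{2}$ should match $\chi^{2}(d)$ in mean, variance, and distribution.

```python
# A minimal check that |f(v)|^2 = sum_i (r_i^T v)^2 follows chi^2(d)
# when v is a unit vector and F has i.i.d. N(0,1) entries.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, d, trials = 100, 10, 5_000
v = rng.normal(size=n)
v /= np.linalg.norm(v)                  # unit vector, |v| = 1

samples = np.empty(trials)
for k in range(trials):
    F = rng.normal(size=(d, n))         # fresh random projection matrix
    samples[k] = np.sum((F @ v) ** 2)   # |f(v)|^2 = sum of (r_i . v)^2

print(f"mean = {samples.mean():.2f}  (chi2({d}) mean = {d})")
print(f"var  = {samples.var():.2f}  (chi2({d}) var  = {2 * d})")
print(stats.kstest(samples, stats.chi2(df=d).cdf))   # should not reject
```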

First, let us simplify the statement of the problem:

Let $Y=|f(v)|^2=\sum_{i=1}^{d} X_{i}^{2}$ and $\alpha=d(1+\epsilon)^{2}$. The claim to be proved becomes: $$\operatorname{Pr}[Y>\alpha] \leq \frac1 {m^{3}} \tag{1}$$

We will show that

$$\operatorname{Pr}[Y>\alpha] \leq \exp \left(-(3 / 4) d \epsilon^{2}\right)\tag{2}$$

When $d=4 \ln (m) / \epsilon^{2}$, $(1)$ and $(2)$ coincide: $$\begin{aligned}\exp(\frac{-3} 4 d \epsilon^{2}) &=\exp(\frac{-3}4 \epsilon^{2}\frac{4\ln(m)}{\epsilon^{2}})\\ &=\exp(-3\ln(m))\\ &=\exp(\ln(m^{-3}))\\ &=\frac1 {m^{3}} \end{aligned}$$
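Plugging in illustrative numbers (a sketch assuming `numpy`) makes the equivalence concrete:

```python
# With m = 1000 and eps = 0.1 (illustrative), d = 4 ln(m) / eps^2 makes the
# tail bound exp(-(3/4) d eps^2) equal to 1/m^3 exactly.
import numpy as np

m, eps = 1000, 0.1
d = 4 * np.log(m) / eps**2
print(f"d = {d:.0f}")
print(f"exp(-(3/4) d eps^2) = {np.exp(-0.75 * d * eps**2):.3e}")
print(f"1/m^3               = {1 / m**3:.3e}")
```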

Let $t\in (0,1/2)$. As in Chernoff bounds, we can move the inequality into the exponent and then apply Markov's inequality:

$$\operatorname{Pr}[Y>\alpha]=\operatorname{Pr}\left[e^{t Y}>e^{t \alpha}\right] \leq \frac {\mathrm{E}\left[e^{t Y}\right]}{e^{t \alpha}}\tag{3}$$

Next, since $Y=\sum_{i=1}^{d} X_{i}^{2}$ and the $X_i$ are mutually independent, the expectation of the product factors into a product of expectations:

$$\mathrm{E}\left[e^{t Y}\right]=\mathrm{E}\left[\exp \left(t \sum_{i=1}^{d} X_{i}^{2}\right)\right]=\prod_{i=1}^{d} \mathrm{E}\left[\exp \left(t X_{i}^{2}\right)\right]\tag{4}$$

For $\mathrm{E}\left[\exp \left(t X_{i}^{2}\right)\right]$, using the formula for the expectation of a function of a random variable, we can expand this expectation by definition:

$$\mathrm{E}(g(X))=\int_{\Omega} g(x) f(x) \mathrm{d} x$$ $$\Rightarrow$$ $$\begin{aligned} \mathrm{E}\left[\exp \left(t X_{i}^{2}\right)\right] &= \int_{-\infty}^{\infty} \exp \left(t y^{2}\right) f(y) d y\qquad (\text{where } f(y) \text{ is the density of } X_i)\\ &=\int_{-\infty}^{\infty} \exp \left(t y^{2}\right) \frac{1}{\sqrt{2 \pi}} \exp \left(-y^{2} / 2\right) d y\\ &=\frac{1}{\sqrt{2 \pi}} \int_{-\infty}^{\infty} \exp \left(t y^{2}\right) \exp \left(-y^{2} / 2\right) d y\\ &=\frac{1}{\sqrt{2 \pi}} \int_{-\infty}^{\infty} \exp \left(-y^{2}\left(\frac{1}{2}-t\right)\right) d y\end{aligned}$$

Looking at this expression: if $t=0$, it is the integral of the standard normal density over the whole real line, which equals $1$. To achieve a similar effect for general $t$, we can change variables to obtain the same form:

Let $z=y \sqrt{1-2 t}$ (so $dz=\sqrt{1-2t}\,dy$); then:

$$\begin{aligned} \mathrm{E}\left[\exp \left(t X_{i}^{2}\right)\right] &=\frac{1}{\sqrt{2 \pi}} \int_{-\infty}^{\infty} \exp \left(\frac{-(y \sqrt{1-2 t})^{2}}2\right) d y \\ &=\frac{1}{\sqrt{2 \pi} \sqrt{1-2 t}} \int_{-\infty}^{\infty} \exp \left(-z^{2} / 2\right) d z \\ &=\frac{1}{\sqrt{1-2 t}} \end{aligned}$$
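The closed form just derived is easy to sanity-check by Monte Carlo (a sketch assuming `numpy`; the values of $t$ are illustrative):

```python
# Monte Carlo check of E[exp(t X^2)] = 1/sqrt(1 - 2t) for X ~ N(0,1), t < 1/2.
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=1_000_000)
for t in (0.05, 0.1, 0.2):
    empirical = np.mean(np.exp(t * x**2))
    exact = 1 / np.sqrt(1 - 2 * t)
    print(f"t = {t}: empirical = {empirical:.4f}, exact = {exact:.4f}")
```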

Combining $(3)$ and $(4)$, we get: $$\begin{aligned} \mathrm{E}\left[e^{t Y}\right]&= \prod_{i=1}^{d}\mathrm{E}\left[\exp\left(tX_i^{2}\right) \right]\\ &=(1-2 t)^{-d / 2} \end{aligned}$$ $$\operatorname{Pr}[Y>\alpha] \leq \frac {(1-2 t)^{-d / 2}}{e^{t \alpha}}=e^{-t \alpha}(1-2 t)^{-d / 2}$$

It remains to choose a suitable $t$. We take $t=(1-d / \alpha) / 2$, which minimizes the right-hand side:

$$\operatorname{Pr}[Y>\alpha] \leq e^{-t \alpha}(1-2 t)^{-d / 2}=e^{(d-\alpha) / 2}(d / \alpha)^{-d / 2}$$
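A quick grid search (a sketch assuming `numpy`; the values of `d` and `eps` are illustrative) confirms that this choice of $t$ minimizes the bound over $t \in (0, 1/2)$:

```python
# Grid search over t in (0, 1/2): the bound e^{-t*alpha} (1-2t)^{-d/2}
# is minimized at t = (1 - d/alpha)/2.
import numpy as np

d, eps = 100, 0.5                          # illustrative values
alpha = d * (1 + eps) ** 2
ts = np.linspace(1e-4, 0.499, 10_000)
bound = np.exp(-ts * alpha) * (1 - 2 * ts) ** (-d / 2)
print(f"argmin over grid = {ts[np.argmin(bound)]:.4f}")
print(f"(1 - d/alpha)/2  = {(1 - d / alpha) / 2:.4f}")
```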

Now substituting $\alpha=d(1+\epsilon)^{2}$, we get: $$\exp \left(\frac{d}{2}\left(1-(1+\epsilon)^{2}\right)-\frac{d}{2} \ln \left(\frac{1}{(1+\epsilon)^{2}}\right)\right) =\exp \left(-d\left(\epsilon+\epsilon^{2} / 2-\ln (1+\epsilon)\right)\right)$$

Note the part inside the parentheses: $\epsilon+\epsilon^{2} / 2-\ln (1+\epsilon)$.

Our target is $-(3 / 4) d \epsilon^{2}=-d\frac3 4\epsilon^{2}$, so it suffices to show: $$\begin{aligned} -d\left(\epsilon+\epsilon^{2} / 2-\ln (1+\epsilon)\right)&\leq-d\frac3 4\epsilon^{2}\\ \epsilon+\epsilon^{2} / 2-\ln (1+\epsilon)&\geq\frac3 4\epsilon^{2}\\ (\epsilon+\epsilon^{2} / 2)-\frac3 4\epsilon^{2}&\geq\ln (1+\epsilon)\\ \epsilon-\epsilon^{2} / 4&\geq\ln (1+\epsilon)\\ \end{aligned}$$

Using convexity, we can show that $\ln (1+x) \leq x-x^{2} / 4$ for $x \in[0,1]$:

Both sides equal $0$ at $x=0$ and both have first derivative $1$ there; on $[0,1]$ the derivative of the right-hand side dominates, since $1-x / 2 \geq \frac{1}{1+x}$ (equivalently, $(1-x / 2)(1+x)=1+\frac{x}{2}(1-x) \geq 1$).
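A quick numeric confirmation of this inequality (a sketch assuming `numpy`):

```python
# Check ln(1+x) <= x - x^2/4 on [0, 1]: the gap is nonnegative everywhere.
import numpy as np

x = np.linspace(0, 1, 100_001)
gap = (x - x**2 / 4) - np.log1p(x)
print(f"min gap on [0,1] = {gap.min():.6f}")
```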

Putting everything together: $$\operatorname{Pr}[Y>\alpha] \leq \exp \left(-d\left(\epsilon+\epsilon^{2} / 2-\left(\epsilon-\epsilon^{2} / 4\right)\right)\right) \leq \exp \left(-(3 / 4) d \epsilon^{2}\right)$$

This proves the Norm Preservation Lemma, and with it the Johnson-Lindenstrauss Lemma.

# Chernoff Bound for the Chi-Squared Distribution

Some proofs, such as those in the TTI (zotero://select/items/@shamkakade2009TTICMSC) and MIT (zotero://select/items/@prof.ankurmoitra2016MIT854) lecture notes, use a Chernoff bound for the chi-squared distribution:

Lemma 4 (Chernoff bound for chi-square distributions). $$ \begin{aligned} &\mathbb{P}\left[\sum_{i=1}^{k} Y_{i}^{2}>(1+\epsilon) k\right] \leq e^{-\frac{k}{4}\left(\epsilon^{2}-\epsilon^{3}\right)} \\ &\mathbb{P}\left[\sum_{i=1}^{k} Y_{i}^{2}<(1-\epsilon) k\right] \leq e^{-\frac{k}{4}\left(\epsilon^{2}-\epsilon^{3}\right)} \end{aligned} $$ Now, we may set $k=O\left(\frac{\log (1 / \delta)}{\epsilon^{2}}\right)$ and get $\|A x\|_{2}^{2} \in(1 \pm \epsilon)\|x\|_{2}^{2}$ with probability at least $1-\delta$.

But the proof of this Chernoff bound is quite involved; I think the proof in the UBC slides (i.e., the one above) is nicer.


[^1]: The Johnson-Lindenstrauss Lemma. Why you don’t always need all of your… | by Haris Angelidakis | Cantor’s Paradise. This is actually from presentation slides by Laurent Jacques.