The Johnson-Lindenstrauss Lemma

The Johnson-Lindenstrauss Lemma, or JL Lemma, says that we can embed $n$ points from $\mathbb{R}^d$ into an $O(\frac{\log(n)}{\varepsilon^2})$-dimensional space while preserving all pairwise distances up to relative error $\varepsilon$. We prove this in the simplest case, where the embedding is a random Gaussian matrix.
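For a sense of scale (illustrative numbers, not from the original, and ignoring the constant hidden in the $O(\cdot)$): with $n = 10^6$ points and $\varepsilon = 0.1$, the target dimension is roughly

$$ k \approx \frac{\log(10^6)}{(0.1)^2} = \frac{6\ln(10)}{0.01} \approx 1{,}400 $$

regardless of the original dimension $d$. We start with a simple concentration result for a single vector $\mathbf{x}\in\mathbb{R}^{d}$: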

Lemma 1: Gaussians Preserve Norms

Fix $\varepsilon>0$, $\delta>0$, dimension $d$, and $k = \Omega(\frac1{\varepsilon^2}\log(\frac1\delta))$. Let $\boldsymbol{G}\in\mathbb{R}^{k \times d}$ be a matrix of iid $\mathcal{N}(0,1)$ entries. Then, for any fixed $\mathbf{x}\in\mathbb{R}^d$, with probability $1-\delta$ we have

$$ (1-\varepsilon)\|\mathbf{x}\|_2^2 \leq \|{\textstyle\frac{1}{\sqrt k}}\boldsymbol{G}\mathbf{x}\|_2^2 \leq (1+\varepsilon)\|\mathbf{x}\|_2^2 $$

Proof. Consider the $i^{th}$ entry of $\mathbf{w} := \boldsymbol{G}\mathbf{x}$. Letting $g_1,\ldots,g_d$ denote the entries of the $i^{th}$ row of $\boldsymbol{G}$, we have

$$ w_i = \sum_{j=1}^d g_j x_j \sim \sum_{j=1}^d \mathcal{N}(0,x_j^2) = \mathcal{N}(0,\|\mathbf{x}\|_2^2) $$

That is, $\mathbf{w}$ is just a vector of iid $\mathcal{N}(0,\|\mathbf{x}\|_2^2)$ random variables. So,

$$ \frac1k\|\mathbf{w}\|_2^2 = \frac1k \sum_{i=1}^k w_i^2 \sim \|\mathbf{x}\|_2^2 \cdot \frac1k\sum_{i=1}^k(\mathcal{N}(0,1))^2 $$

Fortunately, the concentration of the average of squared Gaussians is well understood. Equation 2.2.1 from Wainwright (2015) says that, with probability $1-\delta$, iid standard normal random variables $z_1,\ldots,z_k$ satisfy

$$ \left|\frac{1}{k}\sum_{i=1}^k z_i^2 - 1\right| \leq \sqrt{\frac{8}{k}\ln\left(\frac2\delta\right)} \leq \varepsilon $$

where our value of $k = \Omega(\frac1{\varepsilon^2}\log(\frac1\delta))$ guarantees this error term is at most $\varepsilon$. We conclude that

$$ \left|\|\mathbf{x}\|_2^2 - \|{\textstyle\frac{1}{\sqrt k}} \boldsymbol{G}\mathbf{x}\|_2^2\right| = \left|\|\mathbf{x}\|_2^2 - {\textstyle\frac{1}{k}}\|\mathbf{w}\|_2^2\right| = \|\mathbf{x}\|_2^2 \cdot \left|\frac{1}{k}\sum_{i=1}^k z_i^2 - 1\right| \leq \varepsilon \|\mathbf{x}\|_2^2 $$

which completes the proof.

$\blacksquare$
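As a sanity check, here is a minimal numerical sketch of Lemma 1 (not part of the original argument; it assumes NumPy and uses arbitrarily chosen $d$, $k$, and test vector):

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative parameters (not from the original).
d, k = 10_000, 2_000
x = rng.standard_normal(d)              # a fixed vector x in R^d

G = rng.standard_normal((k, d))         # iid N(0,1) entries
ratio = np.sum((G @ x / np.sqrt(k)) ** 2) / np.sum(x ** 2)

# ||(1/sqrt(k)) G x||^2 / ||x||^2 should be close to 1, deviating by
# roughly sqrt(8/k * ln(2/delta)) as in the concentration bound above.
print(f"squared-norm ratio: {ratio:.4f}")
```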

Given this simple concentration, we can state the Johnson-Lindenstrauss Lemma:

Lemma 2: Johnson-Lindenstrauss

Let $\mathbf{x}_1,\ldots,\mathbf{x}_n\in\mathbb{R}^{d}$. Fix $\varepsilon\in(0,1)$, $\delta>0$, and $k = \Omega(\frac1{\varepsilon^2}\log(\frac n\delta))$. Let $\boldsymbol{\Pi}\in\mathbb{R}^{k \times d}$ be a matrix of iid $\mathcal{N}(0,\frac1k)$ entries. Then, with probability $1-\delta$, for all pairs $i,j$ we have

$$ (1-\varepsilon) \|\mathbf{x}_i - \mathbf{x}_j\|_2 \leq \|\boldsymbol{\Pi}\mathbf{x}_i - \boldsymbol{\Pi}\mathbf{x}_j\|_2 \leq (1+\varepsilon) \|\mathbf{x}_i - \mathbf{x}_j\|_2 $$

Proof. For each pair $i,j$, consider $\mathbf{v}_{i,j} := \mathbf{x}_i - \mathbf{x}_j$. Apply Lemma 1 to each $\mathbf{v}_{i,j}$ with failure probability $\frac{\delta}{\binom n2}$, and union bound over all $\binom{n}{2} = O(n^2)$ vectors to find a matrix $\boldsymbol{G}$ which preserves the norms of every $\mathbf{v}_{i,j}$ simultaneously with probability $1-\delta$. This union bound requires $k = \Omega(\frac1{\varepsilon^2}\log(\frac{n^2}\delta)) = \Omega(\frac1{\varepsilon^2}\log(\frac n\delta))$, which is why $\log(\frac n\delta)$ appears in the requirement on $k$. Since $\boldsymbol{\Pi}$ is distributed as $\frac1{\sqrt k} \boldsymbol{G}$, we have

$$ \sqrt{1-\varepsilon}\, \|\mathbf{x}_i - \mathbf{x}_j\|_2 \leq \|\boldsymbol{\Pi}\mathbf{x}_i - \boldsymbol{\Pi}\mathbf{x}_j\|_2 \leq \sqrt{1+\varepsilon}\, \|\mathbf{x}_i - \mathbf{x}_j\|_2 $$

We then note that $\sqrt{1-\varepsilon} > 1-\varepsilon$ and $\sqrt{1+\varepsilon} < 1+\varepsilon$ for $\varepsilon\in(0,1)$, since $\sqrt{t} > t$ for $t\in(0,1)$ and $\sqrt{t} < t$ for $t > 1$, which completes the proof.

$\blacksquare$
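Similarly, here is a minimal sketch of Lemma 2, checking every pairwise distance under a scaled Gaussian projection (again NumPy with arbitrary illustrative parameters; not a claim from the original):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)

# Arbitrary illustrative parameters (not from the original).
n, d, k = 50, 5_000, 1_000
X = rng.standard_normal((n, d))                 # rows are the points x_1, ..., x_n

Pi = rng.standard_normal((k, d)) / np.sqrt(k)   # iid N(0, 1/k) entries
Y = X @ Pi.T                                    # row i is Pi x_i

# Worst relative distortion of a pairwise distance; should be small,
# on the order of sqrt(log(n) / k).
worst = max(
    abs(np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j]) - 1)
    for i, j in combinations(range(n), 2)
)
print(f"worst relative distortion over all pairs: {worst:.4f}")
```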

See Also

The proofs above are ubiquitous; they can be found, for example, in Wainwright (2015) and Musco (2018).

Here are some important papers in the area of JL:

  • Johnson Lindenstrauss (1984) is the original paper of Johnson and Lindenstrauss.

  • Musco (2018) has a nice short proof, which this page basically copies.

  • Larsen Nelson (2017) shows that no mapping, even a nonlinear one, can beat the rate of $k = \Omega(\frac{1}{\varepsilon^2} \log(n))$.

  • Ailon Chazelle (2009) gives the Fast JL Transform, which uses subsampled Fourier or Hadamard matrices, allowing faster computation of the matrix-vector product $\boldsymbol{\Pi}\mathbf{x}$.

  • Kane Nelson (2014) gives the sparsest known embedding matrix $\boldsymbol{\Pi}$.

  • Nelson Nguyễn (2013) lower bounds the necessary sparsity.

  • Let me know if anything is missing

Bibliography