Proof. Consider the entry of . Letting denote the entries of the row of , we have
That is, is just a vector of iid random variables. So,
Fortunately, the concentration of the average of squared Gaussians is well understood. Equation 2.2.1 from Wainwright (2015) says that, with probability , standard normal random variables have
where our value of guarantees this error term is at most . We conclude that
Which completes the proof.
Given this simple concentration, we can state the Johnson-Lindenstrauss Lemma:
Proof. For all pairs , consider , and union bound Lemma 1 to find a matrix which preserves the norms of all . This involves union bounding over vectors, which causes to appear in the requirement on . Since , we have
We then note that and (see this on Desmos), which completes the proof.
The proofs above are ubiquitous, but can be found for example in Wainwright (2015) and Musco (2018).
Here's some important papers in the area of JL:
Johnson Lindenstrauss (1984) is the original paper of Johnson and Lindenstrauss.
Musco (2018) has a nice short proof, which this page basically copies.
Larsen Nelson (2017) shows that no other mapping, even possibly nonlinear mapping, can outperform the rate of .
Ailon Chazelle (2009) shows the Fast-JL Embedding, which samples columns from Fourier or Hadamard matrices, allowing for faster computation of the matrix-vector product .
Kane Nelson (2014) gives the sparsest known embedding matrix .
Nelson Nguyễn (2013) lower bounds the neccessary sparsity.
Let me know if anything is missing
Ailon and Chazelle. The fast Johnson–Lindenstrauss transform and approximate nearest neighbors. SICOMP 2009.
Johnson and Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics 1984.
Kane and Nelson. Sparser Johnson-Lindenstrauss Transforms. JACM 2014.
Larsen and Nelson. Optimality of the Johnson-Lindenstrauss Lemma. FOCS 2017.
Musco. Lecture 10: Dimensionality Reduction and the Johnson-Lindenstrauss Lemma. Lecture Notes, 2018.
Nelson and Nguyễn. Sparsity Lower Bounds for Dimensionality Reducing Maps. STOC 2013.
Wainwright. Draft of High-dimensional statistics: A Non-Asymptotic Viewpoint, Chapter 2. Draft of publication at Cambridge University Press, 2015.