Subspace Embedding via $\varepsilon$-Nets

On this page we prove that a Johnson-Lindenstrauss Transform is an oblivious subspace embedding. That is, the JL matrix is a map from $\mathbb{R}^n$ to $\mathbb{R}^{O(d)}$ which preserves the norms of all vectors in any fixed $d$-dimensional subspace of $\mathbb{R}^n$. This is called oblivious because the JL map doesn't know anything about the subspace beyond how big it is (i.e. its dimension).

Prerequisite: Gaussian Johnson-Lindenstrauss Lemma

Theorem 1: Subspace Embedding

Fix $\varepsilon\in(0,1)$ and $\delta\in(0,1)$. Let $\mathcal{V}$ be a $d$-dimensional linear subspace of $\mathbb{R}^n$. Let $\boldsymbol{\Pi}\in\mathbb{R}^{k \times n}$ be a JL sketch for $k = \Omega(\frac{d + \log(1/\delta)}{\varepsilon^2})$. Then with probability $1-\delta$, for all $\mathbf{x}\in\mathcal{V}$ we have

$$ (1-\varepsilon)\|\mathbf{x}\|_2 \leq \|\boldsymbol{\Pi}\mathbf{x}\|_2 \leq (1+\varepsilon) \|\mathbf{x}\|_2 $$

We will prove this via an $\varepsilon$-Net Argument, which works in two stages. First, we build a "Net" of $O((\frac1\varepsilon)^d)$ many points and union-bound the JL guarantee over all of these fixed points. Second, we make a rounding argument, which says that since linear maps are smooth, the JL guarantee on the net implies the JL guarantee on all points in $\mathcal{V}$.

We show two different rounding arguments. The first is simpler and just uses the triangle inequality, but incurs a suboptimal $k=\Omega(\frac{d\log(1/\varepsilon) + \log(1/\delta)}{\varepsilon^2})$ dependence. The second is more careful and leverages the relationship between the $\ell_2$ norm and the inner product, which achieves the rate in Theorem 1.
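Before diving into the proofs, here is a minimal numerical sanity check of what Theorem 1 promises, assuming a Gaussian sketch with i.i.d. $\mathcal{N}(0,\frac1k)$ entries (the setting of the prerequisite Gaussian JL lemma). The subspace, the dimensions, and the constant inside $k$ are arbitrary illustrative choices, and we only test random unit vectors in $\mathcal{V}$ rather than taking a true supremum over the sphere.

```python
# A minimal sanity check of Theorem 1, assuming a Gaussian sketch with i.i.d.
# N(0, 1/k) entries. The subspace V, the dimensions, and the constant inside k
# are arbitrary illustrative choices, and we only test random unit vectors in V
# rather than taking a true supremum over the unit sphere.
import numpy as np

rng = np.random.default_rng(0)
n, d, eps = 1000, 10, 0.25
k = int(4 * d / eps**2)                            # sketch dimension ~ d / eps^2 (ad hoc constant)

U, _ = np.linalg.qr(rng.standard_normal((n, d)))   # orthonormal basis of a random d-dim subspace V
Pi = rng.standard_normal((k, n)) / np.sqrt(k)      # Gaussian JL sketch with E||Pi x||^2 = ||x||^2

X = U @ rng.standard_normal((d, 2000))             # random vectors in V ...
X /= np.linalg.norm(X, axis=0)                     # ... normalized to the unit sphere
ratios = np.linalg.norm(Pi @ X, axis=0)            # ||Pi x||_2 for each unit vector x
print(f"min {ratios.min():.3f}, max {ratios.max():.3f}  (want both within 1 +/- {eps})")
```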

Building a Net

Our first step in the proof is to build a fine net. Note that, by linearity, the guarantee from Theorem 1 is equivalent to saying that all unit vectors in $\mathcal{V}$ have their norms preserved:

$$ \|\boldsymbol{\Pi}\mathbf{x}\|_2 \in (1\pm \varepsilon) \hspace{1cm} \forall \mathbf{x}\in\mathcal{V} ~ \text{ s.t. } \|\mathbf{x}\|_2=1 $$

That is, we only need to verify that $\|\boldsymbol{\Pi}\mathbf{x}\|_2$ is approximately $1$ on the unit sphere in $\mathcal{V}$. So, we will only build a net over the unit sphere in $\mathcal{V}$:

Theorem 2: $\ell_2$ Net Size

Fix $\varepsilon\in(0,2)$ and dimension $d\geq1$. Then there exists a set $\mathcal{N}_\varepsilon$ with $|\mathcal{N}_\varepsilon| \leq (\frac 6\varepsilon)^d$ such that, for all $\mathbf{x}$ with $\|\mathbf{x}\|_2=1$, there exists some $\mathbf{y}\in\mathcal{N}_\varepsilon$ such that $\|\mathbf{x}-\mathbf{y}\|_2\leq\varepsilon$.

Proof. Consider a greedy algorithm that constructs $\mathcal{N}_\varepsilon$ (a numerical sketch of this construction appears after the remarks below):

  • Start with $\mathcal{N}_\varepsilon=\{\}$

  • While there exists any $\mathbf{x}$ with $\|\mathbf{x}\|_2=1$ such that $\|\mathbf{x}-\mathbf{y}\|_2>\varepsilon$ for all $\mathbf{y}\in\mathcal{N}_\varepsilon$, add $\mathbf{x}$ to $\mathcal{N}_\varepsilon$

Then $\mathcal{N}_\varepsilon$ clearly satisfies the correctness property – all $\mathbf{x}$ on the unit sphere must be $\varepsilon$-close to some $\mathbf{y}\in\mathcal{N}_\varepsilon$. We just have to bound $|\mathcal{N}_\varepsilon|$. Note that for all $\mathbf{y}_i,\mathbf{y}_j\in\mathcal{N}_\varepsilon$, we have $\|\mathbf{y}_i-\mathbf{y}_j\|_2>\varepsilon$, or else one of those $\mathbf{y}$ vectors would not have been added by the greedy algorithm.

Note that a ball of radius $r$ in $\mathbb{R}^d$ has volume $c r^d$, where $c = \frac{\pi^{d/2}}{\Gamma(\frac{d}{2}+1)}$ does not depend on $r$ (link).
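If you want to convince yourself of this volume formula without chasing the reference, here is a quick Monte Carlo check; the dimension, radius, and sample count below are arbitrary.

```python
# A quick Monte Carlo check of the volume formula c * r^d with c = pi^(d/2) / Gamma(d/2 + 1).
# The dimension, radius, and sample count are arbitrary.
import numpy as np
from math import gamma, pi

rng = np.random.default_rng(1)
d, r, m = 4, 1.5, 1_000_000
pts = rng.uniform(-r, r, size=(m, d))               # uniform samples in the cube [-r, r]^d
frac_inside = (np.linalg.norm(pts, axis=1) <= r).mean()
print("Monte Carlo volume:", frac_inside * (2 * r) ** d)
print("Formula c * r^d:   ", pi ** (d / 2) / gamma(d / 2 + 1) * r ** d)
```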

We now make the Volume Argument. We consider placing balls of radius $\frac\varepsilon2$ on all $\mathbf{y}_i\in\mathcal{N}_\varepsilon$:

$$ \mathcal{B}(\mathbf{y}_1,{\textstyle\frac{\varepsilon}{2}}), ~ \ldots, ~ \mathcal{B}(\mathbf{y}_{|\mathcal{N}_\varepsilon|},{\textstyle\frac{\varepsilon}{2}}) $$

Since these balls do not intersect, and since the volume of each ball is $c\,(\frac{\varepsilon}{2})^d$, these balls have total volume equal to $|\mathcal{N}_\varepsilon| \, c \, (\frac{\varepsilon}{2})^d$. Next, since the union of these disjoint balls fits within a ball of radius $1+\frac{\varepsilon}{2} \leq 3$, we must have

$$ |\mathcal{N}_\varepsilon| \, c \, ({\textstyle\frac{\varepsilon}{2}})^d \leq c \, 3^d $$

Rearranging this, we get $|\mathcal{N}_\varepsilon| \leq (\frac{6}{\varepsilon})^d$.

$\blacksquare$

  • Theorem 2 does not care what $d$-dimensional space is considered, be it $\mathbb{R}^d$ or a $d$-dimensional linear subspace of $\mathbb{R}^n$.

We will use it to make nets over the unit sphere of $\mathcal{V}$.

  • The Volume Argument used in the proof does not really use the fact that $\|\mathbf{x}\|_2=1$, and instead actually bounds the size of a net over the whole ball $\|\mathbf{x}\|_2\leq1$.

So, if you need an argument that uses a net over the whole interior of the ball, the same bound of $|\mathcal{N}_\varepsilon| \leq (\frac6\varepsilon)^d$ is correct.

  • The same proof when constrained to a more typical $\varepsilon\in(0,1)$ achieves a different constant $|\mathcal{N}_\varepsilon|\leq (\frac4\varepsilon)^d$, which may appear in other works.
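Here is a small numerical sketch of the greedy construction from the proof of Theorem 2. Since we cannot loop over the continuous sphere, we approximate the "while there exists an uncovered point" step by scanning a large pool of random unit vectors, so the result is only an approximate $\varepsilon$-net; the pool size, dimension, and $\varepsilon$ are arbitrary illustrative choices.

```python
# A sketch of the greedy net construction from the proof of Theorem 2. The idealized
# algorithm loops "while some uncovered unit vector exists"; here we approximate that
# by scanning a finite pool of random unit vectors, so the result is only an
# approximate eps-net. Pool size, dimension, and eps are arbitrary choices.
import numpy as np

def greedy_net(d, eps, num_candidates=5000, seed=0):
    rng = np.random.default_rng(seed)
    cands = rng.standard_normal((num_candidates, d))
    cands /= np.linalg.norm(cands, axis=1, keepdims=True)   # candidates on the unit sphere
    net = []
    for x in cands:
        # add x only if it is eps-far from every point already in the net
        if all(np.linalg.norm(x - y) > eps for y in net):
            net.append(x)
    return np.array(net)

net = greedy_net(d=3, eps=0.5)
print(len(net), "net points; volume-argument bound (6/eps)^d =", int((6 / 0.5) ** 3))
```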

Net Expansion of a Vector

To prove Theorem 1, we first have to represent an arbitrary $\mathbf{x}$ on the unit sphere in terms of points on the net:

Lemma 1: Net Expansion of a Vector

Let $\|\mathbf{x}\|_2=1$ and let $\mathcal{N}_\varepsilon$ be an $\varepsilon$-Net for the unit sphere. Then, there exists a sequence $\mathbf{y}_0,\mathbf{y}_1,\ldots\in\mathcal{N}_\varepsilon$ such that $\mathbf{x}=\sum_{i=0}^\infty \alpha_i \mathbf{y}_i$ where $0 \leq \alpha_i \leq \varepsilon^i$ and $\alpha_0=1$.

Proof. By construction of the net, we know there exists some $\mathbf{y}_0\in\mathcal{N}_\varepsilon$ such that $\|\mathbf{x}-\mathbf{y}_0\|_2\leq\varepsilon$. That is, the residual $\mathbf{r}_0 := \mathbf{x}-\mathbf{y}_0$ has norm $c_1 := \|\mathbf{r}_0\|_2\leq\varepsilon$. Then, again by the net, we know that some $\mathbf{y}_1\in\mathcal{N}_\varepsilon$ has $\|\frac{\mathbf{r}_0}{c_1} - \mathbf{y}_1\|_2 \leq \varepsilon$. That is, the residual $\mathbf{r}_1 := \frac{\mathbf{r}_0}{c_1} - \mathbf{y}_1$ has norm $c_2 := \|\mathbf{r}_1\|_2\leq\varepsilon$. Repeating this process, we get

$$\begin{aligned} \mathbf{x} &= \mathbf{y}_0 + \mathbf{r}_0 \\ &= \mathbf{y}_0 + c_1(\mathbf{y}_1 + \mathbf{r}_1) \\ &= \mathbf{y}_0 + c_1(\mathbf{y}_1 + c_2(\mathbf{y}_2 + \ldots)) \\ &= \mathbf{y}_0 + c_1\mathbf{y}_1 + c_1c_2\mathbf{y}_2 + c_1c_2c_3\mathbf{y}_3 + \ldots \end{aligned}$$

Since each $c_i\in[0,\varepsilon]$, we get that $\alpha_i := c_1c_2\cdots c_i \in[0,\varepsilon^i]$.
$\blacksquare$
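The proof of Lemma 1 is constructive, and the rounding procedure is easy to simulate. The sketch below stands in for $\mathcal{N}_\varepsilon$ with a cloud of random unit vectors (so its covering radius is only empirical) and truncates the infinite expansion after a few terms; both are illustrative choices.

```python
# A simulation of the rounding procedure from the proof of Lemma 1: round the current
# (renormalized) residual to its nearest net point and record the scales c_i. The "net"
# here is just a cloud of random unit vectors standing in for N_eps, and the infinite
# expansion is truncated after a few terms; both are illustrative choices.
import numpy as np

rng = np.random.default_rng(2)
d, num_terms = 3, 8

net = rng.standard_normal((5000, d))
net /= np.linalg.norm(net, axis=1, keepdims=True)   # stand-in for N_eps on the unit sphere

x = rng.standard_normal(d)
x /= np.linalg.norm(x)                              # the unit vector to expand

ys, alphas = [], []
residual, scale = x.copy(), 1.0
for _ in range(num_terms):
    y = net[np.argmin(np.linalg.norm(net - residual, axis=1))]  # nearest net point
    ys.append(y)
    alphas.append(scale)               # alpha_i = c_1 * ... * c_i, with alpha_0 = 1
    residual = residual - y            # new residual r_i
    c = np.linalg.norm(residual)       # c_{i+1} = ||r_i||, at most eps for a true eps-net
    if c == 0:
        break
    residual /= c                      # renormalize so the next rounding is on the sphere
    scale *= c

approx = sum(a * y for a, y in zip(alphas, ys))
print("reconstruction error after", len(ys), "terms:", np.linalg.norm(x - approx))
```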

Rounding via Triangle Inequality

Here, we present a proof of Theorem 1, but one which uses sketching dimension $O(\frac{d \log(1/\varepsilon) + \log(1/\delta)}{\varepsilon^2})$ instead of the tighter rate $O(\frac{d + \log(1/\delta)}{\varepsilon^2})$ promised in the theorem statement. This proof, however, is very simple and just uses Lemma 1 and the triangle inequality:

Proof. Let $\varepsilon_0 := \frac{\varepsilon}{4}$. Let $\mathcal{N}_{\varepsilon_0}$ be an $\varepsilon_0$-Net for the unit sphere, so that $|\mathcal{N}_{\varepsilon_0}|\leq (\frac{24}\varepsilon)^d$ and so that union bounding JL over all $\mathbf{y}\in\mathcal{N}_{\varepsilon_0}$ requires sketching dimension $k=\Omega(\frac{\log(|\mathcal{N}_{\varepsilon_0}|/\delta)}{\varepsilon^2}) = \Omega(\frac{d \log(1/\varepsilon) + \log(1/\delta)}{\varepsilon^2})$. Namely, we have $\|\boldsymbol{\Pi}\mathbf{y}\|_2\in(1\pm\varepsilon_0)$ for all $\mathbf{y}\in\mathcal{N}_{\varepsilon_0}$, since $\|\mathbf{y}\|_2=1$.

Let $\mathbf{x}$ be any vector with $\|\mathbf{x}\|_2=1$, and let $\mathbf{x} = \sum_{i=0}^\infty \alpha_i \mathbf{y}_i$ be its Net Expansion. Then, we have

$$ \|\boldsymbol{\Pi}\mathbf{x}\|_2 \leq \sum_{i=0}^\infty \alpha_i\|\boldsymbol{\Pi}\mathbf{y}_i\|_2 \leq (1+\varepsilon_0)\sum_{i=0}^\infty \varepsilon_0^i = \frac{1+\varepsilon_0}{1-\varepsilon_0} $$

Since $\frac{1+\varepsilon_0}{1-\varepsilon_0} \leq 1+4\varepsilon_0 = 1+\varepsilon$ for $\varepsilon\in[0,1]$, we get $\|\boldsymbol{\Pi}\mathbf{x}\|_2\leq1+\varepsilon$. Similarly, by the reverse triangle inequality $\|\mathbf{a}+\mathbf{b}\|\geq\|\mathbf{a}\|-\|\mathbf{b}\|$,

$$\begin{aligned} \|\boldsymbol{\Pi}\mathbf{x}\|_2 &\geq \|\boldsymbol{\Pi}\mathbf{y}_0\|_2 - \left\|\sum_{i=1}^\infty \alpha_i\boldsymbol{\Pi}\mathbf{y}_i\right\|_2 \\ &\geq \|\boldsymbol{\Pi}\mathbf{y}_0\|_2 - \sum_{i=1}^\infty \alpha_i \|\boldsymbol{\Pi}\mathbf{y}_i\|_2 \\ &\geq (1-\varepsilon_0) - (1+\varepsilon_0) \sum_{i=1}^\infty \varepsilon_0^i \\ &= (1-\varepsilon_0) - (1+\varepsilon_0) \frac{\varepsilon_0}{1-\varepsilon_0} \end{aligned}$$

So we get $\|\boldsymbol{\Pi}\mathbf{x}\|_2 \geq (1-\varepsilon_0) - \varepsilon_0 \frac{1+\varepsilon_0}{1-\varepsilon_0} \geq 1-3\varepsilon_0 \geq 1-\varepsilon$. That is, we overall find

$$ \|\boldsymbol{\Pi}\mathbf{x}\|_2 \in (1\pm \varepsilon) \hspace{1cm} \forall \mathbf{x}\in\mathcal{V} ~ \text{ s.t. } \|\mathbf{x}\|_2=1 $$

which completes the proof.

$\blacksquare$

Rounding via Inner Products

We now present a sharper analysis that achieves the rate of $k=\Omega(\frac{d+\log(1/\delta)}{\varepsilon^2})$.

We do this by decreasing the precision of the net from an $\varepsilon$-Net to a $\frac12$-Net, which therefore needs a new, tighter rounding argument. To understand how this works, we ask why the proof via the triangle inequality has to use a net with precision $O(\varepsilon)$. Suppose the JL matrix preserved the norm of all $\mathbf{y}\in\mathcal{N}_{\varepsilon}$ perfectly. Even then, that proof only bounds

$$ \|\mathbf{x}\|_2\leq\sum_{i=0}^\infty \alpha_i \|\mathbf{y}_i\|_2 \leq \sum_{i=0}^\infty \varepsilon^i = \frac{1}{1-\varepsilon} \approx 1+\varepsilon $$

This proof, even for a perfectly accurate JL matrix, still overestimates the norm of $\mathbf{x}$ because the triangle inequality is losing vital information. Specifically, note that $\|\mathbf{a}+\mathbf{b}\|_2^2 = \|\mathbf{a}\|_2^2 + \|\mathbf{b}\|_2^2$ only if $\mathbf{a}$ and $\mathbf{b}$ are perfectly orthogonal; in general there is a cross term $2\mathbf{a}^\intercal\mathbf{b}$. If we somehow had a net $\mathcal{N}_\varepsilon$ such that $\mathbf{y}_0,\mathbf{y}_1,\ldots$ were orthogonal, then these cross terms would vanish and we could recover $\|\mathbf{x}\|_2^2 = \sum_i \alpha_i^2$ essentially exactly.

However, that's trivially not the case here. For instance, by the pigeonhole principle, the sequence $\mathbf{y}_0,\mathbf{y}_1,\ldots$ contains infinitely many repeated vectors, since $|\mathcal{N}_\varepsilon|<\infty$, and those repeated vectors are deeply non-orthogonal.

So, we need to find a new way to express $\|\mathbf{x}\|_2$ in terms of $\mathbf{y}_0,\mathbf{y}_1,\ldots$, and that approach follows by examining the unique properties of the $\ell_2$ norm, namely that

$$ \|\mathbf{a}-\mathbf{b}\|_2^2 = \|\mathbf{a}\|_2^2 + \|\mathbf{b}\|_2^2 - 2\mathbf{a}^\intercal\mathbf{b} $$

Or, equivalently,

$$ \mathbf{a}^\intercal\mathbf{b} = \frac12 \left(\|\mathbf{a}\|_2^2 + \|\mathbf{b}\|_2^2 - \|\mathbf{a}-\mathbf{b}\|_2^2\right) $$

We can then preserve all three terms on the right-hand side to relative error by union bounding JL over $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{a}-\mathbf{b}$. So, we can expand $\|\mathbf{x}\|_2^2 = (\sum_i \alpha_i \mathbf{y}_i)^\intercal(\sum_i \alpha_i \mathbf{y}_i)$ as a large sum of inner products, preserve all the corresponding norms by JL, and recover a relative error guarantee with a coarser net and a $(1+\varepsilon)$-accurate JL.
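Here is a small numerical check of this identity, and of the claim that a sketch which preserves the three norms on the right-hand side to $(1\pm\varepsilon_0)$ pins down $\mathbf{a}^\intercal\mathbf{b}$ up to an additive $O(\varepsilon_0)$ error when $\|\mathbf{a}\|_2,\|\mathbf{b}\|_2\leq1$; the Gaussian sketch and the dimensions are arbitrary choices for illustration.

```python
# A small check of the identity a^T b = (||a||^2 + ||b||^2 - ||a - b||^2) / 2, and of the
# claim that a sketch which preserves the three norms also preserves a^T b up to a small
# additive error. The Gaussian sketch and the dimensions are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(3)
n, k = 500, 4000
a, b = rng.standard_normal(n), rng.standard_normal(n)
a, b = a / np.linalg.norm(a), b / np.linalg.norm(b)

# the exact identity
print("identity gap:", abs(a @ b - 0.5 * (a @ a + b @ b - (a - b) @ (a - b))))

# the same identity applied to sketched vectors recovers a^T b up to additive error
Pi = rng.standard_normal((k, n)) / np.sqrt(k)
sketched = 0.5 * (np.sum((Pi @ a) ** 2) + np.sum((Pi @ b) ** 2) - np.sum((Pi @ (a - b)) ** 2))
print("additive error from the sketch:", abs(sketched - a @ b))
```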

Proof. Let $\varepsilon_0 := \frac{\varepsilon}{36}$. Let $\mathcal{N}_2$ be a $\frac12$-Net for the unit sphere, so that $|\mathcal{N}_2| \leq 12^d$. We union bound JL over all pairs $\mathbf{y},\mathbf{y}'\in\mathcal{N}_2$, so that both $\|\boldsymbol{\Pi}\mathbf{y}\|_2\in(1\pm\varepsilon_0)$ and $\|\boldsymbol{\Pi}(\mathbf{y}-\mathbf{y}')\|_2\in(1\pm\varepsilon_0)\|\mathbf{y}-\mathbf{y}'\|_2$ hold for all $\mathbf{y},\mathbf{y}'\in\mathcal{N}_2$. This requires sketching dimension $k = \Omega(\frac{\log(|\mathcal{N}_2|^2/\delta)}{\varepsilon_0^2}) = \Omega(\frac{d + \log(1/\delta)}{\varepsilon^2})$.

Let $\mathbf{x}$ be any vector with $\|\mathbf{x}\|_2=1$, and let $\mathbf{x} = \sum_{i=0}^\infty \alpha_i \mathbf{y}_i$ be its Net Expansion. Then, we have

$$\begin{aligned} \|\boldsymbol{\Pi}\mathbf{x}\|_2^2 &= \left(\textstyle{\sum_{i=0}^\infty} \alpha_i \boldsymbol{\Pi}\mathbf{y}_i\right)^\intercal \left(\textstyle{\sum_{j=0}^\infty} \alpha_j \boldsymbol{\Pi}\mathbf{y}_j\right) \\ &= \sum_{i=0}^\infty \sum_{j=0}^\infty \alpha_i \alpha_j (\boldsymbol{\Pi}\mathbf{y}_i)^\intercal(\boldsymbol{\Pi}\mathbf{y}_j) \end{aligned}$$
We then bound the accuracy of this inner product, using the fact that $(1+\varepsilon_0)^2 \leq 1+3\varepsilon_0$ for $\varepsilon_0 \leq 1$:
$$\begin{aligned} (\boldsymbol{\Pi}\mathbf{y}_i)^\intercal(\boldsymbol{\Pi}\mathbf{y}_j) &= \frac12 \left( \|\boldsymbol{\Pi}\mathbf{y}_i\|_2^2 + \|\boldsymbol{\Pi}\mathbf{y}_j\|_2^2 - \|\boldsymbol{\Pi}(\mathbf{y}_i-\mathbf{y}_j)\|_2^2 \right) \\ &\leq \frac12 \left( \|\mathbf{y}_i\|_2^2 + \|\mathbf{y}_j\|_2^2 - \|\mathbf{y}_i-\mathbf{y}_j\|_2^2 \right) + \frac{3\varepsilon_0}2 \left( \|\mathbf{y}_i\|_2^2 + \|\mathbf{y}_j\|_2^2 + \|\mathbf{y}_i-\mathbf{y}_j\|_2^2 \right) \\ &\leq \frac12 \left( \|\mathbf{y}_i\|_2^2 + \|\mathbf{y}_j\|_2^2 - \|\mathbf{y}_i-\mathbf{y}_j\|_2^2 \right) + \frac{3\varepsilon_0}2 \left( 1 + 1 + 4 \right) \\ &= \mathbf{y}_i^\intercal\mathbf{y}_j + 9\varepsilon_0 \end{aligned}$$

where we used $\|\mathbf{y}_i-\mathbf{y}_j\|_2 \leq \|\mathbf{y}_i\|_2+\|\mathbf{y}_j\|_2 = 2$, so that $\|\mathbf{y}_i-\mathbf{y}_j\|_2^2 \leq 4$.
The matching lower bound $(\boldsymbol{\Pi}\mathbf{y}_i)^\intercal(\boldsymbol{\Pi}\mathbf{y}_j) \geq \mathbf{y}_i^\intercal\mathbf{y}_j - 9\varepsilon_0$ follows from $(1-\varepsilon_0)^2 \geq 1-3\varepsilon_0$. So the subspace embedding error grows as
$$\begin{aligned} \|\boldsymbol{\Pi}\mathbf{x}\|_2^2 &= \sum_{i=0}^\infty \sum_{j=0}^\infty \alpha_i \alpha_j (\boldsymbol{\Pi}\mathbf{y}_i)^\intercal(\boldsymbol{\Pi}\mathbf{y}_j) \\ &\leq \sum_{i=0}^\infty \sum_{j=0}^\infty \alpha_i \alpha_j (\mathbf{y}_i^\intercal\mathbf{y}_j + 9\varepsilon_0) \\ &= \|\mathbf{x}\|_2^2 + 9\varepsilon_0\sum_{i=0}^\infty \sum_{j=0}^\infty \alpha_i \alpha_j \\ &\leq 1 + 9\varepsilon_0 \sum_{i=0}^\infty \sum_{j=0}^\infty 2^{-i} 2^{-j} \\ &= 1 + 36\varepsilon_0 \end{aligned}$$
And the lower bound similarly is $\|\boldsymbol{\Pi}\mathbf{x}\|_2^2 \geq 1 - 36\varepsilon_0$. So, we have $\|\boldsymbol{\Pi}\mathbf{x}\|_2 \leq \sqrt{1+36\varepsilon_0} \leq 1+36\varepsilon_0 = 1+\varepsilon$ and $\|\boldsymbol{\Pi}\mathbf{x}\|_2 \geq \sqrt{1-36\varepsilon_0} \geq 1-36\varepsilon_0 = 1-\varepsilon$ (see this on Desmos), which completes the proof.

See Also

The proofs above are ubiquitous; for example, the coarser argument appears in Musco (2018).

Here are some important papers in the world of oblivious subspace embeddings:

  • Johnson Lindenstrauss (1984) is the original paper of Johnson and Lindenstrauss.

  • Musco (2018) has a nice short proof, which this page basically copies.

  • I'm sure a lot of great references are missing

  • Let me know if anything is missing

Bibliography