Subspace Embedding via $\varepsilon$-Nets

On this page we prove that a Johnson-Lindenstrauss Transform is an oblivious subspace embedding. That is, the JL matrix is a map from $\mathbb{R}^n$ to $\mathbb{R}^{O(d)}$ which preserves the norms of all vectors in any fixed $d$-dimensional subspace of $\mathbb{R}^n$. This is called oblivious because the JL map doesn't know anything about the subspace beyond how big it is (i.e. its dimension).

Prerequisite: Gaussian Johnson-Lindenstrauss Lemma

Theorem 1: Subspace Embedding

Fix $\varepsilon\in(0,1)$ and $\delta\in(0,1)$. Let $\mathcal{V}$ be a $d$-dimensional linear subspace of $\mathbb{R}^n$. Let $\boldsymbol{\Pi}\in\mathbb{R}^{k \times n}$ be a JL sketch for $k = \Omega(\frac{d + \log(1/\delta)}{\varepsilon^2})$. Then with probability $1-\delta$, for all $\mathbf{x}\in\mathcal{V}$ we have

$$ (1-\varepsilon)\|\mathbf{x}\|_2 \leq \|\boldsymbol{\Pi}\mathbf{x}\|_2 \leq (1+\varepsilon) \|\mathbf{x}\|_2 $$

We will prove this via an $\varepsilon$-Net Argument, which works in two stages. First, we build a "Net" of $O((\frac1\varepsilon)^d)$ many points and union-bound the JL guarantee over all of these fixed points. Second, we make a rounding argument, which says that since linear maps are smooth, the JL guarantee on the net implies the JL guarantee on all points in $\mathcal{V}$.

We show two different rounding arguments. The first is simpler and just uses the triangle inequality, but incurs a suboptimal $k=\Omega(\frac{d\log(1/\varepsilon) + \log(1/\delta)}{\varepsilon^2})$ dependence. The second is more careful and leverages the relationship between the $\ell_2$ norm and the inner product, which achieves the rate in Theorem 1.
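Before diving into the proofs, here is a minimal numerical sanity check of what Theorem 1 promises, assuming a Gaussian sketch with i.i.d. $\mathcal{N}(0,\frac1k)$ entries (the setting of the prerequisite Gaussian JL lemma). The subspace, the dimensions, and the constant inside $k$ are arbitrary illustrative choices, and we only test random unit vectors in $\mathcal{V}$ rather than taking a true supremum over the sphere.

```python
# A minimal sanity check of Theorem 1, assuming a Gaussian sketch with i.i.d.
# N(0, 1/k) entries. The subspace V, the dimensions, and the constant inside k
# are arbitrary illustrative choices, and we only test random unit vectors in V
# rather than taking a true supremum over the unit sphere.
import numpy as np

rng = np.random.default_rng(0)
n, d, eps = 1000, 10, 0.25
k = int(4 * d / eps**2)                            # sketch dimension ~ d / eps^2 (ad hoc constant)

U, _ = np.linalg.qr(rng.standard_normal((n, d)))   # orthonormal basis of a random d-dim subspace V
Pi = rng.standard_normal((k, n)) / np.sqrt(k)      # Gaussian JL sketch with E||Pi x||^2 = ||x||^2

X = U @ rng.standard_normal((d, 2000))             # random vectors in V ...
X /= np.linalg.norm(X, axis=0)                     # ... normalized to the unit sphere
ratios = np.linalg.norm(Pi @ X, axis=0)            # ||Pi x||_2 for each unit vector x
print(f"min {ratios.min():.3f}, max {ratios.max():.3f}  (want both within 1 +/- {eps})")
```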

Building a Net

Our first step in the proof is to build a fine net. Note that, by linearity, the guarantee from Theorem 1 is equivalent to saying that all unit vectors in $\mathcal{V}$ have their norms preserved:

$$ \|\boldsymbol{\Pi}\mathbf{x}\|_2 \in (1\pm \varepsilon) \hspace{1cm} \forall \mathbf{x}\in\mathcal{V} ~ \text{ s.t. } \|\mathbf{x}\|_2=1 $$

That is, we only need to verify that $\|\boldsymbol{\Pi}\mathbf{x}\|_2$ is approximately $1$ on the unit sphere in $\mathcal{V}$. So, we will only build a net over the unit sphere in $\mathcal{V}$:

Theorem 2: $\ell_2$ Net Size

Fix $\varepsilon\in(0,2)$ and dimension $d\geq1$. Then there exists a set $\mathcal{N}_\varepsilon$ with $|\mathcal{N}_\varepsilon| \leq (\frac 6\varepsilon)^d$ such that, for all $\mathbf{x}$ with $\|\mathbf{x}\|_2=1$, there exists some $\mathbf{y}\in\mathcal{N}_\varepsilon$ such that $\|\mathbf{x}-\mathbf{y}\|_2\leq\varepsilon$.

Proof. Consider a greedy algorithm that constructs $\mathcal{N}_\varepsilon$ (a numerical sketch of this construction appears after the remarks below):

  • Start with $\mathcal{N}_\varepsilon=\{\}$

  • While there exists any $\mathbf{x}$ with $\|\mathbf{x}\|_2=1$ such that $\|\mathbf{x}-\mathbf{y}\|_2>\varepsilon$ for all $\mathbf{y}\in\mathcal{N}_\varepsilon$, add $\mathbf{x}$ to $\mathcal{N}_\varepsilon$

Then $\mathcal{N}_\varepsilon$ clearly satisfies the correctness property – all $\mathbf{x}$ on the unit sphere must be $\varepsilon$-close to some $\mathbf{y}\in\mathcal{N}_\varepsilon$. We just have to bound $|\mathcal{N}_\varepsilon|$. Note that for all $\mathbf{y}_i,\mathbf{y}_j\in\mathcal{N}_\varepsilon$, we have $\|\mathbf{y}_i-\mathbf{y}_j\|_2>\varepsilon$, or else one of those $\mathbf{y}$ vectors would not have been added by the greedy algorithm.

Note that a ball of radius $r$ in $\mathbb{R}^d$ has volume $c r^d$, where $c = \frac{\pi^{d/2}}{\Gamma(\frac{d}{2}+1)}$ does not depend on $r$ (link).
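If you want to convince yourself of this volume formula without chasing the reference, here is a quick Monte Carlo check; the dimension, radius, and sample count below are arbitrary.

```python
# A quick Monte Carlo check of the volume formula c * r^d with c = pi^(d/2) / Gamma(d/2 + 1).
# The dimension, radius, and sample count are arbitrary.
import numpy as np
from math import gamma, pi

rng = np.random.default_rng(1)
d, r, m = 4, 1.5, 1_000_000
pts = rng.uniform(-r, r, size=(m, d))               # uniform samples in the cube [-r, r]^d
frac_inside = (np.linalg.norm(pts, axis=1) <= r).mean()
print("Monte Carlo volume:", frac_inside * (2 * r) ** d)
print("Formula c * r^d:   ", pi ** (d / 2) / gamma(d / 2 + 1) * r ** d)
```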

We now make the Volume Argument. We consider placing balls of radius $\frac\varepsilon2$ on all $\mathbf{y}_i\in\mathcal{N}_\varepsilon$:

$$ \mathcal{B}(\mathbf{y}_1,{\textstyle\frac{\varepsilon}{2}}), ~ \ldots, ~ \mathcal{B}(\mathbf{y}_{|\mathcal{N}_\varepsilon|},{\textstyle\frac{\varepsilon}{2}}) $$

Since these balls do not intersect, and since the volume of each ball is $c\,(\frac{\varepsilon}{2})^d$, these balls have total volume equal to $|\mathcal{N}_\varepsilon| \, c \, (\frac{\varepsilon}{2})^d$. Next, since the union of these disjoint balls fits within a ball of radius $1+\frac{\varepsilon}{2} \leq 3$, we must have

$$ |\mathcal{N}_\varepsilon| \, c \, ({\textstyle\frac{\varepsilon}{2}})^d \leq c \, 3^d $$

Rearranging this, we get $|\mathcal{N}_\varepsilon| \leq (\frac{6}{\varepsilon})^d$.

$\blacksquare$

  • Theorem 2 does not care what $d$-dimensional space is considered, be it $\mathbb{R}^d$ or a $d$-dimensional linear subspace of $\mathbb{R}^n$.

We will use it to make nets over the unit sphere of $\mathcal{V}$.

  • The Volume Argument used in the proof does not really use the fact that $\|\mathbf{x}\|_2=1$, and instead actually bounds the size of a net over the whole ball $\|\mathbf{x}\|_2\leq1$.

So, if you need an argument that uses a net over the whole interior of the ball, the same bound of $|\mathcal{N}_\varepsilon| \leq (\frac6\varepsilon)^d$ is correct.

  • The same proof when constrained to a more typical $\varepsilon\in(0,1)$ achieves a different constant $|\mathcal{N}_\varepsilon|\leq (\frac4\varepsilon)^d$, which may appear in other works.
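Here is a small numerical sketch of the greedy construction from the proof of Theorem 2. Since we cannot loop over the continuous sphere, we approximate the "while there exists an uncovered point" step by scanning a large pool of random unit vectors, so the result is only an approximate $\varepsilon$-net; the pool size, dimension, and $\varepsilon$ are arbitrary illustrative choices.

```python
# A sketch of the greedy net construction from the proof of Theorem 2. The idealized
# algorithm loops "while some uncovered unit vector exists"; here we approximate that
# by scanning a finite pool of random unit vectors, so the result is only an
# approximate eps-net. Pool size, dimension, and eps are arbitrary choices.
import numpy as np

def greedy_net(d, eps, num_candidates=5000, seed=0):
    rng = np.random.default_rng(seed)
    cands = rng.standard_normal((num_candidates, d))
    cands /= np.linalg.norm(cands, axis=1, keepdims=True)   # candidates on the unit sphere
    net = []
    for x in cands:
        # add x only if it is eps-far from every point already in the net
        if all(np.linalg.norm(x - y) > eps for y in net):
            net.append(x)
    return np.array(net)

net = greedy_net(d=3, eps=0.5)
print(len(net), "net points; volume-argument bound (6/eps)^d =", int((6 / 0.5) ** 3))
```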

Net Expansion of a Vector

To prove Theorem 1, we first have to represent an arbitrary $\mathbf{x}$ on the unit sphere in terms of points on the net:

Lemma 1: Net Expansion of a Vector

Let $\|\mathbf{x}\|_2=1$ and let $\mathcal{N}_\varepsilon$ be an $\varepsilon$-Net for the unit sphere. Then, there exists a sequence $\mathbf{y}_0,\mathbf{y}_1,\ldots\in\mathcal{N}_\varepsilon$ such that $\mathbf{x}=\sum_{i=0}^\infty \alpha_i \mathbf{y}_i$ where $0 \leq \alpha_i \leq \varepsilon^i$ and $\alpha_0=1$.

Proof. By construction of the net, we know there exists some $\mathbf{y}_0\in\mathcal{N}_\varepsilon$ such that $\|\mathbf{x}-\mathbf{y}_0\|_2\leq\varepsilon$. That is, the residual $\mathbf{r}_0 := \mathbf{x}-\mathbf{y}_0$ has norm $c_1 := \|\mathbf{r}_0\|_2\leq\varepsilon$. Then, again by the net, we know that some $\mathbf{y}_1\in\mathcal{N}_\varepsilon$ has $\|\frac{\mathbf{r}_0}{c_1} - \mathbf{y}_1\|_2 \leq \varepsilon$. That is, the residual $\mathbf{r}_1 := \frac{\mathbf{r}_0}{c_1} - \mathbf{y}_1$ has norm $c_2 := \|\mathbf{r}_1\|_2\leq\varepsilon$. Repeating this process, we get

$$\begin{aligned} \mathbf{x} &= \mathbf{y}_0 + \mathbf{r}_0 \\ &= \mathbf{y}_0 + c_1(\mathbf{y}_1 + \mathbf{r}_1) \\ &= \mathbf{y}_0 + c_1(\mathbf{y}_1 + c_2(\mathbf{y}_2 + \ldots)) \\ &= \mathbf{y}_0 + c_1\mathbf{y}_1 + c_1c_2\mathbf{y}_2 + c_1c_2c_3\mathbf{y}_3 + \ldots \end{aligned}$$

Since each $c_i\in[0,\varepsilon]$, we get that $\alpha_i := c_1c_2\cdots c_i \in[0,\varepsilon^i]$.
$\blacksquare$
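The proof of Lemma 1 is constructive, and the rounding procedure is easy to simulate. The sketch below stands in for $\mathcal{N}_\varepsilon$ with a cloud of random unit vectors (so its covering radius is only empirical) and truncates the infinite expansion after a few terms; both are illustrative choices.

```python
# A simulation of the rounding procedure from the proof of Lemma 1: round the current
# (renormalized) residual to its nearest net point and record the scales c_i. The "net"
# here is just a cloud of random unit vectors standing in for N_eps, and the infinite
# expansion is truncated after a few terms; both are illustrative choices.
import numpy as np

rng = np.random.default_rng(2)
d, num_terms = 3, 8

net = rng.standard_normal((5000, d))
net /= np.linalg.norm(net, axis=1, keepdims=True)   # stand-in for N_eps on the unit sphere

x = rng.standard_normal(d)
x /= np.linalg.norm(x)                              # the unit vector to expand

ys, alphas = [], []
residual, scale = x.copy(), 1.0
for _ in range(num_terms):
    y = net[np.argmin(np.linalg.norm(net - residual, axis=1))]  # nearest net point
    ys.append(y)
    alphas.append(scale)               # alpha_i = c_1 * ... * c_i, with alpha_0 = 1
    residual = residual - y            # new residual r_i
    c = np.linalg.norm(residual)       # c_{i+1} = ||r_i||, at most eps for a true eps-net
    if c == 0:
        break
    residual /= c                      # renormalize so the next rounding is on the sphere
    scale *= c

approx = sum(a * y for a, y in zip(alphas, ys))
print("reconstruction error after", len(ys), "terms:", np.linalg.norm(x - approx))
```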

Rounding via Triangle Inequality

Here, we present a proof of Theorem 1, but one which uses sketching dimension $O(\frac{d \log(1/\varepsilon) + \log(1/\delta)}{\varepsilon^2})$ instead of the tighter rate $O(\frac{d + \log(1/\delta)}{\varepsilon^2})$ promised in the theorem statement. This proof, however, is very simple and just uses Lemma 1 and the triangle inequality:

Proof. Let $\varepsilon_0 := \frac{\varepsilon}{4}$. Let $\mathcal{N}_{\varepsilon_0}$ be an $\varepsilon_0$-Net for the unit sphere, so that $|\mathcal{N}_{\varepsilon_0}|\leq (\frac{24}\varepsilon)^d$ and so that union bounding JL over all $\mathbf{y}\in\mathcal{N}_{\varepsilon_0}$ requires sketching dimension $k=\Omega(\frac{\log(|\mathcal{N}_{\varepsilon_0}|/\delta)}{\varepsilon^2}) = \Omega(\frac{d \log(1/\varepsilon) + \log(1/\delta)}{\varepsilon^2})$. Namely, we have $\|\boldsymbol{\Pi}\mathbf{y}\|_2\in(1\pm\varepsilon_0)$ for all $\mathbf{y}\in\mathcal{N}_{\varepsilon_0}$, since $\|\mathbf{y}\|_2=1$.

Let $\mathbf{x}$ be any vector with $\|\mathbf{x}\|_2=1$, and let $\mathbf{x} = \sum_{i=0}^\infty \alpha_i \mathbf{y}_i$ be its Net Expansion. Then, we have

$$ \|\boldsymbol{\Pi}\mathbf{x}\|_2 \leq \sum_{i=0}^\infty \alpha_i\|\boldsymbol{\Pi}\mathbf{y}_i\|_2 \leq (1+\varepsilon_0)\sum_{i=0}^\infty \varepsilon_0^i = \frac{1+\varepsilon_0}{1-\varepsilon_0} $$

Since $\frac{1+\varepsilon_0}{1-\varepsilon_0} \leq 1+4\varepsilon_0 = 1+\varepsilon$ for $\varepsilon\in[0,1]$, we get $\|\boldsymbol{\Pi}\mathbf{x}\|_2\leq1+\varepsilon$. Similarly, by the reverse triangle inequality $\|\mathbf{a}+\mathbf{b}\|\geq\|\mathbf{a}\|-\|\mathbf{b}\|$,

$$\begin{aligned} \|\boldsymbol{\Pi}\mathbf{x}\|_2 &\geq \|\boldsymbol{\Pi}\mathbf{y}_0\|_2 - \left\|\sum_{i=1}^\infty \alpha_i\boldsymbol{\Pi}\mathbf{y}_i\right\|_2 \\ &\geq \|\boldsymbol{\Pi}\mathbf{y}_0\|_2 - \sum_{i=1}^\infty \alpha_i \|\boldsymbol{\Pi}\mathbf{y}_i\|_2 \\ &\geq (1-\varepsilon_0) - (1+\varepsilon_0) \sum_{i=1}^\infty \varepsilon_0^i \\ &= (1-\varepsilon_0) - (1+\varepsilon_0) \frac{\varepsilon_0}{1-\varepsilon_0} \end{aligned}$$

So we get $\|\boldsymbol{\Pi}\mathbf{x}\|_2 \geq (1-\varepsilon_0) - \varepsilon_0 \frac{1+\varepsilon_0}{1-\varepsilon_0} \geq 1-3\varepsilon_0 \geq 1-\varepsilon$. That is, we overall find

$$ \|\boldsymbol{\Pi}\mathbf{x}\|_2 \in (1\pm \varepsilon) \hspace{1cm} \forall \mathbf{x}\in\mathcal{V} ~ \text{ s.t. } \|\mathbf{x}\|_2=1 $$

which completes the proof.

$\blacksquare$

Rounding via Inner Products

We now present a sharper analysis that achieves the rate of $k=\Omega(\frac{d+\log(1/\delta)}{\varepsilon^2})$.

We do this by decreasing the precision of the net from an $\varepsilon$-Net to a $\frac12$-Net, which therefore needs a new, tighter rounding argument. To understand how this works, we ask why the proof via the triangle inequality has to use a net with precision $O(\varepsilon)$. Suppose the JL matrix preserved the norm of all $\mathbf{y}\in\mathcal{N}_{\varepsilon}$ perfectly. Even then, that proof only bounds

$$ \|\mathbf{x}\|_2\leq\sum_{i=0}^\infty \alpha_i \|\mathbf{y}_i\|_2 \leq \sum_{i=0}^\infty \varepsilon^i = \frac{1}{1-\varepsilon} \approx 1+\varepsilon $$

This proof, even for a perfectly accurate JL matrix, still overestimates the norm of $\mathbf{x}$ because the triangle inequality is losing vital information. Specifically, note that $\|\mathbf{a}+\mathbf{b}\|_2^2 = \|\mathbf{a}\|_2^2 + \|\mathbf{b}\|_2^2$ only if $\mathbf{a}$ and $\mathbf{b}$ are perfectly orthogonal; in general there is a cross term $2\mathbf{a}^\intercal\mathbf{b}$. If we somehow had a net $\mathcal{N}_\varepsilon$ such that $\mathbf{y}_0,\mathbf{y}_1,\ldots$ were orthogonal, then these cross terms would vanish and we could recover $\|\mathbf{x}\|_2^2 = \sum_i \alpha_i^2$ essentially exactly.

However, that's trivially not the case here. For instance, by the pigeonhole principle, the sequence $\mathbf{y}_0,\mathbf{y}_1,\ldots$ contains infinitely many repeated vectors, since $|\mathcal{N}_\varepsilon|<\infty$, and those repeated vectors are deeply non-orthogonal.

So, we need to find a new way to express $\|\mathbf{x}\|_2$ in terms of $\mathbf{y}_0,\mathbf{y}_1,\ldots$, and that approach follows by examining the unique properties of the $\ell_2$ norm, namely that

$$ \|\mathbf{a}-\mathbf{b}\|_2^2 = \|\mathbf{a}\|_2^2 + \|\mathbf{b}\|_2^2 - 2\mathbf{a}^\intercal\mathbf{b} $$

Or, equivalently,

$$ \mathbf{a}^\intercal\mathbf{b} = \frac12 \left(\|\mathbf{a}\|_2^2 + \|\mathbf{b}\|_2^2 - \|\mathbf{a}-\mathbf{b}\|_2^2\right) $$

We can then preserve all three terms on the right-hand side to relative error by union bounding JL over $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{a}-\mathbf{b}$. So, we can expand $\|\mathbf{x}\|_2^2 = (\sum_i \alpha_i \mathbf{y}_i)^\intercal(\sum_i \alpha_i \mathbf{y}_i)$ as a large sum of inner products, preserve all the corresponding norms by JL, and recover a relative error guarantee with a coarser net and a $(1+\varepsilon)$-accurate JL.
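Here is a small numerical check of this identity, and of the claim that a sketch which preserves the three norms on the right-hand side to $(1\pm\varepsilon_0)$ pins down $\mathbf{a}^\intercal\mathbf{b}$ up to an additive $O(\varepsilon_0)$ error when $\|\mathbf{a}\|_2,\|\mathbf{b}\|_2\leq1$; the Gaussian sketch and the dimensions are arbitrary choices for illustration.

```python
# A small check of the identity a^T b = (||a||^2 + ||b||^2 - ||a - b||^2) / 2, and of the
# claim that a sketch which preserves the three norms also preserves a^T b up to a small
# additive error. The Gaussian sketch and the dimensions are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(3)
n, k = 500, 4000
a, b = rng.standard_normal(n), rng.standard_normal(n)
a, b = a / np.linalg.norm(a), b / np.linalg.norm(b)

# the exact identity
print("identity gap:", abs(a @ b - 0.5 * (a @ a + b @ b - (a - b) @ (a - b))))

# the same identity applied to sketched vectors recovers a^T b up to additive error
Pi = rng.standard_normal((k, n)) / np.sqrt(k)
sketched = 0.5 * (np.sum((Pi @ a) ** 2) + np.sum((Pi @ b) ** 2) - np.sum((Pi @ (a - b)) ** 2))
print("additive error from the sketch:", abs(sketched - a @ b))
```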

Proof. Let $\varepsilon_0 := \frac{\varepsilon}{36}$. Let $\mathcal{N}_2$ be a $\frac12$-Net for the unit sphere, so that $|\mathcal{N}_2| \leq 12^d$. We union bound JL over all pairs $\mathbf{y},\mathbf{y}'\in\mathcal{N}_2$, so that both $\|\boldsymbol{\Pi}\mathbf{y}\|_2\in(1\pm\varepsilon_0)$ and $\|\boldsymbol{\Pi}(\mathbf{y}-\mathbf{y}')\|_2\in(1\pm\varepsilon_0)\|\mathbf{y}-\mathbf{y}'\|_2$ hold for all $\mathbf{y},\mathbf{y}'\in\mathcal{N}_2$. This requires sketching dimension $k = \Omega(\frac{\log(|\mathcal{N}_2|^2/\delta)}{\varepsilon_0^2}) = \Omega(\frac{d + \log(1/\delta)}{\varepsilon^2})$.

Let $\mathbf{x}$ be any vector with $\|\mathbf{x}\|_2=1$, and let $\mathbf{x} = \sum_{i=0}^\infty \alpha_i \mathbf{y}_i$ be its Net Expansion. Then, we have

$$\begin{aligned} \|\boldsymbol{\Pi}\mathbf{x}\|_2^2 &= \left(\textstyle{\sum_{i=0}^\infty} \alpha_i \boldsymbol{\Pi}\mathbf{y}_i\right)^\intercal \left(\textstyle{\sum_{j=0}^\infty} \alpha_j \boldsymbol{\Pi}\mathbf{y}_j\right) \\ &= \sum_{i=0}^\infty \sum_{j=0}^\infty \alpha_i \alpha_j (\boldsymbol{\Pi}\mathbf{y}_i)^\intercal(\boldsymbol{\Pi}\mathbf{y}_j) \end{aligned}$$
We then bound the accuracy of this inner product, using the fact that $(1+\varepsilon_0)^2 \leq 1+3\varepsilon_0$ for $\varepsilon_0 \leq 1$:
$$\begin{aligned} (\boldsymbol{\Pi}\mathbf{y}_i)^\intercal(\boldsymbol{\Pi}\mathbf{y}_j) &= \frac12 \left( \|\boldsymbol{\Pi}\mathbf{y}_i\|_2^2 + \|\boldsymbol{\Pi}\mathbf{y}_j\|_2^2 - \|\boldsymbol{\Pi}(\mathbf{y}_i-\mathbf{y}_j)\|_2^2 \right) \\ &\leq \frac12 \left( \|\mathbf{y}_i\|_2^2 + \|\mathbf{y}_j\|_2^2 - \|\mathbf{y}_i-\mathbf{y}_j\|_2^2 \right) + \frac{3\varepsilon_0}2 \left( \|\mathbf{y}_i\|_2^2 + \|\mathbf{y}_j\|_2^2 + \|\mathbf{y}_i-\mathbf{y}_j\|_2^2 \right) \\ &\leq \frac12 \left( \|\mathbf{y}_i\|_2^2 + \|\mathbf{y}_j\|_2^2 - \|\mathbf{y}_i-\mathbf{y}_j\|_2^2 \right) + \frac{3\varepsilon_0}2 \left( 1 + 1 + 4 \right) \\ &= \mathbf{y}_i^\intercal\mathbf{y}_j + 9\varepsilon_0 \end{aligned}$$

where we used $\|\mathbf{y}_i-\mathbf{y}_j\|_2 \leq \|\mathbf{y}_i\|_2+\|\mathbf{y}_j\|_2 = 2$, so that $\|\mathbf{y}_i-\mathbf{y}_j\|_2^2 \leq 4$.
The matching lower bound $(\boldsymbol{\Pi}\mathbf{y}_i)^\intercal(\boldsymbol{\Pi}\mathbf{y}_j) \geq \mathbf{y}_i^\intercal\mathbf{y}_j - 9\varepsilon_0$ follows from $(1-\varepsilon_0)^2 \geq 1-3\varepsilon_0$. So the subspace embedding error grows as
$$\begin{aligned} \|\boldsymbol{\Pi}\mathbf{x}\|_2^2 &= \sum_{i=0}^\infty \sum_{j=0}^\infty \alpha_i \alpha_j (\boldsymbol{\Pi}\mathbf{y}_i)^\intercal(\boldsymbol{\Pi}\mathbf{y}_j) \\ &\leq \sum_{i=0}^\infty \sum_{j=0}^\infty \alpha_i \alpha_j (\mathbf{y}_i^\intercal\mathbf{y}_j + 9\varepsilon_0) \\ &= \|\mathbf{x}\|_2^2 + 9\varepsilon_0\sum_{i=0}^\infty \sum_{j=0}^\infty \alpha_i \alpha_j \\ &\leq 1 + 9\varepsilon_0 \sum_{i=0}^\infty \sum_{j=0}^\infty 2^{-i} 2^{-j} \\ &= 1 + 36\varepsilon_0 \end{aligned}$$
And the lower bound similarly is $\|\boldsymbol{\Pi}\mathbf{x}\|_2^2 \geq 1 - 36\varepsilon_0$. So, we have $\|\boldsymbol{\Pi}\mathbf{x}\|_2 \leq \sqrt{1+36\varepsilon_0} \leq 1+36\varepsilon_0 = 1+\varepsilon$ and $\|\boldsymbol{\Pi}\mathbf{x}\|_2 \geq \sqrt{1-36\varepsilon_0} \geq 1-36\varepsilon_0 = 1-\varepsilon$ (see this on Desmos), which completes the proof.

See Also

The proofs above are ubiquitous; for example, the coarser argument appears in Musco (2018).

Here are some important papers in the world of oblivious subspace embeddings:

  • Johnson Lindenstrauss (1984) is the original paper of Johnson and Lindenstrauss.

  • Musco (2018) has a nice short proof, which this page basically copies.

  • I'm sure a lot of great references are missing

  • Let me know if anything is missing

Bibliography