On this page, we show that three different definitions of Leverage Scores are equivalent. As a small supplement, we also use two of these forms to bound the maximum value of a leverage score and to compute the sum of all leverage scores.
Definition 1 (Max Characterization). Let $\boldsymbol{A}\in\mathbb{R}^{n \times d}$. Then the Maximum Characterization of the Leverage Score for row $i$ of $\boldsymbol{A}$ is

$$\tau_i(\boldsymbol{A}) \;{\vcentcolon=}\; \max_{\mathbf{x}\in\mathbb{R}^d} \frac{[\boldsymbol{A}\mathbf{x}]_i^2}{\|\boldsymbol{A}\mathbf{x}\|_2^2}$$
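For intuition, here is a small worked example (not from the original note): a $3 \times 2$ matrix whose first two rows are standard basis vectors and whose third row is zero.

$$\boldsymbol{A} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}, \qquad \tau_1(\boldsymbol{A}) = \max_{\mathbf{x}\in\mathbb{R}^2} \frac{x_1^2}{x_1^2 + x_2^2} = 1, \qquad \tau_3(\boldsymbol{A}) = \max_{\mathbf{x}\in\mathbb{R}^2} \frac{0}{x_1^2 + x_2^2} = 0.$$

A row that is essential to the range of $\boldsymbol{A}$ gets leverage 1, while a row that contributes nothing gets leverage 0.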
Lemma 1 (Inner Product Characterization). Let $\mathbf{a}_i$ be the $i^{th}$ row of $\boldsymbol{A}$. Then,

$$\tau_i(\boldsymbol{A}) = \mathbf{a}_i^\intercal(\boldsymbol{A}^\intercal\boldsymbol{A})^{-1}\mathbf{a}_i$$
Proof. This is proven in two steps. First, we show that the inner product characterization upper bounds the max characterization. Then, we show a matching lower bound. For simplicity, we prove this for full-rank $\boldsymbol{A}$.
Before getting started, we note a useful equation:

$$\begin{aligned} \|\boldsymbol{A}(\boldsymbol{A}^\intercal\boldsymbol{A})^{-1}\mathbf{a}_i\|_2^2 &= \mathbf{a}_i^\intercal(\boldsymbol{A}^\intercal\boldsymbol{A})^{-1}\boldsymbol{A}^\intercal\boldsymbol{A}(\boldsymbol{A}^\intercal\boldsymbol{A})^{-1}\mathbf{a}_i \\ &= \mathbf{a}_i^\intercal(\boldsymbol{A}^\intercal\boldsymbol{A})^{-1}\mathbf{a}_i \end{aligned}$$
Upper Bound: To create the upper bound, we relate

$$\begin{aligned} [\boldsymbol{A}\mathbf{x}]_i^2 &= (\mathbf{a}_i^\intercal\mathbf{x})^2 \\ &= (\mathbf{a}_i^\intercal(\boldsymbol{A}^\intercal\boldsymbol{A})^{-1}(\boldsymbol{A}^\intercal\boldsymbol{A})\mathbf{x})^2 \\ &= ((\boldsymbol{A}(\boldsymbol{A}^\intercal\boldsymbol{A})^{-1}\mathbf{a}_i)^\intercal (\boldsymbol{A}\mathbf{x}))^2 \\ &\leq \|\boldsymbol{A}(\boldsymbol{A}^\intercal\boldsymbol{A})^{-1}\mathbf{a}_i\|_2^2 \cdot \|\boldsymbol{A}\mathbf{x}\|_2^2 \\ &= (\mathbf{a}_i^\intercal(\boldsymbol{A}^\intercal\boldsymbol{A})^{-1}\mathbf{a}_i) \cdot \|\boldsymbol{A}\mathbf{x}\|_2^2 \end{aligned}$$

where the inequality used is the Cauchy-Schwarz Inequality: $(\mathbf{v}^\intercal\mathbf{y})^2 \leq \|\mathbf{v}\|_2^2 \cdot \|\mathbf{y}\|_2^2$. We can then give an upper bound to the max characterization:
$$\tau_i(\boldsymbol{A}) = \max_{\mathbf{x}\in\mathbb{R}^d} \frac{[\boldsymbol{A}\mathbf{x}]_i^2}{\|\boldsymbol{A}\mathbf{x}\|_2^2} \leq \max_{\mathbf{x}\in\mathbb{R}^d} \frac{(\mathbf{a}_i^\intercal(\boldsymbol{A}^\intercal\boldsymbol{A})^{-1}\mathbf{a}_i) \cdot \|\boldsymbol{A}\mathbf{x}\|_2^2}{\|\boldsymbol{A}\mathbf{x}\|_2^2} = \mathbf{a}_i^\intercal(\boldsymbol{A}^\intercal\boldsymbol{A})^{-1}\mathbf{a}_i$$

Lower Bound: For the lower bound, we just plug $\mathbf{x}=(\boldsymbol{A}^\intercal\boldsymbol{A})^{-1}\mathbf{a}_i$ into the max characterization:

$$\tau_i(\boldsymbol{A}) \geq \frac{[\boldsymbol{A}(\boldsymbol{A}^\intercal\boldsymbol{A})^{-1}\mathbf{a}_i]_i^2}{\|\boldsymbol{A}(\boldsymbol{A}^\intercal\boldsymbol{A})^{-1}\mathbf{a}_i\|_2^2} = \frac{(\mathbf{a}_i^\intercal(\boldsymbol{A}^\intercal\boldsymbol{A})^{-1}\mathbf{a}_i)^2}{\mathbf{a}_i^\intercal(\boldsymbol{A}^\intercal\boldsymbol{A})^{-1}\mathbf{a}_i} = \mathbf{a}_i^\intercal(\boldsymbol{A}^\intercal\boldsymbol{A})^{-1}\mathbf{a}_i$$

which completes the proof.
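As a quick numerical sanity check of Lemma 1, here is a minimal sketch in NumPy (the matrix, seed, and row index are arbitrary illustrative choices, not anything from the text above). It compares the inner product formula against the max characterization, evaluated over random directions plus the explicit maximizer $\mathbf{x} = (\boldsymbol{A}^\intercal\boldsymbol{A})^{-1}\mathbf{a}_i$ from the lower-bound step.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, i = 8, 3, 2                      # arbitrary small example
A = rng.standard_normal((n, d))
a_i = A[i]

# Inner product characterization: tau_i = a_i^T (A^T A)^{-1} a_i
tau_inner = a_i @ np.linalg.solve(A.T @ A, a_i)

# Max characterization: the ratio [Ax]_i^2 / ||Ax||_2^2, checked over random
# directions plus the explicit maximizer x = (A^T A)^{-1} a_i from the proof.
def ratio(x):
    Ax = A @ x
    return Ax[i] ** 2 / np.linalg.norm(Ax) ** 2

candidates = [rng.standard_normal(d) for _ in range(10_000)]
candidates.append(np.linalg.solve(A.T @ A, a_i))
tau_max = max(ratio(x) for x in candidates)

print(tau_inner, tau_max)              # the two values agree up to floating point
```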
Lemma 2 (Min-Norm Characterization). Let $\mathbf{a}_i$ be the $i^{th}$ row of $\boldsymbol{A}$. Then

$$\tau_i(\boldsymbol{A}) = \min_{\mathbf{y}\in\mathbb{R}^n,\,\boldsymbol{A}^\intercal\mathbf{y}=\mathbf{a}_i} \|\mathbf{y}\|_2^2$$
Proof. For simplicity, we assume that $\boldsymbol{A}$ is both full-rank and tall-and-skinny.
Notice that this minimization problem is a minimum-norm underdetermined least-squares problem, with known solution $\mathbf{y}=\boldsymbol{A}(\boldsymbol{A}^\intercal\boldsymbol{A})^{-1}\mathbf{a}_i$. So, we know that

$$\min_{\mathbf{y}\in\mathbb{R}^n,\,\boldsymbol{A}^\intercal\mathbf{y}=\mathbf{a}_i} \|\mathbf{y}\|_2^2 = \|\boldsymbol{A}(\boldsymbol{A}^\intercal\boldsymbol{A})^{-1}\mathbf{a}_i\|_2^2 = \mathbf{a}_i^\intercal(\boldsymbol{A}^\intercal\boldsymbol{A})^{-1}\mathbf{a}_i = \tau_i(\boldsymbol{A})$$

where the second equality is shown at the start of the proof of Lemma 1, and the last equality is Lemma 1.
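Lemma 2 is also easy to check numerically, since the minimum-norm solution of an underdetermined linear system is exactly what a standard least-squares routine returns. A minimal sketch in NumPy (again with an arbitrary matrix, seed, and row index):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, i = 8, 3, 0                      # arbitrary small example
A = rng.standard_normal((n, d))
a_i = A[i]

# Minimum-norm solution of the underdetermined system A^T y = a_i;
# np.linalg.lstsq returns the minimum-norm solution in this setting.
y, *_ = np.linalg.lstsq(A.T, a_i, rcond=None)

tau_min_norm = np.linalg.norm(y) ** 2
tau_inner = a_i @ np.linalg.solve(A.T @ A, a_i)
print(tau_min_norm, tau_inner)         # the two values agree up to floating point
```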
Lemma 3 (Properties of Leverage Scores). Let $\boldsymbol{A}\in\mathbb{R}^{n \times d}$ be full-rank with $n \geq d$. Then,

1. Each leverage score has $\tau_i(\boldsymbol{A})\in[0,1]$.
2. If $\boldsymbol{B}\in\mathbb{R}^{d \times d}$ is full-rank, then $\tau_i(\boldsymbol{A}\boldsymbol{B}) = \tau_i(\boldsymbol{A})$.
3. If $\boldsymbol{A}^\intercal\boldsymbol{A}=\boldsymbol{I}$, then $\tau_i(\boldsymbol{A})=\|\mathbf{a}_i\|_2^2$.
4. If $\boldsymbol{U}\in\mathbb{R}^{n \times d}$ with $\boldsymbol{U}^\intercal\boldsymbol{U}=\boldsymbol{I}$ has the same columnspace as $\boldsymbol{A}$, then $\tau_i(\boldsymbol{A})=\|\mathbf{u}_i\|_2^2$, where $\mathbf{u}_i$ is the $i^{th}$ row of $\boldsymbol{U}$.
5. The sum of leverage scores is $\sum_{i=1}^n \tau_i(\boldsymbol{A})=d$.
Proof.
Point 1 follows directly from the Max Characterization (Definition 1): since $[\boldsymbol{A}\mathbf{x}]_i^2 \leq \|\boldsymbol{A}\mathbf{x}\|_2^2$ for every $\mathbf{x}$, the ratio being maximized always lies in $[0,1]$.
Point 2 also follows from the Max Characterization. In particular, for any $\mathbf{x}\in\mathbb{R}^{d}$ we can define $\mathbf{y}\;{\vcentcolon=}\;\boldsymbol{B}\mathbf{x}$. Since $\boldsymbol{B}$ is invertible, maximizing over all $\mathbf{x}\in\mathbb{R}^d$ is equivalent to maximizing over all $\mathbf{y}\in\mathbb{R}^d$:
$$\tau_i(\boldsymbol{A}\boldsymbol{B}) = \max_{\mathbf{x}\in\mathbb{R}^d} \frac{[\boldsymbol{A}\boldsymbol{B}\mathbf{x}]_i^2}{\|\boldsymbol{A}\boldsymbol{B}\mathbf{x}\|_2^2} = \max_{\mathbf{y}\in\mathbb{R}^d} \frac{[\boldsymbol{A}\mathbf{y}]_i^2}{\|\boldsymbol{A}\mathbf{y}\|_2^2} = \tau_i(\boldsymbol{A})$$

Point 3 follows from the Inner Product Characterization:
$$\tau_i(\boldsymbol{A}) = \mathbf{a}_i^\intercal(\boldsymbol{A}^\intercal\boldsymbol{A})^{-1}\mathbf{a}_i = \mathbf{a}_i^\intercal\mathbf{a}_i = \|\mathbf{a}_i\|_2^2$$

Point 4 follows from Points 2 and 3: since $\boldsymbol{U}$ and $\boldsymbol{A}$ are full-rank with the same columnspace, we can write $\boldsymbol{A} = \boldsymbol{U}\boldsymbol{B}$ for some invertible $\boldsymbol{B}\in\mathbb{R}^{d \times d}$, so $\tau_i(\boldsymbol{A}) = \tau_i(\boldsymbol{U}) = \|\mathbf{u}_i\|_2^2$.
Point 5 follows from Point 4: since every $\boldsymbol{A}$ has a thin SVD $\boldsymbol{A}=\boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{V}^\intercal$, we can always find a matrix $\boldsymbol{U}$ with orthonormal columns that satisfies Point 4. Let $\mathbf{u}_i$ denote the $i^{th}$ row of $\boldsymbol{U}$ and let $\hat{\mathbf{u}}_j$ denote the $j^{th}$ column of $\boldsymbol{U}$. Then,
$$\sum_{i=1}^n \tau_i(\boldsymbol{A}) = \sum_{i=1}^n \|\mathbf{u}_i\|_2^2 = \sum_{i=1}^n \sum_{j=1}^d \boldsymbol{U}_{ij}^2 = \sum_{j=1}^d \|\hat{\mathbf{u}}_j\|_2^2 = d$$

There are some intuitive implications from these bullet points:
The leverage scores of $\boldsymbol{A}$ depend only on the range of $\boldsymbol{A}$. Any other matrix with the same column space has the same leverage scores.
If $\boldsymbol{A}=\boldsymbol{Q}\boldsymbol{R}$ is the economic QR decomposition of $\boldsymbol{A}$, and if $\mathbf{q}_1,\ldots,\mathbf{q}_n$ are the rows of $\boldsymbol{Q}$, then $\tau_i(\boldsymbol{A})=\|\mathbf{q}_i\|_2^2$ (see the short sketch below).
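Here is a minimal sketch of that recipe in NumPy (the matrix and seed are arbitrary illustrative choices): compute a reduced QR decomposition, read off the leverage scores as squared row norms of $\boldsymbol{Q}$, and confirm the bounds and the sum from Lemma 3.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 8, 3                            # arbitrary small example
A = rng.standard_normal((n, d))

# Economic (reduced) QR: Q is n x d with orthonormal columns and the same
# column space as A, so the leverage scores are the squared row norms of Q.
Q, R = np.linalg.qr(A, mode='reduced')
tau = np.sum(Q ** 2, axis=1)

print(tau)            # every entry lies in [0, 1]
print(tau.sum())      # sums to d = 3
```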
The proofs above are most closely related to the highly generalized analysis in Theorem 5 of Avron et al. (2019).
There are many extensions of leverage scores in the literature:
(Cohen et al. (2017)) Ridge Leverage Scores, which relate to Ridge Regression.
(Alaoui and Mahoney (2015)) Kernel Ridge Leverage Scores, which relate to Kernel Ridge Regression.
(Avron et al. (2017)) Kernel Ridge Leverage Function, which relates to Random Fourier Features for Kernel Ridge Regression.
(Chen and Price (2019)) Leverage Function, which relates to Linear Operators instead of matrices. Sometimes called a Sensitivity Score.
(Avron et al. (2019)) Ridge Leverage Function, which relates to Linear Operators instead of matrices for Ridge Regression.
(Cohen and Peng (2015)) Lewis Weights, which relate to $L_p$ norms instead of just $L_2$ norms.
Let me know if anything is missing.
Alaoui and Mahoney. Fast Randomized Kernel Methods with Statistical Guarantees. NIPS 2015.
Avron, Kapralov, Musco, Musco, Velingker, and Zandieh. Random Fourier Features for Kernel Ridge Regression: Approximation Bounds and Statistical Guarantees. ICML 2017.
Avron, Kapralov, Musco, Musco, Velingker, and Zandieh. A Universal Sampling Method for Reconstructing Signals with Simple Fourier Transforms. STOC 2019.
Chen and Price. Active Regression via Linear-Sample Sparsification. COLT 2019.
Cohen, Kapralov, Musco, and Musco. Input Sparsity Time Low-Rank Approximation via Ridge Leverage Score Sampling. SODA 2017.
Cohen and Peng. $\ell_p$ Row Sampling by Lewis Weights. STOC 2015.