Proof. This is proven in two steps. First, we show that the inner product characterization upper bounds the max characterization. Then, we show a matching lower bound. For simplicity, we prove this for full-rank $A$.
Before getting started, we note a useful equation:
$$\left\|A(A^\top A)^{-1}a_i\right\|_2^2 = a_i^\top (A^\top A)^{-1} A^\top A (A^\top A)^{-1} a_i = a_i^\top (A^\top A)^{-1} a_i.$$
Upper Bound: To create the upper bound, we relate $[Ax]_i^2$ to $\|Ax\|_2^2$:
$$[Ax]_i^2 = \left(a_i^\top x\right)^2 = \left(a_i^\top (A^\top A)^{-1/2}\,(A^\top A)^{1/2} x\right)^2 \leq \left(a_i^\top (A^\top A)^{-1} a_i\right)\left(x^\top A^\top A x\right) = \left(a_i^\top (A^\top A)^{-1} a_i\right)\|Ax\|_2^2,$$
where the inequality used is the Cauchy-Schwarz Inequality: $(u^\top v)^2 \leq \|u\|_2^2\,\|v\|_2^2$. We can then give an upper bound to the max characterization:
$$\max_{x \,:\, Ax \neq 0} \frac{[Ax]_i^2}{\|Ax\|_2^2} \leq \max_{x \,:\, Ax \neq 0} \frac{\left(a_i^\top (A^\top A)^{-1} a_i\right)\|Ax\|_2^2}{\|Ax\|_2^2} = a_i^\top (A^\top A)^{-1} a_i.$$
Lower Bound: For the lower bound, we just plug $x = (A^\top A)^{-1} a_i$ into the max characterization:
$$\max_{x \,:\, Ax \neq 0} \frac{[Ax]_i^2}{\|Ax\|_2^2} \geq \frac{\left[A(A^\top A)^{-1}a_i\right]_i^2}{\left\|A(A^\top A)^{-1}a_i\right\|_2^2} = \frac{\left(a_i^\top (A^\top A)^{-1} a_i\right)^2}{a_i^\top (A^\top A)^{-1} a_i} = a_i^\top (A^\top A)^{-1} a_i,$$
which completes the proof.
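As a quick numerical sanity check of this argument, here is a minimal sketch (mine, not from the original notes), assuming numpy: it confirms that the ratio $[Ax]_i^2 / \|Ax\|_2^2$ never exceeds $a_i^\top (A^\top A)^{-1} a_i$, and that equality holds at the maximizer $x = (A^\top A)^{-1} a_i$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, i = 50, 5, 7
A = rng.standard_normal((n, d))      # full-rank with probability 1
a_i = A[i]

gram_inv = np.linalg.inv(A.T @ A)
tau_i = a_i @ gram_inv @ a_i         # inner product characterization

# The ratio from the max characterization never exceeds tau_i ...
for _ in range(1000):
    x = rng.standard_normal(d)
    ratio = (A @ x)[i] ** 2 / np.linalg.norm(A @ x) ** 2
    assert ratio <= tau_i + 1e-12

# ... and it attains tau_i at x = (A^T A)^{-1} a_i.
x_star = gram_inv @ a_i
ratio_star = (A @ x_star)[i] ** 2 / np.linalg.norm(A @ x_star) ** 2
print(np.isclose(ratio_star, tau_i))  # True
```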
Proof. For simplicity, we assume that $A$ is both full-rank and tall-and-skinny.
Notice this minimization problem is a minimum-norm underdetermined least-squares problem, with known solution $y^\star = A(A^\top A)^{-1} a_i$. So, we know that
$$\min_{y \,:\, A^\top y = a_i} \|y\|_2^2 = \left\|A(A^\top A)^{-1} a_i\right\|_2^2 = a_i^\top (A^\top A)^{-1} a_i = \tau_i(A),$$
where the second equality is shown at the start of the proof of Lemma 1, and the last equality is Lemma 1.
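To illustrate the minimization characterization concretely, here is a small sketch (again assuming numpy; not part of the original notes). np.linalg.lstsq returns the minimum-norm solution of the underdetermined system $A^\top y = a_i$, and its squared norm should match the leverage score.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, i = 50, 5, 3
A = rng.standard_normal((n, d))
a_i = A[i]

tau_i = a_i @ np.linalg.inv(A.T @ A) @ a_i     # inner product characterization

# Minimum-norm solution of the underdetermined system A^T y = a_i.
y_star, *_ = np.linalg.lstsq(A.T, a_i, rcond=None)
print(np.allclose(y_star, A @ np.linalg.inv(A.T @ A) @ a_i))  # matches the closed form
print(np.isclose(np.linalg.norm(y_star) ** 2, tau_i))         # squared norm is tau_i
```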
1. Each leverage score has $0 \leq \tau_i(A) \leq 1$.
2. If $R \in \mathbb{R}^{d \times d}$ is full-rank, then $\tau_i(AR) = \tau_i(A)$.
3. If $Q^\top Q = I$, then $\tau_i(Q) = \|q_i\|_2^2$, where $q_i$ is the $i^{th}$ row of $Q$.
4. If $Q$ with $Q^\top Q = I$ has the same columnspace as $A$, then $\tau_i(A) = \|q_i\|_2^2$, where $q_i$ is the $i^{th}$ row of $Q$.
5. The sum of leverages is $\sum_{i=1}^n \tau_i(A) = \operatorname{rank}(A)$, which is $d$ for full-rank $A$.
Proof.
Point 1 follows directly from the Max Characterization (Definition 1): for any $x$ we have $0 \leq [Ax]_i^2 \leq \sum_{j=1}^n [Ax]_j^2 = \|Ax\|_2^2$, so the ratio lies in $[0,1]$.
Point 2 also follows from the Max Characterization. In particular, for any $x$ we can define $y = Rx$. Since $R$ is invertible, maximizing over all $x$ is equivalent to maximizing over all $y$:
$$\tau_i(AR) = \max_{x \,:\, ARx \neq 0} \frac{[ARx]_i^2}{\|ARx\|_2^2} = \max_{y \,:\, Ay \neq 0} \frac{[Ay]_i^2}{\|Ay\|_2^2} = \tau_i(A).$$
Point 3 follows from the Inner Product Characterization:
$$\tau_i(Q) = q_i^\top (Q^\top Q)^{-1} q_i = q_i^\top q_i = \|q_i\|_2^2.$$
Point 4 follows from Points 2 and 3: since $Q$ has the same columnspace as $A$, we can write $A = QR$ for some invertible $R$, so $\tau_i(A) = \tau_i(QR) = \tau_i(Q) = \|q_i\|_2^2$.
Point 5 follows from Point 4: Since every $A$ has an SVD $A = U \Sigma V^\top$, we can find a $U$ with orthonormal columns that satisfies Point 4 (for full-rank $A$, the thin SVD gives $U \in \mathbb{R}^{n \times d}$ with the same columnspace as $A$). Let $u_i$ denote the $i^{th}$ row of $U$ and let $u^{(j)}$ denote the $j^{th}$ column of $U$. Then,
$$\sum_{i=1}^n \tau_i(A) = \sum_{i=1}^n \|u_i\|_2^2 = \|U\|_F^2 = \sum_{j=1}^{d} \|u^{(j)}\|_2^2 = d.$$
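The following short check of Points 1, 2, and 5 is my own sketch (assuming numpy; the helper leverage_scores is hypothetical, built directly from the inner product characterization).

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 100, 8
A = rng.standard_normal((n, d))
R = rng.standard_normal((d, d))      # invertible with probability 1

def leverage_scores(M):
    """tau_i(M) = m_i^T (M^T M)^{-1} m_i for every row m_i (inner product characterization)."""
    G_inv = np.linalg.inv(M.T @ M)
    return np.einsum('ij,jk,ik->i', M, G_inv, M)

tau = leverage_scores(A)
print(np.all((tau >= 0) & (tau <= 1 + 1e-12)))    # Point 1: every score lies in [0, 1]
print(np.allclose(leverage_scores(A @ R), tau))   # Point 2: invariant under A -> AR
print(np.isclose(tau.sum(), d))                   # Point 5: scores sum to d = rank(A)
```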
There are some intuitive implications of these points:
The leverage scores of $A$ depend only on the range of $A$. Any other matrix with the same column space has the same leverage scores.
If $A = QR$ is the economic QR decomposition of $A$, and if $q_1, \ldots, q_n$ are the rows of $Q$, then $\tau_i(A) = \|q_i\|_2^2$.
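That QR observation is also the standard way to compute exact leverage scores in practice; here is a minimal numpy sketch (my own, under the full-rank assumption used above):

```python
import numpy as np

def leverage_scores_qr(A):
    """Leverage scores of a full-rank, tall matrix A via its economic QR decomposition:
    tau_i(A) = ||q_i||_2^2, the squared norm of the i-th row of Q."""
    Q, _ = np.linalg.qr(A, mode='reduced')
    return np.sum(Q ** 2, axis=1)

A = np.random.default_rng(3).standard_normal((100, 8))
tau = leverage_scores_qr(A)
print(np.isclose(tau.sum(), 8))   # sums to the number of columns (= rank of A)
```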
The proofs above are most closely related to the highly generalized analysis in Theorem 5 of Avron et al. (2019).
There are many extensions of leverage scores in the literature:
(Cohen et al. (2017)) Ridge Leverage Scores, which relate to Ridge Regression (see the short sketch below this list).
(Alaoui and Mahoney (2015)) Kernel Ridge Leverage Scores, which relate to Kernel Ridge Regression.
(Avron et al. (2017)) Kernel Ridge Leverage Function, which relates to Random Fourier Features for Kernel Ridge Regression.
(Chen and Price (2019)) Leverage Function, which relates to Linear Operators instead of matrices. Sometimes called a Sensitivity Score.
(Avron et al. (2019)) Ridge Leverage Function, which relates to Linear Operators instead of matrices for Ridge Regression.
(Cohen and Peng (2015)) Lewis Weights, which relate to $\ell_p$ norms instead of just $\ell_2$ norms.
Let me know if anything is missing
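For concreteness, here is a sketch of the first extension on the list, ridge leverage scores, which (as I understand the definition in Cohen et al. (2017)) replace $(A^\top A)^{-1}$ with $(A^\top A + \lambda I)^{-1}$ in the inner product characterization. The function name and the parameter lam are my own; assuming numpy:

```python
import numpy as np

def ridge_leverage_scores(A, lam):
    """Ridge leverage scores: tau_i^lambda(A) = a_i^T (A^T A + lambda * I)^{-1} a_i."""
    d = A.shape[1]
    G_inv = np.linalg.inv(A.T @ A + lam * np.eye(d))
    return np.einsum('ij,jk,ik->i', A, G_inv, A)

A = np.random.default_rng(4).standard_normal((100, 8))
print(ridge_leverage_scores(A, lam=0.0).sum())   # = 8 = rank(A): ordinary leverage scores
print(ridge_leverage_scores(A, lam=5.0).sum())   # strictly smaller once lambda > 0
```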
Alaoui and Mahoney. Fast Randomized Kernel Methods with Statistical Guarantees. NIPS 2015.
Avron, Kapralov, Musco, Musco, Velingker, and Zandieh. Random Fourier Features for Kernel Ridge Regression: Approximation Bounds and Statistical Guarantees. ICML 2017.
Avron, Kapralov, Musco, Musco, Velingker, and Zandieh. A Universal Sampling Method for Reconstructing Signals with Simple Fourier Transforms. STOC 2019.
Chen and Price. Active Regression via Linear-Sample Sparsification. COLT 2019.
Cohen, Kapralov, Musco, and Musco. Input Sparsity Time Low-Rank Approximation via Ridge Leverage Score Sampling. SODA 2017.
Cohen and Peng. $\ell_p$ Row Sampling by Lewis Weights. STOC 2015.