Introduction
Consider the problem of estimating the trace of a PSD matrix $A \in \mathbb{R}^{n \times n}$ to relative error $\varepsilon$ from a small number of matrix-vector products. There is a variety of algorithms that achieve this goal using merely $O(1/\varepsilon)$ matrix-vector products, Hutch++ (Meyer et al. (2021)) among them. Here, we will show a succinct proof of the complementary lower bound: that $\Omega(1/\varepsilon)$ matrix-vector products are necessary in the worst case.
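For concreteness, here is a minimal numpy sketch of the simplest such algorithm, Hutchinson's estimator (which needs $O(1/\varepsilon^2)$ products for relative error $\varepsilon$ rather than the optimal $O(1/\varepsilon)$, but illustrates the access model); the function name and signature are just for illustration.

```python
import numpy as np

def hutchinson_trace(matvec, n, num_queries, seed=None):
    """Estimate tr(A) using only matrix-vector products with A.

    matvec: callable v -> A @ v for an (implicit) n x n PSD matrix A.
    num_queries: number of matrix-vector products to spend.
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_queries):
        # Random sign (Rademacher) query vector: E[g^T A g] = tr(A).
        g = rng.choice([-1.0, 1.0], size=n)
        total += g @ matvec(g)
    return total / num_queries

# Example: a random PSD matrix accessed only through products.
rng = np.random.default_rng(0)
G = rng.standard_normal((100, 100))
A = G @ G.T
print(hutchinson_trace(lambda v: A @ v, n=100, num_queries=50, seed=1), np.trace(A))
```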
Theorem 1: Trace Estimation Lower Bound
There is a constant $c > 0$ so that the following holds for every $\varepsilon \in (0, 1)$. Any algorithm that, for every PSD matrix $A \in \mathbb{R}^{n \times n}$, estimates $\operatorname{tr}(A)$ from (possibly adaptive) matrix-vector products with root mean squared error at most $\varepsilon \cdot \operatorname{tr}(A)$ must, on some input, use at least $c/\varepsilon = \Omega(1/\varepsilon)$ matrix-vector products.
To prove this lower bound, we will need two ingredients: the hidden Wishart theorem and the fact that the conditional expectation minimizes mean squared error.
The Hidden Wishart Theorem
We will rely on the following remarkable result, whose proof we omit. This is the core theorem that enables this entire lower bound technique.
Theorem 2: Hidden Wishart Theorem
Let $A = GG^\top$, where $G \in \mathbb{R}^{n \times n}$ has iid $\mathcal{N}(0, 1)$ entries, and suppose an algorithm computes $m < n$ (possibly adaptive) matrix-vector products $Av_1, \ldots, Av_m$. Then, there exists a matrix $\Delta \in \mathbb{R}^{n \times n}$ and a matrix $V \in \mathbb{R}^{n \times (n-m)}$ with orthonormal columns, each constructed deterministically from the queries $v_1, \ldots, v_m$ and the responses $Av_1, \ldots, Av_m$, such that, conditioned on those queries and responses,
$$A = \Delta + V \widetilde{G} \widetilde{G}^\top V^\top \quad \text{in distribution},$$
where $\widetilde{G} \in \mathbb{R}^{(n-m) \times (n-m)}$ has iid $\mathcal{N}(0, 1)$ entries independent of the queries and responses.
A succinct proof of Theorem 2 can be found in Appendix B.1 of Amsel et al. (2026).
The matrices $GG^\top$ and $\widetilde{G}\widetilde{G}^\top$ follow the Wishart distribution, hence the name of the theorem. The result shows that after $m$ matrix-vector products, there remains a large random component of the matrix which is completely independent of the algorithm's matrix-vector queries. This is the titular "hidden Wishart".
Notice that this theorem has robbed the matrix-vector algorithm of any and all agency: no matter how the method chooses its (possibly adaptive) queries, there is always a large random component of the matrix which it has no information about. Since we can characterize a large part of $A$ that the algorithm has no information about at all, we can use simple statistical tools to prove Theorem 1.
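To make the structure concrete, the following numpy sketch samples a matrix of the form $\Delta + V\widetilde{G}\widetilde{G}^\top V^\top$ from Theorem 2 and checks the two facts the proof below relies on: the trace splits as $\operatorname{tr}(\Delta) + \|\widetilde{G}\|_F^2$, and the hidden part of the trace has mean $(n-m)^2$ and variance $2(n-m)^2$. The particular $\Delta$ and $V$ below are arbitrary placeholders; constructing them from an actual query transcript is exactly what Theorem 2 provides and is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 10

# Placeholder "known" part: any PSD Delta and any V with orthonormal
# columns spanning an (n - m)-dimensional subspace will do for this demo.
B = rng.standard_normal((n, n))
Delta = B @ B.T
V, _ = np.linalg.qr(rng.standard_normal((n, n - m)))

trials = 5000
hidden_traces = np.empty(trials)
for t in range(trials):
    G_tilde = rng.standard_normal((n - m, n - m))  # the "hidden Wishart" factor
    A = Delta + V @ G_tilde @ G_tilde.T @ V.T
    # Since V has orthonormal columns, tr(A) - tr(Delta) = ||G_tilde||_F^2.
    hidden_traces[t] = np.trace(A) - np.trace(Delta)

# Should be close to (n - m)^2 and 2 * (n - m)^2, respectively.
print(hidden_traces.mean(), (n - m) ** 2)
print(hidden_traces.var(), 2 * (n - m) ** 2)
```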
A simple statistical observation
In Theorem 1, we care about minimizing the mean squared error of our algorithms given some data about $A$ (namely, a sequence of matrix-vector products). Classical statistics tells us that the best possible algorithm for this goal is the conditional expectation:
Lemma 1: Conditional Expectation Minimizes MSE
Let $X$ and $Y$ be random variables with $\mathbb{E}[Y^2] < \infty$. Then, any estimator $f(X)$ of $Y$ satisfies
$$\mathbb{E}\big[(f(X) - Y)^2\big] \;\geq\; \mathbb{E}\big[\operatorname{Var}[Y \mid X]\big],$$
and this lower bound is achieved by the conditional expectation $f(X) = \mathbb{E}[Y \mid X]$.
Proof. By the tower rule,
$$\mathbb{E}\big[(f(X) - Y)^2\big] = \mathbb{E}_X\Big[\mathbb{E}\big[(f(X) - Y)^2 \mid X\big]\Big].$$
For any fixed $X = x$, the inner expectation is minimized by choosing $f(x) = \mathbb{E}[Y \mid X = x]$. Depending on who you ask, this is either the definition of conditional expectation or a basic fact about it. Either way, we have
$$\mathbb{E}\big[(f(X) - Y)^2 \mid X\big] \;\geq\; \mathbb{E}\big[(\mathbb{E}[Y \mid X] - Y)^2 \mid X\big] = \operatorname{Var}[Y \mid X].$$
Taking the expectation over $X$ finishes the proof.
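As a quick sanity check of Lemma 1, here is a small Monte Carlo sketch on a toy bivariate Gaussian, where $\mathbb{E}[Y \mid X] = \rho X$ and $\operatorname{Var}[Y \mid X] = 1 - \rho^2$ in closed form; the example is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
rho, trials = 0.8, 200_000

# Jointly Gaussian (X, Y) with unit variances and correlation rho,
# so E[Y | X] = rho * X and Var[Y | X] = 1 - rho**2.
X = rng.standard_normal(trials)
Y = rho * X + np.sqrt(1 - rho**2) * rng.standard_normal(trials)

mse = lambda est: np.mean((est - Y) ** 2)
print("conditional mean:", mse(rho * X))           # ~ 1 - rho**2 (the lower bound)
print("plain mean of Y: ", mse(np.zeros_like(Y)))  # ~ 1, worse
print("use X directly:  ", mse(X))                 # ~ 2 * (1 - rho), also worse
```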
Proof of Trace Estimation Lower Bound
From this result, we can now prove the theorem. It will be helpful to keep in mind that if $X$ is a chi-squared random variable with $k$ degrees of freedom, then $\mathbb{E}[X] = k$ and $\operatorname{Var}[X] = 2k$.
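Both facts follow from the Gaussian moments $\mathbb{E}[Z^2] = 1$ and $\mathbb{E}[Z^4] = 3$: writing $X = \sum_{i=1}^{k} Z_i^2$ for iid $Z_i \sim \mathcal{N}(0,1)$,
$$\mathbb{E}[X] = \sum_{i=1}^{k} \mathbb{E}[Z_i^2] = k, \qquad \operatorname{Var}[X] = \sum_{i=1}^{k} \big(\mathbb{E}[Z_i^4] - \mathbb{E}[Z_i^2]^2\big) = 2k.$$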
Proof. Let $A = GG^\top$ be defined as in Theorem 2. Note that $\operatorname{tr}(A) = \|G\|_F^2$ is a chi-squared random variable with $n^2$ degrees of freedom, so $\mathbb{E}[\operatorname{tr}(A)] = n^2$. By Theorem 2, after $m$ matrix-vector products, there exists a decomposition
$$A = \Delta + V \widetilde{G} \widetilde{G}^\top V^\top,$$
where $\widetilde{G} \in \mathbb{R}^{(n-m) \times (n-m)}$ is independent of the algorithm's queries and their responses. By Lemma 1 (applied with $Y = \operatorname{tr}(A)$ and $X$ the transcript of queries and responses), the lowest possible mean squared error any algorithm can achieve is $\mathbb{E}\big[\operatorname{Var}[\operatorname{tr}(A) \mid X]\big]$. We note that
$$\operatorname{tr}(A) = \operatorname{tr}(\Delta) + \operatorname{tr}\big(V \widetilde{G} \widetilde{G}^\top V^\top\big) = \operatorname{tr}(\Delta) + \|\widetilde{G}\|_F^2.$$
Since $\widetilde{G}$ is independent of $\Delta$ and $V$ (both are determined by the transcript), and since $\|\widetilde{G}\|_F^2$ has a $\chi^2$ distribution with $(n-m)^2$ degrees of freedom, we have
$$\operatorname{Var}[\operatorname{tr}(A) \mid X] = \operatorname{Var}\big[\|\widetilde{G}\|_F^2\big] = 2(n-m)^2.$$
So, any estimator that achieves root mean squared error at most $\varepsilon \, \mathbb{E}[\operatorname{tr}(A)] = \varepsilon n^2$ on this distribution (which, up to constants, is what a guarantee of $\varepsilon \cdot \operatorname{tr}(A)$ demands, since $\operatorname{tr}(A)$ concentrates around $n^2$) must satisfy
$$\sqrt{2}\,(n-m) \;\leq\; \varepsilon n^2.$$
Rearranging this inequality yields
$$m \;\geq\; n - \frac{\varepsilon n^2}{\sqrt{2}}.$$
Maximizing the right-hand side over $n$ (the maximum is attained at $n = \frac{1}{\sqrt{2}\,\varepsilon}$) gives the desired lower bound of $m \geq \frac{1}{2\sqrt{2}\,\varepsilon} = \Omega(1/\varepsilon)$.
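As a small numeric illustration of this last step (the constants follow the calculation above and are not optimized), the following sketch maximizes the bound $m \geq n - \varepsilon n^2/\sqrt{2}$ over the instance dimension $n$ and compares it to the closed form.

```python
import numpy as np

def query_lower_bound(eps, n_max=100_000):
    """Best lower bound on the number of matvec queries m, maximized over
    the instance dimension n, from m >= n - eps * n**2 / sqrt(2)."""
    n = np.arange(1, n_max, dtype=float)
    return np.max(n - eps * n**2 / np.sqrt(2))

for eps in [0.1, 0.01, 0.001]:
    # Compare with the closed form 1 / (2 * sqrt(2) * eps) from the calculus above.
    print(eps, query_lower_bound(eps), 1 / (2 * np.sqrt(2) * eps))
```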
See Also
There have been many papers that use the Hidden Wishart Theorem to prove matrix-vector complexity lower bounds. The proof here is a special case of an analysis in Meyer and Avron (2023).
Other papers relevant to this article include:
Braverman et al. (2020) Introduces the hidden Wishart method to prove lower bounds for linear regression and eigenvalue estimation.
Meyer et al. (2021) Has the first optimal lower bounds for trace estimation, but uses more complex methods.
Amsel et al. (2026) Gives a succinct proof of the hidden Wishart theorem.
Jiang et al. (2021) Sharpens the analysis above to get a nearly tight dependence on failure probability.
Amsel et al. (2025) Uses the hidden Wishart method to prove lower bounds for learning the diagonal of a matrix.
I'm sure many relevant papers are missing from this list; let me know if anything should be added.
References
Braverman, Hazan, Simchowitz, and Woodworth. The gradient complexity of linear regression. COLT 2020.
Jiang, Pham, Woodruff, and Zhang. Optimal Sketching for Trace Estimation. NeurIPS 2021.
Meyer, Musco, Musco, and Woodruff. Hutch++: Optimal stochastic trace estimation. SOSA 2021.
Meyer and Avron. Hutchinson's Estimator is Bad at Kronecker-Trace-Estimation. preprint 2023.
Amsel, Avi, Chen, Keles, Hegde, Musco, Musco, and Persson. Query Efficient Structured Matrix Learning. preprint 2025.
Amsel, Chen, Keles, Halikias, Musco, and Musco. Fixed-sparsity matrix approximation from matrix-vector products. SIMAX 2026.