                           Theory of Computing, Volume 17 (1), 2021, pp. 1–30
                                        www.theoryofcomputing.org

                                         Special Issue: CCC 2019



                  Limits on the Universal Method
                     for Matrix Multiplication
                                                     Josh Alman∗
                   Received July 24, 2019; Revised July 19, 2020; Published September 3, 2021




       Abstract. We prove limitations on the known methods for designing matrix multiplication
       algorithms. Alman and Vassilevska Williams (FOCS’18) recently defined the Universal
       Method, which generalizes all the known approaches, including Strassen’s Laser Method (J.
       reine angew. Math., 1987) and Cohn and Umans’s Group Theoretic Method (FOCS’03). We
       prove concrete lower bounds on the algorithms one can design by applying the Universal
       Method to many different tensors. Our proofs use new tools to give upper bounds on the
       asymptotic slice rank of a wide range of tensors. Our main result is that the Universal
       Method applied to any Coppersmith–Winograd tensor CWq cannot yield a bound on ω, the
       exponent of matrix multiplication, better than 2.16805. It was previously known (Alman and
       Vassilevska Williams, FOCS’18) that the weaker “Galactic Method” applied to CWq could
       not achieve an exponent of 2.
            We also study the Laser Method (which is a special case of the Universal Method) and
       prove that it is “complete” for matrix multiplication algorithms: when it applies to a tensor
       T , it achieves ω = 2 if and only if it is possible for the Universal Method applied to T to
       achieve ω = 2. Hence, the Laser Method, which was originally used as an algorithmic tool,
       can also be seen as a lower-bound tool for a large class of algorithms. For example, in their
       landmark paper, Coppersmith and Winograd (J. Symbolic Computation, 1990) achieved a
       bound of ω ≤ 2.376 by applying the Laser Method to CWq . By our result, the fact that they
       did not achieve ω = 2 implies a lower bound on the Universal Method applied to CWq .

    A conference version of this paper appeared in the 34th Computational Complexity Conference, 2019 [1].
    ∗ Supported in part by NSF grants CCF-1651838 and CCF-1741615. The work reported here was done while the author was
a Ph.D. student at MIT.

ACM Classification: F.2.2
AMS Classification: 68Q17
Key words and phrases: matrix multiplication, algebraic complexity

© 2021 Josh Alman
Licensed under a Creative Commons Attribution License (CC-BY)            DOI: 10.4086/toc.2021.v017a001


1    Introduction
One of the biggest open questions in computer science asks how quickly one can multiply two matrices.
Progress on this problem is measured by giving bounds on ω, the exponent of matrix multiplication,
defined as the smallest real number such that two n × n matrices over a field can be multiplied using
nω+o(1) field operations. Since Strassen’s breakthrough algorithm [30] showing that ω ≤ log2 (7) ≈ 2.81,
there has been a long line of work, resulting in the current best bound of ω ≤ 2.37286 [36, 26, 4], and it
is popularly conjectured that ω = 2 (see, e. g., [19, 15, 17, 5] which present conjectures that would imply
ω = 2).
    The key to Strassen’s algorithm is an algebraic identity showing how 2 × 2 × 2 matrix multiplica-
tion can be computed surprisingly efficiently (in particular, Strassen showed that the 2 × 2 × 2 matrix
multiplication tensor has rank at most 7; see Section 2 for precise definitions). Arguing about the ranks
of larger matrix multiplication tensors has proven to be quite difficult—in fact, even the rank of the
3 × 3 × 3 matrix multiplication tensor isn’t currently known. Progress on bounding ω since Strassen’s
algorithm has thus taken the following approach: Pick a tensor (trilinear form) T , typically not a matrix
multiplication tensor, such that
    • Powers T ⊗n of T can be “efficiently computed” (i. e., T has low asymptotic rank), and

    • T is useful for performing matrix multiplication, since large matrix multiplication tensors can be
      “embedded” within powers of T .
Combined, these give an upper bound on the rank of matrix multiplication itself, and hence ω.
    The most general type of embedding which is known to preserve the ranks of tensors as required for
the above approach is a degeneration. In [3], Vassilevska Williams and the author called this method of
taking a tensor T and finding the best possible degeneration of powers T ⊗n into matrix multiplication
tensors the Universal Method applied to T , and the best bound on ω which can be proved in this way
is written ωu (T ). They also defined two weaker methods: the Galactic Method applied to T , in which
the “embedding” must be a more restrictive monomial degeneration, resulting in the bound ωg (T ) on ω,
and the Solar Method applied to T , in which the “embedding” must be an even more restrictive zeroing
out, resulting in the bound ωs (T ) on ω. Since monomial degenerations and zeroing outs are successively
more restrictive types of degenerations, we have that for all tensors T ,

                                     ω ≤ ωu (T ) ≤ ωg (T ) ≤ ωs (T ).

    These methods are very general; there are no known methods for computing ωu (T ), ωg (T ), or ωs (T )
for a given tensor T , and these quantities are even unknown for very well-studied tensors T . The two
main approaches to designing matrix multiplication algorithms are the Laser Method of Strassen [32]
and the Group-Theoretic Method of Cohn and Umans [16]. Both of these approaches show how to give


upper bounds on ωs (T ) for particular structured tensors T (and hence upper bound ω itself). In other
words, they both give ways to find zeroing outs of tensors into matrix multiplication tensors, but not
necessarily the best zeroing outs. In fact, it is known that the Laser Method does not always give the best
zeroing out for a particular tensor T , since the improvements from [19] to later works [21, 36, 26, 4] can
be seen as giving slight improvements to the Laser Method to find better and better zeroing outs.1 The
Group-Theoretic Method, like the Solar Method, is very general, and it is not clear how to optimally
apply it to a particular group or family of groups.
    All of the improvements on bounding ω for the past 30+ years have come from studying the
Coppersmith–Winograd family of tensors {CWq }q∈N . The Laser Method applied to powers of CW5
gives the bound ωs (CW5 ) ≤ 2.3729. The Group-Theoretic Method can also prove the best known bound
ω ≤ 2.3729, by simulating the Laser Method analysis of CWq (see, e. g., [2] for more details). Despite a
long line of work on matrix multiplication, there are no known tensors2 which seem to come close to
achieving the bounds one can obtain using CWq . This leads to the first main question of this paper:
Question 1.1. How much can we improve our bound on ω using a more clever analysis of the
Coppersmith–Winograd tensor?
   Vassilevska Williams and the author [3] addressed this question by showing that there is a constant
c > 2 so that for all q, ωg (CWq ) > c. In other words, the Galactic Method (monomial degenerations)
cannot be used with CWq to prove ω = 2. However, this leaves open a number of important questions:
How close to 2 can we get using monomial degenerations; could it be that ωg (CWq ) ≤ 2.1? Perhaps more
importantly, what if we are allowed to use arbitrary degenerations; could it be that ωu (CWq ) ≤ 2.1, or
even ωu (CWq ) = 2?
   The second main question of this paper concerns the Laser Method. The Laser Method computes an
upper bound on ωs (T ) for any tensor T with certain structure (which we describe in detail in Section 5),
and has led to every improvement on ω since its introduction by Strassen [32].
Question 1.2. When the Laser Method applies to a tensor T , how close does it come to optimally
analyzing T ?
    As discussed, we know the Laser Method does not always give a tight bound on ωs (T ). For instance,
Coppersmith–Winograd [19] applied the Laser Method to CWq to prove ωs (CWq ) ≤ 2.376, and then later
work [21, 36, 26, 4] analyzed higher and higher powers of CWq to show ωs (CWq ) ≤ 2.373. Ambainis,
Filmus and Le Gall [6] showed that analyzing higher and higher powers of CWq itself with the Laser
Method cannot yield an upper bound better than ωs (CWq ) ≤ 2.3725. What about for other tensors? Could
there be a tensor such that applying the Laser Method to T yields ωs (T ) ≤ c for some c > 2, but applying
the Laser Method to high powers T ⊗n of T yields ωs (T ) = 2? Could applying an entirely different
method to such a T , using arbitrary degenerations and not just zeroing outs, show that ωu (T ) = 2?

1.1    Our results
We give strong resolutions to both Question 1.1 and Question 1.2.
    1 These works apply the Laser Method to higher powers of the tensor T = CWq , a technique which is still captured by the
Solar Method.
    2 Vassilevska Williams and the author [3] study a generalization of CWq which can tie the best known bound, but its analysis
is identical to that of CWq . Our lower bounds in this paper will apply equally well to this generalized class as to CWq itself.



Universal Method lower bounds. To resolve Question 1.1, we prove a new lower bound for the
Coppersmith–Winograd tensor:

Theorem 1.3. ωu (CWq ) ≥ 2.16805 for all q.

     In other words, no analysis of CWq , using any techniques within the Universal Method, can prove a
bound on ω better than 2.16805. This generalizes the main result of [3] from the Galactic Method to the
Universal Method, and gives a more concrete lower bound, increasing the bound from “a constant greater
than 2” to 2.16805. We also give stronger lower bounds for particular tensors in the family. For instance,
for the specific tensor CW5 which yields the current best bound on ω, we show ωu (CW5 ) ≥ 2.21912...
     Our techniques for proving lower bounds on ωu (T ) apply to CWq as well as many other tensors of
interest. We thus prove many other lower bounds, including: the same lower bound ωu (CWq,σ ) ≥ 2.16805
for any generalized Coppersmith–Winograd tensor CWq,σ as introduced in [3], a similar lower bound for
cwq,σ , the generalized “simple” Coppersmith–Winograd tensor missing its “corner terms,” and a lower
bound for Tq , the structural tensor of the cyclic group Cq , matching the lower bounds obtained by [2, 9].
In Section 4 we give tables of our precise lower bounds for these and other tensors. Furthermore, we will
show that our new tools for proving lower bounds on ωu (T ) are able to prove at least as high a lower
bound as the tools of [3] can prove on ωg (T ). In other words, all of those previously known Galactic
Method lower bounds hold for the Universal Method as well.
     We also show how our lower-bound techniques can be used to study other properties of tensors.
Coppersmith and Winograd [19] introduced the notion of the value Vτ (T ) of a tensor T , which is useful
when applying the Laser Method to a larger tensor T 0 which contains T as a subtensor. We show how
our slice rank lower-bound tools yield a tight upper bound on the value of t112 , the notorious subtensor
of CWq⊗2 which arises when applying the Laser Method to powers of CWq . Although the value Vτ (t112 )
appears in every analysis of CWq since [19], including [21, 36, 26, 4, 25, 27], the best lower bound on it
has not improved since [19], and our new upper bound here helps explain why. See Subsection 2.5 and
Subsection 4.4 for more details.
     We briefly note that our lower bound of 2.16805 > 2 + 1/6 in Theorem 1.3 may be significant when
compared to the recent algorithm of Cohen, Lee and Song [14] which solves n-variable linear programs
in time about O(n^ω + n^{2+1/6} ).

The Laser Method is “complete.” We next study in more detail how our results apply to tensors which
we call laser-ready tensors. These are the tensors to which the Laser Method (as used by [19] on CWq )
applies; see Definition 5.1 for the precise definition. Tensors need certain structure to be laser-ready, but
tensors T with this structure are essentially the only ones for which upper-bound techniques for ωu (T )
are known. In fact, every record-holding tensor in the history of matrix multiplication algorithm design
has been laser-ready.
    We will see below that many of the same properties of a tensor T which we use in this paper to prove
lower bounds on ωu (T ) are also used in the Laser Method to give upper bounds on ωs (T ) when T is
laser-ready. In particular, this connection will give an intriguing answer to Question 1.2:

Theorem 1.4. If T is a laser-ready tensor, and ωu (T ) = 2, then the Laser Method applied to T yields the
bound ωu (T ) = 2.


    In other words: If T is any tensor to which the Laser Method applies (as in Definition 5.1), and the
Laser Method does not yield ω = 2 when applied to T , then in fact ωu (T ) > 2, and even the substantially
more general Universal Method applied to T cannot yield ω = 2. Hence, the Laser Method, which was
originally used as an algorithmic tool, can also be seen as a lower-bound tool. Conversely, Theorem 1.4
shows that the Laser Method is “complete,” in the sense that it cannot yield a bound on ω worse than 2
when applied to a tensor which is able to prove ω = 2.
    Theorem 1.4 explains and generalizes a number of phenomena:

      • The fact that Coppersmith–Winograd [19] applied the Laser Method to the tensor CWq and achieved
        an upper bound greater than 2 on ω implies that ωu (CWq ) > 2, and no arbitrary degeneration of
        powers of CWq can yield ω = 2.

      • As mentioned above, it is known that applying the Laser Method to higher and higher powers of a
        tensor T can successively improve the resulting upper bound on ω. Theorem 1.4 shows that if the
        Laser Method applied to the first power of any tensor T did not yield ω = 2, then this sequence of
        Laser Method applications (which is a special case of the Universal Method) must converge to a
        value greater than 2 as well. This generalizes the result of Ambainis, Filmus and Le Gall [6], who
        proved this about applying the Laser Method to higher and higher powers of the specific tensor
        T = CWq .

      • Our result also generalizes the result of Kleinberg, Speyer and Sawin [24] which studied the tensor
        Tqlower , the lower triangular part of Tq . Kleinberg, Speyer and Sawin studied the asymptotic slice
        rank of Tqlower , a quantity which will be featured prominently in our proofs below. They showed
        that (what can be seen as) the Laser Method achieves a tight lower bound on the asymptotic slice
        rank, matching an upper bound of Blasiak et al. [9]. Indeed, since Tqlower is a laser-ready tensor,
        our proof will also imply this.

1.2     Overview of our techniques
We give a brief overview of the techniques we use to prove our main results, Theorem 1.3 and Theorem 1.4.
All the technical terms we refer to here will be precisely defined in Section 2.

Section 2.6: Asymptotic slice rank and its connection with matrix multiplication. The tensors we
study are 3-tensors, which can be seen as trilinear forms over three sets X,Y, Z of formal variables. A key
notion in all our proofs will be the slice rank S(T ) of a tensor T . S(T ) is a measure of the complexity of
T , analogous to the rank of a matrix. The notion of slice rank was first introduced by Tao [34], and then
used by Blasiak et al. [9] in the context of lower bounds against the Group-Theoretic Method. In order to
study degenerations of powers of tensors, rather than just tensors themselves, we study in this paper the
asymptotic slice rank S̃(T ) of tensors T :

                                          S̃(T ) := sup_{n∈N} S(T ⊗n )^{1/n} .

S̃ satisfies two key properties:


   1. Degenerations cannot increase the asymptotic slice rank of a tensor. In other words, if A degenerates
      to B, then S̃(B) ≤ S̃(A).

   2. Matrix multiplication tensors have high asymptotic slice rank.

This means that if a certain tensor T has a small value of ωu (T ), or in other words, powers T ⊗n can
degenerate into large matrix multiplication tensors, then T itself must have large asymptotic slice rank.
Hence, in order to lower bound ωu (T ), it suffices to upper bound S̃(T ).

Section 3: Tools for proving upper bounds on asymptotic slice rank. In general, bounding S̃(T ) for
a tensor T can be much more difficult than bounding S(T ). This is because S can be supermultiplicative,
i. e., there are tensors A and B such that S(A) · S(B) < S(A ⊗ B). Indeed, we will show that S̃(T ) > S(T )
for many tensors T of interest, including T = CWq .
      We will give three new upper-bound tools for S̃(T ) for many tensors T . Each applies to tensors with
different properties:

   • Theorem 3.2: If T is over X,Y, Z, then it is straightforward to see that if one of the sets of variables
     is not too large, then S̃(T ) must be small: S̃(T ) ≤ min{|X|, |Y |, |Z|}. In this first tool, we show how
     if T can be written as a sum T = T1 + · · · + Tk of a few tensors, and each Ti does not have many of
     one type of variable, then we can still derive an upper bound on S̃(T ).

   • Theorem 3.4: The second tool concerns partitions of the sets of variables X,Y, Z. It shows that if
     S̃(T ) is large, then there is a probability distribution on the blocks of T (subtensors corresponding
     to a choice of one part from each of the three partitions) so that the total probability mass assigned
     to each part of each partition is proportional to its size. Loosely, this means that T must have many
     different “symmetries,” no matter how its variables are partitioned.

   • Theorem 3.9: Typically, for tensors A and B, even if S̃(A) and S̃(B) are “small,” it may still be the
     case that S̃(A + B) is large. This third tool shows that if A has an additional property, then one can
     still bound S̃(A + B). Roughly, the property that A must satisfy is that not only is S̃(A) small, but a
     related notion called the “x-rank” of A must also be small.
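The trivial bound S(T ) ≤ min{|X|, |Y |, |Z|} mentioned in the first bullet has a one-line proof: grouping the terms of T by their x-variable writes T as a sum of |X| terms, each involving a single x-variable, and each such term has slice rank one. A minimal sketch of this decomposition (my own illustration, treating a tensor as a 3-dimensional coefficient array; not the paper's machinery):

```python
import numpy as np

def slice_decomposition_x(T):
    """Write T (an a x b x c coefficient array) as a sum of a terms of the
    form e_i (x) M_i, each using only one x-variable.  The number of terms,
    a = |X|, witnesses the trivial bound S(T) <= |X|."""
    a = T.shape[0]
    terms = []
    for i in range(a):
        e_i = np.zeros(a)
        e_i[i] = 1.0
        M_i = T[i, :, :]  # the i-th x-slice of T, a matrix over Y x Z
        terms.append((e_i, M_i))
    return terms

# Sanity check on a random integer tensor: the slices sum back to T.
rng = np.random.default_rng(0)
T = rng.integers(-2, 3, size=(3, 4, 5)).astype(float)
terms = slice_decomposition_x(T)
recon = sum(np.einsum('i,jk->ijk', e, M) for e, M in terms)
assert np.array_equal(recon, T)
```

The same construction with the roles of X, Y, Z permuted gives the other two bounds, hence the minimum.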

    In particular, we will remark that our three tools for bounding S̃(T ) strictly generalize similar tools
introduced by [3] for bounding Ĩ(T ) (the asymptotic independence number of T , which is a weaker notion
than S̃(T ); it is sometimes also called the “galactic subrank” or the “monomial degeneration subrank”).
Hence, we generalize all of their results bounding ωg (T ) for various tensors T to at least as strong a
bound on ωu (T ). See Section 1.3 below for more about prior work on bounding S̃. It is less clear how to
compare our tools with this prior work beyond what we mention there, as prior work typically focuses on
different tensors or different fields; our bounds hold over every field.

Section 4: Universal Method lower bounds. We apply our tools to prove upper bounds on S̃(T ), and
hence lower bounds on ωu (T ), for a number of tensors T of interest. To prove Theorem 1.3, we show that
all three tools can be applied to CWq . We also apply our tools to many other tensors of interest including
the generalized Coppersmith–Winograd tensors CWq,σ , the generalized small Coppersmith–Winograd


tensors cwq,σ , the structural tensor Tq of the cyclic group Cq as well as its “lower triangular version”
Tqlower , and the subtensor t112 of CWq⊗2 which arises in [19, 21, 36, 26, 4, 25, 27]. Throughout Section 4
we give many tables of concrete lower bounds that we prove for the tensors in all these different families.

Section 5: “Completeness” of the Laser Method. Finally, we study the Laser Method. The Laser
Method applied to a tensor T shows that powers T ⊗n can zero out into large matrix multiplication tensors.
Using the properties of S̃ that we prove in Section 2.6, we will show that the Laser Method can also be
applied to a tensor T to prove a lower bound on S̃(T ).
    We prove Theorem 1.4 by combining this construction with Theorem 3.4, one of our upper-bound tools
for S̃(T ). Intuitively, both Theorem 3.4 and the Laser Method are concerned with probability distributions on
blocks of a tensor, and both involve counting the number of variables in powers T ⊗n that are consistent
with these distributions. We use this intuition to show that the upper bound given by Theorem 3.4 is equal
to the lower bound given by the Laser Method.
    This also implies that for any laser-ready tensor T , including CWq , cwq , Tq , and all the other tensors
we study in Section 4, our tools are tight, meaning they not only give an upper bound on S̃(T ), but they
also give a matching lower bound. Hence, for these tensors T , no better lower bound on ωu (T ) is possible
by arguing only about S̃(T ).
    Our proof of Theorem 1.4 also sheds light on a notion related to the asymptotic slice rank S̃(T ) of a
tensor T , called the asymptotic subrank Q̃(T ) of T . Q̃ is a “dual” notion of asymptotic rank, and it is
important in the definition of Strassen’s asymptotic spectrum of tensors [32].
    It is not hard to see (and follows, for instance, from Proposition 2.3 and Proposition 2.4 below)
that Q̃(T ) ≤ S̃(T ) for all tensors T . However, there are no known separations between the two notions;
whether there exists a tensor T such that Q̃(T ) < S̃(T ) is an open question. As a corollary of Theorem 1.4,
we prove:

Corollary 1.5. Every laser-ready tensor T has Q̃(T ) = S̃(T ).

Since, as discussed above, almost all of the most-studied tensors are laser-ready, this might help explain
why we have been unable to separate the two notions.

1.3   Other related work
Probabilistic tensors and support rank. Cohn and Umans [17] introduced the notion of the support
rank of tensors, and showed that upper bounds on the support rank of matrix multiplication tensors can
be used to design faster Boolean matrix multiplication algorithms. Recently, Karppa and Kaski [23] used
“probabilistic tensors” as another way to design Boolean matrix multiplication algorithms.
    In fact, our tools for proving asymptotic slice rank upper bounds can be used to prove lower bounds on
these approaches as well. For instance, our results imply that finding a “weighted” matrix multiplication
tensor as a degeneration of a power of CWq (in order to prove a support rank upper bound) cannot result
in a better exponent for Boolean matrix multiplication than 2.16805.
    This is because “weighted” matrix multiplication tensors can degenerate into independent tensors
just as large as their unweighted counterparts. Similarly, if a probabilistic tensor T is degenerated into
a (probabilistic) matrix multiplication tensor, Karppa and Kaski show that this gives a corresponding


support rank expression for matrix multiplication as well, and so upper bounds on S̃(T ) for any tensor T in
the support of the probabilistic tensor also result in lower bounds on this approach.


Rectangular matrix multiplication. Our tools can also be used to prove lower bounds on approaches
to designing rectangular matrix multiplication algorithms. For instance, the best known rectangular
matrix multiplication algorithms [27] show that powers of CWq zero out into large rectangular matrix
multiplication tensors. Using the fact that CWq is variable-symmetric, this implies a corresponding upper
bound on ωu (CWq ),3 against which our tools give a lower bound.


Slice Rank upper bounds. Our limitation results critically make use of upper bounds on the asymptotic
slice rank of CWq and other tensors of interest. Slice rank was first introduced by Tao [34] in a symmetric
formulation of the recent proof of the capset bound [20, 22], which shows how to prove slice rank
upper bounds using the “polynomial method.” Since then, a number of papers have focused on proving
slice rank upper bounds for many different tensors. Sawin and Tao [35, Proposition 6] show slice rank
upper bounds by studying the combinatorics of the support of the power of a fixed tensor, and Naslund
and Sawin [28] use that approach to study sunflower-free sets;4 one of our slice rank bounding tools,
Theorem 3.4, uses this type of approach applied to blocked tensors. Slice rank was first used in the context
of matrix multiplication by Blasiak et al. [9], and this line of work has led to more techniques for proving
slice rank upper bounds, including connections to the notion of instability from geometric invariant
theory [9], and a generalization of the polynomial method to the nonabelian setting [10]. Christandl et
al. [12] show how to characterize the asymptotic slice rank of tensors over C as the minimum value of a
“quantum functional.”


Concurrent work. Christandl, Vrana and Zuiddam [13] independently proved some of the same lower
bounds on ωu as us, including Theorem 1.3. Although we achieve the same upper bounds on ωu (T ) for a
number of tensors, our techniques seem different: we use simple combinatorial tools generalizing those
from our prior work [3], while their bounds use the seemingly more complicated machinery of Strassen’s
asymptotic spectrum of tensors [33], showing how Strassen’s powerful theory comes into play in this
setting. They thus phrase their results in terms of the asymptotic subrank Q̃(T ) of tensors rather than
the asymptotic slice rank S̃(T ), and the fact that their bounds are often the same as ours is related to
the fact we prove, in Corollary 1.5, that Q̃(T ) = S̃(T ) for all of the tensors we study; see the bottom of
Section 2.6 for a more technical discussion of the differences between the two notions. Our other results
and applications of our techniques are, as far as we know, entirely new, including our matching lower
bounds for S̃(CWq ), S̃(cwq ), and S̃(Tq ), bounding the value Vτ (T ) of tensors, and all our results about the
completeness of the Laser Method.

    3 If CWq⊗n degenerates to ⟨a, b, c⟩, then by symmetry, CWq⊗3n degenerates to ⟨abc, abc, abc⟩, yielding ωu (CWq ) ≤
3n log(R̃(CWq ))/ log(abc).
    4 In fact, the tensor T whose slice rank is bounded in [28, Section 3] can be viewed as a change of basis of a Generalized
Simple Coppersmith–Winograd tensor cwq,σ which we study below in Section 4.2.



2     Preliminaries
We begin by introducing the relevant notions and notation related to tensors and matrix multiplication.
We use the same notions as in the standard literature (e. g., [31, 8, 11]), but using the notation from past
work on the Galactic and Universal Methods. In particular, we will use the same notation introduced in
[3, Section 3], and readers familiar with that paper may skip to Subsection 2.5.


2.1    Tensor basics
For sets X = {x1 , . . . , xq }, Y = {y1 , . . . , yr }, and Z = {z1 , . . . , zs } of formal variables, a tensor over X,Y, Z
is a trilinear form
                                     T = ∑_{xi ∈X, y j ∈Y, zk ∈Z} αi jk xi y j zk ,

where the αi jk coefficients come from an underlying field F. The terms, which we write as xi y j zk , are
sometimes written as xi ⊗ y j ⊗ zk in the literature. We say T is minimal for X,Y, Z if, for each xi ∈ X,
there is a term involving xi with a nonzero coefficient in T , and similarly for Y and Z (i. e., T can’t be seen
as a tensor over a strict subset of the variables). We say that two tensors T1 , T2 are isomorphic, written
T1 ≅ T2 , if they are equal up to renaming variables.
      If T1 is a tensor over X1 ,Y1 , Z1 , and T2 is a tensor over X2 ,Y2 , Z2 , then the tensor product T1 ⊗ T2
is a tensor over X1 × X2 ,Y1 × Y2 , Z1 × Z2 such that, for any (x1 , x2 ) ∈ X1 × X2 , (y1 , y2 ) ∈ Y1 × Y2 , and
(z1 , z2 ) ∈ Z1 × Z2 , the coefficient of (x1 , x2 )(y1 , y2 )(z1 , z2 ) in T1 ⊗ T2 is the product of the coefficient of
x1 y1 z1 in T1 , and the coefficient of x2 y2 z2 in T2 . For any tensor T and positive integer n, the tensor power
T ⊗n is the tensor over X n ,Y n , Z n resulting from taking the tensor product of n copies of T .
      If T1 is a tensor over X1 ,Y1 , Z1 , and T2 is a tensor over X2 ,Y2 , Z2 , then the direct sum T1 ⊕ T2 is a tensor
over X1 t X2 , Y1 tY2 , Z1 t Z2 which results from forcing the sets of variables to be disjoint (as in a normal
disjoint union) and then summing the two tensors. For a nonnegative integer m and tensor T we write
m T for the direct sum of m copies of T .
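The coefficient-wise definitions of the tensor product and direct sum can be made concrete in a small sketch, representing a tensor as a dictionary from variable triples to coefficients (an illustration of the definitions only; the helper names are my own):

```python
def tensor_product(T1, T2):
    """Coefficient of ((x1,x2), (y1,y2), (z1,z2)) in T1 (x) T2 is the product
    of the coefficient of (x1,y1,z1) in T1 and that of (x2,y2,z2) in T2."""
    return {((x1, x2), (y1, y2), (z1, z2)): c1 * c2
            for (x1, y1, z1), c1 in T1.items()
            for (x2, y2, z2), c2 in T2.items()}

def direct_sum(T1, T2):
    """Tag each variable with 0 or 1 to force the variable sets to be
    disjoint, then take the sum of the two tensors."""
    out = {((0, x), (0, y), (0, z)): c for (x, y, z), c in T1.items()}
    out.update({((1, x), (1, y), (1, z)): c for (x, y, z), c in T2.items()})
    return out

# Toy tensor T = x1 y1 z1 + x1 y1 z2; its square has 2 * 2 = 4 terms.
T = {('x1', 'y1', 'z1'): 1, ('x1', 'y1', 'z2'): 1}
P = tensor_product(T, T)
assert len(P) == 4 and all(c == 1 for c in P.values())
```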


2.2    Tensor rank
A tensor T has rank one if there are values ai ∈ F for each xi ∈ X, b j ∈ F for each y j ∈ Y , and ck ∈ F for
each zk ∈ Z, such that the coefficient of xi y j zk in T is ai b j ck , or in other words,
         T = ∑_{xi ∈X, y j ∈Y, zk ∈Z} ai b j ck · xi y j zk = ( ∑_{xi ∈X} ai xi ) ( ∑_{y j ∈Y} b j y j ) ( ∑_{zk ∈Z} ck zk ).


The rank of a tensor T , denoted R(T ), is the smallest number of rank one tensors whose sum (summing
the coefficient of each term individually) is T . It is not hard to see that for tensors T and positive integers
n, we always have R(T ⊗n ) ≤ R(T )n , but for some tensors T of interest this inequality is not tight. We
thus define the asymptotic rank of a tensor T as R̃(T ) := inf_{n∈N} (R(T ⊗n ))^{1/n} .
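The inequality R(T ⊗n ) ≤ R(T )^n behind this definition comes from tensoring decompositions: if T1 is a sum of r1 rank-one tensors and T2 a sum of r2 , then pairing up the factors expresses T1 ⊗ T2 as a sum of r1 · r2 rank-one tensors. A sketch of this standard argument with coefficient arrays (my own illustration):

```python
import numpy as np

def from_rank_one_terms(terms):
    """Sum of outer products u (x) v (x) w."""
    return sum(np.einsum('i,j,k->ijk', u, v, w) for u, v, w in terms)

def product_decomposition(terms1, terms2):
    """Rank decomposition of T1 (x) T2 from decompositions of T1 and T2:
    the terms (u1 (x) u2, v1 (x) v2, w1 (x) w2), r1 * r2 of them."""
    return [(np.kron(u1, u2), np.kron(v1, v2), np.kron(w1, w2))
            for u1, v1, w1 in terms1 for u2, v2, w2 in terms2]

# A random tensor given as a sum of 3 rank-one terms over 2-dim variable sets.
rng = np.random.default_rng(1)
terms = [tuple(rng.standard_normal(2) for _ in range(3)) for _ in range(3)]
T = from_rank_one_terms(terms)
# T (x) T as a coefficient array over the product variable sets.
T2 = np.einsum('ijk,lmn->iljmkn', T, T).reshape(4, 4, 4)
recon = from_rank_one_terms(product_decomposition(terms, terms))
assert np.allclose(recon, T2)  # 9 = 3^2 rank-one terms suffice for T (x) T
```

For some tensors of interest strictly fewer terms suffice for the power, which is exactly why R̃(T ) can be smaller than R(T ).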


2.3     Matrix multiplication tensors
For positive integers a, b, c, the matrix multiplication tensor ⟨a, b, c⟩ is a tensor over {xi j }i∈[a], j∈[b] ,
{y jk } j∈[b],k∈[c] , {zki }k∈[c],i∈[a] given by

                                  ⟨a, b, c⟩ = ∑_{i=1}^{a} ∑_{j=1}^{b} ∑_{k=1}^{c} xi j y jk zki .


It is not hard to verify that for positive integers a1 , a2 , b1 , b2 , c1 , c2 , we have ⟨a1 , b1 , c1 ⟩ ⊗ ⟨a2 , b2 , c2 ⟩ ≅
⟨a1 a2 , b1 b2 , c1 c2 ⟩. The exponent of matrix multiplication, denoted ω, is defined as

                                    ω := inf_{a,b,c∈N} 3 log_{abc} (R(⟨a, b, c⟩)).


Because of the tensor product property above, we can alternatively define ω in a number of ways:

         ω = inf_{a,b,c∈N} 3 log_{abc}(R̃(⟨a, b, c⟩)) = inf_{n∈N} log_n(R̃(⟨n, n, n⟩)) = log_2(R̃(⟨2, 2, 2⟩)).


For instance, Strassen [30] showed that R(⟨2, 2, 2⟩) ≤ 7, which implies that ω ≤ log_2(7).
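A minimal sketch (standard material, not code from this paper) of the seven products behind Strassen's identity; applying this scheme recursively on 2 × 2 blocks is what turns R(⟨2, 2, 2⟩) ≤ 7 into ω ≤ log_2(7):

```python
def strassen_2x2(A, B):
    # Strassen's seven bilinear products, witnessing R(<2,2,2>) <= 7
    # and hence omega <= log2(7) ~ 2.807.
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return ((m1 + m4 - m5 + m7, m3 + m5),
            (m2 + m4, m1 - m2 + m3 + m6))

def naive_2x2(A, B):
    # the standard eight-multiplication product, for comparison
    return tuple(tuple(sum(A[i][j] * B[j][k] for j in range(2))
                       for k in range(2)) for i in range(2))

A, B = ((1, 2), (3, 4)), ((5, 6), (7, 8))
assert strassen_2x2(A, B) == naive_2x2(A, B) == ((19, 22), (43, 50))
```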


2.4     Degenerations and the Universal Method
We now describe a very general way to transform a tensor T1 over X1 ,Y1 , Z1 into a tensor T2 over
X2 ,Y2 , Z2 . For a formal variable λ , pick maps α : X1 × X2 → F[λ ], β : Y1 ×Y2 → F[λ ], and γ : Z1 × Z2 →
F[λ ], which map pairs of variables to polynomials in λ , and pick a nonnegative integer h. Then, when
we replace each x ∈ X1 with ∑_{x′∈X2} α(x, x′) x′, each y ∈ Y1 with ∑_{y′∈Y2} β(y, y′) y′, and each z ∈ Z1 with
∑_{z′∈Z2} γ(z, z′) z′ in T1 , the resulting tensor T′ is a tensor over X2 ,Y2 , Z2 with coefficients in F[λ ].
When T′ is instead viewed as a polynomial in λ whose coefficients are tensors over X2 ,Y2 , Z2 with
coefficients in F, it must be that T2 is the coefficient of λ^h, and the coefficient of λ^{h′} is 0 for all h′ < h.
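As a concrete illustration (a standard example, not taken from this paper), the tensor x_0 y_0 z_1 + x_0 y_1 z_0 + x_1 y_0 z_0 is a degeneration of the independent tensor ⟨2⟩ = x_1 y_1 z_1 + x_2 y_2 z_2 with h = 1, via x_1 → x_0 + λx_1, y_1 → y_0 + λy_1, z_1 → z_0 + λz_1, x_2 → x_0, y_2 → y_0, z_2 → −z_0. The sketch below tracks the coefficients of powers of λ explicitly:

```python
from collections import defaultdict
from itertools import product

# Images of the variables of <2> = x1*y1*z1 + x2*y2*z2, one triple per term;
# polynomials in lam are stored as {degree: coefficient} dicts.
subs = [
    ({'x0': {0: 1}, 'x1': {1: 1}},          # x1 -> x0 + lam*x1
     {'y0': {0: 1}, 'y1': {1: 1}},          # y1 -> y0 + lam*y1
     {'z0': {0: 1}, 'z1': {1: 1}}),         # z1 -> z0 + lam*z1
    ({'x0': {0: 1}}, {'y0': {0: 1}}, {'z0': {0: -1}}),  # x2, y2, z2
]

T = defaultdict(lambda: defaultdict(int))   # (x, y, z) -> {lam-degree: coeff}
for xf, yf, zf in subs:
    for (xv, xp), (yv, yp), (zv, zp) in product(xf.items(), yf.items(), zf.items()):
        for dx, cx in xp.items():
            for dy, cy in yp.items():
                for dz, cz in zp.items():
                    T[(xv, yv, zv)][dx + dy + dz] += cx * cy * cz

# The lam^0 coefficient vanishes, and the lam^1 coefficient (h = 1) is
# exactly x0*y0*z1 + x0*y1*z0 + x1*y0*z0.
assert all(p.get(0, 0) == 0 for p in T.values())
lam1 = {t for t, p in T.items() if p.get(1, 0) != 0}
assert lam1 == {('x0', 'y0', 'z1'), ('x0', 'y1', 'z0'), ('x1', 'y0', 'z0')}
```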
     If such a transformation is possible, we say T2 is a degeneration of T1 . There are also two more
restrictive types of degenerations:

     • T2 is a monomial degeneration of T1 if such a transformation is possible where the polynomials in
       the ranges of α, β , γ have at most one monomial, and furthermore, for each x ∈ X1 or x′ ∈ X2 , there
       is at most one x′ ∈ X2 or x ∈ X1 , respectively, such that α(x, x′) ≠ 0, and similarly for β and γ.5

      • T2 is a zeroing out of T1 if, in addition to the restrictions of a monomial degeneration, the ranges of
        α, β , γ must be {0, 1}.
   5 Some definitions of monomial degenerations, including the original by Strassen [32], do not have this second condition,

or equivalently, consider a monomial degeneration to be a “restriction” composed with what we defined here. The distinction
is not important for this paper, but we give this definition since it captures Strassen’s monomial degeneration from matrix
multiplication tensors to independent tensors [31] (see also Proposition 2.5 below), and it is the notion against which the earlier
paper [3] proved lower bounds.


    Degenerations are useful in the context of matrix multiplication algorithms because degenerations
cannot increase the asymptotic rank of a tensor. In other words, if T2 is a degeneration of T1 , then
R̃(T2 ) ≤ R̃(T1 ) [7]. It is often hard to bound the rank of matrix multiplication tensors directly, so all
known approaches proceed by bounding the rank of a different tensor T and then showing that powers of
T degenerate into matrix multiplication tensors.
    More precisely, all known approaches fall within the following method, which we call the Universal
Method [3] applied to a tensor T of asymptotic rank R = R̃(T ): Consider all positive integers n, and all
ways to degenerate T^{⊗n} into a disjoint sum ⊕_{i=1}^{m} ⟨a_i, b_i, c_i⟩ of matrix multiplication tensors, resulting in
an upper bound on ω by the asymptotic sum inequality [29] of ∑_{i=1}^{m} (a_i b_i c_i)^{ω/3} ≤ R^n. Then, ω_u(T ), the
bound on ω from the Universal Method applied to T , is the infimum, over all such n and degenerations,
of the resulting upper bound on ω.
    In [3], two weaker versions of the Universal Method are also defined: the Galactic Method, in which
the degeneration must be a monomial degeneration, resulting in a bound ωg (T ), and the Solar Method,
in which the degeneration must be a zeroing out, resulting in a bound ωs (T ). To be clear, all three of
these methods are very general, and we don’t know the values of ωs (T ), ωg (T ), or ωu (T ) for almost
any nontrivial tensors T . In fact, all the known approaches to bounding ω proceed by giving upper
bounds on ωs (T ) for some carefully chosen tensors T ; the most successful has been the Coppersmith–
Winograd family of tensors T = CWq , which has yielded all the best known bounds on ω since the
80s [18, 21, 36, 26, 4]. Indeed, the two most successful approaches, the Laser Method [32] and the
Group-Theoretic Approach [16], ultimately use zeroing outs of tensors. We refer the reader to [3, Sections
3.3 and 3.4] for more details on these approaches and how they relate to the notions used here.

2.5    Tensor value
Coppersmith and Winograd [19] defined the value of a tensor in their analysis of the CW_q tensor. For
a tensor T , and any τ ∈ [2/3, 1], the τ-value of T , denoted V_τ(T ), is defined as follows: Consider all
positive integers n, and all ways σ to degenerate T^{⊗n} into a direct sum ⊕_{i=1}^{q(σ)} ⟨a_i^σ, b_i^σ, c_i^σ⟩ of matrix
multiplication tensors. Then, V_τ(T ) is given by

                                   V_τ(T ) := sup_{n,σ} ( ∑_{i=1}^{q(σ)} (a_i^σ b_i^σ c_i^σ)^τ )^{1/n}.

We can then equivalently define ω_u(T ) as the infimum of ω_u , over all ω_u ∈ [2, 3] such that V_{ω_u/3}(T ) ≥ R̃(T ).
We can see from the power mean inequality that V_τ(T ) ≥ V_{2/3}(T )^{3τ/2} for all τ ∈ [2/3, 1], although this
bound is often not tight as there can be better degenerations of T^{⊗n} depending on the value of τ.

2.6    Asymptotic slice rank
The main new notions we will need in this paper relate to the slice rank of tensors. We say a tensor T
over X,Y, Z has x-rank 1 if it is of the form

              T = (∑_{x∈X} α_x · x) ⊗ (∑_{y∈Y} ∑_{z∈Z} β_{y,z} · y ⊗ z) = ∑_{x∈X, y∈Y, z∈Z} α_x β_{y,z} · xyz



for some choices of the α and β coefficients over the base field. More generally, the x-rank of T , denoted
Sx (T ), is the minimum number of tensors of x-rank 1 whose sum is T . We can similarly define the y-rank,
Sy , and the z-rank, Sz . Then, the slice rank of T , denoted S(T ), is the minimum k such that there are
tensors TX , TY and TZ with T = TX + TY + TZ and Sx (TX ) + Sy (TY ) + Sz (TZ ) = k.
     Unlike tensor rank, the slice rank is not submultiplicative in general, i. e., there are tensors A
and B such that S(A ⊗ B) > S(A) · S(B). For instance, it is not hard to see that S(CW5 ) = 3, but
since it is known [36, 26, 4] that ω_s(CW5 ) ≤ 2.373, it follows (e. g., from Theorem 2.9 below) that
S(CW5^{⊗n}) ≥ 7^{2n/2.373 − o(n)} ≥ 5.15^{n − o(n)}. We are thus interested in the asymptotic slice rank, S̃(T ), of
tensors T , defined as

                                          S̃(T ) := sup_{n∈N} [S(T^{⊗n})]^{1/n}.
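The constant 5.15 in the lower bound above can be spot-checked numerically, using R̃(CW_5) = 7:

```python
# The base of the exponential lower bound S(CW5^{(x)n}) >= 7^(2n/2.373 - o(n)):
base = 7 ** (2 / 2.373)
assert 5.15 < base < 5.16
```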

    We note a few simple properties of slice rank which will be helpful in our proofs:

Lemma 2.1. For tensors A and B:

(1) S(A) ≤ Sx (A) ≤ R(A),

(2) Sx (A ⊗ B) ≤ Sx (A) · Sx (B),

(3) S(A + B) ≤ S(A) + S(B), and Sx (A + B) ≤ Sx (A) + Sx (B),

(4) S(A ⊗ B) ≤ S(A) · max{Sx (B), Sy (B), Sz (B)}, and

(5) if A is a tensor over X,Y, Z, then Sx (A) ≤ |X| and hence S(A) ≤ min{|X|, |Y |, |Z|}.

Proof. (1) and (2) are straightforward. (3) follows since the sum of the slice rank (resp. x-rank) ex-
pressions for A and for B gives a slice rank (resp. x-rank) expression for A + B. To prove (4), let
m = max{Sx (B), Sy (B), Sz (B)}, and note that if A = AX + AY + AZ such that Sx (AX ) + Sy (AY ) + Sz (AZ ) =
S(A), then
                                   A ⊗ B = AX ⊗ B + AY ⊗ B + AZ ⊗ B,
and so

                        S(A ⊗ B) ≤ S(AX ⊗ B) + S(AY ⊗ B) + S(AZ ⊗ B)
                                    ≤ Sx (AX ⊗ B) + Sy (AY ⊗ B) + Sz (AZ ⊗ B)
                                    ≤ Sx (AX ) Sx (B) + Sy (AY ) Sy (B) + Sz (AZ ) Sz (B)
                                    ≤ Sx (AX )m + Sy (AY )m + Sz (AZ )m = S(A) · m.

    Finally, (5) follows since, for instance, any tensor with only one x-variable has x-rank 1.

    Asymptotic slice rank is interesting in the context of matrix multiplication algorithms because of the
following facts.

Definition 2.2. For a positive integer q, the independent tensor of size q, denoted ⟨q⟩, is the tensor
∑_{i=1}^{q} x_i y_i z_i with q terms that do not share any variables.


Proposition 2.3 ([35, Corollary 2]). If A and B are tensors such that A has a degeneration to B, then
S(B) ≤ S(A), and hence S̃(B) ≤ S̃(A).
Proposition 2.4 ([34, Lemma 1]; see also [9, Lemma 4.7]). For any positive integer q, we have S(⟨q⟩) =
S̃(⟨q⟩) = q, where ⟨q⟩ is the independent tensor of size q.
Proposition 2.5 ([32, Theorem 6.6]; see also [3, Lemma 4.2]). For any positive integers a, b, c, the matrix
multiplication tensor ⟨a, b, c⟩ has a (monomial) degeneration to an independent tensor of size at least
0.75 · abc/ max{a, b, c}.
Corollary 2.6. For any positive integers a, b, c, we have S̃(⟨a, b, c⟩) = abc/ max{a, b, c}.
Proof. Assume without loss of generality that c ≥ a, b. For any positive integer n, we have that
⟨a, b, c⟩^{⊗n} ≅ ⟨a^n, b^n, c^n⟩ has a degeneration to an independent tensor of size at least 0.75 · a^n b^n, mean-
ing S(⟨a, b, c⟩^{⊗n}) ≥ 0.75 · a^n b^n and hence S̃(⟨a, b, c⟩) ≥ (0.75)^{1/n} · ab for every n, which means S̃(⟨a, b, c⟩) ≥ ab.
Meanwhile, ⟨a, b, c⟩ has ab different x-variables, so it must have S_x(⟨a, b, c⟩) ≤ ab and more generally,
S(⟨a, b, c⟩^{⊗n}) ≤ S_x(⟨a, b, c⟩^{⊗n}) ≤ (ab)^n, which means S̃(⟨a, b, c⟩) ≤ ab.

    To summarize: we know that degenerations cannot increase asymptotic slice rank, and that matrix
multiplication tensors have a high asymptotic slice rank. Hence, if T is a tensor such that ωu (T ) is
“small,” meaning a power of T has a degeneration to a disjoint sum of many large matrix multiplication
tensors, then T itself must have “large” asymptotic slice rank. This can be formalized identically to [3,
Theorem 4.1 and Corollary 4.3] to show the following.
Theorem 2.7. For any tensor T ,

                                             S̃(T ) ≥ R̃(T )^{6/ω_u(T) − 2}.
Corollary 2.8. For any tensor T , if ω_u(T ) = 2, then S̃(T ) = R̃(T ). Moreover, for every constant s < 1,
there is a constant w > 2 such that every tensor T with S̃(T ) ≤ R̃(T )^s must have ω_u(T ) ≥ w.
      Almost all the tensors we consider in this paper are variable-symmetric tensors, and for these tensors
T we can get a better lower bound on ω_u(T ) from an upper bound on S̃(T ). We say that a tensor T over
X,Y, Z is variable-symmetric if |X| = |Y | = |Z|, and the coefficient of x_i y_j z_k equals the coefficient of
x_j y_k z_i in T for all (x_i, y_j, z_k) ∈ X × Y × Z.
Theorem 2.9. For a variable-symmetric tensor T we have ω_u(T ) ≥ 2 log(R̃(T ))/ log(S̃(T )).
Proof. As in the proof of [3, Theorem 4.1], by definition of ω_u , we know that for every δ > 0, there
is a positive integer n such that T^{⊗n} has a degeneration to F ⊙ ⟨a, b, c⟩ for integers F, a, b, c such that
ω_u(T )(1 + δ) ≥ 3 log(R̃(T )^n/F)/ log(abc). In fact, since T is variable-symmetric, we know T^{⊗n} also has a degen-
eration to F ⊙ ⟨b, c, a⟩ and to F ⊙ ⟨c, a, b⟩, and so T^{⊗3n} has a degeneration to F^3 ⊙ ⟨abc, abc, abc⟩. As
above, it follows that S̃(T )^{3n} ≥ S̃(T^{⊗3n}) ≥ S̃(F^3 ⊙ ⟨abc, abc, abc⟩) = F^3 · (abc)^2. Rearranging, we see

                                            abc ≤ S̃(T )^{3n/2}/F^{3/2}.

Hence,

   ω_u(T )(1 + δ) ≥ 3 log(R̃(T )^n/F)/log(abc) ≥ 3 log(R̃(T )^n/F)/log(S̃(T )^{3n/2}/F^{3/2}) = 2 (log(R̃(T )) − (1/n) log(F)) / (log(S̃(T )) − (1/n) log(F)) ≥ 2 log(R̃(T ))/log(S̃(T )),


where the last step follows because R̃(T ) ≥ S̃(T ) and so subtracting the same quantity from both the
numerator and denominator cannot decrease the value of the fraction. This holds for all δ > 0 and hence
implies the desired result.
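To illustrate how Theorem 2.7 and Theorem 2.9 convert slice rank upper bounds into lower bounds on ω_u (a sketch with placeholder inputs, not values computed in this section; the helper names are hypothetical):

```python
import math

def omega_lb_general(R, S):
    # Theorem 2.7: S~(T) >= R~(T)^(6/w - 2) rearranges to w >= 6/(2 + log_R S).
    return 6 / (2 + math.log(S) / math.log(R))

def omega_lb_symmetric(R, S):
    # Theorem 2.9 (variable-symmetric T): w >= 2 log(R) / log(S).
    return 2 * math.log(R) / math.log(S)

# Placeholder inputs: a tensor with asymptotic rank 7 whose asymptotic slice
# rank is 7^(2/2.373) would, if variable-symmetric, force omega_u >= 2.373.
R, S = 7.0, 7 ** (2 / 2.373)
assert abs(omega_lb_symmetric(R, S) - 2.373) < 1e-9
# For S <= R, the symmetric bound is at least as strong as the general one.
assert omega_lb_symmetric(R, S) >= omega_lb_general(R, S)
```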

Slice rank versus subrank. For a tensor T , let Q0 (T ) denote the largest integer q such that there is
a degeneration from T to ⟨q⟩. The asymptotic subrank of T is defined as Q̃(T ) := sup_{n∈N} Q0 (T^{⊗n})^{1/n}.
Proposition 2.3 and Proposition 2.4 above imply that Q̃(T ) ≤ S̃(T ) for all tensors T . Similarly, it is not
hard to see that Theorem 2.7 and Theorem 2.9 hold with S̃ replaced by Q̃. One could thus conceivably
hope to prove stronger lower bounds than those in this paper by bounding Q̃ instead of S̃. However, we
will prove in Corollary 5.5 below that Q̃(T ) = S̃(T ) for every tensor we study in this paper, so such an
improvement using Q̃ is impossible. More generally, there are currently no known tensors T for which
the best known upper bound on Q̃(T ) is smaller than the best known upper bound on S̃(T ) (including the
new bounds of [12, 13]; more precisely, [12] give tight upper bounds on asymptotic slice rank). Hence,
new upper-bound tools for Q̃ would be required for such an approach to proving better lower bounds on
ωu .

2.7     Partition notation
In several of our results, we will be partitioning the terms of tensors into blocks defined by partitions of
the three sets of variables. Here we introduce some notation for some properties of such partitions; these
definitions all depend on the particular partition of the variables being used, which will be clear from
context.
    Suppose T is a tensor minimal over X,Y, Z, and let X = X_1 ∪ · · · ∪ X_{k_X}, Y = Y_1 ∪ · · · ∪ Y_{k_Y}, Z =
Z_1 ∪ · · · ∪ Z_{k_Z} be partitions of the three sets of variables. For (i, j, k) ∈ [k_X] × [k_Y] × [k_Z], let T_{ijk} be T
restricted to X_i, Y_j, Z_k (i. e., T with X \ X_i, Y \ Y_j, and Z \ Z_k zeroed out), and let

                                 L = {T_{ijk} | (i, j, k) ∈ [k_X] × [k_Y] × [k_Z], T_{ijk} ≠ 0}.

T_{ijk} is called a block of T . For i ∈ [k_X] let L_{X_i} = {T_{ij′k′} ∈ L | (j′, k′) ∈ [k_Y] × [k_Z]}, and define L_{Y_j}
and L_{Z_k} similarly.
      We will be particularly interested in probability distributions p : L → [0, 1]. Let P(L) be the set of
such distributions. For such a p ∈ P(L), and for i ∈ [k_X], let p(X_i) := ∑_{T_{ijk}∈L_{X_i}} p(T_{ijk}), and similarly p(Y_j)
and p(Z_k). Then, define p_X ∈ R by

                                           p_X := ∏_{i∈[k_X]} (|X_i| / p(X_i))^{p(X_i)},

and p_Y and p_Z similarly. This expression, which arises naturally in the Laser Method and is related to
the Shannon entropy of p, will play an important role in our upper bounds and lower bounds.
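For concreteness (an illustrative helper, not code from the paper), p_X can be computed directly from the part sizes |X_i| and the marginals p(X_i); it is maximized, at the value |X|, exactly when the marginals are proportional to the part sizes:

```python
import math

def p_value(part_sizes, marginals):
    # p_X = prod_i (|X_i| / p(X_i))^{p(X_i)}, with the convention 0^0 = 1
    # (parts of marginal zero contribute a factor of 1).
    return math.prod((s / m) ** m for s, m in zip(part_sizes, marginals) if m > 0)

# |X| = 5 split into parts of sizes 2 and 3: proportional marginals give
# the maximum possible value p_X = 5; other marginals give strictly less.
assert abs(p_value([2, 3], [0.4, 0.6]) - 5.0) < 1e-9
assert p_value([2, 3], [0.5, 0.5]) < 5.0
```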

2.8     Tensor rotations and variable-symmetric tensors
If T is a tensor over X = {x1 , . . . , xq }, Y = {y1 , . . . , yr }, and Z = {z1 , . . . , zs }, then the rotation of T ,
denoted rot(T ), is the tensor over {x1 , . . . , xr }, {y1 , . . . , ys }, and {z1 , . . . , zq } such that for any (xi , y j , zk ) ∈


X × Y × Z, the coefficient of x_i y_j z_k in T is equal to the coefficient of x_j y_k z_i in rot(T ). Tensor T is
variable-symmetric if T ≅ rot(T ).
    If T is a variable-symmetric tensor minimal over X,Y, Z, then partitions X = X_1 ∪ · · · ∪ X_{k_X}, Y =
Y_1 ∪ · · · ∪ Y_{k_Y}, Z = Z_1 ∪ · · · ∪ Z_{k_Z} of the sets of variables are called T -symmetric if (using the notation of
the previous subsection) k_X = k_Y = k_Z, |X_i| = |Y_i| = |Z_i| for all i ∈ [k_X], and the block T_{jki} ≅ rot(T_{ijk})
for all (i, j, k) ∈ [k_X]^3. For the L resulting from such a T -symmetric partition, a probability distribution
p ∈ P(L) is called T -symmetric if it satisfies p(T_{ijk}) = p(T_{jki}) for all (i, j, k) ∈ [k_X]^3, and we write
P_sym(L) ⊆ P(L) for the set of such T -symmetric distributions. Notice in particular that any p ∈ P_sym(L)
satisfies p_X = p_Y = p_Z.



3      Combinatorial tools for asymptotic upper bounds on slice rank

We now give three general tools for proving upper bounds on S̃(T ) for many tensors T . Each of our tools
generalizes one of the three main tools of [3], which bounded the weaker notion Ĩ instead of S̃, and which
applied only to a more restrictive set of tensors. We will make clear which previous result we are
generalizing, although our presentation here is entirely self-contained.


3.1     Generalization of [3, Theorem 5.3]

We know from Lemma 2.1 part (5) that tensors T over X,Y, Z such that min{|X|, |Y |, |Z|} is small must
also have small S̃(T ). We begin here by showing that if T can be written as a sum of a few tensors, each
of which has a small bound on S̃ which can be proved in this way, then we can still prove an upper bound
on S̃(T ).
    The measure of a tensor T , denoted µ(T ), is given by µ(T ) := |X| · |Y | · |Z|, where X,Y, Z are the
sets of variables which are minimal for T . We state two simple facts about µ:

Fact 3.1. For tensors A and B,

      • µ(A ⊗ B) = µ(A) · µ(B), and

      • if A is minimal over X,Y, Z, then S(A) ≤ min{|X|, |Y |, |Z|} ≤ µ(A)1/3 .

Theorem 3.2. Suppose T is a tensor, and T_1, . . . , T_k are tensors with T = T_1 + · · · + T_k. Then, S̃(T ) ≤
∑_{i=1}^{k} (µ(T_i))^{1/3}.

Proof. Note that
                                    T^{⊗n} = ∑_{(P_1,...,P_n)∈{T_1,...,T_k}^n} P_1 ⊗ · · · ⊗ P_n.



It follows that

                            S(T^{⊗n}) ≤ ∑_{(P_1,...,P_n)∈{T_1,...,T_k}^n} S(P_1 ⊗ · · · ⊗ P_n)
                                     ≤ ∑_{(P_1,...,P_n)∈{T_1,...,T_k}^n} µ(P_1 ⊗ · · · ⊗ P_n)^{1/3}
                                     = ∑_{(P_1,...,P_n)∈{T_1,...,T_k}^n} (µ(P_1) · µ(P_2) · · · µ(P_n))^{1/3}
                                     = (µ(T_1)^{1/3} + · · · + µ(T_k)^{1/3})^n,

which implies as desired that S̃(T ) ≤ µ(T_1)^{1/3} + · · · + µ(T_k)^{1/3}.

Remark 3.3. [3, Theorem 5.3], in addition to bounding Ĩ instead of S̃, also required that T = T_1 + · · · + T_k
be a partition of the terms of T . Here in Theorem 3.2 we allow any sum of tensors, although in general
a partition minimizes the resulting upper bound.
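A sketch of applying Theorem 3.2 (hypothetical helpers, with tensors represented by their supports only):

```python
def mu(support):
    # mu(T) = |X| * |Y| * |Z| over the variable sets T actually uses,
    # with T given as a set of (x, y, z) index triples.
    xs = {x for x, _, _ in support}
    ys = {y for _, y, _ in support}
    zs = {z for _, _, z in support}
    return len(xs) * len(ys) * len(zs)

def slice_rank_upper_bound(parts):
    # Theorem 3.2: S~(T) <= sum_i mu(T_i)^(1/3).
    return sum(mu(P) ** (1 / 3) for P in parts)

# Splitting <2> = x1 y1 z1 + x2 y2 z2 into its two terms gives the bound
# S~(<2>) <= 1 + 1 = 2, which is tight by Proposition 2.4.
parts = [{(1, 1, 1)}, {(2, 2, 2)}]
assert abs(slice_rank_upper_bound(parts) - 2.0) < 1e-9
```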

3.2    Generalization of [3, Theorem 5.2]
This tool will be the most important in proving upper bounds on the asymptotic slice rank of many tensors
of interest. We show that a partitioning method similar to the Laser Method applied to a tensor T can be
used to prove upper bounds on S̃(T ). Recall the definitions and notation about partitions of tensors from
Section 2.7.
Theorem 3.4. For any tensor T and partition of its sets of variables,

                                          S̃(T ) ≤ sup_{p∈P(L)} min{p_X, p_Y, p_Z}.

    Our proof of Theorem 3.4 uses one of the most successful techniques from past work on proving
slice rank upper bounds: partitioning the terms of T^{⊗n} by their distribution in P(L). Similar results
(without blocking variables) were also proved, for instance, by [35, Proposition 6], [3, Theorem 5.2], and [12,
Theorem 5.9].

Proof of Theorem 3.4. For any positive integer n, we can write

                                           T^{⊗n} = ∑_{(P_1,...,P_n)∈L^n} P_1 ⊗ · · · ⊗ P_n.

For a given (P_1, . . . , P_n) ∈ L^n, let dist(P_1, . . . , P_n) be the probability distribution on L which results from
picking a uniformly random α ∈ [n] and outputting P_α. For a probability distribution p : L → [0, 1], define
L_{n,p} := {(P_1, . . . , P_n) ∈ L^n | dist(P_1, . . . , P_n) = p}. Note that the number of p for which L_{n,p} is nonempty
is only poly(n), since they are the distributions which assign an integer multiple of 1/n to each element
of L. Let D be the set of these probability distributions.
    We can now rearrange:
                                           T^{⊗n} = ∑_{p∈D} ∑_{(P_1,...,P_n)∈L_{n,p}} P_1 ⊗ · · · ⊗ P_n.



Hence,

                              S(T^{⊗n}) ≤ ∑_{p∈D} S( ∑_{(P_1,...,P_n)∈L_{n,p}} P_1 ⊗ · · · ⊗ P_n )
                                       ≤ poly(n) · max_{p∈D} S( ∑_{(P_1,...,P_n)∈L_{n,p}} P_1 ⊗ · · · ⊗ P_n ).

    For any probability distribution p : L → [0, 1], let us count the number of x-variables used in

                                               ∑_{(P_1,...,P_n)∈L_{n,p}} P_1 ⊗ · · · ⊗ P_n.

These are the tuples of the form (x_1, . . . , x_n) ∈ X^n where, for each i ∈ [k_X], there are exactly n · p(X_i)
choices of j for which x_j ∈ X_i. The number of these is^6

                      C(n; n · p(X_1), n · p(X_2), . . . , n · p(X_{k_X})) · ∏_{i∈[k_X]} |X_i|^{n · p(X_i)}.

This is at most p_X^{n+o(n)}, where p_X is the quantity defined in Section 2.7. It follows that

                                      S_x( ∑_{(P_1,...,P_n)∈L_{n,p}} P_1 ⊗ · · · ⊗ P_n ) ≤ p_X^{n+o(n)}.

We can similarly argue about S_y and S_z. Hence,

                               S(T^{⊗n}) ≤ poly(n) · max_{p∈D} S( ∑_{(P_1,...,P_n)∈L_{n,p}} P_1 ⊗ · · · ⊗ P_n )
                                        ≤ poly(n) · max_{p∈D} min{p_X, p_Y, p_Z}^{n+o(n)}
                                        ≤ poly(n) · sup_{p∈P(L)} min{p_X, p_Y, p_Z}^{n+o(n)}.

Hence, S(T^{⊗n}) ≤ sup_{p∈P(L)} min{p_X, p_Y, p_Z}^{n+o(n)}, and the desired result follows.
   ^6 Here,

                               C(n; p_1 n, p_2 n, . . . , p_ℓ n) = n! / ((p_1 n)! (p_2 n)! · · · (p_ℓ n)!),

with each p_i ∈ [0, 1] and p_1 + · · · + p_ℓ = 1, is the multinomial coefficient, with the known bound from Stirling's approximation,
for fixed p_i's, that

                               C(n; p_1 n, p_2 n, . . . , p_ℓ n) ≤ ( ∏_i p_i^{−p_i} )^{n+o(n)}.

Throughout this paper we use the convention that p_i^{p_i} = 1 when p_i = 0.



Remark 3.5. [3, Theorem 5.2] is less powerful than our Theorem 3.4 in two ways: it used Ĩ instead of S̃,
and it required each X_i, Y_j, Z_k to contain only one variable.

Remark 3.6. Suppose T is over X,Y, Z with |X| = |Y | = |Z| = q. For any probability distribution p we
always have p_X, p_Y, p_Z ≤ q, and moreover we have p_X = q only when p(X_i) = |X_i|/q for each i. Similar
to [3, Corollary 5.1], it follows that if no probability distribution p is δ-close (say, in ℓ_1 distance) to
having p(X_i) = |X_i|/q for all i, p(Y_j) = |Y_j|/q for all j, and p(Z_k) = |Z_k|/q for all k, simultaneously, then
we get S̃(T ) ≤ q^{1− f(δ)} for some increasing function f with f (δ) > 0 for all δ > 0.

Remark 3.7. Applying Theorem 3.4 when all the parts of the partitions of X, Y , and Z have size 1 always
yields the best possible bound that Theorem 3.4 can give on S̃(T ). That said, many tensors are easier to
argue about when a different partition is used, as we will see in Section 5 below.
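As a sketch of evaluating the bound of Theorem 3.4 in practice (illustrative code, not from the paper), a coarse grid search over distributions p on the blocks gives a lower estimate of the sup; a continuous optimizer would be needed for the exact value:

```python
import math

def part_value(sizes, marginal):
    # prod_i (|X_i| / p(X_i))^{p(X_i)} over the parts, with 0^0 = 1.
    return math.prod((sizes[i] / m) ** m for i, m in marginal.items() if m > 0)

def laser_bound(blocks, xs, ys, zs, steps=12):
    # Grid search for sup_p min{p_X, p_Y, p_Z} over distributions on the
    # nonzero blocks (i, j, k); xs, ys, zs map part index -> part size.
    best = 0.0
    def rec(idx, left, w):
        nonlocal best
        if idx == len(blocks) - 1:
            ws = w + [left / steps]
            mx, my, mz = {}, {}, {}
            for (i, j, k), p in zip(blocks, ws):
                mx[i] = mx.get(i, 0) + p
                my[j] = my.get(j, 0) + p
                mz[k] = mz.get(k, 0) + p
            best = max(best, min(part_value(xs, mx),
                                 part_value(ys, my),
                                 part_value(zs, mz)))
            return
        for t in range(left + 1):
            rec(idx + 1, left - t, w + [t / steps])
    rec(0, steps, [])
    return best

# <2> with singleton parts: two blocks, and the uniform distribution
# certifies the (tight) bound S~(<2>) <= 2.
sizes = {0: 1, 1: 1}
assert abs(laser_bound([(0, 0, 0), (1, 1, 1)], sizes, sizes, sizes) - 2.0) < 1e-9
```

The search is exponential in the number of blocks, so it is only usable for small partitions; it is meant to make the quantity in Theorem 3.4 concrete, not to be efficient.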

    Before moving on, we make an observation about applying Theorem 3.4 to variable-symmetric
tensors. This observation has implicitly been used in past work on applying the Laser Method, such
as [19], but we prove it here for completeness. Recall the notation in Section 2.8 about such tensors.

Proposition 3.8. Suppose T is a variable-symmetric tensor over X,Y, Z, and X = X_1 ∪ · · · ∪ X_{k_X}, Y =
Y_1 ∪ · · · ∪ Y_{k_Y}, Z = Z_1 ∪ · · · ∪ Z_{k_Z} are T -symmetric partitions. Then,

                                                      S̃(T ) ≤ sup_{p∈P_sym(L)} p_X.


Proof. We know from Theorem 3.4 that S̃(T ) ≤ sup_{p∈P(L)} min{p_X, p_Y, p_Z}. We will show that for any
p ∈ P(L), there is a p′ ∈ P_sym(L) such that min{p_X, p_Y, p_Z} ≤ min{p′_X, p′_Y, p′_Z}, which means that in fact,
S̃(T ) ≤ sup_{p∈P_sym(L)} min{p_X, p_Y, p_Z}. Finally, the desired result will follow since, for any p′ ∈ P_sym(L),
we have p′_X = p′_Y = p′_Z.
    Consider any p ∈ P(L), and define the distribution p′ ∈ P_sym(L) by p′(T_{ijk}) := (p(T_{ijk}) + p(T_{jki}) +
p(T_{kij}))/3 for each T_{ijk} ∈ L. To prove that min{p_X, p_Y, p_Z} ≤ p′_X, we will show that (p_X p_Y p_Z)^{1/3} ≤ p′_X:

            (p_X p_Y p_Z)^{1/3} = ∏_{i∈[k_X]} (|X_i|/p(X_i))^{p(X_i)/3} · (|Y_i|/p(Y_i))^{p(Y_i)/3} · (|Z_i|/p(Z_i))^{p(Z_i)/3}
                               = ∏_{i∈[k_X]} |X_i|^{p′(X_i)} / (p(X_i)^{p(X_i)} · p(Y_i)^{p(Y_i)} · p(Z_i)^{p(Z_i)})^{1/3}
                               ≤ ∏_{i∈[k_X]} |X_i|^{p′(X_i)} / p′(X_i)^{p′(X_i)}
                               = p′_X,

where the second-to-last step follows from the fact that for any real numbers a, b, c ∈ [0, 1], setting
d = (a + b + c)/3, we have a^a b^b c^c ≥ d^{3d}.
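The inequality a^a b^b c^c ≥ d^{3d} used in the last step follows from the convexity of t ↦ t log t; a numeric spot-check (with the convention 0^0 = 1):

```python
from itertools import product

def pw(t):
    # t^t with the convention 0^0 = 1
    return 1.0 if t == 0 else t ** t

grid = [i / 20 for i in range(21)]
for a, b, c in product(grid, repeat=3):
    d = (a + b + c) / 3
    # a^a b^b c^c >= d^{3d}, with a small tolerance for rounding
    assert pw(a) * pw(b) * pw(c) >= pw(d) ** 3 - 1e-12
```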


3.3     Generalization of [3, Theorem 5.1]
The final remaining tool from [3], their Theorem 5.1, turns out to be unnecessary for proving our tight
lower bounds in the next section. Nonetheless, we sketch here how to extend it to give asymptotic slice
rank upper bounds as well.
    For a tensor T , let m(T ) := max{Sx (T ), Sy (T ), Sz (T )}. Recall from Lemma 2.1 that for any two
tensors A, B we have S(A ⊗ B) ≤ S(A) · m(B).
    In general, for two tensors A and B, even if S̃(A) and S̃(B) are “small,” it might still be the case that
S̃(A + B) is “large,” much larger than S̃(A) + S̃(B). For instance, for any positive integer q, define the
tensors T_1 := ∑_{i=0}^{q} x_0 y_i z_i, T_2 := ∑_{i=1}^{q+1} x_i y_0 z_i, and T_3 := ∑_{i=1}^{q+1} x_i y_i z_{q+1}. We can see that S̃(T_1) = S̃(T_2) =
S̃(T_3) = 1, but T_1 + T_2 + T_3 = CW_q, and we will show soon that S̃(CW_q) grows unboundedly with q.
    Here we show that if not only S̃(A) but even S_x(A) is small, then we can get a decent bound
on S̃(A + B).
Theorem 3.9. Suppose T, A, B are tensors such that A + B = T . Then,

                                 S̃(T ) ≤ (m(A)/p)^{p} · (S̃(B)/(1 − p))^{1−p},

where p ∈ [0, 1] is given by

                                 p := log(S_x(B)/S̃(B)) / ( log(m(A)/S_x(A)) + log(S_x(B)/S̃(B)) ).

Proof. We begin by, for any integers n ≥ k ≥ 0, giving bounds on S(A^{⊗k} ⊗ B^{⊗(n−k)}). First, since S_x is
submultiplicative, we have

                        S(A^{⊗k} ⊗ B^{⊗(n−k)}) ≤ S_x(A^{⊗k} ⊗ B^{⊗(n−k)}) ≤ S_x(A)^k · S_x(B)^{n−k}.

Second, from the definition of m, we have

                        S(A^{⊗k} ⊗ B^{⊗(n−k)}) ≤ m(A^{⊗k}) · S(B^{⊗(n−k)}) ≤ m(A)^k · S̃(B)^{n−k}.

      It follows that for any positive integer n we have

          S(T^{⊗n}) ≤ ∑_{k=0}^{n} C(n, k) · S(A^{⊗k} ⊗ B^{⊗(n−k)}) ≤ ∑_{k=0}^{n} C(n, k) · min{S_x(A)^k · S_x(B)^{n−k}, m(A)^k · S̃(B)^{n−k}}.
                                                                                n
                                                                                
    As in the proof of [3, Theorem 5.1], we can see that the quantity           k   · min{Sx (A)k · Sx (B)n−k , m(A)k ·
S̃(B)n−k } is maximized at k = pn, and the result follows.
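The defining property of p is that it balances the two expressions inside the minimum: at k = pn we have Sx(A)^k · Sx(B)^{n−k} = m(A)^k · S̃(B)^{n−k}. A quick numeric sanity check, with illustrative (hypothetical) values of the four quantities:

```python
import math

# Illustrative (hypothetical) values: in the setting of [3, Theorem 5.1] one has
# Sx(A) = 1, m(A) <= q, and Sx(B) <= q - 1; take q = 8 and some S-tilde(B) < Sx(B).
sx_a, m_a = 1.0, 8.0   # Sx(A), m(A)
sx_b, st_b = 7.0, 5.0  # Sx(B), S-tilde(B)

# p as defined in Theorem 3.9.
p = math.log(sx_b / st_b) / (math.log(m_a / sx_a) + math.log(sx_b / st_b))
assert 0 <= p <= 1

# At k = pn, the two per-copy bounds inside the minimum coincide.
lhs = sx_a**p * sx_b ** (1 - p)
rhs = m_a**p * st_b ** (1 - p)
assert abs(lhs - rhs) < 1e-9
```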

Remark 3.10. This result generalizes [3, Theorem 5.1], no longer requiring that A be the tensor T
restricted to a single x-variable. In [3, Theorem 5.1], since A is T restricted to a single x-variable, and
we required A to have at most q terms, we got the bounds Sx (A) = 1 and m(A) ≤ q. Similarly, B had at
most q − 1 different x-variables, so Sx (B) ≤ q − 1. Substituting those values into Theorem 3.9 yields the
original [3, Theorem 5.1] with Ĩ replaced by S̃.

                         T HEORY OF C OMPUTING, Volume 17 (1), 2021, pp. 1–30                                       19

4     Computing the slice ranks for tensors of interest

In this section, we give slice rank upper bounds for a number of tensors of interest. It will follow from
Section 5 that all of the bounds we prove in this section are tight.



4.1     Generalized Coppersmith–Winograd tensors

We begin with the generalized CW tensors defined in [3], which for a positive integer q and a permutation
σ : [q] → [q] are given by

                  CWq,σ := x0 y0 z_{q+1} + x0 y_{q+1} z0 + x_{q+1} y0 z0 + ∑_{i=1}^{q} (xi y_{σ(i)} z0 + xi y0 zi + x0 yi zi ).



    The usual Coppersmith–Winograd tensor CWq results by setting σ to the identity permutation. Just as
in [3, Section 7.1], we can see that Theorem 3.2 and Theorem 3.4 immediately apply to CWq,σ to show
that there is a universal constant δ > 0 such that for any q and σ we have S̃(CWq,σ ) ≤ (q + 2)1−δ , and
hence a universal constant c > 2 such that ωu (CWq,σ ) ≥ c. Indeed, by proceeding in this way, we get the
exact same constants as in [3].
    That said, we will now use Theorem 3.4 to prove that c ≥ 2.16805. (In fact, essentially the same
argument as we present now shows that [3, Theorem 5.2] was already sufficient to show the weaker claim
that ωg (CWq,σ ) ≥ 2.16805.)
    We begin by partitioning the sets of variables of CWq,σ , using the notation of Theorem 3.4. Let
X0 = {x0 }, X1 = {x1 , . . . , xq }, and X2 = {xq+1 }, so that X0 ∪ X1 ∪ X2 is a partition of the x-variables
of CWq,σ .7 Similarly, let Y0 = {y0 }, Y1 = {y1 , . . . , yq }, Y2 = {yq+1 }, Z0 = {z0 }, Z1 = {z1 , . . . , zq }, and
Z2 = {zq+1 }. We can see this is a CWq,σ -symmetric partition with L = {T002 , T020 , T200 , T011 , T101 , T110 }.
   Consider any probability distribution p ∈ Psym (L). By symmetry, we know that p(T002 ) = p(T020 ) =
p(T200 ) = v and p(T011 ) = p(T101 ) = p(T110 ) = 1/3 − v for some value v ∈ [0, 1/3]. Applying Theo-
rem 3.4, and in particular Proposition 3.8, yields:


                               S̃(CWq,σ ) ≤ sup_{v∈[0,1/3]} q^{2(1/3−v)} / ( v^v (2/3 − 2v)^{2/3−2v} (1/3 + v)^{1/3+v} ).



In fact, we will see in the next section that this is tight (i. e., the value above is equal to S̃(CWq ), not just
an upper bound on it). The values for the first few q can be computed using optimization software as
follows:

    7 The sets of partitions were 1-indexed before, but we 0-index here for notational consistency with past work.




                                                     q   S̃(CWq,σ )
                                                     1   2.7551 · · ·
                                                     2   3.57165 · · ·
                                                     3   4.34413 · · ·
                                                     4   5.07744 · · ·
                                                     5   5.77629 · · ·
                                                     6   6.44493 · · ·
                                                     7   7.08706 · · ·
                                                     8   7.70581 · · ·

    Finally, using the lower bound R̃(CWq,σ ) ≥ q + 2 (in fact, it is known that R̃(CWq,σ ) = q + 2),
and the upper bound on S̃(CWq,σ ) we just proved, we can apply Theorem 2.9 to give lower bounds
ωu (CWq,σ ) ≥ 2 log(R̃(CWq,σ ))/ log(S̃(CWq,σ )) ≥ 2 log(q + 2)/ log(S̃(CWq,σ )) as follows:

                                         q    Lower Bound on ωu (CWq,σ )
                                         1    2.16805 · · ·
                                         2    2.17794 · · ·
                                         3    2.19146 · · ·
                                         4    2.20550 · · ·
                                         5    2.21912 · · ·
                                         6    2.23200 · · ·
                                         7    2.24404 · · ·
                                         8    2.25525 · · ·
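A sketch of such a computation (ours, not the paper's code): a one-dimensional grid search over v reproduces both tables to the displayed precision.

```python
import math

def cw_slice_rank_bound(q, steps=100000):
    """Grid-evaluate sup over v in [0, 1/3] of
    q^(2(1/3 - v)) / (v^v * (2/3 - 2v)^(2/3 - 2v) * (1/3 + v)^(1/3 + v))."""
    best = 0.0
    for s in range(steps + 1):
        v = s / steps / 3
        val = q ** (2 * (1 / 3 - v)) / (
            v**v * (2 / 3 - 2 * v) ** (2 / 3 - 2 * v) * (1 / 3 + v) ** (1 / 3 + v)
        )
        best = max(best, val)
    return best

for q in range(1, 9):
    s = cw_slice_rank_bound(q)                 # upper bound on S-tilde(CW_{q,sigma})
    omega = 2 * math.log(q + 2) / math.log(s)  # lower bound on omega_u(CW_{q,sigma})
    print(q, round(s, 5), round(omega, 5))
```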
     It is not hard to see that the resulting lower bound on ωu (CWq,σ ) is increasing with q and is always
at least 2.16805 . . . (see Appendix A below for a proof), and hence that for any q and any σ we have
ωu (CWq,σ ) ≥ 2.16805 as desired.

4.2    Generalized simple Coppersmith–Winograd tensors
Similarly to CWq,σ , we can define for a positive integer q and a permutation σ : [q] → [q] the simple
Coppersmith–Winograd tensor cwq,σ given by:
                                     cwq,σ := ∑_{i=1}^{q} (xi y_{σ(i)} z0 + xi y0 zi + x0 yi zi ).

These tensors, when σ is the identity permutation id, are well-studied. For instance, Coppersmith and
Winograd [19] showed that if R̃(cw2,id ) = 2 then ω = 2.
    We will again give a tight bound on S̃(cwq,σ ) using Theorem 3.4 combined with the next section.
To apply Theorem 3.4, and in particular Proposition 3.8, we again pick a partition of the variables. Let
X0 = {x0 }, X1 = {x1 , . . . , xq }, Y0 = {y0 }, Y1 = {y1 , . . . , yq }, Z0 = {z0 }, and Z1 = {z1 , . . . , zq }. This is
a cwq,σ -symmetric partition with L = {T011 , T101 , T110 }. There is a unique p ∈ Psym (L), which assigns
probability 1/3 to each part. It follows that
                              S̃(cwq,σ ) ≤ (1/3)^{−1/3} (2/3)^{−2/3} · q^{2/3} = (3/2^{2/3}) · q^{2/3} .


   Again, we will see in the next section that this bound is tight. Using the lower bound R̃(cwq,σ ) ≥ q + 1,
we get the lower bound

                                     ωu (cwq,σ ) ≥ 2 log(q + 1) / log( (3/2^{2/3}) · q^{2/3} ).

    The first few values are as follows; note that we cannot get a lower bound better than 2 when q = 2
because of Coppersmith and Winograd’s aforementioned remark that if R̃(cw2,id ) = 2 then ω = 2.

                                       q    Lower Bound on ωu (cwq,σ )
                                       1    2.17795 · · ·
                                       2    2
                                       3    2.02538 · · ·
                                       4    2.06244 · · ·
                                       5    2.09627 · · ·
                                       6    2.12549 · · ·
                                       7    2.15064 · · ·
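Since the bound here has a closed form, the table can be reproduced directly; a small sketch (ours, not the paper's code):

```python
import math

def cw_simple_omega_bound(q):
    """Lower bound 2 log(q + 1) / log((3 / 2^(2/3)) * q^(2/3)) on omega_u(cw_{q,sigma})."""
    s = 3 / 2 ** (2 / 3) * q ** (2 / 3)   # the (tight) upper bound on S-tilde(cw_{q,sigma})
    return 2 * math.log(q + 1) / math.log(s)

# q = 2 recovers exactly 2, consistent with the remark about cw_{2,id} above.
assert abs(cw_simple_omega_bound(2) - 2.0) < 1e-9
assert abs(cw_simple_omega_bound(1) - 2.17795) < 1e-3
```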

4.3   Cyclic group tensors
We next look at two tensors which were studied in [16], [2], and [3, Section 7.3]. For each positive integer
q, define the tensor Tq (the structural tensor of the cyclic group Cq ) as:
                                           Tq = ∑_{i=0}^{q−1} ∑_{j=0}^{q−1} xi y j z_{(i+j) mod q} .

Define also the lower triangular version of Tq , called Tqlower , as:

                                           Tqlower = ∑_{i=0}^{q−1} ∑_{j=0}^{q−1−i} xi y j z_{i+j} .

    While Theorem 3.4 does not give any nontrivial upper bounds on S̃(Tq ), it does give nontrivial upper
bounds on S̃(Tqlower ), as noted in [3, Section 7.3]. Using computer optimization software, we can compute
our upper bound on S̃(Tqlower ), using Theorem 3.4 where each partition contains exactly one variable, for
the first few values of q:
                                       q    Upper Bound on S̃(Tqlower )
                                       2    1.88988 · · ·
                                       3    2.75510 · · ·
                                       4    3.61071 · · ·
                                       5    4.46157 · · ·
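For intuition, the q = 2 entry can be recovered by brute force. The sketch below is our own check, and it assumes the singleton-part form of Theorem 3.4, in which S̃(T) is bounded by the maximum, over probability distributions on the terms of T, of the minimum over the three axes of the exponentiated entropy of the corresponding marginal:

```python
import math
from itertools import product

# Terms of T_2^lower = x0 y0 z0 + x0 y1 z1 + x1 y0 z1, as (x, y, z) index triples.
terms = [(0, 0, 0), (0, 1, 1), (1, 0, 1)]

def exp_entropy(marginal):
    """exp of the Shannon entropy (in nats) of a marginal distribution."""
    return math.exp(-sum(p * math.log(p) for p in marginal.values() if p > 0))

def bound(weights):
    """Min over the three axes of exp(entropy) of that axis's marginal."""
    val = float('inf')
    for axis in range(3):
        marginal = {}
        for term, w in zip(terms, weights):
            marginal[term[axis]] = marginal.get(term[axis], 0.0) + w
        val = min(val, exp_entropy(marginal))
    return val

# Grid search over probability distributions (p1, p2, p3) on the three terms.
n = 300
best = max(
    bound((a / n, b / n, (n - a - b) / n))
    for a, b in product(range(n + 1), repeat=2)
    if a + b <= n
)
assert abs(best - 1.88988) < 1e-3   # matches the q = 2 table entry above
```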

    We show in the next section that these numbers are also tight. The value of S̃(Tqlower ) also follows
from [33]; see in particular [33, Table 1].
    It is known (see, e. g., [2]) that R̃(Tq ) = R̃(Tqlower ) = q. Thus we get the lower bound
ωu (Tqlower ) ≥ 2 log(q)/ log(S̃(Tqlower )), with the following values:


                                             q    Lower Bound on ωu (Tqlower )
                                             2    2.17795 · · ·
                                             3    2.16805 · · ·
                                             4    2.15949 · · ·
                                             5    2.15237 · · ·

     These numbers match the lower bounds obtained by [2, 9] in their study of Tq ; our Theorem 3.4 can
be viewed as an alternate tool to achieve those lower bounds. The bound approaches 2 as q → ∞, since
S̃(Tqlower ) = Ω(q) as q → ∞.8 Interestingly, it is shown in [12, Theorem 4.16] that Tqlower degenerates to
Tq over the field Fq , which implies that our bounds above also hold for Tq over Fq .


4.4    The value of the subtensor t112 of CWq^{⊗2}
A key tensor which arises in applying the Laser Method to increasing powers of CWq , including [19, 36,
26, 4, 25, 27], is the tensor t112 which (for a given positive integer q) is given by
             t112 := ∑_{i=1}^{q} xi,0 yi,0 z0,q+1 + ∑_{k=1}^{q} x0,k y0,k zq+1,0 + ∑_{i,k=1}^{q} xi,0 y0,k zi,k + ∑_{i,k=1}^{q} x0,k yi,0 zi,k .

Coppersmith–Winograd [19] and subsequent work studied the value of this tensor. In [19] it is shown that
for every τ ∈ [2/3, 1],

                                 Vτ (t112 ) ≥ 2^{2/3} q^τ (q^{3τ} + 2)^{1/3} .

This bound has been used in all the subsequent work using CWq , without improvement. Here we show it
is tight and cannot be improved in the case τ = 2/3:

Proposition 4.1. V2/3 (t112 ) = 2^{2/3} q^{2/3} (q^2 + 2)^{1/3} .

Proof. Consider the variable-symmetric tensor ts := t112 ⊗ rot(t112 ) ⊗ rot(rot(t112 )). As in [19], by
definition of V2/3 , for every δ > 0 there is a positive integer n such that ts^{⊗n} has a degeneration to
⊕_i ⟨ai , ai , ai ⟩ for values ai such that ∑_i ai^2 ≥ (V2/3 (t112 ))^{3n(1−δ)} . In particular, by Corollary 2.6 this
yields the bound

                                        S̃(ts^{⊗n} ) ≥ ∑_i ai^2 ≥ (V2/3 (t112 ))^{3n(1−δ)} .

Since this holds for all δ > 0, it follows that S̃(ts ) ≥ (V2/3 (t112 ))^3 ≥ 2^2 q^2 (q^2 + 2).
    We now give an upper bound on S̃(ts ) using Theorem 3.4. Although we are analyzing ts , we
will make use of a partition of the variables of t112 . The partition is as follows: X0 = {xi,0 | i ∈ [q]},
X1 = {x0,k | k ∈ [q]}, Y0 = {yi,0 | i ∈ [q]}, Y1 = {y0,k | k ∈ [q]}, Z0 = {zi,k | i, k ∈ [q]}, Z1 = {z0,q+1 }, and
Z2 = {zq+1,0 }. Hence, L = {T001 , T112 , T010 , T100 }. As in [19], and similar to Proposition 3.8, since ts
is defined as ts := t112 ⊗ rot(t112 ) ⊗ rot(rot(t112 )), it follows that S̃(ts ) ≤ sup_{p∈P(L)} pX · pY · pZ . We can
   8 For instance, if we pick the probability distribution p to be the uniform distribution on the q(q + 1)/2 terms of Tqlower ,
then we find pX = pY = pZ = Ω(q), and so by Theorem 5.3 below, we get S̃(Tqlower ) = Ω(q).



assume, again by symmetry, that any probability distribution p on L assigns the same value v to T010 and
T100 , and the same value 1/2 − v to T001 and T112 . We finally get the bound:

                                   S̃(ts ) ≤ sup_{v∈[0,1/2]} (2q)^2 · (q^2)^{2v} / ( (2v)^{2v} (1/2 − v)^{1−2v} ).

This is maximized at v = q^2/(2q^2 + 4), which yields exactly S̃(ts ) ≤ 2^2 q^2 (q^2 + 2). The desired bound
follows.
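The final maximization over v can be verified numerically; the sketch below (ours, not the paper's code) confirms both the value of the supremum, 2^2 q^2 (q^2 + 2), and the location of the maximizer near v = q^2/(2q^2 + 4):

```python
def ts_bound(q, steps=100000):
    """Grid-evaluate sup over v in [0, 1/2] of
    (2q)^2 * (q^2)^(2v) / ((2v)^(2v) * (1/2 - v)^(1 - 2v)), with its argmax."""
    best, best_v = 0.0, 0.0
    for s in range(steps + 1):
        v = 0.5 * s / steps
        val = (2 * q) ** 2 * (q * q) ** (2 * v) / (
            (2 * v) ** (2 * v) * (0.5 - v) ** (1 - 2 * v)
        )
        if val > best:
            best, best_v = val, v
    return best, best_v

for q in (1, 2, 3):
    best, best_v = ts_bound(q)
    target = 4 * q**2 * (q**2 + 2)          # 2^2 q^2 (q^2 + 2)
    assert abs(best - target) / target < 1e-3
    assert abs(best_v - q**2 / (2 * q**2 + 4)) < 1e-3
```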

    The only upper bound we are able to prove on Vτ for τ > 2/3 is the straightforward Vτ (t112 ) ≤
V2/3 (t112 )^{3τ/2} = 2^τ q^τ (q^2 + 2)^{τ/2} , which is slightly worse than the best known lower bound
Vτ (t112 ) ≥ 2^{2/3} q^τ (q^{3τ} + 2)^{1/3} . It is an interesting open problem to prove tight upper bounds on
Vτ (T ) for any nontrivial tensor T and value τ > 2/3. T = t112 may be a good candidate, since the Laser
Method seems unable to improve Vτ (t112 ) for any τ, even when applied to any small tensor power t112^{⊗n} .
    Notice that we were able to prove a tight bound on S̃(ts ) here: the upper bound we proved matches a
lower bound which we were able to derive from Coppersmith–Winograd’s analysis (which made use of
the Laser Method) of Vτ (t112 ). In the next section we will substantially generalize this fact, by showing a
tight bound on S̃(T ) for any tensor T to which the Laser Method applies.


5    Slice rank lower bounds via the Laser Method
In this section, we show that the Laser Method can be used to give matching upper and lower bounds on
S̃(T ) for any tensor T to which it applies. We will build on Theorem 3.4, which we will show matches
the bounds that arise in the Laser Method.
    Consider any tensor T which is minimal over X,Y, Z, and let X = X1 ∪ · · · ∪ XkX , Y = Y1 ∪ · · · ∪YkY ,
Z = Z1 ∪ · · · ∪ ZkZ be partitions of the three sets of variables. Define Ti jk , L, and pX for a probability
distribution p on L, as in the top of Subsection 2.7. Recall in particular that Ti jk is T restricted to the sets
of variables Xi , Y j , and Zk .
Definition 5.1. We say that T , along with partitions of X,Y, Z, is a laser-ready tensor partition if the
following three conditions are satisfied:
(1) For every (i, j, k) ∈ [kX ] × [kY ] × [kZ ], either Ti jk = 0, or else Ti jk has a degeneration to a tensor ⟨a, b, c⟩
    with ab = |Xi |, bc = |Y j |, and ca = |Zk | (i. e., a matrix multiplication tensor which is as big as possible
    given |Xi |, |Y j |, and |Zk |).
(2) There is an integer ℓ such that Ti jk ≠ 0 only if i + j + k = ℓ.
(3) T is variable-symmetric, and the partitions are T -symmetric.
    These conditions are exactly those for which the original Laser Method used by Coppersmith and
Winograd [19] applies to T . We note that condition (3) is a simplifying assumption rather than a real
condition on T : for any tensor T and partitions satisfying conditions (1) and (2), the tensor T 0 :=
T ⊗ rot(T ) ⊗ rot(rot(T )) along with the corresponding product partitions, satisfies all three conditions,
gives at least as good a bound on ω using the Laser Method as T and the original partitions, and more
generally has ωu (T 0 ) ≤ ωu (T ).


Theorem 5.2 ([19, 21, 36], see also [11, Proposition 15.32]). Suppose T , along with the partitions of
X,Y, Z, is a laser-ready tensor partition. Then, for any distribution p ∈ Psym (L), and any positive integer
n, the tensor T^{⊗n} has a degeneration into

                                    ( ∏_{i∈[kX ]} p(Xi )^{−p(Xi )} )^{n−o(n)} ⟨a, a, a⟩,

where

                                    a = ( ∏_{Ti jk ∈L} |Xi |^{p(Ti jk )} )^{n/2−o(n)} .

Proof. Typically, as described in [36, Section 3], there is an additional loss in the size of the degeneration
if there are multiple different distributions p, p0 with the same marginals (meaning p(Xi ) = p0 (Xi ),
p(Y j ) = p0 (Y j ), and p(Zk ) = p0 (Zk ) for all i, j, k) but different values of V (p) := ∏Ti jk ∈L Vτ (Ti jk ) p(Ti jk ) for
any τ ∈ [2/3, 1]. However, because of condition (1) in the definition of a laser-ready tensor partition, the
quantity V (p) is equal to
                                             ∏_{Ti jk ∈L} (|Xi | · |Y j | · |Zk |)^{p(Ti jk )·τ/2} ,

and in particular satisfies V (p) = V (p0 ) for any two distributions p, p0 with the same marginals. Thus, we
do not incur this loss, and we get the desired degeneration.

    Our key new result about such tensor partitions is as follows:
Theorem 5.3. Suppose tensor T , along with the partitions of X,Y, Z, is a laser-ready tensor partition.
Then,
                                           S̃(T ) = sup_{p∈Psym (L)} pX .

Proof. The upper bound, S̃(T ) ≤ sup p∈Psym (L) pX , is given by Proposition 3.8.
    For the lower bound, we know from Theorem 5.2 that for all p ∈ Psym (L), and all positive integers n,
the tensor T^{⊗n} has a degeneration into

                                    ( ∏_{i∈[kX ]} p(Xi )^{−p(Xi )} )^{n−o(n)} ⟨a, a, a⟩,

where

                                    a = ( ∏_{Ti jk ∈L} |Xi |^{p(Ti jk )} )^{n/2−o(n)} .

By Proposition 2.5, this means T^{⊗n} has a degeneration to an independent tensor of size

                          ( ∏_{i∈[kX ]} p(Xi )^{−p(Xi )} )^{n−o(n)} · a^2 = pX^{n−o(n)} .

Applying Proposition 2.3 and Proposition 2.4 implies that S̃(T ) ≥ pX for all p ∈ Psym (L), as desired.


Corollary 5.4. The upper bounds on S̃(CWq,σ ), S̃(cwq,σ ), S̃(Tqlower ), and S̃(Tq ) from Section 4 are tight.

Proof. CWq,σ , cwq,σ , and Tqlower , partitioned as they were in the previous section, are laser-ready tensor
partitions. The tight bound for Tq follows from the degeneration to Tqlower described in the previous
section.

Corollary 5.5. Every tensor T with a laser-ready tensor partition (including CWq,σ , cwq,σ , and Tqlower )
has S̃(T ) = Q̃(T ).

Proof. All tensors satisfy S̃(T ) ≥ Q̃(T ). In Theorem 5.3, the upper bound on S̃(T ) showed that T ⊗n has
a degeneration to an independent tensor of size S̃(T )n−o(n) , which implies that Q̃(T ) ≥ S̃(T ).

Corollary 5.6. If T is a tensor with a laser-ready tensor partition, and applying the Laser Method to T
with this partition yields an upper bound on ω of ωu (T ) ≤ c for some c > 2, then ωu (T ) > 2.

Proof. When the Laser Method shows, as in Theorem 5.2, that T^{⊗n} has a degeneration into

                                    ( ∏_{i∈[kX ]} p(Xi )^{−p(Xi )} )^{n−o(n)} ⟨a, a, a⟩,

the resulting upper bound on ωu (T ) is that

                                    ( ∏_{i∈[kX ]} p(Xi )^{−p(Xi )} )^{n−o(n)} · a^{ωu (T )} ≥ R̃(T )^n .

In particular, since the left-hand side equals pX^{n−o(n)} when ωu (T ) = 2, this yields ωu (T ) = 2 if and only
if pX = R̃(T ), so if it yields only ωu (T ) ≤ c for some c > 2, then S̃(T ) = pX < R̃(T )^{1−δ} for some δ > 0.
Combined with Theorem 2.7 or Theorem 2.9, this means that ωu (T ) > 2.


A    Proof that ωu (CWq,σ ) ≥ 2.16805 for all q
Define the function f : [0, 1/3] → R by
                                 f (v) := 1 / ( v^v (2/3 − 2v)^{2/3−2v} (1/3 + v)^{1/3+v} ).

In Section 4.1, we showed that

                                 ωu (CWq,σ ) ≥ min_{v∈[0,1/3]} 2 log(q + 2) / log( q^{2/3−2v} · f (v) ).
The value of this optimization problem is computed for 1 ≤ q ≤ 8 in a table in Section 4.1, where we see
that ωu (CWq,σ ) ≥ 2.16805 for all q ≤ 8.
    Let vq denote the argmin for the optimization problem. In particular, for q = 8, the argmin is
v8 = 0.017732422 . . . From the q^{2/3−2v} term in the optimization problem, we see that v_{q+1} ≤ vq for all q,


and in particular, vq ≤ v8 for all q > 8. It follows that f (vq ) ≤ f (v8 ) = 2.07389 . . . for all q > 8. Thus,
for all q > 8 we have:

                 ωu (CWq,σ ) ≥ min_{v∈[0,1/3]} 2 log(q + 2) / log( q^{2/3−2v} · f (v8 ) ) = 2 log(q + 2) / log( q^{2/3} · f (v8 ) ).

This expression equals 2.18562 . . . at q = 9, and is easily seen to be increasing with q for q > 9, which
implies as desired that ωu (CWq,σ ) ≥ 2.16805 for all q ≥ 9 and hence all q.
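The constants in this appendix can be double-checked with a grid search in place of exact minimization (our own sketch, not the paper's code):

```python
import math

def f(v):
    """f(v) = 1 / (v^v * (2/3 - 2v)^(2/3 - 2v) * (1/3 + v)^(1/3 + v))."""
    return 1.0 / (v**v * (2 / 3 - 2 * v) ** (2 / 3 - 2 * v) * (1 / 3 + v) ** (1 / 3 + v))

def omega_bound(q, steps=100000):
    """Grid-minimize 2 log(q + 2) / log(q^(2/3 - 2v) * f(v)) over v in [0, 1/3],
    returning the minimum value together with its argmin."""
    best, best_v = float('inf'), 0.0
    for s in range(steps + 1):
        v = s / steps / 3
        val = 2 * math.log(q + 2) / math.log(q ** (2 / 3 - 2 * v) * f(v))
        if val < best:
            best, best_v = val, v
    return best, best_v

# Reproduce the q = 1 and q = 8 rows of the table in Section 4.1, and the
# constants v8 and f(v8) used above.
assert abs(omega_bound(1)[0] - 2.16805) < 1e-3
b8, v8 = omega_bound(8)
assert abs(b8 - 2.25525) < 1e-3
assert abs(v8 - 0.017732422) < 1e-3
assert abs(f(v8) - 2.07389) < 1e-3
```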


Acknowledgements
I would like to thank Matthias Christandl, Joshua Grochow, Ryan Williams, Virginia Vassilevska Williams,
Jeroen Zuiddam, and anonymous reviewers, for many helpful discussions and suggestions.


References
 [1] J OSH A LMAN: Limits on the Universal Method for matrix multiplication. In Proc. 34th Comput.
     Complexity Conf. (CCC’19), pp. 12:1–12:24. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik,
     2019. [doi:10.4230/LIPIcs.CCC.2019.12] 1

 [2] J OSH A LMAN AND V IRGINIA VASSILEVSKA W ILLIAMS: Further limitations of the known
     approaches for matrix multiplication. In Proc. 9th Innovations in Theoret. Comp. Sci.
     conf. (ITCS’18), pp. 25:1–25:15. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2018.
     [doi:10.4230/LIPIcs.ITCS.2018.25] 3, 4, 22, 23

 [3] J OSH A LMAN AND V IRGINIA VASSILEVSKA W ILLIAMS: Limits on all known (and some un-
     known) approaches to matrix multiplication. In Proc. 59th FOCS, pp. 580–591. IEEE Comp. Soc.,
     2018. [doi:10.1109/FOCS.2018.00061] 2, 3, 4, 6, 8, 9, 10, 11, 13, 15, 16, 18, 19, 20, 22

 [4] J OSH A LMAN AND V IRGINIA VASSILEVSKA W ILLIAMS: A refined laser method and faster matrix
     multiplication. In Proc. 32nd Ann. ACM–SIAM Symp. on Discrete Algorithms (SODA’21), pp.
     522–539. SIAM, 2021. [doi:10.1137/1.9781611976465.32] 2, 3, 4, 7, 11, 12, 23

 [5] N OGA A LON , A MIR S HPILKA , AND C HRISTOPHER U MANS: On sunflowers and matrix multipli-
     cation. Comput. Complexity, 22(2):219–243, 2013. [doi:10.1007/s00037-013-0060-1] 2

 [6] A NDRIS A MBAINIS , Y UVAL F ILMUS , AND F RANÇOIS L E G ALL: Fast matrix multiplication:
     limitations of the Coppersmith–Winograd method. In Proc. 47th STOC, pp. 585–593. ACM Press,
     2015. [doi:10.1145/2746539.2746554] 3, 5

 [7] DARIO B INI: Border rank of a p × q × 2 tensor and the optimal approximation of a pair of bilinear
     forms. In Proc. 7th Internat. Colloq. on Automata, Languages, and Programming (ICALP’80), pp.
     98–108. Springer, 1980. [doi:10.1007/3-540-10003-2_63] 11


 [8] M ARKUS B LÄSER: Fast Matrix Multiplication. Number 5 in Graduate Surveys. Theory of Comput-
     ing Library, 2013. [doi:10.4086/toc.gs.2013.005] 9

 [9] J ONAH B LASIAK , T HOMAS C HURCH , H ENRY C OHN , J OSHUA A. G ROCHOW, E RIC NASLUND ,
     W ILLIAM F. S AWIN , AND C HRIS U MANS: On cap sets and the group-theoretic approach to matrix
     multiplication. Discrete Analysis, 2017:3:1–27. [doi:10.19086/da.1245] 4, 5, 8, 13, 23

[10] J ONAH B LASIAK , T HOMAS C HURCH , H ENRY C OHN , J OSHUA A. G ROCHOW, AND C HRIS
     U MANS: Which groups are amenable to proving exponent two for matrix multiplication? 2017.
     [arXiv:1712.02302] 8

[11] P ETER B ÜRGISSER , M ICHAEL C LAUSEN , AND M OHAMMAD A. S HOKROLLAHI: Algebraic
     Complexity Theory. Volume 315 of Grundlehren der Math. Wiss. Springer, 2013.
     [doi:10.1007/978-3-662-03338-8] 9, 25

[12] M ATTHIAS C HRISTANDL , P ÉTER V RANA , AND J EROEN Z UIDDAM: Universal points in
     the asymptotic spectrum of tensors. In Proc. 50th STOC, pp. 289–296. ACM Press, 2018.
     [doi:10.1145/3188745.3188766] 8, 14, 16, 23

[13] M ATTHIAS C HRISTANDL , P ÉTER V RANA , AND J EROEN Z UIDDAM: Barriers for fast ma-
     trix multiplication from irreversibility. In Proc. 34th Comput. Complexity Conf. (CCC’19),
     volume 137, pp. 26:1–26:17. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2019.
     [doi:10.4230/LIPIcs.CCC.2019.26] 8, 14

[14] M ICHAEL B. C OHEN , Y IN TAT L EE , AND Z HAO S ONG: Solving linear programs in the current
     matrix multiplication time. J. ACM, 68(1):3:1–3:39, 2021. Preliminary version in STOC’19.
     [doi:10.1145/3424305] 4

[15] H ENRY C OHN , ROBERT K LEINBERG , BALÁZS S ZEGEDY, AND C HRISTOPHER U MANS: Group-
     theoretic algorithms for matrix multiplication. In Proc. 46th FOCS, pp. 379–388. IEEE Comp. Soc.,
     2005. [doi:10.1109/SFCS.2005.39, arXiv:math/0511460] 2

[16] H ENRY C OHN AND C HRISTOPHER U MANS: A group-theoretic approach to fast matrix multiplica-
     tion. In Proc. 44th FOCS, pp. 438–449. IEEE Comp. Soc., 2003. [doi:10.1109/SFCS.2003.1238217]
     2, 11, 22

[17] H ENRY C OHN AND C HRISTOPHER U MANS: Fast matrix multiplication using coherent configura-
     tions. In Proc. 24th Ann. ACM–SIAM Symp. on Discrete Algorithms (SODA’13), pp. 1074–1086.
     SIAM, 2013. [doi:10.1137/1.9781611973105.77] 2, 7

[18] D ON C OPPERSMITH AND S HMUEL W INOGRAD: On the asymptotic complexity of matrix multipli-
     cation. SIAM J. Comput., 11(3):472–492, 1982. [doi:10.1137/0211038] 11

[19] D ON C OPPERSMITH AND S HMUEL W INOGRAD: Matrix multiplication via arithmetic progressions.
     J. Symbolic Comput., 9(3):251–280, 1990. [doi:10.1016/S0747-7171(08)80013-2] 2, 3, 4, 5, 7, 11,
     18, 21, 23, 24, 25


[20] E RNIE C ROOT, V SEVOLOD F. L EV, AND P ÉTER P ÁL PACH: Progression-free sets in Z_4^n are
     exponentially small. Ann. Math., 185(1):331–337, 2017. [doi:10.4007/annals.2017.185.1.7] 8

[21] A LEXANDER M. DAVIE AND A NDREW JAMES S TOTHERS: Improved bound for complex-
     ity of matrix multiplication. Proc. Royal Soc. Edinburgh, Sec. A, 143(2):351–369, 2013.
     [doi:10.1017/S0308210511001648] 3, 4, 7, 11, 25

[22] J ORDAN S. E LLENBERG AND D ION G IJSWIJT: On large subsets of F_q^n with no three-term arithmetic
     progression. Ann. Math., 185(1):339–343, 2017. [doi:10.4007/annals.2017.185.1.8] 8

[23] M ATTI K ARPPA AND P ETTERI K ASKI: Probabilistic tensors and opportunistic Boolean matrix
     multiplication. In Proc. 30th Ann. ACM–SIAM Symp. on Discrete Algorithms (SODA’19), pp.
     496–515. SIAM, 2019. [doi:10.1137/1.9781611975482.31] 7

[24] ROBERT K LEINBERG , W ILL S AWIN , AND DAVID S PEYER: The growth rate of tri-colored sum-free
     sets. Discrete Analysis, 2018:12:1–10. [doi:10.19086/da.3734] 5

[25] F RANÇOIS L E G ALL: Faster algorithms for rectangular matrix multiplication. In Proc. 53rd FOCS,
     pp. 514–523. IEEE Comp. Soc., 2012. [doi:10.1109/FOCS.2012.80] 4, 7, 23

[26] F RANÇOIS L E G ALL: Powers of tensors and fast matrix multiplication. In Proc. 39th Inter-
     nat. Symp. Symbolic and Algebraic Computation (ISSAC’14), pp. 296–303. ACM Press, 2014.
     [doi:10.1145/2608628.2608664] 2, 3, 4, 7, 11, 12, 23

[27] F RANÇOIS L E G ALL AND F LORENT U RRUTIA: Improved rectangular matrix multiplication using
     powers of the Coppersmith–Winograd tensor. In Proc. 29th Ann. ACM–SIAM Symp. on Discrete
     Algorithms (SODA’18), pp. 1029–1046. SIAM, 2018. [doi:10.1137/1.9781611975031.67] 4, 7, 8,
     23

[28] E RIC NASLUND AND W ILL S AWIN: Upper bounds for sunflower-free sets. In Forum of Mathe-
     matics, Sigma, volume 5, pp. e15:1–10. Cambridge Univ. Press, 2017. [doi:10.1017/fms.2017.12]
     8

[29] A RNOLD S CHÖNHAGE: Partial and total matrix multiplication. SIAM J. Comput., 10(3):434–455,
     1981. [doi:10.1137/0210032] 11

[30] VOLKER S TRASSEN: Gaussian elimination is not optimal. Numerische Mathematik, 13(4):354–356,
     1969. [doi:10.1007/BF02165411] 2, 10

[31] VOLKER S TRASSEN: The asymptotic spectrum of tensors and the exponent of matrix multiplication.
     In Proc. 27th FOCS, pp. 49–54. IEEE Comp. Soc., 1986. [doi:10.1109/SFCS.1986.52] 9, 10

[32] VOLKER S TRASSEN: Relative bilinear complexity and matrix multiplication. J. Reine Angew.
     Math., 1987(375–376):406–443, 1987. [doi:10.1515/crll.1987.375-376.406] 2, 3, 7, 10, 11, 13

[33] VOLKER S TRASSEN: Degeneration and complexity of bilinear maps: some asymptotic spectra. J.
     Reine Angew. Math., 1991(413):127–180, 1991. [doi:10.1515/crll.1991.413.127] 8, 22


[34] Terence Tao: A symmetric formulation of the Croot–Lev–Pach–Ellenberg–Gijswijt capset bound,
     2016. LINK at terrytao.wordpress.com.

[35] Terence Tao and Will Sawin: Notes on the "slice rank" of tensors, 2016. LINK at
     terrytao.wordpress.com.

[36] Virginia Vassilevska Williams: Multiplying matrices faster than Coppersmith–Winograd.
     In Proc. 44th STOC, pp. 887–898. ACM Press, 2012. [doi:10.1145/2213977.2214056]


AUTHOR

     Josh Alman
     Assistant Professor
     Department of Computer Science
     Columbia University
     New York, NY, USA
     josh cs columbia edu
     http://joshalman.com


ABOUT THE AUTHOR

     J OSH A LMAN is an Assistant Professor of Computer Science at Columbia University. He
         received his Ph. D. under the co-supervision of Virginia Vassilevska Williams and Ryan
         Williams at MIT in 2019, then spent two years as a Michael O. Rabin postdoctoral fellow
         in the Theory of Computation group at Harvard before joining Columbia. In addition
         to matrix multiplication, Josh is broadly interested in algorithm design and complexity
         theory, and especially problems in these areas with an algebraic nature. Josh likes solving
         and creating puzzles, and he is the captain of the Galactic Trendsetters, the team
         that won the 2020 MIT Mystery Hunt.

