                        Theory of Computing, Volume 11 (11), 2015, pp. 285–298
                                       www.theoryofcomputing.org




              New Lower Bounds for the
          Border Rank of Matrix Multiplication
                         Joseph M. Landsberg ∗                      Giorgio Ottaviani†
                Received October 1, 2013; Revised December 12, 2014; Published August 6, 2015




       Abstract: The border rank of the matrix multiplication operator for n × n matrices is
       a standard measure of its complexity. Using techniques from algebraic geometry and
       representation theory, we show the border rank is at least 2n^2 − n. Our bounds are better
       than the previous lower bound (due to Lickteig in 1985) of 3n^2/2 + n/2 − 1 for all n ≥ 3.
       The bounds are obtained by finding new equations that bilinear maps of small border rank
       must satisfy, i.e., new equations for secant varieties of triple Segre products, that matrix
       multiplication fails to satisfy.

ACM Classification: F.2.1
AMS Classification: 68Q17, 68Q25, 15A99
Key words and phrases: matrix multiplication complexity, border rank

1     Introduction and statement of results
Finding lower bounds in complexity theory is considered difficult. For example, chapter 14 of [1] is
titled “Circuit lower bounds: Complexity theory’s Waterloo.” The complexity of matrix multiplication is
roughly equivalent to the complexity of many standard operations in linear algebra, such as taking the
determinant or inverse of a matrix. A standard measure of the complexity of an operation is the minimal
size of an arithmetic circuit needed to perform it. The exponent of matrix multiplication ω is defined to
be lim_n log_n of the arithmetic cost to multiply n × n matrices, or equivalently, lim_n log_n of the minimal
number of multiplications needed [3, Propositions 15.1, 15.10]. Determining the complexity of matrix
    ∗ Supported by NSF grant DMS-1006353.
    † Member of GNSAGA-INDAM.



© 2015 Joseph M. Landsberg and Giorgio Ottaviani
Licensed under a Creative Commons Attribution License (CC-BY)                 DOI: 10.4086/toc.2015.v011a011

multiplication is a central question of practical importance. We give new lower bounds for its complexity
in terms of border rank. These lower bounds are used to prove further lower bounds for tensor rank
in [12, 17, 18].
     Let A, B, C be vector spaces, with dual spaces A^*, B^*, C^*, and let T : A^* × B^* → C be a bilinear map.
The rank of T is the smallest r such that there exist a_1, ..., a_r ∈ A, b_1, ..., b_r ∈ B, c_1, ..., c_r ∈ C such that
T(α, β) = Σ_{i=1}^r a_i(α) b_i(β) c_i. The border rank of T is the smallest r such that T can be written as a limit
of a sequence of bilinear maps of rank r. Let R(T) denote the border rank of T. See [11] or [3] for
more on the rank and border rank of tensors, especially the latter for their relation to other measures of
complexity.
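As a small numerical illustration (ours, not from the paper) of why border rank can be strictly smaller than rank: the tensor W = a⊗a⊗b + a⊗b⊗a + b⊗a⊗a has rank 3 but border rank 2, since W is the limit, as ε → 0, of the rank-two tensors ε^{−1}((a + εb)^{⊗3} − a^{⊗3}). A sketch with numpy (the helper `outer3` is our own name):

```python
import numpy as np

def outer3(x, y, z):
    # Rank-one 3-tensor x ⊗ y ⊗ z.
    return np.einsum('i,j,k->ijk', x, y, z)

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])

# W = a⊗a⊗b + a⊗b⊗a + b⊗a⊗a has rank 3 but border rank 2.
W = outer3(a, a, b) + outer3(a, b, a) + outer3(b, a, a)

errors = []
for eps in [1e-1, 1e-2, 1e-3]:
    # A combination of just TWO rank-one tensors, converging to W as eps -> 0.
    approx = (outer3(a + eps * b, a + eps * b, a + eps * b) - outer3(a, a, a)) / eps
    errors.append(np.abs(approx - W).max())

print(errors)   # shrinks linearly in eps: W is a limit of rank-two tensors
```

No fixed rank-two tensor equals W; the approximating summands blow up individually while their sum converges, which is exactly the phenomenon border rank captures.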
     Let M⟨m,n,l⟩ : Mat_{m×n} × Mat_{n×l} → Mat_{m×l} denote the matrix multiplication operator. One has (see,
e.g., [3, Propositions 15.1, 15.10]) that ω = lim_n(log_n R(M⟨n,n,n⟩)). Naïvely R(M⟨m,n,l⟩) ≤ mnl via
the standard algorithm. In 1969, V. Strassen [23] showed that R(M⟨2,2,2⟩) ≤ 7 and, as a consequence,
R(M⟨n,n,n⟩) ≤ O(n^{2.81}). Further upper bounds have been derived since then by numerous authors, with the
current record R(M⟨n,n,n⟩) ≤ O(n^{2.3728639}) [15]. In 1983 Strassen showed [24] that R(M⟨n,n,n⟩) ≥ 3n^2/2,
and shortly thereafter T. Lickteig [16] showed R(M⟨n,n,n⟩) ≥ 3n^2/2 + n/2 − 1. Since then no further
general lower bound had been found (although it is now known that R(M⟨2,2,2⟩) = 7, see [10, 8]), and a
completely different proof (using methods proposed by Mulmuley and Sohoni for Geometric Complexity
Theory) that R(M⟨n,n,n⟩) ≥ 3n^2/2 − 2 was given in [4].
     Our results are as follows:

Theorem 1.1. Let n ≤ m. For all l ≥ 1,

                                  R(M⟨m,n,l⟩) ≥ nl(n + m − 1)/m .                                     (1.1)

Corollary 1.2.
                                  R(M⟨n,n,l⟩) ≥ 2nl − l ,                                             (1.2)

                                  R(M⟨n,n,n⟩) ≥ 2n^2 − n .                                            (1.3)

   For 3 × 3 matrices, the state of the art is now 15 ≤ R(M⟨3,3,3⟩) ≤ 20; the upper bound is due to
Smirnov [22].

Remark 1.3. The best lower bound for the rank of matrix multiplication was, until recently,
R(M⟨n,n,n⟩) ≥ 5n^2/2 − 3n, due to Bläser [2]. After this paper was posted on the arXiv, and using
Corollary 1.2, it was shown that R(M⟨n,n,n⟩) ≥ 3n^2 − 4n^{3/2} − n by Landsberg in [12] and then, pushing
the same methods further, A. Massarenti and E. Raviolo showed R(M⟨n,n,n⟩) ≥ 3n^2 − 2√2·n^{3/2} − 3n
in [17, 18].

    Our bounds come from explicit equations that bilinear maps of low border rank must satisfy. These
equations are best expressed in the language of tensors. Our method is similar in nature to the method
used by Strassen to get his lower bounds—we find explicit polynomials that tensors of low border rank
must satisfy, and show that matrix multiplication fails to satisfy them. Strassen found his equations via
linear algebra—taking the commutator of certain matrices. We found ours using representation theory


and algebraic geometry. (Algebraic geometry is not needed for presenting the results. For its role in our
method see [14].) More precisely, in §2 we define, for every p, a linear map

                     (M⟨m,n,l⟩)^{∧p}_A : C^{nl·binom(mn,p)} → C^{ml·binom(mn,p+1)}                        (1.4)

and we prove that

                     R(M⟨m,n,l⟩) ≥ binom(mn − 1, p)^{−1} · rank((M⟨m,n,l⟩)^{∧p}_A) .
In order to prove Theorem 1.1, we specialize this map, for a judiciously chosen p, to a subspace where it
becomes injective. The above-mentioned equations are the minors of the linear map (M⟨m,n,l⟩)^{∧p}_A.
    The map (1.4) is of interest in its own right; we discuss it in detail in §4. This is done with the help of
representation theory—we explicitly describe the kernel as a sum of irreducible representations labeled
by Young diagrams.

Remark 1.4. It is conjectured in the computer science community that R(M⟨n,n,n⟩) grows like O(n^{2+ε})
for any ε > 0. A truly significant lower bound would be a function that grew like n^2 h(n) where h is
an increasing function. No super-linear lower bound on the complexity of any explicit tensor (or any
computational problem) is known, see [1, 25].
    From a mathematician’s perspective, all known equations for secant varieties of Segre varieties that
have a geometric model arise by translating multi-linear algebra to linear algebra, and it appears that the
limit of this technique is roughly the “input size” 3n^2.

Remark 1.5. The methods used here should be applicable to lower bound problems coming from the
Geometric Complexity Theory (GCT) introduced by Mulmuley and Sohoni [19], in particular to separate
the determinant (small weakly skew circuits) from polynomials with small formulas (small tree circuits).
More precisely, the method used in this paper is a special case of the Young flattenings introduced in [14]
and described in §2.2. In that paper they were used to obtain lower bounds on the symmetric border rank
of polynomials, but the same methods can be used to compare any two polynomials. Both the method of
partial derivatives [5] and the method of shifted partial derivatives [9, 7] may be viewed as special types
of Young flattenings. One includes a space of polynomials S^d V into a space of linear maps (found via
representation theory), and then P ∈ S^d V cannot be in the GL(V)-orbit closure of Q ∈ S^d V if the rank
of the linear map associated to P is greater than the rank of the linear map associated to Q. While it is
unlikely these methods will prove the main conjecture of GCT, we hope they will be able to improve the
state of the art, and at the same time give further insight to the differences and similarities between the
determinant and permanent.

Overview
In §2 we describe the new equations to test for border rank in the language of tensors. Theorem 1.1 is
proved in §3. We discuss the kernel of the map (1.4) in §4. This analysis should be useful for
future work. We conclude in §5 with a review of Lickteig’s method for purposes of comparison. An
appendix (§A) with basic facts from representation theory that we use is included for readers not familiar
with the subject.


Acknowledgments
We thank K. Mulmuley and A. Wigderson for discussions regarding the perspective of computer scientists,
P. Bürgisser and A. Wigderson for help improving the exposition, J. Hauenstein for help with computer
calculations, and M. Bläser for help with the literature.


2     The new equations
Let A, B,C be complex vector spaces of dimensions a, b, c, with b ≤ c, and with dual vector spaces
A∗ , B∗ ,C∗ . Then A ⊗ B ⊗C may be thought of as the space of bilinear maps A∗ × B∗ → C.
     The most naïve equations for border rank are the so-called flattenings. Given T ∈ A ⊗ B ⊗ C, consider
T as a linear map B^* → A ⊗ C and write T_B for this map. Then R(T) ≥ rank(T_B), and similarly for the
analogous T_A, T_C. The inequality holds because flattenings are sub-additive, and if T has rank one then
T_B has rank one; so if T has border rank r, the rank of T_B is at most r. The reason one obtains a lower bound on
border rank is that the rank of a linear map is determined by taking minors, an algebraic operation.
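In coordinates, the flattening T_B is just a reshape of the coefficient array of T, so this lower bound is easy to compute. A short numpy sketch (our own, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, c, r = 4, 5, 6, 3

# A random tensor of rank (at most) r: a sum of r random rank-one terms.
T = sum(np.einsum('i,j,k->ijk',
                  rng.standard_normal(a),
                  rng.standard_normal(b),
                  rng.standard_normal(c)) for _ in range(r))

# The flattening T_B : B* -> A ⊗ C, as a b x (a*c) matrix.
T_B = T.transpose(1, 0, 2).reshape(b, a * c)

# Generically this is min(r, b, a*c) = 3, certifying R(T) >= 3.
print(np.linalg.matrix_rank(T_B))
```

Since T_B is a sum of r rank-one matrices b_i ⊗ (a_i ⊗ c_i), its rank never exceeds r, and for random data it generically equals min(r, b, ac).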


2.1   Strassen’s equations
Strassen’s equations [24] may be understood as follows (see §2.2 for the geometric origin of this
perspective). As described in [20], tensor T_B with Id_A to obtain a linear map B^* ⊗ A → A ⊗ A ⊗ C and
skew-symmetrize the A ⊗ A factor to obtain a map

                                   T^{∧1}_A : B^* ⊗ A → Λ^2 A ⊗ C .

If T is generic, then one can show that T^{∧1}_A will have maximal rank, and if T = a ⊗ b ⊗ c is of rank
one, rank((a ⊗ b ⊗ c)^{∧1}_A) = a − 1. To see this, extend a = a_1 to a basis a_1, ..., a_a of A with dual basis
α^1, ..., α^a of A^*. Then
                                   T^{∧1}_A = Σ_i [α^i ⊗ b] ⊗ [a_1 ∧ a_i ⊗ c] ,

so the image is isomorphic to (A/a_1) ⊗ c.
     It follows that R(T) ≥ rank(T^{∧1}_A)/(a − 1), because rank((T_1 + T_2)^{∧1}_A) ≤ rank((T_1)^{∧1}_A) + rank((T_2)^{∧1}_A) for
any tensors T_1, T_2 ∈ A ⊗ B ⊗ C, and thus if R(T) = r, then rank(T^{∧1}_A) ≤ r·rank((T_0)^{∧1}_A) = (a − 1)r where T_0 is a
rank-one tensor. Thus the best bound one could hope for with this technique is up to r = ba/(a − 1). The
minors of size r(a − 1) + 1 of T^{∧1}_A give equations for the tensors of border rank at most r in A ⊗ B ⊗ C.
This is most effective when a = 3.
     When a > 3, for each 3-plane A′ ⊂ A, consider the restriction T|_{A′ ⊗ B ⊗ C} and the corresponding
equations, to obtain equations for the tensors of border rank at most r in A ⊗ B ⊗ C as long as r ≤ 3b/2.
This procedure is called inheritance (see [11, §7.4.2]).
     We consider the following generalization: tensor T_B with Id_{Λ^p A} to obtain a linear map B^* ⊗
Λ^p A → Λ^p A ⊗ A ⊗ C and skew-symmetrize the Λ^p A ⊗ A factor to obtain a map

                                   T^{∧p}_A : B^* ⊗ Λ^p A → Λ^{p+1} A ⊗ C .                                      (2.1)


To avoid redundancies, assume b ≤ c and p ≤ ⌈a/2⌉ − 1. Then, if T = a ⊗ b ⊗ c is of rank one,

                                   rank((a ⊗ b ⊗ c)^{∧p}_A) = binom(a − 1, p) .

To see this, compute

          T^{∧p}_A = Σ_{i_1 < ··· < i_p} [α^{i_1} ∧ ··· ∧ α^{i_p} ⊗ b] ⊗ [a_1 ∧ a_{i_1} ∧ ··· ∧ a_{i_p} ⊗ c] ,

to conclude the image is isomorphic to Λ^p(A/a_1) ⊗ c.
    In summary:
    In summary:

Theorem 2.1. Interpret T ∈ A ⊗ B ⊗ C as a linear map B^* → A ⊗ C and let T^{∧p}_A : B^* ⊗ Λ^p A → Λ^{p+1} A ⊗ C
be the map obtained by skew-symmetrizing T ⊗ Id_{Λ^p A} in the A ⊗ Λ^p A factor. Then

                                   R(T) ≥ rank(T^{∧p}_A) / binom(a − 1, p) .

Proof. Let r = R(T) and let T_ε = Σ_{i=1}^r T_{ε,i} be such that R(T_{ε,i}) = 1 and lim_{ε→0} T_ε = T. Then, for ε
sufficiently small (using the lower semicontinuity of rank for the first inequality),

          rank(T^{∧p}_A) ≤ rank((T_ε)^{∧p}_A) ≤ Σ_{i=1}^r rank((T_{ε,i})^{∧p}_A) = r·binom(a − 1, p) .
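The map T^{∧p}_A can be written down explicitly in coordinates, which makes Theorem 2.1 easy to experiment with. The following sketch is our own construction (the helper name `wedge_flattening` is ours); it builds the matrix of T^{∧p}_A in the basis of p-subsets of A-indices and checks the rank-one count binom(a−1, p):

```python
import numpy as np
from itertools import combinations
from math import comb

def wedge_flattening(T, p):
    """Matrix of T^{∧p}_A : B* ⊗ Λ^p A -> Λ^{p+1} A ⊗ C (a sketch)."""
    a, b, c = T.shape
    cols = [(j, S) for j in range(b) for S in combinations(range(a), p)]
    rows = [(Sp, k) for Sp in combinations(range(a), p + 1) for k in range(c)]
    col_index = {key: n for n, key in enumerate(cols)}
    M = np.zeros((len(rows), len(cols)))
    for r_idx, (Sp, k) in enumerate(rows):
        for pos, i in enumerate(Sp):        # remove index i from S' to get S
            S = Sp[:pos] + Sp[pos + 1:]
            sign = (-1) ** pos              # sign of wedging e_i back into e_S
            for j in range(b):
                M[r_idx, col_index[(j, S)]] += sign * T[i, j, k]
    return M

a_dim, b_dim, c_dim, p = 5, 3, 3, 2
rng = np.random.default_rng(1)
T1 = np.einsum('i,j,k->ijk', rng.standard_normal(a_dim),
               rng.standard_normal(b_dim), rng.standard_normal(c_dim))

r1 = np.linalg.matrix_rank(wedge_flattening(T1, p))
print(r1, comb(a_dim - 1, p))    # both 6: a rank-one tensor gives binom(a-1, p)

T2 = T1 + np.einsum('i,j,k->ijk', rng.standard_normal(a_dim),
                    rng.standard_normal(b_dim), rng.standard_normal(c_dim))
r2 = np.linalg.matrix_rank(wedge_flattening(T2, p))
print(r2 <= 2 * comb(a_dim - 1, p))   # True: Theorem 2.1 gives R(T2) >= r2/6
```

For T = u ⊗ v ⊗ w the map factors as (β ↦ v(β)) tensored with (ω ↦ u ∧ ω), whose ranks are 1 and binom(a−1, p), confirming the count above.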

Remark 2.2. Alternatively, one can compute the rank using the vector bundle techniques of [14].

     When this article was posted on the arXiv, we only knew that the minors of size r·binom(a−1, p) + 1 of the maps
T^{∧p}_A gave nontrivial equations for tensors of border rank at most r in A ⊗ B ⊗ C for r ≤ 2a − √a. Then,
in [13], it was shown they actually give nontrivial equations at least up to r = 2b − 3 (the maximum possible
would be 2b − 2).

    We record the following proposition, which follows from Stirling’s formula and the discussion above.

Proposition 2.3. The equations for the variety of tensors of border rank at most r in A ⊗ B ⊗ C obtained
by taking minors of T^{∧p}_A are of degree r·binom(a−1, p) + 1. In particular, when r approaches the upper bound 2b
and p = ⌈a/2⌉ − 1, the equations are asymptotically of degree

                                   √(2/π) · 2^a b / √(a − 1) .

    Theorem 1.1 is obtained by applying the inheritance principle to the case of an (n + m − 1)-plane
A′ ⊂ A = C^{nm}.
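For concreteness (our own arithmetic, not stated in the paper), the resulting bound 2n^2 − n matches Lickteig’s 3n^2/2 + n/2 − 1 at n = 2 and exceeds it for every n ≥ 3:

```python
# Compare Corollary 1.2 with Lickteig's bound for square matrix multiplication.
bounds = {}
for n in range(2, 8):
    new_bound = 2 * n * n - n              # Corollary 1.2: R(M<n,n,n>) >= 2n^2 - n
    lickteig = (3 * n * n + n - 2) // 2    # 3n^2/2 + n/2 - 1 (always an integer)
    bounds[n] = (new_bound, lickteig)
    print(n, new_bound, lickteig)
```

For n = 3 this reads 15 versus 14, consistent with the 15 ≤ R(M⟨3,3,3⟩) statement above.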


2.2    Origin of the equations corresponding to minors of (2.1)
This subsection is not used in the proof of the main theorem. We work in projective space as the objects
we are interested in are invariant under rescaling.
    Let
                                   Seg(PA × PB × PC) ⊂ P(A ⊗ B ⊗C)
denote the Segre variety of rank one tensors and let σr (Seg(PA × PB × PC)) denote its r-th secant variety,
the variety of tensors of border rank at most r.
    In [14] we introduced a generalization of flattenings, called Young flattenings, which in the present
context is as follows: Irreducible polynomial representations of the general linear group GL(A) correspond
to partitions π = (π1 , . . . , πa ); see §A.1. Let Sπ A denote the corresponding GL(A)-module. Consider
representations S_π A, S_µ B, S_ν C, and the identity maps Id_{S_π A} ∈ S_π A ⊗ S_π A^*, etc. Then we may consider

      T ⊗ Id_{S_π A} ⊗ Id_{S_µ B} ⊗ Id_{S_ν C} ∈ A ⊗ B ⊗ C ⊗ S_π A ⊗ S_π A^* ⊗ S_µ B ⊗ S_µ B^* ⊗ S_ν C ⊗ S_ν C^* .

We may decompose S_π A ⊗ A according to the Pieri rule (see §A.3) and project to one irreducible
component, say S_π̃ A, where π̃ is obtained by adding a box to π, and similarly for C, while for B we
may decompose S_µ B^* ⊗ B and project to one irreducible component, say S_µ̂ B^*, where µ̂ is obtained by
deleting a box from µ. The upshot is a tensor

                     T′ ∈ S_π̃ A ⊗ S_µ B ⊗ S_ν̃ C ⊗ S_π A^* ⊗ S_µ̂ B^* ⊗ S_ν C^*

which we may then consider as a linear map, e.g.,

                     T′ : S_π A ⊗ S_µ B^* ⊗ S_ν C → S_π̃ A ⊗ S_µ̂ B^* ⊗ S_ν̃ C ,

and rank conditions on T′ may give border rank conditions on T.
    Returning to the minors of (2.1), the minors of size t + 1 of T^{∧p}_A give modules of equations which are
contained in

   Λ^{t+1}(Λ^p A ⊗ B^*) ⊗ Λ^{t+1}(Λ^{p+1} A^* ⊗ C^*) = ⊕_{|µ|=t+1, |ν|=t+1} S_µ(Λ^p A) ⊗ S_{µ′} B^* ⊗ S_ν(Λ^{p+1} A^*) ⊗ S_{ν′} C^* .   (2.2)

Determining which irreducible submodules of (2.2) actually contribute nontrivial equations appears to be
difficult.


3     Proof of Theorem 1.1
Let M, N, L be vector spaces of dimensions m, n, l. Write A = M ⊗ N^*, B = N ⊗ L^*, C = L ⊗ M^*, so a = mn,
b = nl, c = ml. The matrix multiplication operator M⟨m,n,l⟩ is M⟨m,n,l⟩ = Id_M ⊗ Id_N ⊗ Id_L ∈ A ⊗ B ⊗ C.
(See [11, §2.5.2] for an explanation of this identification.) Let U = N^*. Then

          (M⟨m,n,l⟩)^{∧p}_A : L ⊗ U ⊗ Λ^p(M ⊗ U) → L ⊗ M^* ⊗ Λ^{p+1}(M ⊗ U) .                          (3.1)


This is just the identity map on the L factor, so we may write (M⟨m,n,l⟩)^{∧p}_A = ψ_p ⊗ Id_L, where

                     ψ_p : Λ^p(M ⊗ U) ⊗ U → M^* ⊗ Λ^{p+1}(M ⊗ U) .                          (3.2)

    The essential idea is to choose a subspace A′ ⊂ M ⊗ U on which the “restriction” of ψ_p becomes
injective for p = n − 1. Take a vector space W of dimension 2, and fix isomorphisms U ≅ S^{n−1}W^*,
M ≅ S^{m−1}W^*. Let A′ be the SL(W)-direct summand S^{m+n−2}W^* ⊂ S^{m−1}W^* ⊗ S^{n−1}W^* = M ⊗ U.
    Recall that S^α W may be interpreted as the space of homogeneous polynomials of degree α in two
variables. If f ∈ S^α W and g ∈ S^β W^* (with β ≤ α), then we can perform the contraction g · f ∈ S^{α−β}W.
In the case f = l^α is the power of a linear form l, the contraction g · l^α equals l^{α−β} multiplied by the
value of g at the point l, so that g · l^α = 0 if and only if l is a root of g.
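This contraction can be made completely explicit by letting g act as a constant-coefficient differential operator. The sketch below (our own encoding of binary forms as exponent-to-coefficient dictionaries; all helper names are ours) checks that g · l^α vanishes exactly when l is a root of g:

```python
from math import comb

# A binary form is stored as {(i, j): coeff}, the coefficient of x^i y^j.

def diff_x(f):
    return {(i - 1, j): i * c for (i, j), c in f.items() if i > 0}

def diff_y(f):
    return {(i, j - 1): j * c for (i, j), c in f.items() if j > 0}

def contract(g, f):
    # The pairing S^b W* x S^a W -> S^{a-b} W, realized (up to a nonzero
    # constant) by applying g(d/dx, d/dy) to f.
    out = {}
    for (i, j), gc in g.items():
        h = f
        for _ in range(i):
            h = diff_x(h)
        for _ in range(j):
            h = diff_y(h)
        for key, c in h.items():
            out[key] = out.get(key, 0) + gc * c
    return {k: v for k, v in out.items() if v != 0}

def linear_power(c, d, a):
    # (c x + d y)^a expanded by the binomial theorem.
    return {(i, a - i): comb(a, i) * c**i * d**(a - i) for i in range(a + 1)}

l4 = linear_power(2, -1, 4)                  # l = 2x - y, raised to the 4th power
g_root = {(2, 0): 1, (1, 1): 1, (0, 2): -2}  # (x + 2y)(x - y); vanishes at (2, -1)
g_gen = {(2, 0): 1, (0, 2): 1}               # x^2 + y^2; g(2, -1) = 5 != 0

print(contract(g_root, l4))   # {}  (the zero form: l is a root of g)
print(contract(g_gen, l4))    # a nonzero multiple of (2x - y)^2
```

Here g_root has l = 2x − y among its roots, so the contraction is identically zero, while the generic g gives a nonzero multiple of l^{α−β}, as stated above.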
    Consider the natural skew-symmetrization map

                                   A′ ⊗ Λ^{n−1}(A′) → Λ^n(A′) .                                  (3.3)

Because SL(W) is reductive, there is a unique SL(W)-complement A″ to A′, so the projection M ⊗ U → A′
is well defined. Compose (3.3) with the projection

                                   M ⊗ U ⊗ Λ^{n−1}(A′) → A′ ⊗ Λ^{n−1}(A′)                        (3.4)

to obtain
                                   M ⊗ U ⊗ Λ^{n−1}(A′) → Λ^n(A′) .                               (3.5)

Now (3.5) gives a map
                                   ψ′_p : U ⊗ Λ^{n−1}(A′) → M^* ⊗ Λ^n(A′) .                      (3.6)

We claim (3.6) is injective. (Note that when n = m the source and target space of (3.6) are dual to each
other.)
    Consider the transposed map S^{m−1}W^* ⊗ Λ^n S^{m+n−2}W → S^{n−1}W ⊗ Λ^{n−1}S^{m+n−2}W. It is defined as
follows on decomposable elements (and then extended by linearity):

          g ⊗ (f_1 ∧ ··· ∧ f_n) ↦ Σ_{i=1}^n (−1)^{i−1} (g · f_i) ⊗ f_1 ∧ ··· f̂_i ··· ∧ f_n .

We show this dual map is surjective. Let

          l^{n−1} ⊗ (l_1^{m+n−2} ∧ ··· ∧ l_{n−1}^{m+n−2}) ∈ S^{n−1}W ⊗ Λ^{n−1}S^{m+n−2}W

with l, l_i ∈ W. Such elements span the target, so it will be sufficient to show any such element is in the image.
Assume first that l is distinct from the l_i. Since n ≤ m, there is a polynomial g ∈ S^{m−1}W^* which vanishes
on l_1, ..., l_{n−1} and is nonzero on l. Then, up to a nonzero scalar, g ⊗ (l_1^{m+n−2} ∧ ··· ∧ l_{n−1}^{m+n−2} ∧ l^{m+n−2})
maps to our element.
    Since the image is closed (being a linear space), the condition that l is distinct from the l_i may be
removed by taking limits.
    Finally, ψ′_p ⊗ Id_L : B^* ⊗ Λ^{n−1}A′ → C ⊗ Λ^n A′ is the map induced from the restricted matrix multipli-
cation operator.


   To complete the proof of Theorem 1.1, observe that an element of rank one in A′ ⊗ B ⊗ C induces a
map B^* ⊗ Λ^{n−1}A′ → C ⊗ Λ^n A′ of rank binom(n+m−2, n−1).
   By Lemma 3.1 below, the border rank of M⟨m,n,l⟩ must be at least the border rank of T′ ∈ A′ ⊗ B ⊗ C,
and by Theorem 2.1

     R(T′) ≥ dim(B^* ⊗ Λ^{n−1}(A′)) / binom(n+m−2, n−1) = nl·binom(n+m−1, n−1) / binom(n+m−2, n−1) = nl(n + m − 1)/m .

This concludes the proof of Theorem 1.1.
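The last chain of equalities is pure binomial arithmetic; a quick exact-arithmetic check of ours over small parameters:

```python
from fractions import Fraction
from math import comb

# Verify nl * binom(n+m-1, n-1) / binom(n+m-2, n-1) = nl(n+m-1)/m exactly,
# where binom(n+m-1, n-1) = dim Λ^{n-1} A' with dim A' = n+m-1.
for n in range(1, 7):
    for m in range(n, 8):          # the theorem assumes n <= m
        for l in range(1, 4):
            lhs = Fraction(n * l * comb(n + m - 1, n - 1), comb(n + m - 2, n - 1))
            rhs = Fraction(n * l * (n + m - 1), m)
            assert lhs == rhs
print("binomial identity verified")
```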

Lemma 3.1. Let T ∈ A ⊗ B ⊗ C, let A = A′ ⊕ A″ and let π : A → A′ be the linear projection, which induces
π̃ : A ⊗ B ⊗ C → A′ ⊗ B ⊗ C. Then R(T) ≥ R(π̃(T)), and the same inequality holds for border ranks.

Proof. If T = Σ_{i=1}^r a_i ⊗ b_i ⊗ c_i then π̃(T) = Σ_{i=1}^r π(a_i) ⊗ b_i ⊗ c_i.

Remark 3.2. If we let B′ = U, C′ = M, then in the proof above we are just computing the rank of (T′)^{∧p}_A
where T′ ∈ A ⊗ B′ ⊗ C′ is Id_U ⊗ Id_M. The maximal border rank of a tensor T in C^{mn} ⊗ C^m ⊗ C^n is mn,
which occurs anytime the map T : C^{mn*} → C^m ⊗ C^n is injective, so T′ is a generic tensor in A ⊗ B′ ⊗ C′,
and the calculation of rank(ψ′_p) is determining the maximal rank of (T′)^{∧p}_A for a generic element of C^{mn} ⊗
C^n ⊗ C^m. Also note that the projection A → A′, viewed as a linear map S^{m−1}W^* ⊗ S^{n−1}W^* → S^{m+n−2}W^*,
is just polynomial multiplication.


4     The kernel through representation theory
The purpose of this section is to show that there are nontrivial equations for tensors of border rank less
than 2n^2 that matrix multiplication does satisfy.


4.1   The kernel as a module
Assume b ≤ c, so n ≤ m. For a partition π, let ℓ(π) denote the number of parts of π, i.e., the largest k
such that π_k > 0, where we write π = (π_1, ..., π_k). Let π′ denote the conjugate partition to π. See §A.1
for the definition of S_π U.

Example 4.1. Consider the case m = n = 3, take p = 4. Let

                         α_1 = (2,1,1) ,      α_2 = (2,2) ,      α_3 = (3,1) .

Note that α_1 = α_3′, α_2 = α_2′. Then (see §A.2)

          Λ^4(M ⊗ U) = (S_{α_3}M ⊗ S_{α_1}U) ⊕ (S_{α_2}M ⊗ S_{α_2}U) ⊕ (S_{α_1}M ⊗ S_{α_3}U) .

Observe that (via the Pieri rule, see §A.3, discarding summands with more than three parts, since
dim U = 3)

                         (2,1,1) ⊗ (1) = (3,1,1) ⊕ (2,2,1) ,

                         (2,2) ⊗ (1) = (3,2) ⊕ (2,2,1) ,

                         (3,1) ⊗ (1) = (4,1) ⊕ (3,2) ⊕ (3,1,1) .

Among the seven summands on the right-hand side, only (4,1) does not fit in the 3 × 3 square. The kernel
of (M⟨3,3,l⟩)^{∧4}_A in this case contains L^* ⊗ S_{(2,1,1)}M ⊗ S_{(4,1)}U, corresponding to

                         π = (3,1) ,      π + (1) = (4,1) ,

which has dimension l · 24 · 3 = 72l, and one can verify, either by computer or by explicitly writing down
highest weight vectors of each module, that this is the entire kernel.
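The dimension count can be checked with the Weyl dimension formula for GL(k) (a standard formula, not stated in the paper; the helper name `gl_dim` is ours): dim S_{(2,1,1)}C^3 = 3 and dim S_{(4,1)}C^3 = 24.

```python
from fractions import Fraction

def gl_dim(lam, k):
    # Weyl dimension formula for S_lambda(C^k); pad lam with zeros to length k.
    lam = list(lam) + [0] * (k - len(lam))
    d = Fraction(1)
    for i in range(k):
        for j in range(i + 1, k):
            d *= Fraction(lam[i] - lam[j] + j - i, j - i)
    return int(d)

# Example 4.1: the kernel summand L* ⊗ S_{(2,1,1)}M ⊗ S_{(4,1)}U with m = n = 3.
print(gl_dim((2, 1, 1), 3), gl_dim((4, 1), 3))   # 3 and 24, so dimension 72*l
```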

Lemma 4.2. ker(M⟨m,n,l⟩)^{∧p}_A ⊇ ⊕_π S_{π′}M ⊗ S_{π+(1)}U ⊗ L, where the summation is over partitions

                                   π = (m, ν_1, ..., ν_{n−1}) ,

where ν = (ν_1, ..., ν_{n−1}) is a partition of p − m with ν_1 ≤ m, and

                                   π + (1) = (m + 1, ν_1, ..., ν_{n−1}) .

Proof. Write (M⟨m,n,l⟩)^{∧p}_A = ψ_p ⊗ Id_L, where ψ_p : Λ^p(M ⊗ U) ⊗ U → M^* ⊗ Λ^{p+1}(M ⊗ U). Such modules are
contained in the kernel by Schur’s lemma, as there is no corresponding module in the target for them to map
to.

Remark 4.3. We expect that equality holds in Lemma 4.2.


5    Review of Lickteig’s bound
For comparison, we outline the proof of Lickteig’s bound. (Expositions of Strassen’s bound are given
in several places, e.g., [11, Chap. 3] and [3, §19.3].) It proceeds in three steps. The first combines
two standard facts from algebraic geometry: for varieties X, Y ⊂ PV, let J(X, Y) ⊂ PV denote the join

of X and Y. Then σ_{r+s}(X) = J(σ_r(X), σ_s(X)). If X = Seg(PA × PB × PC) is a Segre variety, then
σ_s(Seg(PA × PB × PC)) ⊆ Sub_s(A ⊗ B ⊗ C), where

     Sub_s(A ⊗ B ⊗ C) := { T ∈ A ⊗ B ⊗ C | ∃ A′ ⊂ A, B′ ⊂ B, C′ ⊂ C, dim A′ = dim B′ = dim C′ = s, T ∈ A′ ⊗ B′ ⊗ C′ } .

See, e.g., [11, §7.1.1] for details. Next Lickteig observes that if T ∈ σ_{r+s}(Seg(PA × PB × PC)), then
there exist A′, B′, C′, each of dimension s, such that, thinking of T : A^* ⊗ B^* → C,

                     dim T((A′)^⊥ ⊗ B^* + A^* ⊗ (B′)^⊥) ≤ r .                                   (5.1)

This follows because the condition is a closed condition and it holds for points on the open subset of
points in the span of r + s points on Seg(PA × PB × PC).
    Finally, for matrix multiplication, with A = M ⊗ N^*, etc., he defines M′ ⊂ M, N′^* ⊂ N^* to be the
smallest spaces such that A′ ⊆ M′ ⊗ N′^*, and similarly for the other spaces. Then one applies (5.1)
combined with the observation that M|_{(A′)^⊥ ⊗ B^*} ⊆ M′ ⊗ L^*, etc., and keeps track of the various bounds to
conclude.


A     Appendix: Facts from representation theory
A.1    Representations of GL(V )
The irreducible representations of GL(V) are indexed by sequences π = (p_1, ..., p_l) of non-increasing
integers with l ≤ dim V. Those that occur in V^{⊗d} are partitions of d, and we write |π| = d and S_π V for
the module. V^{⊗d} is also an S_d-module, and the groups GL(V) and S_d are the commutants of each other
in V^{⊗d}, which implies the famous Schur-Weyl duality that

                     V^{⊗d} = ⊕_{|π|=d, ℓ(π)≤v} S_π V ⊗ [π]

as a (GL(V) × S_d)-module, where [π] is the irreducible S_d-module associated to π and v = dim V. Repeated num-
bers in partitions are sometimes expressed as exponents when there is no danger of confusion, e.g.,
(3, 3, 1, 1, 1, 1) = (3^2, 1^4). For example, S_{(d)}V = S^d V and S_{(1^d)}V = Λ^d V. The modules S_{(s^v)}V = (Λ^v V)^{⊗s}
are trivial as SL(V)-modules. The module S_{(2,2)}V is the home of the Riemann curvature tensor in Rie-
mannian geometry. See any of [11, Chap. 6], [6, Chap. 6] or [21, Chap. 9] for more details on the
representations of GL(V) and what follows.

A.2    Useful decomposition formulas
To decompose S^2(A ⊗ B) as a GL(A) × GL(B)-module, note that given P ∈ S^2 A and Q ∈ S^2 B, the product
of P and Q defined by (P ⊗ Q)(α ⊗ β, α′ ⊗ β′) := P(α, α′)Q(β, β′) will be in S^2(A ⊗ B). Similarly, if
P ∈ Λ^2 A and Q ∈ Λ^2 B, then P ⊗ Q will also be symmetric, as

          (P ⊗ Q)(α′ ⊗ β′, α ⊗ β) = [−P(α, α′)][−Q(β, β′)] = P(α, α′)Q(β, β′) .


Since the dimensions of these spaces add to the dimension of S^2(A ⊗ B), we conclude

                     S^2(A ⊗ B) = (S^2 A ⊗ S^2 B) ⊕ (Λ^2 A ⊗ Λ^2 B) .

By an analogous argument, we have the decomposition

                     Λ^2(A ⊗ B) = (S^2 A ⊗ Λ^2 B) ⊕ (Λ^2 A ⊗ S^2 B) .

More generally (see, e.g., [11, §6.5.2]) we have

                     Λ^p(A ⊗ B) = ⊕_{|π|=p} S_π A ⊗ S_{π′} B ,                                 (A.1)
                     S^p(A ⊗ B) = ⊕_{|π|=p} S_π A ⊗ S_π B ,                                    (A.2)

where π′ denotes the conjugate partition to π; that is, if we represent π = (p_1, ..., p_n) by a Young diagram,
with p_j boxes in the j-th row, the diagram of π′ is obtained by reflecting the diagram of π along the NW
to SE axis.
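The two p = 2 decompositions can be sanity-checked by counting dimensions (a quick check of ours):

```python
from math import comb

# dim S^2 V = binom(v+1, 2), dim Λ^2 V = binom(v, 2).
for a, b in [(2, 3), (3, 4), (5, 2)]:
    ab = a * b
    # Λ^2(A⊗B) = (S^2 A ⊗ Λ^2 B) ⊕ (Λ^2 A ⊗ S^2 B)
    assert comb(ab, 2) == comb(a + 1, 2) * comb(b, 2) + comb(a, 2) * comb(b + 1, 2)
    # S^2(A⊗B) = (S^2 A ⊗ S^2 B) ⊕ (Λ^2 A ⊗ Λ^2 B)
    assert comb(ab + 1, 2) == comb(a + 1, 2) * comb(b + 1, 2) + comb(a, 2) * comb(b, 2)
print("dimension counts match")
```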

A.3    The Pieri rule
The decomposition of Sπ V ⊗V is multiplicity free, consisting of a copy of each Sµ V such that the Young
diagram of µ is obtained from the Young diagram of π by adding a box. (Boxes must be added in such a
way that one still has a partition and the number of rows is at most the dimension of V .)
    For example, for π = (1) (a single box), adding a box in the first row or in a new row gives

                          S(1)V ⊗V = S(2)V ⊕ S(1,1)V ,    i. e., V ⊗V = S2 V ⊕ Λ2 V .


More generally, the Pieri formula states that Sπ V ⊗ Sd V decomposes multiplicity free into the sum of all
Sµ V that can be obtained by adding d boxes to the Young diagram of π in such a way that no two boxes
are added to the same column, and Sπ V ⊗ Λd V decomposes multiplicity free into the sum of all Sµ V that
can be obtained by adding d boxes to the Young diagram of π in such a way that no two boxes are added
to the same row. See any of the standard references given above for details.
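The "no two boxes in the same column" condition for Sπ V ⊗ Sd V (a horizontal strip) can be enumerated directly and checked against a dimension count. This is an illustrative sketch under the conventions above, not code from the paper:

```python
from math import comb

def dim_schur(pi, n):
    """dim S_pi(C^n) via the hook content formula; 0 if pi has more than n rows."""
    if len(pi) > n:
        return 0
    num = den = 1
    for i, row in enumerate(pi):
        for j in range(row):
            num *= n + j - i                                        # content of box (i, j)
            den *= (row - j) + sum(1 for r in pi[i + 1:] if r > j)  # hook length
    return num // den

def pieri_sym(pi, d, n):
    """Summands of S_pi V ⊗ S^d V (dim V = n): partitions mu obtained by adding
    d boxes to pi with no two in the same column (a horizontal strip)."""
    pi = tuple(pi)
    rows = min(len(pi) + 1, n)  # a horizontal strip adds at most one new row
    out = []
    def build(i, left, mu):
        if i == rows:
            if left == 0:
                out.append(tuple(x for x in mu if x > 0))
            return
        cur = pi[i] if i < len(pi) else 0
        # row i may grow at most up to the length of the row above it
        hi = left if i == 0 else min(left, pi[i - 1] - cur)
        for add in range(hi + 1):
            build(i + 1, left - add, mu + [cur + add])
    build(0, d, [])
    return out

# Example: S_(2,1)V ⊗ S^2 V for dim V = 3.
summands = pieri_sym((2, 1), 2, 3)
assert sorted(summands) == [(2, 2, 1), (3, 1, 1), (3, 2), (4, 1)]
# Dimension check: dim S_(2,1) * dim S^2 equals the sum of summand dimensions.
lhs = dim_schur((2, 1), 3) * comb(3 + 2 - 1, 2)  # 8 * 6
assert lhs == sum(dim_schur(mu, 3) for mu in summands)
print("Pieri rule example verified:", sorted(summands))
```

The multiplicity-free property is reflected in the fact that each partition is produced exactly once by the enumeration.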


References
 [1] Sanjeev Arora and Boaz Barak: Computational Complexity: A Modern Approach. Cam-
     bridge Univ. Press, 2009. 285, 287

 [2] Markus Bläser: A 5/2 n^2 -lower bound for the rank of n × n-matrix multiplication
     over arbitrary fields. In Proc. 40th FOCS, pp. 45–50. IEEE Comp. Soc. Press, 1999.
     [doi:10.1109/SFFCS.1999.814576] 286

 [3] Peter Bürgisser, Michael Clausen, and Mohammad Amin Shokrollahi: Algebraic
     Complexity Theory. Volume 315 of Grundlehren der mathematischen Wissenschaften. Springer,
     1997. [doi:10.1007/978-3-662-03338-8] 285, 286, 293

 [4] Peter Bürgisser and Christian Ikenmeyer: Geometric complexity theory and tensor
     rank. In Proc. 43rd STOC, pp. 509–518. ACM Press, 2011. [doi:10.1145/1993636.1993704,
     arXiv:1011.1350] 286

 [5] Xi Chen, Neeraj Kayal, and Avi Wigderson: Partial derivatives in arithmetic complexity
     and beyond. Foundations and Trends in Theoretical Computer Science, 6(1-2):1–138, 2011.
     [doi:10.1561/0400000043] 287

 [6] William Fulton and Joe Harris: Representation Theory. Volume 129 of Graduate Texts in
     Mathematics. Springer, 1991. [doi:10.1007/978-1-4612-0979-9] 294

 [7] Ankit Gupta, Pritish Kamath, Neeraj Kayal, and Ramprasad Saptharishi: Ap-
     proaching the chasm at depth four. J. ACM, 61(6):33:1–33:16, 2014. Preliminary versions in ECCC
     and CCC'13. [doi:10.1145/2629541] 287

 [8] Jonathan D. Hauenstein, Christian Ikenmeyer, and Joseph M. Landsberg: Equa-
     tions for lower bounds on border rank. Experimental Mathematics, 22(4):372–383, 2013.
     [doi:10.1080/10586458.2013.825892] 286

 [9] Neeraj Kayal: An exponential lower bound for the sum of powers of bounded degree polynomials.
     Electron. Colloq. on Comput. Complexity (ECCC), TR12-081, 2012. 287

[10] Joseph M. Landsberg: The border rank of the multiplication of 2 × 2 matrices is seven. J. Amer.
     Math. Soc., 19(2):447–459, 2006. [doi:10.1090/S0894-0347-05-00506-0, arXiv:math/0407224] 286

[11] Joseph M. Landsberg: Tensors: Geometry and Applications. Volume 128 of Graduate Studies
     in Mathematics. Amer. Math. Soc., Providence, 2012. [doi:10.1090/gsm/128] 286, 288, 290, 293,
     294, 295

[12] Joseph M. Landsberg: New lower bounds for the rank of matrix multiplication. SIAM J.
     Comput., 43(1):144–149, 2014. [doi:10.1137/120880276, arXiv:1206.1530] 286

[13] Joseph M. Landsberg: Nontriviality of equations and explicit tensors in C^m ⊗ C^m ⊗ C^m
     of border rank at least 2m − 2. J. Pure Appl. Algebra, 219(8):3677–3684, 2015.
     [doi:10.1016/j.jpaa.2014.12.016] 289

[14] Joseph M. Landsberg and Giorgio Ottaviani: Equations for secant varieties of
     Veronese and other varieties. Annali di Matematica Pura ed Applicata, 192(4):569–606, 2013.
     [doi:10.1007/s10231-011-0238-6] 287, 289, 290

[15] François Le Gall: Powers of tensors and fast matrix multiplication. In Proceedings of the
     39th Int. Symp. on Symb. and Alg. Comput. (ISSAC'14), pp. 296–303. ACM, New York, 2014.
     [doi:10.1145/2608628.2608664, arXiv:1401.7714] 286

[16] Thomas Lickteig: A note on border rank. Inf. Process. Lett., 18(3):173–178, 1984.
     [doi:10.1016/0020-0190(84)90023-1] 286
[17] Alex Massarenti and Emanuele Raviolo: The rank of n × n matrix multiplication is at
     least 3n^2 − 2√2 n^(3/2) − 3n. Linear Algebra Appl., 438(11):4500–4509, 2013.
     [doi:10.1016/j.laa.2013.01.031] 286

[18] Alex Massarenti and Emanuele Raviolo: Corrigendum to “The rank of n × n matrix
     multiplication is at least 3n^2 − 2√2 n^(3/2) − 3n”. Linear Algebra Appl., 445:369–371, 2014.
     [doi:10.1016/j.laa.2013.12.009] 286

[19] Ketan D. Mulmuley and Milind A. Sohoni: Geometric complexity theory I: an ap-
     proach to the P vs. NP and related problems. SIAM J. Comput., 31(2):496–526, 2001.
     [doi:10.1137/S009753970038715X] 287

[20] Giorgio Ottaviani: Symplectic bundles on the plane, secant varieties and Lüroth quartics
     revisited. In Vector bundles and low codimensional subvarieties: state of the art and recent
     developments, volume 21 of Quad. Mat., pp. 315–352. Dept. Math., Seconda Univ. Napoli, Caserta,
     2007. [arXiv:math/0702151] 288

[21] Claudio Procesi: Lie Groups: An Approach through Invariants and Representations. Universi-
     text. Springer, 2007. [doi:10.1007/978-0-387-28929-8] 294

[22] Alexey V. Smirnov: The bilinear complexity and practical algorithms for matrix multiplication.
     Comput. Math. Math. Phys., 53(12):1781–1795, 2013. [doi:10.1134/S0965542513120129] 286

[23] Volker Strassen: Gaussian elimination is not optimal. Numer. Math., 13(4):354–356, 1969.
     [doi:10.1007/BF02165411] 286

[24] Volker Strassen: Rank and optimal computation of generic tensors. Linear Algebra Appl.,
     52-53:645–685, 1983. [doi:10.1016/0024-3795(83)80041-X] 286, 288

[25] Avi Wigderson: P, NP and mathematics—a computational complexity perspective. In In-
     ternational Congress of Mathematicians. Vol. I, pp. 665–712. Eur. Math. Soc., Zürich, 2007.
     [doi:10.4171/022-1/25] 287


AUTHORS

      Joseph M. Landsberg
      Professor of Mathematics
      Texas A&M University, College Station, TX, USA
      jml math tamu edu
      http://www.math.tamu.edu/~jml




    Giorgio Ottaviani
    Professore Ordinario di Geometria
    Università di Firenze, Firenze, Italy
    ottaviani math unifi it
    http://web.math.unifi.it/users/ottaviani/


ABOUT THE AUTHORS

    Joseph M. Landsberg works on questions at the interface of algebraic geometry, differ-
        ential geometry, and representation theory. His current research focuses on geometric
        problems originating in complexity theory. Landsberg obtained his Ph. D. in 1990 from
        Duke University under the direction of Robert Bryant, with a thesis on minimal sub-
        manifolds and a generalization of calibrations. He has directed six Ph. D. theses, is the
        author of Tensors: Geometry and Applications, is a co-author of Cartan for Beginners,
        and has published over fifty research articles. In Fall 2014 he co-organized the program
        Algorithms and Complexity in Algebraic Geometry at the Simons Institute for the
        Theory of Computing, UC Berkeley, where he was also a Chancellor's Professor and gave
        a course on geometry and complexity theory whose class notes will hopefully soon
        become a book.


    Giorgio Ottaviani is a Professor of Geometry at the University of Florence, Italy. He
        received his Ph. D. in mathematics in 1987 at the University of Florence, under the
        guidance of Vincenzo Ancona, with the thesis “Vector bundles on Grassmannians
        and Quadrics.” His M. Sc. advisor was Francesco Gherardelli. He moved to the University
        of Rome Tor Vergata as an assistant researcher and then to the University of L'Aquila as
        a professor. Ottaviani was awarded the Foundation Severi Prize in 1989. His early work
        dealt with instanton vector bundles, homogeneous vector bundles, and projective sub-
        varieties of small codimension, often with the help of computational tools. In the past
        eight years, Ottaviani has become interested in the field of tensors and their applications,
        focusing on tensor rank.



