



                Tensor Network Complexity of
                      Multilinear Maps
                    Per Austrin∗                   Petteri Kaski†           Kaie Kubjas‡
               Received November 28, 2018; Revised December 20, 2020; Published June 22, 2022




       Abstract. We study tensor networks as a model of arithmetic computation for
       evaluating multilinear maps. These capture any algorithm based on low-rank tensor
       decompositions, such as 𝑂(𝑛^{𝜔+𝜖}) time matrix multiplication, and in addition many
       other algorithms such as 𝑂(𝑛 log 𝑛) time discrete Fourier transform and 𝑂*(2^𝑛) time
       for computing the permanent of a matrix. However, tensor networks sometimes
       yield faster algorithms than those that follow from low-rank decompositions. For
       instance the fastest known 𝑂(𝑛^{(𝜔+𝜖)𝑡}) time algorithms for counting 3𝑡-cliques can be
       implemented with tensor networks, even though the underlying tensor has rank 𝑛^{3𝑡}
       for all 𝑡 ≥ 2. For counting homomorphisms of a general pattern graph 𝑃 into a host
       graph on 𝑛 vertices we obtain an upper bound of 𝑂(𝑛^{(𝜔+𝜖) bw(𝑃)/2}), where bw(𝑃) is
       the branchwidth of 𝑃. This essentially matches the bound for counting cliques, and
       yields small improvements over previous algorithms for many choices of 𝑃.
       While powerful, the model still has limitations, and we are able to show a number
       of unconditional lower bounds for various multilinear maps, including the following.

         (a) An Ω(𝑛^{bw(𝑃)}) time lower bound for counting homomorphisms from 𝑃 to an
             𝑛-vertex graph, matching the upper bound if 𝜔 = 2. In particular for 𝑃 a
             𝑣-clique this yields an Ω(𝑛^{⌈2𝑣/3⌉}) time lower bound for counting 𝑣-cliques, and
             for 𝑃 a 𝑘-uniform 𝑣-hyperclique we obtain an Ω(𝑛^𝑣) time lower bound for 𝑘 ≥ 3,
             ruling out tensor networks as an approach to obtaining non-trivial algorithms
             for hyperclique counting and the Max-3-CSP problem.

         (b) An Ω(2^{0.918𝑛}) time lower bound for the determinant and the permanent of an
             𝑛 × 𝑛 matrix.

     An extended abstract of this paper appeared in the Proceedings of the 10th Innovations in Theoretical Computer Science Conference (ITCS 2019) [9].
   ∗ Supported by the Swedish Research Council, Grant 621-2012-4546, and the Approximability and Proof Complexity project funded by the Knut and Alice Wallenberg Foundation.
   † Supported by the European Research Council, Grant 338077.
   ‡ Supported by the European Union's Horizon 2020 research and innovation programme: Marie Skłodowska-Curie Grant 748354; research carried out at LIDS, MIT and Team PolSys, LIP6, Sorbonne Université.

ACM Classification: F.1.1, F.2.2, G.2.1, G.2.2
AMS Classification: 15A69, 68Q17, 68Q25, 68W05, 05C85, 14Q20
Key words and phrases: arithmetic complexity, lower bound, multilinear map, tensor network

© 2022 Per Austrin, Petteri Kaski, and Kaie Kubjas
Licensed under a Creative Commons Attribution License (CC-BY)          DOI: 10.4086/toc.2022.v018a016


1     Introduction
One of the cornerstones of the theory of computation is the study of efficient algorithms:

       For a function 𝑓 , how much time is required to evaluate 𝑓 on given inputs?

Answering this question for almost any specific 𝑓 is well beyond reach of contemporary tools.
For example, it is theoretically possible that canonical NP-complete problems, such as the Circuit
Satisfiability problem, can be solved in linear time whereas they are widely believed to require
super-polynomial (or somewhat less widely, exponential) time [47, 51, 52]. The main reason
for this barrier to quantitative understanding is that it is very hard to prove lower bounds for
explicit functions in general models of computation such as circuits or Turing machines.
    This situation notwithstanding, a more modest program to advance our understanding of
computation is to study restricted models of computation that for many 𝑓 of interest are
simultaneously

    (i) general enough to capture the fastest-known algorithms for 𝑓 , and

    (ii) restricted enough to admit proofs of strong unconditional time lower bounds for 𝑓 .

There is a substantial body of existing work that fits under this program, ranging from the study
of low-depth or otherwise restricted circuits (see, e. g., [8, Ch. 14]) to models of algorithm-design
principles such as greedy algorithms, backtracking, or dynamic programming [3, 41], to linear
or semidefinite programming relaxations for hard combinatorial optimization problems [71].

1.1    Multilinear maps
One class of functions 𝑓 that are of substantial importance is the family of ℓ-linear maps
(multilinear maps) from ℓ input vector spaces to an output vector space.¹ Examples range from
maps of known near-linear-time complexity in the input size, such as the discrete Fourier
transform [35, 97], to maps without known polynomial-time-complexity algorithms, such as



the permanent of a matrix [89, 95]. Beyond motivation in numerical multilinear algebra and
its applications, recent advances in the study of fine-grained algorithm design and complexity
have highlighted the fundamental role of algebraic methods in the fastest-known algorithm
designs across a broad range of tasks from graph problems, such as all-pairs shortest paths and
𝑘-clique, to parsing and constraint satisfaction problems, such as maximum satisfiability and
graph coloring [2, 17, 19, 44, 53, 75, 102, 103].
   ¹Multilinear maps with ℓ = 1 are called linear, ℓ = 2 bilinear, ℓ = 3 trilinear, and so forth.
     In this paper, we study the arithmetic complexity of evaluating a multilinear map, that is, the
number of operations (scalar additions, subtractions, and multiplications) needed to evaluate
the map. To set up a baseline, a generic ℓ-linear map from ℓ vector spaces of dimension 𝑛 to a
scalar requires Ω(𝑛^ℓ) scalars to represent the map directly using combinations of basis vectors.
Given this complexity of a direct explicit representation, it is a fundamental problem to seek
less costly representations, along with associated efficient algorithms that work on the chosen
representation.
     We propose the systematic study of tensor networks on hypergraphs as a framework for
fast evaluation of multilinear maps, and show a number of upper and lower bounds on the
computational power of tensor networks in the spirit of (i) and (ii) above.

1.2   Tensor networks
Tensor networks have a long and rich history, which can be traced as far back as 19th-century
studies in invariant theory due to Cayley [27, 28], Clebsch [31], Clifford [32], Sylvester [94],
and Kempe [57, 58]. Tensor networks are extensively deployed in applications from pure
mathematics and theoretical physics [56, 64, 66, 67, 80, 81, 85, 90] to computational physics
and chemistry [78, 83, 92]. In theoretical computer science, they appear in various guises,
including, for example, in the Holant framework [96, 25, 24], in the study of probabilistic
graphical models [62, 87], in the study of parallel programming [91], in the study of quantum
computing [7], and in the study of arithmetic complexity [11, 26, 40]. Tensor contraction is also
emerging as a basic computational primitive in computer hardware [54, 73]. (See Section 1.5 for
a detailed discussion.) As the precise definitions are somewhat technical, let us start with a few
simple motivating examples and then state our results, with the understanding that precise
definitions appear in Section 4.
    In our setting, a tensor network is a hypergraph in which the vertices are tensors and the
hyperedges are called modes. Each mode that is incident to a tensor defines a “dimension” for
indexing the entries of the tensor—for example, a matrix is a tensor that is incident to two
modes, one mode for the rows of the matrix, and the other mode for the columns of the matrix.
A network may be simplified by a sequence of contractions, where each contraction takes a subset
of tensors and replaces it with a single tensor whose entries are obtained as generalized inner
products of the entries of the tensors being contracted.
    As a concrete first example of these concepts, let us consider the task of multiplying two
matrices, 𝐴 and 𝐵. More specifically, let 𝐴 be a matrix with rows indexed by mode 𝑖 and columns
indexed by mode 𝑘, and let 𝐵 be a matrix with rows indexed by mode 𝑘 and columns indexed
by mode 𝑗. We may represent the multiplication task as the tensor network depicted on the left
in (1.1). The result of contracting 𝐴 and 𝐵 is a new matrix with rows indexed by 𝑖 and columns


indexed by 𝑗, where the entry at each position (𝑖, 𝑗) is $\sum_k A_{ik} B_{kj}$. If the three index sets all have
size 𝑛, then computing 𝐴 · 𝐵 by contracting them in such a direct manner uses Θ(𝑛^3) operations.
To obtain faster matrix multiplication, we can rewrite the bare-bones network on the left in (1.1)
using a low-rank decomposition of the matrix multiplication tensor. For example, Strassen's
decomposition [93] of 2 × 2 matrix multiplication can be represented using the second network
in (1.1). Note that the index 𝑖 used by 𝐴 and the result has been separated into two distinct
indices 𝑖 and 𝑖′, and similarly for 𝑗 and 𝑘.

[Figure (1.1): on the left, the bare-bones tensor network for multiplying 𝐴 (modes 𝑖, 𝑘) and 𝐵 (modes 𝑘, 𝑗); on the right, the rewritten network for 2 × 2 matrix multiplication, in which the 2 × 2 × 7 tensors 𝛾 (modes 𝑖, 𝑗, ℓ), 𝛼 (modes 𝑖′, 𝑘, ℓ), and 𝛽 (modes 𝑘′, 𝑗′, ℓ), whose 0/±1 entries encode Strassen's rank-7 decomposition [93], are joined to 𝐴 and 𝐵 along the length-7 mode ℓ.]   (1.1)
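For concreteness, contracting the left network in (1.1) is a single generalized inner product; the following numpy sketch (ours, for illustration only, not code from the paper) performs it with einsum, the mode names 𝑖, 𝑘, 𝑗 playing the role of the einsum indices.

```python
import numpy as np

n = 4
rng = np.random.default_rng(0)
A = rng.integers(0, 5, size=(n, n))  # matrix with row mode i and column mode k
B = rng.integers(0, 5, size=(n, n))  # matrix with row mode k and column mode j

# Contracting {A, B} eliminates the shared mode k: the entry at position
# (i, j) of the new tensor is the generalized inner product sum_k A[i,k]*B[k,j].
C = np.einsum('ik,kj->ij', A, B)
assert np.array_equal(C, A @ B)
```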



    We can execute the network by successively contracting groups of vertices. In (1.2) we see
the process of successively contracting pairs of tensors in a carefully chosen order, until only
a single tensor—the result of the computation—remains. Such an execution can be naturally
represented by a rooted binary tree, as shown on the right in (1.2), where the tensors of the
network form the leaves, and each internal node represents the result of contracting its two
children. To summarize, a tensor-network algorithm is specified by providing (a) a tensor
network that when contracted yields the desired result, and (b) an execution tree indicating the
order in which to contract tensors in the network.


[Figure (1.2): left, successive snapshots of executing the network from (1.1): pairs of tensors are contracted in a carefully chosen order (first 𝛼 with 𝐴, then 𝛽 with 𝐵, then the results with 𝛾) until only the single tensor 𝐴 · 𝐵 on modes 𝑖, 𝑗 remains; right, the corresponding execution tree with leaves 𝛼, 𝐴, 𝛽, 𝐵, and 𝛾.]   (1.2)



    The cost of performing one of the contractions in an execution is the product of the lengths
of the modes used by any tensor involved in the contraction. This simply measures (up to
a constant factor) the number of arithmetic operations (additions/multiplications) used to
compute the result by a direct, naïve computation that does not depend on the values of the
tensors. For example, the contraction of 𝛼 and 𝐴 in the first step of (1.2) has cost 28 because it
involves the three modes 𝑖′ (length 2), 𝑘 (length 2) and ℓ (length 7).
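In code, the cost measure is just a product of mode lengths; a minimal sketch (ours, with an illustrative function name):

```python
from math import prod

def contraction_cost(mode_lengths):
    # Cost of one contraction: the product of the lengths of all modes
    # used by any tensor involved in the contraction.
    return prod(mode_lengths)

# First step of (1.2): contracting alpha and A involves the modes
# i' (length 2), k (length 2), and l (length 7).
assert contraction_cost([2, 2, 7]) == 28
```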
    We observe that cost is data-oblivious—the tensor 𝛼 is fixed with many zero-entries but these
entries still contribute to the cost. Indeed, in many cases there may be faster ways of evaluating


a contraction than to evaluate it naively, and just like we saw above, this can often be dealt with
by rewriting the network appropriately. For instance, consider now the multiplication of two
2^𝑘 × 2^𝑘 matrices. Because the family of matrix multiplication tensors is closed under Kronecker
products, this operation may be computed by a tensor network like the one shown in (1.3)
(depicting the case 𝑘 = 5), where 𝛼, 𝛽 and 𝛾 are as in (1.2). The rows/columns of the matrices
are now indexed by 𝑘-tuples of bits. A cost-efficient execution of this network successively
contracts tensors in the order shown to the right in (1.3). In this execution, the first contraction
of 𝐴 with the first 𝛼 block has a cost of 2^𝑘 · 2^𝑘 · 7, and results in a tensor of size 2^{𝑘−1} × 2^{𝑘−1} × 7.
Then the contraction with the next 𝛼 block has a cost of 2^{𝑘−1} · 2^{𝑘−1} · 7^2 and produces a result of
size 2^{𝑘−2} × 2^{𝑘−2} × 7 × 7, and so on, until the contraction with the last 𝛼 block, which has a cost of
2 · 2 · 7^𝑘 = 𝑂(7^𝑘). The contractions with the 𝛽 and 𝛾 blocks behave similarly. Thus all the Θ(𝑘)
contractions in the execution have cost bounded by 𝑂(7^𝑘), meaning that we get a total running
time of 𝑂(𝑘 7^𝑘) = 𝑂(𝑁^{log₂ 7} log 𝑁) for 𝑁 × 𝑁 matrices.²
   ²In fact, a more careful analysis gives running time 𝑂(𝑁^{log₂ 7}).
[Figure (1.3): the tensor network for multiplying 2^5 × 2^5 matrices: 𝐴 on modes 𝑖′₁ … 𝑖′₅, 𝑘₁ … 𝑘₅ and 𝐵 on modes 𝑘′₁ … 𝑘′₅, 𝑗′₁ … 𝑗′₅ are joined through five 𝛼 blocks, five 𝛽 blocks, and five 𝛾 blocks on the output modes 𝑖₁ … 𝑖₅, 𝑗₁ … 𝑗₅, with internal modes ℓ₁, … , ℓ₅ of length 7; the contraction order is indicated on the right.]   (1.3)
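The step costs quoted above are easy to tabulate; the sketch below (ours, illustrative) lists the costs of the 𝑘 successive contractions of 𝐴 with the 𝛼 blocks and confirms that the largest is 4 · 7^𝑘.

```python
def alpha_chain_costs(k):
    # Cost of the j-th contraction of A with an alpha block: before it, the
    # intermediate result has shape 2^(k-j+1) x 2^(k-j+1) x 7^(j-1), and the
    # contraction touches modes of total length 2^(k-j+1) * 2^(k-j+1) * 7^j.
    return [(2 ** (k - j + 1)) ** 2 * 7 ** j for j in range(1, k + 1)]

costs = alpha_chain_costs(5)
assert costs[0] == 2 ** 5 * 2 ** 5 * 7   # the first contraction
assert max(costs) == 4 * 7 ** 5          # every step has cost O(7^k)
```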


    This type of argument can capture any algorithm based on a low-rank decomposition of
the underlying tensor of the multilinear map, and indeed, this enables 𝑂(𝑛^𝜔)-time³ matrix
multiplication using tensor networks. Beyond simple low-rank decompositions, which always
give rise to "star-like" networks as in (1.3), there are many interesting algorithms that can
be captured using networks with a more complicated topology. For instance, many maps of
substantial importance have a layered structure that decomposes the map into a sequence of
elementary maps. A canonical example is the discrete Fourier transform (DFT), which for a
smooth composite order such as 2^𝑘 can be decomposed into a fast Fourier transform (FFT) that
consists of a sequence of 𝑘 transforms of order 2 interleaved with diagonal-matrix multiplications
of twiddle factors [35, 97].
   ³Throughout the paper, we write 𝜔(ℎ) = 𝜔_𝔽(ℎ) for the infimum over all 𝑡 such that the arithmetic complexity of multiplying an 𝑛 × ⌊𝑛^ℎ⌋ matrix with an ⌊𝑛^ℎ⌋ × 𝑛 matrix is 𝑂(𝑛^𝑡), where ℎ > 0 is a constant. While the value of 𝜔(ℎ) may depend on the underlying field 𝔽, we tacitly ignore this, since the field is fixed throughout the paper. Also, we write simply 𝜔 = 𝜔(1) for the exponent of square matrix multiplication. For all fields it is known that 2 ≤ 𝜔 < 2.37286 [4, 69, 98]; for the state of the art on 𝜔(ℎ), see [70].
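As a reminder of this layered structure, here is a standard iterative radix-2 FFT over ℂ (a textbook sketch of ours, not code from the paper): log₂ 𝑁 layers of order-2 transforms interleaved with twiddle factors.

```python
import numpy as np

def fft_pow2(x):
    # Iterative radix-2 FFT of a length-2^k vector: after a bit-reversal
    # reordering, k layers of order-2 "butterflies" with twiddle factors.
    x = np.asarray(x, dtype=complex)
    n, b = len(x), len(x).bit_length() - 1
    x = x[[int(format(i, f'0{b}b')[::-1], 2) for i in range(n)]]
    m = 2
    while m <= n:
        w = np.exp(-2j * np.pi * np.arange(m // 2) / m)  # twiddle factors
        for s in range(0, n, m):
            a, t = x[s:s + m // 2], x[s + m // 2:s + m] * w
            x[s:s + m // 2], x[s + m // 2:s + m] = a + t, a - t
        m *= 2
    return x

v = np.random.default_rng(1).standard_normal(16)
assert np.allclose(fft_pow2(v), np.fft.fft(v))
```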

1.3    Our results
Starting with motivation (i) and seeking to express existing fastest-known algorithms as
executions of tensor networks by a sequence of contractions, we show upper bounds for a



number of natural problems. Beyond standard linear settings such as the FFT, not only do tensor
networks realize classical bilinear settings such as Abelian group algebra products and fast
matrix multiplication algorithms based on low tensor rank, they are powerful enough to capture
a substantial number of higher-linearity applications, including Ryser’s algorithm for matrix
permanent [89], and the Kruskal operator [60, 63] (see Section 3.5), which underlies realization of
rank-decompositions for tensor rank [61] and current fastest algorithms for detecting outlier
correlations [55].
    One problem for which tensor networks turn out to be particularly useful is counting
homomorphisms from a fixed pattern graph 𝑃 to a large host graph 𝐺 on 𝑛 vertices. The most
well-studied such problem is when 𝑃 is a 𝑘-clique. For this problem, the currently fastest
algorithms run in time roughly 𝑂(𝑛^{𝜔𝑘/3}) [75, 44]. For general 𝑃, it is known that the problem
can be solved in 𝑂(𝑛^{tw(𝑃)+1}) time [42], where tw(𝑃) is the treewidth of 𝑃. We show that tensor
networks can solve the problem in 𝑂(𝑛^{(𝜔+𝜖) bw(𝑃)/2}) time, where bw(𝑃) is the branchwidth of 𝑃.
When 𝑃 is a 𝑘-clique, we have bw(𝑃) = ⌈2𝑘/3⌉; if 𝜔 = 2, the running time coincides with that of
the currently fastest algorithms. On the other hand, if 𝜔 > 2, we can slightly improve upon this
to recover the currently fastest known running time of Eisenbrand and Grandoni [44] relying
on fast rectangular matrix multiplication. In the case of general 𝑃, the bound 𝑂(𝑛^{(𝜔+𝜖) bw(𝑃)/2})
improves on the treewidth-based bound for graphs with bw(𝑃) ≤ 2(tw(𝑃) + 1)/𝜔 (and in
particular if 𝜔 = 2 it is always as fast as the treewidth-based bound, ignoring the 𝜖). By recent
results of Curticapean, Dell, and Marx [37], fast algorithms for homomorphism-counting can
be used to obtain fast algorithms for counting subgraphs of 𝐺 isomorphic to 𝑃, and in some
cases our new branchwidth-based bound leads to an improvement; for example, for counting
paths of length 7, 8 or 9, we get a running time of 𝑂(𝑛^{3𝜔/2+𝜖}) < 𝑂(𝑛^{3.56}), compared to 𝑂(𝑛^4)
using the treewidth-based bound, whereas for very long paths it is not clear whether we would
need 𝜔 = 2 in order for this bound to improve on the treewidth-based bound. Previous work
that combines branch decompositions and fast matrix multiplication includes Dorn [43] and
Bodlaender et al. [21].
    Further applications captured by tensor networks are the set covering and set partitioning
frameworks via fast zeta and Möbius transforms that underlie the current fastest algorithms
for graph coloring [19] and its generalizations such as computing the Tutte polynomial [16, 17].
To summarize, we have the following compendium of upper bound results. For the detailed
definitions of the relevant multilinear maps, see Sections 3 and 5.
Theorem 1.1. We have the following upper bounds on arithmetic complexity via tensor networks.

   1. 𝑂(𝑛^{𝜔(ℎ)+𝜖}) for the multiplication of matrices of shape 𝑛 × ⌊𝑛^ℎ⌋ and ⌊𝑛^ℎ⌋ × 𝑛, where ℎ > 0 is a
      constant.

   2. 𝑂(𝑛^{(𝜔+𝜖) bw(𝑃)/2}) for counting homomorphisms of a fixed pattern hypergraph 𝑃 into a hypergraph
      on 𝑛 vertices.

   3. 𝑂(𝑛^{𝛽+𝜖}) for counting 𝑣-cliques in an 𝑛-vertex graph, where 𝛽 is the exponent of the complexity of
      multiplying matrices of shape 𝑛^{⌊𝑣/3⌋} × 𝑛^{⌊(𝑣+1)/3⌋} and 𝑛^{⌊(𝑣+1)/3⌋} × 𝑛^{⌊(𝑣+2)/3⌋}.

   4. 𝑂(max(𝑛^{⌈ℓ/2⌉(𝜔+𝜖−1)} 𝑟, 𝑛^{2⌈ℓ/2⌉} 𝑟^{𝜔+𝜖−2})) for the Kruskal operator of ℓ matrices of shape 𝑛 × 𝑟.

   5. 𝑂(2^𝑘 𝑘) for the discrete Fourier transforms for the Abelian groups ℤ_{2^𝑘} and ℤ_2^𝑘.

   6. 𝑂(2^𝑘 𝑘) for group algebra products on 𝔽[ℤ_{2^𝑘}] and 𝔽[ℤ_2^𝑘] when 2 is a unit in 𝔽.

   7. 𝑂(2^𝑘 𝑘) for the semigroup algebra product on 𝔽[({0, 1}^𝑘, ⊆, ∩, ∪)].

   8. 𝑂(2^𝑛 𝑛) for the permanent of an 𝑛 × 𝑛 matrix.

Above 𝜖 > 0 is an arbitrary constant.

    Perhaps the most interesting application above is the 𝑣-clique problem, which suggests
that one should seek to pursue generalizations to 𝑣-vertex hypercliques with $\binom{v}{k}$ hyperedges,
𝑣 > 𝑘 ≥ 3. Indeed, subgraph counting is a problem that has received substantial interest over the
years (e. g., [53, 75, 6, 5, 44, 18, 20, 99, 100, 46, 45, 77, 59, 37]), but progress in the particular case
of 𝑣-clique has been stuck to the extent that the problem has attracted notoriety as a hardness
assumption in fine-grained complexity [1, 2]. Beyond the study of cliques, hypercliques, and
subgraph counting, nontrivial algorithms for such forms would have immediate applicability, for
example, in the study of maximum constraint satisfaction problems (Max-CSP) for constraints
of width 𝑘 ≥ 3; see Williams [102] for the case 𝑘 = 2. One of the main goals of our subsequent
lower bounds is to rule out tensor networks as candidates to yield improved algorithms in this
setting.
    Turning from upper bounds to lower bounds and motivation (ii), tensor networks are
restricted enough to enable nontrivial lower bounds for many multilinear maps. To begin
with, an immediate limitation of tensor networks is that all the intermediate results during the
execution of a network are multilinear, and the execution of a network can be simulated by a
multilinear circuit. Raz [84] shows that multilinear formulas cannot compute the determinant
of an 𝑛 × 𝑛 matrix in a polynomial number of operations in 𝑛, even though polynomial-size
general circuits are known for the determinant (see [13, 22, 36, 88]).
    It turns out that considerably stronger lower bounds can be shown for tensor networks.
In particular, we establish lower bounds for arithmetic complexity via tensor networks of
𝑃-homomorphism counting and the Kruskal operator. These lower bounds are tight under the
assumption 𝜔 = 2. Furthermore, we rule out the possibility of any nontrivial algorithm designs
via tensor networks for counting cliques in hypergraphs. The following theorem collects our
main lower-bound results, and should be contrasted with the upper bounds in Theorem 1.1.

Theorem 1.2. We have the following lower bounds on arithmetic complexity via tensor networks.

   1. Ω(𝑛^{bw(𝑃)}) for the multilinear form corresponding to 𝑃-homomorphism counting. In particular,
      this yields a lower bound of Ω(𝑛^{⌈2𝑣/3⌉}) for counting cliques of size 𝑣, and a lower bound of Ω(𝑛^𝑣)
      for counting hypercliques of size 𝑣.

   2. Ω(max(𝑛^ℓ, 𝑛^{⌈ℓ/2⌉} 𝑟)) for the Kruskal operator of ℓ matrices of shape 𝑛 × 𝑟.

   3. $\Omega\bigl(\binom{n}{n/3}\bigr)$ for the determinant and the permanent of an 𝑛 × 𝑛 matrix.


    We remark that [72] independently showed that the rank of the 𝑣-hyperclique tensor is
Ω(𝑛^𝑣); our Ω(𝑛^𝑣) lower bound for tensor networks strengthens that. One may wonder about
the gap between the bounds of $\Omega\bigl(\binom{n}{n/3}\bigr)$ and 𝑂(2^𝑛 𝑛) for the permanent. As we explain below,
our lower bound methods are inherently rank-based and cannot go beyond $\binom{n}{n/3}$. A curious
point is that it is not immediately clear whether tensor networks can even achieve 𝑂*(2^𝑛) time
for the determinant, and we do not know whether this is possible.

1.4   Overview of proof ideas
As a running example in this overview, we consider the 6-linear form 𝐴 : 𝔽^{[𝑛]×[𝑛]} × 𝔽^{[𝑛]×[𝑛]} ×
⋯ × 𝔽^{[𝑛]×[𝑛]} → 𝔽 that takes as input 6 matrices of size 𝑛 × 𝑛 and is defined by the equation
$$A(X^{(1)}, X^{(2)}, X^{(3)}, X^{(4)}, X^{(5)}, X^{(6)}) = \sum_{i,j,k,\ell \in [n]} X^{(1)}_{ij} X^{(2)}_{ik} X^{(3)}_{i\ell} X^{(4)}_{jk} X^{(5)}_{j\ell} X^{(6)}_{k\ell}. \tag{1.4}$$

If 𝜒 is the adjacency matrix of a loopless graph 𝐺, then 𝐴(𝜒, 𝜒, 𝜒, 𝜒, 𝜒, 𝜒) counts the 4-cliques
in the graph. Associated with 𝐴 is the 6-tensor 𝑇(𝐴) of size 𝑛^2 × 𝑛^2 × ⋯ × 𝑛^2, where each of the
6 modes is indexed by a pair (𝑖, 𝑗) ∈ [𝑛] × [𝑛], and the value at a given position is the coefficient
of the corresponding term in 𝐴. Concretely,
$$T(A)_{i_1 j_1,\, i_2 k_2,\, i_3 \ell_3,\, j_4 k_4,\, j_5 \ell_5,\, k_6 \ell_6} = \begin{cases} 1 & \text{if } i_1 = i_2 = i_3 \text{ and } j_1 = j_4 = j_5 \text{ and } k_2 = k_4 = k_6 \text{ and } \ell_3 = \ell_5 = \ell_6, \\ 0 & \text{otherwise.} \end{cases}$$
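For intuition, the form (1.4) is a single sum–product expression and can be evaluated directly; a numpy sketch (ours) checks the 4-clique count on the complete graph 𝐾₅ (each 4-clique is picked up once per ordered 4-tuple of its vertices, i.e., 4! = 24 times).

```python
import numpy as np

def clique4_form(X):
    # The 6-linear form (1.4) with all six inputs set to the same matrix X.
    return np.einsum('ij,ik,il,jk,jl,kl->', X, X, X, X, X, X)

# Adjacency matrix of the complete (loopless) graph K5.
n = 5
chi = np.ones((n, n), dtype=np.int64) - np.eye(n, dtype=np.int64)

# K5 contains C(5,4) = 5 copies of K4, each counted 24 times.
assert clique4_form(chi) == 24 * 5
```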
                                                              

Upper bounds Most, but not all, of the families of multilinear maps we consider are closed
under taking Kronecker products. For instance, consider the 4-clique counting form (1.4) for an
𝑛-vertex graph and its associated tensor 𝑇(𝐴). Then for any 𝑘 ≥ 1, the tensor associated with
the 4-clique counting form in 𝑛^𝑘-vertex graphs is 𝑇(𝐴)^{⊗𝑘}, the 𝑘-fold Kronecker product of 𝑇(𝐴)
with itself. We write 𝐴^{⊗𝑘} for the associated map. With this in mind, it is natural to seek general
constructions that, given an efficient evaluation of some map 𝐴, yield an efficient evaluation of
𝐴^{⊗𝑘}.
    We give such a construction, and show that the cost of the best tensor network execution for
𝐴^{⊗𝑘} is essentially submultiplicative in a quantity that we call the amortized cost of an execution.
For tensors of order at most 3, the notion of amortized cost essentially captures the rank of
𝑇(𝐴), but for higher-order tensors, the amortized cost may be significantly smaller than the rank.
Roughly speaking, the amortized cost of a step in an execution of a map 𝐴 is: (i) equal to the
normal cost if the operation involves the contraction of two tensors that both depend on some
input variables of 𝐴, but (ii) equal to the size of the result if only one of the tensors involved in
the contraction depends on the input variables of 𝐴. A precise definition appears in Section 5.1.
Our general upper bound for the cost of 𝐴^{⊗𝑘} can, somewhat informally, be stated as follows.

Theorem 1.3 (Submultiplicativity of cost, informal statement). If a multilinear map 𝐴 has a tensor
network execution consisting of 𝑠 steps, each with cost at most 𝑐 and amortized cost at most 𝑎, then 𝐴^{⊗𝑘}
has a tensor network execution consisting of at most 𝑘 · 𝑠 steps, each with cost at most 𝑎^{𝑘−1} · 𝑐.


    An immediate corollary of this is that we can capture any algorithm for 𝐴^{⊗𝑘} based on a
low-rank decomposition of 𝑇(𝐴) (Corollary 5.2). For example, this implies that tensor networks
can multiply 𝑛 × 𝑛 matrices in 𝑂(𝑛^{𝜔+𝜖}) time (Section 5.2).
    However, returning to our running example form (1.4), as we explain below the tensor 𝑇(𝐴)
has rank 𝑛^4, meaning that Corollary 5.2 only yields a trivial upper bound. This is where the
full generality of Theorem 1.3 comes in. Consider the form (1.4) for graphs on some constant
number 𝑛₀ of vertices. As it turns out, we can design a network and an associated execution for
this form, depicted in (1.5), with an execution of cost 𝑛₀^{2𝑒+3} and amortized cost 𝑛₀^{𝑒+1}, where 𝑛₀^𝑒 is
the rank of the tensor associated with 𝑛₀ × 𝑛₀ matrix multiplication. Picking 𝑛₀ to be a large
enough constant so that 𝑒 is approximately 𝜔, and letting 𝑘 be such that 𝑛 is approximately 𝑛₀^𝑘,
we obtain via Theorem 1.3 an 𝑂(𝑛^{𝜔+𝜖+1}) time upper bound for (1.4).⁴
    ⁴This bound is the running time of the algorithm of Nešetřil and Poljak [75] for counting 4-cliques, which if 𝜔 > 2 is slightly worse than the running time of Theorem 1.1.

[Figure (1.5): a tensor network and execution for the form (1.4), with the six input tensors 𝑋^{(1)}, 𝑋^{(3)}, 𝑋^{(2)}, 𝑋^{(6)}, 𝑋^{(4)}, 𝑋^{(5)} attached at the mode pairs 𝑖₁𝑗₁, 𝑖₃ℓ₃, 𝑖₂𝑘₂, 𝑘₆ℓ₆, 𝑗₄𝑘₄, 𝑗₅ℓ₅.]   (1.5)




Lower bounds Just like many other arithmetic complexity lower bounds, our lower bounds
boil down to establishing lower bounds on the rank of certain matrices.
    In order to establish a lower bound on the rank of 𝑇(𝐴), we flatten it to a matrix and analyze
the rank of that matrix. There are 2^5 possible bipartitions of the 6 modes of 𝑇(𝐴) into two
non-empty subsets, and the lower bound on the rank of 𝑇(𝐴) that we obtain is the maximum of
the ranks of the resulting matrices. Using this method, it is easy to establish that for our example
from (1.4), the rank of 𝑇(𝐴) is 𝑛^4. That this is an upper bound follows from (1.4), and that it is a
lower bound follows by considering the bipartition taking variables 𝑋^{(1)} and 𝑋^{(6)} as row indices,
and the other four variables as column indices. The resulting 𝑛^4 × 𝑛^8 matrix has full rank.
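This flattening computation is mechanical; the sketch below (ours) materializes 𝑇(𝐴) for a small 𝑛, flattens it along the bipartition {𝑋^{(1)}, 𝑋^{(6)}} versus the rest, and verifies full row rank 𝑛^4.

```python
import numpy as np
from itertools import product

n = 3
N = n * n
T = np.zeros((N,) * 6)
pair = lambda a, b: a * n + b   # position of (a, b) within a mode of length n^2
# One 1-entry of T(A) per term of (1.4).
for i, j, k, l in product(range(n), repeat=4):
    T[pair(i, j), pair(i, k), pair(i, l), pair(j, k), pair(j, l), pair(k, l)] = 1

# Bipartition {X^(1), X^(6)} | rest: bring modes 0 and 5 to the front, reshape.
M = np.transpose(T, (0, 5, 1, 2, 3, 4)).reshape(N * N, N ** 4)
assert np.linalg.matrix_rank(M) == n ** 4   # the n^4 x n^8 flattening has rank n^4
```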
    Tensor networks are more versatile and can be more efficient than low-rank decompositions
of 𝑇(𝐴). Nevertheless, we show limitations on this versatility. In particular we show that every
tensor network execution for 𝐴 induces a tree in which the leaves are the inputs of 𝐴 and all
internal vertices have degree 3. We call this a socket tree. Each edge in a socket tree induces a
bipartition of the variables and our key technical lemma is to show that for each such bipartition,
the rank of the corresponding flattening of 𝑇(𝐴) is a lower bound on the cost of the execution
that gave rise to the tree. Thus, to obtain a lower bound for the cost of a specific execution,
we consider the maximum rank obtained over all edges of the corresponding socket tree, and
to lower bound the cost of every tensor network execution, we minimize this quantity over



all possible socket trees. We refer to the resulting quantity as the socket width of 𝐴, denoted
𝑤(𝐴) (formal definition appears in Section 6). Our general lower bound can thus be phrased as
follows, where 𝑐(𝐴) denotes the minimum cost of a tensor network for evaluating 𝐴 (formal
definition appears in Section 4.5).
Theorem 1.4. For every multilinear map 𝐴, it holds that 𝑐(𝐴) ≥ 𝑤(𝐴).
    Indeed, for our running example (1.4), there are low-width socket trees establishing 𝑤(𝐴) ≤
𝑛^3, see (1.6). However, that bound is tight: our Ω(𝑛^{⌈2·4/3⌉}) = Ω(𝑛^3) lower bound for the $\binom{4}{2}$-linear
form (Theorem 1.2) is obtained by proving 𝑤(𝐴) ≥ 𝑛^3 (Lemma 7.4) and appealing to Theorem 1.4.
[Figure (1.6): a low-width socket tree for the form (1.4), with six leaves labeled by the mode pairs (𝑖₁, 𝑗₁), (𝑖₃, ℓ₃), (𝑖₂, 𝑘₂), (𝑘₆, ℓ₆), (𝑗₄, 𝑘₄), (𝑗₅, ℓ₅).]   (1.6)



1.5   Earlier and related work
We now proceed to a more detailed discussion of earlier work.

Tensor networks The history of tensor networks (or, alternatively, tensor diagrams or diagrams) as
an analytical and computational tool goes back to the 19th century, to the work of Cayley [27, 28],
Clebsch [31], Clifford [32], Sylvester [94], and Kempe [57, 58]. The diagrammatic form used
here can be traced back to Penrose [81]. Some early appearances of tensor diagrams are by
Cvitanović [38], Cvitanović and Kennedy [39], Kuperberg [64], and many others. Surveys of
tensor diagrams are given by Penrose and Rindler [82] and by Landsberg [66]. Schrijver [90]
provides a brief historical account.
    Deviating from Penrose’s notation, we work relative to a fixed basis and a corresponding
dual basis in each relevant vector space to avoid distinction between primal and dual spaces.
This in particular enables a concise treatment of hyperedges and saves us from considering
orientation of edges, or the planar layout of edges in a drawing. That is, we will view a tensor
network combinatorially as a hypergraph with tensors associated at the vertices, and with a
subset of the hyperedges designated to form the boundary of the network (see Section 4 for the
precise definitions). A yet further conceptual difference is that we view the execution of a tensor
network as a sequence of contractions of sets of vertices (tensors) rather than as contractions of
hyperedges (modes). This choice enables us to reduce the size of hyperedges gradually before
eliminating them during an execution, thus enabling better granularity. For simplicity, we will
restrict to a purely multilinear framework and will not consider sums of networks although
such a study would be possible, and is pursued, e. g., in Penrose’s paper [81].
    A large body of existing work in applications (see Section 1.2) studies how to efficiently
execute a given tensor network 𝐷. Our quest in this paper differs from such investigations in
that we take a multilinear map 𝐴, and seek to design the most efficient network 𝐷 that realizes 𝐴, or
to establish lower bounds for best-possible designs. In particular, our upper and lower bounds
in Theorem 1.1 and Theorem 1.2 are over all tensor networks that realize a particular map 𝐴 of
interest.


Computational problems on tensors and tensor networks Problems on given tensors and
tensor networks are known to be mostly computationally hard as soon as the setting changes
from matrices to higher-order tensors. Håstad [48] showed that computing the rank of an
explicitly given ℓ -tensor is NP-complete over finite fields and NP-hard over rationals whenever
ℓ ≥ 3. Hillar and Lim extended the latter result to any field containing ℚ [49]. They also showed
that many other tensor problems such as the eigenvalue, singular value and spectral norm
decision and approximation problems as well as the rank-1 approximation problem for 3-tensors
(over ℝ and in some cases over ℂ) are NP-hard.
   The task of finding the value of a given scalar-valued tensor network is known to be
#P-complete (see, e. g., [10, 14]). Similarly, it is NP-hard to find the most efficient sequence of
contractions for a given network [65, 83].

Tensor networks in applications Beyond our present use of tensor networks as a model of
computation to efficiently evaluate multilinear maps, tensor networks have been used across a
broad range of applications. Accordingly, the following should be viewed as mere pointers to
further literature on tensor networks, not as an exhaustive listing of all applications of tensor
networks. Orús [78] gives an introduction to tensor networks in the context of computational
physics. Arad and Landau [7] study quantum computation and quantum algorithms for evaluating
tensor networks. Solomonik and Hoefler study sparse tensor algebra as a model for parallel
programming [91]. The Holant framework introduced by Valiant [96] and studied further by
Cai et al. [25, 24] involves the study of multilinear sum–product expressions that can be viewed
as tensor networks; for the reader familiar with the Holant framework, the connection with
tensor networks is explained in Appendix A.3. Tensor networks appear naturally in the study of
probabilistic graphical models [62, 74, 87, 101], and in the study of various machine-learning
problems [29, 30].

Bilinear and multilinear complexity As was concisely outlined in Section 1.4, for bilinear
maps our present work is captured by the study of the tensor rank of 3-tensors and an extensive
body of work on bilinear complexity, with the arithmetic complexity of the matrix multiplication
map as the primary driver. For two starting points to this literature, we refer to the monograph
by Bürgisser, Clausen, and Shokrollahi [23] and to the survey by Pan [79]. Our present work can
be seen as generalizing this bilinear theory to higher orders of linearity via tensor networks and
their executions. The current state of the art for fast matrix multiplication is due to Alman and
Vassilevska Williams [4], Le Gall [68, 69], Le Gall and Urrutia [70], Vassilevska Williams [98],
Cohn and Umans [34], Cohn, Kleinberg, Szegedy, and Umans [33], Benson and Ballard [12], and
Huang, Rice, Matthews, and van de Geijn [50].

1.6   Organization of this paper
Section 2 recalls preliminaries on tensors, multilinear maps, and branchwidth. Section 3 reviews
the specific multilinear maps that we study in this paper and describes the associated tensor for
each map. In Section 4, tensor networks, execution and cost of a tensor network, and cost of a


multilinear map are defined. Section 5 presents tensor-network algorithms for the multilinear
maps introduced in Section 3. In Section 6, a general lower bound on the cost of evaluating a
multilinear map using a tensor network is obtained. The lower bound is expressed in terms of
the socket-width of a multilinear map. In Section 7, lower bounds on socket-width for concrete
multilinear maps studied in Sections 3 and 5 are derived. Finally, Appendix A gives some
background results on minimum-cost executions.


2     Preliminaries
Throughout the paper [𝑛] denotes {1, 2, . . . , 𝑛} and 𝔽 denotes an arbitrary fixed field.


2.1   Tensors
This section sets up our notation for tensors and multilinear maps. We work with tensors and
multilinear maps relative to fixed bases for the respective vector spaces over 𝔽 .

Modes, indexing, and positions We will work with the following convention of positioning
individual entries inside a tensor. Let 𝐸 be a finite set of modes. Associate with each mode 𝑒 ∈ 𝐸
a finite nonempty index set 𝐽(𝑒). In this case we say that 𝐸 is a set of indexed modes. The length of
𝑒 is |𝐽(𝑒)|. A position is an element 𝑗 ∈ ∏_{𝑒∈𝐸} 𝐽(𝑒). Let us write 𝐽(𝐸) = ∏_{𝑒∈𝐸} 𝐽(𝑒) for the set of all
positions with respect to the indexed modes 𝐸. In the special case that the set 𝐸 of modes is
empty we define the set 𝐽(𝐸) of positions to consist of a single element.
    For example, the 3-tensor 𝛼 shown on the right in (1.1) involves the set 𝐸 = {𝑖′, 𝑘, ℓ}
of modes where 𝐽(𝑖′) = [2], 𝐽(𝑘) = [2], and 𝐽(ℓ) = [7], and a position in the tensor is an element of
𝐽(𝐸) = [2] × [2] × [7].
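This convention maps directly onto code; a small sketch (ours) enumerating the positions of 𝛼:

```python
from itertools import product

# The modes of the 3-tensor alpha from (1.1) and their index sets.
J = {"i'": range(1, 3), "k": range(1, 3), "l": range(1, 8)}

positions = list(product(*J.values()))   # J(E) = [2] x [2] x [7]
assert len(positions) == 2 * 2 * 7       # 28 positions in total
```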

Tensors, matrices, vectors, and scalars Let 𝐸 be a set of indexed modes. A tensor with respect
to 𝐸 is a mapping 𝑇 : 𝐽(𝐸) → 𝔽. Equivalently, we write 𝑇 ∈ 𝔽^{𝐽(𝐸)} to indicate that 𝑇 is a tensor
with respect to the indexed set 𝐸 of modes. We view the set 𝔽^{𝐽(𝐸)} of all tensors over 𝐸 as a vector
space over 𝔽 with addition and scalar multiplication of tensors defined entrywise. We say that
|𝐸| is the order of the tensor. A tensor of order zero is called a scalar, a tensor of order one is
called a vector, and a tensor of order two is called a matrix. The volume of the tensor is |𝐽(𝐸)|. The
tuple (|𝐽(𝑒)| : 𝑒 ∈ 𝐸) is the shape of the tensor. It is convenient to use the "×" symbol to highlight
the shape of a tensor; that is, instead of writing, say, (2, 3, 4) for the shape, we write 2 × 3 × 4. For
a position 𝑗 ∈ 𝐽(𝐸) and a tensor 𝑇 ∈ 𝔽^{𝐽(𝐸)}, we say that 𝑇_𝑗 ∈ 𝔽 is the entry of 𝑇 at 𝑗.
     A flattening of 𝑇 induced by a bipartition 𝐸₁ ∪ 𝐸₂ = 𝐸 of the modes of 𝑇 is a |𝐽(𝐸₁)| × |𝐽(𝐸₂)|
matrix 𝑀 where, for 𝑗₁ ∈ 𝐽(𝐸₁) and 𝑗₂ ∈ 𝐽(𝐸₂), we have 𝑀_{𝑗₁,𝑗₂} = 𝑇_{𝑗₁𝑗₂}. Given two order-ℓ
tensors 𝑆 ∈ 𝔽^{[𝑛₁]×[𝑛₂]×⋯×[𝑛_ℓ]} and 𝑇 ∈ 𝔽^{[𝑚₁]×[𝑚₂]×⋯×[𝑚_ℓ]}, their Kronecker product 𝑆 ⊗ 𝑇 is a tensor in
𝔽^{[𝑛₁𝑚₁]×[𝑛₂𝑚₂]×⋯×[𝑛_ℓ𝑚_ℓ]} defined by
$$(S \otimes T)_{m_1(i_1-1)+j_1,\, m_2(i_2-1)+j_2,\, \ldots,\, m_\ell(i_\ell-1)+j_\ell} = S_{i_1,i_2,\ldots,i_\ell}\, T_{j_1,j_2,\ldots,j_\ell}.$$


    For example, the tensor 𝛼 shown on the right in (1.1) has order three, volume 28, and shape
(2, 2, 7) or 2 × 2 × 7. The flattening of 𝛼 induced by the bipartition {𝑖′, 𝑘} ∪ {ℓ} is the 4 × 7 matrix
obtained by taking the four rows of the two 2 × 7 matrices shown. The Kronecker product 𝛼 ⊗ 𝛼
of 𝛼 with itself is an order-3 tensor of shape 4 × 4 × 49.
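Both operations correspond to standard numpy primitives; a sketch (ours): a flattening is a reshape after grouping the modes, and np.kron on arrays of equal order implements exactly the index formula above (0-indexed).

```python
import numpy as np

rng = np.random.default_rng(2)
alpha = rng.integers(-1, 2, size=(2, 2, 7))  # modes i', k, l

# Flattening induced by the bipartition {i', k} | {l}: a 4 x 7 matrix.
M = alpha.reshape(2 * 2, 7)
assert M.shape == (4, 7)

# Kronecker product with itself: an order-3 tensor of shape 4 x 4 x 49.
K = np.kron(alpha, alpha)
assert K.shape == (4, 4, 49)
i, j = (1, 0, 3), (0, 1, 5)
assert K[2 * i[0] + j[0], 2 * i[1] + j[1], 7 * i[2] + j[2]] == alpha[i] * alpha[j]
```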


2.2    Multilinear maps
Let 𝐸₁, 𝐸₂, . . . , 𝐸_ℓ, 𝐸₀ be pairwise disjoint sets of indexed modes such that 𝐸₁, 𝐸₂, . . . , 𝐸_ℓ are
nonempty. We say that a map 𝐴 : 𝔽^{𝐽(𝐸₁)} × 𝔽^{𝐽(𝐸₂)} × ⋯ × 𝔽^{𝐽(𝐸_ℓ)} → 𝔽^{𝐽(𝐸₀)} is an ℓ-linear map if 𝐴 is
linear with respect to each of its ℓ inputs individually when the other inputs remain fixed. In
particular, a 1-linear map is a linear map. A multilinear map that gives a scalar output is a
multilinear form. In particular, 𝐴 is a form if and only if 𝐸₀ is empty.


The tensors of a multilinear map For an ℓ-linear map 𝐴 : 𝔽^{𝐽(𝐸₁)} × 𝔽^{𝐽(𝐸₂)} × ⋯ × 𝔽^{𝐽(𝐸_ℓ)} →
𝔽^{𝐽(𝐸₀)}, we define two slightly different tensors 𝑇(𝐴) and 𝑇̂(𝐴). The tensor 𝑇(𝐴) is indexed by
𝐽(𝐸₁) × 𝐽(𝐸₂) × ⋯ × 𝐽(𝐸_ℓ) × 𝐽(𝐸₀), and the tensor 𝑇̂(𝐴) is indexed by 𝐽(𝐸₁ ∪ 𝐸₂ ∪ ⋯ ∪ 𝐸_ℓ ∪ 𝐸₀). At
a position (𝑗₁, 𝑗₂, . . . , 𝑗_ℓ, 𝑗′) of 𝑇(𝐴) and the corresponding position 𝑗₁𝑗₂ ⋯ 𝑗_ℓ 𝑗′ of 𝑇̂(𝐴) these take
the value
$$T(A)_{(j_1, j_2, \ldots, j_\ell, j')} = \hat{T}(A)_{j_1 j_2 \cdots j_\ell j'} = A\bigl(e^{(j_1)}, e^{(j_2)}, \ldots, e^{(j_\ell)}\bigr)_{j'},$$
where 𝑒^{(𝑗ᵢ)} ∈ 𝔽^{𝐽(𝐸ᵢ)} denotes the tensor with a 1 in position 𝑗ᵢ and 0s in all other positions. The
only difference between 𝑇(𝐴) and 𝑇̂(𝐴) is their shape. The shape of 𝑇(𝐴) is |𝐽(𝐸₁)| × |𝐽(𝐸₂)| ×
⋯ × |𝐽(𝐸_ℓ)| × |𝐽(𝐸₀)|, except if 𝐴 is a form in which case the |𝐽(𝐸₀)| part is omitted. Thus 𝑇(𝐴) is
of order ℓ + 1 (or ℓ if 𝐴 is a form). The shape of 𝑇̂(𝐴) is $\bigtimes_{e \in E_i,\, i \in [\ell]} |J(e)| \times \bigtimes_{e' \in E_0} |J(e')|$, thus its
order is |𝐸₁| + |𝐸₂| + ⋯ + |𝐸_ℓ| + |𝐸₀|.
      For example, consider the matrix multiplication map 𝐴 : 𝔽^{[𝑛]×[𝑛]} × 𝔽^{[𝑛]×[𝑛]} → 𝔽^{[𝑛]×[𝑛]}, i. e., for
two 𝑛 × 𝑛 matrices 𝑋 and 𝑌, 𝐴(𝑋, 𝑌) = 𝑋 · 𝑌. The tensor 𝑇̂(𝐴) is a 6-tensor where each position
is a 6-tuple (𝑖₁, 𝑘₁, 𝑘₂, 𝑗₁, 𝑖₂, 𝑗₂) ∈ [𝑛]^6, and 𝑇(𝐴) is a 3-tensor where each position is a triple
((𝑖₁, 𝑘₁), (𝑘₂, 𝑗₁), (𝑖₂, 𝑗₂)) ∈ ([𝑛] × [𝑛])^3. The value of 𝑇̂(𝐴) at position (𝑖₁, 𝑘₁, 𝑘₂, 𝑗₁, 𝑖₂, 𝑗₂) (and of
𝑇(𝐴) at its corresponding position) is the value at position (𝑖₂, 𝑗₂) of 𝐴(𝑒^{(𝑖₁𝑘₁)}, 𝑒^{(𝑘₂𝑗₁)}) = 𝑒^{(𝑖₁𝑘₁)} · 𝑒^{(𝑘₂𝑗₁)},
where 𝑒^{(𝑎𝑏)} is the 𝑛 × 𝑛 matrix with a 1 in row 𝑎, column 𝑏 and zeroes elsewhere. This is 1 if
𝑖₁ = 𝑖₂, 𝑗₁ = 𝑗₂ and 𝑘₁ = 𝑘₂, and 0 otherwise.
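The definition can be checked by brute force: probe 𝐴 on all pairs of basis matrices. A sketch (ours) for 𝑛 = 2:

```python
import numpy as np
from itertools import product

n = 2
A = lambda X, Y: X @ Y                    # the matrix multiplication map

def basis(a, b):
    E = np.zeros((n, n))
    E[a, b] = 1                           # e^(ab): 1 in row a, column b
    return E

T_hat = np.zeros((n,) * 6)
for i1, k1, k2, j1, i2, j2 in product(range(n), repeat=6):
    T_hat[i1, k1, k2, j1, i2, j2] = A(basis(i1, k1), basis(k2, j1))[i2, j2]
    # Entry is 1 iff i1 = i2, j1 = j2 and k1 = k2.
    assert T_hat[i1, k1, k2, j1, i2, j2] == (i1 == i2 and j1 == j2 and k1 == k2)
```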
      In other words, each mode of 𝑇(𝐴) corresponds to one of the inputs of 𝐴, or the output.
These inputs are in turn sets of indexed modes so may contain more "fine-grained" structure, but
this information is lost at the level of granularity of 𝑇(𝐴). When working with tensor networks
for evaluating 𝐴, we need to keep track of the fine-grained mode structure because this is in
many cases what allows us to construct efficient algorithms, hence in most parts of the paper we
are more interested in the tensor 𝑇̂(𝐴), which contains this fine-grained structure.
      On the other hand, 𝑇̂(𝐴) does not contain information about which modes are grouped
together to form the inputs and output of 𝐴, and this information is also important. This leads
us to the notion of sockets, defined next.


Sockets Let us study the tensor 𝑇̂(𝐴) with respect to the map 𝐴. We say that the modes in
𝐸₁ ∪ 𝐸₂ ∪ ⋯ ∪ 𝐸_ℓ are the input modes of 𝑇̂(𝐴), and the modes in 𝐸₀ are the output modes of
𝑇̂(𝐴) with respect to 𝐴. Let us say that 𝐸₁, . . . , 𝐸_ℓ are the input sockets of 𝑇̂(𝐴) with respect to
𝐴. Similarly, 𝐸₀ is the output socket of 𝑇̂(𝐴) with respect to 𝐴. In particular, the output socket is
empty if and only if 𝐴 is a form. To describe a socketing of the modes of a tensor, it is convenient
to use parentheses to group a shape of a tensor into sockets, see also Section 2.1. Two multilinear
maps, 𝐴₁ and 𝐴₂, may have the same base tensor 𝑇̂(𝐴₁) = 𝑇̂(𝐴₂), and from a tensor 𝑇̂ one may
obtain different multilinear maps by varying how the modes of 𝑇̂ are assigned to input and
output sockets.

The form of a multilinear map Let 𝐴 be a multilinear map with a nonempty output socket.
We can turn 𝐴 into a multilinear form 𝐹(𝐴) by turning its output socket into an input socket. Let
us say that 𝐹(𝐴) is the multilinear form of 𝐴. We also set 𝐹(𝐴) = 𝐴 when 𝐴 is a multilinear form.
    For example, the form of the 𝑛 × 𝑛 square-matrix multiplication map is the form
$F(X, Y, Z) = \sum_{i \in [n]} \sum_{j \in [n]} \sum_{k \in [n]} X_{ij} Y_{jk} Z_{ik}$.


2.3    Branch decompositions and branchwidth
A branch decomposition of a (hyper)graph 𝐺 consists of (i) a tree 𝑇 whose every vertex has degree
either one or three, and (ii) a bijection 𝜋 between the (hyper)edge set of 𝐺 and the set of vertices
of degree one in 𝑇. Deleting an edge 𝑒 ∈ 𝐸(𝑇) from 𝑇 partitions 𝑇 into two subtrees, 𝑇1 and 𝑇2 ,
which via 𝜋 give rise to a partition of the (hyper)edges of 𝐺 into two sets, 𝐸1 and 𝐸2 . The width
𝑤(𝑒) of the partition induced by 𝑒 is the number of vertices of 𝐺 that are incident to at least one
(hyper)edge in 𝐸1 and at least one (hyper)edge in 𝐸2 . The width of the branch decomposition
(𝑇, 𝜋) is the maximum of the widths 𝑤(𝑒) for 𝑒 ∈ 𝐸(𝑇). The branchwidth bw(𝐺) of 𝐺 is the
minimum width of a branch decomposition of 𝐺.
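For concreteness, the width of a given branch decomposition can be computed directly from the definition; a sketch (ours, with illustrative names), representing each (hyper)edge of 𝐺 as a set of its vertices:

```python
def decomposition_width(tree_edges, leaf_edge):
    # tree_edges: edges of a tree whose vertices have degree one or three;
    # leaf_edge: bijection from the degree-one tree vertices to the
    # (hyper)edges of G, each given as a set of vertices of G.
    adj = {}
    for a, b in tree_edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    def vertices_beyond(a, b):   # G-vertices on b's side of the tree edge (a, b)
        stack, seen, out = [b], {a, b}, set()
        while stack:
            v = stack.pop()
            out |= leaf_edge.get(v, set())
            stack.extend(u for u in adj[v] - seen)
            seen |= adj[v]
        return out

    return max(len(vertices_beyond(a, b) & vertices_beyond(b, a))
               for a, b in tree_edges)

# A branch decomposition of the triangle K3: one degree-three vertex x
# joined to three leaves, one per edge of K3.
T = [('x', 'a'), ('x', 'b'), ('x', 'c')]
L = {'a': {1, 2}, 'b': {2, 3}, 'c': {1, 3}}
assert decomposition_width(T, L) == 2     # bw(K3) = 2 = ceil(2*3/3)
```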
    For graphs, we recall the following upper bound on branchwidth.

Claim 2.1 (Robertson and Seymour [86]). For every 𝑛 ≥ 3, bw(𝐾_𝑛) = ⌈2𝑛/3⌉. Consequently,
bw(𝐺) ≤ ⌈2|𝑉(𝐺)|/3⌉ for every graph 𝐺.


3     Examples of multilinear maps
This section reviews the specific multilinear maps that we study in this paper. Together with
each map we describe its associated tensor and a socketing of the tensor that realizes the map.

3.1    Discrete Fourier transform
Let 𝑛 ≥ 2 be an integer and let 𝜌 ∈ 𝔽 be a primitive 𝑛-th root of unity in the field 𝔽, if it exists.⁵
Define the 𝑛 × 𝑛 symmetric matrix Φ with entries Φ_{𝑖,𝑗} = 𝜌^{(𝑖−1)(𝑗−1)} for all 𝑖, 𝑗 ∈ [𝑛]. The discrete



Fourier transform (DFT) of order 𝑛 at 𝜌 is the linear map 𝐿 : 𝔽^{[𝑛]} → 𝔽^{[𝑛]} defined for all 𝑥 ∈ 𝔽^{[𝑛]} by
the matrix–vector multiplication 𝐿(𝑥) = Φ𝑥. In particular, the matrix Φ is the tensor associated
with 𝐿. The matrix Φ has two modes, namely its rows and columns. To realize 𝐿, take the
columns of Φ as the input socket, and the rows as the output socket.
   ⁵That is, 𝜌 satisfies 𝜌^𝑛 = 1 and 𝜌^{𝑛/𝑘} ≠ 1 for all integer divisors 𝑘 ≥ 2 of 𝑛.
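A sketch (ours) over 𝔽 = ℂ, where 𝜌 = e^{−2𝜋i/𝑛} is a primitive 𝑛-th root of unity; with this choice 𝐿 agrees with numpy's FFT convention.

```python
import numpy as np

n = 8
rho = np.exp(-2j * np.pi / n)            # a primitive n-th root of unity in C
i, j = np.ogrid[1:n + 1, 1:n + 1]
Phi = rho ** ((i - 1) * (j - 1))         # the symmetric matrix Phi of the DFT

x = np.random.default_rng(3).standard_normal(n)
assert np.allclose(Phi @ x, np.fft.fft(x))   # L(x) = Phi x
```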

3.2   Determinant and permanent
We write 𝑆𝑛 for the group of all permutations 𝜎 : [𝑛] → [𝑛] (the symmetric group of degree 𝑛).
For 𝜎 ∈ 𝑆𝑛 , let 𝑐(𝜎) be the number of cycles in the cycle decomposition of 𝜎, where each fixed
point of 𝜎 is counted as a cycle. Further standard examples of multilinear maps include the
determinant and the permanent,
$$\det A = \sum_{\sigma \in S_n} (-1)^{n - c(\sigma)} \prod_{i \in [n]} a_{i,\sigma(i)}, \qquad\qquad \operatorname{per} A = \sum_{\sigma \in S_n} \prod_{i \in [n]} a_{i,\sigma(i)}, \tag{3.1}$$

both of which are 𝑛-linear in the 𝑛 columns (or the 𝑛 rows) of the matrix 𝐴 = (𝑎_{𝑖𝑗})_{𝑖,𝑗∈[𝑛]}. The
determinant and the permanent are associated with order-𝑛 tensors 𝑇̂^{(det)}, 𝑇̂^{(per)} ∈ 𝔽^{[𝑛]×[𝑛]×⋯×[𝑛]}
defined for all 𝑗 = (𝑖₁, 𝑖₂, . . . , 𝑖_𝑛) ∈ [𝑛] × [𝑛] × ⋯ × [𝑛] by
$$\hat{T}^{(\det)}_j = \begin{cases} (-1)^{n - c(j)} & \text{if } j \in S_n \\ 0 & \text{otherwise} \end{cases} \qquad\qquad \hat{T}^{(\mathrm{per})}_j = \begin{cases} 1 & \text{if } j \in S_n \\ 0 & \text{otherwise.} \end{cases} \tag{3.2}$$

A socketing of 𝑇̂^{(det)} and 𝑇̂^{(per)} that realizes the determinant and the permanent, respectively,
is to take each of the 𝑛 modes as an input socket corresponding to 𝐴. The determinant and
permanent tensors are not closed under taking Kronecker products when 𝑛 ≥ 2, because the
Kronecker product of two tensors has the same order as its input tensors, but for each 𝑛 there is
exactly one determinant (permanent) tensor of order 𝑛.
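Ryser's 𝑂*(2^𝑛) inclusion–exclusion formula for the permanent [89], mentioned in Section 1.3, is short enough to state as code; a sketch (ours), checked against the definition (3.1):

```python
import numpy as np
from itertools import combinations, permutations
from math import prod

def permanent_ryser(A):
    # Ryser's formula: per A = (-1)^n * sum over nonempty S of
    # (-1)^|S| * prod_i sum_{j in S} a_ij, using O*(2^n) operations.
    n = len(A)
    total = 0
    for size in range(1, n + 1):
        for S in combinations(range(n), size):
            total += (-1) ** size * prod(sum(A[i][j] for j in S) for i in range(n))
    return (-1) ** n * total

A = np.arange(1, 10).reshape(3, 3)
direct = sum(prod(A[i][p[i]] for i in range(3)) for p in permutations(range(3)))
assert permanent_ryser(A) == direct
```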

3.3   Matrix multiplication
Let 𝑛, 𝑟, and 𝑚 be positive integers. Perhaps the most fundamental example of a bilinear map is
the map that multiplies an 𝑛 × 𝑟 matrix 𝐴 = (𝐴_{𝑖𝑗})_{𝑖∈[𝑛], 𝑗∈[𝑟]} with an 𝑟 × 𝑚 matrix 𝐵 = (𝐵_{𝑖𝑗})_{𝑖∈[𝑟], 𝑗∈[𝑚]}
to obtain the 𝑛 × 𝑚 product matrix 𝐶 = (𝐶_{𝑖𝑗})_{𝑖∈[𝑛], 𝑗∈[𝑚]} defined for all 𝑖 ∈ [𝑛] and 𝑗 ∈ [𝑚] by
$$C_{ij} = \sum_{k \in [r]} A_{ik} B_{kj}. \tag{3.3}$$

It is natural to view the input 𝐴 ∈ 𝔽^{[𝑛]×[𝑟]} as a 2-tensor, and similarly so for the input
𝐵 ∈ 𝔽^{[𝑟]×[𝑚]}, and the output 𝐶 ∈ 𝔽^{[𝑛]×[𝑚]}. Thus, (3.3) is naturally associated with the 6-tensor
𝑇̂ ∈ 𝔽^{[𝑛]×[𝑟]×[𝑟]×[𝑚]×[𝑛]×[𝑚]} with entries defined for all 𝑖₁, 𝑖₂ ∈ [𝑛], 𝑗₁, 𝑗₂ ∈ [𝑚], and 𝑘₁, 𝑘₂ ∈ [𝑟] by
$$\hat{T}_{i_1 k_1 k_2 j_1 i_2 j_2} = \begin{cases} 1 & \text{if } i_1 = i_2 \text{ and } j_1 = j_2 \text{ and } k_1 = k_2; \\ 0 & \text{otherwise.} \end{cases} \tag{3.4}$$


To realize (3.3) using (3.4), we can use the socketing grouped by parentheses in ([𝑛] × [𝑟]) ×
([𝑟] × [𝑚]) × ([𝑛] × [𝑚]), where the first two groups are the two input sockets corresponding to
𝐴 and 𝐵, and the last group is the output socket corresponding to 𝐶.
    Let us write ⟨𝑛, 𝑟, 𝑚⟩ as a shorthand for the tensor (3.4). From (3.4) and the definition of the
Kronecker product it is immediate that matrix-multiplication tensors satisfy
$$\langle n_1, r_1, m_1 \rangle \otimes \langle n_2, r_2, m_2 \rangle = \langle n_1 n_2,\, r_1 r_2,\, m_1 m_2 \rangle. \tag{3.5}$$
That is, matrix multiplication tensors are closed under taking Kronecker products.
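The closure property (3.5) is easy to verify numerically; a sketch (ours, with an illustrative helper) that materializes ⟨𝑛, 𝑟, 𝑚⟩ from (3.4):

```python
import numpy as np

def mm_tensor(n, r, m):
    # The tensor <n, r, m> of (3.4): entry 1 iff i1 = i2, j1 = j2, k1 = k2.
    T = np.zeros((n, r, r, m, n, m))
    for i in range(n):
        for k in range(r):
            for j in range(m):
                T[i, k, k, j, i, j] = 1
    return T

# <2,3,2> (x) <2,2,3> = <4,6,6>, as in (3.5).
lhs = np.kron(mm_tensor(2, 3, 2), mm_tensor(2, 2, 3))
assert np.array_equal(lhs, mm_tensor(4, 6, 6))
```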

3.4    Group algebra product
Let (𝔸, +) be an Abelian group of order 𝑛. Another fundamental example of a bilinear map is to
convolve two vectors 𝑓 ∈ 𝔽^𝔸 and 𝑔 ∈ 𝔽^𝔸 according to the group operation of 𝔸 to obtain the
vector ℎ = 𝑓 ∗ 𝑔 ∈ 𝔽^𝔸, defined for all 𝑘 ∈ 𝔸 by
$$h_k = \sum_{j \in \mathbb{A}} f_{k-j}\, g_j. \tag{3.6}$$


The map (3.6) is associated with the 3-tensor 𝑇ˆ ∈ 𝔽 𝔸×𝔸×𝔸 defined for all 𝑖, 𝑗, 𝑘 ∈ 𝔸 by
                                                           (
                                                         1           𝑖 + 𝑗 = 𝑘;
                                                𝑇ˆ𝑖𝑗 𝑘 =                                                           (3.7)
                                                         0           otherwise.

A socketing of the three modes of (3.7) that realizes (3.6) is to take the first two modes as two
input sockets corresponding to 𝑓 and 𝑔, respectively, and to take the last mode as the output
socket corresponding to ℎ. The vector space 𝔽^𝔸 equipped with the convolution product ∗ is the
group algebra 𝔽[𝔸].
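A sketch (ours) for the cyclic group 𝔸 = ℤ/𝑛ℤ, evaluating (3.6) directly and via a contraction against the tensor (3.7):

```python
import numpy as np

n = 6                                   # the group A = Z/6Z
f, g = np.arange(n), np.arange(n) ** 2

# Direct evaluation of (3.6): h_k = sum_j f_{k-j} g_j, indices mod n.
h = np.array([sum(f[(k - j) % n] * g[j] for j in range(n)) for k in range(n)])

# The same product via the 3-tensor (3.7): T[i, j, k] = 1 iff i + j = k in A.
T = np.zeros((n, n, n), dtype=np.int64)
for i in range(n):
    for j in range(n):
        T[i, j, (i + j) % n] = 1
assert np.array_equal(np.einsum('i,j,ijk->k', f, g, T), h)
```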

3.5    Kruskal operator
Let 𝑛1 , 𝑛2 , . . . , 𝑛ℓ and 𝑟 be positive integers. Matrix multiplication generalizes naturally to the
ℓ -linear task of multiplying ℓ matrices 𝐴(1) ∈ 𝔽 [𝑛1 ]×[𝑟] , 𝐴(2) ∈ 𝔽 [𝑛2 ]×[𝑟] , . . ., 𝐴(ℓ ) ∈ 𝔽 [𝑛ℓ ]×[𝑟] to obtain
the order-ℓ tensor 𝑌 ∈ 𝔽 [𝑛1 ]×[𝑛2 ]×···×[𝑛ℓ ] defined for all (𝑖1 , 𝑖2 , . . . , 𝑖ℓ ) ∈ [𝑛1 ] × [𝑛2 ] × · · · × [𝑛ℓ ] by
                         𝑌_{𝑖1 𝑖2 ···𝑖ℓ} = ∑_{𝑗∈[𝑟]} 𝐴^{(1)}_{𝑖1 𝑗} 𝐴^{(2)}_{𝑖2 𝑗} · · · 𝐴^{(ℓ)}_{𝑖ℓ 𝑗} .        (3.8)


This map is known as the Kruskal operator [63, 60]6 of the matrices 𝐴(1) , 𝐴(2) , . . . , 𝐴(ℓ ) .
   The map (3.8) is associated with the 3ℓ -tensor

                                 𝑇ˆ ∈ 𝔽 [𝑛1 ]×[𝑟]×[𝑛2 ]×[𝑟]×···×[𝑛ℓ ]×[𝑟]×[𝑛1 ]×[𝑛2 ]×···×[𝑛ℓ ]
   6Kolda [60] calls this operator the Kruskal operator. Kruskal [63] studied the special case ℓ = 3.



defined for all 𝑖1 , 𝑖1′ ∈ [𝑛1 ], 𝑖2 , 𝑖2′ ∈ [𝑛2 ], . . ., 𝑖ℓ , 𝑖ℓ′ ∈ [𝑛ℓ ] and 𝑗1 , 𝑗2 , . . . , 𝑗ℓ ∈ [𝑟] by

       𝑇ˆ_{𝑖1 𝑗1 𝑖2 𝑗2 ···𝑖ℓ 𝑗ℓ 𝑖1′ 𝑖2′ ···𝑖ℓ′} = 1 if 𝑖1 = 𝑖1′ , 𝑖2 = 𝑖2′ , . . ., 𝑖ℓ = 𝑖ℓ′ and 𝑗1 = 𝑗2 = · · · = 𝑗ℓ ,
                                                and 0 otherwise.                                              (3.9)

A socketing of (3.9) that realizes (3.8) is to take each of the ℓ pairs of modes [𝑛1 ] × [𝑟], [𝑛2 ] × [𝑟],
. . ., [𝑛ℓ ] × [𝑟] as an input socket and the final ℓ modes [𝑛1 ] × [𝑛2 ] × · · · × [𝑛ℓ ] as the output socket.
        Let us write ⟨𝑛1 , 𝑛2 , . . . , 𝑛ℓ | 𝑟⟩ for the tensor (3.9). Analogously to the closure property (3.5)
for matrix multiplication tensors, we observe that

                  ⟨𝑛1 , 𝑛2 , . . . , 𝑛ℓ | 𝑟⟩ ⊗ ⟨𝑛1′ , 𝑛2′ , . . . , 𝑛ℓ′ | 𝑟′⟩ = ⟨𝑛1 𝑛1′ , 𝑛2 𝑛2′ , . . . , 𝑛ℓ 𝑛ℓ′ | 𝑟𝑟′⟩ .           (3.10)

That is, Kruskal operator tensors are closed under taking Kronecker products. Furthermore, we
observe that ⟨𝑛, 𝑟, 𝑚⟩ and ⟨𝑛, 𝑚 | 𝑟⟩ are equal after reordering of the modes. That is, the matrix
multiplication tensor (3.4) is the special case of the Kruskal operator tensor (3.9) when ℓ = 2.
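For concreteness, a brief NumPy sketch (ours) of the Kruskal operator (3.8) for ℓ = 3, together with the ℓ = 2 special case reducing to matrix multiplication:

```python
# A brief sketch (ours) of the Kruskal operator (3.8) for l = 3.
import numpy as np

n1, n2, n3, r = 2, 3, 4, 5
A1, A2, A3 = (np.random.rand(n, r) for n in (n1, n2, n3))
# Y[i1, i2, i3] = sum_j A1[i1, j] * A2[i2, j] * A3[i3, j]
Y = np.einsum('aj,bj,cj->abc', A1, A2, A3)

# For l = 2 the operator is plain matrix multiplication after reordering:
# <n1, n2 | r> agrees with the tensor <n1, r, n2> of (3.4).
assert np.allclose(np.einsum('aj,bj->ab', A1, A2), A1 @ A2.T)
```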

3.6    Homomorphism-counting forms
Further multilinear maps arise naturally by algebraization of combinatorial problems. For
example, the linear form of the matrix multiplication map
                                      ∑_{𝑖,𝑗,𝑘∈[𝑛]} 𝐴_{𝑖𝑗} 𝐵_{𝑗𝑘} 𝐶_{𝑘𝑖}                                       (3.11)

can be used to count the triangles in a graph, and the form
                              ∑_{𝑖,𝑗,𝑘,ℓ∈[𝑛]} 𝐴_{𝑖𝑗} 𝐵_{𝑖𝑘} 𝐶_{𝑖ℓ} 𝐷_{𝑗𝑘} 𝐸_{𝑗ℓ} 𝐹_{𝑘ℓ}                         (3.12)

can be used to count the occurrences of any 4-vertex subgraph in a graph by varying the six 𝑛 × 𝑛
matrices 𝐴, 𝐵, 𝐶, 𝐷, 𝐸, 𝐹 ∈ 𝔽 [𝑛]×[𝑛] . A more complicated variant takes as input four 3-tensors
𝐴, 𝐵, 𝐶, 𝐷 ∈ 𝔽 [𝑛]×[𝑛]×[𝑛] of shape 𝑛 × 𝑛 × 𝑛 and considers the linear form
                                ∑_{𝑖,𝑗,𝑘,ℓ∈[𝑛]} 𝐴_{𝑖𝑗𝑘} 𝐵_{𝑖𝑘ℓ} 𝐶_{𝑖𝑗ℓ} 𝐷_{𝑗𝑘ℓ} .                               (3.13)

An 𝑛^{4−𝛿} time algorithm for the associated trilinear map (𝐴, 𝐵, 𝐶) ↦→ 𝐷 would imply an algorithm
for the Max 3-Sat problem with running time (2 − 𝛿′)^𝑛 [102].
    The forms (3.11), (3.12), and (3.13) are all special cases of what we call a homomorphism-
counting form defined by a hypergraph 𝑃, or, succinctly, a 𝑃-linear form. In more precise terms,
let 𝑃 be a 𝑘-uniform hypergraph on 𝑣 ≥ 𝑘 vertices and write \binom{[𝑣]}{𝑘} for the set of 𝑘-element
subsets of [𝑣]. For each hyperedge 𝑆 = {𝑖1 , 𝑖2 , . . . , 𝑖𝑘 } ∈ 𝐸(𝑃) ⊆ \binom{[𝑣]}{𝑘} of 𝑃, let 𝑋^{(𝑆)} be an
input tensor of shape [𝑛]^𝑆 . The 𝑃-linear form of order 𝑛 is the form

                                ∑_{𝜎∈[𝑛]^{𝑉(𝑃)}} ∏_{𝑆∈𝐸(𝑃)} 𝑋^{(𝑆)}_{𝜎|𝑆} ,                                    (3.14)



where 𝜎| 𝑆 is the restriction of 𝜎 to the 𝑘 indices in 𝑆. For example, the forms (3.11), (3.12),
and (3.13) are the 𝑃-linear forms corresponding to 𝑃 being a triangle, a 𝐾4 , and a complete
3-uniform hypergraph on 4 vertices, respectively. It is immediate that (3.14) is an |𝐸(𝑃)|-linear
map.
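As a sanity check (our own sketch, with 𝑃 the triangle), the form (3.11) can be evaluated directly and compared against its definition:

```python
# A sanity check (ours) for P the triangle: evaluate the form (3.11)
# by einsum and by its definition.
import numpy as np

n = 5
A, B, C = (np.random.rand(n, n) for _ in range(3))
form = np.einsum('ij,jk,ki->', A, B, C)
naive = sum(A[i, j] * B[j, k] * C[k, i]
            for i in range(n) for j in range(n) for k in range(n))
assert np.isclose(form, naive)
# With A = B = C the adjacency matrix of a graph, the form counts the
# closed walks of length 3, that is, six times the number of triangles.
```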
   The map (3.14) is associated with a tensor 𝑇ˆ of order 𝑘 · |𝐸(𝑃)| whose modes are 𝑀 = { (𝑆, 𝑖) |
𝑆 ∈ 𝐸(𝑃), 𝑖 ∈ 𝑆 } with index sets 𝐽((𝑆, 𝑖)) = [𝑛]. The entries of 𝑇ˆ are defined by
                  𝑇ˆ_𝑗 = 1 if there exists 𝜎 ∈ [𝑛]^{𝑉(𝑃)} such that 𝑗_{(𝑆,𝑖)} = 𝜎_𝑖 for every (𝑆, 𝑖) ∈ 𝑀,
                  and 𝑇ˆ_𝑗 = 0 otherwise.                                                                      (3.15)

A socketing of (3.15) that realizes (3.14) is given by one input socket for each hyperedge 𝑆 ∈ 𝐸(𝑃)
such that the socket contains the 𝑘 modes (𝑆, 𝑖) for 𝑖 ∈ 𝑆.
    Let us observe that the tensors of the forms (3.12) and (3.13) are actually the same—they are
both of order 12, have volume 𝑛 12 , and are in fact equal to the outer product of the 𝑛 × 𝑛 × 𝑛
identity tensor with itself 4 times after renaming of modes. However, due to the difference in
socketing, the forms are computationally very different. We show in Section 7 that while there
are non-trivial tensor network algorithms for evaluating (3.12), no such algorithms exist for
(3.13).
    Let us write ⟨𝑛⟩_𝑃 for the tensor 𝑇ˆ as defined in (3.15). Analogously to the closure properties
(3.5) and (3.10), we observe the closure property

                                          ⟨𝑛⟩_𝑃 ⊗ ⟨𝑛′⟩_𝑃 = ⟨𝑛𝑛′⟩_𝑃 .                                   (3.16)


4     Tensor networks
4.1   Networks
A network (or diagram) consists of a finite set 𝑉 of vertices, a finite set 𝐸 of hyperedges, an incidence
relation 𝐼 ⊆ 𝑉 × 𝐸, and a boundary 𝐵 ⊆ 𝐸. A network is nondegenerate if every hyperedge is
incident to at least one vertex. In what follows we assume that all networks are nondegenerate.
A hyperedge 𝑒 ∈ 𝐸 is a loop if 𝑒 ∉ 𝐵 and 𝑒 is incident to exactly one vertex.
    For a vertex 𝑣 ∈ 𝑉, let us write 𝐼(𝑣) = {𝑒 ∈ 𝐸 : (𝑣, 𝑒) ∈ 𝐼} for the set of hyperedges incident to
𝑣. Dually, for a hyperedge 𝑒 ∈ 𝐸, let us write 𝐼(𝑒) = {𝑣 ∈ 𝑉 : (𝑣, 𝑒) ∈ 𝐼} for the set of vertices
incident to 𝑒. For a network 𝐷, we write 𝑉(𝐷), 𝐸(𝐷), 𝐼(𝐷), and 𝐵(𝐷) to refer to the vertices of 𝐷,
the hyperedges of 𝐷, the incidence relation of 𝐷, and the boundary of 𝐷, respectively.
    For example, on the left in (1.2), we see a network with five vertices 𝑉 =
{𝐴, 𝐵, 𝛼, 𝛽, 𝛾}, seven hyperedges 𝐸 = {𝑖, 𝑗, ℓ , 𝑖′ , 𝑘, 𝑘′ , 𝑗′}, and boundary 𝐵 = {𝑖, 𝑗}. The vertices
incident to 𝑖′ are 𝐼(𝑖′) = {𝐴, 𝛼}.


Induced networks For a network 𝐷 and a nonempty subset 𝑊 ⊆ 𝑉(𝐷), the induced network
𝐷[𝑊] consists of the vertices in 𝑊 together with the hyperedges of 𝐷 that are incident to at


least one vertex in 𝑊; the boundary of 𝐷[𝑊] consists of all hyperedges that are at the boundary
of 𝐷 or incident to a vertex outside 𝑊. Formally,
      𝑉(𝐷[𝑊]) = 𝑊 ,
      𝐸(𝐷[𝑊]) = {𝑒 ∈ 𝐸(𝐷) : ∃𝑤 ∈ 𝑊 s.t. (𝑤, 𝑒) ∈ 𝐼(𝐷)} ,
                                                                                               (4.1)
      𝐼(𝐷[𝑊]) = 𝐼(𝐷) ∩ (𝑉(𝐷[𝑊]) × 𝐸(𝐷[𝑊])) ,
      𝐵(𝐷[𝑊]) = (𝐵(𝐷) ∩ 𝐸(𝐷[𝑊])) ∪ {𝑒 ∈ 𝐸(𝐷[𝑊]) : ∃𝑣 ∈ 𝑉(𝐷)\𝑊 s.t. (𝑣, 𝑒) ∈ 𝐼(𝐷)}.
For a vertex 𝑣 ∈ 𝑉, we abbreviate 𝐷[𝑣] = 𝐷[{𝑣}]. Note that the boundary of 𝐷[𝑣] consists of all
non-loop hyperedges incident to 𝑣 in 𝐷.
   For example, for the network on the left in (1.2), the induced network 𝐷[{𝛼, 𝐴}] has
hyperedges 𝐸(𝐷[{𝛼, 𝐴}]) = {𝑖′ , 𝑘, ℓ } and boundary 𝐵(𝐷[{𝛼, 𝐴}]) = {ℓ }.

4.2   Tensor networks
Let 𝐷 be a network. We index 𝐷 by associating with each hyperedge 𝑒 ∈ 𝐸 an index set 𝐽(𝑒) of size
ℓ (𝑒). Induced networks inherit indexing by restriction. Next we associate with each vertex 𝑣 ∈ 𝑉
a tensor 𝑇(𝑣) ∈ 𝔽 𝐽(𝐼(𝑣)) . We say that 𝐷 equipped with the tensors (𝑇(𝑣))𝑣∈𝑉 is a tensor network.
     The value of a tensor network 𝐷, or the tensor represented by 𝐷, is a tensor 𝑇(𝐷) ∈ 𝔽 𝐽(𝐵) ,
defined for all 𝑖 ∈ 𝐽(𝐵) by

                             𝑇(𝐷)_𝑖 = ∑_{𝑗∈𝐽(𝐸(𝐷)\𝐵)} ∏_{𝑣∈𝑉} 𝑇(𝑣)_{𝑖𝑗} .                              (4.2)

Observe that in (4.2) the positions 𝑖 and 𝑗 together identify a unique entry of 𝑇(𝑣) by projection
to 𝐽(𝐼(𝑣)). We also observe that the value of a tensor network with an empty boundary is a scalar.
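The definition (4.2) can be read as an exponential-time reference algorithm; the following brute-force NumPy sketch (ours; the helper name network_value is our own) makes it concrete:

```python
# A brute-force reference implementation of (4.2) (our own sketch; the
# name network_value is hypothetical). Exponential time, for clarity only.
import itertools
import numpy as np

def network_value(tensors, incidence, index_sets, boundary):
    """tensors: vertex -> ndarray; incidence: vertex -> tuple of hyperedges
    ordering the modes of its tensor; index_sets: hyperedge -> size."""
    inner = [e for e in index_sets if e not in boundary]
    value = np.zeros([index_sets[e] for e in boundary])
    for i in itertools.product(*(range(index_sets[e]) for e in boundary)):
        pos = dict(zip(boundary, i))
        total = 0.0
        for j in itertools.product(*(range(index_sets[e]) for e in inner)):
            pos.update(zip(inner, j))
            prod = 1.0
            for v, T in tensors.items():
                prod *= T[tuple(pos[e] for e in incidence[v])]
            total += prod
        value[i] = total
    return value

# Two matrices joined along a shared hyperedge evaluate to their product.
A, B = np.random.rand(2, 3), np.random.rand(3, 4)
val = network_value({'A': A, 'B': B},
                    {'A': ('i', 'k'), 'B': ('k', 'j')},
                    {'i': 2, 'k': 3, 'j': 4}, ('i', 'j'))
assert np.allclose(val, A @ B)
```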

4.3   Contracting tensors
Let 𝐷 be a tensor network and let 𝑊 ⊆ 𝑉(𝐷) be a nonempty set of vertices. Let 𝑤 be a new
element not in 𝑉. We may contract 𝑊 in 𝐷 to obtain a tensor network 𝐷/𝑊 by replacing the
sub-network 𝐷[𝑊] in 𝐷 with the single vertex 𝑤 whose associated tensor 𝑇(𝑤) is the tensor
represented by 𝐷[𝑊]. Formally,
                     𝑉(𝐷/𝑊) = (𝑉(𝐷) \ 𝑊) ∪ {𝑤} ,
                      𝐸(𝐷/𝑊) = 𝐸(𝐷) \ (𝐸(𝐷[𝑊]) \ 𝐵(𝐷[𝑊])) ,
                      𝐼(𝐷/𝑊) = (𝐼(𝐷) \ 𝐼(𝐷[𝑊])) ∪ {(𝑤, 𝑒) : 𝑒 ∈ 𝐵(𝐷[𝑊])} ,                     (4.3)
                      𝐵(𝐷/𝑊) = 𝐵(𝐷) ,
                         𝑇(𝑤) = 𝑇(𝐷[𝑊]) .
The cost of contracting 𝑊 in 𝐷 is 𝑐(𝐷, 𝑊) = ∏_{𝑒∈𝐸(𝐷[𝑊])} |𝐽(𝑒)|. The value of a tensor network is
invariant under contraction, i. e., for all nonempty 𝑊 ⊆ 𝑉(𝐷) it holds that 𝑇(𝐷) = 𝑇(𝐷/𝑊) (see
Lemma A.1 for a proof).
   For example, contracting 𝑊 = {𝛼, 𝐴} in the network on the left in (1.2) results in the network
shown in the second step, where the tensor 𝑇 = 𝑇(𝐷[{𝛼, 𝐴}]) of the new (unlabelled) vertex is a
vector of length ℓ , which in position 𝑢 takes the value 𝑇_𝑢 = ∑_{𝑖′,𝑘} 𝛼_{𝑖′𝑘𝑢} 𝐴_{𝑖′𝑘} .



4.4   Execution and cost of a tensor network
To compute the tensor 𝑇(𝐷) from a given tensor network 𝐷, we may proceed by a sequence
of contractions on 𝐷. Such a process is called executing 𝐷, and the cost of 𝐷 is the cost of a
minimum-cost execution of 𝐷. We proceed with the details.
    Let 𝐷 = 𝐷0 be a tensor network with at least one tensor. For 𝑘 = 1, 2, . . . , 𝑡, select a nonempty
subset 𝑊𝑘−1 ⊆ 𝑉(𝐷 𝑘−1 ) such that 𝑊𝑘−1 has at least two tensors or consists of a single tensor with
a loop. Set 𝐷 𝑘 = 𝐷 𝑘−1 /𝑊𝑘−1 and observe that the number of tensors and/or modes decreases by
at least one in the contraction. Suppose that 𝐷𝑡 is loopless and consists of a single tensor. We
say that such a sequence of contractions is an execution of 𝐷 in 𝑡 steps. The cost of the execution
is max 𝑘=1,2,...,𝑡 𝑐(𝐷 𝑘−1 , 𝑊𝑘−1 ). The cost of an execution in zero steps is defined to be 0.
    It is immediate that 𝐷 has at least one execution and every execution consists of at most
2|𝑉(𝐷)| − 1 steps. By invariance under contractions, we have 𝑇(𝐷𝑡 ) = 𝑇(𝐷). The cost 𝑐(𝐷) of 𝐷
is the cost of a minimum-cost execution of 𝐷.
    An execution of 𝐷 of cost 𝑐(𝐷) immediately translates into an algorithm that computes 𝑇(𝐷)
using 𝑂(𝑐(𝐷)|𝑉(𝐷)|) arithmetic operations in 𝔽 , since each contraction step 𝐷_𝑘 = 𝐷_{𝑘−1} /𝑊_{𝑘−1}
takes 𝑂(𝑐(𝐷_{𝑘−1} , 𝑊_{𝑘−1} )) ≤ 𝑂(𝑐(𝐷)) time to evaluate, and there are 𝑂(|𝑉(𝐷)|) steps.
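The following sketch (ours; the helper contract_pair and its bookkeeping are illustrative, not the paper's) executes one step of a loopless network by contracting a pair of vertices with einsum while tracking the contraction cost 𝑐(𝐷, 𝑊):

```python
# A sketch (ours) of one execution step: contract a pair {u, v} with
# einsum and report the cost c(D, {u, v}), the product of |J(e)| over
# the hyperedges e in E(D[{u, v}]).
import numpy as np

def contract_pair(net, boundary, u, v):
    """net: vertex -> (ndarray, tuple of hyperedge labels)."""
    (Tu, eu), (Tv, ev) = net.pop(u), net.pop(v)
    touched = list(dict.fromkeys(eu + ev))               # E(D[{u, v}])
    elsewhere = {x for _, e in net.values() for x in e}
    keep = tuple(x for x in touched
                 if x in elsewhere or x in boundary)     # B(D[{u, v}])
    lbl = {x: chr(97 + i) for i, x in enumerate(touched)}
    spec = (''.join(lbl[x] for x in eu) + ',' +
            ''.join(lbl[x] for x in ev) + '->' +
            ''.join(lbl[x] for x in keep))
    sizes = {**dict(zip(eu, Tu.shape)), **dict(zip(ev, Tv.shape))}
    cost = int(np.prod([sizes[x] for x in touched]))
    net[(u, v)] = (np.einsum(spec, Tu, Tv), keep)
    return cost

# Matrix product in one step: the contraction costs |J(i)||J(k)||J(j)| = 24.
net = {'A': (np.random.rand(2, 3), ('i', 'k')),
       'B': (np.random.rand(3, 4), ('k', 'j'))}
assert contract_pair(net, {'i', 'j'}, 'A', 'B') == 24
```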

Lemma 4.1. Let 𝐷 be a tensor network. There exists a minimum-cost execution of 𝐷 such that each
contracted set has size at most two. Furthermore, if 𝐷 is loopless, we can assume that each contracted set
has size exactly two.

    Lemma 4.1 is proven in Appendix A.2. In what follows we restrict to consider loopless 𝐷
only. Thus while a general execution may contract arbitrary vertex sets in 𝐷 in each step, we
may assume without loss of generality that the minimum-cost execution has the structure of a
rooted binary tree, whose leaves are the vertices of the tensor network, and each internal vertex
is the tensor obtained by contracting its two children.
    For example, (1.2) illustrates an execution of the network shown there, both as a sequence of
contractions of pairs of vertices, and more succinctly as an execution tree overlaid on top of the
network.

4.5   Cost of a multilinear map
Let us now define the cost of a multilinear map via the minimum-cost tensor networks (and
socketing) for evaluating the map. That is, the cost of a multilinear map is defined in terms of
the best tensor network that implements the map. In more precise terms, let

                      𝐴 : 𝔽^{𝐽(𝐸1)} × 𝔽^{𝐽(𝐸2)} × · · · × 𝔽^{𝐽(𝐸ℓ)} → 𝔽^{𝐽(𝐸′)}

be an ℓ-linear map. Consider the tensor 𝑇ˆ(𝐴) of 𝐴 and the associated input sockets 𝐸1 , 𝐸2 , . . . , 𝐸ℓ
and the output socket 𝐸′. Let 𝐷∗ be an arbitrary tensor network such that 𝑇(𝐷∗) = 𝑇ˆ(𝐴) and
the boundary satisfies 𝐵(𝐷∗) = 𝐸1 ∪ 𝐸2 ∪ · · · ∪ 𝐸ℓ ∪ 𝐸′. Modify the network 𝐷∗ as follows. For
each 𝑘 = 1, 2, . . . , ℓ, introduce a new vertex to 𝐷∗, make the new vertex incident to each of the
modes in the input socket 𝐸𝑘 , and associate the new vertex with a tensor 𝑋^{(𝑘)} ∈ 𝔽^{𝐽(𝐸𝑘)} . Remove


the modes 𝐸1 ∪ 𝐸2 ∪ · · · ∪ 𝐸ℓ from the boundary of 𝐷∗. Let us denote the resulting network by
𝐷 and call the introduced ℓ new vertices the socket vertices of 𝐷. We observe that 𝐵(𝐷) = 𝐸′
and 𝐴(𝑋^{(1)} , 𝑋^{(2)} , . . . , 𝑋^{(ℓ)} ) = 𝑇(𝐷). Furthermore, the cost 𝑐(𝐷) is independent of the value of
𝑋 (𝑘) ∈ 𝔽 𝐽(𝐸 𝑘 ) for 𝑘 = 1, 2, . . . , ℓ . We say that 𝐷 is a realization of 𝐴 if it can be obtained from 𝐴 by
this process, and write D(𝐴) for the set of all tensor network realizations 𝐷 of 𝐴.
    The cost of the map 𝐴 is 𝑐(𝐴) = min𝐷∈D(𝐴) 𝑐(𝐷). In particular, we observe that the minimum
exists since the cost of a tensor network is a nonnegative integer and the family D(𝐴) is nonempty.


5     Upper bounds
This section presents tensor-network algorithms for the maps introduced in Section 3. We start
with our key technical result that cost is submultiplicative (Theorem 1.3, stated formally as
Theorem 5.1), which then enables us to represent essentially the fastest known algorithms using
tensor networks, and, in the case of 𝑃-linear forms, also improve on earlier work as reviewed in
Section 1.3.
    Recall that the cost of an execution of a network is the maximum number of operations of any
step (contraction) in that execution, rather than the total running time (number of operations).
With the exception of Theorem 1.3, the results in this section are stated as the resulting upper
bound on running time but they are all derived by obtaining upper bounds on cost.

5.1    Submultiplicativity of cost
Let 𝐸1 , 𝐸2 , . . . , 𝐸ℓ , 𝐸0 be pairwise disjoint sets of indexed modes such that 𝐸1 , 𝐸2 , . . . , 𝐸ℓ are
nonempty. Let
                      𝐴 : 𝔽^{𝐽(𝐸1)} × 𝔽^{𝐽(𝐸2)} × · · · × 𝔽^{𝐽(𝐸ℓ)} → 𝔽^{𝐽(𝐸′)}

be an ℓ-linear map. For a positive integer 𝑘, we define the ℓ-linear map 𝐴^{⊗𝑘} such that its tensor
satisfies 𝑇(𝐴^{⊗𝑘}) = 𝑇(𝐴)^{⊗𝑘}. Then

               𝐴^{⊗𝑘} : 𝔽^{𝐽(𝐸1)^𝑘} × 𝔽^{𝐽(𝐸2)^𝑘} × · · · × 𝔽^{𝐽(𝐸ℓ)^𝑘} → 𝔽^{𝐽(𝐸′)^𝑘} .

Note that 𝑇(𝐴^{⊗𝑘}) = 𝑇(𝐴)^{⊗𝑘} is the 𝑘-fold Kronecker product of 𝑇(𝐴) with itself—that is, it has
the same order, but the index sets are larger—whereas 𝑇ˆ(𝐴^{⊗𝑘}) is the 𝑘-fold outer product of
𝑇ˆ(𝐴) with itself—that is, its index sets have the same sizes, but its order is 𝑘 times larger.
    Let 𝐷 be a network that realizes 𝐴 and let 𝒯𝐷 be an execution tree for 𝐷. For each internal
vertex 𝑥 in 𝒯𝐷 (that is, a vertex obtained by contraction), define the amortized cost of 𝑥 by splitting
into the following three cases:
    (i) if neither of the two subtrees of 𝑥 contains a socket vertex, the amortized cost of 𝑥 is 1;
   (ii) if exactly one of the subtrees of 𝑥, say, the subtree rooted at 𝑦 (where 𝑥 and 𝑦 are adjacent
        in 𝒯𝐷 ), contains at least one socket vertex, the amortized cost of 𝑥 is the maximum of the
        volume of the tensor at 𝑥 and the volume of the tensor at 𝑦;7
   7Here, it is crucial to note that the volume of the tensor at the other child of 𝑥, whose subtree contains only
non-socket vertices, does not contribute directly to the amortized cost of 𝑥.



  (iii) if both of the subtrees of 𝑥 contain at least one socket vertex, the amortized cost of 𝑥 is the
        cost of the contraction to obtain 𝑥.

    The amortized cost 𝑎(𝒯𝐷 ) of 𝒯𝐷 is the maximum of the amortized costs of the internal vertices
of 𝒯𝐷 . Since the amortized cost of each internal vertex of 𝒯𝐷 is at most its cost, we have
𝑎(𝒯𝐷 ) ≤ 𝑐(𝒯𝐷 ). Furthermore, we observe that the amortized cost of 𝑥 in case (ii) above may be
strictly less than the cost of the contraction to obtain 𝑥. In particular, in (ii) the amortized cost is
defined not by the cost of the contraction but rather by volume. This is because in a 𝑘 th Kronecker
power we can amortize the cost of the aggregate transformation in case (ii) not with a single
contraction but with a sequence of 𝑘 contractions. This observation will form the heart of the
proof of Theorem 1.3.
    Before proceeding with the proof, let us illustrate the key ideas in visual terms. Let us start
with the three illustrations in (5.1).




             [Figure (5.1): a network; the same network with two socket vertices adjoined
             to the input sockets, forming a realization 𝐷; and an execution tree 𝒯𝐷 .]       (5.1)




Suppose the leftmost network in (5.1) is socketed so that the two modes at the top form the
output socket, and the four modes at the bottom form two input sockets so that modes in the
same socket are incident to the same vertex. In the middle in (5.1), we have adjoined two socket
vertices to the input sockets to obtain a realization 𝐷. On the right in (5.1), we display an
execution tree 𝒯𝐷 for 𝐷. Observe that the bottom-most internal vertices of 𝒯𝐷 , and the top-most
internal vertex of 𝒯𝐷 , have type (ii). The internal vertex in the center has type (iii). (There are no
internal vertices of type (i).) Supposing that all the modes have length at least 2, we also observe
that the vertices of type (ii) have amortized cost strictly less than their contraction cost.
    Let us now consider the 𝑘 th power of (5.1) visually, for 𝑘 = 4:



             [Figure (5.2): the 4-fold outer product of the network in (5.1); its realization
             𝐷^{⊗𝑘} with two socket vertices; and an execution tree 𝒯_{𝐷^{⊗𝑘}} .]             (5.2)


    The leftmost network in (5.2) depicts the 𝑘-fold outer product of the network on the left in
(5.1) with itself. Observe that we simply take 𝑘 copies of the network, but that for the purposes
of the visualization we have taken care to draw the 𝑘 copies of each mode together for the
socketing. In the middle in (5.2), we have adjoined two socket vertices to the input sockets to
obtain a realization 𝐷 ⊗𝑘 of 𝐴 ⊗𝑘 . On the right in (5.2), we display an execution tree 𝒯𝐷 ⊗𝑘 for


𝐷 ⊗𝑘 . Observe how each of the internal vertices of type (ii) in 𝒯𝐷 gets expanded to a sequence
of 𝑘 internal vertices in 𝒯𝐷 ⊗𝑘 . This transformation from 𝒯𝐷 to 𝒯𝐷 ⊗𝑘 is the gist of the following
theorem.
Theorem 5.1 (Formal statement of Theorem 1.3). Let 𝐷 be an arbitrary realization of 𝐴 and let 𝒯𝐷
be an arbitrary execution tree for 𝐷. For all positive integers 𝑘, we have

                                       𝑐(𝐴 ⊗𝑘 ) ≤ 𝑎(𝒯𝐷 ) 𝑘−1 𝑐(𝒯𝐷 ) .                              (5.3)

Furthermore, this realization of 𝐴 ⊗𝑘 consists of at most 𝑘 · |𝑉(𝐷)| vertices.

Proof. Let 𝐷∗ be the subnetwork of 𝐷 with 𝑇(𝐷∗) = 𝑇ˆ(𝐴). That is, 𝐷∗ is the network induced
by all the non-socket vertices of 𝐷. Taking 𝑘 disjoint copies of 𝐷∗, we obtain a network whose
tensor is 𝑇ˆ(𝐴^{⊗𝑘}). Attaching the resulting network to tensors at sockets gives a realization of
𝐴^{⊗𝑘}. Let us write 𝐷^{⊗𝑘} for this realization.
    To establish (5.3), it suffices to construct an execution tree 𝒯𝐷 ⊗𝑘 for 𝐷 ⊗𝑘 whose cost satisfies
𝑐(𝒯𝐷 ⊗𝑘 ) ≤ 𝑎(𝒯𝐷 ) 𝑘−1 𝑐(𝒯𝐷 ). We construct 𝒯𝐷 ⊗𝑘 by rewriting 𝒯𝐷 from leaves towards the root to
consider the 𝑘 copies of each vertex in 𝐷 ∗ . We start with leaf vertices which are the vertices of
𝐷 ⊗𝑘 . We split the process into cases (i), (ii), and (iii) as in the definition of amortized cost. Let 𝑥
be the internal vertex of 𝒯𝐷 that we are currently considering.
    In case (i), we perform the contraction indicated by 𝑥 in each of the 𝑘 copies of 𝐷 ∗ in 𝐷 ⊗𝑘
individually. This creates 𝑘 new internal vertices in 𝒯𝐷 ⊗𝑘 that are all copies of 𝑥. We set these 𝑘
vertices as the vertices that correspond to 𝑥 in the subsequent steps. Each of these contractions
in 𝒯𝐷 ⊗𝑘 has the same cost as the contraction indicated by 𝑥 in 𝒯𝐷 . This cost is less than or equal
to 𝑐(𝒯𝐷 ).
    In case (ii), let 𝑦 be the child of 𝑥 in 𝒯𝐷 such that the subtree rooted at 𝑦 contains a socket
vertex, and let 𝑧 be the other child of 𝑥 in 𝒯𝐷 . There is a single vertex in 𝒯𝐷 ⊗𝑘 corresponding to 𝑦
and 𝑘 identical vertices in 𝒯𝐷 ⊗𝑘 corresponding to 𝑧. We contract these 𝑘 vertices individually
each with the vertex that corresponds to 𝑦. This creates 𝑘 new internal vertices in 𝒯𝐷 ⊗𝑘 , where
we set the topmost vertex as the vertex that corresponds to 𝑥 in the subsequent steps. After the
𝑖th step, the corresponding tensor has 𝑖 copies of modes of 𝑥 and 𝑘 − 𝑖 copies of modes of 𝑦.
The cost of the contraction in the 𝑖th step is the cost of contracting 𝑦 and 𝑧 in 𝒯𝐷 multiplied by
the volume of 𝑦 to the power 𝑘 − 𝑖 and the volume of 𝑥 to the power 𝑖 − 1. Since the volumes
of 𝑥 and 𝑦 are less than or equal to 𝑎(𝒯𝐷 ), this cost is less than or equal to 𝑎(𝒯𝐷 ) 𝑘−1 𝑐(𝒯𝐷 ).
    In case (iii), let 𝑦 and 𝑧 be the two child vertices of 𝑥 in 𝒯𝐷 . By the structure of the earlier
steps, we have that a single vertex in 𝐷 ⊗𝑘 corresponds to 𝑦, and similarly for 𝑧. We contract
these two vertices. This creates one new internal vertex in 𝒯𝐷 ⊗𝑘 , which we set as the vertex that
corresponds to 𝑥 in the subsequent steps. This tensor has 𝑘 copies of modes of 𝑥. The cost
of this contraction in 𝒯𝐷 ⊗𝑘 is the cost of the corresponding contraction in 𝒯𝐷 to the 𝑘 th power,
because both tensors have 𝑘 copies of all modes compared to 𝑦 and 𝑧. By definition, in case (iii)
the amortized cost of contracting 𝑦 and 𝑧 is the same as the cost of contracting 𝑦 and 𝑧. Hence
the cost of this contraction in 𝒯𝐷 ⊗𝑘 is less than or equal to 𝑎(𝒯𝐷 ) 𝑘 ≤ 𝑎(𝒯𝐷 ) 𝑘−1 𝑐(𝒯𝐷 ).
    This rewriting process produces an execution tree 𝒯𝐷 ⊗𝑘 for 𝐷 ⊗𝑘 with 𝑐(𝒯𝐷 ⊗𝑘 ) ≤ 𝑎(𝒯𝐷 ) 𝑘−1 𝑐(𝒯𝐷 ).
                                                                                                      


    An immediate corollary is that tensor networks can use low rank decompositions of 𝑇(𝐴) to
efficiently evaluate 𝐴 ⊗𝑘 .

Corollary 5.2 (Submultiplicativity of low-rank executions). Let 𝐴 : 𝔽^{𝐽(𝐸1)} × 𝔽^{𝐽(𝐸2)} × · · · × 𝔽^{𝐽(𝐸ℓ)} →
𝔽^{𝐽(𝐸′)} be a multilinear map. Define 𝑚 = max{|𝐽(𝐸1 )|, |𝐽(𝐸2 )|, . . . , |𝐽(𝐸ℓ )|, |𝐽(𝐸′)|} and 𝑟 = rk 𝑇(𝐴).
Then 𝑐(𝐴^{⊗𝑘}) ≤ max(𝑟, 𝑚)^𝑘 min(𝑟, 𝑚).

Proof. By taking a star-like network topology (as in (5.1)) built from a rank decomposition of 𝑇(𝐴),
we get a realization 𝐷 and an execution tree 𝒯𝐷 with 𝑎(𝒯𝐷 ) = max(𝑟, 𝑚) and cost 𝑐(𝒯𝐷 ) = 𝑚 · 𝑟;
the claim then follows from Theorem 5.1.                                                          □


5.2   Fast matrix multiplication
Let us now illustrate the use of Theorem 5.1 by capturing fast matrix multiplication with tensor
networks. We start by recalling that for a constant ℎ > 0, we write 𝜔(ℎ) for the rectangular matrix
multiplication exponent (see Section 1.3). It is immediate that max(2, ℎ + 1) ≤ 𝜔(ℎ) ≤ ℎ + 2; for
the state-of-the-art bounds on 𝜔(ℎ), see Le Gall and Urrutia [70].

Lemma 5.3. For all constants ℎ > 0 and 𝜖 > 0 it holds that an 𝑛 × ⌊𝑛^ℎ⌋ matrix may be multiplied with
an ⌊𝑛^ℎ⌋ × 𝑛 matrix in 𝑂(𝑛^{𝜔(ℎ)+𝜖}) operations by executing a tensor network.

Proof. Fix an 0 < 𝜖 < 1. Let 0 < 𝛿1 , 𝛿2 < 1 be constants whose precise values we will fix later.
By definition of 𝜔(ℎ), for all 𝑡 > 𝜔(ℎ) there exists a constant 𝑈 > 0 such that the arithmetic
complexity of multiplying a 𝑐 × ⌊𝑐^ℎ⌋ matrix with a ⌊𝑐^ℎ⌋ × 𝑐 matrix is at most 𝑈𝑐^𝑡 for all 𝑐. Since
the rank of a bilinear map is bounded from above by two times its (multiplicative) complexity
(see [23, §14.1]), it follows that there exist three 3-tensors 𝛼, 𝛽, 𝛾 of respective shapes (𝑐 × 𝑏) × 𝑑,
(𝑏 × 𝑐) × 𝑑, and (𝑐 × 𝑐) × 𝑑 for positive integer constants 𝑏, 𝑐, and 𝑑 with max(𝑐², 𝑐𝑏) ≤ 𝑑 ≤ 𝑐^{𝜔(ℎ)+𝛿1}
and 𝑐^{ℎ−𝛿2} ≤ 𝑏 ≤ 𝑐^ℎ such that 𝛼, 𝛽, 𝛾 decompose the matrix multiplication tensor ⟨𝑐, 𝑏, 𝑐⟩ defined
by (3.4) as depicted below; the indices 𝑖1 , 𝑘1 , 𝑘2 , 𝑗1 , 𝑖2 , 𝑗2 refer to the tensor (3.4). The mode
shared by 𝛼, 𝛽, 𝛾 in (5.4) has length 𝑑, the modes 𝑖1 , 𝑖2 , 𝑗1 , 𝑗2 each have length 𝑐, and the modes
𝑘1 , 𝑘2 have length 𝑏.
             [Figure (5.4): the tensor ⟨𝑐, 𝑏, 𝑐⟩ decomposed into three 3-tensors 𝛼, 𝛽, 𝛾
             joined along a shared mode of length 𝑑; the modes 𝑖1 𝑘1 of 𝛼 and 𝑘2 𝑗1 of 𝛽
             lie at the bottom boundary and the modes 𝑖2 𝑗2 of 𝛾 at the top; the mode
             labels 0, 1, 2 of each 3-tensor are shown in magenta.]                           (5.4)


For example, for ℎ = 1, Strassen’s decomposition [93] as depicted in (5.5) below realizes (5.4)
with 𝑐 = 𝑏 = 2 and 𝑑 = 7. We use the numbering in magenta to indicate correspondence between


modes in (5.4) and (5.5).

          ⎡ 1 0 1 0 1 −1  0 ⎤        ⎡ 1 1  0 −1 0 1 0 ⎤        ⎡ 1  0 0 1 −1 0 1 ⎤
      𝛼 = ⎢ 0 0 0 0 1  0  1 ⎥    𝛽 = ⎢ 0 0  1  0 0 1 0 ⎥    𝛾 = ⎢ 0  0 1 0  1 0 0 ⎥           (5.5)
          ⎢ 0 1 0 0 0  1  0 ⎥        ⎢ 0 0  0  1 0 0 1 ⎥        ⎢ 0  1 0 1  0 0 0 ⎥
          ⎣ 1 1 0 1 0  0 −1 ⎦        ⎣ 1 0 −1  0 1 0 1 ⎦        ⎣ 1 −1 1 0  0 1 0 ⎦

Here each matrix is the 4 × 7 flattening of the corresponding 3-tensor: the rows are indexed by
the pair of modes labelled 0 and 1 in (5.4), and the columns by the shared mode labelled 2.
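The decomposition (5.5) can be checked numerically; the following NumPy sketch (ours) contracts 𝛼, 𝛽, 𝛾 along the shared mode of length 𝑑 = 7 and verifies that the resulting network multiplies 2 × 2 matrices:

```python
# A numerical check (ours) of (5.5): contracting alpha, beta, gamma
# along the shared mode of length d = 7 yields the tensor <2, 2, 2>,
# so the network multiplies 2 x 2 matrices with 7 multiplications.
import numpy as np

alpha = np.array([[1, 0, 1, 0, 1, -1, 0], [0, 0, 0, 0, 1, 0, 1],
                  [0, 1, 0, 0, 0, 1, 0], [1, 1, 0, 1, 0, 0, -1]]).reshape(2, 2, 7)
beta = np.array([[1, 1, 0, -1, 0, 1, 0], [0, 0, 1, 0, 0, 1, 0],
                 [0, 0, 0, 1, 0, 0, 1], [1, 0, -1, 0, 1, 0, 1]]).reshape(2, 2, 7)
gamma = np.array([[1, 0, 0, 1, -1, 0, 1], [0, 0, 1, 0, 1, 0, 0],
                  [0, 1, 0, 1, 0, 0, 0], [1, -1, 1, 0, 0, 1, 0]]).reshape(2, 2, 7)

# T[i1,k1,k2,j1,i2,j2] = sum_d alpha[i1,k1,d] beta[k2,j1,d] gamma[i2,j2,d]
T = np.einsum('ikd,KJd,Ijd->ikKJIj', alpha, beta, gamma)
A, B = np.random.rand(2, 2), np.random.rand(2, 2)
assert np.allclose(np.einsum('ikKJIj,ik,KJ->Ij', T, A, B), A @ B)
```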


    Let us next set up an application of Theorem 5.1. Let 𝐴 : 𝔽^{[𝑐]×[𝑏]} × 𝔽^{[𝑏]×[𝑐]} → 𝔽^{[𝑐]×[𝑐]} be the
bilinear map that multiplies a 𝑐 × 𝑏 matrix with a 𝑏 × 𝑐 matrix. Observe that the tensor of 𝐴 is
𝑇ˆ(𝐴) = ⟨𝑐, 𝑏, 𝑐⟩. To realize 𝐴, define two input sockets in (5.4), namely {𝑖1 , 𝑘1 } and {𝑘2 , 𝑗1 }, to
obtain a realization 𝐷 and an execution tree 𝒯𝐷 as follows:




             [Figure (5.6): the decomposition (5.4) with socket vertices adjoined to the
             input sockets {𝑖1 , 𝑘1 } and {𝑘2 , 𝑗1 }, and the execution tree 𝒯𝐷 overlaid.]    (5.6)




Since max(𝑐 2 , 𝑐𝑏) ≤ 𝑑, the amortized cost of 𝒯𝐷 satisfies 𝑎(𝒯𝐷 ) = 𝑑. The cost is 𝑐(𝒯𝐷 ) =
max(𝑐 2 , 𝑐𝑏)𝑑.
   Let the matrices 𝑋 ∈ 𝔽^{[𝑛]×[⌊𝑛^ℎ⌋]} and 𝑌 ∈ 𝔽^{[⌊𝑛^ℎ⌋]×[𝑛]} be given. We construct a tensor network
that multiplies 𝑋 and 𝑌. We may assume that 𝛿2 < ℎ. Let 𝑘 = ⌈ℎ log 𝑛 / ((ℎ − 𝛿2) log 𝑐)⌉. We observe
that 𝑛^ℎ ≤ 𝑐^{(ℎ−𝛿2)𝑘} ≤ 𝑏^𝑘 and 𝑛 ≤ 𝑐^{(1−𝛿2/ℎ)𝑘} ≤ 𝑐^𝑘 . Extend the matrices 𝑋, 𝑌 to 𝑋 ∈ 𝔽^{[𝑐^𝑘]×[𝑏^𝑘]} and
𝑌 ∈ 𝔽^{[𝑏^𝑘]×[𝑐^𝑘]} by inserting rows and columns with zero-entries.
    Since ⟨𝑐, 𝑏, 𝑐⟩^{⊗𝑘} = ⟨𝑐^𝑘 , 𝑏^𝑘 , 𝑐^𝑘⟩ by (3.5), we have that 𝐴^{⊗𝑘} is the map that multiplies a
𝑐^𝑘 × 𝑏^𝑘 matrix with a 𝑏^𝑘 × 𝑐^𝑘 matrix. Using Theorem 5.1 with 𝐷 and 𝒯𝐷 , we obtain 𝑐(𝐴^{⊗𝑘}) ≤
𝑎(𝒯𝐷 )^{𝑘−1} 𝑐(𝒯𝐷 ) = 𝑑^𝑘 max(𝑐², 𝑐𝑏). Moreover, the realization 𝐷^{⊗𝑘} of 𝐴^{⊗𝑘} given by Theorem 5.1
consists of |𝑉(𝐷^{⊗𝑘})| = 𝑂(𝑘) vertices. We can now associate 𝑋 and 𝑌 with the two socket vertices
of 𝐷^{⊗𝑘}, taking care to associate 𝑋 with the left socket (originating from {𝑖1 , 𝑘1 } and 𝛼) and 𝑌
with the right socket (originating from {𝑘2 , 𝑗1 } and 𝛽). (Cf. (5.2) for an illustration of how 𝐷 and
𝒯𝐷 in (5.6) yield 𝐷^{⊗𝑘} and 𝒯_{𝐷^{⊗𝑘}} .) Executing 𝐷^{⊗𝑘} then results in the product matrix
𝐴^{⊗𝑘}(𝑋, 𝑌) = 𝑋𝑌
in

               𝑑^𝑘 max(𝑐², 𝑐𝑏) |𝑉(𝐷^{⊗𝑘})| ≤ 𝑐^{(𝜔(ℎ)+𝛿1)𝑘} 𝑐²𝑏 |𝑉(𝐷^{⊗𝑘})|
                                           ≤ 𝑛^{(𝜔(ℎ)+𝛿1)(ℎ/(ℎ−𝛿2))} 𝑐^{5+ℎ} 𝑏 |𝑉(𝐷^{⊗𝑘})|
                                           = 𝑂(𝑛^{𝜔(ℎ)+𝜖})

operations for all large enough 𝑛; here we have used the elementary upper bound 𝜔(ℎ) ≤ ℎ + 2
and selected 𝛿1 , 𝛿2 to be small enough constants (that depend on the constants ℎ and 𝜖). 


    The following lemma illustrates that we can also capture rectangular matrix multiplication
by reduction to square matrix multiplication using tensor networks. This lemma in particular
covers the cases when 𝑟 does not grow as a polynomial function of 𝑛. We also observe that if
𝜔 = 2, the upper bounds in the lemma are optimal up to the choice of 𝜖 > 0 because of the size
of the input/output.

Lemma 5.4. For all 𝜖 > 0 it holds that we may multiply an 𝑛 × 𝑟 matrix with an 𝑟 × 𝑛 matrix by
executing a tensor network in

   (i) 𝑂(𝑛^{𝜔+𝜖−1} 𝑟) operations when 𝑟 ≥ 𝑛, and

  (ii) 𝑂(𝑛² 𝑟^{𝜔+𝜖−2}) operations when 𝑟 ≤ 𝑛.

Proof. Fix an 𝜖 > 0 and let 𝛼, 𝛽, 𝛾 be three 3-tensors of shape (𝑐 × 𝑐) × 𝑑 for constants 𝑏 = 𝑐 and
𝑑 as in the proof of Lemma 5.3. Let the matrices 𝑋 ∈ 𝔽 [𝑛]×[𝑟] and 𝑌 ∈ 𝔽 [𝑟]×[𝑛] be given. We
construct tensor networks that compute the product 𝑋𝑌.
    To establish (i), first pad 𝑋 and 𝑌 using rows and columns of zero-entries so that both 𝑛
and 𝑟 become positive integer powers of 𝑐 and 𝑛 divides 𝑟. This increases 𝑛 and 𝑟 by at most a
multiplicative factor 𝑐. We now have 𝑛 = 𝑐 𝑘 and 𝑟 = 𝑐 𝑡 for positive integers 𝑡 ≥ 𝑘.
    Observe that we can compute the 𝑛 × 𝑛 product matrix 𝑋𝑌 by taking the sum of 𝑟/𝑛 = 𝑐^{𝑡−𝑘}
matrix products of size 𝑛 × 𝑛.
    Let us implement this computation with a tensor network. Reshape 𝑋 to a (𝑘 + 𝑡)-tensor all
of whose modes have length 𝑐. The first 𝑘 modes index the rows, the last 𝑡 modes index the
columns. Reshape 𝑌 to a (𝑡 + 𝑘)-tensor all of whose modes have length 𝑐. The first 𝑡 modes index
the rows, the last 𝑘 modes index the columns.
    Connect 𝑋 and 𝑌 into a network as displayed in (5.7) on the right.

             [Figure (5.7): left, the multiplication network ⟨𝑛, 𝑟, 𝑛⟩; right, the reshaped
             𝑋 and 𝑌 with 𝑘 modes of length 𝑐 each connected to the sockets of the
             network ⟨𝑐^𝑘 , 𝑐^𝑘 , 𝑐^𝑘⟩ and the remaining 𝑡 − 𝑘 modes joined pairwise
             through identity matrices 𝐼𝑐 .]                                                  (5.7)


That is, we join 𝑡 − 𝑘 column modes of 𝑋 with the matching 𝑡 − 𝑘 row modes of 𝑌 using 𝑡 − 𝑘
identity matrices 𝐼𝑐 (to avoid degeneracy of the network if 𝑋 and 𝑌 are removed). Then we
connect the remaining modes of 𝑋 and 𝑌 to the left and right sockets of a matrix multiplication
network for 𝑐^𝑘 × 𝑐^𝑘 matrices as depicted by ⟨𝑐^𝑘 , 𝑐^𝑘 , 𝑐^𝑘⟩ in (5.7); this matrix multiplication network
is obtained as in the proof of Lemma 5.3 with 𝑏 = 𝑐. The result is a multiplication network


⟨𝑛, 𝑟, 𝑛⟩ as depicted in (5.7) on the left. We execute this network using the structure on the right
in (5.7) by first contracting (with zero cost) the identity matrices 𝐼𝑐 with 𝑌. We then execute the
remaining network as in the proof of Lemma 5.3. For large enough 𝑛, the cost of the execution
is at most 𝑐^{𝑘(𝜔+𝜖/2)+2} 𝑐^{𝑡−𝑘+1} , which translates to 𝑂(𝑛^{𝜔+𝜖−1} 𝑟) operations since the network has 𝑂(𝑘)
vertices whose contractions have nonzero cost.
    To establish (ii), first pad 𝑋 and 𝑌 using rows and columns of zero-entries so that both 𝑛
and 𝑟 become positive integer powers of 𝑐 and 𝑟 divides 𝑛. This increases 𝑟 and 𝑛 by at most a
multiplicative factor 𝑐. We now have 𝑟 = 𝑐 𝑘 and 𝑛 = 𝑐 𝑡 for positive integers 𝑡 ≥ 𝑘.
    Observe that we can compute the 𝑛 × 𝑛 product matrix 𝑋𝑌 by taking (𝑛/𝑟)² = 𝑐^{2(𝑡−𝑘)} matrix
products of size 𝑟 × 𝑟.
    Let us implement this computation with a tensor network. Reshape 𝑋 to a (𝑡 + 𝑘)-tensor all
of whose modes have length 𝑐. The first 𝑡 modes index the rows, the last 𝑘 modes index the
columns. Reshape 𝑌 to a (𝑘 + 𝑡)-tensor all of whose modes have length 𝑐. The first 𝑘 modes index
the rows, the last 𝑡 modes index the columns. Connect 𝑋 and 𝑌 into a network as displayed in
(5.8) on the right.
             [Figure (5.8): left, the multiplication network ⟨𝑛, 𝑟, 𝑛⟩; right, the reshaped
             𝑋 and 𝑌 with 𝑘 modes of length 𝑐 each connected to the sockets of the
             network ⟨𝑐^𝑘 , 𝑐^𝑘 , 𝑐^𝑘⟩ and, for each of 𝑋 and 𝑌, 𝑡 − 𝑘 modes attached to
             identity matrices 𝐼𝑐 whose other modes lie at the boundary.]                     (5.8)

That is, we join 𝑡 − 𝑘 row modes of 𝑋 each to an identity matrix 𝐼𝑐 whose other mode is at the
boundary. Similarly, we join the matching 𝑡 − 𝑘 column modes of 𝑌 each to an identity matrix 𝐼𝑐
whose other mode is at the boundary. (This is to avoid degeneracy of the network if 𝑋 and 𝑌
are removed.) Then connect the remaining modes of 𝑋 and 𝑌 to the left and right sockets of a
matrix multiplication network for 𝑐^𝑘 × 𝑐^𝑘 matrices as depicted by ⟨𝑐^𝑘 , 𝑐^𝑘 , 𝑐^𝑘⟩ in (5.8); this matrix
multiplication network is obtained as in the proof of Lemma 5.3 with 𝑏 = 𝑐. The result is a
multiplication network ⟨𝑛, 𝑟, 𝑛⟩ as depicted in (5.8) on the left. We execute this network using
the structure on the right in (5.8) by first contracting (with zero cost) the identity matrices 𝐼𝑐 with
𝑋 and 𝑌, respectively. We then execute the remaining network as in the proof of Lemma 5.3.
For large enough 𝑟, the cost of the execution is at most 𝑐^{𝑘(𝜔+𝜖/2)+2} 𝑐^{2(𝑡−𝑘)+1} , which translates to
𝑂(𝑛² 𝑟^{𝜔+𝜖−2}) operations since the network has 𝑂(𝑘) vertices whose contractions have nonzero
cost.                                                                                              □
    Let us conclude this subsection with a well-known lemma on rectangular matrix multiplica-
tion that we can also capture with tensor networks.
Lemma 5.5. For all non-negative integers 𝑎, 𝑏, 𝑐 and 𝜖 > 0 it holds that we may multiply an 𝑛 𝑎 × 𝑛 𝑏
matrix by an 𝑛 𝑏 × 𝑛 𝑐 matrix via a tensor network in 𝑂(𝑛 max(𝑎+𝑏,𝑏+𝑐,𝑎+𝑐)(𝜔+𝜖)/2 ) operations.


Proof. By symmetry we may assume that 𝑎 ≤ 𝑐. Thus there are three cases to consider, namely
(i) 𝑎 ≤ 𝑏 ≤ 𝑐, (ii) 𝑎 ≤ 𝑐 ≤ 𝑏, and (iii) 𝑏 ≤ 𝑎 ≤ 𝑐.
     When 𝑎 ≤ 𝑏 ≤ 𝑐, we need to achieve 𝑂(𝑛 (𝑏+𝑐)(𝜔+𝜖)/2 ) operations. Toward this end, it
suffices to multiply an 𝑛 𝑏 × 𝑛 𝑏 matrix with an 𝑛 𝑏 × 𝑛 𝑐 matrix, which can be implemented as
𝑛 𝑐−𝑏 multiplications of two square matrices of size 𝑛 𝑏 × 𝑛 𝑏 . Proceeding analogously as in
Lemma 5.4, we obtain a network that can be executed in 𝑂(𝑛 𝑐−𝑏 𝑛 𝑏(𝜔+𝜖/2) ) operations. We have
𝑐 − 𝑏 + 𝑏(𝜔 + 𝜖) ≤ (𝑏 + 𝑐)(𝜔 + 𝜖)/2 as desired since 𝑐 ≥ 𝑏, 𝜔 ≥ 2, and 𝜖 ≥ 0.
     When 𝑎 ≤ 𝑐 ≤ 𝑏, we need to achieve 𝑂(𝑛 (𝑏+𝑐)(𝜔+𝜖)/2 ) operations. Toward this end, it suffices
to multiply an 𝑛 𝑐 × 𝑛 𝑏 matrix with an 𝑛 𝑏 × 𝑛 𝑐 matrix, and apply part (i) of Lemma 5.4 to
obtain a network that can be executed in 𝑂(𝑛 𝑏+𝑐(𝜔+𝜖−1) ) operations. We have 𝑏 + 𝑐(𝜔 + 𝜖 − 1) ≤
(𝑏 + 𝑐)(𝜔 + 𝜖)/2 as desired since 𝑏 ≥ 𝑐, 𝜔 ≥ 2, and 𝜖 ≥ 0.
     When 𝑏 ≤ 𝑎 ≤ 𝑐, we need to achieve 𝑂(𝑛 (𝑎+𝑐)(𝜔+𝜖)/2 ) operations. Toward this end, it suffices
to multiply an 𝑛 𝑎 × 𝑛 𝑏 matrix with an 𝑛 𝑏 × 𝑛 𝑐 matrix. An easy modification of part (ii) of
Lemma 5.4 gives a network that can be executed in 𝑂(𝑛 𝑎+𝑐+𝑏(𝜔+𝜖−2) ) operations. We have
𝑎 + 𝑐 + 𝑏(𝜔 + 𝜖 − 2) ≤ (𝑎 + 𝑐)(𝜔 + 𝜖)/2 as desired since 𝑏 ≤ 𝑎, 𝑏 ≤ 𝑐, 𝜔 ≥ 2, and 𝜖 ≥ 0.         

5.3   Homomorphism-counting for pattern graphs of small branchwidth
Our main result in this section is the following upper bound for 𝑃-linear forms when 𝑃 is a
graph of small branchwidth.
Lemma 5.6. For any fixed pattern graph 𝑃 and every 𝜖 > 0, there is a tensor network that evaluates the
𝑃-linear form of order 𝑛 in 𝑂(𝑛 2 + 𝑛 bw(𝑃)(𝜔+𝜖)/2 ) operations.
Proof. Consider any branch decomposition of 𝑃 of width bw(𝑃). Rooting this decomposition
arbitrarily by subdividing any edge and taking the newly added vertex as root, we obtain a
binary rooted tree 𝑇 with |𝐸(𝑃)| leaves where the leaves are identified by the edges of 𝑃. Let 𝑟
be the root of 𝑇 and for each vertex 𝑢 of 𝑇, let:
   1. 𝐶𝑢 ⊆ 𝑉(𝑃) (mnemonic: 𝐶 for “crossing” or “cut”) be the set of all vertices of 𝑃 that appear
      both in some leaf in the subtree rooted at 𝑢, and in some leaf outside the subtree. By
      definition |𝐶𝑢 | ≤ bw(𝑃) for all 𝑢, and 𝐶 𝑟 = ∅.
   2. 𝐷𝑢 ⊆ 𝑉(𝑃) (mnemonic: 𝐷 for “done”) be the set of all vertices of 𝑃 that appear only in
      leaves in the subtree rooted at 𝑢, and not outside. Note that 𝐶𝑢 and 𝐷𝑢 form a partition of
      the set of all vertices appearing in some leaf in the subtree rooted at 𝑢, and that 𝐷𝑟 = 𝑉(𝑃).
   3. 𝐸𝑢 ⊆ 𝐸(𝑃) be the set of all leaves of the subtree of 𝑇 rooted at 𝑢 (recall that each leaf of 𝑇
      corresponds to an edge). Note that 𝐸𝑟 = 𝐸(𝑃).
   4. 𝐴𝑢 ∈ 𝔽^{[𝑛]^{𝐶𝑢}} be the following order-|𝐶𝑢 | tensor of shape 𝑛 × 𝑛 × · · · × 𝑛 defined for a
      position 𝑖 ∈ [𝑛]^{𝐶𝑢} by

                            (𝐴𝑢 )_𝑖 = ∑_{𝑗∈[𝑛]^{𝐷𝑢}} ∏_{𝑆∈𝐸𝑢} 𝑋^{(𝑆)}_{(𝑖𝑗)|𝑆} ,

      where 𝑖𝑗 ∈ [𝑛]^{𝐶𝑢∪𝐷𝑢} is the Cartesian product of 𝑖 and 𝑗.


     Note that 𝐴𝑟 equals the value of the 𝑃-linear form (3.14). Furthermore, for a leaf 𝑢 of 𝑇
corresponding to an edge {𝑥 1 , 𝑥2 } of 𝑃, the tensor 𝐴𝑢 is easy to compute: if both 𝑥 1 and 𝑥 2 have
degree at least 2, then 𝐴𝑢 simply equals the input tensor 𝑋 {𝑥1 ,𝑥2 } . On the other hand if 𝑥 1 and/or
𝑥 2 has degree 1, then 𝐴𝑢 is either a vector or a scalar, being either the row sums, column sums,
or sum of all entries of 𝑋 {𝑥1 ,𝑥2 } . Each of these is readily computed in 𝑂(𝑛 2 ) time by a contraction
of 𝑋 {𝑥1 ,𝑥2 } with an appropriate tensor of ones.
     As we shall see below, for every non-leaf vertex 𝑥 of 𝑇 with children 𝑦 and 𝑧, the tensor
𝐴𝑥 equals the contraction of 𝐴𝑦 and 𝐴𝑧 . Thus, the desired result would follow if we can
show that the contraction of two siblings 𝐴𝑦 and 𝐴𝑧 can be computed by a tensor network in
𝑂(𝑛^{bw(𝑃)(𝜔+𝜖)/2}) operations. We now proceed to establish this.
     Partition 𝐶 𝑥 into 𝐶 𝑥 𝑦 ∪ 𝐶 𝑥𝑧 ∪ 𝐶 𝑥 𝑦𝑧 where 𝐶 𝑥 𝑦𝑧 = 𝐶 𝑥 ∩ 𝐶 𝑦 ∩ 𝐶 𝑧 , 𝐶 𝑥 𝑦 = 𝐶 𝑥 \ 𝐶 𝑧 and 𝐶 𝑥𝑧 = 𝐶 𝑥 \ 𝐶 𝑦
(note that every element of 𝐶 𝑥 must appear in exactly one of 𝐶 𝑦 and 𝐶 𝑧 so these three sets indeed
partition 𝐶 𝑥 ). Symmetrically partition 𝐶 𝑦 into 𝐶 𝑥 𝑦 ∪ 𝐶 𝑦𝑧 ∪ 𝐶 𝑥 𝑦𝑧 and 𝐶 𝑧 into 𝐶 𝑥𝑧 ∪ 𝐶 𝑦𝑧 ∪ 𝐶 𝑥 𝑦𝑧 .
Note that 𝐶𝑦𝑧 ⊆ 𝐷𝑥 and that the contraction of 𝐴𝑦 and 𝐴𝑧 at position (𝑖, 𝑗, 𝑘) for 𝑖 ∈ [𝑛]^{𝐶𝑥𝑦},
𝑗 ∈ [𝑛]^{𝐶𝑥𝑧}, 𝑘 ∈ [𝑛]^{𝐶𝑥𝑦𝑧} is exactly

                            ∑_{ℓ∈[𝑛]^{𝐶𝑦𝑧}} 𝐴𝑦 (𝑖, 𝑘, ℓ) 𝐴𝑧 (𝑗, 𝑘, ℓ) = 𝐴𝑥 (𝑖, 𝑗, 𝑘) .

Let 𝑎 = |𝐶𝑥𝑦 |, 𝑏 = |𝐶𝑦𝑧 |, and 𝑐 = |𝐶𝑥𝑧 |. Split each mode in 𝐶𝑥𝑦 into two separate modes 𝐶𝑥𝑦^{(𝑥)} and
𝐶𝑥𝑦^{(𝑦)} where the former is used to index 𝐴𝑥 and the latter is used to index 𝐴𝑦 . Similarly split the
modes in 𝐶𝑥𝑧 and 𝐶𝑦𝑧 , but not the modes in 𝐶𝑥𝑦𝑧 .
     The contraction of 𝐴 𝑦 and 𝐴 𝑧 can then be evaluated using rectangular matrix multiplication
with the following tensor network.
             [Figure (5.9): 𝐴𝑦 and 𝐴𝑧 attached to the input sockets of the rectangular
             matrix multiplication network ⟨𝑛^𝑎 , 𝑛^𝑏 , 𝑛^𝑐⟩; the split modes 𝐶𝑥𝑦^{(𝑦)} , 𝐶𝑦𝑧^{(𝑦)}
             and 𝐶𝑥𝑧^{(𝑧)} , 𝐶𝑦𝑧^{(𝑧)} enter from below, while 𝐶𝑥𝑦𝑧 , 𝐶𝑥𝑦^{(𝑥)} , and 𝐶𝑥𝑧^{(𝑥)} remain
             at the boundary.]                                                                (5.9)

    By Lemma 5.5, this network can be evaluated in 𝑂(𝑛^{|𝐶𝑥𝑦𝑧|} 𝑛^{max(𝑎+𝑏,𝑏+𝑐,𝑎+𝑐)(𝜔+𝜖)/2}) operations.
Since 𝑎+𝑏+|𝐶𝑥𝑦𝑧 | = |𝐶𝑦 | ≤ bw(𝑃), 𝑏+𝑐+|𝐶𝑥𝑦𝑧 | = |𝐶𝑧 | ≤ bw(𝑃), and 𝑎+𝑐+|𝐶𝑥𝑦𝑧 | = |𝐶𝑥 | ≤ bw(𝑃),
the number of operations used to contract 𝐴𝑦 and 𝐴𝑧 is bounded by 𝑂(𝑛^{bw(𝑃)(𝜔+𝜖)/2}).          □

5.4     Clique-counting forms
For counting 𝑣-cliques, the general upper bound for 𝑃-linear forms (Lemma 5.6) with 𝑃 = 𝐾𝑣
gives a running time of 𝑂(𝑛^{(𝜔+𝜖)⌈2𝑣/3⌉/2}) = 𝑂(𝑛^{(𝜔+𝜖)⌊𝑣/3⌋ + (𝜔+𝜖)(𝑣 mod 3)/2}). In this section, we give
a slightly improved tensor-network algorithm, matching the running time of the currently


fastest known algorithm of Eisenbrand and Grandoni [44]. That this is possible is more or less
immediate given that tensor networks can do fast rectangular matrix multiplication. It can
also be viewed as a consequence of a more careful analysis of the proof of Lemma 5.6 where,
towards the end, instead of using the generic upper bound of Lemma 5.5 for rectangular matrix
multiplication, we use the precise shape of the bottleneck matrix multiplication needed for the
𝑣-clique case. For completeness, we give a more direct argument below.
    For brevity, throughout this section we refer to the 𝑃-linear form for 𝑃 = 𝐾𝑣 as the \binom{𝑣}{2}-linear
form.

Lemma 5.7. For all constants 𝜖 > 0 and 𝑣 ≥ 3, the \binom{𝑣}{2}-linear form of order 𝑛 can be evaluated by
executing a tensor network in 𝑂(𝑛^{𝛽+𝜖}) operations, where 𝜔𝑣/3 ≤ 𝛽 ≤ 𝜔𝑣/3 + (𝑣 mod 3) is the exponent
of the arithmetic complexity of the ⟨𝑛^{⌊𝑣/3⌋} , 𝑛^{⌊(𝑣+1)/3⌋} , 𝑛^{⌊(𝑣+2)/3⌋}⟩ matrix multiplication map.
Proof. Let 𝑆1 , 𝑆2 , and 𝑆3 be an arbitrary partition of the 𝑣 vertices of 𝐾𝑣 into sets of sizes ⌊𝑣/3⌋,
⌊(𝑣 + 1)/3⌋ and ⌊(𝑣 + 2)/3⌋, respectively. Let 𝐸𝑖𝑗 = 𝐸(𝑆𝑖 , 𝑆𝑖 ∪ 𝑆𝑗 ) be the edges inside 𝑆𝑖 and
between 𝑆𝑖 and 𝑆𝑗 . In particular, 𝐸12 , 𝐸23 and 𝐸31 form a partition of the edges of 𝐾𝑣 .
     We construct a network for evaluating the clique-counting form as follows. First, contract
all inputs 𝑋 (𝑎𝑏) with 𝑎𝑏 ∈ 𝐸12 naïvely. This results in a tensor 𝑌 (12) with volume 𝑛 |𝑆1 |+|𝑆2 | which
we view as an 𝑛 |𝑆1 | × 𝑛 |𝑆2 | matrix where the rows are indexed by multisubsets 𝑅 of [𝑛] of size
|𝑆1 | and the columns by multisubsets 𝐶 of size |𝑆2 |; the entry at row 𝑅, column 𝐶 is the 0-1
indicator of whether 𝑅 ∪ 𝐶 is a set of size |𝑆1 | + |𝑆2 | where all vertices of 𝑅 have edges to all
other vertices in 𝑅 ∪ 𝐶. Contract all of 𝐸23 and 𝐸31 similarly, obtaining an 𝑛 |𝑆2 | × 𝑛 |𝑆3 | matrix
𝑌 (23) and an 𝑛 |𝑆3 | × 𝑛 |𝑆1 | matrix 𝑌 (31) . The cost of creating 𝑌 (𝑖𝑗) is simply its volume 𝑛 |𝑆𝑖 |+|𝑆 𝑗 | ≤ 𝑛 𝛽 .
     Next, perform the matrix multiplication 𝑍 = 𝑌 (12) · 𝑌 (23) in time 𝑂(𝑛 𝛽+𝜖 ) using the same
method as in the proof of Lemma 5.3. The matrix 𝑍 at row 𝑅 ∈ [𝑛]|𝑆1 | , column 𝐶 ∈ [𝑛]|𝑆3 | counts
the subsets 𝑈 ∈ [𝑛]|𝑆2 | such that 𝑊 = 𝑅 ∪ 𝐶 ∪ 𝑈 is a set of size |𝑆1 | + |𝑆2 | + |𝑆3 | (viewing 𝑅, 𝐶
and 𝑈 as multisubsets of [𝑛] in the obvious way) where all vertices in 𝑅 ∪ 𝑈 have edges to all
other vertices in 𝑊.
     Finally, contract 𝑍 with 𝑌 (31) (identifying the rows of 𝑍 with the columns of 𝑌 (31) and vice
versa) in time 𝑛 |𝑆3 |+|𝑆1 | ≤ 𝑛 𝛽 to obtain the number of 𝑣-cliques in the input graph.                           
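For 𝑣 = 3 (each 𝑆𝑖 a single vertex) the construction degenerates to the familiar triangle count via matrix multiplication; a small NumPy sketch (ours) of this special case:

```python
# The v = 3 special case (ours): with each S_i a single vertex, the
# matrices Y^(ij) are the inputs themselves, and the proof's recipe,
# Z = Y12 Y23 contracted with Y31, is the triangle count trace(G^3)/6.
import numpy as np

n = 6
G = (np.random.rand(n, n) < 0.4).astype(float)
G = np.triu(G, 1)
G = G + G.T                                  # adjacency matrix of a random graph
Z = G @ G                                    # the matrix product Y12 . Y23
triangles = np.einsum('ij,ji->', Z, G) / 6   # contract Z with Y31
brute = sum(G[i, j] * G[j, k] * G[k, i]
            for i in range(n) for j in range(n) for k in range(n)) / 6
assert np.isclose(triangles, brute)
```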

5.5    Fast Fourier transforms and fast convolution
Let us next express fast Fourier transforms as tensor networks and their executions. We start by
observing that the Cooley–Tukey [35] fast Fourier transform on ℤ_{2^𝑘} can be implemented as a
tensor network. We assume 𝜌 is a primitive (2^𝑘)th root of unity in the field 𝔽 .

Lemma 5.8. The discrete Fourier transform for the Abelian group ℤ_{2^𝑘} can be computed by executing a
tensor network in 𝑂(2^𝑘 𝑘) operations.

Proof. The case 𝑘 = 0 is immediate so let us assume that 𝑘 ≥ 1. We construct a tensor network
whose execution multiplies a vector 𝑥 ∈ 𝔽^{[2^𝑘]} with the DFT matrix Φ in Section 3.1 for 𝑛 = 2^𝑘 to
yield the result Φ𝑥 ∈ 𝔽^{[2^𝑘]} . Toward this end, let us write 𝐼𝑚 for an 𝑚 × 𝑚 identity matrix,
                                                                     
                                                            1  1
                                                 𝐻2 =                                                          (5.10)
                                                            1 −1


for the 2 × 2 Hadamard–Walsh matrix, and 𝑅^{(𝑘,𝑗)} ∈ 𝔽^{[2^𝑘]} for the vector obtained by concatenating
2^𝑗 copies of the vector

                  [ 1, 1, . . . , 1, 𝜌^{2^𝑗·0} , 𝜌^{2^𝑗·1} , . . . , 𝜌^{2^𝑗·(2^{𝑘−1−𝑗}−1)} ]

(consisting of 2^{𝑘−1−𝑗} ones followed by the 2^{𝑘−1−𝑗} indicated powers of 𝜌)
for 𝑗 = 0, 1, . . . , 𝑘 − 2. We can decompose Φ into a sequence of 2^𝑘 × 2^𝑘 matrices (see, e. g., [97])

            Φ = 𝑃^{(𝑘)} 𝐵^{(𝑘,𝑘−1)} 𝑇^{(𝑘,𝑘−2)} 𝐵^{(𝑘,𝑘−2)} · · · 𝑇^{(𝑘,1)} 𝐵^{(𝑘,1)} 𝑇^{(𝑘,0)} 𝐵^{(𝑘,0)}          (5.11)

consisting of alternating butterfly matrices

      𝐵^{(𝑘,𝑗)} = 𝐼2 ⊗ · · · ⊗ 𝐼2 ⊗ 𝐻2 ⊗ 𝐼2 ⊗ · · · ⊗ 𝐼2    (𝑗 identity factors before 𝐻2 and 𝑘 − 𝑗 − 1 after)

and diagonal twiddle matrices
                                       𝑇^{(𝑘,𝑗)} = diag(𝑅^{(𝑘,𝑗)})

followed by a permutation matrix 𝑃 (𝑘) that permutes the indices in [2 𝑘 ] viewed as 𝑘-bit strings
by reversing the bit-order. Since only the (𝑗 + 1)th Kronecker component in the butterfly matrix
𝐵(𝑘,𝑗) is a nonidentity matrix, and multiplication of a vector with the diagonal twiddle matrix
𝑇 (𝑘,𝑗) corresponds to pointwise (Hadamard) multiplication with the vector 𝑅 (𝑘,𝑗) on the diagonal,
we observe that the sequence (5.11) can be represented as a tensor network 𝐷 ∗ as depicted below
(for 𝑘 = 5) so that all the modes have length 2.

             [Figure (5.12): the network 𝐷∗ for 𝑘 = 5: one 𝐻2 vertex per butterfly matrix
             and one twiddle vector 𝑅^{(5,𝑗)} per twiddle matrix, in the order
             Φ = 𝑃^{(5)} 𝐵^{(5,4)} 𝑇^{(5,3)} 𝐵^{(5,3)} 𝑇^{(5,2)} 𝐵^{(5,2)} 𝑇^{(5,1)} 𝐵^{(5,1)} 𝑇^{(5,0)} 𝐵^{(5,0)} .]   (5.12)


We can now connect the network (5.12) to a data vector 𝑥 ∈ 𝔽^{[2^𝑘]} to obtain the network 𝐷 below:



             [Figure (5.13): the network (5.12) with the data vector 𝑥 attached to its
             input modes.]                                                                    (5.13)




Below we depict in red an execution tree 𝒯𝐷 with cost 2^{𝑘+1} for the network (5.13). Observe that
the mode permutation (multiplication with the matrix 𝑃^{(5)}) is not part of the execution since the

permutation amounts to merely rearranging the modes.




             [Figure (5.14): the execution tree, in red, contracting the network (5.13)
             stage by stage from the data vector outwards.]                                   (5.14)




Since the network has 𝑂(𝑘) tensors, the execution (5.14) can be carried out in 𝑂(2^𝑘 𝑘) operations
in 𝔽 .                                                                                            □
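A numerical sketch (ours) of the factorization (5.11), instantiated over the complex numbers with 𝜌 = e^{−2𝜋i/2^𝑘} and checked against numpy's FFT; the butterfly stages act on one length-2 mode at a time, and the final transpose realizes the bit reversal 𝑃^{(𝑘)}:

```python
# A numerical sketch (ours) of the factorization (5.11) over the complex
# numbers, rho = exp(-2*pi*i/2^k), checked against numpy's FFT.
import numpy as np

k = 5
N = 2 ** k
rho = np.exp(-2j * np.pi / N)

def R(k, j):                    # the twiddle vector R^(k,j)
    half = 2 ** (k - 1 - j)
    block = np.concatenate([np.ones(half), rho ** (2 ** j * np.arange(half))])
    return np.tile(block, 2 ** j)

H2 = np.array([[1, 1], [1, -1]])
x = np.random.rand(N) + 1j * np.random.rand(N)
y = x.reshape((2,) * k)                  # one length-2 mode per bit
for j in range(k):                       # apply B(k,j), then T(k,j)
    y = np.tensordot(H2, y.swapaxes(0, j), axes=([1], [0])).swapaxes(0, j)
    if j < k - 1:
        y = y * R(k, j).reshape((2,) * k)
y = y.transpose(tuple(reversed(range(k)))).reshape(N)   # P(k): bit reversal
assert np.allclose(y, np.fft.fft(x))
```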
Lemma 5.9. The discrete Fourier transform for the elementary Abelian group ℤ_2^𝑘 can be computed by
executing a tensor network in 𝑂(2^𝑘 𝑘) operations.

Proof. This proof is analogous to the proof of Lemma 5.8 but omits the twiddle matrices 𝑇^{(𝑘,𝑗)}
and the final permutation matrix 𝑃^{(𝑘)} from the decomposition of the tensor network.            □
   The following corollary is immediate from Lemma 5.8.

Lemma 5.10. The group algebra products on 𝔽[ℤ_{2^𝑘}] and 𝔽[ℤ_2^𝑘] can be computed by executing a tensor
network in 𝑂(2^𝑘 𝑘) operations whenever 2 is a unit in 𝔽 .
Proof. We start with 𝔽[ℤ_{2^𝑘}]. Let 𝑓 , 𝑔 ∈ 𝔽^{[2^𝑘]} be two vectors given as input. Our task is to compute
the ℤ_{2^𝑘}-convolution 𝑓 ∗ 𝑔 ∈ 𝔽^{[2^𝑘]}. The case 𝑘 = 0 is immediate so suppose that 𝑘 ≥ 1. Recalling
that 𝑓 ∗ 𝑔 = Φ⁻¹((Φ𝑓) · (Φ𝑔)), where “·” denotes an elementwise (Hadamard) product of two
vectors of length 2^𝑘 , let us construct a tensor network as follows. First take the FFT of 𝑓 and 𝑔
using Lemma 5.8, then multiply the resulting vectors elementwise, and finally take the inverse
FFT by replacing 𝜌 with 𝜌⁻¹ in Lemma 5.8 and multiplying with the diagonal matrix (1/2^𝑘)𝐼_{2^𝑘}.
Below we display (for 𝑘 = 5) the resulting network 𝐷 that executes to 𝑓 ∗ 𝑔.



             [Figure (5.15): two FFT networks applied to 𝑓 and 𝑔, joined by an elementwise
             product, followed by an inverse FFT network.]                                    (5.15)



The execution of the network proceeds from right to left analogously to (5.14).




             [Figure (5.16): an execution tree for the network (5.15), proceeding from
             right to left.]                                                                  (5.16)





The cost of this execution is 2^{𝑘+1}. Since the network has 𝑂(𝑘) tensors, the execution (5.16) can be
carried out in 𝑂(2^𝑘 𝑘) operations in 𝔽 .
    The case of 𝔽[ℤ_2^𝑘] is analogous but replacing Lemma 5.8 with Lemma 5.9 and modifying the
networks accordingly.                                                                             □

5.6   Yates’ algorithm
A particularly simple use case for Theorem 5.1 occurs when 𝐴 : 𝔽 [𝑠] → 𝔽 [𝑡] is a linear map. It
is immediate that we can realize 𝐴 with a two-vertex network and an execution tree that has
amortized cost max(𝑠, 𝑡) and cost 𝑠𝑡.
      Then, Theorem 5.1 immediately implies that we can evaluate 𝐴^{⊗𝑘} : 𝔽^{[𝑠]^𝑘} → 𝔽^{[𝑡]^𝑘} using a tensor
network with cost max(𝑠^{𝑘+2} , 𝑡^{𝑘+2}) and 𝑂(𝑘) vertices. In particular, the network can be executed
in 𝑂(max(𝑠^{𝑘+2} , 𝑡^{𝑘+2})𝑘) operations in 𝔽 . This network in essence realizes Yates’ algorithm [104]
for multiplying an 𝑠^𝑘-length vector with the 𝑘th Kronecker power of an 𝑠 × 𝑡 matrix to obtain a
𝑡^𝑘-length vector.
      Applying the previous observation to 𝐴 = 𝐻2 in (5.10) with 𝑠 = 𝑡 = 2, we obtain Lemma 5.9
as an immediate corollary. Similarly, other choices of 2 × 2 matrices yield the algebraic
core of currently the fastest known algorithms for problems such as graph coloring and its
generalizations such as computing the Tutte polynomial of a graph [16, 17, 19]. In particular,
the pair of mutually inverse 2 × 2 matrices
                                                                            
                                             1 1                        1 −1
                                  𝑍2 =                 ,     𝑀2 =
                                             0 1                        0  1

yield, as 𝑍2^{⊗𝑘} and 𝑀2^{⊗𝑘} , the zeta and Möbius transforms for the lattice ({0, 1}^𝑘 , ⊆, ∩, ∪) of all
subsets of a 𝑘-element set, partially ordered by subset inclusion. Theorem 5.1 yields immediately
the standard algorithms (normally developed via Yates’s algorithm) for the fast zeta and the fast
Möbius transforms via tensor networks. These networks can then be combined as in the proof
of Lemma 5.10 to yield the associated bilinear convolution maps (multiplication maps in the
semigroup algebra 𝔽 [({0, 1} 𝑘 , ⊆, ∩, ∪)]) to realize these maps in 𝑂(2 𝑘 𝑘) operations. We omit the
details due to similarity with Lemma 5.10.
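Concretely, instantiating Yates' algorithm with 𝑍2 and 𝑀2 gives the familiar in-place transforms;
the following sketch (our illustration) runs in 𝑂(2^𝑘 𝑘) ring operations:

    def fast_zeta(f):
        # Yates' algorithm with Z2: g(S) = sum of f(T) over subsets T of S.
        f = list(f)
        k = len(f).bit_length() - 1
        for i in range(k):
            for S in range(len(f)):
                if S >> i & 1:
                    f[S] += f[S ^ (1 << i)]
        return f

    def fast_mobius(g):
        # Yates' algorithm with M2: inverts the fast zeta transform.
        g = list(g)
        k = len(g).bit_length() - 1
        for i in range(k):
            for S in range(len(g)):
                if S >> i & 1:
                    g[S] -= g[S ^ (1 << i)]
        return g

    assert fast_mobius(fast_zeta([3, 1, 4, 1, 5, 9, 2, 6])) == [3, 1, 4, 1, 5, 9, 2, 6]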

5.7   Kruskal operator
We proceed to implement the Kruskal operator by taking Kronecker products and reducing to
fast rectangular matrix multiplication. Let us first proceed via Lemma 5.4 relying on square
matrix multiplication.

Lemma 5.11. For all constants 𝜖 > 0 and ℓ = 1, 2, . . . it holds that we may evaluate the Kruskal operator
of ℓ matrices of shape 𝑛 × 𝑟 by executing a tensor network in

   1. 𝑂(𝑛^{⌈ℓ/2⌉(𝜔+𝜖−1)} 𝑟) operations when 𝑟 ≥ 𝑛^{⌈ℓ/2⌉}, and

   2. 𝑂(𝑛^{2⌈ℓ/2⌉} 𝑟^{𝜔+𝜖−2}) operations when 𝑟 ≤ 𝑛^{⌈ℓ/2⌉}.


Proof. Fix an 𝜖 > 0 and let 𝛼, 𝛽, 𝛾 be three 3-tensors of shape (𝑐 × 𝑐) × 𝑑 for constants 𝑏 = 𝑐 and
𝑑 as in the proof of Lemma 5.3. Let 𝐴^{(1)}, 𝐴^{(2)}, . . . , 𝐴^{(ℓ)} ∈ 𝔽^{[𝑛]×[𝑟]} be given as input. We construct
a tensor network that computes the output 𝑌 of the Kruskal operator (3.8).
    Without loss of generality we may assume that ℓ is even by introducing a matrix 𝐴^{(ℓ+1)} ∈
𝔽^{[𝑛]×[𝑟]} filled with 1-entries and setting ℓ ← ℓ + 1. By inserting rows and columns with
zero-entries as necessary, we can assume that both 𝑛 and 𝑟 are positive integer powers of
𝑐. The key idea is now to take Kronecker products of the matrices 𝐴^{(1)}, 𝐴^{(2)}, . . . , 𝐴^{(ℓ/2)} and
𝐴^{(ℓ/2+1)}, 𝐴^{(ℓ/2+2)}, . . . , 𝐴^{(ℓ)} in the vertical dimension only to obtain two matrices 𝐵 ∈ 𝔽^{[𝑛^{ℓ/2}]×[𝑟]} and
𝐶 ∈ 𝔽^{[𝑛^{ℓ/2}]×[𝑟]}, respectively. We then multiply 𝐵 and the transpose of 𝐶 using fast rectangular
matrix multiplication from Lemma 5.4 to obtain 𝑌 ∈ 𝔽^{[𝑛^{ℓ/2}]×[𝑛^{ℓ/2}]}. It is immediate from Lemma 5.4
that this results in the operation counts claimed in (i) and (ii), as long as we can realize the idea
using tensor networks. Taking the Kronecker product in the vertical dimension only can be realized
by joining all the horizontal dimensions to a common mode, which becomes the inner mode for
matrix multiplication. The resulting network is depicted below (for ℓ = 6), where either (5.7) or
(5.8) is used for the subnetwork indicated with ⟨𝑛^{ℓ/2}, 𝑟, 𝑛^{ℓ/2}⟩ depending on whether (i) or (ii)
holds.

    [network diagram: the subnetwork ⟨𝑛^{ℓ/2}, 𝑟, 𝑛^{ℓ/2}⟩ joined to the input
    matrices 𝐴^{(1)}, 𝐴^{(2)}, . . . , 𝐴^{(6)}]                              (5.17)

In drawing (5.17) we have made two abstractions. First, each drawn mode is in fact a bundle
of modes, each of length 𝑐. Second, each mode in a bundle that is incident to one of the input
matrices 𝐴^{(1)}, 𝐴^{(2)}, . . . , 𝐴^{(ℓ)} is in fact subdivided by inserting an identity matrix 𝐼_𝑐 just before the
incidence to the input matrix. The network (5.17) is executed first by contracting the (zero-cost)
identity matrices 𝐼_𝑐, then contracting 𝐴^{(1)}, 𝐴^{(2)}, . . . , 𝐴^{(ℓ/2)} and 𝐴^{(ℓ/2+1)}, 𝐴^{(ℓ/2+2)}, . . . , 𝐴^{(ℓ)}, and
finally proceeding to execute the subnetwork ⟨𝑛^{ℓ/2}, 𝑟, 𝑛^{ℓ/2}⟩ as in Lemma 5.4. □
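The vertical Kronecker products in this proof are exactly column-wise (Khatri–Rao) products. A
minimal NumPy sketch of the split-and-multiply idea, with plain matrix multiplication standing
in for the fast rectangular multiplication of Lemma 5.4 (function names are ours):

    import numpy as np

    def khatri_rao(mats):
        # Column-wise Kronecker product: column j of the result is the
        # Kronecker product of column j of every input matrix.
        out = mats[0]
        for m in mats[1:]:
            out = np.einsum('ir,jr->ijr', out, m).reshape(-1, m.shape[1])
        return out

    def kruskal(mats):
        # Kruskal operator: Y[i1,...,il] = sum_j A1[i1,j] * ... * Al[il,j].
        # Form B and C by vertical Kronecker products of the two halves and
        # recover the flattened output as a single matrix product B C^T.
        half = (len(mats) + 1) // 2
        B, C = khatri_rao(mats[:half]), khatri_rao(mats[half:])
        return (B @ C.T).reshape([m.shape[0] for m in mats])

    # Example: 4 matrices of shape 3 x 2.
    rng = np.random.default_rng(1)
    A = [rng.random((3, 2)) for _ in range(4)]
    Y = kruskal(A)
    assert np.isclose(Y[0, 1, 2, 0],
                      sum(A[0][0, j] * A[1][1, j] * A[2][2, j] * A[3][0, j]
                          for j in range(2)))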

   The next lemma gives a sharper conclusion using the rectangular matrix multiplication
exponents 𝜔(ℎ) when 𝑟 is assumed to grow polynomially as a function of 𝑛.

Lemma 5.12. For all constants 𝜖 > 0, 𝜌 > 0, and ℓ = 1, 2, . . . it holds that we may evaluate the Kruskal
operator of ℓ matrices of shape 𝑛 × 𝑟, where 𝑟 = ⌊𝑛^𝜌⌋, by executing a tensor network in
𝑂(𝑛^{⌈ℓ/2⌉ 𝜔(𝜌/⌈ℓ/2⌉) + 𝜖}) operations.

Proof. Proceed as in the proof of Lemma 5.11, but implement the subnetwork for ⟨𝑛^{⌈ℓ/2⌉}, 𝑟, 𝑛^{⌈ℓ/2⌉}⟩
as in Lemma 5.3. More precisely, take 𝑁 = 𝑛^{⌈ℓ/2⌉} and ℎ = 𝜌/⌈ℓ/2⌉, and observe that ⟨𝑁, ⌊𝑁^ℎ⌋, 𝑁⟩ =
⟨𝑛^{⌈ℓ/2⌉}, 𝑟, 𝑛^{⌈ℓ/2⌉}⟩. □

   To our knowledge, Lemmas 5.11 and 5.12 capture the state of the art for computing the
Kruskal operator.


5.8   The permanent
The following lemma observes that essentially the fastest known algorithm for the permanent,
namely Ryser's algorithm [89], can be realized as a tensor network. Here "essentially fastest
known" refers to the base of the exponential running time; sub-exponential speed-ups to Ryser's
algorithm are known, see, e. g., Björklund [15].

Lemma 5.13. The permanent of an 𝑛 × 𝑛 matrix can be computed by executing a tensor network in
𝑂(2^𝑛 𝑛) operations.

Proof. We observe that Ryser's algorithm [89] for the permanent (3.1), namely the inclusion–
exclusion expression

                      per 𝐴 = Σ_{𝑆⊆[𝑛]} (−1)^{𝑛−|𝑆|} ∏_{𝑖∈[𝑛]} Σ_{𝑗∈𝑆} 𝑎_{𝑖𝑗}

is implementable with a star-shaped tensor network consisting of 𝑛 matrices of shape 2^𝑛 × 𝑛
joined together by a common mode (of length 2^𝑛), with the 𝑛 modes of length 𝑛 being the
boundary of the network. Each of the 𝑛 matrices consists of the {0, 1}-valued incidence vectors
of the 2^𝑛 subsets 𝑆 ⊆ [𝑛], with one of the matrices arbitrarily selected to contain signed rows
determined by (−1)^{𝑛−|𝑆|}. The input to the network consists of 𝑛 vectors of length 𝑛, namely the
rows of the 𝑛 × 𝑛 input matrix 𝐴. The network is executed by first executing the 𝑛 matrix–vector
multiplications, and then contracting the resulting 𝑛 vectors of length 2^𝑛 until the scalar per 𝐴
remains. □
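For reference, a direct Python transcription of the inclusion–exclusion formula (a sketch running
in 𝑂(2^𝑛 𝑛^2) time; the 𝑂(2^𝑛 𝑛) bound requires Gray-code ordering of the subsets, which we omit):

    import numpy as np
    from itertools import combinations

    def ryser_permanent(A):
        # per A = sum over nonempty subsets S of [n] of
        #         (-1)^(n-|S|) * prod_i sum_{j in S} a_ij.
        n = A.shape[0]
        total = 0.0
        for k in range(1, n + 1):
            for S in combinations(range(n), k):
                total += (-1) ** (n - k) * np.prod(A[:, list(S)].sum(axis=1))
        return total

    # The permanent of the all-ones n x n matrix is n!.
    assert np.isclose(ryser_permanent(np.ones((4, 4))), 24)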


6     A lower bound for the cost of a multilinear map
In this section, we prove a general lower bound on the cost of evaluating a multilinear map
using tensor networks, as defined in Section 4.5. The lower bound is expressed in terms of the
socket-width of a multilinear map, which we now proceed to define.
    Let 𝐴 : 𝔽^{𝐽(𝐸1)} × 𝔽^{𝐽(𝐸2)} × · · · × 𝔽^{𝐽(𝐸ℓ)} → 𝔽^{𝐽(𝐸0)} be an ℓ-linear map. A socket-tree of 𝐴 is a tree 𝒯𝑆
whose ℓ + 1 leaf vertices are the sockets 𝐸1, 𝐸2, . . . , 𝐸ℓ, 𝐸0 of 𝐴 and whose internal vertices all
have degree exactly 3. Associate with each edge 𝑒 = {𝑥 𝑅 , 𝑥 𝐶 } of 𝒯𝑆 the two subtrees 𝒯𝑆 (𝑥 𝑅 , 𝑒)
and 𝒯𝑆 (𝑥 𝐶 , 𝑒) obtained by removing 𝑒, where 𝒯𝑆 (𝑥 𝑅 , 𝑒) is the subtree containing 𝑥 𝑅 and 𝒯𝑆 (𝑥 𝐶 , 𝑒)
is the subtree containing 𝑥 𝐶 . Let 𝐿(𝑥 𝑅 , 𝑒) be the set of leaves in 𝒯𝑆 (𝑥 𝑅 , 𝑒) and let 𝐿(𝑥 𝐶 , 𝑒) be the
set of leaves in 𝒯𝑆 (𝑥 𝐶 , 𝑒).
    The sets 𝐿(𝑥 𝑅 , 𝑒) and 𝐿(𝑥 𝐶 , 𝑒) are both nonempty and together partition the set of sockets.
Consider the flattening 𝑀(𝒯𝑆 , 𝑒) of the tensor 𝑇(𝐴) such that the modes in 𝐿(𝑥 𝑅 , 𝑒) index the
rows and the modes in 𝐿(𝑥 𝐶 , 𝑒) index the columns of 𝑀(𝒯𝑆 , 𝑒). The width of 𝒯𝑆 at 𝑒 is the rank of
𝑀(𝒯𝑆 , 𝑒), and the width of 𝒯𝑆 is 𝑤(𝒯𝑆 ) = max𝑒∈𝐸(𝒯𝑆 ) rk(𝑀(𝒯𝑆 , 𝑒)).
    Let us write S (𝐴) for the set of all socket-trees of the multilinear form 𝐴. We define the
socket-width of 𝐴 to be 𝑤(𝐴) = min𝒯𝑆 ∈S (𝐴) 𝑤(𝒯𝑆 ).
    The rest of this section is devoted to proving Theorem 1.4:

Theorem 1.4. For every multilinear map 𝐴, it holds that 𝑐(𝐴) ≥ 𝑤(𝐴).


   First, we prove that without loss of generality, we may restrict attention to forms rather than
general maps.

Claim 6.1. For any multilinear map 𝐴, it holds that 𝑐(𝐴) ≥ 𝑐(𝐹(𝐴)).

Proof. We observe that 𝐴 and 𝐹(𝐴) satisfy T̂(𝐴) = T̂(𝐹(𝐴)). Any network 𝐷 ∈ 𝒟(𝐴) can be
modified to a network 𝐷′ ∈ 𝒟(𝐹(𝐴)) by attaching a tensor 𝑋0 ∈ 𝔽^{𝐽(𝐸0)} to the boundary of 𝐷. Let
𝐷 ∈ 𝒟(𝐴) be such that 𝑐(𝐷) = 𝑐(𝐴). The minimum-cost execution of 𝐷, followed by contracting
𝑇(𝐷) and 𝑋0, is an execution of 𝐷′. Its cost is 𝑐(𝐴): the cost of contracting 𝑇(𝐷) and 𝑋0
is ∏_{𝑒∈𝐵(𝐷)} |𝐽(𝑒)|, and ∏_{𝑒∈𝐵(𝐷)} |𝐽(𝑒)| ≤ 𝑐(𝐴) because the last step of the minimum-cost execution
of 𝐷 contracts a set 𝑊 with all modes 𝑒 ∈ 𝐵(𝐷) incident to 𝑊. Thus, 𝑐(𝐴) ≥ 𝑐(𝐹(𝐴)). □

   Furthermore, 𝑤(𝐴) = 𝑤(𝐹(𝐴)) for every multilinear map 𝐴, since 𝑤(𝐴) only depends on the
tensor 𝑇(𝐴), but not on which of its coordinates (if any) is the output. Thus it suffices to prove
Theorem 1.4 for multilinear forms, which we now proceed to do.

Lemma 6.2. For any multilinear form 𝐹, it holds that 𝑐(𝐹) ≥ 𝑤(𝐹).

Proof. Let 𝐷 ∈ D(𝐹) be such that 𝑐(𝐷) = 𝑐(𝐹). It is a tensor network with empty boundary
and a socket vertex 𝑆 𝑖 ∈ 𝑉(𝐷) for each input socket 𝐸 𝑖 , where 𝑖 = 1, 2, . . . , ℓ . Its tensor is
𝑇(𝐷) = 𝐹(𝑋 (1) , 𝑋 (2) , . . . , 𝑋 (ℓ ) ) where 𝑋 (𝑖) = 𝑇(𝑆 𝑖 ) for 𝑖 = 1, 2, . . . , ℓ .
    By Lemma 4.1, a minimum-cost execution of 𝐷 can be represented by a rooted binary tree
𝒯𝐷, where the set of leaves of 𝒯𝐷 is 𝑉(𝐷) and each inner vertex represents the vertex obtained
by contracting its two children. Let 𝒯𝑆 be the unique socket-tree of 𝐹 that is obtained as a
topological minor of 𝒯𝐷 . Slightly abusing the notation, we assume that the leaves of 𝒯𝑆 are
the socket vertices 𝑆1 , 𝑆2 , . . . , 𝑆ℓ instead of the sockets 𝐸1 , 𝐸2 , . . . , 𝐸ℓ . To establish the lemma, it
suffices to show that 𝒯𝐷 has cost at least 𝑤(𝒯𝑆 ), since 𝑤(𝒯𝑆 ) ≥ 𝑤(𝐹).
    Let 𝑒 = {𝑥𝑅, 𝑥𝐶} ∈ 𝐸(𝒯𝑆) be an edge of the socket tree 𝒯𝑆 with rk(𝑀(𝒯𝑆, 𝑒)) = 𝑤(𝒯𝑆), and let
ẽ be an edge of the execution tree 𝒯𝐷 in the subdivision of 𝑒 appearing in 𝒯𝐷. Without loss of
generality we may assume that ẽ is directed from the part of 𝒯𝐷 corresponding to 𝑥𝑅 towards the
part corresponding to 𝑥𝐶 (if not, simply switch the names of 𝑥𝑅 and 𝑥𝐶). Define 𝑆𝑅 = 𝐿(𝑥𝑅, 𝑒) and
𝑆𝐶 = 𝐿(𝑥𝐶, 𝑒). Let 𝑊𝑅 ⊆ 𝑉(𝐷) be the set of non-socket vertices of 𝐷 that appear on the same
side of ẽ in 𝒯𝐷 as the socket vertices in 𝑆𝑅, and let 𝑊𝐶 be the set of remaining non-socket vertices of
𝐷. See Figure 1 for an illustration of all these definitions. Finally, let 𝐷′ = 𝐷/𝑆𝑅/𝑆𝐶/𝑊𝑅/𝑊𝐶 be
the result of contracting each of these four sets of vertices of 𝐷. For notational convenience, we
identify the four vertices of the new network with the four subsets 𝑆𝑅, 𝑆𝐶, 𝑊𝑅, 𝑊𝐶.
    Now, the tensor 𝑃 = 𝑇(𝐷′[𝑊𝑅 ∪ 𝑆𝑅]) appears as an intermediate result in the execution 𝒯𝐷,⁸
hence the volume of 𝑃 is a lower bound on the cost of 𝒯𝐷.
    We group the modes of 𝐷′ incident on 𝑆𝑅 or 𝑊𝑅 as shown in Figure 2: 𝐸𝑆𝑊 are all modes
in 𝐷′ incident exactly upon 𝑆𝑅 and 𝑊𝑅, 𝐸𝑊𝐶 are all modes incident on 𝑊𝑅 but not on 𝑆𝑅, 𝐸𝑆𝐶
    ⁸Note that this follows from our choice that the 𝑅 part of 𝐷 is below ẽ in the execution tree, and that the same is
not necessarily true for the tensor 𝑇(𝐷′[𝑊𝐶 ∪ 𝑆𝐶]). E.g., if in Figure 1 we had instead taken ẽ to be the edge further
down which joins 𝑋^{(1)} and 𝑋^{(2)} with 𝐴^{(1)}, then 𝑇(𝐷′[𝑊𝐶 ∪ 𝑆𝐶]) would be the contraction of 𝐴^{(1)}, 𝐴^{(2)}, 𝑋^{(3)} and 𝑋^{(4)},
which is not an intermediate result in the execution of 𝒯𝐷.




    [Figure 1 diagrams omitted]

(a) Example of a possible execution tree 𝒯𝐷. Given the choice of 𝑒 in the corresponding socket
tree 𝒯𝑆 shown on the right, there are four possible choices of ẽ.
(b) The corresponding socket tree 𝒯𝑆. The exact choice of ẽ in 𝒯𝐷 determines which part of the
cut is the 𝑥𝑅 part, and which is the 𝑥𝐶 part.

             Figure 1: Illustration of the notation used for the execution and socket trees.

    [Figure 2 diagram omitted]

Figure 2: Illustration of 𝐷′. We group the modes of 𝐷′ based on how they connect 𝑆𝑅, 𝑆𝐶, and
the "𝐶 part" of 𝐷′.


are all modes incident on 𝑆𝑅 but not 𝑊𝑅, and finally 𝐸𝑆𝑊𝐶 are all modes incident upon 𝑆𝑅, 𝑊𝑅,
and at least one of 𝑆𝐶 or 𝑊𝐶. Write 𝐸𝑆 = 𝐸𝑆𝑊 ∪ 𝐸𝑆𝐶 ∪ 𝐸𝑆𝑊𝐶 for the modes incident on 𝑆𝑅, and
similarly 𝐸𝐶 = 𝐸𝑊𝐶 ∪ 𝐸𝑆𝐶 ∪ 𝐸𝑆𝑊𝐶 for all modes incident upon at least one of 𝑆𝑅 or 𝑊𝑅, and at
least one of 𝑆𝐶 or 𝑊𝐶. Note that |𝐽(𝐸𝐶)| is precisely the volume of 𝑃, which we aim to lower
bound.
    Define a matrix 𝐴 ∈ 𝔽^{𝐽(𝐸𝑆)×𝐽(𝐸𝐶)} as follows. We identify its row indices 𝑖 ∈ 𝐽(𝐸𝑆) as being
triples 𝑖 = (𝑖𝑆𝑊, 𝑖𝑆𝐶, 𝑖𝑆𝑊𝐶) ∈ 𝐽(𝐸𝑆𝑊) × 𝐽(𝐸𝑆𝐶) × 𝐽(𝐸𝑆𝑊𝐶) and similarly its column indices 𝑗 ∈ 𝐽(𝐸𝐶)
are triples 𝑗 = (𝑗𝑆𝐶, 𝑗𝑊𝐶, 𝑗𝑆𝑊𝐶) ∈ 𝐽(𝐸𝑆𝐶) × 𝐽(𝐸𝑊𝐶) × 𝐽(𝐸𝑆𝑊𝐶). Then the entries of 𝐴 are

    𝐴_{(𝑖𝑆𝑊,𝑖𝑆𝐶,𝑖𝑆𝑊𝐶),(𝑗𝑆𝐶,𝑗𝑊𝐶,𝑗𝑆𝑊𝐶)} = 𝑇(𝐷′[𝑊𝑅])_{𝑖𝑆𝑊,𝑗𝑊𝐶,𝑗𝑆𝑊𝐶}  if 𝑖𝑆𝐶 = 𝑗𝑆𝐶 and 𝑖𝑆𝑊𝐶 = 𝑗𝑆𝑊𝐶,
                                          and 0 otherwise.

In the case when 𝐸𝑆 = 𝐸𝑆𝑊 (i. e., all modes incident on 𝑆𝑅 connect only to 𝑊𝑅), 𝐴 is
simply a flattening of 𝑇(𝐷′[𝑊𝑅]). Recall that 𝑇(𝐷′[𝑆𝑅]) ∈ 𝔽^{𝐽(𝐸𝑆)}. Then for every



𝑗 = (𝑗𝑆𝐶, 𝑗𝑊𝐶, 𝑗𝑆𝑊𝐶) ∈ 𝐽(𝐸𝐶), we have

    Σ_{𝑖∈𝐽(𝐸𝑆)} 𝐴_{𝑖,𝑗} 𝑇(𝐷′[𝑆𝑅])_𝑖 = Σ_{𝑖𝑆𝑊∈𝐽(𝐸𝑆𝑊)} 𝐴_{(𝑖𝑆𝑊,𝑗𝑆𝐶,𝑗𝑆𝑊𝐶),𝑗} 𝑇(𝐷′[𝑆𝑅])_{𝑖𝑆𝑊,𝑗𝑆𝐶,𝑗𝑆𝑊𝐶}
                                  = Σ_{𝑖𝑆𝑊} 𝑇(𝐷′[𝑊𝑅])_{𝑖𝑆𝑊,𝑗𝑊𝐶,𝑗𝑆𝑊𝐶} 𝑇(𝐷′[𝑆𝑅])_{𝑖𝑆𝑊,𝑗𝑆𝐶,𝑗𝑆𝑊𝐶}
                                  = 𝑃_{𝑗𝑆𝐶,𝑗𝑊𝐶,𝑗𝑆𝑊𝐶} = 𝑃_𝑗

(recall that 𝑃 is the contraction of 𝑇(𝐷′[𝑊𝑅]) and 𝑇(𝐷′[𝑆𝑅])). Viewing 𝑇(𝐷′[𝑆𝑅]) as a row vector
in 𝔽^{𝐽(𝐸𝑆)}, we see that 𝑃 is simply the vector–matrix product 𝑃 = 𝑇(𝐷′[𝑆𝑅]) · 𝐴 ∈ 𝔽^{𝐽(𝐸𝐶)}.
    Symmetrically, for the other half of 𝐷′, we can write 𝑄 = 𝑇(𝐷′[𝑊𝐶 ∪ 𝑆𝐶]) as a matrix–vector
product 𝑄 = 𝐵 · 𝑇(𝐷′[𝑆𝐶]) ∈ 𝔽^{𝐽(𝐸𝐶)}, where 𝐵 is a matrix corresponding to 𝑇(𝐷′[𝑊𝐶]) analogously
to how 𝐴 corresponds to 𝑇(𝐷′[𝑊𝑅]).
    Thus we have 𝑇(𝐷) = 𝑇(𝐷′[𝑆𝑅]) · 𝐴 · 𝐵 · 𝑇(𝐷′[𝑆𝐶]). Recall that for each socket vertex 𝑆𝑖 in
the original network 𝐷, we have 𝑇(𝑆𝑖) = 𝑋^{(𝑖)}. Denoting 𝑋𝑅 = 𝑇(𝐷′[𝑆𝑅]) and 𝑋𝐶 = 𝑇(𝐷′[𝑆𝐶]),
we get 𝑋𝑅 = ⊗_{𝑆𝑖∈𝑆𝑅} 𝑋^{(𝑖)} and 𝑋𝐶 = ⊗_{𝑆𝑖∈𝑆𝐶} 𝑋^{(𝑖)}.⁹ Hence

                        𝐹(𝑋^{(1)}, 𝑋^{(2)}, . . . , 𝑋^{(ℓ)}) = 𝑋𝑅 · 𝐴 · 𝐵 · 𝑋𝐶.

It follows that 𝐴 · 𝐵 is the flattening of 𝑇(𝐹) to a matrix with rows indexed by the sockets in 𝑆𝑅
and columns indexed by the sockets in 𝑆𝐶. But this flattening is precisely the matrix 𝑀(𝒯𝑆, 𝑒),
implying that |𝐽(𝐸𝐶)| ≥ rk(𝐴 · 𝐵) = rk(𝑀(𝒯𝑆, 𝑒)) = 𝑤(𝒯𝑆), since the product 𝐴 · 𝐵 has inner
dimension |𝐽(𝐸𝐶)|, as desired. □


7     Lower bounds for socket-width
In this section we establish lower bounds on socket-width for concrete maps.

7.1   Determinant and permanent
We now prove lower bounds for the socket-width of the determinant and the permanent. Let us
start with the following trivial observation.

Claim 7.1. For any socket tree 𝒯𝑆 with 𝑛 ≥ 2 leaves, there is an edge 𝑒 = {𝑥 𝑅 , 𝑥 𝐶 } such that
𝑛/3 ≤ |𝐿(𝑥 𝑅 , 𝑒)| < 2𝑛/3.

Proof. Consider all edges 𝑒 = {𝑥 𝑅 , 𝑥 𝐶 } such that |𝐿(𝑥 𝑅 , 𝑒)| ≥ 𝑛/3. At least one such edge certainly
exists since 𝑛 ≥ 2. Indeed, an edge 𝑒 = {𝑥 𝑅 , 𝑥 𝐶 } incident to a leaf 𝑥 𝐶 has |𝐿(𝑥 𝑅 , 𝑒)| = 𝑛 − 1 leaves.
Among these, choose an edge such that 𝒯𝑆 (𝑥 𝑅 , 𝑒) is of minimal size. Assume for contradiction
that |𝐿(𝑥 𝑅 , 𝑒)| ≥ 2𝑛/3. But then one of the two subtrees of 𝑥 𝑅 in 𝒯𝑆 (𝑥 𝑅 , 𝑒) must have at least 𝑛/3
leaves, and since they are smaller than 𝒯𝑆(𝑥𝑅, 𝑒), this contradicts the minimality of 𝒯𝑆(𝑥𝑅, 𝑒). □
    ⁹These identities use the fact that 𝐷 is derived from a non-degenerate network 𝐷∗ for T̂(𝐹). In particular, every
mode in the network 𝐷 is incident upon at least one non-socket vertex, hence all modes incident upon 𝑆𝑅 are
boundary modes in 𝐷′[𝑆𝑅], and similarly for 𝑆𝐶.
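The proof of Claim 7.1 above is constructive; as an illustration, here is a brute-force Python
sketch (ours; the tree encoding is hypothetical) that finds such a balanced edge by minimizing
the 𝑥𝑅-side leaf count among all sides with at least 𝑛/3 leaves:

    from collections import defaultdict

    def balanced_edge(edges, leaves):
        # edges: unordered vertex pairs of an unrooted tree whose internal
        # vertices have degree 3; leaves: the set of leaf vertices (sockets).
        # Returns (xC, xR) with n/3 <= |L(xR, e)| < 2n/3, as in Claim 7.1.
        adj = defaultdict(list)
        for x, y in edges:
            adj[x].append(y)
            adj[y].append(x)
        n = len(leaves)

        def leaf_count(root, parent):
            # leaves on the root side of the edge {parent, root}
            if root in leaves:
                return 1
            return sum(leaf_count(c, root) for c in adj[root] if c != parent)

        # Among all directed edges whose xR side has >= n/3 leaves, pick one
        # minimizing that leaf count; the proof shows it is then < 2n/3.
        best = None
        for x, y in edges:
            for xC, xR in ((x, y), (y, x)):
                k = leaf_count(xR, xC)
                if 3 * k >= n and (best is None or k < best[0]):
                    best = (k, (xC, xR))
        assert best is not None and 3 * best[0] < 2 * n
        return best[1]

    # Example: a socket tree with 4 leaves and two internal vertices p, q.
    e = [(1, 'p'), (2, 'p'), ('p', 'q'), (3, 'q'), (4, 'q')]
    print(balanced_edge(e, {1, 2, 3, 4}))   # e.g. the middle edge ('p', 'q')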



Lemma 7.2. For every positive integer 𝑛 ≥ 2, the socket-width of both the determinant and the permanent
of an 𝑛 × 𝑛 matrix is at least (𝑛 choose ⌈𝑛/3⌉).

Proof. Let 𝒯𝑆 be a socket tree for the permanent, let 𝑒 = {𝑥𝑅, 𝑥𝐶} be an arbitrary edge in 𝒯𝑆, and
let 𝑘 = |𝐿(𝑥𝑅, 𝑒)| be the number of leaves in 𝒯𝑆(𝑥𝑅, 𝑒). We now show that rk(𝑀(𝒯𝑆, 𝑒)) ≥ (𝑛 choose 𝑘).
    Recall that the sockets of 𝒯𝑆 are the 𝑛 rows of the input matrix. Without loss of generality,
number the rows so that the leaves of 𝒯𝑆(𝑥𝑅, 𝑒) are rows 1 to 𝑘, and the leaves of 𝒯𝑆(𝑥𝐶, 𝑒) are
rows 𝑘 + 1 to 𝑛. The rows of 𝑀(𝒯𝑆, 𝑒) are indexed by a tuple 𝑗𝑅 = (𝑖1, . . . , 𝑖𝑘) ∈ [𝑛]^𝑘, and the
columns are indexed by a tuple 𝑗𝐶 = (𝑖𝑘+1, . . . , 𝑖𝑛) ∈ [𝑛]^{𝑛−𝑘}. The entry of 𝑀(𝒯𝑆, 𝑒) at position
(𝑗𝑅, 𝑗𝐶) is 1 if (𝑖1, 𝑖2, . . . , 𝑖𝑛) is a permutation, and 0 otherwise. For any 𝑘-element set 𝑈 ⊆ [𝑛], let
𝑢1 < 𝑢2 < . . . < 𝑢𝑘 be the elements of 𝑈 in ascending order and define 𝑗𝑅(𝑈) = (𝑢1, . . . , 𝑢𝑘); let
𝑢𝑘+1 < 𝑢𝑘+2 < . . . < 𝑢𝑛 be the elements of [𝑛] \ 𝑈 in ascending order and define
𝑗𝐶(𝑈) = (𝑢𝑘+1, 𝑢𝑘+2, . . . , 𝑢𝑛).
    For any two 𝑘-element sets 𝑈1, 𝑈2 ⊆ [𝑛], the entry of 𝑀(𝒯𝑆, 𝑒) at position (𝑗𝑅(𝑈1), 𝑗𝐶(𝑈2))
equals 1 if 𝑈1 = 𝑈2, and 0 otherwise. This induces a (𝑛 choose 𝑘) × (𝑛 choose 𝑘) identity submatrix
of 𝑀(𝒯𝑆, 𝑒), implying rk(𝑀(𝒯𝑆, 𝑒)) ≥ (𝑛 choose 𝑘).
    By Claim 7.1, 𝒯𝑆 has an edge 𝑒 with 𝑛/3 ≤ 𝑘 < 2𝑛/3. For that edge, rk(𝑀(𝒯𝑆, 𝑒)) ≥ (𝑛 choose 𝑘) ≥
(𝑛 choose ⌈𝑛/3⌉), which completes the proof for the permanent.
     For the determinant, the only change is that some entries of 𝑀(𝒯𝑆 , 𝑒) become −1 instead of 1,
but this only changes the identified submatrix from an identity matrix to a diagonal matrix and
in particular does not change its rank. □
    The preceding proof is similar to a lower bound by Nisan ([76], Lemma 2) used to obtain
lower bounds for algebraic branching programs. But the lower bounds obtained there can be
made as sharp as Ω(2^𝑛), whereas in our setting we cannot rule out the possibility of a tensor
network that avoids splitting the 𝑛 variables into two approximately equal-size parts. This means
that the best we can obtain with our current method is (𝑛 choose ⌈𝑛/3⌉) instead of (𝑛 choose 𝑛/2).

7.2    𝑃-linear forms
Suppose 𝒯𝑆 is a socket tree for a 𝑃-linear form for a 𝑘-uniform hypergraph 𝑃 on 𝑣 vertices. Recall
that the sockets of this form correspond to the elements of 𝐸(𝑃) ⊆ ([𝑣] choose 𝑘). Given an edge 𝑒 ∈ 𝐸(𝒯𝑆)
and a vertex 𝑥 ∈ 𝑉(𝒯𝑆), we write

                                𝑈(𝑥, 𝑒) = ∪_{𝑆∈𝐿(𝑥,𝑒)} 𝑆 ⊆ [𝑣].

Claim 7.3. Let 𝒯𝑆 be a socket tree for the 𝑃-linear form of order 𝑛. Let 𝑒 = {𝑥𝑅, 𝑥𝐶} ∈ 𝐸(𝒯𝑆) and
suppose |𝑈(𝑥𝐶, 𝑒) ∩ 𝑈(𝑥𝑅, 𝑒)| = 𝑢. Then the width of 𝒯𝑆 satisfies 𝑤(𝒯𝑆) ≥ 𝑛^𝑢.
Proof. Let 𝐴 = 𝑈(𝑥𝑅, 𝑒) ∩ 𝑈(𝑥𝐶, 𝑒) and let 𝑚 = 𝑛^𝑢. We show that rk(𝑀(𝒯𝑆, 𝑒)) ≥ 𝑚 by identifying
an 𝑚 × 𝑚 identity submatrix of 𝑀(𝒯𝑆, 𝑒).
    Define 𝐸𝑅 = { (𝑆, 𝑖) | 𝑆 ∈ 𝐿(𝑥𝑅, 𝑒), 𝑖 ∈ 𝑆 } to be the modes contained in the sockets on the 𝑥𝑅
side of 𝑒, and analogously 𝐸𝐶 = { (𝑆, 𝑖) | 𝑆 ∈ 𝐿(𝑥𝐶, 𝑒), 𝑖 ∈ 𝑆 }.
    The rows of 𝑀(𝒯𝑆, 𝑒) are indexed by 𝐽(𝐸𝑅) and the columns are indexed by 𝐽(𝐸𝐶). We will
consider an 𝑚 × 𝑚 submatrix of 𝑀(𝒯𝑆, 𝑒) whose rows and columns are indexed by the elements
𝜎 ∈ ∏_{𝑖∈𝐴} [𝑛𝑖]. More specifically, each row of the submatrix is indexed by 𝑗𝑅 where 𝑗𝑅|(𝑆,𝑖) = 𝜎|𝑖
for 𝑖 ∈ 𝐴, and 𝑗𝑅|(𝑆,𝑖) = 1 for 𝑖 ∉ 𝐴. Each column is indexed by 𝑗𝐶 where 𝑗𝐶|(𝑆,𝑖) = 𝜎|𝑖 for 𝑖 ∈ 𝐴
and 𝑗𝐶|(𝑆,𝑖) = 1 for 𝑖 ∉ 𝐴.
    The value of the submatrix at position (𝑗𝑅, 𝑗𝐶) is 1 if there exists 𝜎 ∈ ∏_{𝑖∈𝐴} [𝑛𝑖] such that
𝑗𝑅|(𝑆,𝑖) = 𝜎|𝑖 and 𝑗𝐶|(𝑆,𝑖) = 𝜎|𝑖 for all 𝑖 ∈ 𝐴, and 0 otherwise. We obtain an 𝑚 × 𝑚 identity matrix
as desired. □
      From this claim, the branchwidth-based lower bound is immediate.
Lemma 7.4. For any hypergraph 𝑃, the socket-width of the 𝑃-linear form of order 𝑛 is at least 𝑛^{bw(𝑃)}.
Proof. Any socket tree for a 𝑃-linear form can be directly viewed as a branch decomposition
of 𝑃. Thus, by definition every socket tree for the form has an edge 𝑒 = {𝑥 𝑅 , 𝑥 𝐶 } where
|𝑈(𝑥𝐶, 𝑒) ∩ 𝑈(𝑥𝑅, 𝑒)| ≥ bw(𝑃), and the lemma now follows from Claim 7.3. □
    For counting homomorphisms of hypergraph cliques, that is, the 𝑃-linear form for the
complete 𝑘-uniform hypergraph 𝑃 = ([𝑣] choose 𝑘) on 𝑣 vertices, we need the following simple lower
bound on the branchwidth of complete hypergraphs (which is most likely known, but we are
not aware of a reference).
Lemma 7.5. For 𝑣 > 𝑘 ≥ 3, the complete 𝑘-uniform hypergraph on 𝑣 vertices has branchwidth 𝑣.
Proof. Let 𝑇 be an arbitrary branch decomposition of the hypergraph. We claim that there
always exists an edge 𝑒 ∗ = {𝑥 𝑅∗ , 𝑥 𝐶∗ } such that 𝑈(𝑥 𝑅∗ , 𝑒 ∗ ) = 𝑈(𝑥 𝐶∗ , 𝑒 ∗ ) = [𝑣], implying that the
width of the decomposition is at least 𝑣. Towards establishing this claim, we first observe that
for every edge 𝑒 = {𝑥 𝑅 , 𝑥 𝐶 }, at least one of 𝑈(𝑥 𝑅 , 𝑒) and 𝑈(𝑥 𝐶 , 𝑒) is equal to [𝑣]. Indeed, assume
towards contradiction that there is an edge 𝑒 = {𝑥 𝑅 , 𝑥 𝐶 } with 𝑖1 ∉ 𝑈(𝑥 𝑅 , 𝑒) and 𝑖2 ∉ 𝑈(𝑥 𝐶 , 𝑒) for
some not necessarily distinct 𝑖1 , 𝑖2 ∈ [𝑣]. Since 𝑘 ≥ 2, there exists an edge 𝑆 ⊇ {𝑖1 , 𝑖2 }, and that
edge must appear as a leaf on one of the two sides of 𝑒, violating the assumption.
    Now suppose for contradiction that the claim about the existence of 𝑒 ∗ does not hold. Direct
each edge towards the subtree which covers [𝑣] (which, by the observation above, always exists).
For example, if 𝑈(𝑥 𝑅 , 𝑒) = [𝑣] and 𝑈(𝑥 𝐶 , 𝑒) ≠ [𝑣] then 𝑒 is directed towards 𝑥 𝑅 .
    Since a tree is acyclic, there must be some vertex 𝑥 ∗ such that all edges incident to 𝑥 ∗ are
directed towards 𝑥 ∗ . The vertex 𝑥 ∗ cannot be a leaf, because the subtree consisting of a leaf
contains a single edge, which covers 𝑘 < 𝑣 elements. Thus 𝑥 ∗ has degree 3. Let the three edges
incident to 𝑥 ∗ be 𝑒1 = {𝑥 ∗ , 𝑦1 }, 𝑒2 = {𝑥 ∗ , 𝑦2 } and 𝑒3 = {𝑥 ∗ , 𝑦3 }. Since all the edges are directed
towards 𝑥∗ there are 𝑖1, 𝑖2, 𝑖3 ∈ [𝑣] such that 𝑖1 ∉ 𝑈(𝑦1, 𝑒1), 𝑖2 ∉ 𝑈(𝑦2, 𝑒2) and 𝑖3 ∉ 𝑈(𝑦3, 𝑒3). Since
𝑘 ≥ 3 there exists an edge 𝑆 ⊇ {𝑖1 , 𝑖2 , 𝑖3 }. But now the edge 𝑆 cannot appear in any of the
subtrees rooted at 𝑦1 , 𝑦2 , or 𝑦3 . Since these three subtrees together cover all leaves, this yields
the desired contradiction and we conclude that there must exist an edge 𝑒 ∗ = {𝑥 𝑅∗ , 𝑥 𝐶∗ } such that
𝑈(𝑥∗𝑅, 𝑒∗) = 𝑈(𝑥∗𝐶, 𝑒∗) = [𝑣], and therefore the branchwidth is 𝑣. □

7.3    Kruskal operator
We say that an ℓ -linear Kruskal operator is 𝑛-uniform if the lengths of the modes satisfy
𝑛 = 𝑛1 = 𝑛2 = · · · = 𝑛ℓ .


Lemma 7.6. For positive integers 𝑛 and 𝑟, the socket-width of an 𝑛-uniform ℓ-linear Kruskal operator is
at least max(𝑛^ℓ, 𝑛^{⌈ℓ/2⌉} 𝑟).

Proof. Let 𝒯𝑆 be an arbitrary socket tree for the operator. One of the leaves is the output socket
representing the tensor 𝐵 of shape 𝑛 × 𝑛 × · · · × 𝑛 (ℓ times) that is obtained by applying the
Kruskal operator to the input matrices 𝐴(𝑖) of shape 𝑛 × 𝑟, where 1 ≤ 𝑖 ≤ ℓ .
      Consider the neighbor 𝑥 of the output socket. It has three neighbors: the output socket leaf,
and two other vertices 𝑥 𝑅 and 𝑥 𝐶 . Either the subtree rooted at 𝑥 𝑅 or the one rooted at 𝑥 𝐶 must
contain 𝑢 ≥ dℓ /2e input sockets (since together they contain all ℓ input sockets). Let 𝑒 be the
edge leading to that subtree.
    We claim that rk(𝑀(𝒯𝑆, 𝑒)) ≥ 𝑛^𝑢 𝑟. Suppose without loss of generality that the input sockets
𝐴^{(1)}, . . . , 𝐴^{(𝑢)} are in the subtree rooted at 𝑥𝑅, and that the input sockets 𝐴^{(𝑢+1)}, . . . , 𝐴^{(ℓ)} are on
the other side of 𝑒, together with the output socket.
    Each row of 𝑀(𝒯𝑆, 𝑒) is indexed by a sequence of 2𝑢 indices (𝑖1, 𝑗1, . . . , 𝑖𝑢, 𝑗𝑢) where each
𝑖𝑠 ∈ [𝑛] and each 𝑗𝑡 ∈ [𝑟]. Each column is indexed by a sequence of ℓ + 2(ℓ − 𝑢) indices (𝑖′1, . . . , 𝑖′ℓ;
𝑖𝑢+1, 𝑗𝑢+1, . . . , 𝑖ℓ, 𝑗ℓ) where each 𝑖𝑠, 𝑖′𝑠 ∈ [𝑛] and each 𝑗𝑡 ∈ [𝑟]. An entry of 𝑀(𝒯𝑆, 𝑒) is 1 if and
only if 𝑖′1 = 𝑖1, 𝑖′2 = 𝑖2, . . . , 𝑖′ℓ = 𝑖ℓ, and 𝑗1 = 𝑗2 = . . . = 𝑗ℓ.
    For any 𝑖1, . . . , 𝑖𝑢 ∈ [𝑛] and 𝑗 ∈ [𝑟], consider the row index (𝑖1, 𝑗, 𝑖2, 𝑗, . . . , 𝑖𝑢, 𝑗) and the
column index (𝑖1, 𝑖2, . . . , 𝑖𝑢, 1, 1, . . . , 1; 1, 𝑗, 1, 𝑗, . . . , 1, 𝑗). The 𝑛^𝑢 𝑟 × 𝑛^𝑢 𝑟 submatrix of 𝑀(𝒯𝑆, 𝑒)
induced by these sets of row and column indices is the identity matrix, thus rk(𝑀(𝒯𝑆, 𝑒)) ≥ 𝑛^𝑢 𝑟
as desired.
    For the 𝑛^ℓ lower bound, we instead consider the edge 𝑒′ joining 𝑥 with the output socket.
Now the rows are indexed by (𝑖1, 𝑗1, 𝑖2, 𝑗2, . . . , 𝑖ℓ, 𝑗ℓ) and the columns by (𝑖′1, . . . , 𝑖′ℓ). The 𝑛^ℓ × 𝑛^ℓ
identity submatrix is obtained by taking, for every 𝑖1, . . . , 𝑖ℓ ∈ [𝑛], the row (𝑖1, 1, 𝑖2, 1, . . . , 𝑖ℓ, 1)
and the column (𝑖1, . . . , 𝑖ℓ). □


A     Background on tensor networks
This appendix summarizes some results on tensor network contractions and minimum-cost
executions, and explains the connection between the Holant framework and tensor networks. In
the language of graphical models, computing the value of a tensor network is known as the
sum-product inference task, and the results in Appendices A.1 and A.2 can be found in [62, Part
II].

A.1     Invariance property
In the following lemma, we will show that the tensor of a network is equal to the tensor of any
network that is obtained by a contraction from the original network. In particular, this implies
that any execution gives the same tensor.

Lemma A.1 (Invariance). Let 𝐷 be a tensor network. For all nonempty 𝑊 ⊆ 𝑉(𝐷) it holds that
𝑇(𝐷) = 𝑇(𝐷/𝑊).


Proof. Let 𝑊 ⊆ 𝑉(𝐷) be nonempty, let 𝑤 be the vertex that replaces 𝐷[𝑊] in 𝐷/𝑊 and let
𝑖 ∈ 𝐽(𝐵(𝐷/𝑊)) = 𝐽(𝐵(𝐷)). From (4.1), (4.2), and (4.3), it follows that
    𝑇(𝐷/𝑊)_𝑖 = Σ_{𝑗∈𝐽(𝐸(𝐷/𝑊)∖𝐵(𝐷/𝑊))} ∏_{𝑣∈(𝑉(𝐷)∖𝑊)∪{𝑤}} 𝑇(𝑣)_{𝑖𝑗}
             = Σ_{𝑗∈𝐽(𝐸(𝐷/𝑊)∖𝐵(𝐷/𝑊))} 𝑇(𝑤)_{𝑖𝑗} ∏_{𝑣∈𝑉(𝐷)∖𝑊} 𝑇(𝑣)_{𝑖𝑗}
             = Σ_{𝑗∈𝐽(𝐸(𝐷/𝑊)∖𝐵(𝐷))} 𝑇(𝐷[𝑊])_{𝑖𝑗} ∏_{𝑣∈𝑉(𝐷)∖𝑊} 𝑇(𝑣)_{𝑖𝑗}
             = Σ_{𝑗∈𝐽(𝐸(𝐷/𝑊)∖𝐵(𝐷))} Σ_{𝑗′∈𝐽(𝐸(𝐷[𝑊])∖𝐵(𝐷[𝑊]))} ∏_{𝑤∈𝑊} 𝑇(𝑤)_{𝑖𝑗𝑗′} ∏_{𝑣∈𝑉(𝐷)∖𝑊} 𝑇(𝑣)_{𝑖𝑗}
             = Σ_{𝑗∈𝐽(𝐸(𝐷/𝑊)∖𝐵(𝐷))} Σ_{𝑗′∈𝐽(𝐸(𝐷[𝑊])∖𝐵(𝐷[𝑊]))} ∏_{𝑣∈𝑉(𝐷)} 𝑇(𝑣)_{𝑖𝑗𝑗′}
             = Σ_{𝑗″∈𝐽(𝐸(𝐷)∖𝐵(𝐷))} ∏_{𝑣∈𝑉(𝐷)} 𝑇(𝑣)_{𝑖𝑗″}
             = 𝑇(𝐷)_𝑖.                                                        □
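As a quick numerical illustration of the invariance (our example, with NumPy's einsum carrying
out the contractions): contracting a subset of vertices first and then finishing the contraction
yields the same value as contracting the whole network at once.

    import numpy as np

    # A toy network: vertices u, v, w, with modes a = {u,v}, b = {v,w},
    # c = {w,u} and empty boundary, so T(D) is a scalar.
    rng = np.random.default_rng(0)
    Tu = rng.random((2, 4))  # indexed by (a, c)
    Tv = rng.random((2, 3))  # indexed by (a, b)
    Tw = rng.random((3, 4))  # indexed by (b, c)

    # Contract the whole network at once.
    full = np.einsum('ac,ab,bc->', Tu, Tv, Tw)

    # Contract W = {u, v} first (mode a disappears, boundary modes b and c
    # remain), then finish: T(D) = T(D/W).
    Tuv = np.einsum('ac,ab->bc', Tu, Tv)
    step = np.einsum('bc,bc->', Tuv, Tw)

    assert np.isclose(full, step)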

A.2    The structure of a minimum-cost execution
In this section, we analyze the structure of a minimum-cost execution. In particular, we prove
Lemma 4.1, which states that each contracted set has size at most two, and also show that one
can always contract adjacent vertices in a network.

Lemma A.2. Let 𝐷 be a tensor network. There exists a minimum-cost execution of 𝐷 such that each
contracted set has size at most two. Furthermore, if 𝐷 is loopless, we can assume that each contracted set
has size exactly two.

Proof. If 𝐷 contains loops, we may assume that a minimum-cost execution first removes all
the loops by contracting singleton vertices incident to loops. Indeed, the cost of contracting a
singleton vertex is the volume of the tensor associated to it. Since the result of an execution is
a single tensor, every vertex has to be contained in a contracted set of an execution, and
none of the hyperedges incident to a vertex can be removed before the vertex is contained in a
contracted set. Hence, the volume of any tensor in the tensor network is a lower bound for the
cost of any execution of 𝐷, and we may contract singleton vertices first at no extra cost.
    So let us assume that 𝐷 is loopless. Suppose that a minimum-cost execution of 𝐷 contains
a contraction by a set 𝑊 = {𝑤1, 𝑤2, . . . , 𝑤𝑠} of size at least 𝑠 ≥ 3. Then we can replace the
contraction by 𝑊 with two contractions, first by 𝑊′ = {𝑤1, 𝑤2, . . . , 𝑤𝑠−1}, yielding a new vertex 𝑤′,
and then by 𝑊″ = {𝑤′, 𝑤𝑠}, without increasing the cost of the execution. The
cost of contracting 𝑊′ is less than or equal to the cost of contracting 𝑊, because every hyperedge


incident to 𝑊′ is also incident to 𝑊. The cost of contracting 𝑊″ is less than or equal to the
cost of contracting 𝑊, because the set of hyperedges incident to 𝑤′ is contained in the set of
hyperedges incident to 𝑊′. We repeat this procedure until all contracted sets in the execution
have size at most two. □

    Two tensors in a tensor network are called adjacent if they are incident to a common mode.

Lemma A.3 (Execution by contracting adjacent tensors). Let 𝐷 be a loopless tensor network that is
connected as a hypergraph. Then, there exists a minimum-cost execution of 𝐷 such that each contracted
set has size two and consists of adjacent vertices.

Proof. Consider a minimum-cost execution of 𝐷 such that each contracted set has size two; such
an execution exists by Lemma A.2. If all contracted sets in the execution consist of adjacent
vertices, we are done. Otherwise, consider a contraction of two vertices, 𝑢 and 𝑣, such that 𝑢
and 𝑣 are not adjacent when they are contracted to yield the vertex 𝑢𝑣. Consider the steps of
the execution after this contraction step. Let us call such a step a relevant step if it involves a
descendant of the vertex 𝑢𝑣. Let us modify the execution as follows. First, delete the contraction
of 𝑢 and 𝑣 from the execution. Then, for each relevant step in execution order, replace the
descendant of the vertex 𝑢𝑣 with the current descendant of either 𝑢 or 𝑣 so that the contraction
becomes a contraction of adjacent vertices whenever possible. If the descendants of 𝑢 and 𝑣
become adjacent after a relevant step, contract the descendants of 𝑢 and 𝑣, and then continue
the execution without further changes. Since 𝐷 is connected, the descendants of 𝑢 and 𝑣 must
eventually become adjacent.
     We will show that this modification of the execution gives again an execution and that it has
cost no larger than the original minimum-cost execution. In the modification of the execution,
let the contraction sets containing 𝑢 or a descendant of 𝑢 and not containing a descendant of 𝑣
be {𝑢, 𝑤1 }, {𝑢𝑤 1 , 𝑤2 }, . . ., {𝑢𝑤1 . . . 𝑤 𝑠−1 , 𝑤 𝑠 }; similarly, let the contraction sets containing 𝑣 or a
descendant of 𝑣 and not containing a descendant of 𝑢 be {𝑣, 𝑧1 }, {𝑣𝑧 1 , 𝑧2 }, . . ., {𝑣𝑧1 . . . 𝑧 𝑡−1 , 𝑧 𝑡 }.
Here 𝑤 1 , . . . , 𝑤 𝑠 , 𝑧1 , . . . , 𝑧 𝑡 can be vertices of 𝐷 or vertices obtained after contraction steps.
Contracting {𝑢𝑤1 . . . 𝑤 𝑠 , 𝑣𝑧1 . . . 𝑧 𝑡 } gives the vertex 𝑢𝑤 1 . . . 𝑤 𝑠 𝑣𝑧1 . . . 𝑧 𝑡 , which appears also in
the original execution. Hence, after a certain number of steps the tensor networks in the
original execution and in the modification are the same (after making necessary non-relevant
contractions) and the modification of the original execution is also an execution.
     First we consider the cost of contracting {𝑢𝑤 1 . . . 𝑤 𝑖−1 , 𝑤 𝑖 } in the modified execution
with 1 ≤ 𝑖 ≤ 𝑠. There is a 𝑗 satisfying 1 ≤ 𝑗 ≤ 𝑡 so that the original execution contracts
{𝑢𝑤 1 . . . 𝑤 𝑖−1 𝑣𝑧1 . . . 𝑧 𝑗 , 𝑤 𝑖 }. Then the cost of contracting {𝑢𝑤 1 . . . 𝑤 𝑖−1 , 𝑤 𝑖 } is at most the cost
of contracting {𝑢𝑤 1 . . . 𝑤 𝑖−1 𝑣𝑧 1 . . . 𝑧 𝑗 , 𝑤 𝑖 }, because every hyperedge incident to the vertex
𝑢𝑤 1 . . . 𝑤 𝑖−1 in the modified execution has to be incident to 𝑢𝑤1 . . . 𝑤 𝑖−1 𝑣𝑧1 . . . 𝑧 𝑗 in the original
execution. Otherwise, there would be a hyperedge in 𝐷 that is incident only to vertices in
{𝑢, 𝑤1 , . . . , 𝑤 𝑖−1 , 𝑣, 𝑧 1 , . . . , 𝑧 𝑗 }, and in particular both to vertices in {𝑢, 𝑤1 , . . . , 𝑤 𝑖−1 } and in
{𝑣, 𝑧1 , . . . , 𝑧 𝑗 }. This is impossible, because then 𝑢𝑤1 . . . 𝑤 𝑖−1 and 𝑣𝑧1 . . . 𝑧 𝑗 would be adjacent
vertices in the modified execution, but by assumption 𝑢𝑤 1 . . . 𝑤 𝑠 and 𝑣𝑧1 . . . 𝑧 𝑡 are the first
adjacent descendants of 𝑢 and 𝑣 in the modified execution. Similarly, we can show that the cost


of contracting {𝑣𝑧1 . . . 𝑧𝑗−1, 𝑧𝑗} in the modified execution for 1 ≤ 𝑗 ≤ 𝑡 is at most the cost of
contracting the corresponding set in the original execution.
    Second we consider the cost of contracting {𝑢𝑤 1 . . . 𝑤 𝑠 , 𝑣𝑧1 . . . 𝑧 𝑡 } in the modified execu-
tion. Without loss of generality, we can assume that in the original execution the vertex
𝑢𝑤 1 . . . 𝑤 𝑠 𝑣𝑧1 . . . 𝑧 𝑡 is obtained from contracting {𝑢𝑤 1 . . . 𝑤 𝑠 𝑣𝑧1 . . . 𝑧 𝑡−1 , 𝑧 𝑡 }. We will show that
the cost of contracting {𝑢𝑤1 . . . 𝑤 𝑠 , 𝑣𝑧1 . . . 𝑧 𝑡 } in the modified execution is less than or equal
to the cost of contracting {𝑢𝑤1 . . . 𝑤 𝑠 𝑣𝑧1 . . . 𝑧 𝑡−1 , 𝑧 𝑡 } in the original execution. Indeed, every
hyperedge incident to 𝑢𝑤 1 . . . 𝑤 𝑠 in the modified execution is incident to 𝑢𝑤 1 . . . 𝑤 𝑠 𝑣𝑧1 . . . 𝑧 𝑡−1 in
the original execution, because otherwise there would be a hyperedge in 𝐷 that is incident only
to vertices in {𝑢, 𝑤1 , . . . , 𝑤 𝑠 , 𝑣, 𝑧 1 , . . . , 𝑧 𝑡−1 }, and in particular both to vertices in {𝑢, 𝑤1 , . . . , 𝑤 𝑠 }
and {𝑣, 𝑧1 , . . . , 𝑧 𝑡−1 }. However, by assumption 𝑢𝑤 1 . . . 𝑤 𝑠 and 𝑣𝑧1 . . . 𝑧 𝑡 are the first adjacent
descendants of 𝑢 and 𝑣 in the modified execution. Similarly, every hyperedge incident to
𝑣𝑧1 . . . 𝑧 𝑡 in the modified execution is incident to 𝑢𝑤1 . . . 𝑤 𝑠 𝑣𝑧 1 . . . 𝑧 𝑡−1 or 𝑧 𝑡 in the original
execution, because otherwise there would be a hyperedge in 𝐷 that is incident only to ver-
tices in {𝑢, 𝑤1 , . . . , 𝑤 𝑠 , 𝑣, 𝑧 1 , . . . , 𝑧 𝑡−1 }, and in particular both to vertices in {𝑢, 𝑤1 , . . . , 𝑤 𝑠 } and
{𝑣, 𝑧1 , . . . , 𝑧 𝑡−1 }. This contradicts that 𝑢𝑤 1 . . . 𝑤 𝑠 and 𝑣𝑧1 . . . 𝑧 𝑡 are the first adjacent descendants
of 𝑢 and 𝑣 in the modified execution.
    The modified execution contains at least one fewer contraction of nonadjacent vertices.
Repeating this procedure until there are no contractions of nonadjacent vertices completes the
proof. □

A.3     Holant framework
In this section, we elaborate on the connection between tensor networks and the Holant
framework. We follow the exposition in [25]. The main object of the Holant framework is a
signature grid. A signature grid is a triple Ω = (𝐺, ℱ , 𝜋), where 𝐺 = (𝑉 , 𝐸) is an undirected
graph, ℱ is a set of functions and 𝜋 assigns to each vertex 𝑣 ∈ 𝑉 a multivariate function 𝑓𝑣 ∈ ℱ
with input variables corresponding to the edges incident to 𝑣 and taking values in a finite set 𝑆.
The function 𝑓𝑣 is called the signature of 𝑣. The edges of 𝐺 are considered to be variables that
take values in the set 𝑆. An assignment 𝜎 : 𝐸 → 𝑆 gives an evaluation ∏_{𝑣∈𝑉} 𝑓𝑣(𝜎|𝐸(𝑣)), where
𝐸(𝑣) is the set of edges incident to a vertex 𝑣. The Holant of a signature grid is the sum of these
evaluations over all the assignments 𝜎:

                            Holant_Ω = Σ_{𝜎:𝐸→𝑆} ∏_{𝑣∈𝑉} 𝑓𝑣(𝜎|𝐸(𝑣)).

A generalization of a signature grid is an ℱ -gate that additionally allows dangling edges for
input and output variables.
   Signature grids are special instances of tensor networks with the restriction that all hyperedges
are of size two and the index sets associated to different edges are equal. Signature grids
correspond to tensor networks with empty boundary; ℱ -gates do not have the restriction. A
signature 𝑓𝑣 associated to a vertex 𝑣 in a signature grid corresponds precisely to a tensor 𝑇(𝑣)
associated to a vertex 𝑣 in a tensor network. The Holant of a signature grid is the same as the
value of the corresponding tensor network.
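To illustrate the correspondence on a toy instance (our example): for a triangle signature grid
with 𝑆 = {0, 1} and the same binary signature at every vertex, the Holant computed by its
definition agrees with the full contraction of the corresponding tensor network.

    import numpy as np
    from itertools import product

    # Triangle signature grid: edges a, b, c over S = {0, 1}; every vertex
    # carries the same signature f on its two incident edges.
    f = np.array([[1.0, 2.0], [3.0, 4.0]])

    # Holant as a tensor network contraction: edges become shared indices.
    holant = np.einsum('ab,bc,ca->', f, f, f)

    # Holant by definition: sum over all assignments sigma: E -> S.
    brute = sum(f[a, b] * f[b, c] * f[c, a]
                for a, b, c in product((0, 1), repeat=3))
    assert np.isclose(holant, brute)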


Acknowledgements. We are grateful to Andreas Björklund for highlighting branchwidth to
us as a natural parameter to generalize from clique-counting to counting homomorphisms,
and to our anonymous reviewers for their extensive reading of the paper and many helpful
suggestions.


References
  [1] Amir Abboud, Arturs Backurs, Karl Bringmann, and Marvin Künnemann: Fine-
      grained complexity of analyzing compressed data: Quantifying improvements over
      decompress-and-solve. In Proc. 58th FOCS, pp. 192–203. IEEE Comp. Soc., 2017.
      [doi:10.1109/FOCS.2017.26, arXiv:1803.00796] 7
  [2] Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams: If the current clique
      algorithms are optimal, so is Valiant’s parser. SIAM J. Comput., 47(6):2527–2555, 2018.
      Preliminary version in FOCS’15. [doi:10.1137/16M1061771, arXiv:1504.01431] 3, 7
  [3] Michael Alekhnovich, Allan Borodin, Joshua Buresh-Oppenheim, Russell Impagliazzo,
      Avner Magen, and Toniann Pitassi: Toward a model for backtracking and dynamic
      programming. Comput. Complexity, 20(4):679–740, 2011. Preliminary version in CCC’05.
      [doi:10.1007/s00037-011-0028-y, ECCC:TR09-038] 2
  [4] Josh Alman and Virginia Vassilevska Williams: A refined laser method and faster matrix
      multiplication. In Proc. 32nd Ann. ACM–SIAM Symp. on Discrete Algorithms (SODA’21), pp.
      522–539. SIAM, 2021. [doi:10.1137/1.9781611976465.32, arXiv:2010.05846] 5, 11
  [5] Noga Alon and Shai Gutner: Balanced families of perfect hash functions and their
      applications. ACM Trans. Algorithms, 6(3):54:1–12, 2010. Preliminary version in ICALP’07.
      [doi:10.1145/1798596.1798607, arXiv:0805.4300] 7
  [6] Noga Alon, Raphael Yuster, and Uri Zwick: Finding and counting given length cycles.
      Algorithmica, 17(3):209–223, 1997. Preliminary version in ESA’94. [doi:10.1007/BF02523189]
      7
  [7] Itai Arad and Zeph Landau: Quantum computation and the evaluation of tensor networks.
      SIAM J. Comput., 39(7):3089–3121, 2010. [doi:10.1137/080739379] 3, 11
  [8] Sanjeev Arora and Boaz Barak: Computational Complexity: A Modern Approach. Cambridge
      Univ. Press, 2009. [doi:10.1017/CBO9780511804090] 2
  [9] Per Austrin, Petteri Kaski, and Kaie Kubjas: Tensor network complexity of multilinear
      maps. In Proc. 10th Innovations in Theoret. Comp. Sci. Conf. (ITCS'19), pp. 7:1–21. Schloss
      Dagstuhl–Leibniz-Zentrum fuer Informatik, 2019. [doi:10.4230/LIPIcs.ITCS.2019.7] 1
 [10] Fahiem Bacchus, Shannon Dalmao, and Toniann Pitassi: Algorithms and complexity
      results for #SAT and Bayesian inference. In Proc. 44th FOCS, pp. 340–351. IEEE Comp.
      Soc., 2003. [doi:10.1109/SFCS.2003.1238208] 11


[11] Martin Beaudry and Markus Holzer: The complexity of tensor circuit evaluation. Comput.
     Complexity, 16(1):60–111, 2007. Preliminary version in MFCS’01. [doi:10.1007/s00037-007-
     0222-0] 3

[12] Austin R. Benson and Grey Ballard: A framework for practical parallel fast matrix
     multiplication. In Proc. 20th SIGPLAN Symp. Principles and Practice of Parallel Programming
     (PPoPP’15), pp. 42–53. ACM Press, 2015. [doi:10.1145/2688500.2688513, arXiv:1409.2908]
     11

[13] Stuart J. Berkowitz: On computing the determinant in small parallel time using a small
     number of processors. Inform. Process. Lett., 18(3):147–150, 1984. [doi:10.1016/0020-
     0190(84)90018-8] 7

[14] Jacob D. Biamonte, Jason Morton, and Jacob Turner: Tensor network contractions for #SAT.
     J. Stat. Phys., 160(5):1389–1404, 2015. [doi:10.1007/s10955-015-1276-z, arXiv:1405.7375] 11

[15] Andreas Björklund: Below all subsets for some permutational counting problems. In
     Proc. 15th Scand. Symp. Algorithm Theory (SWAT’16), pp. 17:1–11. Schloss Dagstuhl–Leibniz-
     Zentrum fuer Informatik, 2016. [doi:10.4230/LIPIcs.SWAT.2016.17] 35

[16] Andreas Björklund, Thore Husfeldt, Petteri Kaski, and Mikko Koivisto: Fourier meets
     Möbius: fast subset convolution. In Proc. 39th STOC, pp. 67–74. ACM Press, 2007.
     [doi:10.1145/1250790.1250801, arXiv:cs/0611101] 6, 33

[17] Andreas Björklund, Thore Husfeldt, Petteri Kaski, and Mikko Koivisto: Computing the
     Tutte polynomial in vertex-exponential time. In Proc. 49th FOCS, pp. 677–686. IEEE Comp.
     Soc., 2008. [doi:10.1109/FOCS.2008.40, arXiv:0711.2585] 3, 6, 33

[18] Andreas Björklund, Thore Husfeldt, Petteri Kaski, and Mikko Koivisto: Counting paths
     and packings in halves. In Proc. 17th Eur. Symp. Algorithms (ESA’09), pp. 578–586. Springer,
     2009. [doi:10.1007/978-3-642-04128-0_52, arXiv:0904.3093] 7

[19] Andreas Björklund, Thore Husfeldt, and Mikko Koivisto: Set partitioning via inclusion-
     exclusion. SIAM J. Comput., 39(2):546–563, 2009. [doi:10.1137/070683933] 3, 6, 33

[20] Andreas Björklund, Petteri Kaski, and Łukasz Kowalik: Counting thin subgraphs via
     packings faster than meet-in-the-middle time. ACM Trans. Algorithms, 13(4):48:1–26, 2017.
     Preliminary version in SODA’14. [doi:10.1145/3125500, arXiv:1306.4111] 7

[21] Hans L. Bodlaender, Erik Jan van Leeuwen, Johan M. M. van Rooij, and Martin Vatshelle:
     Faster algorithms on branch and clique decompositions. In Proc. Internat. Symp. Math.
     Foundations of Comp. Sci. (MFCS’10), pp. 174–185. Springer, 2010. [doi:10.1007/978-3-642-
     15155-2_17] 6

[22] James R. Bunch and John E. Hopcroft: Triangular factorization and inversion by fast
     matrix multiplication. Math. Comput., 28(125):231–236, 1974. [doi:10.1090/S0025-5718-
     1974-0331751-8] 7


[23] Peter Bürgisser, Michael Clausen, and M. Amin Shokrollahi: Algebraic Complexity Theory.
     Springer, 1997. [doi:10.1007/978-3-662-03338-8] 11, 24

[24] Jin-Yi Cai, Heng Guo, and Tyson Williams: A complete dichotomy rises from the capture
     of vanishing signatures. SIAM J. Comput., 45(5):1671–1728, 2016. Preliminary version in
     STOC’13. [doi:10.1137/15M1049798, arXiv:1204.6445] 3, 11

[25] Jin-yi Cai, Pinyan Lu, and Mingji Xia: Computational complexity of Holant problems.
     SIAM J. Comput., 40(4):1101–1132, 2011. [doi:10.1137/100814585] 3, 11, 44

[26] Florent Capelli, Arnaud Durand, and Stefan Mengel: The arithmetic complexity of
     tensor contraction. Theory Computing Sys., 58(4):506–527, 2016. Preliminary version in
     STACS’13. [doi:10.1007/s00224-015-9630-8, arXiv:1209.4865] 3

[27] Arthur Cayley: On the theory of the analytical forms called trees. The London, Ed-
     inburgh, and Dublin Philosophical Magazine and Journal of Science, 13(85):172–176, 1857.
     [doi:10.1080/14786445708642275] 3, 10

[28] Arthur Cayley: On the analytical forms called trees, part II. The London, Edin-
     burgh, and Dublin Philosophical Magazine and Journal of Science, 18(121):374–378, 1859.
     [doi:10.1080/14786445908642782] 3, 10

[29] Andrzej Cichocki, Namgil Lee, Ivan V. Oseledets, Anh Huy Phan, Qibin Zhao, and
     Danilo P. Mandic: Tensor networks for dimensionality reduction and large-scale optimiza-
     tion: Part 1 Low-rank tensor decompositions. Found. Trends Mach. Learning, 9(4–5):249–429,
     2016. [doi:10.1561/2200000059] 11

[30] Andrzej Cichocki, Anh Huy Phan, Qibin Zhao, Namgil Lee, Ivan V. Oseledets, Masashi
     Sugiyama, and Danilo P. Mandic: Tensor networks for dimensionality reduction and
     large-scale optimization: Part 2 Applications and future perspectives. Found. Trends Mach.
     Learning, 9(6):431–673, 2017. [doi:10.1561/2200000067, arXiv:1708.09165] 11

[31] Alfred Clebsch: Ueber symbolische Darstellung algebraischer Formen. J. reine angew.
     Math., 59:1–62, 1861. [doi:10.1515/crll.1861.59.1] 3, 10

[32] William K. Clifford: Extract of a letter to Mr. Sylvester from Prof. Clifford of University
     College, London. Amer. J. Math., 1(2):126–128, 1878. [doi:10.2307/2369303] 3, 10

[33] Henry Cohn, Robert D. Kleinberg, Balázs Szegedy, and Christopher Umans: Group-
     theoretic algorithms for matrix multiplication. In Proc. 46th FOCS, pp. 379–388. IEEE
     Comp. Soc., 2005. [doi:10.1109/SFCS.2005.39] 11

[34] Henry Cohn and Christopher Umans: Fast matrix multiplication using coherent con-
     figurations. In Proc. 24th Ann. ACM–SIAM Symp. on Discrete Algorithms (SODA’13), pp.
     1074–1087. SIAM, 2013. [doi:10.1137/1.9781611973105.77, arXiv:1207.6528] 11


[35] James W. Cooley and John W. Tukey: An algorithm for the machine calculation of complex
     Fourier series. Math. Comput., 19(90):297–301, 1965. [doi:10.1090/S0025-5718-1965-0178586-
     1] 2, 5, 30

[36] László Csánky: Fast parallel matrix inversion algorithms. SIAM J. Comput., 5(4):618–623,
     1976. Preliminary version in FOCS’75. [doi:10.1137/0205040] 7

[37] Radu Curticapean, Holger Dell, and Dániel Marx: Homomorphisms are a good
     basis for counting small subgraphs. In Proc. 49th STOC, pp. 210–223. ACM Press, 2017.
     [doi:10.1145/3055399.3055502, arXiv:1705.01595] 6, 7

[38] Predrag Cvitanović: Group theory for Feynman diagrams in non-Abelian gauge theories.
     Phys. Rev. D, 14(6):1536–1553, 1976. [doi:10.1103/PhysRevD.14.1536] 10

[39] Predrag Cvitanović and Anthony D. Kennedy: Spinors in negative dimensions. Physica
     Scripta, 26(1):5–14, 1982. [doi:10.1088/0031-8949/26/1/001] 10

[40] Carsten Damm, Markus Holzer, and Pierre McKenzie: The complexity of tensor
     calculus. Comput. Complexity, 11(1–2):54–89, 2002. Preliminary version in CCC’00.
     [doi:10.1007/s00037-000-0170-4, ECCC:TR00-036] 3

[41] Sashka Davis and Russell Impagliazzo: Models of greedy algorithms for graph problems.
     Algorithmica, 54(3):269–317, 2009. Preliminary version in SODA’04. [doi:10.1007/s00453-
     007-9124-4, ECCC:TR05-120] 2

[42] Josep Díaz, Maria J. Serna, and Dimitrios M. Thilikos: Counting 𝐻-colorings of partial
     𝑘-trees. Theoret. Comput. Sci., 281(1–2):291–309, 2002. Preliminary version in COCOON’01.
     [doi:10.1016/S0304-3975(02)00017-8] 6

[43] Frederic Dorn: Dynamic programming and fast matrix multiplication. In Proc. 14th Eur.
     Symp. Algorithms (ESA’06), pp. 280–291. Springer, 2006. [doi:10.1007/11841036_27] 6

[44] Friedrich Eisenbrand and Fabrizio Grandoni: On the complexity of fixed pa-
     rameter clique and dominating set. Theoret. Comput. Sci., 326(1–3):57–67, 2004.
     [doi:10.1016/j.tcs.2004.05.009] 3, 6, 7, 30

[45] Peter Floderus, Miroslaw Kowaluk, Andrzej Lingas, and Eva-Marta Lundell: Detecting
     and counting small pattern graphs. SIAM J. Discr. Math., 29(3):1322–1339, 2015. Preliminary
     version in ISAAC’13. [doi:10.1137/140978211] 7

[46] Fedor V. Fomin, Daniel Lokshtanov, Venkatesh Raman, Saket Saurabh, and
     B. V. Raghavendra Rao: Faster algorithms for finding and counting subgraphs. J.
     Comput. System Sci., 78(3):698–706, 2012. [doi:10.1016/j.jcss.2011.10.001, arXiv:0912.2371] 7

[47] William I. Gasarch: The second P =? NP poll.             SIGACT News, 43(2):53–77, 2012.
     [doi:10.1145/2261417.2261434] 2


[48] Johan Håstad: Tensor rank is NP-complete. J. Algorithms, 11(4):644–654, 1990. Preliminary
     version in ICALP’89. [doi:10.1016/0196-6774(90)90014-6] 11

[49] Christopher J. Hillar and Lek-Heng Lim: Most tensor problems are NP-hard. J. ACM,
     60(6):45:1–39, 2013. [doi:10.1145/2512329, arXiv:0911.1393] 11

[50] Jianyu Huang, Leslie Rice, Devin A. Matthews, and Robert A. van de Geijn: Gen-
     erating families of practical fast matrix multiplication algorithms. In Proc. Internat.
     Parallel and Distrib. Processing Symp. (IPDPS’17), pp. 656–667. IEEE Comp. Soc., 2017.
     [doi:10.1109/IPDPS.2017.56, arXiv:1611.01120] 11

[51] Russell Impagliazzo and Ramamohan Paturi: On the complexity of 𝑘-SAT. J. Comput.
     System Sci., 62(2):367–375, 2001. [doi:10.1006/jcss.2000.1727] 2

[52] Russell Impagliazzo, Ramamohan Paturi, and Francis Zane: Which problems have
     strongly exponential complexity? J. Comput. System Sci., 63(4):512–530, 2001. Preliminary
     version in FOCS’98. [doi:10.1006/jcss.2001.1774] 2

[53] Alon Itai and Michael Rodeh: Finding a minimum circuit in a graph. SIAM J. Comput.,
     7(4):413–423, 1978. Preliminary version in STOC’77. [doi:10.1137/0207033] 3, 7

[54] Norman P. Jouppi, Cliff Young, Nishant Patil, and David Patterson +72: In-datacenter
     performance analysis of a tensor processing unit. In Proc. 44th Internat. Symp. Com-
     puter Architecture (ISCA’17), pp. 1–12. ACM Press, 2017. [doi:10.1145/3079856.3080246,
     arXiv:1704.04760] 3

[55] Matti Karppa, Petteri Kaski, and Jukka Kohonen: A faster subquadratic algorithm for
     finding outlier correlations. ACM Trans. Algorithms, 14(3):31:1–26, 2018. Preliminary
     version in SODA’16. [doi:10.1145/3174804, arXiv:1510.03895] 6

[56] Louis H. Kauffman: Knots, abstract tensors and the Yang-Baxter equation. In L. Lussana,
     editor, Knots, Topology and Quantum Field Theories, pp. 179–334. World Scientific, 1989. 3

[57] Alfred Bray Kempe: On the application of Clifford’s graphs to ordinary binary quantics.
     Proc. London Math. Soc., 17(1):107–123, 1885. [doi:10.1112/plms/s1-17.1.107] 3, 10

[58] Alfred Bray Kempe: On the application of the Sylvester–Clifford graphs to ordinary binary
     quantics. (Second part.). Proc. London Math. Soc., 24(1):97–118, 1892. [doi:10.1112/plms/s1-
     24.1.97] 3, 10

[59] Ton Kloks, Dieter Kratsch, and Haiko Müller: Finding and counting small induced
     subgraphs efficiently. Inform. Process. Lett., 74(3–4):115–121, 2000. Preliminary version in
     WG’95. [doi:10.1016/S0020-0190(00)00047-8] 7

[60] Tamara Gibson Kolda: Multilinear operators for higher-order decompositions. Technical
     Report SAND2006-2081, Sandia National Laboratories, 2006. [doi:10.2172/923081] 6, 16


[61] Tamara Gibson Kolda and Brett W. Bader: Tensor decompositions and applications.
     SIAM Rev., 51(3):455–500, 2009. [doi:10.1137/07070111X] 6

[62] Daphne Koller and Nir Friedman: Probabilistic Graphical Models. MIT Press, 2009. 3, 11, 41

[63] Joseph B. Kruskal: Three-way arrays: rank and uniqueness of trilinear decompositions,
     with application to arithmetic complexity and statistics. Lin. Alg. Appl., 18(2):95–138, 1977.
     [doi:10.1016/0024-3795(77)90069-6] 6, 16

[64] Greg Kuperberg: Involutory Hopf algebras and 3-manifold invariants. Internat. J. Mathe-
     matics, 2(1):41–66, 1991. [doi:10.1142/S0129167X91000053] 3, 10

[65] Chi-Chung Lam, P. Sadayappan, and Rephael Wenger: On optimizing a class of multi-
     dimensional loops with reduction for parallel execution. Parallel Process. Lett., 7(2):157–168,
     1997. [doi:10.1142/S0129626497000176] 11

[66] Joseph M. Landsberg: Tensors: Geometry and Applications. Amer. Math. Soc., 2012.
     [doi:10.1090/gsm/128] 3, 10

[67] Joseph M. Landsberg, Yang Qi, and Ke Ye: On the geometry of tensor network states.
     Quantum Inf. Comput., 12(3&4):346–354, 2012. [doi:10.26421/QIC12.3-4-12] 3

[68] François Le Gall: Faster algorithms for rectangular matrix multiplication. In Proc. 53rd
     FOCS, pp. 514–523. IEEE Comp. Soc., 2012. [doi:10.1109/FOCS.2012.80, arXiv:1204.1111]
     11

[69] François Le Gall: Powers of tensors and fast matrix multiplication. In Proc. 39th Internat.
     Symp. Symbolic and Algebraic Computation (ISSAC’14), pp. 296–303. ACM Press, 2014.
     [doi:10.1145/2608628.2608664, arXiv:1401.7714] 5, 11

[70] François Le Gall and Florent Urrutia: Improved rectangular matrix multiplication using
     powers of the Coppersmith–Winograd tensor. In Proc. 29th Ann. ACM–SIAM Symp. on Dis-
     crete Algorithms (SODA’18), pp. 1029–1046. SIAM, 2018. [doi:10.1137/1.9781611975031.67,
     arXiv:1708.05622] 5, 11, 24

[71] James R. Lee, Prasad Raghavendra, and David Steurer: Lower bounds on the size of
     semidefinite programming relaxations. In Proc. 47th STOC, pp. 567–576. ACM Press, 2015.
     [doi:10.1145/2746539.2746599, arXiv:1411.6317] 2

[72] Andrea Lincoln, Virginia Vassilevska Williams, and R. Ryan Williams: Tight hardness for
     shortest cycles and paths in sparse graphs. In Proc. 29th Ann. ACM–SIAM Symp. on Discrete
     Algorithms (SODA’18), pp. 1236–1252. SIAM, 2018. [doi:10.1137/1.9781611975031.80,
     arXiv:1712.08147] 8

[73] Stefano Markidis, Steven Wei Der Chien, Erwin Laure, Ivy Bo Peng, and Jeffrey S.
     Vetter: NVIDIA tensor core programmability, performance & precision. In Proc. Internat.
     Parallel and Distrib. Processing Symp. (IPDPS’18), pp. 522–531. IEEE Comp. Soc., 2018.
     [doi:10.1109/IPDPSW.2018.00091, arXiv:1803.04014] 3

[74] Ankur Moitra and Alexander S. Wein: Spectral methods from tensor networks. In Proc.
     51st STOC, pp. 926–937. ACM Press, 2019. [doi:10.1145/3313276.3316357,
     arXiv:1811.00944] 11

[75] Jaroslav Nešetřil and Svatopluk Poljak: On the complexity of the subgraph problem.
     Commentationes Mathematicae Universitatis Carolinae, 26(2):415–419, 1985. EuDML. 3, 6, 7, 9

[76] Noam Nisan: Lower bounds for non-commutative computation (extended abstract). In
     Proc. 23rd STOC, pp. 410–418. ACM Press, 1991. [doi:10.1145/103418.103462] 39

[77] Stephan Olariu: Paw-free graphs. Inform. Process. Lett., 28(1):53–54, 1988.
     [doi:10.1016/0020-0190(88)90143-3] 7

[78] Román Orús: A practical introduction to tensor networks: Matrix product states and
     projected entangled pair states. Ann. Phys., 349:117–158, 2014.
     [doi:10.1016/j.aop.2014.06.013, arXiv:1306.2164] 3, 11

[79] Victor Y. Pan: Matrix multiplication, trilinear decompositions, APA algorithms, and
     summation, 2014. [arXiv:1412.1145] 11

[80] Roger Penrose: Tensor Methods in Algebraic Geometry. Ph.D. thesis, St. John’s College,
     1956. 3

[81] Roger Penrose: Applications of negative dimensional tensors. In D. J. A. Welsh, editor,
     Combinatorial Mathematics and its Applications, pp. 221–244. Academic Press, 1971. LINK.
     3, 10

[82] Roger Penrose and Wolfgang Rindler: Spinors and Space-Time. Vol. 1. Cambridge Univ.
     Press, 1984. [doi:10.1017/CBO9780511564048] 10

[83] Robert N. C. Pfeifer, Jutho Haegeman, and Frank Verstraete: Faster identification of
     optimal contraction sequences for tensor networks. Phys. Rev. E, 90(3):033315, 2014.
     [doi:10.1103/PhysRevE.90.033315] 3, 11

[84] Ran Raz: Tensor-rank and lower bounds for arithmetic formulas. J. ACM, 60(6):40:1–15,
     2013. Preliminary version in STOC’10. [doi:10.1145/2535928, ECCC:TR10-002] 7

[85] Jürgen Richter-Gebert and Peter Lebmeir: Diagrams, tensors and geometric reasoning.
     Discr. Comput. Geom., 42(2):305–334, 2009. [doi:10.1007/s00454-009-9188-9] 3

[86] Neil Robertson and Paul D. Seymour: Graph minors. X. Obstructions to tree-
     decomposition. J. Combin. Theory–B, 52(2):153–190, 1991.
     [doi:10.1016/0095-8956(91)90061-N] 14

 [87] Elina Robeva and Anna Seigal: Duality of graphical models and tensor networks. Inf.
      Inference, 8(2):273–288, 2018. [doi:10.1093/imaiai/iay009, arXiv:1710.01437] 3, 11

 [88] Günter Rote: Division-free algorithms for the determinant and the Pfaffian: Algebraic
      and combinatorial approaches. In Helmut Alt, editor, Computational Discrete Mathematics,
      Advanced Lectures, pp. 119–135. Springer, 2001. [doi:10.1007/3-540-45506-X_9] 7

 [89] Herbert John Ryser: Combinatorial Mathematics. Amer. Math. Soc., 1963. 3, 6, 35

 [90] Alexander Schrijver: On traces of tensor representations of diagrams. Lin. Alg. Appl.,
      476:28–41, 2015. [doi:10.1016/j.laa.2015.02.037] 3, 10

 [91] Edgar Solomonik and Torsten Hoefler: Sparse tensor algebra as a parallel programming
      model, 2015. [arXiv:1512.00066] 3, 11

 [92] Edgar Solomonik, Devin Matthews, Jeff R. Hammond, John F. Stanton, and James Demmel:
      A massively parallel tensor contraction framework for coupled-cluster computations. J.
      Parallel Distrib. Comput., 74(12):3176–3190, 2014. [doi:10.1016/j.jpdc.2014.06.002] 3

 [93] Volker Strassen: Gaussian elimination is not optimal. Numerische Mathematik, 13(4):354–
      356, 1969. [doi:10.1007/BF02165411] 4, 24

 [94] James J. Sylvester: On an application of the new atomic theory to the graphical represen-
      tation of the invariants and covariants of binary quantics, with three appendices. Amer. J.
      Math., 1(1):64–104, 1878. [doi:10.2307/2369436] 3, 10

 [95] Leslie G. Valiant: The complexity of computing the permanent. Theoret. Comput. Sci.,
      8(2):189–201, 1979. [doi:10.1016/0304-3975(79)90044-6] 3

 [96] Leslie G. Valiant: Holographic algorithms. SIAM J. Comput., 37(5):1565–1594, 2008.
      Preliminary version in FOCS’04. [doi:10.1137/070682575, ECCC:TR05-099] 3, 11

 [97] Charles Van Loan: Computational Frameworks for the Fast Fourier Transform. SIAM, 1992.
      [doi:10.1137/1.9781611970999] 2, 5, 31

 [98] Virginia Vassilevska Williams: Multiplying matrices faster than Coppersmith–Winograd.
      In Proc. 44th STOC, pp. 887–898. ACM Press, 2012. [doi:10.1145/2213977.2214056] 5, 11

 [99] Virginia Vassilevska Williams, Joshua R. Wang, R. Ryan Williams, and Huacheng Yu:
      Finding four-node subgraphs in triangle time. In Proc. 26th Ann. ACM–SIAM Symp. on Dis-
      crete Algorithms (SODA’15), pp. 1671–1680. SIAM, 2015. [doi:10.1137/1.9781611973730.111]
      7

[100] Virginia Vassilevska Williams and R. Ryan Williams: Finding, minimizing, and counting
      weighted subgraphs. SIAM J. Comput., 42(3):831–854, 2013. Preliminary version in
      STOC’09. [doi:10.1137/09076619X] 7

[101] Martin J. Wainwright and Michael I. Jordan: Graphical models, exponential fam-
      ilies, and variational inference. Found. Trends Mach. Learning, 1(1–2):1–305, 2008.
      [doi:10.1561/2200000001] 11

[102] R. Ryan Williams: A new algorithm for optimal 2-constraint satisfaction and its impli-
      cations. Theoret. Comput. Sci., 348(2–3):357–365, 2005. Preliminary version in ICALP’04.
      [doi:10.1016/j.tcs.2005.09.023] 3, 7, 17

[103] R. Ryan Williams: Faster all-pairs shortest paths via circuit complexity. SIAM J. Comput.,
      47(5):1965–1985, 2018. Preliminary version in STOC’14. [doi:10.1137/15M1024524] 3

[104] Frank Yates: The Design and Analysis of Factorial Experiments. Imperial Bureau of Soil
      Science, 1937. Rothamsted Repository. 33


AUTHORS

     Per Austrin
     Associate professor
     School of Electrical Engineering and Computer Science
     KTH Royal Institute of Technology
     Sweden
     austrin kth se
     https://www.csc.kth.se/~austrin/


     Petteri Kaski
     Associate professor
     Department of Computer Science
     Aalto University
     Finland
     petteri kaski aalto fi
     https://www.aalto.fi/en/people/petteri-kaski


     Kaie Kubjas
     Assistant professor
     Department of Mathematics and Systems Analysis
     Aalto University
     Finland
     kaie kubjas aalto fi
     https://www.kaiekubjas.com


ABOUT THE AUTHORS

    Per Austrin graduated from KTH Royal Institute of Technology in 2008; his advisor
       was Johan Håstad. The subject of his thesis was hardness of approximation and
       discrete harmonic analysis. In his spare time he sometimes enjoys algorithmic
       programming contests.


    Petteri Kaski graduated from the Helsinki University of Technology in 2005; his
       advisor was Patric Östergård. The subject of his thesis was algorithms for
       classification of combinatorial objects. He enjoys the interplay of algorithms,
       algebra, and combinatorics.


    Kaie Kubjas graduated from the Free University of Berlin in 2013; her advisors were
       Christian Haase and Klaus Altmann. The subject of her thesis was algebraic and
       combinatorial aspects of group-based models. She enjoys applying algebraic
       geometry to other fields. Kubjas discovered her love for mathematics during her
       school years in Tartu, Estonia, under the guidance and mentoring of her mathematics
       teachers Reene Õigus and Hele Kiisel, whose encouragement and support led her to
       mathematics competitions.



