Lower Bounds for Non-Commutative Skew Circuits

Authors Nutan Limaye, Guillaume Malod, Srikanth Srinivasan,
Plaintext
                          T HEORY OF C OMPUTING, Volume 12 (12), 2016, pp. 1–38
                                       www.theoryofcomputing.org




                       Lower Bounds for
                 Non-Commutative Skew Circuits
            Nutan Limaye                     Guillaume Malod           Srikanth Srinivasan
                 Received February 11, 2015; Revised June 8, 2016; Published September 7, 2016




       Abstract: Nisan (STOC 1991) exhibited a polynomial which is computable by linear-size
       non-commutative circuits but requires exponential-size non-commutative algebraic branching
       programs. Nisan’s hard polynomial is in fact computable by linear-size “skew circuits.” Skew
       circuits are circuits where every multiplication gate has the property that all but one of its
       children is an input variable or a scalar. Such multiplication gates are called skew gates.
           We prove that any non-commutative skew circuit which computes the square of the
       polynomial defined by Nisan must have exponential size. A simple extension of this result
       then yields an exponential lower bound on the size of non-commutative circuits where each
       multiplication gate has an argument of degree at most one-fifth of the total degree.
           We also extend our techniques to prove an exponential lower bound for a class of circuits
       which is a restriction of general non-commutative circuits and a generalization of non-
       commutative skew circuits. We define the non-skew depth of a circuit to be the maximum
       number of non-skew gates on any path from an input gate to the output gate. We prove lower
       bounds for non-commutative circuits of small non-skew depth.
           More precisely, we show that for any k < d, there is an explicit polynomial of degree
       d over n variables that has non-commutative circuits of polynomial size but such that any
       circuit with non-skew depth k must have size at least nΩ(d/k) . It is not hard to see that any
    Research funded by IFCPAR/CEFIPRA Project No 4702-1(A) and by the ANR project CompA (ANR-13-BS02-0001-01).


ACM Classification: F.1.3
AMS Classification: 68Q15, 68Q17
Key words and phrases: complexity theory, lower bounds, algebraic complexity, polynomials, circuits,
circuit complexity, arithmetic circuits, noncommutative ring, skew circuits


© 2016 Nutan Limaye, Guillaume Malod, and Srikanth Srinivasan
c b Licensed under a Creative Commons Attribution License (CC-BY)                  DOI: 10.4086/toc.2016.v012a012
                     N UTAN L IMAYE , G UILLAUME M ALOD , AND S RIKANTH S RINIVASAN

       polynomial of degree d that has polynomial-size circuits has a polynomial-size circuit with
       non-skew depth d. Hence, our results should be interpreted as proving lower bounds for the
       class of circuits with non-trivially small non-skew depth.
           As far as we know, this is the strongest model of non-commutative computation for
       which we have superpolynomial lower bounds.


1     Introduction
1.1    Non-commutative arithmetic circuits
If we want to design an efficient algorithm for a computational problem that is naturally stated as a
polynomial—such as the determinant or the permanent, matrix multiplication, Fast Fourier Transform,
etc.—then arithmetic circuits capture most natural candidate algorithms that we might consider. An
arithmetic circuit is an algorithm that starts with the input variables and possibly some constants in
the underlying field, and iteratively applies addition and multiplication operations until it computes
the desired polynomial. There has been a large body of work proving upper and lower bounds on the
arithmetic circuit complexity of various polynomials (see, e. g., the surveys [22, 6]). In particular, proving
explicit superpolynomial lower bounds for general arithmetic circuits is a celebrated open question
in complexity theory and one of the possible approaches to the P versus NP question (see, e. g., [4]).
However, despite more than three decades of intensive study, it has seen little tangible progress (in the
sense of concrete lower bounds for general circuits).
     In this paper, we concentrate on non-commutative arithmetic circuits, which compute polynomials in
the non-commutative polynomial ring FhXi. Here, variables do not commute upon multiplication; that
is, xy and yx (for distinct x, y ∈ X) are distinct monomials. There are two reasons for looking at such
circuits. The first is that such circuits yield algorithms for polynomial functions over non-commutative
algebras, which arise naturally and can even have applications to commutative computations (see [7, 2],
in particular the use of non-commutative determinants to approximate the commutative permanent). The
second reason is that proving explicit lower bounds for non-commutative arithmetic circuits is formally
an easier problem than that of proving lower bounds for (commutative) arithmetic circuits described in
the previous paragraph, and it is hoped that techniques discovered in the course proving non-commutative
lower bounds will be useful in the commutative setting as well.
     The results of Hyafil [11] and Nisan [15] were among the first to motivate the study of arithmetic
circuits from this latter point of view. In a breakthrough, Nisan [15] showed exponential lower bounds for
non-commutative arithmetic formulas (a restriction of general non-commutative arithmetic circuits) and
more generally for non-commutative algebraic branching programs (ABPs). (The formal definition of an
ABP is given in Definition 3.1.) This might have led one to think that a superpolynomial lower bound
for general (non-commutative) arithmetic circuits1 was also close at hand. However, Nisan also showed
using the same techniques that general arithmetic circuits are exponentially more powerful than arithmetic
formulas and ABPs, thus suggesting that his techniques may not be sufficient to prove lower bounds
for general arithmetic circuits. Indeed, there is no known lower bound for general non-commutative
    1 From here on, all circuits, formulas, ABPs, and polynomials, unless explicitly mentioned otherwise, will be non-

commutative.


                          T HEORY OF C OMPUTING, Volume 12 (12), 2016, pp. 1–38                                     2
                           L OWER B OUNDS FOR N ON -C OMMUTATIVE S KEW C IRCUITS

arithmetic circuits that is stronger than those that we already have for general commutative arithmetic
circuits.
    In a more recent result, Hrubeš, Wigderson, and Yehudayoff [9] suggested a new line of attack on the
general arithmetic circuit lower bound question. Their result introduces a “product lemma” for general
arithmetic circuits that generalizes a decomposition of ABPs due to Nisan [15]. Using this lemma,
they are able to show that superpolynomial lower bounds for general arithmetic circuits would follow
from a strong enough lower bound for the classical Sum-of-squares problem. However, as of now, this
approach has not yielded superpolynomial arithmetic circuit lower bounds. Therefore, the strongest
known computational model for which we have superpolynomial lower bounds remains the ABPs from
the paper of Nisan [15].

1.2    Our results
In this paper, we prove exponential lower bounds for skew circuits. Skew circuits are arithmetic circuits
where every multiplication involves at least one argument2 that is either an input variable or a field
element. More formally, we prove the following theorem.

Theorem 1.1. For infinitely many d ∈ N and any n ∈ N, there exists an explicit polynomial on n variables
of degree d such that any skew circuit computing it must have size Ω̃(nd/4 ) where the Ω̃(·) hides poly(d)
factors.

     Skew circuits are a well-studied model of computation [23, 14, 1, 13], especially in the commutative
setting, where they are equivalent in power to ABPs and to the evaluation of the determinant polynomial.
However, the picture seems more complicated in the non-commutative setting. Nisan [15] has shown that
skew circuits are exponentially more powerful than ABPs. Thus, our lower bound for this model can be
seen as one step towards the goal of superpolynomial lower bounds for general non-commutative circuits.
     Note that a superpolynomial lower bound for non-commutative skew circuits was claimed by Allender
et al. [1], but, unfortunately, the proof of this particular result in the paper (Theorem 7.12) seems to
fail because it did not take into account possible cancellations.3 Indeed, they argue that considering a
non-commutative skew-circuit and switching multiplication gates so that it is now left-skew yields a
polynomial which is weakly equivalent to the original one, i. e., which has exactly the same monomials
with possibly different coefficients. But this is not true, as there might have been cancellations of
monomials in the original skew circuit which do not happen anymore in the resulting left-skew one,
because of differing variable orders, thus leaving extraneous monomials.
     Theorem 1.1 also clarifies the relative power of skew circuits vis-à-vis general arithmetic circuits. In
fact, our lower bound shows that skew circuits are exponentially less powerful than circuits with just one
non-skew gate (that is, neither of its arguments is an input variable or field element). This is because the
explicit polynomial for which we prove a lower bound is just the square of a polynomial considered by
Nisan, and this polynomial in turn has skew circuits of linear size.
     We also consider the problem of extending our techniques to more powerful classes of circuits. We
obtain a first simple generalization of our lower bound to circuits where every multiplication gate has
   2 We assume fan-in 2 for all gates.
   3 Meena Mahajan, personal communication.



                           T HEORY OF C OMPUTING, Volume 12 (12), 2016, pp. 1–38                           3
                      N UTAN L IMAYE , G UILLAUME M ALOD , AND S RIKANTH S RINIVASAN

an argument of degree at most δ , which we call δ -unbalanced circuits. For instance, this yields an
exponential lower bound for the same polynomial as above, when computed by circuits where each
multiplication gate has an argument of degree at most one fifth of the total degree.
    Another natural way to extend our results (and one that is analogous to a large body of work in the
Boolean circuit setting; see, e.g. [3, 5, 12]) is to augment a circuit for which we do have lower bounds
with a few “powerful” gates and see if one can still prove a lower bound. We therefore consider the
problem of proving lower bounds for skew circuits with a “few” non-skew multiplication gates.
    We say that the non-skew depth of a non-commutative circuit is the maximum number of non-skew
gates on a path from a variable to the output gate in the DAG underlying the circuit. We prove the
following result for such circuits.

Theorem 1.2. For infinitely many d ∈ N and any k, n ∈ N, there exists a polynomial of degree d on n
variables which is computable by a polynomial-size non-commutative circuit of non-skew depth O(k) but
requires size nΩ(d/k) for any non-commutative circuit of non-skew depth k.

    In particular the above theorem implies that there exists a polynomial of degree d which is computable
by a polynomial-size non-commutative circuit of non-skew depth d, but requires a superpolynomial
size for any non-commutative circuit of non-skew depth k(d) = o(d). It is not hard to see that any
polynomial of degree d that can be computed by a polynomial-size arithmetic circuit can also be
computed by a polynomial-size arithmetic circuit of non-skew depth d. Hence, strengthening our lower
bound substantially would prove lower bounds for general non-commutative circuits.
    We also show that the determinant polynomial can simulate our hard polynomial, thus completing the
picture in the non-commutative setting by showing that skew circuits are exponentially less powerful than
the determinant polynomial. Finally, we show that to prove superpolynomial lower bounds for general
non-commutative circuits, our complexity measure (to be defined formally in the upcoming section)
will need to be further refined. Slightly more precisely, we show that there is a polynomial that has
polynomial-size non-commutative circuit, but for which our complexity measure is as large as possible.

Organization. The rest of the paper is organized as follows. We start with a proof outline in Section 2.
We then present some definitions in Section 3 and preliminaries in Section 4. The proof of Theorem 1.1
is presented in Section 5, with the extension to unbalanced circuits, and the proof of Theorem 1.2 is
presented in Section 6.4 We also prove lower bounds for the permanent and determinant polynomials in
Section 7. Finally, we show the limits of these complexity measures in Section 8.


2     Proof outline
2.1    A lower bound for ABPs
Our overall proof strategy is similar to that of Nisan [15] for non-commutative formulas and algebraic
branching programs (ABPs). (The formal definition of an ABP is given in Definition 3.1.) In his result,
   4 As skew circuits are a subset of bounded non-skew depth circuits, our lower bound for bounded non-skew depth circuits

subsumes the lower bound for skew circuits. However, for the sake of exposition we first describe the lower bound proof for
skew circuits and then prove the lower bound for bounded non-skew depth circuits.


                          T HEORY OF C OMPUTING, Volume 12 (12), 2016, pp. 1–38                                          4
                              L OWER B OUNDS FOR N ON -C OMMUTATIVE S KEW C IRCUITS

Nisan considered the partial derivative matrix corresponding to a homogeneous polynomial f ∈ FhXi of
degree d, originally introduced by Hyafil [11], which is defined to be an nd/2 × nd/2 matrix M[ f ] where
the rows and columns are labelled by monomials in X of degree d/2. The (m1 , m2 )-th entry of the matrix
M[ f ] is defined to be the coefficient of the monomial m1 m2 in f .5
     Nisan observed that if f has a formula or ABP of small size, then f can be decomposed as a small sum
of polynomials of the form g · h where g and h are homogeneous polynomials of degree d/2. Crucially,
it may be seen that for any such g, h the matrix M[g · h] has rank 1 and hence, by subadditivity of rank,
M[ f ] has small rank. Thus, choosing an f such that rank(M[ f ]) is large gives us a lower bound.
     Intuitively speaking, the rank of the matrix M[ f ] is a measure of how “correlated” the first half of a
monomial appearing in f is with its second half: M[ f ] being full rank would mean that they are perfectly
correlated, whereas M[ f ] being low rank would mean that they are not very correlated at all. Nisan’s
argument shows that small ABPs have “information bottlenecks” at degree d/2 (and indeed at any degree
d 0 ≤ d), and hence the amount of correlation is small.

2.2     Nisan’s measure applied to skew circuits
A natural question to ask is if this argument can give a lower bound for non-commutative skew circuits as
well. Unfortunately, the answer is no, as is already implicit in Nisan’s paper. Consider the Palindrome
polynomial PALd/2 (X), which is the sum of all monomials of degree d that are palindromes when viewed
as strings of length d over the alphabet X. Nisan observed that PALd/2 (X) has a skew circuit of linear
size but at the same time M[PALd/2 (X)] has full rank. In fact, M[PALd/2 (X)] is a permutation matrix
since the first half of a palindrome uniquely determines the second half (thus, the first and second halves
of monomials appearing in f are perfectly correlated). Hence, the partial derivative matrix of polynomials
with small skew circuits can have as large a rank as possible. This means that in our lower bound argument
for skew circuits, we need to use a different measure of complexity.

2.3     A new measure for skew circuits
The measure that we use is a modified version of the partial derivative matrix, defined as follows. Let
 f ∈ FhXi be a homogeneous polynomial of degree d over n variables, and given an ordered partition
Π = (Y, Z) of [d] into two parts, we define M[ f , Π] to be the matrix whose rows and columns are indexed
by monomials in X of degree |Y | and |Z|, respectively. The (m1 , m2 )-th entry of M[ f , Π] is defined to the
coefficient of the unique monomial m of degree d which equals m1 if we keep only the variables indexed
by locations in Y and delete the others, and equals m2 if we only keep the variables indexed by locations
in Z. As above, the rank of M[ f , Π] measures the correlation between the restriction of a monomial to the
locations in Y and the locations in Z. We are usually interested in Π where |Y | ≤ |Z|, since in this case
we know that the maximum possible rank is min{n|Y | , n|Z| } ≤ n|Y | .
    In this notation, the measure of complexity used by Nisan is rank(M[ f , ([d/2], [d] \ [d/2])]) and we
have seen above that this measure is as large as it can be for, say, the Palindrome polynomial PALd/2 (X),
which has a small skew circuit. However, it is an easy observation that if one considers the partition
Π0 = (Y0 , Z0 ) where Z0 := [d/4 + 1, 3d/4] and Y0 := [d] \ Z0 , then M[PALd/2 (X), Π0 ] has rank 1.
   5 More generally, Nisan also considered the matrix where the rows and columns are labelled by monomials of degree d 0 ≤ d

and d − d 0 , respectively.


                              T HEORY OF C OMPUTING, Volume 12 (12), 2016, pp. 1–38                                       5
                    N UTAN L IMAYE , G UILLAUME M ALOD , AND S RIKANTH S RINIVASAN

    Thus, we might hope that for every polynomial f that has a small skew circuit, we could find a Π such
that M[ f , Π] has low rank. We are in fact able to show something much stronger: we can show in general
that if f has a small skew circuit, then rank(M[ f , Π0 ]) is “small” for the particular Π0 defined above.
(Here, “small” means that the rank is much smaller than full rank.) In terms of correlation, this statement
could be interpreted as saying that though skew circuits can compute polynomials that are perfectly
correlated w. r. t. Nisan’s partition ([d/2], [d] \ [d/2]), they can only do so by correlating the initial few
indices in the monomial with the final few indices, as in the Palindrome polynomial. Consequently, these
“extreme” indices end up uncorrelated with those in the middle. This is the weakness of skew circuits that
we exploit in our lower bound.

2.4   Decomposition lemma for skew circuits
The proof of this fact rests on a decomposition of skew circuits that is motivated by the similar ABP
decomposition mentioned above. Like in the ABP decomposition, we can show that given any homoge-
neous polynomial f of degree d that has a small skew circuit and any degree parameter d 0 ∈ [d], we can
decompose f as a small sum of polynomials of the form g × j h where g and h are polynomials of degrees
d 0 and d − d 0 , respectively, (we refer the reader to Section 3 for the definition of × j , but it intuitively
means that the polynomial g is multiplied on the left by the sum of the prefixes of the monomials of h
of degree j and on the right by the sum of the suffixes of degree d − d 0 − j). The proof of this lemma is
obtained by specializing the proof of a lemma of Hrubeš, Wigderson and Yehudayoff [9] regarding general
non-commutative arithmetic circuits to the case of skew circuits, where it yields a stronger conclusion.
     Given this decomposition lemma, we prove the lower bound as follows. We apply the lemma with d 0
being a large number close to d; for concreteness, say d 0 = 3d/4. In other words, we decompose f as
a small sum of polynomials g × j h where g and h are homogeneous polynomials of degrees 3d/4 and
d/4, respectively. In each such polynomial, a set Ig ⊆ [d] of 3d/4 indices corresponds to g and a set
Ih = [d] \ Ig corresponds to the polynomial h as shown in Figure 1 below.

                      Y0 :                [d] \Y0 :

                             d/4                           d/2                   d/4




                                                      Ig


                                   Figure 1: The partition Π0 and the set Ig .

   As we mentioned above, we will consider the rank of the matrix M[g × j h, Π0 ]. Now, it is easy to
show that
                     rank(M[g × j h, Π0 ]) = rank(M[g, Πg ]) · rank(M[h, Πh ])
where the partitions Πg = (Yg , Zg ) and Πh = (Yh , Zh ) are the natural restrictions of Π0 to Ig and Ih ,
respectively.
    Note that if rank(M[g × j h, Π0 ]) is to be close to full, i. e., n|Y0 | , then we need both rank(M[g, Πg ])
and rank(M[h, Πh ]) to be close to n|Yg | and n|Yh | , respectively. However, it is easily seen that, irrespective

                         T HEORY OF C OMPUTING, Volume 12 (12), 2016, pp. 1–38                                  6
                        L OWER B OUNDS FOR N ON -C OMMUTATIVE S KEW C IRCUITS

of the value of j, the matrix M[h, Πh ] is always a rank 1 matrix (this happens since Yh occupies all of
Ih and thus Zh = 0)/ and hence rank(M[g × j h, Π0 ]) falls exponentially short of its maximum possible
value. Since f is a small sum of such polynomials, the same is true of rank(M[ f , Π0 ]) as well. More
generally, the same strategy shows that rank(M[ f , Π]) is small as long as Π = (Y, Z) has the “left-right
monochromatic” form (LRM partitions for short) shown in Figure 2 (for d1 , d2 large enough).

                      Y:                Y or Z:

                           d1                                                  d2




Figure 2: Left-right monochromatic (LRM) partitions, where segments on both the left and right ends are
contained in Y .

    The above argument implies a strong exponential lower bound on the size of a skew circuit computing
any homogeneous polynomial F of degree d such that M[F, Π0 ] is full rank. It is easy to find explicit
examples of such polynomials. For example, we could take F to be the square of PALd/4 (X) or the
Lifted Identity polynomial of Hrubeš et al. [9]. In either of these cases, it can be checked that M[F, Π0 ] is
again a permutation matrix and hence full rank. Since (PALd/4 (X))2 can be computed by a small circuit
with just a single non-skew gate, this also gives an exponential separation between skew circuits and
circuits with one non-skew gate. However, this also implies that if we want to extend our lower bound to
non-commutative circuits of small non-skew depth, then we need to modify our measure further.


2.5   New measure and decomposition lemma for circuits with small non-skew depth
We prove our lower bound for circuits of small non-skew depth by induction on the non-skew depth k
of the circuit. As in the skew case, we choose a partition Πk of [d] such that no small non-skew depth k
circuit can compute a polynomial that has large rank w. r. t. the partition Πk . The inductive argument is
based on showing that if a non-skew depth k circuit C computes a polynomial of large rank w. r. t. Πk ,
then it must contain a depth k − 1 circuit that computes a polynomial of large rank w. r. t. Πk−1 (or an
even “harder” partition). We then apply the inductive hypothesis to prove the lower bound.
     Let us consider the problem of constructing such a partition in the case k = 1 (i. e., non-skew depth
1). Ideally, we would like to construct a partition Π1 such that if C is a circuit of non-skew depth 1 that
is high rank w. r. t. Π1 , then a sub-circuit of C is high rank w. r. t. an LRM partition as in Figure 2 (with
perhaps a slightly smaller degree). However, it can be checked that we cannot choose such a partition
even if we know beforehand that C is just a product of two skew circuits. That is, for any candidate
partition Π1 , there are skew circuits of degree d 0 ≤ d and d − d 0 computing polynomials g1 and g2 such
that neither the partition restricted to g1 , nor the partition restricted to g2 , is LRM.
     Hence, we are first led to the problem of enlarging the family of partitions that are hard for skew
circuits. Building on the techniques outlined for skew circuits above, we can also show that small skew
circuits cannot compute high rank polynomials w. r. t. the larger family of “extended LRM” (XLRM)
partitions, illustrated in Figure 3, which are obtained by extending an LRM partition on the left and right

                        T HEORY OF C OMPUTING, Volume 12 (12), 2016, pp. 1–38                               7
                       N UTAN L IMAYE , G UILLAUME M ALOD , AND S RIKANTH S RINIVASAN

sides with segments of length ` that are contained in Y and Z, respectively.6 Intuitively, a skew circuit
that computes a large rank polynomial w. r. t. such a partition would try to pairwise correlate indices in
the segments (of length `) on the two extremes. However, after having done this, it is still left with the
task of computing a high rank polynomial w. r. t. an LRM partition, which we know to be a hard problem.

               Y:                    Y or Z:               Z:

                       `               d1                                            d2        `




                       Figure 3: Extended left-right monochromatic (XLRM) partitions.

     We are now ready to tackle the problem of proving lower bounds for circuits of non-skew depth k. We
choose our hard partition Πk = (Yk , Zk ) to have the form shown in Figure 4. That is, starting from the left,
our partition assigns an initial segment of length roughly d/4 to Yk . The remaining indices are assigned to
Yk and Zk in k0 pairs of segments of lengths roughly d/4k0 and d/2k0 , respectively, for k0 = O(k), so that
overall we have |Yk | = |Zk | = d/2. Note that Πk is in particular an XLRM partition, and hence is clearly
hard for skew circuits. We show that any small circuit C of non-skew depth at most k cannot compute a
polynomial of large rank w. r. t. Πk .
     To get an idea of the proof, consider first the easier case when the output of C is a non-skew
homogeneous multiplication gate and hence C is a product of two homogeneous polynomials g1 and g2
that have small circuits of non-skew depth at most k − 1. In this case, the indices in [d] are distributed
between g1 and g2 as shown in Figure 4. Now, as we have argued previously, if the polynomial f
computed by C is to have rank nearly n|Yk | w. r. t. Πk , then rank(M[gi , Πk,i ]) should be close to n|Yk,i | where
Πk,i = (Yk,i , Zk,i ) is the natural restriction of Πk to the indices corresponding to gi for i ∈ [2]. For this
to occur, however, we must have |Yk,i | ≈ |Zk,i | for each i, since otherwise for some i, we will have |Zk,i |
much smaller than |Yk,i |, and then rank(M[gi , Πk,i ]) ≤ n|Zk,i |  n|Yk,i | . However, it is easy to check that
if |Yk,i | ≈ |Zk,i | for each i, then the only possibility is that one of g1 or g2 , say g1 for concreteness, has
very small degree and the other “occupies” almost all the indices in [d] and is hence already computing a
polynomial of large rank w. r. t. Πk . Since g2 has a small circuit of non-skew depth at most k − 1, this
allows us to induct on g2 .

                 Y:                          Z:

                           d/4                    d/4k0                                            d/2k0

                                                                               ···

                                            g1                                            g2


                                                  Figure 4: The partition Πk .

    The general case puts together a couple of arguments we have already outlined. Using a decomposition
   6 The actually family of partitions we consider is a little more general.



                            T HEORY OF C OMPUTING, Volume 12 (12), 2016, pp. 1–38                                 8
                         L OWER B OUNDS FOR N ON -C OMMUTATIVE S KEW C IRCUITS

lemma that is similar in spirit to the skew circuit decomposition lemma described above, we can show
that any homogeneous polynomial f of degree d computed by a small circuit of non-skew depth at most
k can be written as a small sum of polynomials of the form

                                                   (g1 · g2 ) × j h

where g1 and g2 are homogeneous polynomials computed by small circuits of non-skew depth at most
k − 1 and h has a small skew circuit. In the easy case above, we have already handled the case when
deg(h) = 0, and so now we try to see how h can help produce a polynomial of large rank w. r. t. the
partition Πk . As in the proof of the hardness of XLRM partitions, one would guess that the worst that h
could do is to match up the d/2k0 indices in Y and Z on either extreme. In this case, we can argue as in
the easier case above that one of g1 or g2 occupies all that is remaining, which corresponds to a partition
that is hard for non-skew depth at most k − 1, as desired.
    As might be expected, the actual proof is not quite as neat, since we need to handle some other cases
that we have not describe above. It turns out, however, that these cases are easy, even if somewhat tedious,
to handle.


3    Definitions
Throughout the paper, we refer to a fixed set X = {x1 , . . . , xn } of non-commuting variables. We work
with the ring FhXi of polynomials in our non-commuting variables. We start by recalling the definition
of an algebraic branching program.

Definition 3.1 (ABPs [15]). An Algebraic Branching Program (ABP) is a directed acyclic graph with
one vertex of in-degree zero, called the source, and a vertex of out-degree zero, called the sink. The
vertices of the graph are partitioned into levels numbered 0, 1, · · · , d. Edges may only go from level i to
level i + 1 for i ∈ {0, · · · , d − 1}. The source is the only vertex at level 0 and the sink is the only vertex at
level d. Each edge is labeled with a homogeneous linear form in the input variables. The size of the ABP
is the number of vertices.

    For i, j ∈ N, we define [i, j] to be the set {i, i + 1, . . . , j}. (This set is empty if i > j.) We also use the
standard notation [i] to denote the set [1, i].
    For d ∈ N, we use Md (X) to denote the set of monomials of degree exactly d over the variables in X.

Definition 3.2 ( j-products). Given homogeneous polynomials g, h ∈ FhXi of degrees dg and dh , respec-
tively, and an integer j ∈ [0, dh ], we define the j-product of g and h, denoted g × j h, as follows.

    • When g and h are monomials, then we can factor h uniquely as a product of two monomials h1 h2
      such that deg(h1 ) = j and deg(h2 ) = dh − j. In this case, we define g × j h to be h1 · g · h2 .

    • The map is extended bilinearly to general homogeneous polynomials g, h. Formally, let g, h be
      general homogeneous polynomials, where g = ∑` g` , h = ∑i hi and g` , hi are monomials of g, h
      respectively. For j ∈ [0, dh ], each hi can be factored uniquely into hi1 , hi2 such that deg(hi1 ) = j and
      deg(hi2 ) = dh − j. And g × j h is defined to be ∑i ∑` hi1 g` hi2 .

                         T HEORY OF C OMPUTING, Volume 12 (12), 2016, pp. 1–38                                     9
                    N UTAN L IMAYE , G UILLAUME M ALOD , AND S RIKANTH S RINIVASAN

                           j                        dg                         dh − j

                          h1                        g                           h2

                                   Figure 5: j product for monomials g, h.

Note that g ×0 h and g ×dh h are just the products g · h and h · g, respectively.
    The following easily verifiable facts about j-products will be useful.
Fact 3.3.
   1. The operator × j is bilinear, i. e., (g1 + g2 ) × j h = g1 × j h + g2 × j h and g × j (h1 + h2 ) = g × j h1 +
      g × j h2 provided that g, g1 , g2 , h, h1 , h2 are such that all the above expressions are well defined.
   2. Assume g and h are such that g × j h is defined and let f be a homogeneous polynomial of degree d.
      Then (g × j h) · f = g × j (h · f ) and f · (g × j h) = g ×d+ j ( f · h).
   3. Assume g and h are as above and further that g = g1 · g2 . Then g × j h = g1 × j (g2 × j h) = g2 × j+dg1
      (g1 × j h) where dg1 = deg(g1 ). If instead we have g = g1 ×k g2 , then g × j h = g1 × j+k (g2 × j h).
    Given a monomial m = xi1 xi2 · · · xid ∈ FhXi and a subset S ⊆ [d], we denote by mS the product of all
the variables in the locations indexed by S, i. e., mS = ∏ j∈S xi j where the product is taken in increasing
order of j.
    Let Π denote a partition of [d] given by an ordered pair (Y, Z), where Y ⊆ [d] and Z = [d] \Y . In what
follows we only use partitions of sets into two parts.
Definition 3.4 (Partial derivative matrix). Let f ∈ FhXi be a homogeneous polynomial of degree d.
Given a partition Π = (Y, Z) of [d], we define a n|Y | × n|Z| matrix M[ f , Π] with entries from F as follows.
The rows of M[ f , Π] are labelled by monomials from M|Y | (X) and the columns by elements of M|Z| (X).
Let m0 ∈ M|Y | (X) and m00 ∈ M|Z| (X); the (m0 , m00 )-th entry of M[ f , Π] is the coefficient in the polynomial
f of the unique monomial m such that mY = m0 and mZ = m00 .
   We will use the rank of the matrix M[ f , Π] (for a suitably defined Π = (Y, Z)) as a measure of the
complexity of f . Note that since the rank of the matrix is at most the number of rows, we have for any
f ∈ FhXi
                                         rank(M[ f , Π]) ≤ n|Y | .
    As in many papers on multilinear formulas and circuits [16, 17, 18, 19, 20, 8], we will be interested
in how close the rank of M[ f , Π] can be to this trivial upper bound.
Definition 3.5 (Relative Rank). Let f ∈ FhXi be a homogeneous polynomial of degree d. For any Y ⊆ [d],
we define the relative rank of f w. r. t. Π = (Y, Z), denoted rel-rank( f , Π), to be
                                                          rank(M[ f , Π])
                                      rel-rank( f , Π) :=                 .
                                                               n|Y |
     Clearly, rel-rank( f , Π) ∈ [0, 1] for any f and Y as above. Furthermore, note that since rank(M[ f , Π])
is also bounded by n|Z| , the number of columns in the matrix, when |Y | > d − |Y |, this measure cannot
approach 1 for any choice of f .

                       T HEORY OF C OMPUTING, Volume 12 (12), 2016, pp. 1–38                                   10
                         L OWER B OUNDS FOR N ON -C OMMUTATIVE S KEW C IRCUITS

                       Y:                   Z:

                            d/4                        d/2                        d/4




                      Figure 6: Example of Y for which rel-rank(PAL2d/4 , (Y, Z)) = 1.


Notation. Fix any homogeneous polynomials g, h ∈ FhXi of degrees dg and dh , respectively, and
f = g × j h, where j ∈ [0, dh ]. Let d denote deg( f ) = dg + dh and I denote [ j + 1, j + dg ].
   For any pair of subsets S, I ⊆ [d] such that S ⊆ I, we denote by Collapse(S, I) the subset of [|I|]
which contains the ranks of all elements in I which are contained in S. Formally,

                 Collapse(S, I) = { j ∈ [|I|] | S contains the j-th smallest element of I} .

    For example, if I = {i1 , . . . , i2t }, where i1 ≤ · · · ≤ i2t and S contains every other element of I, i. e.,
S = {i2 , . . . , i2t } then Collapse(S, I) = {2, 4, . . . , 2t}. For any partition Π = (Y, Z) of [d] we use Yg to
denote Collapse(Y ∩ I, I), i. e., the set of ranks of indices that g occupies in g × j h which overlap with Y .
Similarly, we use Yh to denote Collapse(Y \ I, [d] \ I), i. e., the set of ranks of indices that h occupies in
g × j h which overlap with Y . Also we denote [dg ] \Yg by Zg and [dh ] \Yh by Zh . Finally, we use Πg , Πh to
denote partitions (Yg , Zg ) and (Yh , Zh ), respectively.
    The non-skew depth of a non-commutative circuit C is the maximum number of non-skew gates on a
path from a variable to the output gate in the DAG underlying C.


4    Preliminaries
We need the following lemmas that are straightforward adaptations of previous work.

Lemma 4.1 (Homogenization Lemma [9]). Let f ∈ FhXi be a homogeneous polynomial of degree d
computed by a non-commutative circuit C of size s. Then there is a homogeneous non-commutative circuit
C0 of size at most O(sd 2 ) computing f . Moreover, if C has non-skew depth at most k, then so does C0 . In
particular, if C is a skew circuit, then so is C0 .

Lemma 4.2 (Tensor Lemma). Let g, h ∈ FhXi be homogeneous polynomials of degrees dg and dh ,
respectively, and let f = g × j h for j ∈ [0, dh ]. Let d denote deg( f ) = dg + dh . Fix any partition
Π = (Y, Z) of [d]. Then

                              rank(M[ f , Π]) = rank(M[g, Πg ]) · rank(M[h, Πh ])

where Πg , Πh are as defined in Section 3.

Proof. We observe that under a suitable labelling of the rows and columns of the matrices, the ma-
trix M[ f , Π] = M[g, Πg ] ⊗ M[h, Πh ], where ⊗ represents the standard tensor (or Kronecker) product of
matrices. This will prove the lemma.

                        T HEORY OF C OMPUTING, Volume 12 (12), 2016, pp. 1–38                                  11
                    N UTAN L IMAYE , G UILLAUME M ALOD , AND S RIKANTH S RINIVASAN

     Let I denote the interval [ j + 1, j + dg ].
     For each of the matrices M[ f , Π f ], M[g, Πg ] and M[h, Πh ], we have labellings from the definitions
of these matrices, i. e., the rows and columns of M[ f , Π f ] are labelled by elements of M|Y f | and M|Z f | ,
respectively; and similarly for M[g, Πg ] and M[h, Πh ]. For M[ f , Π], we note that each monomial m ∈ M|Y |
can be identified with a pair of monomials (m0 , m00 ) of degree |Yg | and |Yh |, respectively, using the map
m 7→ (mY ∩I , mY \I ); this map is a bijection and hence, we also have an alternate labelling of the rows of
M[ f , Π] by M|Yg | × M|Yh | ; similarly, we also obtain a labelling of the columns of M[ f ,Y ] by M|Zg | × M|Zh | .
Under this alternate labelling for M[ f , Π], we show that M[ f , Π] = M[g, Πg ] ⊗ M[h, Πh ].
     By the bilinearity of both the ⊗ and × j maps, it suffices to do this when g and h are both monomials.
In this case, M[g, Πg ] is a 0-1 matrix with a 1 only in the (gYg , gZg )-th entry and similarly for M[h, Πh ].
Since f is also a monomial, the matrix M[ f , Π] is also a 0-1 matrix with a 1 only in the ( fY , fZ )-th entry
according to the original labelling. Under our alternate labelling of M[ f , Π], this corresponds to the
(( fY ∩I , fY \I ), ( fZ∩I , fZ\I ))-th entry of M[ f , Π]. It can be checked from the definition of × j that

                                 fY ∩I = gYg , fZ∩I = gZg , fY \I = hYh , fZ\I = hZh .

   Thus, f has a 1 in only the ((gYg , hYh ), (gZg , hZh ))-th entry and hence, M[ f , Π] is the tensor product of
M[g, Πg ] and M[h, Πh ] as claimed. This completes the proof of the lemma.


Corollary 4.3. Assume that f ,Y, dg , dh are as in the statement of Lemma 4.2. Then

       rel-rank( f , Π) = rel-rank(g, Πg ) · rel-rank(h, Πh ) ≤ min {rel-rank(g, Πg ), rel-rank(h, Πh )} .

Moreover, we also have rank(M[g, Πg ]) ≤ n|Zg | and rank(M[h, Πh ]) ≤ n|Zh | . Hence,
                                                    n                                 o
                              rel-rank( f , Π) ≤ min n−(|Yg |−|Zg |) , n−(|Yh |−|Zh |) .


4.1    Hard polynomials
Let w = (w1 , w2 , . . . , wd ) be a string in [n]d and let wR = (wd , wd−1 , . . . , w1 ) denote the reverse of the
string. Let x̃w denote the monomial xw1 xw2 . . . xwd over the variable set X = {x1 , x2 , . . . , xn }. We consider
the n-variable palindrome polynomial, defined below.

                                           PALd (X) =       ∑ x̃w · x̃w .    R

                                                          w∈[n]d


Nisan [15] studied the palindrome polynomial for n = 2. We denote by PAL2d (X) the squared palindrome
polynomial.
                           PAL2d (X) = (PALd (X))2 =           ∑          x̃w1 · x̃wR1 · x̃w2 · x̃wR2 .
                                                           w1 ,w2 ∈[n]d



                        T HEORY OF C OMPUTING, Volume 12 (12), 2016, pp. 1–38                                    12
                        L OWER B OUNDS FOR N ON -C OMMUTATIVE S KEW C IRCUITS

5    Lower bound for skew circuits
In this section, we prove an exponential lower bound for skew circuits. We start by giving a decomposition
lemma for such circuits. A similar decomposition was given by Nisan [15] for non-commutative ABPs.
More recently Hrubeš et al. [9] proved a decomposition lemma for general non-commutative circuits.
Our result can be thought of as an interpolation between the decomposition for ABPs and that for general
non-commutative circuits.
     We then formally define left-right monochromatic (LRM) partitions and prove that any skew circuit
of “small” size has “small” relative rank with respect to LRM partitions. Finally, we give an explicit
polynomial which has full relative rank with respect to a suitably chosen LRM partition. This gives a
lower bound for skew circuits.
     Let us now give a decomposition lemma for skew circuits. We will prove two other decomposition
lemmas and, though they take slightly different forms, they can all be presented with similar arguments,
as ways of grouping the monomials computed by a circuit. We will use the notion of parse trees from [14]
to describe how a circuit computes monomials.
Definition 5.1. The set of parse trees of a circuit C is defined by induction on its size.
    • If C is of size 1 it has only one parse tree: itself;

    • if the output gate of C is a +-gate whose arguments are the gates α and β , the parse trees of C are
      obtained by taking either a parse tree of the subcircuit rooted at α and the arc from α to the output
      or a parse tree of the subcircuit rooted at β and the arc from β to the output;

    • if the output gate of C is a ×-gate whose arguments are the gates α and β , the parse trees of C are
      obtained by taking a parse tree of the subcircuit rooted at α, a parse tree of a disjoint copy of the
      subcircuit rooted at β , and the arcs from α and β to the output.
A parse tree T computes a polynomial val(T ) in a natural way: this is the monomial equal to the product of
the variables labeling the leaves of T (from left to right). So parse trees are in one-to-one correspondence
with the monomials computed by the circuit (before regrouping), and summing the values of the parse
trees thus yields the computed polynomial.
Lemma 5.2 (Decomposition Lemma for skew circuits). Let f ∈ FhXi be a homogeneous polynomial of
degree d ∈ N computed by a homogeneous skew circuit C of size s. Fix any d 0 ∈ [d]. Let g1 , . . . , gt (t ≤ s)
be the intermediate polynomials of degree d 0 computed by C. Then there exist homogeneous polynomials
hi, j (i ∈ [t], j ∈ [0, d − d 0 ]) of degree d − d 0 such that

                                          f=∑          ∑           gi × j hi, j .
                                              i∈[t] j∈[0,d−d 0 ]

Proof. The polynomial f is the sum of the values of all the parse trees of C. Parse trees are obtained by
starting at the root and following along exactly one argument when encountering an addition gate and
along both arguments when encountering a multiplication gate. In a skew circuit at any multiplication gate
one argument will be an input gate, so the degree decreases by at most 1 and the parse tree looks like a path
with dangling input gates on the left or on the right (we will call such a parse tree path-like). Therefore

                       T HEORY OF C OMPUTING, Volume 12 (12), 2016, pp. 1–38                                13
                    N UTAN L IMAYE , G UILLAUME M ALOD , AND S RIKANTH S RINIVASAN

any parse tree will reach a unique gate of degree d 0 (to get unicity if d 0 = 1, at the last multiplication gate
we choose the left argument). We will stop building our parse tree once such a gate is found and consider
the resulting partial parse tree.
    Let α1 , . . . , αt be the gates of C of degree d 0 . Consider a partial parse tree T stopping at αi . It is
possible that different partial parse trees will “stop” at αi , and each will compute a monomial of the
form L · gi · R, where L (respectively R) is the monomial obtained by the multiplications by input gates
on the left side (respectively right side) in the parse tree. We can thus partition the set of parse trees
depending on which gate of degree d 0 it stopped at. We can further partition it with regard to the degree
of L. Grouping monomials according to this partition we get the desired decomposition.

Definition 5.3. We say that a partition Π = (Y, Z) of [d] is a (d1 , d2 )-left right monochromatic partition
((d1 , d2 )-LRM) if [d1 ] ∪ [d − d2 + 1, d] ⊆ Y .

    Figure 7 gives an illustration of a (d1 , d2 )-LRM partition.

Lemma 5.4 (Main Lemma: Relative rank of skew circuits). Let f ∈ FhXi be a homogeneous polynomial
of degree d ∈ N computed by a homogeneous skew circuit C of size s. For any (d1 , d2 )-LRM partition Π
of [d] such that d1 + d2 ≤ d
                                rel-rank( f , Π) ≤ sd · n− min{d1 ,d2 } .

Proof. Assume that D = min{d1 , d2 }. Apply the Decomposition Lemma for skew circuits (Lemma 5.2)
to C with d 0 = d − D to get polynomials gi and hi, j for (i, j) ∈ [t] × [0, D] as in the statement of the lemma.
By the subadditivity of rank, we have

                                  rel-rank( f , Π) ≤         ∑            rel-rank(gi × j hi, j , Π) .           (5.1)
                                                       (i, j)∈[t]×[0,D]



                       Y:                      Y or Z:

                             d1                                                                    d2



                         j                                                                              dh − j
                                                               gi


              Figure 7: For fixed d1 , d2 , a generic positioning of gi of degree d 0 in gi × j hi, j .

    Fix any (i, j) and consider rel-rank(gi × j hi, j , Π). By Corollary 4.3, we have

                                          rel-rank(gi × hi, j , Π) ≤ n−(|Yh |−|Zh |)                             (5.2)

where Yh = Collapse(Y \ [ j + 1, j + d 0 ], [d] \ [ j + 1, j + d 0 ]) and Zh = [D] \Yh . Note, however, that since
Y contains [d1 ] ∪ [d − d2 + 1, d], we have Y \ [ j + 1, j + d 0 ] = [d] \ [ j + 1, j + d 0 ] and hence Yh = [D] and
     / Using (5.2), we see rel-rank(gi × hi, j , Π) ≤ n−D and hence by (5.1), we have the claimed upper
Zh = 0.
bound on rel-rank( f , Π).

                        T HEORY OF C OMPUTING, Volume 12 (12), 2016, pp. 1–38                                      14
                         L OWER B OUNDS FOR N ON -C OMMUTATIVE S KEW C IRCUITS

Theorem 1.1 [Precise version]. Any skew circuit for PAL2d/4 (X) must have size Ω̃(nd/4 ) where the Ω̃(·)
hides poly(d) factors.

Proof. Let C be any skew circuit computing PAL2d/4 (X) and let s denote its size. By Lemma 4.1, we
know that there is a homogeneous circuit of size s0 = O(sd 2 ) computing the same polynomial.
     Let Y = [d/4] ∪ [3d/4 + 1, d], Z = [d] \Y , Π = (Y, Z). Note that Π is a (d/4, d/4)-LRM partition of
[d]. Apply Lemma 5.4 to the circuit C0 with d1 = d2 = d/4. The lemma implies that

                                 rel-rank PAL2d/4 (X), Π ≤ (s0 d) · n−d/4 .
                                                         


On the other hand, it is easy to verify that M[PAL2d/4 (X), Π] is a square permutation matrix and hence
rel-rank(PAL2d/4 (X), Π) = 1, which implies the claimed lower bound on s.

Remark 5.5. It is not hard to see that the lower bound of Theorem 1.1 is close to tight, since PAL2d/4 (X)
does have a skew circuit of size O(nd/4 ).

    A similar theorem can be proved for the Lifted Identity polynomial of Hrubeš et al. [9]:

                                              LIDr =      ∑        ze ze ,
                                                       e∈{0,1}2r


where, for (e1 , . . . , e2r ) ∈ {0, 1}2r , ze = ze1 · · · ze2r . For the partition Π defined above, M[LIDr , Π] is a
square permutation matrix, since choosing the prefix and suffix of degree r defines a unique monomial
appearing in LIDr , and its relative rank is therefore 1.
    A natural generalization of the skew circuits is the class of circuits wherein each multiplication gate
has a certain bound on the degree of one of its arguments. We call such circuits δ -unbalanced. Formally,
δ -unbalanced circuits can be defined as follows.

Definition 5.6. A circuit is called δ -unbalanced if every multiplication gate has an argument of degree
at most δ .

    In the following corollary we observe that our exponential lower bound on skew circuits can also be
extended to δ -unbalanced circuits. For instance, it yields an exponential lower bound for the computation
of PAL2d/4 (X) by circuits where every multiplication gate has an input of degree at most d/5.

Corollary 5.7 (of Theorem 1.1). Any δ -unbalanced circuit for PAL2d/4 (X) must have size Ω̃(nd/4−δ +1 )
where the Ω̃(·) hides poly(d) factors.

Proof sketch. The corollary follows by the observation that any δ -unbalanced circuit can be converted
into a skew circuit with O(nδ ) loss in size. Let g = g1 × g2 be a multiplication gate where (without loss of
generality) degree of g1 is at most δ . Then one can write down g1 as a sum of monomials g1 = ∑ti=1 mi ,
where the degree of each mi is at most δ and t = O(nδ ). As g = ∑ti=1 mi × g2 , it can be computed as a
sum of t terms of the form m × g2 , where m is a monomial of degree at most δ . It is easy to see that each
m × g2 can be computed by a skew circuit (with a loss of additional O(δ )).

                        T HEORY OF C OMPUTING, Volume 12 (12), 2016, pp. 1–38                                     15
                      N UTAN L IMAYE , G UILLAUME M ALOD , AND S RIKANTH S RINIVASAN

6     Lower bounds for circuits with small non-skew depth
Recall that the non-skew depth of a non-commutative circuit is the maximum number of non-skew gates
on a path from a variable to the output gate in the DAG underlying the circuit. We call a gate v in C
top-most if there is a path from v to the output gate in C that does not pass through any non-skew gates
other than possibly v itself.

6.1    A decomposition lemma for circuits of non-skew depth k
Lemma 6.1 (Decomposition Lemma for non-skew circuits). Let f ∈ FhXi be a homogeneous polynomial
of degree d computed by a non-skew homogeneous circuit C of size s. Let g1 , . . . , gt (t ≤ s) be the
polynomials computed by the top-most non-skew gates in C and let di0 = deg(gi ) for i ∈ [t]. Then there
exist homogeneous polynomials hi, j (i ∈ [t], j ∈ [0, d − di0 ]) of degree d − di0 and h0 of degree d such that

                                           f=∑           ∑           gi × j hi, j + h0 .
                                                i∈[t] j∈[0,d−di0 ]

Furthermore, each hi, j and h0 can be computed by a homogeneous skew circuit of size at most sd.

Proof. For the decomposition, we will give a proof sketch in the spirit of the proof given for Lemma 5.2.
Any given parse tree of C is path-like until either it reaches exactly one of the top-most non-skew gates or
it ends with a multiplication of two input gates. Collecting the values of the parse trees in the latter case
yields the polynomial h0 , while we can as before partition the remaining parse trees depending on the
top-most gate reached and the degree of the monomial multiplied on the left.
     Let us now show that the resulting polynomials hi, j and h0 can each be computed by a homogeneous
skew circuit of size at most sd. Let α1 , . . . , αt be the top-most non-skew gates. When a parse tree does
not stop at one of these gates, it must end at a multiplication of two input gates. We will call β1 , . . . , βu
the set of these multiplication gates. We start by replacing α1 , . . . , αt by input gates labelled with new
variables y1 , . . . , yt . Setting all these variables to 0 yields a circuit for h0 .
     We set all the y1 , . . . , yt to 0 except yi , set the gates β1 , . . . , βu to 0, and delete gates taking the value 0.
This yields a skew circuit C0 computing ∑ j∈[0,d−di0 ] yi × j hi, j : since a parse tree cannot both contain an α
gate and a β gate, setting the β gates to 0 does not modify the rest of the computation. Note that, apart
from input gates which are arguments of skew multiplications, all the gates in this circuit belong to a
path from αi to the output. In particular the arguments of any addition gate belong to paths from αi to the
output.
     We now build a new circuit by replacing each gate γ on a path from αi to the output, i. e., each gate
which is not an input gate argument of a skew multiplication, by a set of “component” gates. More
precisely, we replace each such gate γ by gates γ0 , . . . , γd−di0 and we will think of the gate γk as representing
the sum of the monomials computed by γ where the degree to the left of yi is k. Thus αi is replaced by a
first gate labelled yi and d − di0 gates labelled 0, since the polynomial yi has one monomial with degree 0
on the left of yi and no monomials with another degree on the left. The circuit C0 is then modified by
induction to compute the desired values.
     Let γ be a multiplication gate with left argument δ and right (skew) argument an input gate labelled x.
Then gate γk of the new circuit computes δk × x.

                          T HEORY OF C OMPUTING, Volume 12 (12), 2016, pp. 1–38                                          16
                          L OWER B OUNDS FOR N ON -C OMMUTATIVE S KEW C IRCUITS

    Let γ be a multiplication gate with right argument δ and left (skew) argument an input gate labelled
with a constant c. Then gate γk of the new circuit computes c × δk .
    Let γ be a multiplication gate with right argument δ and left (skew) argument an input gate labelled
with a variable x. Then gate γk of the new circuit computes x × δk−1 .
    Finally addition gates are made component-wise. Replacing yi by 1, the j-th component of the output
gate computes hi, j . The size of the original circuit, s, has been multiplied by at most d − d 0 + 1, for a total
size at most sd.

6.2     More partitions with respect to which small skew circuits are low rank
For any n ∈ N+ and θ ∈ R, we use expn (θ ) to denote nθ .

Definition 6.2. We say that a partition Π = (Y, Z) of [d] is a (d1 , d2 , `1 , `2 )-extended left right monochro-
matic ((d1 , d2 , `1 , `2 )-XLRM) partition if [d1 + `1 ] ∪ [d − d2 − `2 + 1, d − `2 ] ⊆ Y .

      Given below is an example of a (d1 , d2 , `1 , `2 )-XLRM partition.

               Y:                Y or Z:

                     `1           d1                                           d2              `2




                      Figure 8: Extended left-right monochromatic (XLRM) partitions.


Lemma 6.3 (Generalization of Lemma 5.4). Let f ∈ FhXi be a homogeneous polynomial of degree d ∈ N
computed by a homogeneous skew circuit C of size s. Let Π = (Y, Z) be a (d1 , d2 , `1 , `2 )-XLRM partition,
where d1 , d2 , `1 , `2 are non-negative integers with 4 | d1 and 4 | d2 , `2 ≤ `1 , and d ≥ d1 + d2 + `1 + `2 .
Then                                                                               
                                                   `
                                              2+O( D2 )                         d1 D
                       rel-rank( f , Π) ≤ (sd)          · expn −Ω min d1 , d2 ,            ,
                                                                                 `2
where D denotes min{d1 , d2 }.

   We will only apply the above lemma when d2 = Θ(d1 ) and `2 = O(d1 ), in which case the upper
bound on rel-rank( f , Π) is
                                   (sd)O(1) · expn (−Ω(d1 )) .
     The idea of the proof is simple. When `2 = 0, we have a (d1 , d2 )-LRM partition and we are done. If
that is not the case, we use induction on `2 . We first apply Lemma 5.2 to decompose f as a sum of a
small number of polynomials of the form g × j h where g has degree roughly d − (D/2): if the partition
corresponding to h takes (roughly) as large a chunk out of the `1 length initial segment as it takes out of
the final `2 length segment, we can use the induction hypothesis and we are done; otherwise, the partition
corresponding to h has many more elements of Y than Z and we are done since the relative rank of h
w. r. t. this partition is small.

                          T HEORY OF C OMPUTING, Volume 12 (12), 2016, pp. 1–38                                17
                    N UTAN L IMAYE , G UILLAUME M ALOD , AND S RIKANTH S RINIVASAN

Proof. We start with defining some parameters. Let ∆ = D/2 and
                                                     
                                                  d1 ∆ ∆
                                      r = min          ,    .
                                                  8`2    2

Also, let δ = (∆ − r)/2. Note that ∆/4 ≤ δ ≤ ∆/2.
    We prove a more general statement that is amenable to induction. For any integer i ≥ 0, we show that
for d1 , d2 , `1 , `2 as in the statement of the lemma additionally satisfying `2 ≤ iδ and i ≤ d1 /2r, then the
maximum possible relative rank of f w. r. t. Π, which we denote by ρ(`1 , `2 , d1 , d2 ), can be bounded by

                                        ρ(d1 , d2 , `1 , `2 ) ≤ (sd)1+i · n−r .                              (6.1)

    We will prove the above by induction on i. First, we note that it implies the lemma. For i = d4`2 /∆e,
we have both `2 ≤ iδ (using δ ≥ ∆/4) and also i ≤ d1 /2r by choice of r. Hence, (6.1) implies the
statement of the lemma.
    The base case is i = 0, which corresponds to `2 = 0 and follows directly from Lemma 5.4 since the
partition Π is (d1 , d2 )-LRM.
    For the inductive case, consider any i ≥ 1. We apply Lemma 5.2 to the circuit C with d 0 = ∆. For
some t ≤ s, we obtain
                                         f = ∑ ∑ gi × j hi, j                                       (6.2)
                                                   i∈[t] j∈[∆]

where g1 , . . . , gt are the intermediate polynomials of degree d − ∆ computed by C (and hence themselves
are computed by skew circuits of size at most s).
    We have rel-rank( f , Π) ≤ ∑i, j rel-rank(gi × j hi, j , Π) by the subadditivity of relative rank and hence it
suffices to bound each rel-rank(gi × j hi, j , Π). We analyze this term in two ways depending on j.
    The easier case is when j ≥ ∆ − j + r. In this case, it can be seen that the partition Πh = (Yh , Zh )
corresponding to hi, j (i. e., Yh = Collapse(Y \ [ j + 1, j + d − ∆], [d] \ [ j + 1, j + d − ∆]) and Zh = [∆] \Yh )
satisfies |Yh | − |Zh | ≥ r and hence by Corollary 4.3, we have

                               rel-rank(gi × hi, j , Π) ≤ rel-rank(hi, j , Πh ) ≤ n−r

for each such j.
    Now we consider the case when j ≤ ∆ − j + r. Note that in this case, we have j ≤ (∆ + r)/2 and
hence ∆ − j ≥ (∆ − r)/2 = δ . For each such j, we see that the partition Πg corresponding to gi (i. e.,
Yg = Collapse(Y ∩[ j +1, j +d −∆], [ j +1, j +d −∆]) and Zg = [d −∆]\Yg ) satisfies one of the following
two conditions.

    • if ∆ − j ≤ `2 , then Πg is (d1 , d2 , `1 − j, `2 − (∆ − j))-XLRM, which can also be seen to be (d1 −
      ( j − (∆ − j)), d2 , `1 − (∆ − j), `2 − (∆ − j))-XLRM by “moving” some of the degree from the “d1
      part” to the “`1 part.” As noted above, we have ∆ − j ≥ δ and hence `2 − (∆ − j) ≤ iδ − δ =
      (i − 1)δ . Also, as j ≥ (∆ − j) + r, we have d1 − ( j − (∆ − j)) ≥ d1 − r. Since i ≤ d1 /2r, we obtain
      i − 1 ≤ (d1 − r)/2r and the induction hypothesis along with Corollary 4.3 can be applied to yield

                            rel-rank(gi × hi, j , Π) ≤ rel-rank(gi , Πg ) ≤ (sd)1+(i−1) · n−r .


                        T HEORY OF C OMPUTING, Volume 12 (12), 2016, pp. 1–38                                   18
                          L OWER B OUNDS FOR N ON -C OMMUTATIVE S KEW C IRCUITS

      • if ∆ − j > `2 7 , then Πg is always (d1 − j, d2 − (∆ − j))-LRM, which in particular is (d1 /2, d2 /2)-
        LRM. In this case, by Corollary 4.3 and Lemma 5.4, we immediately get

                          rel-rank(gi × hi, j , Π) ≤ rel-rank(gi , Πg ) ≤ n− min{d1 ,d2 }/2 ≤ n−r .

      Hence, for each i, j, we have shown

                                    rel-rank(gi × hi, j , Π) ≤ (sd)1+(i−1) · n−r .

Putting this together with (6.2) and the subadditivity of relative rank, we obtain the inductive statement
(6.1).

6.3     The candidate hard partition for circuits of non-skew depth at most k
Throughout, let d0 ∈ N+ be a fixed parameter.
    Let d ∈ N. Given an (ordered) partition Π = (Y, Z) of [d], we define the signature of Π to be the
sequence sgn(Π) = σ = (i1 , i2 , . . . , i p ) of non-negative integers such that the first i1 elements of [d] belong
to Y , the next i2 elements belong to Z, the next i3 again to Y , and so on. Formally,
                                                    [h                   i
                                               Y=        ∑ i j + 1, ∑ i j .
                                                 q odd j<q         j≤q


We denote by |σ | the quantity ∑q≤p iq = d and use |σ |0 to denote p.
   Given two signatures σ1 ∈ Nn and σ2 ∈ Nm , we use σ1 ◦ σ2 ∈ Nm+n to denote their concatenation.
We also use σ1r to denote the r-fold repetition of σ1 .
   Given a signature σ = (i1 , . . . , i p ), we say that a signature τ is a prefix of σ if τ = (i01 , . . . , i0q ) for
q ≤ p, where i0j = i j for j < q and i0q ≤ iq .
   Clearly, we may define a partition Π of [d] using its signature. For any k ∈ N, we now define a partition
Πk = (Yk , Zk ) of [d] (for suitable d) such that small circuits of non-skew depth at most k computing a
homogeneous polynomial of degree d have low rank w. r. t. Πk .
   Fix any k ∈ N and let Dk = 8d0 + 12d0 k. We define the partition Πk = (Yk , Zk ) of [Dk ] so that

                                  sgn(Πk ) = (3(k + 1)d0 , 2d0 ) ◦ (d0 , 2d0 )1+3k .

Note that |Yk | = |Zk | = Dk /2. Figure 9 illustrates the partition Π0 and also the relation between the
partitions Πk and Πk−1 , which will be important in our lower bound.
    We will later show that small circuits of non-skew depth at most k computing a homogeneous
polynomial of degree Dk cannot compute a polynomial that has high relative rank w. r. t. Πk . In the
remainder of this section, we show that there are small circuits of non-skew depth O(k) (in fact, circuits
using only O(k) many non-skew gates) that can compute a homogeneous polynomial fk of degree Dk that
has full rank w. r. t. Πk . The basic “gadget” in this construction is the palindrome polynomial, and the
construction of fk involves “wrapping” a copy of PALDk /4 (X) around O(k) copies of PALd0 (X).
   7 This can only happen when ` ≤ ∆.
                                2



                         T HEORY OF C OMPUTING, Volume 12 (12), 2016, pp. 1–38                                      19
                    N UTAN L IMAYE , G UILLAUME M ALOD , AND S RIKANTH S RINIVASAN

               Y:                          Z:

                                                     3d0           2d0     d0       2d0



                3d0                                                                   d0    2d0

                                                Πk−1

              Figure 9: The partition Π0 (above) and constructing Πk from Πk−1 (below).

Lemma 6.4. Fix any positive integers k, d0 and let Dk be as above. Then there is a homogeneous
polynomial fk ∈ FhXi of degree Dk that is computable by a non-commutative arithmetic circuit of size
O(nDk ) with O(k) many non-skew gates and such that rel-rank( fk , Πk ) = 1.
Proof. We define the polynomials fk inductively. For k = 0, we define

                                 f0 := (PAL2d0 (X) · PALd0 (X)) ×d0 PALd0 (X) .

In the notation of Section 4.1, we can write f0 as

                         f0 =           ∑               x̃w1 · x̃w2 · x̃w3 · x̃wR3 · x̃wR2 · x̃w4 · x̃wR4 · x̃wR1 .
                                w1 ,w2 ,w3 ,w4 ∈[n]d0

Figure 10 illustrates the positioning of the segments of the monomial corresponding to w1 , w2 , w3 , and w4
w. r. t. the partition Π0 .


                                                w1 w2 w3 wR3 wR2 w4 wR4 wR1


                                                 fk−1
              w1 w2 w3                                                                w4 wR4 wR3 w5 wR5 wR2 w6 wR6 wR1


           Figure 10: The construction of polynomials f0 (above) and fk from fk−1 (below).

    We observe that f0 can be computed by a homogeneous non-commutative arithmetic circuit of size
O(nD0 ) = O(nd0 ) with exactly one non-skew gate. To see this, note that g0 := (PAL2d0 (X) · PALd0 (X))
can be computed by first computing each of the terms of the product using homogeneous skew circuits of
size O(nd0 ) and then multiplying them using exactly one non-skew gate. We can then compute f0 by
using g0 and only homogeneous skew multiplication gates by using the following inductive definitions.
                                                        (0)
                                                   g0 := g0 ,
                                                                   n
                                                  (i+1)                       (i)
                                                g0            := ∑ x j · g0 · x j .
                                                                  j=1



                       T HEORY OF C OMPUTING, Volume 12 (12), 2016, pp. 1–38                                             20
                               L OWER B OUNDS FOR N ON -C OMMUTATIVE S KEW C IRCUITS

                             (d )                                                           (i+1)            (i)
    The polynomial g0 0 is exactly f0 . Note that computing g0 from g0 requires only O(n) additional
gates. Thus, the size of the circuit computing f0 is O(nd0 ).
    For k > 0, we define the polynomial fk inductively as follows. The construction is illustrated in
Figure 10.
          fk :=               ∑                   (x̃w1 x̃w2 x̃w3 ) · fk−1 · (x̃w4 x̃wR4 ) · x̃wR3 · (x̃w5 x̃wR ) · x̃wR2 · (x̃w6 x̃wR ) · x̃wR1 .
                                                                                                               5                      6
                  w1 ,w2 ,w3 ,w4 ,w5 ,w6 ∈[n]d0

    It can be easily checked that the matrix M[ fk , Πk ] is an nDk /2 × nDk /2 permutation matrix and hence
rel-rank( fk , Πk ) = 1.
    We need to check that fk defined as above has a small non-commutative circuit with O(k) many
non-skew gates. For k ≥ 1, we define
                                              hk := ( fk−1 · PALd0 (X)) ×d0 PALd0 (X) ,
                                              gk := (hk · PALd0 (X)) ×d0 PALd0 (X) .
Note that
                                                   fk = (gk · PALd0 (X)) ×d0 PALd0 (X) .
    The circuit for hk is obtained from the circuit for fk−1 in a manner similar to the construction of the
circuit for f0 , and similarly, we can obtain a circuit for gk and then a circuit for fk . We omit the details. It
is easy to check that only 3 additional non-skew multiplication gates are used by the above procedure and
hence the number of non-skew gates used overall is O(k).

6.4     The lower bound for circuits of non-skew depth k
In this section, we show that small non-commutative circuits of non-skew depth k computing a homo-
geneous polynomial of degree Dk cannot compute a polynomial that has high relative rank w. r. t. Πk .
Throughout, let d0 ∈ N be a fixed parameter.
    For ` ∈ N+ , we say that a pair (g, Π) is `-good if g ∈ FhXi is a homogeneous polynomial with
deg(g) = D ≥ D` and Π = (Y, Z) is a partition of [D] such that sgn(Π) = (a, 2d0 ) ◦ (d0 , 2d0 )1+3`+r ◦ (b, c)
where
      • a ≥ 3(` + 1)d0 , r ≥ 0, and
      • either c = 0 and b ∈ [d0 ] or b = d0 and c ∈ [2d0 − 1].
Intuitively, the (g, Π) being `-good means that D ≥ D` and Π “contains” a copy of Π` as a sub-segment
and Π is furthermore similarly contained in Π`0 for some `0 ≥ `. See Figure 11, where the top partition
corresponds to the case c = 0 and the bottom one to the case b = d0 as mentioned above.
    The main lemma is the following.
Lemma 6.5 (Main Lemma for circuits of non-skew depth k). Assume k, d0 ∈ N such that 64 | d0 . Let
f ∈ FhXi be any homogeneous polynomial of degree Dk computed by a non-commutative circuit C of size
at most s with non-skew depth at most k and let Πk = (Yk , Zk ) be the partition defined above. Then
                                                  rel-rank( f , Πk ) ≤ (sDk )O(1) · n−Ω(d0 ) .

                              T HEORY OF C OMPUTING, Volume 12 (12), 2016, pp. 1–38                                                                  21
                     N UTAN L IMAYE , G UILLAUME M ALOD , AND S RIKANTH S RINIVASAN

                                         Π`                                     ···

                                         Π`                                     ···
                                         D`


                               Figure 11: Partitions that arise in `-good pairs.

    The basic idea of the proof is to repeatedly use Lemma 6.1 to decompose the polynomial f as a
sum of polynomials computed by circuits with smaller non-skew depth. When we apply Lemma 6.1,
we repeatedly obtain polynomials of the form g × j h where g and h are homogeneous polynomials of
degrees dg and Dk − dg , respectively, and j ∈ [0, Dk − dg ]. Given a polynomial g ∈ FhXi, j ∈ [0, Dk − dg ],
and ` ∈ [0, k], we say that the pair (g, j) is `-admissible if the pair (g, Πg ) is `-good, where Πg = (Yg , Zg )
for Yg := Collapse(Yk ∩ [ j + 1, j + dg ], [ j + 1, j + dg ]) and Zg := [dg ] \Yg . See Figure 12.

                                                        Πk


                                                   Π`                                     ···
                 j
                                                   g


                             Figure 12: Example of an `-admissible pair (g, j).


Proof. First let us introduce some notation. Let the non-skew depth of a node v of C be the maximum
number of non-skew gates on any path from a leaf to v. For ` ∈ [k], let G` (resp. G=` ) be the set of all
polynomials computed by gates in the circuit that have non-skew depth at most ` (resp. exactly `); note
that |G=` | ≤ |G` | ≤ s. We also denote by A` the set {(g, j) | g ∈ G` and (g, j) is `-admissible}. Finally,
we define V` by
               n                                                                                o
                                g    g
         V` =       ∑    g ×  H
                             j j   H j ∈ FhXi homogeneous      of degree exactly (D  k − deg(g))  .
                 (g, j)∈A`

Note that V` ⊆ FhXi is a vector space over F.
   Our proof proceeds in two steps.
   1. We first show that for each ` ∈ [0, k], the polynomial f can be decomposed as f = p` + e` where
      p` ∈ V` and e` is such that rel-rank(e` , Πk ) is small. The proof is by downward induction on `.
   2. We then show that rel-rank(p0 , Πk ) is small for each p0 ∈ V0 . Along with the above decomposition,
      this will finish the proof.
   We start with 1. above. Formally, we prove that there are absolute constants α, β > 0 such that for
each ` ∈ [0, k], the polynomial f can be written as

                                                  f = p` + e`                                              (6.3)

                         T HEORY OF C OMPUTING, Volume 12 (12), 2016, pp. 1–38                                22
                        L OWER B OUNDS FOR N ON -C OMMUTATIVE S KEW C IRCUITS

where p` ∈ V` and e` ∈ FhXi is homogeneous of degree D0 and satisfies

                                 rel-rank(e` , Πk ) ≤ (sDk )α · (k − `) · n−β d0 .                                 (6.4)

    The proof is by downward induction on `. We will choose α, β so that they satisfy some constraints
that come up during the course of the proof. The base case when ` = k is trivial, since we can choose
pk = f ∈ Vk and ek to be the zero polynomial. Both (6.3) and (6.4) are thus satisfied for any choice of
α, β .
    Now for the induction case. Say that ` ∈ [0, k − 1]. By the induction hypothesis we have f =
p`+1 + e`+1 , where p`+1 ∈ V`+1 and

                             rel-rank(e`+1 , Πk ) ≤ (sDk )α · (k − ` − 1) · n−β d0 .

By the definition of V`+1 , we know that

                      p`+1 =      ∑          g × j H gj =       ∑          g × j H gj +        ∑ g × j H gj        (6.5)
                               (g, j)∈A`+1                  (g, j)∈A0`+1                  (g, j)∈A`
                                                                                          |           {z       }
                                                                                                   p0`+1 ∈V`


where A0`+1 := A`+1 \ A` = {(g, j) | (g, j) is ` + 1-admissible and g ∈ G=`+1 }. (Here, we have used the
fact that if (g, j) is (` + 1)-admissible and g ∈ G` , then (g, j) is also `-admissible.)
     As noted above, the terms corresponding to (g, j) ∈ A` already sum to a polynomial p0`+1 ∈ V` . To
prove the induction statement (6.3), it therefore suffices to decompose each polynomial g × j H gj where
(g, j) ∈ A0`+1 . To do this, we need the following claim, whose proof is deferred.
Claim 6.6. Fix any ` ∈ [k]. Also fix any g ∈ G=` of degree dg ∈ [D` , Dk ], any homogeneous polynomial
H ∈ FhXi of degree Dk − dg , and j such that (g, j) is `-admissible. Then the polynomial g × j H can be
decomposed as
                                             g×j H = p+e
where p ∈ V`−1 and e ∈ FhXi is homogeneous of degree Dk and satisfies

                                    rel-rank(e, Πk ) ≤ (sDk )O(1) · n−Ω(d0 ) .

    Applying the above claim (with ` replaced by ` + 1) to each pair (g, j) ∈ A0`+1 from the right hand
side of (6.5), we obtain for each such (g, j) that

                                                  g × j H gj = pgj + egj

where pgj ∈ V` and rel-rank(egj , Πk ) ≤ (sDk )α1 · n−β1 d0 for suitably large α1 > 0 and small β1 > 0. Substi-
tuting in (6.5), we get

                                  p`+1 = p0`+1 +             ∑        pgj +       ∑           egj .
                                                       (g, j)∈A0`+1           (g, j)∈A0`+1
                                              |         {z             } |          {z         }
                                                        p`                          e0`



                       T HEORY OF C OMPUTING, Volume 12 (12), 2016, pp. 1–38                                         23
                    N UTAN L IMAYE , G UILLAUME M ALOD , AND S RIKANTH S RINIVASAN

    Note that p` ∈ V` (since V` is a vector space). Also, as |A0`+1 | ≤ (sDk ), we have

                           rel-rank(e0` , Πk ) ≤ (sDk )α1 +1 · n−β1 d0 ≤ (sDk )α · n−β d0

for α ≥ α1 + 1 and β ≤ β1 .
    Setting p` as above and e` = e`+1 + e0` , we have the required decomposition. The inequality (6.4)
follows since rel-rank(e` , Πk ) ≤ rel-rank(e`+1 , Πk ) + rel-rank(e0` , Πk ). This finishes the proof of the
induction.
    Thus, for ` = 0, we have
                                               f = p0 + e0
for some p0 ∈ V0 and rel-rank(e0 , Π0 ) ≤ k · (sDk )α · n−β d0 ≤ (sDk )α+1 · n−β d0 . To bound rel-rank( f , Πk ),
we only need to bound rel-rank(p0 , Πk ). Since p0 ∈ V0 , we have

                                               p0 =     ∑ g × j H gj .                                      (6.6)
                                                      (g, j)∈A0

    To analyze rel-rank(p0 , Πk ), we will need the following claim, the proof of which is also deferred.
Claim 6.7. Assume that h ∈ FhXi of degree dh ∈ [D0 , Dk ] is computed by a homogeneous skew circuit of
size s1 .
(a) Let Πh = (Yh , Zh ) be any partition of [dh ] such that (h, Πh ) is 0-good. Then

                                      rel-rank(h, Πh ) ≤ (s1 Dk )O(1) · n−Ω(d0 ) .

(b) Let H ∈ FhXi be a homogeneous polynomial of degree dH = Dk − dh . Given j ∈ [0, dH ] is such that
    (h, j) is 0-admissible, we have

                                   rel-rank(h × j H, Πk ) ≤ (s1 Dk )O(1) · n−Ω(d0 ) .

    Fix (g, j) ∈ A0 and consider the polynomial g × j H gj in the right hand side of (6.6). By Claim 6.7 and
using the fact that g is computable by a skew circuit of size at most s, we know that

                                 rel-rank(g × j H gj , Πk ) ≤ (sDk )O(1) · n−Ω(d0 ) .

Thus, we have

                     rel-rank( f , Πk ) ≤ rel-rank(p0 , Πk ) + rel-rank(e0 , Πk )
                                      ≤     ∑ rel-rank(g × j H gj , Πk ) + rel-rank(e0 , Πk )
                                          (g, j)∈A0

                                      ≤ (sDk )O(1) · n−Ω(d0 )

which finishes the proof of the lemma.

    It remains to prove the two claims used in the proof of Lemma 6.5. We prove Claim 6.7 first and then
Claim 6.6.

                        T HEORY OF C OMPUTING, Volume 12 (12), 2016, pp. 1–38                                  24
                         L OWER B OUNDS FOR N ON -C OMMUTATIVE S KEW C IRCUITS

Proof of Claim 6.7. We first prove Part (a) of the claim. Since (h, Πh ) is 0-good, we have sgn(Πh ) =
(a, 2d0 ) ◦ (d0 , 2d0 )1+r ◦ (b, c), for a ≥ 3d0 , r ≥ 0 and b, c such that either c = 0 and b ∈ [d0 ] or b = d0 and
c ∈ [2d0 − 1].
     We need to show that
                                        rel-rank(h, Πh ) ≤ (s1 Dk )O(1) · n−Ω(d0 ) ,                           (6.7)
    We divide the analysis into the following cases (see also Figure 13).

                                 a ≥ 3d0                                        b ≥ d0 /2

                                                                   ···

                                                                               b < d0 /2

                                                                   ···
                             `1     d1                                   d2    `2
                                                                                            c

                                                                   ···
                            `1     d1                                               d2     `2


                                           Figure 13: Cases from Claim 6.7.


    • c = 0 and b ≥ d0 /2: In this case, we can apply Lemma 5.4 with d1 = 3d0 and d2 = d0 /2 to get
      (6.7).

    • c = 0 and b < d0 /2: In this case, we apply Lemma 6.3 with d1 = d0 /2, d2 = d0 , `1 = 5d0 /2, and
      `2 = b + 2d0 < 5d0 /2. Note that

                 Y ⊇ [3d0 ] ∪ [d − b − 3d0 + 1, d − b − 2d0 ] = [d1 + `1 ] ∪ [d − d2 − `2 + 1, d − `2 ]

       and hence Lemma 6.3 implies (6.7).

    • b = d0 and c > 0: We apply Lemma 6.3 with parameters d1 = d2 = d0 , `1 = 2d0 , and `2 = c < 2d0 ,
      which gives (6.7).

    Part (b) of the claim follows from Part (a) as follows. Let

     Yh := Collapse(Yk ∩ [ j + 1, j + dh ], [ j + 1, j + dh ]) ,     Zh := [dh ] \Yh ,      and   Πh := (Yh , Zh ) .

Since (h, j) is 0-admissible, we know that (h, Πh ) is 0-good. By Corollary 4.3, we have

                       rel-rank(h × j H, Πk ) ≤ rel-rank(h, Πh ) ≤ (s1 Dk )O(1) · n−Ω(d0 )

where the last inequality follows from Part (a).

                        T HEORY OF C OMPUTING, Volume 12 (12), 2016, pp. 1–38                                          25
                     N UTAN L IMAYE , G UILLAUME M ALOD , AND S RIKANTH S RINIVASAN

                                                       Π`                                             ···


                                             Π`−1

                  3d0


           Figure 14: The partition Πg (above) and the relation between Π` and Π`−1 (below).

Proof of Claim 6.6. Let Yg := Collapse(Yk ∩ [ j + 1, j + dg ], [ j + 1, j + dg ]). Also define Zg := [dg ] \Yg
and Πg := (Yg , Zg ). Since (g, j) is `-admissible, we know that (g, Πg ) is `-good.
    To do this, consider the subcircuit Cg of C that computes g. Since g is at non-skew depth `, we may
assume that Cg has non-skew depth ` also by removing gates at larger non-skew depths. Recall that C and
hence Cg has size at most s.
    By applying Lemma 6.1 to the polynomial g, we can see that
                                        g= ∑           ∑           gi ×m hi,m + h0
                                             i∈[t] m∈[0,dg −di ]

where g1 , . . . , gt are the polynomials computed by the top-most non-skew gates in Cg and di = deg(gi ).
Further, each of the hi,m and h0 have skew circuits of size at most sdg ≤ sDk . Thus, we have
                                  g × j H = ∑ ∑(gi ×m hi,m ) × j H + h0 × j H .                                 (6.8)
                                              i j,m

     We argue that polynomial on the right hand side of (6.8) either belongs to V`−1 or has relative rank at
most (sDk )O(1) · n−Ω(d0 ) w. r. t. Yk . Since V`−1 is a vector space and rel-rank(·, Πk ) is subadditive, this will
complete the proof.
     First we consider the polynomial h0 × j H. Note that (h0 , j) is `-admissible (since (g, j) is) and hence
it is also 0-admissible. Moreover, h0 is computable by a skew circuit of size at most sDk . Hence, by
Claim 6.7, we have
                                   rel-rank(h0 × j H, Πk ) ≤ (sDk )O(1) · n−Ω(d0 ) ,                           (6.9)
which completes the analysis of this term.
     Now consider any polynomial qi,m := (gi ×m hi,m ) × j H appearing in (6.8). For notational simplic-
ity, we let dg0 := di = deg(gi ) and dh0 := deg(hi,m ) = dg − di . We will show that either qi,m ∈ V`−1 or
rel-rank(qi,m , Πk ) is small; to prove the latter, we will use the following inequalities which follow from
Lemma 4.2 and Corollary 4.3:
     rel-rank(qi,m , Πk ) ≤ rel-rank(gi ×m hi,m , Πg ) ≤ min{rel-rank(gi , Π0g ), rel-rank(hi,m , Π0h )}
                                                                            0   0        0    0                (6.10)
                                                          ≤ min{n−(|Yg |−|Zg |) , n−(|Yh |−|Zh |) }
where Π0g = (Yg0 , Zg0 ) and Π0h = (Yh0 , Zh0 ) are the natural restrictions of Πg to gi and hi,m , respectively. That
is,
                         Yg0 := Collapse(Yg ∩ [m + 1, m + dg0 ], [m + 1, m + dg0 ]) ,
                         Yh0 := Collapse(Yg \ [m + 1, m + dg0 ], [dg ] \ [m + 1, m + dg0 ]) ,


                         T HEORY OF C OMPUTING, Volume 12 (12), 2016, pp. 1–38                                     26
                       L OWER B OUNDS FOR N ON -C OMMUTATIVE S KEW C IRCUITS

and Zg0 and Zh0 denote [di ] \Yh0 and [di,m ] \ Zh0 , respectively.
   Since (g, Πg ) is `-good, we know that dg ≥ D` and, furthermore, we have

                               sgn(Πg ) = (a, 2d0 ) ◦ (d0 , 2d0 )1+3`+r ◦ (b, c)

where a ≥ 3(` + 1)d0 , r ≥ 0 and b, c such that either c = 0 and b ∈ [d0 ] or b = d0 and c ∈ [2d0 − 1].
   The upper bound on rel-rank(qi,m , Πk ) is based on a case analysis. We refer the reader to the
accompanying figures for an intuitive description of each case.

   1. m < 5d0 /2 and dg − m − dg0 < b + c + 3rd0 + 9d0 : In this case

                                   sgn(Π0g ) = (ag , 2d0 ) ◦ (d0 , 2d0 )1+3(`−1) ◦ σ ,

      where ag ≥ (3` + 1/2)d0 and σ is some signature: in particular, dg0 ≥ D`−1 + d0 /2. In what follows,
      we will argue that either gi has low relative rank w. r. t. Π0g or qi,m ∈ V`−1 .
      Since gi is computed by a top-most non-skew gate in the circuit Cg , we can write gi = gi,1 · gi,2
      where gi,1 and gi,2 are homogeneous polynomials computed by homogeneous circuits of size at
      most s and non-skew depth at most ` − 1. Let e1 and e2 = dg0 − e1 denote the degrees of gi,1 and
      gi,2 , respectively. Let Π0g,1 = (Yg,1
                                         0 , Z 0 ) and Π0 = (Y 0 , Z 0 ) be the induced partitions on g and
                                              g,1       g,2   g,2 g,2                                  i,1
      gi,2 , respectively, i. e.,
            0
           Yg,1 = Collapse(Yg0 ∩ [e1 ], [e1 ]) and Yg,2
                                                    0
                                                        = Collapse(Yg0 ∩ [e1 + 1, dg0 ], [e1 + 1, dg ]) .

      Our analysis is further divided into two cases depending on e1 .

        (i) e1 < d0 /2: In this case, we see that e2 = dg0 − e1 ≥ D`−1 and also

                                  sgn(Π0g,2 ) = (ag − e1 , 2d0 ) ◦ (d0 , 2d0 )1+3(`−1) ◦ σ .

           Hence, (gi,2 , Π0g,2 ) is (` − 1)-good. Thus, the polynomial qi,m (which by Fact 3.3 can be
                                                                                       0 and some j )
           written as gi,2 × j2 H2 for some homogeneous polynomial H2 of degree Dk − dg,2            2
           belongs to V`−1 and hence we are done.
       (ii) e1 ≥ d0 /2: If
                                     sgn(Π0g,1 ) = (ag , 2d0 ) ◦ (d0 , 2d0 )1+3(`−1) ◦ σ 0
           for some signature σ 0 , then as in the previous case, we have qi,m = gi,1 × j1 H1 for some
           suitable H1 and j1 , and hence qi,m ∈ V`−1 .
           Otherwise, we can use the fact that sgn(Π0g,1 ) must be a prefix of (ag , 2d0 ) ◦ (d0 , 2d0 )1+3(k−1)
           and using the fact that |sgn(Π0g,1 )| = e1 ≥ d0 /2, we see that |Yg,1   0 |−|Z 0 | ≥ d /2 and therefore,
                                                                                           g,1   0
           we have
                                                                   0       0
                                    rel-rank(gi,1 , Π0g,1 ) ≤ n−(|Yg,1 |−|Zg,1 |) ≤ n−Ω(d0 ) .
           By Lemma 4.2, the same bound holds for rel-rank(qi,m , Πk ) as well.


                      T HEORY OF C OMPUTING, Volume 12 (12), 2016, pp. 1–38                                     27
                   N UTAN L IMAYE , G UILLAUME M ALOD , AND S RIKANTH S RINIVASAN

            m < 5d0 /2                                  gi                                dg − m − dg0

                                                  Π`−1                                   ...
                         gi,1                                gi,2



                                                  Π`−1                                   ...
                                                 gi,1                            gi,2



                                                  Π`−1                                   ...
                                       gi,1                              gi,2


Figure 15: The subcases in Case 1: The first figure represents Case 1(i), and the second and third represent
Case 1(ii).

   2. m < 5d0 /2 but dg − m − dg0 ≥ b + c + 3rd0 + 3d0 : In this case, it can be checked that sgn(Πg ) is
      a prefix of (ag , 2d0 ) ◦ (d0 , 2d0 )1+3(`−1) for some ag ≥ (3` + 1/2)d0 . We analyze in two different
      ways depending on whether dg0 is reasonably large or not.
        (i) dg0 ≥ d0 /2: In this case, it follows that no matter what exactly sgn(Πg ) is, we will always
            have |Yg0 | − |Zg0 | ≥ d0 /2 and hence by (6.10), we have

                                              rel-rank(gi ×m hi,m , Π0g ) ≤ n−Ω(d0 ) .

       (ii) dg0 < d0 /2: In this case, it can be checked that dh0 ≥ D`−1 and (h, sgn(Π0h )) is (` − 1, dh0 )-good
            and hence also (0, dh0 )-good. Thus, we have
              rel-rank(qi,m , Πk ) = rel-rank((gi ×m hi,m ) × j H, Πk ) = rel-rank(gi × j+m (hi,m × j H), Πk )
                                  ≤ rel-rank(hi,m , Π0h ) ≤ (sD)O(1) · n−Ω(d0 )
            where the second equality uses Fact 3.3, the first inequality uses two applications of Corol-
            lary 4.3, and the last inequality follows from Part 1 of Claim 6.7.

   3. m ∈ [5d0 /2, a]: In this case, we can show that
                                     rel-rank(hi,m , Π0h ) ≤ (sDk )O(1) · n−Ω(d0 ) .
      By (6.10), the same upper bound holds for rel-rank(qi,m , Πk ).
      Instead of going through the explicit case analysis, we refer the reader to Figure 17 for the
      various cases that can occur. It can be checked that in each of these, the resulting partition Π0h is
      (d1 , d2 , `1 , `2 )-XLRM, where d1 = d2 = d0 /8, `1 = (5/2)d0 and `2 ≤ (2 + (1/8))d0 ≤ `1 (`2 can
      possibly be chosen to be 0 as in the second figure). By Lemma 6.3, this shows what we wanted to
      prove.

                         T HEORY OF C OMPUTING, Volume 12 (12), 2016, pp. 1–38                                28
                        L OWER B OUNDS FOR N ON -C OMMUTATIVE S KEW C IRCUITS

               m            gi                                 dg − m − dg0

                                             Π`−1                                      ...


               m       gi

                                             Π`−1                                      ...


               Figure 16: The subcases in Case 2: Case 2(i) above and Case 2(ii) below.

              m                              gi

                                                                ...

                   a                                                                b ≤ d0 /8
              m                              gi

                                                                ...

                   a                                                                b > d0 /8
              m                              gi

                                                                ...

                   a                                                                         c ≤ 2d0


                            Figure 17: The subcases that can occur in Case 3.

   4. m > a: Here again we can show that

                                    rel-rank(hi,m , Π0h ) ≤ (sDk )O(1) · n−Ω(d0 )

      by noting that irrespective of the placing of gi , the partition Π0h is (d1 , d2 , `1 , `2 )-XLRM for
      d1 = d2 = d0 /2, `1 = 5d0 and some `2 ≤ (4 + (1/2))d0 ≤ `1 and using Lemma 6.3. See Figure 18.


   The main lower bound for non-commutative circuits of small non-skew depth follows.
Theorem 1.2 (Precise version). Let k, d ∈ N be any parameters such that 64(8 + 12k) | d. There is a
homogeneous polynomial f ∈ FhXi of degree d such that f is computable by a homogeneous circuit of
size O(nd) with O(k) non-skew gates but any non-commutative circuit of skew depth at most k computing
f must have size at least Ω̃(nΩ(d/k) ), where the Ω̃(·) hides poly(d) factors.
Proof. We let f = fk as defined above with d0 := d/(8 + 12k) (and hence deg( fk ) = Dk = d). By
Lemma 6.4, we know that f is computable by a homogeneous circuit of size O(nd) with O(k) non-skew
gates. Moreover, rel-rank( f , Πk ) = 1, where Πk = (Yk , Zk ) is the partition defined in Section 6.3.

                       T HEORY OF C OMPUTING, Volume 12 (12), 2016, pp. 1–38                            29
                   N UTAN L IMAYE , G UILLAUME M ALOD , AND S RIKANTH S RINIVASAN

                                                Π`−1
                   ≥ 3d0            ≥ 3d0

                                                                ...                       ...
                            a

                                m                                 gi


                                               Figure 18: Case 4.


    Let C be any non-commutative circuit of non-skew depth at most k computing f and let s denote the
size of C. By Lemma 4.1, we know that there is also a homogeneous circuit C0 of non-skew depth at most
k and size at most sd O(1) computing f . Thus, Lemma 6.5 implies that

                           rel-rank( f , Πk ) ≤ (sd)O(1) n−Ω(d0 ) = (sd)O(1) n−Ω(d/k) .

As rel-rank( f , Πk ) = 1, we have the required lower bound on s.

Remark 6.8. The divisibility constraints on the degree in the statement of Theorem 1.2 can easily be
removed at the expense of additional constant factors in the exponent in the lower bound. For example if
the degree d does not have the required form, then we can find the largest d1 ≤ d of the required form
and consider the polynomial F = fk · zd−d1 where z is a new variable and fk is the hard polynomial of
degree d1 as defined above. If F has a circuit of non-skew depth k of size s, then so does fk , which yields
s ≥ nΩ(d1 /k) . Since d1 = Ω(d), this yields an nΩ(d/k) bound.


7    Lower bound for the determinant and permanent
Nisan’s lower bounds from [15] held not only for the palindrome polynomial seen above, but also for the
permanent and the determinant polynomials, because it is easy to see that their partial derivative matrices
have high rank. In our case, we could also try to study the rank of the permanent or the determinant,
using our version of the partial derivative matrix. However it is simpler to use the fact that the permanent
and determinant can easily express the palindrome polynomial.
    Recall that the non-commutative (Cayley) determinant and permanent of an n × n matrix of variables
X = (Xi, j )i, j∈[n] are defined as follows.

     det(X) = ∑ sgn(σ )X1,σ (1) · X2,σ (2) · · · Xn,σ (n) ,    per(X) = ∑ X1,σ (1) · X2,σ (2) · · · Xn,σ (n) .
                σ ∈Sn                                                     σ ∈Sn

That is, we just take the commutative determinant and permanent and make it non-commutative by
ordering the variables in each monomial in increasing order of the rows in which they appear.

Lemma 7.1. Let Pd be the 2d × 2d matrix with x0 on the diagonal, x1 on the anti-diagonal, and 0
everywhere else. Let Dd be the 2d × 2d matrix with x0 on the diagonal, x1 on the first d positions of the
anti-diagonal and −x1 on the last d positions of the anti-diagonal. Then PALd (x0 , x1 ) = per Pd = det Dd .

                        T HEORY OF C OMPUTING, Volume 12 (12), 2016, pp. 1–38                                    30
                       L OWER B OUNDS FOR N ON -C OMMUTATIVE S KEW C IRCUITS

Proof. The permanent of Pd can be obtained by choosing in each row of Pd a column index, while
ensuring that each column index is taken only once; multiplying the values obtained; and then adding
the results for all possible choices. Since there are only two non-zero values per row, for the row i (with
1 ≤ i ≤ d), we can either choose the index i with value x0 or the index 2d + 1 − i with value x1 . In the
first case, the column of index i is now forbidden and therefore for the row 2d + 1 − i the only available
non-zero value is x0 with the column index 2d + 1 − i. In the the second case, the column of index
2d + 1 − i is now forbidden and therefore for the row 2d + 1 − i the only available non-zero value is x1
with column index i.
     For the determinant, note that the above reasoning shows that a permutation yielding a non-zero
value is a combination of fixed points (when choosing the value x0 at row i in column i one must then
choose value x0 at row 2d + 1 − i in column 2d + 1 − i) and transpositions (when choosing the value x1
at row 2d + 1 − i in column i one must then choose value x1 at row i in column 2d + 1 − i). Therefore
adding a minus sign to the last d values x1 cancels out the sign of the permutation in the determinant.

Corollary 7.2. Let k, d ∈ N be any parameters such that (64(8 + 12k)) | d. Any circuit of non-skew depth
k for the permanent or the determinant of an d × d matrix must have size 2Ω(d/k) .

Proof. Let us show the corollary for the permanent only, since the case for the determinant is similar. We
will show that there exists a matrix Pk such that the permanent of Pk is fk0 , where fk0 is fk but built with
the 2-variable palindrome polynomial (n = 2). We will follow the construction of fk from the proof of
Lemma 6.4. Lemma 7.1 shows that there exists a matrix of order d0 whose permanent is PALd0 (x0 , x1 ).
To get f00 from this polynomial, or to go from fk−1
                                                0   to fk0 we basically need two types of steps.

   1. Computing the product of two previously obtained polynomials. If we have already built two
      matrices M and N whose permanents are f and g, respectively, then clearly f · g is the permanent
      of the block diagonal matrix with M and N on the diagonal. The order of the block matrix is the
      sum of the orders of M and N.

   2. Computing a j-product of a previously computed polynomial with a palindrome polynomial. If we
      have already built a matrix M whose permanent is the polynomial f , then we can build a matrix
      whose permanent is f ×d0 PALd0 (x0 , x1 ) by considering the block matrix
                                                       
                                                  D 0 A
                                                 0 M 0 ,
                                                  A 0 D

      where D is the order-d0 matrix with x0 on the diagonal and A is the order-d0 matrix with x1 on the
      anti-diagonal (the reasoning is similar to the one in the proof of Lemma 7.1). The order of this
      matrix is the order of M plus 2d0 .

Thus f00 is the permanent of a matrix of order 8d0 and going from fk−1  0  to fk0 increases the size of the
matrix by 12d0 (refer once again to the proof of Lemma 6.4). The order of the matrix Pk whose permanent
is fk0 is thus d := Dk = (8 + 12k)d0 . By Theorem 1.2, any circuit of non-skew depth k for the permanent
must have size 2Ω(d0 ) = 2Ω(d/k) .

                       T HEORY OF C OMPUTING, Volume 12 (12), 2016, pp. 1–38                              31
                       N UTAN L IMAYE , G UILLAUME M ALOD , AND S RIKANTH S RINIVASAN

Remark 7.3. We note that a result similar to the one for the permanent proved above can be deduced
from the VNP-completeness of the permanent, which also holds in the non-commutative setting as shown
by Hrubeš, Wigderson, and Yehudayoff [10]. However, by making the reduction explicit we gain slightly
in terms of parameters and additionally, a very similar proof works also for the determinant.


8     Full rank with respect to all partitions
Our lower bound proofs have been based on showing that any arithmetic circuit of non-skew depth at
most k cannot compute a polynomial that has large rank w. r. t. some fixed partition Πk . We can ask if this
strategy can yield lower bounds for general non-commutative arithmetic circuits (i. e., with no restrictions
on non-skew depth) as well. Our aim in this section is to show that the answer to this question is possibly
no: we show that over any sufficiently large field F and any set X of n variables, there is a polynomial
p ∈ FhXi that has non-commutative arithmetic circuits of polynomial size, but which furthermore satisfies
the property that for all partitions Π = (Y, Z) with |Y | ≤ |Z|, rel-rank(p, Π) = 1. This shows that we
cannot even hope to prove that for any polynomial p computed by a polynomial-size non-commutative
circuit, there exists some partition with respect to which p has small rank.
     The proof follows closely a very similar construction due to Raz and Yehudayoff from [19] in the
context of commutative multilinear circuits.

Notation. We first introduce some notation. Given a finite set S of even cardinality, we define an
S-matching to be an unordered partition of S into sets of size two, i. e., M is an S-matching if M ⊆ S2
and the sets in M partition S.
    Fix any degree parameter d ∈ N that is even. For any i, j ∈ [d] with i < j and |[i, j]| = j − i + 1 even,
we define a set Mi, j of [i, j]-matchings as follows. The set Mi, j is defined by induction on |[i, j]|. The
base case is when j = i + 1 and in this case, we set Mi, j = {{i, i + 1}}. In the case that j − i + 1 = 2` for
` > 1, we define the set Mi, j as follows.
              Mi, j = {M ∪ M 0 | M ∈ Mi, j0 , M 0 ∈ M j0 +1, j for some j0 ∈ {i + 1, i + 3, . . . , j − 2}}
                    ∪ {M ∪ {{i, j}} | M ∈ Mi+1, j−1 } .

Now, fix any λe ∈ F for each e ∈ [d]                       [d]
                                      
                                    2 . Given any set M ⊆ 2 , we denote by λM the product ∏e∈M λe .
Finally, we define the polynomial pλ (where λ denotes the tuple (λ1,2 , . . . , λd−1,d )) to be

                                               pλ (X) =      ∑ λM · pM (X)                                               (8.1)
                                                          M∈M1,d

where pM is defined as follows.
                                            pM (X) =              ∑                 x̃w .
                                                        w∈[n]d :wi =w j ∀{i, j}∈M

(Above, x̃w = xw1 · · · xwd as defined in Section 4.1.8 )
    8 The reader may find it instructive to note that each polynomial for which we have proved a lower bound so far has been of

the form pM for some [d]-matching M.


                           T HEORY OF C OMPUTING, Volume 12 (12), 2016, pp. 1–38                                            32
                        L OWER B OUNDS FOR N ON -C OMMUTATIVE S KEW C IRCUITS

    We will show that for any choice of λe (e ∈ [d]
                                                                          λ
                                                    2 ), the polynomial p has a non-commutative circuit of
size poly(n, d). On the other hand, if the field F is large enough, then there exists a choice of λe (e ∈ [d]
                                                                                                             
                                                                                                           2 )
such that for any partition Π = (Y, Z) with |Y | ≤ |Z|, rank(M[pλ , Π]) = n|Y | (i. e., rel-rank(pλ , Π) = 1).
    The first lemma gives us the circuit upper bound.

Lemma 8.1. Fix any field F and d, n ∈ N such that d is even. For any choice of field elements λe ∈ F
(e ∈ [d]
                          λ
      2 ), the polynomial p has a non-commutative arithmetic circuit of size poly(n, d).

Proof. We first define several intermediate polynomials that are computed in the course of computing the
polynomial pλ . For any i, j ∈ [d] such that i < j and ` := j − i + 1 is even, define the polynomial pλi, j to
be
                                       pλi, j (X) = ∑ λM · pM (X)
                                                             M∈Mi, j

where pM , for M ∈ Mi, j is defined as

                                    pM (X) =                        ∑                    x̃w .
                                                  w∈[n]` :ws−(i−1) =wt−(i−1) ∀{s,t}∈M


Note that pλ is the same as pλ1,d . Our circuit for pλ computes pλi, j for each i, j ∈ [d]. The construction is
increasing order of the parameter `.
     When ` = 2 (the smallest value possible), the polynomial is simply pλi,i+1 = λ{i,i+1} ∑x∈X xx, which
can be computed by a circuit of size O(n).
     Now say we have a circuit C of size S that computes pλs,t when t − s + 1 < `. To compute pλi, j where
 j − i + 1 = `, we use the following simple identity, which follows from the definition of Mi, j :
                                                                         !
                      pλi, j =            ∑               pλi, j0 · pλj0 +1, j + λi, j ∑ x · pλi+1, j−2 · x .
                                 j0 ∈{i+1,i+3,..., j−2}                             x∈X


Since each of the polynomials pλi, j0 , pλj0 +1, j , and pλi+1, j−2 have already been computed by the circuit C,
the additional size required to compute pλi, j is O(d + n). We continue this way until we have computed all
the pλi, j .
    The total number of pairs i, j is O(d 2 ) and hence the size of the circuit thus constructed is

                                             O(d 2 (d + n)) = poly(n, d) .

   The second lemma tells us that it suffices to consider only balanced partitions (Y, Z), i. e., partitions
such that |Y | = |Z| = d/2.

Lemma 8.2. Let d ∈ N be even. Let f ∈ FhXi be any homogeneous polynomial of degree d. If there
is a partition Π = (Y, Z) with |Y | ≤ |Z| such that rel-rank( f , Π) < 1, then for any balanced partition
Π0 = (Y 0 , Z 0 ) such that Y 0 ⊇ Y , we have rel-rank( f , Π0 ) < 1.

                       T HEORY OF C OMPUTING, Volume 12 (12), 2016, pp. 1–38                                    33
                     N UTAN L IMAYE , G UILLAUME M ALOD , AND S RIKANTH S RINIVASAN

Proof. Consider the matrix M[ f , Π0 ]. Each row is labelled by a monomial m of degree |Y 0 |, which can be
identified with a pair (m0 , m00 ) where m0 is the natural restriction of m to the locations in Y and m00 is the
restriction to the locations in Y 0 \Y .
      Fix any m00 and consider all the monomials m that give rise to this particular m00 . The resulting
                                         0
matrix has exactly n|Y | rows and n|Z | columns. Each column is labelled by a monomial m000 of degree
|Z 0 | and each row by a monomial m0 of degree |Y |. The (m0 , m000 )-th entry of the the matrix is the
coefficient, in the polynomial f , of the monomial m which equals m0 when restricted to Y , equals m00
when restricted to Y 0 \Y , and equals m000 when restricted to Z 0 . It is not hard to check that this matrix is a
submatrix of the matrix M[ f , Π] (obtained by removing some columns). Since rel-rank( f , Π) < 1, we
have rank(M[ f , Π]) < n|Y | .
      Thus, for any fixed m00 , the rank of the submatrix obtained as above has rank < n|Y | . Since there
         0                                                                     0                    0
are n|Y |−|Y | such matrices, the rank of M[ f , Π0 ] is strictly less than n|Y |−|Y | · n|Y | = n|Y | . Hence, we have
rel-rank( f , Π0 ) < 1.

Lemma 8.3. Let d ∈ N be even and F be any field such that F is either infinite or |F| > d22d . Then
there is a choice of field elements λe ∈ F (e ∈ [d]
                                                      
                                                    2 ) such that for any balanced partition Π, we have
rel-rank(pλ , Π) = 1.
Proof. We fix any finite subset F ⊆ F of size at least d22d +1 and choose each λe (e ∈ [d]
                                                                                          
                                                                                        2 ) independently
and uniformly at random from F. We will show that pλ (X) has the required property with non-zero
probability over the choice of the λe .
    Fix any balanced partition Π = (Y, Z). We say that a [d]-matching M is good for Π if, for each i ∈ Y ,
there is a j ∈ Z such that {i, j} ∈ M.
    We use the following simple fact about the set of matchings M1,d .
Fact 8.4. For any balanced partition Π = (Y, Z), there is a matching M ∈ M1,d that is good for Π.
    By Fact 8.4, there is a matching M0 ∈ M1,d such that M0 is good for Π. It follows then from the
definition of pM0 above that the matrix M[pM0 , Π] is a permutation matrix and hence rank(M[pM0 , Π]) =
nd/2 . We argue that, with high probability over the choice of λ , this is true of the polynomial pλ as well.
    In order to do this, we consider det(M[pλ , Π]). By the definition of pλ , we have

                M[pλ , Π] =     ∑ λN M[pN , Π] = λM M[pM , Π] +
                                                          0       0             ∑         λN M[pN , Π] .
                              N∈M1,d                                      N∈M1,d \{M0 }

    Since M[pN , Π] is a 0-1 matrix for each N, we see that det(M[pλ , Π]) is a polynomial in λe (e ∈ [d]
                                                                                                         
                                                                                                       2 )
of degree at most d2d . We claim that this polynomial is in fact non-zero. To see this, note that if we
substitute λe = 1 for e ∈ M0 and 0 for e 6∈ M0 in the above expression for M[pλ , Π], we obtain the matrix
M[pM0 , Π]; hence, under this substitution, the polynomial det(M[pλ , Π]) takes the value det(M[pM0 , Π])
which is non-zero since M0 is a permutation matrix. We have thus shown that det(M[pλ , Π]) is a non-zero
polynomial in λe (e ∈ [d]
                                                                             d
                        2 ). Since the degree of this polynomial is at most d2 , for λe uniformly randomly
chosen from F, we have by the Schwartz-Zippel lemma [21, 24]
                                                                      d2d   1
                                       Pr[det(M[pλ , Π]) = 0] ≤           < d
                                       λ                              |F|  2

                         T HEORY OF C OMPUTING, Volume 12 (12), 2016, pp. 1–38                                      34
                       L OWER B OUNDS FOR N ON -C OMMUTATIVE S KEW C IRCUITS

                                                 d
since |F| > d22d . Union bounding over the
                                                   
                                                d/2    ≤ 2d choices for Π, we see that with probability
greater than 0 over the choice of λ , we have det(M[pλ , Π]) 6= 0 for each balanced partition Π and hence,
rel-rank(pλ , Π) = 1 for every balanced partition Π.

Theorem 8.5. Let d ∈ N be even and F be any field such that F is either infinite or |F| > d22d . Let
X be any set of n variables. Then there is a homogeneous polynomial p ∈ FhXi of degree d such
that p has a circuit of size poly(n, d) but given any partition Π = (Y, Z) such that |Y | ≤ |Z|, we have
rel-rank(p, Π) = 1.
Proof. Follows directly from Lemmas 8.1, 8.2, and 8.3.


References
 [1] E RIC A LLENDER , J IA J IAO , M EENA M AHAJAN , AND V. V INAY: Non-commutative arithmetic
     circuits: Depth reduction and size lower bounds. Theoret. Comput. Sci., 209(1-2):47–86, 1998.
     Preliminary versions in DIMACS, FSTTCS’94 and STOC’93. [doi:10.1016/S0304-3975(97)00227-
     2] 3
 [2] A LEXANDER BARVINOK: New permanent estimators via non-commutative determinants, 2000.
     [arXiv:math/0007153] 2
 [3] R ICHARD B EIGEL: When do extra majority gates help? Polylog(n) majority gates are equiv-
     alent to one. Comput. Complexity, 4(4):314–324, 1994. Preliminary version in STOC’92.
     [doi:10.1007/BF01263420] 4
 [4] P ETER B ÜRGISSER , J OSEPH M. L ANDSBERG , L AURENT M ANIVEL , AND J ERZY W EYMAN: An
     overview of mathematical issues arising in the geometric complexity theory approach to VP 6= VNP.
     SIAM J. Comput., 40(4):1179–1209, 2011. [doi:10.1137/090765328, arXiv:0907.2850] 2
 [5] A RKADEV C HATTOPADHYAY AND K RISTOFFER A RNSFELT H ANSEN: Lower bounds for cir-
     cuits with few modular and symmetric gates. In Proc. 32nd Internat. Colloq. on Automata, Lan-
     guages and Programming (ICALP’05), volume 3580 of LNCS, pp. 994–1005. Springer, 2005.
     [doi:10.1007/11523468_80] 4
 [6] X I C HEN , N EERAJ K AYAL , AND AVI W IGDERSON: Partial derivatives in arithmetic complexity
     and beyond. Found. Trends Theoret. Comput. Sci., 6(1-2):1–138, 2011. [doi:10.1561/0400000043]
     2
 [7] S TEVE C HIEN , L ARS E ILSTRUP R ASMUSSEN , AND A LISTAIR S INCLAIR: Clifford algebras and
     approximating the permanent. J. Comput. System Sci., 67(2):263–290, 2003. Preliminary version in
     STOC’02. [doi:10.1016/S0022-0000(03)00010-2] 2
 [8] Z EEV DVIR , G UILLAUME M ALOD , S YLVAIN P ERIFEL , AND A MIR Y EHUDAYOFF: Separating
     multilinear branching programs and formulas. In Proc. 44th STOC, pp. 615–624. ACM Press, 2012.
     Preliminary version in ECCC. [doi:10.1145/2213977.2214034] 10

                      T HEORY OF C OMPUTING, Volume 12 (12), 2016, pp. 1–38                            35
                 N UTAN L IMAYE , G UILLAUME M ALOD , AND S RIKANTH S RINIVASAN

 [9] PAVEL H RUBEŠ , AVI W IGDERSON , AND A MIR Y EHUDAYOFF: Non-commutative circuits and
     the sum-of-squares problem. J. Amer. Math. Soc., 24(3):871–898, 2011. Preliminary version in
     STOC’10. [doi:10.1090/S0894-0347-2011-00694-2] 3, 6, 7, 11, 13, 15

[10] PAVEL H RUBEŠ , AVI W IGDERSON , AND A MIR Y EHUDAYOFF: Relationless completeness and
     separations. In Proc. 25th IEEE Conf. on Computational Complexity (CCC’10), pp. 280–290. IEEE
     Comp. Soc. Press, 2010. [doi:10.1109/CCC.2010.34] 32

[11] L AURENT H YAFIL: The power of commutativity. In Proc. 18th FOCS, pp. 171–174. IEEE Comp.
     Soc. Press, 1977. [doi:10.1109/SFCS.1977.31] 2, 5

[12] S HACHAR L OVETT AND S RIKANTH S RINIVASAN: Correlation bounds for poly-size AC0 circuits
     with n1−o(1) symmetric gates. In Proc. 15th Internat. Workshop on Randomization and Computation
     (RANDOM’11), pp. 640–651. Springer, 2011. [doi:10.1007/978-3-642-22935-0_54] 4

[13] M EENA M AHAJAN AND B. V. R AGHAVENDRA R AO: Small space analogues of Valiant’s classes
     and the limitations of skew formulas. Comput. Complexity, 22(1):1–38, 2013. Preliminary version
     in DROPS. [doi:10.1007/s00037-011-0024-2] 3

[14] G UILLAUME M ALOD AND NATACHA P ORTIER: Characterizing Valiant’s algebraic com-
     plexity classes. J. Complexity, 24(1):16–38, 2008. Preliminary version in MFCS’06.
     [doi:10.1016/j.jco.2006.09.006] 3, 13

[15] N OAM N ISAN: Lower bounds for non-commutative computation (extended abstract). In Proc. 23rd
     STOC, pp. 410–418. ACM Press, 1991. [doi:10.1145/103418.103462] 2, 3, 4, 9, 12, 13, 30

[16] N OAM N ISAN AND AVI W IGDERSON: Lower bounds on arithmetic circuits via partial
     derivatives. Comput. Complexity, 6(3):217–234, 1996. Preliminary version in FOCS’95.
     [doi:10.1007/BF01294256] 10

[17] R AN R AZ: Multi-linear formulas for permanent and determinant are of super-polynomial
     size.    J. ACM, 56(2):8:1–8:17, 2009.   Preliminary versions in STOC’04 and ECCC.
     [doi:10.1145/1502793.1502797] 10

[18] R AN R AZ , A MIR S HPILKA , AND A MIR Y EHUDAYOFF: A lower bound for the size of syntactically
     multilinear arithmetic circuits. SIAM J. Comput., 38(4):1624–1647, 2008. Preliminary version in
     FOCS’07. [doi:10.1137/070707932] 10

[19] R AN R AZ AND A MIR Y EHUDAYOFF: Balancing syntactically multilinear arithmetic circuits.
     Comput. Complexity, 17(4):515–535, 2008. [doi:10.1007/s00037-008-0254-0] 10, 32

[20] R AN R AZ AND A MIR Y EHUDAYOFF: Lower bounds and separations for constant depth mul-
     tilinear circuits. Comput. Complexity, 18(2):171–207, 2009. Preliminary version in CCC’08.
     [doi:10.1007/s00037-009-0270-8] 10

[21] JACOB T. S CHWARTZ: Fast probabilistic algorithms for verification of polynomial identities. J.
     ACM, 27(4):701–717, 1980. [doi:10.1145/322217.322225] 34

                     T HEORY OF C OMPUTING, Volume 12 (12), 2016, pp. 1–38                       36
                      L OWER B OUNDS FOR N ON -C OMMUTATIVE S KEW C IRCUITS

[22] A MIR S HPILKA AND A MIR Y EHUDAYOFF: Arithmetic circuits: A survey of recent results and open
     questions. Found. Trends Theoret. Comput. Sci., 5(3-4):207–388, 2010. [doi:10.1561/0400000039]
     2

[23] S EINOSUKE T ODA: Classes of arithmetic circuits capturing the complexity of computing the
     determinant. IEICE Trans. Inf. Systems, E75-D(1):116–124, 1992. IEICE. 3

[24] R ICHARD E. Z IPPEL: Probabilistic algorithms for sparse polynomials. In Proc. Symbolic and
     Algebraic Comput. (EUROSAM’79), volume 72 of LNCS, pp. 216–226, 1979. Available from
     Springer. 34


AUTHORS

     Nutan Limaye
     Assistant professor
     Department of Computer Science and Engineering
     Indian Institute of Technology, Bombay, India


     Guillaume Malod
     Associate professor
     Université Paris Diderot, France


     Srikanth Srinivasan
     Assistant professor
     Department of Mathematics
     Indian Institute of Technology, Bombay, India


ABOUT THE AUTHORS

      N UTAN L IMAYE graduated from The Institute of Mathematical Sciences, Chennai, India,
         in 2009; her advisor was Meena Mahajan. Her thesis focused on the interconnection
         between language classes and complexity classes. She is interested in Boolean and
         arithmetic circuit complexity and graph algorithms. She likes all things Japanese and has
         been trying to learn Japanese for the last year.




                     T HEORY OF C OMPUTING, Volume 12 (12), 2016, pp. 1–38                           37
            N UTAN L IMAYE , G UILLAUME M ALOD , AND S RIKANTH S RINIVASAN

G UILLAUME M ALOD received his Ph. D. from Université Claude Bernard Lyon 1 in 2003;
   his advisor was Bruno Poizat, whose writing style he admires. His thesis focused on
   arithmetic circuits and coefficient functions and his scientific interests have not changed
   much since. He lives in Paris with his wife Asako and children Naoto and Miyuki, except
   when they enjoy Japan’s hot and humid summer. He likes standing motionless or moving
   very slowly and cooking. He likes to fall asleep while listening to Anima.


S RIKANTH S RINIVASAN got his undergraduate degree from the Indian Institute of Technol-
    ogy Madras, where his interest in the theory side of CS was piqued under the tutelage
    of N. S. Narayanswamy. Subsequently, he obtained his Ph. D. from The Institute of
    Mathematical Sciences in 2011; his advisor was V. Arvind. His research interests span
    all of TCS (in theory), but in practice are limited to circuit complexity, derandomization,
    and related areas of mathematics. He enjoys running and pretending to play badminton.




                T HEORY OF C OMPUTING, Volume 12 (12), 2016, pp. 1–38                             38