Authors Vikraman Arvind, Pushkar S. Joglekar, Partha Mukhopadhyay, S. Raja,
License CC-BY-3.0
T HEORY OF C OMPUTING, Volume 15 (7), 2019, pp. 1–36
www.theoryofcomputing.org
Randomized Polynomial-Time Identity
Testing for Noncommutative Circuits
Vikraman Arvind Pushkar S. Joglekar Partha Mukhopadhyay
S. Raja∗
Received August 13, 2017; Revised October 4, 2018; Published October 13, 2019
Abstract: In this paper we show that black-box polynomial identity testing (PIT) for n-
variate noncommutative polynomials f of degree D with at most t nonzero monomials can be
done in randomized poly(n, log t, log D) time, and consequently in randomized poly(n, log t, s)
time if f is computable by a circuit of size s. This result makes progress on a question that
has been open for over a decade. Our algorithm is based on efficiently isolating a monomial
using nondeterministic automata.
The above result does not yield an efficient randomized PIT for noncommutative circuits
in general, as noncommutative circuits of size s can compute polynomials with a double-
exponential (in s) number of monomials. As a first step, we consider a natural class of
homogeneous noncommutative circuits, that we call +-regular circuits, and give a white-box
polynomial-time deterministic PIT for them. These circuits can compute noncommutative
polynomials with number of monomials double-exponential in the circuit size. Our algorithm
combines some new structural results for +-regular circuits with known PIT results for
noncommutative algebraic branching programs, a rank bound for commutative depth-3
A preliminary version of this paper appeared in the Proceedings of the 49th ACM Symp. on Theory of Computing
(STOC’17) [4].
∗ Part of the work was done while the author was a postdoctoral fellow at Chennai Mathematical Institute, India.
ACM Classification: F.1.2
AMS Classification: 68W20
Key words and phrases: algebraic complexity, non-commutative computation, polynomial identity
testing, randomized algorithm, Polynomial Identity Lemma
© 2019 Vikraman Arvind, Pushkar S. Joglekar, Partha Mukhopadhyay and S. Raja
Licensed under a Creative Commons Attribution License (CC-BY). DOI: 10.4086/toc.2019.v015a007
identities, and an equivalence testing problem for words. Finally, we solve the black-box
PIT problem for depth-3 +-regular circuits in randomized polynomial time. In particular, we
show if f is a nonzero noncommutative polynomial in n variables over the field F computed
by a depth-3 +-regular circuit of size s, then f cannot be a polynomial identity for the matrix
algebra Ms (F) when F is sufficiently large depending on the degree of f .
1 Introduction
Noncommutative computation, introduced in complexity theory by Hyafil [12] and Nisan [19], is a central
field of algebraic complexity theory. The main algebraic structure of interest is the free noncommutative
ring F⟨Z⟩ over a field F, where Z = {z1, z2, . . . , zn} and the zi, 1 ≤ i ≤ n, are free noncommuting variables.¹
A fundamental problem in the subject is designing efficient algorithms for noncommutative Polyno-
mial Identity Testing. The problem can be stated as follows:
Let f ∈ F⟨Z⟩ be a polynomial represented by a noncommutative arithmetic circuit C. The polynomial
f can be either given by a black-box for C (using which we can evaluate the polynomial f on matrices
with entries from F or an extension field), or the circuit C may be explicitly given. The algorithmic
problem is to check if the polynomial computed by C is identically zero.
We recall the formal definition of a noncommutative arithmetic circuit.
Definition 1.1. A noncommutative arithmetic circuit C over a field F and indeterminates z1 , z2 , . . . , zn is
a directed acyclic graph (DAG) with each node of indegree zero labeled by a variable or a scalar constant
from F: the indegree 0 nodes are the input nodes of the circuit. Each internal node of the DAG is of
indegree two and is labeled by either a + or a × (indicating that it is a plus gate or multiplication gate,
respectively). Furthermore, the two inputs to each × gate are designated as left and right inputs which is
the order in which the gate multiplication is done. A gate of C is designated as output. Each internal gate
computes a polynomial (by adding or multiplying its input polynomials), where the polynomial computed
at an input node is just its label. The polynomial computed by the circuit is the polynomial computed at
its output gate. An arithmetic circuit is a formula if the fan-out of every gate is at most one.
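To make the definition concrete, the following is a minimal sketch (ours, not from the paper) of evaluating such a circuit on matrix inputs, which is exactly the operation the black-box model above requires; the gate encoding and the name `eval_circuit` are our own illustrative choices.

```python
import numpy as np

# A circuit is a list of gates in topological order.  Gate forms:
# ('in', M) for an input labeled by a matrix (a variable's value, or a
# scalar embedded as a multiple of the identity), ('+', i, j), and
# ('*', i, j); for '*' the operand order (left, right) matters, since
# matrix multiplication is noncommutative.
def eval_circuit(gates):
    vals = []
    for g in gates:
        if g[0] == 'in':
            vals.append(np.asarray(g[1], dtype=float))
        elif g[0] == '+':
            vals.append(vals[g[1]] + vals[g[2]])
        else:  # '*': left input times right input, in that order
            vals.append(vals[g[1]] @ vals[g[2]])
    return vals[-1]        # the designated output gate is the last one

# The commutator x*y - y*x is zero commutatively but not noncommutatively;
# on these 2x2 matrices it evaluates to a nonzero matrix.
X = [[0, 1], [0, 0]]
Y = [[0, 0], [1, 0]]
gates = [('in', X), ('in', Y),
         ('*', 0, 1),                 # gate 2: X*Y
         ('*', 1, 0),                 # gate 3: Y*X
         ('in', [[-1, 0], [0, -1]]),  # gate 4: the scalar -1 as -I
         ('*', 4, 3),                 # gate 5: -(Y*X)
         ('+', 2, 5)]                 # gate 6 (output): X*Y - Y*X
```

Evaluating the circuit on these matrices returns the nonzero matrix X·Y − Y·X, illustrating why matrix substitutions can certify that a noncommutative polynomial is nonzero.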
Notice that if the size of circuit C is s then the degree of the polynomial computed by C can be 2^s.
Bogdanov and Wee [8] have shown a randomized polynomial-time algorithm when the degree of the
noncommutative circuit C is polynomially bounded in s and n. Their algorithm is based on a classical
result of Amitsur-Levitzki [3] stated below.
Theorem 1.2 (Amitsur-Levitzki Theorem). For any field F, a nonzero noncommutative polynomial P ∈ F⟨Z⟩ of degree at most 2d − 1 is not a polynomial identity for the matrix algebra Md(F). That is to say, P does not vanish on all d × d matrices over F.
Remark 1.3. There is a second part to the Amitsur-Levitzki theorem which states that Md(F) has degree-2d identities. In particular, the standard polynomial S_{2d}(x1, . . . , x_{2d}) = ∑_{σ∈S_{2d}} sgn(σ) x_{σ(1)} · · · x_{σ(2d)} is a minimal identity for Md(F).
¹The zi are also called indeterminates. We shall be using both terms interchangeably.
Bogdanov and Wee’s randomized PIT algorithm [8] applies the above theorem to obtain a randomized
PIT as follows: Let C(z1, z2, . . . , zn) be a circuit of syntactic degree bounded by 2d − 1. For each i ∈ [n], substitute the variable zi by a d × d matrix Mi of commuting indeterminates; more precisely, the (ℓ, k)-th entry of Mi is z^(i)_{ℓ,k}, where 1 ≤ ℓ, k ≤ d. By Theorem 1.2, the matrix M_f = f(M1, M2, . . . , Mn) is not identically zero. Hence, M_f has some entry (ℓ′, k′) holding a nonzero commutative polynomial g_{ℓ′,k′} over the variables {z^(i)_{ℓ,k} : 1 ≤ i ≤ n, 1 ≤ ℓ, k ≤ d}. Notice that the degree of g_{ℓ′,k′} is at most 2d − 1. If we do random substitutions from an extension field of F of size at least 4d, then we get a
most 2d − 1. If we do random substitutions from an extension field of F of size at least 4d, then we get a
randomized polynomial identity testing algorithm, with error probability at most 1/2, by the Polynomial
Identity Lemma² [20, 26, 18, 9, 27, 28].
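The trial just described can be sketched as follows. This is our illustration of the matrix-substitution idea, not the authors' code: the prime `P`, the function `pit_trial`, and the example polynomials are assumptions for the demo, and we substitute random entries directly rather than first forming the symbolic matrix M_f.

```python
import numpy as np

# One randomized trial: evaluate the black-box polynomial on random d x d
# matrices with entries from a large prime field.  If the syntactic degree
# is at most 2d - 1, Theorem 1.2 plus the Polynomial Identity Lemma give
# error probability at most 1/2 per trial for a sample set of size >= 4d.
P = 1_000_003          # an assumed prime, far larger than 4d here

def pit_trial(f, n_vars, d, rng):
    # f: callable taking a list of n_vars matrices and returning a matrix;
    # all arithmetic is taken mod P.  True means f is certainly nonzero.
    mats = [rng.integers(0, P, size=(d, d)) for _ in range(n_vars)]
    return bool(np.any(f(mats) % P != 0))

rng = np.random.default_rng(7)
commutator = lambda m: (m[0] @ m[1] - m[1] @ m[0]) % P   # nonzero, degree 2
zero_poly = lambda m: (m[0] @ m[1] - m[0] @ m[1]) % P    # identically zero
```

A `True` answer is a certificate of nonzeroness; a `False` answer is only probably correct, so independent trials drive the error down exponentially.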
The problem with this approach for general noncommutative circuits (whose degree can be 2s ) is that
the dimension of the matrices grows linearly with the degree of the polynomial. Therefore, this approach
only yields a randomized exponential-time algorithm for the problem. Finding an efficient randomized
identity test for general noncommutative circuits is a well-known open problem, as mentioned in a recent
workshop on algebraic complexity theory (WACT 2016).
We also recall the definition of noncommutative algebraic branching programs [23].
Definition 1.4. A homogeneous noncommutative algebraic branching program ABP is a layered DAG
with one in-degree zero source node s, and one out-degree zero sink node t. The vertices are partitioned
into layers 0, 1, . . . , d (source at layer 0 and sink at layer d). Edges go only from layer i to layer i + 1 for
each i ≤ d − 1. Each edge is labeled with a homogeneous linear form in the noncommuting variables
{xi | 1 ≤ i ≤ n}. For each s-to-t directed path γ = e1 , e2 , . . . , ed , let fγ = `1 · `2 · · · `d be the product of
the linear forms `i labeling the edges in that order. The ABP computes the homogeneous degree d
noncommutative polynomial f = ∑γ fγ ∈ FhXi, where the sum is over all directed paths γ from s to t.
We note that Raz and Shpilka [23] have shown a white-box deterministic polynomial-time PIT for
noncommutative ABPs. Forbes-Shpilka [11] and Agrawal et al. [1] have given a quasi-polynomial-time
black-box algorithm for small degree noncommutative ABPs.
2 Main results
The main result of the paper is the following theorem that we show about noncommutative identities
which is of independent mathematical interest.
Theorem 2.1. Let F be a field of size more than (n + 2)d. Let f ∈ F⟨z1, z2, . . . , zn⟩ be a nonzero polynomial of degree d with t nonzero monomials. Then f cannot be a polynomial identity for the matrix ring Mk(F) for any k ≥ ⌈log t⌉ + 1.
The above theorem yields a randomized PIT for black-box noncommutative polynomials. To see this, suppose f ∈ F⟨z1, z2, . . . , zn⟩ is a nonzero polynomial of degree d with t nonzero monomials. We can assume F is of size more than (n + 2)d (if required, we take a suitable extension field). If f ≢ 0 then, by Theorem 2.1, the polynomial f does not vanish if we substitute for each zi a (⌈log t⌉ + 1) × (⌈log t⌉ + 1) matrix
²The Polynomial Identity Lemma is widely known as the DeMillo-Lipton-Schwartz-Zippel Lemma. We have given it an alternative name for reasons explained in Section 3.1, where the lemma is also formally stated as Lemma 3.3.
of distinct commuting indeterminates. Indeed, f will evaluate to a nonzero (⌈log t⌉ + 1) × (⌈log t⌉ + 1) matrix whose entries are polynomials of degree at most d in the commuting variables. For any nonzero entry of this matrix, by the Polynomial Identity Lemma (Lemma 3.3) for commutative polynomials, a random substitution from F (or a suitable extension field) for the commuting variables yields a nonzero value with high probability.
Corollary 2.2. Let f ∈ F⟨z1, z2, . . . , zn⟩ be a noncommutative polynomial of degree d and sparsity t given by a black-box, where the black-box can be evaluated on matrices over F or an extension field. Then there is a randomized algorithm to check whether f is the identically zero polynomial that runs in time poly(log d, n, log t).
In particular, suppose C is a noncommutative arithmetic circuit, given by black-box access, of size s computing a polynomial in n variables of sparsity t. There is a randomized algorithm that checks whether C ≡ 0 in time poly(s, n, log t).
The second part of the corollary follows because C computes a polynomial of degree bounded by 2^s.
Remark 2.3.
1. It is interesting to compare Theorem 2.1 with the classical Amitsur-Levitzki theorem. Our result brings out the importance of the number of monomials in a polynomial identity for d × d matrices. It implies that any polynomial identity f for d × d matrices over a field F of size more than deg f must have more than 2^{d−1} monomials.
2. We also note that the dimension k of the matrix ring Mk(F) in Theorem 2.1 is nearly optimal up to a logarithmic factor. In fact, the second part of the Amitsur-Levitzki theorem states that the standard polynomial S_{2d}(x1, . . . , x_{2d}) is a minimal identity for Md(F). Notice that the number of monomials in the standard polynomial is 2^{O(d log d)}.
In general, a noncommutative circuit of size s can compute a polynomial with 2^{2^{Ω(s)}} monomials. For example, the polynomial f(x, y) = (x + y)^{2^s} has a noncommutative circuit of size O(s), but its number of monomials is 2^{2^s}. We consider identity testing for a subclass of homogeneous noncommutative circuits, that we call +-regular circuits. These are syntactically homogeneous circuits whose + gates can be partitioned into layers such that the following holds:
(i) There are no directed paths between the + gates within a layer.
(ii) All + gates in a layer have the same syntactic degree.
(iii) The output gate is a + gate.
(iv) All input to output paths go through exactly one + gate in each layer.
The +-depth of a +-regular circuit is the number of +-layers in it.
A couple of remarks about +-regular circuits are in order.
Remark 2.4.
• We note that the computational power of noncommutative +-regular circuits is quite limited. We
can easily adapt Nisan’s rank-based argument [19] to show that the noncommutative permanent
cannot be computed by polynomial-size +-regular circuits. Such a result is not known for general
noncommutative circuits.
• However, polynomial-size +-regular circuits of +-depth 2 can compute polynomials of exponential
degree and a double-exponential number of monomials. Such polynomials cannot be computed by
bounded-depth noncommutative circuits.
• As is evident from the example (x + y)^{2^s}, +-regular circuits of size s can compute polynomials of degree 2^s with 2^{2^{Ω(s)}} monomials. Nevertheless, we are able to exploit the circuit's structure to give a deterministic polynomial-time identity testing algorithm for such circuits.
Theorem 2.5. Let C be a noncommutative +-regular circuit of size s, given as a white-box, computing a polynomial in F⟨X⟩. There is a deterministic polynomial-time algorithm that tests whether C computes the identically zero polynomial.
Finally, we give a randomized polynomial identity test for +-regular circuits of +-depth 2 in the
black-box model. We denote such circuits by ΣΠ∗ Σ.
Remark 2.6. We use this notation because the polynomials computed by ΣΠ∗Σ circuits are sums of products of linear forms, like those computed by the well-studied ΣΠΣ circuits. Our notation also brings out their +-regular structure:
there are two +-layers. The top + gate is the output gate and the bottom +-layer consists of gates
computing homogeneous linear forms. The Π∗ indicates that between the two +-layers we can have
several × gates. However, +-regularity guarantees that all inputs to the top + gate have the same syntactic
degree.
Theorem 2.7. Let F be a field of size more than D. Let f(x1, . . . , xn) ∈ F⟨X⟩ be a nonzero homogeneous polynomial of degree D computed by a ΣΠ∗Σ circuit whose top + gate has fan-in s and whose product gates have fan-in D. Then f cannot be a polynomial identity for the matrix ring Ms(F).
The black-box randomized polynomial identity test for ΣΠ∗ Σ arithmetic circuits is an immediate
corollary.
Corollary 2.8. Let C be a depth-three +-regular circuit of size s computing a polynomial f(x1, . . . , xn) ∈ F⟨X⟩, where the circuit C is given only by black-box access. Then there is a randomized algorithm that checks whether f is identically zero, and the algorithm runs in time poly(s, n).
2.1 Outline of the proofs
In this section, we give an informal description of the proofs of Theorem 2.1, Theorem 2.5, and Theorem 2.7.
Black-box algorithm for noncommutative polynomials of exponential sparsity
We first describe the basic steps required for the proof of Theorem 2.1. Since we are working in the free noncommutative ring F⟨z1, z2, . . . , zn⟩, notice that monomials are free words over the alphabet
{z1, z2, . . . , zn}, and the polynomial f is an F-linear combination of monomials. Degree-d monomials are elements of {z1, z2, . . . , zn}^d. Let m be a degree-d monomial. We use the notation m[i] to denote the i-th variable in the monomial. More precisely, if m = z_{j1} z_{j2} · · · z_{jd} then m[i] = z_{ji}.
Converting to a bivariate polynomial
It is convenient to convert, by a simple encoding trick, the given noncommutative polynomial into a noncommutative polynomial in F⟨x0, x1⟩, where x0 and x1 are two noncommuting variables. Let

f = ∑_{i=1}^{t} ci wi

with ci ∈ F, where the wi are monomials in the variables {z1, z2, . . . , zn}. We encode each variable zi using the bivariate substitution zi ↦ x0 x1^i x0, for each i ∈ [n]. Thus, each monomial wi is uniquely encoded as a monomial ŵi in the two variables {x0, x1}, where ŵi is obtained from wi by applying the bivariate substitution map to each variable. Let the resulting polynomial be f′(x0, x1) ∈ F⟨x0, x1⟩. Since this encoding of monomials is bijective, the following claim clearly holds.
Proposition 2.9. For any polynomial f ∈ F⟨z1, z2, . . . , zn⟩ of degree bounded by d, the corresponding bivariate noncommutative polynomial f′(x0, x1), of degree at most (n + 2)d, is nonzero if and only if f is nonzero. Furthermore, if f is given by black-box access for evaluation on matrices, we can efficiently create from it black-box access for f′.
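A short sketch (ours) of the encoding map and its inverse, with monomials represented as lists of variable indices; the helper names `encode` and `decode` are hypothetical.

```python
# Each variable z_i is mapped to the word x0 x1^i x0, so a monomial over
# z_1, ..., z_n becomes a word over the two letters {x0, x1}.  A monomial
# is a list of variable indices; a word is a list of 'x0'/'x1' strings.
def encode(monomial):
    word = []
    for i in monomial:                  # z_i -> x0 x1^i x0
        word += ['x0'] + ['x1'] * i + ['x0']
    return word

def decode(word):
    # Inverse map; assumes the word is a valid encoding.
    mono, j = [], 0
    while j < len(word):
        assert word[j] == 'x0'
        k = j + 1
        while word[k] == 'x1':
            k += 1                      # count the run of x1's
        assert word[k] == 'x0'
        mono.append(k - j - 1)          # the run length is the variable index
        j = k + 1
    return mono
```

Each zi contributes a block of length i + 2 ≤ n + 2, so a degree-d monomial maps to a word of length at most (n + 2)d, matching the degree bound in the proposition; `decode` witnesses that the encoding is injective.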
The following definition is crucial for the main result in the paper.
Definition 2.10. Let M ⊆ {x0, x1}^D be a subset of degree-D monomials over the noncommuting variables {x0, x1}. A subset of indices I ⊆ [D] is said to be an isolating index set for M if there is a monomial m ∈ M such that for each m′ ∈ M with m′ ≠ m, there is some index i ∈ I for which m[i] ≠ m′[i]. In other words, no other monomial in M agrees with the monomial m on all positions in the index set I.
The following lemma says that every finite set of monomials M ⊆ {x0, x1}^D has an isolating index set of size at most log |M|. The proof is a simple halving argument.
Lemma 2.11. Let M ⊆ {x0, x1}^D be a finite set of degree-D monomials over the variables {x0, x1}. Then M has an isolating index set of size k ≤ log |M|.
Proof. The monomials m ∈ M are seen as indexed from left to right, where m[i] denotes the variable in
the i-th position of m. Let i1 ≤ D be the first index such that not all monomials agree on the i1 -th position.
Let
S0 = {m : m[i1 ] = x0 },
S1 = {m : m[i1 ] = x1 }.
At least one of S0 and S1 has size at most |M|/2; let S_{b1}, b1 ∈ {0, 1}, denote such a subset. We replace the monomial set M by S_{b1} and repeat the same argument for at most log |M| steps. Clearly, by this process we identify a set of indices I = {i1, . . . , ik}, k ≤ log |M|, such that the set shrinks to a singleton {m}. Clearly, I is an isolating index set, as witnessed by the isolated monomial m.
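The halving argument can be transcribed directly; the following is our sketch, with monomials written as strings over {'0', '1'} standing for {x0, x1}, and `isolating_index_set` a hypothetical helper name.

```python
# Repeatedly take the first position where the surviving monomials
# disagree, and keep the half of size at most |M|/2; each step records
# one index and at least halves the set, so at most log2 |M| indices
# are collected before a single monomial remains.
def isolating_index_set(monomials):
    M = set(monomials)
    I = []
    while len(M) > 1:
        D = len(next(iter(M)))
        i = next(p for p in range(D) if len({m[p] for m in M}) > 1)
        S0 = {m for m in M if m[i] == '0'}
        S1 = M - S0
        M = S0 if len(S0) <= len(S1) else S1    # keep the smaller half
        I.append(i)
    return I, M.pop()   # the index set and the isolated monomial
```

On input {0011, 0101, 0110, 1111} the returned index set has size at most log2 4 = 2, and the isolated monomial disagrees with every other monomial at some position of the index set.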
Remark 2.12. Notice that the size k of the isolating index set is bounded by log t as well as by the degree D of the polynomial f(x0, x1).
NFA construction
An earlier paper [6] on sparse polynomial identity testing used a deterministic finite automaton to isolate a monomial, by designing an automaton that accepts a unique monomial. That approach does not work in its current form for the proof of Theorem 2.1, because the number of states such a deterministic automaton requires is the length of the monomial, which could be exponentially large. It
turns out that we can use a small nondeterministic finite automaton which will guess the isolating index
set for the set of nonzero monomials of f . The complication is that there are exponentially many wrong
guesses. However, it turns out that if we make our NFA a substitution automaton, we can ensure that
the monomials computed on different nondeterministic paths (which correspond to different guesses of
the isolating index set) all have disjoint support. Once we have this property, it is easy to argue that for
the correct nondeterministic path, the computed commutative polynomial is nonvanishing (because the
isolated monomial cannot be canceled). With this intuition, we proceed with the simple technical details
in Section 4.
White-box algorithm for +-regular circuits
Now we informally describe the proof of Theorem 2.5. We note a crucial observation: Let T (z1 , . . . , zs )
be a homogeneous noncommutative polynomial of degree d. Let R1, . . . , Rs be homogeneous noncommutative polynomials, each of degree d′. Consider any maximal F-linearly independent subset of the polynomials R1, . . . , Rs; say it is R1, . . . , Rk. We can express Rj = ∑_{i=1}^{k} α_{ji} Ri for k + 1 ≤ j ≤ s, where the α_{ji} ∈ F. Then, it turns out that T(R1, . . . , Rk, ∑_{i=1}^{k} α_{k+1,i} Ri, . . . , ∑_{i=1}^{k} α_{s,i} Ri) = 0 if and only if T(y1, . . . , yk, ∑_{i=1}^{k} α_{k+1,i} yi, . . . , ∑_{i=1}^{k} α_{s,i} yi) = 0, where y1, . . . , yk are fresh noncommuting variables. As a
consequence, it turns out that for a deterministic polynomial-time white-box identity testing for +-regular
circuits, it suffices to solve the following computational problem:
Let P1, . . . , P_ℓ ∈ F⟨X⟩ be products of homogeneous linear forms given by multiplicative circuits of size s. The degrees of the polynomials Pi could be exponential in s. The task is to find a maximal F-linearly independent subset of these polynomials and to express the others as linear combinations of the independent ones. We solve this problem in deterministic polynomial time. We prove that it suffices to replace each Pi by a polynomial P̃i obtained from Pi by retaining, in the product, only the linear forms that appear in at most (roughly) ℓ^5 locations. This is shown using a rank bound for commutative depth-three identities [25]. We also require algorithms over words [16, 21, 17] to efficiently find the linear forms appearing in those ℓ^5 locations. Since the P̃i, 1 ≤ i ≤ ℓ, have small degree, we are in the usual regime of low-degree noncommutative polynomials, and can adapt the noncommutative ABP identity testing [23] to solve the linear independence testing problem.
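For intuition, here is a hedged sketch of the generic linear-independence subroutine in the easy case where the polynomials are given explicitly as coefficient vectors; the paper's actual algorithm must instead handle implicitly given products of exponential degree, so `max_independent` below only illustrates what is being computed.

```python
import numpy as np

# Incremental elimination: scan the vectors in order; a vector that lies
# in the span of the current basis is recorded together with its expansion
# coefficients, otherwise it joins the basis.
def max_independent(vectors, tol=1e-9):
    basis, idx, deps = [], [], {}
    for j, v in enumerate(vectors):
        v = np.asarray(v, dtype=float)
        if basis:
            B = np.column_stack(basis)
            c = np.linalg.lstsq(B, v, rcond=None)[0]   # best coefficients
            if np.allclose(B @ c, v, atol=tol):
                deps[j] = c        # v = sum_i c[i] * basis[i]
                continue
        basis.append(v)
        idx.append(j)
    return idx, deps               # independent indices, dependency coeffs
```

For example, on [1, 0], [0, 1], [1, 1] the first two vectors form the basis and the third is expressed as their sum, which is exactly the kind of certificate the white-box algorithm needs for the Pi.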
Black-box algorithm for depth-three +-regular circuits
We now outline the proof of Theorem 2.7. It is similar to the proof of Theorem 2.1. The main idea is the following. Suppose P1, P2, . . . , Ps are D-products of homogeneous linear forms in F⟨X⟩. Consider
any F-linear combination ∑_{i=1}^{s} βi Pi where, without loss of generality, βi ∈ F \ {0} for all i. Then there exists a set of locations I ⊆ [D] with |I| ≤ s − 1 with the following property: consider the polynomials P_{i,I} obtained from the Pi by treating only the variables appearing in the positions in I as noncommuting, and the rest as commuting. Then

∑_{i=1}^{s} βi Pi = 0 if and only if ∑_{i=1}^{s} βi P_{i,I} = 0.
Now, we can design small nondeterministic substitution automata that guess the locations in I. The rest
of the proof is similar to the proof of Theorem 2.1.
2.2 Organization
The paper is organized as follows. In Section 3, we give some simple properties of noncommutative
polynomials. In Section 4 we prove Theorem 2.1. In Section 5 we prove Theorem 2.5. The proof of
Theorem 2.7 is in Section 6.
3 Preliminaries
In this section we state a few simple properties of noncommutative polynomials useful for our proofs. Let A ∈ F^{n×n} be an invertible matrix, and let X = {x1, x2, . . . , xn} be a set of n noncommuting variables. The homogeneous F-linear forms ∑_{i=1}^{n} ui xi, with ui ∈ F, form an n-dimensional vector space over F, and we can identify a vector u ∈ F^n with the homogeneous linear form ∑_{i=1}^{n} ui xi. Thus, we can think of A as an invertible linear transformation on homogeneous linear forms: it maps xi to ∑_{ℓ=1}^{n} A_{ℓi} xℓ. Let f(x1, x2, . . . , xn) ∈ F⟨X⟩ be a homogeneous degree-d noncommutative polynomial. Let

f = ∑_{w∈X^d} fw · w,

where fw ∈ F is the coefficient of the monomial w in f. Let

w = x_{i1} x_{i2} · · · x_{id}.

We can apply the linear transformation A to the j-th position of the monomial w by replacing x_{ij} with the linear form ∑_{ℓ=1}^{n} A_{ℓ i_j} xℓ, obtaining the polynomial

A_j(w) = x_{i1} x_{i2} · · · x_{i_{j−1}} · ( ∑_{ℓ=1}^{n} A_{ℓ i_j} xℓ ) · x_{i_{j+1}} · · · x_{i_d}.

By linearity, we define A_j(f) = ∑_{w∈X^d} fw · A_j(w), and observe the following proposition.
Proposition 3.1. Let A : F^n → F^n be any invertible linear transformation, and let f(x1, x2, . . . , xn) ∈ F⟨X⟩ be any homogeneous polynomial of degree d. Let A_j(f) be the polynomial obtained by applying the transformation A to the j-th position of the monomials of f, as defined above, for j ∈ [d]. Then f ≠ 0 if and only if A_j(f) ≠ 0.
Proof. If f ≡ 0 then clearly A j ( f ) ≡ 0.
For the other direction, suppose f ≢ 0. Choose and fix monomials w1 ∈ X^{j−1} and w2 ∈ X^{d−j} such that the set

W = {w1 xi w2 | xi ∈ X, and w1 xi w2 has a nonzero coefficient in f}

is nonempty; such a choice exists because f is a nonzero homogeneous degree-d polynomial.
By collecting the monomials of f with prefix w1 and suffix w2 we can write

f = w1 · L · w2 + ∑_{w∉W} fw · w,

where L = ∑_{ℓ=1}^{n} uℓ xℓ is some nonzero linear form. By the linearity of A_j we have

A_j(f) = A_j(w1 · L · w2) + ∑_{w∉W} fw · A_j(w).
Notice that A_j(w1 · L · w2) = w1 · L′ · w2, where L′ = ∑_{ℓ=1}^{n} vℓ xℓ with vℓ = ∑_{k=1}^{n} A_{ℓk} uk. Now, L′ is nonzero because A is an invertible matrix. Hence, A_j(w1 · L · w2) is a nonzero polynomial and all its nonzero monomials are in W. On the other hand, since A_j leaves every position other than j unchanged, ∑_{w∉W} fw · A_j(w) has no nonzero monomials in W. It follows that A_j(f) ≢ 0.
Given any noncommutative polynomial f(x1, . . . , xn) ∈ F⟨X⟩ of degree d, we can rename the variable xi, 1 ≤ i ≤ n, occurring in position j ∈ [d] (from the left) by a new commuting variable x_{ij} and obtain a commutative polynomial g in the variables {x_{ij}}. The polynomial g is the set-multilinearized polynomial obtained from the noncommutative polynomial f. The following proposition states that this set-multilinearization preserves identities.
Proposition 3.2. Let f(x1, . . . , xn) ∈ F⟨X⟩ be a noncommutative polynomial of degree d. For 1 ≤ i ≤ n, replace the variable xi occurring in position j, 1 ≤ j ≤ d, by a new commuting variable x_{ij}. Let g(x_{11}, . . . , x_{nd}) be the polynomial obtained by this transformation. Then f = 0 if and only if g = 0.
Proof. The proof follows easily as the above variable replacement uniquely encodes nonzero monomials
of f into nonzero monomials of g. Thus, nonzero monomials in f and in g are in 1-1 correspondence.
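The renaming in Proposition 3.2 is easy to state in code. This small sketch (ours) encodes the commuting variable x_{ij} as the pair (i, j), so a word maps to a set of pairs, and distinct words map to distinct sets.

```python
# The variable x_i in position j becomes the commuting variable x_{ij},
# encoded as the pair (i, j).  Since a commutative monomial in the x_{ij}
# is multilinear here, it is just a set of such pairs, and the map from
# words to sets is injective because positions are recorded.
def set_multilinearize(word):
    # word: tuple of variable indices, read left to right
    return frozenset((x, j) for j, x in enumerate(word))
```

For example, the noncommuting words x1 x2 and x2 x1 map to the distinct commutative monomials x_{11} x_{22} and x_{21} x_{12}, so no cancellation can be introduced by the renaming.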
3.1 The Polynomial Identity Lemma
In this section we state the Polynomial Identity Lemma, which is widely known as the DeMillo-Lipton-
Schwartz-Zippel Lemma, and attempt to briefly trace its history [20, 26, 18, 9, 27, 28].
Lemma 3.3 (The Polynomial Identity Lemma). Let f(x1, x2, . . . , xn) be a nonzero degree-d n-variate polynomial over a field F, and let S ⊂ F be a finite subset. Then the number of zeros of f in the Cartesian product set S^n is bounded by d · |S|^{n−1}.
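As a toy numerical check of the bound (our example, not from the paper): the degree-1 polynomial f(x, y) = x − y in n = 2 variables has exactly |S| zeros in S^2, meeting the bound d · |S|^{n−1} with equality.

```python
# f(x, y) = x - y is nonzero, has degree d = 1, and n = 2 variables; its
# zeros in S x S are exactly the diagonal, of size |S| = d * |S|^(n-1).
S = range(100)
zeros = sum(1 for x in S for y in S if x - y == 0)
assert zeros == 100                      # exactly the diagonal
assert zeros <= 1 * len(S) ** (2 - 1)    # the lemma's bound, with equality
```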
This basic lemma has played an important role in randomized and algebraic computation. The
chronology of this lemma is equally interesting. Until some years ago, it was known as the Schwartz–
Zippel Lemma, although the paper by DeMillo and Lipton [9], which also proves the lemma, appeared
before the papers by Zippel [28] and Schwartz [27].
However, the lemma can be traced further back. A variant of the Polynomial Identity Lemma, bounding the number of zeros of a nonzero n-variate degree-d polynomial over a finite field F_q by dq^{n−1}, is attributed to a 1922 article by Øystein Ore [20] in Lidl and Niederreiter's book [14, Theorem 6.13, p. 275]. The book contains the standard proof by induction. This is the version of Lemma 3.3 with S = F_q, but the proof is practically identical. Additionally, we note that W. Schmidt's 1976 monograph [26, Lemma 3A, p. 147] also derives the dq^{n−1} bound using the same inductive argument, although he does not credit anyone for it. Thus, it appears that the lemma was rediscovered multiple times in the last century.
The article by Bishnoi et al. [7], which also mentions Ore’s paper, contains a nice section on various
versions of this lemma and its connections to the Alon–Füredi theorem [2]. Lipton’s blog too [15] has an
interesting discussion on the Polynomial Identity Lemma.
Finally, we note that a related result for multilinear polynomials over F2 was shown by Muller [18], in order to give a lower bound on the distance of Reed–Muller codes [18, 24].
In light of these findings, in the present paper we suggest an alternative name for this fundamental result: we refer to it as the "Polynomial Identity Lemma." This is more expressive of the lemma's content than a list of (now perhaps seven) names.
4 Black-box PIT for polynomials of exponential degree and sparsity
In this section we prove Theorem 2.1, which will yield a simple randomized black-box PIT algorithm
with polynomial run time for noncommutative polynomials of exponential degree and sparsity.
We first give an intuitive sketch of the proof idea. In essence, our algorithm is a “nondeterministic
degree reduction” technique, which transforms the given polynomial f into another polynomial fˆ
whose noncommutative degree is polynomially bounded such that f ≡ 0 iff fˆ ≡ 0. In any monomial
of f , we can think of the algorithm as nondeterministically choosing polynomially many positions as
special, and replacing variables in other positions by commuting variables. In fact, the noncommutative
variables in the polynomially many special positions too are converted to commutative variables using
set-multilinearization as in Proposition 3.2.
This transformation is carried out using finite automata. In an earlier paper [6], a deterministic
black-box PIT algorithm was obtained for noncommutative sparse polynomials of polynomial degree,
using a deterministic finite automaton that isolates a unique monomial, in the sense that it accepts a
unique nonzero monomial of the input sparse polynomial.
This idea is not directly useful for us. However, small nondeterministic finite substitution automata
help. We can design a substitution NFA that nondeterministically guesses polynomially many positions at which some nonzero monomial of f is uniquely determined, assuming f has only exponentially many monomials. The variables in these positions are then replaced by set-multilinearized versions (using an extra position index, as in Proposition 3.2), and in the other positions the NFA replaces the variables with
commuting variables. One complication is that there are exponentially many wrong guesses made by
the NFA. However, we can ensure that monomials output on different nondeterministic paths (which
correspond to different guesses of the isolating set of positions) have disjoint support. This guarantees
that the overall polynomial output by the NFA is of polynomial noncommutative degree, and nonzero
because the isolated monomial will not be canceled. We now present the formal details.
Definition 4.1 ([5]). A finite nondeterministic substitution automaton (abbreviated substitution NFA) is a nondeterministic finite automaton A together with a substitution map

δ : Q × X → P(Q × (Y ∪ F)),

where Q is the set of states of A, Y is a set of variables, and, for a set S, P(S) denotes the power set of S. For u ∈ Y ∪ F, if (j, u) ∈ δ(i, x), it means that when the automaton A is in state i and reads the variable x, it can make a transition to state j, replacing x by u. In our construction, Y is a set of commuting variables.
Remark 4.2. In fact, the nondeterministic substitution automaton we will design is more restrictive. It has the following behavior: for i ∈ Q and x ∈ X, if (j, u), (j, v) ∈ δ(i, x) then u = v. In other words, the substitution made for the variable x on a transition from state i to state j is uniquely determined by the state j. To emphasize this dependence on j we denote the substitution by u_j. Now, for x ∈ X we have the |Q| × |Q| transition matrix M′_x:

M′_x(i, j) = u_j, 1 ≤ i, j ≤ |Q|, if (j, u_j) ∈ δ(i, x). (4.1)
The substitution map δ is naturally extended to δ̂ : Q × X* → P(Q × (Y ∪ F)*). For a state j ∈ Q and w′ ∈ (Y ∪ F)*, (j, w′) ∈ δ̂(i, w) means that the automaton, starting in state i, on input string w ∈ X* can nondeterministically move to state j, transforming the input string w into w′ along some computation path.
The output of a substitution NFA on a polynomial f ∈ F⟨X⟩
We first explain the output of a substitution NFA A on a degree-D monomial w ∈ X^D. Let s be a designated initial state and t a designated final state of A. As defined in Equation (4.1), for each xi ∈ X we have a |Q| × |Q| transition matrix M′_{xi}. Let

w = x_{i1} x_{i2} · · · x_{iD},

and define the matrix M′_w as

M′_w = M′_{x_{i1}} M′_{x_{i2}} · · · M′_{x_{iD}}.

The output of the substitution NFA A on the monomial w as input is defined as the (s, t)-th entry of the matrix M′_w, which is a polynomial in the variable set Y.
Remark 4.3. We can also think of the (s,t)-th entry of the matrix product M′_w as the output of an algebraic
branching program of width |Q| and D layers, computing a polynomial in the variable set Y.

For f ∈ F⟨X⟩, the output of the NFA A is defined as the (s,t)-th entry of the matrix f(M′_{x_1}, M′_{x_2}, . . . , M′_{x_n})
obtained by substituting the matrix M′_{x_i} for x_i, 1 ≤ i ≤ n, in the polynomial f. The output is clearly a polynomial
in the variables Y.
Alternatively, writing f ∈ F⟨X⟩ as

f = ∑_w f_w · w, f_w ∈ F,
[Figure 1: The transition diagram of the automaton. Each state q_i, 0 ≤ i ≤ k − 1, has self-loops labeled x_0 → ξ_{i+1} and x_1 → ξ_{i+1}, and edges to q_{i+1} labeled x_0 → y_{0,i+1} and x_1 → y_{1,i+1}; the final state q_f = q_k has self-loops labeled x_b → ξ_{k+1}.]
note that the (s,t)-th entry of the |Q| × |Q| matrix f(M′_{x_1}, M′_{x_2}, . . . , M′_{x_n}) is the polynomial

g = ∑_w f_w · ( ∑_{w′ ∈ W′_w} w′ ),

where W′_w = {w′ | (t, w′) ∈ δ̂(s, w)}. The inner sum is over all w′ ∈ W′_w, which are the monomials output by
the substitution NFA A along different nondeterministic computation paths from state s to state t on input w.
Now we describe the construction of a substitution NFA A that substitutes, on its transition edges, new
commuting variables for the variables (x_0 or x_1) that it reads. Let Q = {q_0, q_1, q_2, . . . , q_k}
be the states of the automaton A, with q_0 the initial state and q_k = q_f the final state.

We use the indices i_1, . . . , i_k from Lemma 2.11 to define the transitions of A. The set of indices partitions
each monomial m into k + 1 blocks as follows:

m[1, i_1 − 1] m[i_1] m[i_1 + 1, i_2 − 1] m[i_2] · · · m[i_k] m[i_k + 1, D],

where m[i] denotes the variable in the i-th position of m and m[i, j] denotes the submonomial of m from
positions i to j.

We define two new sets of variables. The block variables are a set of k + 1 commuting variables
{ξ_1, ξ_2, . . . , ξ_{k+1}}. There are 2k many commuting index variables ∪_{j∈[k]} {y_{0,j}, y_{1,j}}.
Now we describe the transitions of the substitution NFA A. While A is reading input variables in
block j, it replaces each x_b, b ∈ {0, 1}, that it reads by the block variable ξ_j. It nondeterministically decides
whether block j is over, i.e., whether the current position is an index in the isolating set. If so, A
replaces the input variable x_b by the index variable y_{b,j}, and it increments the block number to j + 1. Now
A makes its transitions in the (j + 1)-st block as described above. The substitution map for A is given
below. For 0 ≤ i ≤ k − 1 and b ∈ {0, 1}:

δ(q_i, x_b) = {(q_i, ξ_{i+1}), (q_{i+1}, y_{b,i+1})},
δ(q_k, x_b) = {(q_k, ξ_{k+1})}.
Figure 1 depicts the automaton. The substitution NFA A is described by two (k + 1) × (k + 1)
transition matrices M_{x_0} and M_{x_1}, as already explained.
[Figure 2: Transition diagram for the variable x_b, b ∈ {0, 1}. Each state q_i, 0 ≤ i ≤ k − 1, has a self-loop labeled ξ_{i+1} and an edge to q_{i+1} labeled y_{b,i+1}; the final state q_f has a self-loop labeled ξ_{k+1}.]
More precisely, for the variable x_b we take the adjacency matrix M_{x_b} of the labeled directed graph in
Figure 2, extracted from the automaton in Figure 1.
The corresponding matrix M_{x_b} of dimension (k + 1) × (k + 1) that we substitute for x_b is the following:

           ⎛ ξ_1  y_{b,1}  0        · · ·  0    0        ⎞
           ⎜ 0    ξ_2      y_{b,2}  · · ·  0    0        ⎟
M_{x_b} =  ⎜ 0    0        ξ_3      · · ·  0    0        ⎟
           ⎜ ⋮    ⋮        ⋮        ⋱      ⋮    ⋮        ⎟
           ⎜ 0    0        0        · · ·  ξ_k  y_{b,k}  ⎟
           ⎝ 0    0        0        · · ·  0    ξ_{k+1}  ⎠
The rows and the columns of the matrices Mxb , b ∈ {0, 1} are indexed by the states of the automaton
and the entries are either block variables or index variables as indicated in the transition diagram. Let

f(x_0, x_1) = ∑_{i=1}^{t} c_i w_i.
Recall that f(x_0, x_1) is a polynomial with sparsity t and degree D, and w_i is a monomial with
coefficient c_i for i ∈ [t]. Define the matrix product w_i(M_{x_0}, M_{x_1}) obtained by substituting the matrix M_{x_b}
for x_b, b ∈ {0, 1}, in the monomial w_i and multiplying these matrices. The following proposition is
immediate, as f is a linear combination of the w_i.

Proposition 4.4. M_f = f(M_{x_0}, M_{x_1}) = ∑_{i=1}^{t} c_i w_i(M_{x_0}, M_{x_1}).
Now we are ready to prove Theorem 2.1.
Proof of Theorem 2.1. By Proposition 2.9, we can assume that the input polynomial (given by a black-box)
is a bivariate polynomial f(x_0, x_1) over noncommuting variables x_0 and x_1 of sparsity t. Let M ≠ ∅ denote
the set of nonzero monomials of degree D occurring in f, where D = deg(f). Then we can write the
polynomial f = ∑_{j=1}^{t} c_j w_j in two parts:

f = ∑_{w_j ∈ M} c_j w_j + ∑_{w_j ∉ M} c_j w_j,

where ∑_{w_j ∈ M} c_j w_j is the homogeneous degree-D part of f.
Let us assume, without loss of generality, that w_1 is in M and that it is the monomial isolated in
Lemma 2.11. Let the isolating index set be I = {i_1, i_2, . . . , i_k}, which means that w_j|_I ≠ w_1|_I for all
w_j ∈ M with j ≠ 1 (i.e., the projection of each such w_j on the index set I differs from the projection of w_1). Let

w_1 = x_{b_1} x_{b_2} · · · x_{b_D},

where each b_j ∈ {0, 1}.
We will now analyze the polynomial computed at the (q0 , q f )-th entry of the matrix f (Mx0 , Mx1 ).
Firstly, notice from the definition of the substitution NFA A that the only nondeterminism is in picking
the index set. Therefore, an index set J = { j1 , j2 , . . . , jk } nondeterministically picked by A determines a
unique computation path for it. Let w j ∈ M be a nonzero degree-D monomial of f . It is transformed by A
into a degree-D commutative monomial w_{j,J} (which is over the block and index variables) as follows. Let

ξ_J = ξ_1^{j_1 − 1} ξ_2^{j_2 − j_1 − 1} · · · ξ_{k+1}^{D − j_k},
y_{j,J} = y_{a_1,1} y_{a_2,2} · · · y_{a_k,k}, a_i ∈ {0, 1},

where a_i = b if the j_i-th variable of w_j is x_b. Then we have the following claim from the construction of A.
Claim 4.5. For the index set J = { j1 , j2 , . . . , jk } nondeterministically chosen by the substitution NFA A,
the monomial output by A on input w j is
w j,J = ξJ · y j,J .
Notice that for two distinct index sets J and J′ we clearly have

ξ_J ≠ ξ_{J′}.

To see this, let j_ℓ be the first index where J and J′ differ. Then the power of ξ_ℓ is different in
ξ_J and ξ_{J′}. We thus have:
Claim 4.6. For any two distinct index sets J, J′ and any monomial w_j ∈ M, the corresponding commutative
monomials w_{j,J} and w_{j,J′} are distinct.
We also note that the degree-k monomial y j,J essentially encodes the projection of the degree-D
monomial w j to the k indices in J = { j1 , j2 , . . . , jk }. If variable xb occurs in position ji of monomial w j it
is encoded as variable yb,i in monomial y j,J .
Claim 4.7. For each w_j ∈ M, the (q_0, q_f)-th entry of the matrix w_j(M_{x_0}, M_{x_1}) is the sum

∑_J w_{j,J} = ∑_J ξ_J y_{j,J}

over all nondeterministically picked size-k index sets J ⊂ [D].
To see the above claim we note that each computation path of the NFA A is determined by an index
set J along which the monomial ξJ y j,J is output by A. The matrix product w j (Mx0 , Mx1 ) sums up the
monomials produced over all the paths.
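The path-sum in Claim 4.7 can be checked mechanically on small instances. The following sketch is our own illustration, not part of the paper's algorithm: commutative monomials are represented naively as sorted tuples of variable names, polynomials as dictionaries from monomials to coefficients. We build the matrices M_{x_b}, multiply them along a monomial w, and compare the (q_0, q_f) entry with the explicit sum over index sets J:

```python
from collections import defaultdict
from itertools import combinations

def mat_mul(A, B):
    """Multiply matrices whose entries are sparse polynomials
    (dict: sorted tuple of variable names -> coefficient)."""
    n = len(A)
    C = [[defaultdict(int) for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for l in range(n):
            for j in range(n):
                for m1, c1 in A[i][l].items():
                    for m2, c2 in B[l][j].items():
                        C[i][j][tuple(sorted(m1 + m2))] += c1 * c2
    return C

def M(b, k):
    """Substitution matrix for x_b: xi_{j+1} on the diagonal, y_{b,j+1} above it."""
    n = k + 1
    A = [[defaultdict(int) for _ in range(n)] for _ in range(n)]
    for j in range(n):
        A[j][j][("xi%d" % (j + 1),)] = 1
        if j + 1 < n:
            A[j][j + 1][("y%d,%d" % (b, j + 1),)] = 1
    return A

def nfa_output(word, k):
    """(q_0, q_f) entry of M_{x_{b_1}} ... M_{x_{b_D}} for word = [b_1, ..., b_D]."""
    P = M(word[0], k)
    for b in word[1:]:
        P = mat_mul(P, M(b, k))
    return {m: c for m, c in P[0][k].items() if c}

def claim_4_7_sum(word, k):
    """Explicit sum over all size-k index sets J of xi_J * y_{w,J} (Claim 4.7)."""
    D, out = len(word), defaultdict(int)
    for J in combinations(range(1, D + 1), k):
        mono, prev = [], 0
        for blk, pos in enumerate(J, start=1):
            mono += ["xi%d" % blk] * (pos - prev - 1)      # stay within block blk
            mono.append("y%d,%d" % (word[pos - 1], blk))   # guessed index position
            prev = pos
        mono += ["xi%d" % (k + 1)] * (D - prev)            # tail of the last block
        out[tuple(sorted(mono))] += 1
    return dict(out)

w = [0, 1, 1, 0, 1]   # the monomial x0 x1 x1 x0 x1, D = 5
assert nfa_output(w, 2) == claim_4_7_sum(w, 2)
```

Since distinct index sets J yield distinct monomials (Claim 4.6), the resulting entry has C(5, 2) = 10 monomials, each with coefficient 1.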
Now, let f_J be the polynomial

f_J = ∑_{j=1}^{t} c_j w_{j,J} = ∑_{w_j ∈ M} c_j w_{j,J} + ∑_{w_j ∉ M} c_j w_{j,J}.
Claim 4.8. After the matrix substitution x_0 = M_{x_0} and x_1 = M_{x_1} in the polynomial f, the
(q_0, q_f)-th entry of the matrix f(M_{x_0}, M_{x_1}) is ∑_J f_J.
To see the above claim we need to change the order of summation in computing the (q0 , q f )-th entry
of the matrix f (Mx0 , Mx1 ). Notice that for a fixed index set J guessed by the NFA A, for each monomial
w j the NFA outputs w j,J . Hence, by linearity, fixing the guessed index set J, the NFA computes fJ on
input f . Summing over all guessed index sets J, it follows that (q0 , q f )-th entry of the matrix f (Mx0 , Mx1 )
is ∑J fJ .
Finally, we focus on the monomial w_{1,I} occurring in the polynomial ∑_J f_J, where w_1 is the isolated
monomial and I is the isolating index set.
Claim 4.9. The coefficient of w_{1,I} in the polynomial ∑_J f_J is c_1. As a consequence, the (q_0, q_f)-th entry of
the matrix M_f = f(M_{x_0}, M_{x_1}), which equals ∑_J f_J, is nonzero.
To see the above claim we note the following:
1. For a nonzero monomial w_j ∉ M of f, for each index set J of size k the contribution to the (q_0, q_f)-th
entry of the matrix f(M_{x_0}, M_{x_1}) is a monomial of degree deg(w_j), and deg(w_j) < D. Hence,
these monomials have no influence on the coefficient of w_{1,I} in the polynomial ∑_J f_J.

2. Consider monomials w_j ∈ M. Notice that w_{1,I} = ξ_I y_{1,I} and, for j ≠ 1, w_{j,I} = ξ_I y_{j,I}. Now,

y_{1,I} ≠ y_{j,I} for all w_j ∈ M \ {w_1},

because I is an isolating index set for M and the monomial w_1 is isolated. Therefore, w_{1,I} ≠ w_{j,I}
for all w_j ∈ M \ {w_1}.
It follows that the monomials w j ∈ M \ {w1 } also have no influence on the coefficient of w1,I in the
polynomial ∑J fJ .
Hence, we conclude that the (q_0, q_f)-th entry of the matrix M_f = f(M_{x_0}, M_{x_1}) is a nonzero polynomial
∑_J f_J in the commuting variables ∪_{j∈[k+1]} {ξ_j} ∪ ∪_{j∈[k]} {y_{0,j}, y_{1,j}}. Moreover, the degree of the polynomial
∑_J f_J is D. By the Polynomial Identity Lemma 3.3 it follows that the polynomial ∑_J f_J is nonzero in a
suitable extension, of size more than (n + 2)d, of the field F.

Since the polynomial f is nonzero over the algebra M_{k+1}(F), it is also nonzero over the algebra
M_{log t+1}(F). This completes the proof of Theorem 2.1.
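The randomized black-box test that this proof yields can be sketched as follows. This is our own illustration, not the paper's pseudocode: we work over a fixed prime field F_P in place of "a suitable extension," substitute random scalars for the ξ_j and y_{b,j} entries of M_{x_0} and M_{x_1}, and declare f nonzero if the (q_0, q_f) entry of f(M_{x_0}, M_{x_1}) is nonzero in some trial. The test is one-sided: a zero polynomial never passes, and a nonzero one passes a single trial with high probability.

```python
import random

P = (1 << 61) - 1  # a large prime; we work over F_P in place of "a suitable extension"

def random_matrices(k, rng):
    """Random numeric instances of M_{x_0}, M_{x_1}: the xi_j values sit on the
    diagonal (shared by both matrices), the y_{b,j} values just above it."""
    n = k + 1
    xi = [rng.randrange(1, P) for _ in range(n)]
    mats = []
    for b in (0, 1):
        y = [rng.randrange(1, P) for _ in range(k)]
        M = [[0] * n for _ in range(n)]
        for j in range(n):
            M[j][j] = xi[j]
            if j + 1 < n:
                M[j][j + 1] = y[j]
        mats.append(M)
    return mats

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(n)) % P for j in range(n)]
            for i in range(n)]

def is_probably_nonzero(black_box, k, trials=10, seed=1):
    """One-sided randomized PIT: True iff some trial gives a nonzero (q0,qf) entry."""
    rng = random.Random(seed)
    for _ in range(trials):
        M0, M1 = random_matrices(k, rng)
        if black_box(M0, M1)[0][k] % P:
            return True
    return False

# Toy black box: f = x0*x1 - x1*x0, a nonzero polynomial of sparsity t = 2 (so k = 1).
def commutator(M0, M1):
    n = len(M0)
    A, B = mat_mul(M0, M1), mat_mul(M1, M0)
    return [[(A[i][j] - B[i][j]) % P for j in range(n)] for i in range(n)]

assert is_probably_nonzero(commutator, k=1)
```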
5 A deterministic PIT for regular circuits
In this section we consider polynomial identity testing for noncommutative +-regular circuits. As
mentioned earlier, these circuits can compute polynomials of exponential degree and a double-exponential
number of monomials. However, we exploit the circuit structure and obtain a white-box deterministic
polynomial-time identity test for +-regular circuits (Theorem 2.5).
Definition 5.1. A noncommutative circuit C, computing a polynomial in F⟨X⟩ where X = {x_1, x_2, . . . , x_n},
is +-regular if it satisfies the following properties:
• The circuit is homogeneous. The gates are of unbounded fanin.
• The + gates in the circuit are partitioned into layers (termed +-layers) such that if g1 and g2 are +
gates in a +-layer then there is no directed path in the circuit between g1 and g2 .
• All gates in a +-layer are of the same syntactic degree.
• The output gate is a + gate.
• Every input-to-output path in the circuit goes through a gate in each +-layer.
• Additionally, we allow scalar edge labels in the circuit. For example, suppose g is a + gate in C
whose inputs are gates g1 , g2 , . . . , gt such that βi ∈ F labels edge (gi , g), i ∈ [t]. If polynomial Pi is
computed at gate gi , i ∈ [t], then g computes the polynomial ∑ti=1 βi Pi .
Remark 5.2. We draw the reader’s attention to the crucial fifth condition in Definition 5.1. An important
consequence of the condition is that each +-layer (other than the output and input) is a separating set for
the underlying DAG of the circuit C. This property allows us to do a deterministic depth reduction of
such circuits, preserving polynomial identity.
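The separating-set consequence noted in this remark is easy to check mechanically for a given circuit DAG. A minimal sketch follows (our own illustration; the dict encoding and gate names are hypothetical, not from the paper): a layer is separating iff every path from a fan-in-zero gate to the output meets it.

```python
def every_path_hits_layer(inputs_of, output, layer):
    """Return True iff every path from a source gate (fan-in 0) to `output`
    passes through a gate in `layer` (the separating-set property of Remark 5.2).
    `inputs_of` maps each gate name to the list of its input gates (a DAG)."""
    layer = set(layer)
    stack, seen = [output], set()
    while stack:
        g = stack.pop()
        if g in layer or g in seen:
            continue  # paths through the layer are blocked; visited gates skipped
        seen.add(g)
        kids = inputs_of.get(g, [])
        if not kids:
            return False  # reached an input without crossing the layer
        stack.extend(kids)
    return True

# A toy +-regular-style circuit: two x-gates (m1, m2) feed the output + gate s2,
# above a first +-layer {s1a, s1b, s1c} of linear forms. Gate names are illustrative.
inputs_of = {
    "s2": ["m1", "m2"],
    "m1": ["s1a", "s1b"], "m2": ["s1b", "s1c"],
    "s1a": ["x1", "x2"], "s1b": ["x1"], "s1c": ["x2"],
}
assert every_path_hits_layer(inputs_of, "s2", {"s1a", "s1b", "s1c"})
assert not every_path_hits_layer(inputs_of, "s2", {"s1a"})
```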
We list below some observations about +-regular circuits, and also introduce some notation in the
process.
1. In a +-regular circuit, for any + gate g of degree more than 1, we can assume that all inputs to g
are ×-gates. This can be effected with a simple transformation to the circuit without increasing its
size.
2. The +-depth of a +-regular circuit C is the number of +-layers in it. Let d be the +-depth of C.
We number the +-layers of C from the bottom upward. For 1 ≤ i ≤ d, let L^+_i denote the set of + gates
in the i-th +-layer, and let L^×_i denote the set of × gates which are inputs to gates in L^+_i. Notice
that all gates in L^+_i and L^×_i have the same syntactic degree D_i. The degree of the circuit C is D_d.

3. Continuing from above, we can assume that all + gates of degree D_i in the circuit are in layer L^+_i.
Similarly, all × gates of degree D_i in the circuit are in L^×_i.
4. It is clear from the structure of the circuit C that the part of C between the +-layers L^+_i and L^+_{i+1} is a
multiplicative circuit whose inputs are the gates in layer L^+_i and whose outputs are the gates in layer L^×_{i+1},
for 1 ≤ i ≤ d − 1. We note that this property follows from the fifth condition in Definition 5.1.
×
5. Finally, consider the bottommost +-layer L+ 1 . We assume without loss of generality that L1 are
the input variables, and the gates in L1 compute homogeneous linear forms. We can ensure this by
+
adding a dummy +-layer as the first layer 1.
Remark 5.3. The term “regular formulas” is also used in a somewhat different sense by Kayal et al. [13].
We note that their model of regular formulas is much more restricted than the +-regular circuits considered
here.
In our study of +-regular circuits, it is important to consider certain special +-regular circuits of
+-depth 2. We term these homogeneous ΣΠ∗Σ circuits (as already mentioned in Section 2), in analogy
with the usual ΣΠΣ arithmetic circuits (see, e. g., [25]).
Definition 5.4. ΣΠ∗Σ circuits are special +-regular circuits of +-depth 2, defined as follows:
• The output gate is a + gate (which constitutes the second +-layer).
• All inputs to the output gate are × gates of the same syntactic degree.
• The gates in the first +-layer compute homogeneous linear forms (taking as input the circuit inputs).
We will require the following definition of multiplicative circuits.

Definition 5.5 (Multiplicative Circuit). Let Γ = {y_1, y_2, . . . , y_n} be a finite alphabet. A multiplicative
circuit over the free monoid (Γ*, ·) is a directed acyclic graph with each indegree-zero node labeled by a
letter from Γ (the inputs to the circuit). Internal nodes are concatenation gates of fanin two: the two
inputs of a gate are designated its left and right child, and the gate concatenates the output of the left child with the
output of the right child. A gate of the circuit is designated the output gate.
Remark 5.6.
• Multiplicative circuits computing words as described above are also known as compressed
context-free grammars, see, e. g., [16].
• If we let Γ be the set {y_1, y_2, . . . , y_n} of noncommuting variables, then a multiplicative circuit can
be seen as a special kind of noncommutative arithmetic circuit that computes a single monomial in
this variable set. The internal concatenation gates of the circuit are × gates.
• We sometimes allow a multiplicative circuit C to have multiple output gates. Thus, at each output
gate g, the output C_g(y_1, y_2, . . . , y_n) computed by C is a monomial (i. e., a word) in the variables
y_1, y_2, . . . , y_n.
• We can define Π∗Σ circuits as multiplicative circuits with homogeneous linear forms as inputs. More
precisely, let C(y_1, y_2, . . . , y_n) be a multiplicative circuit with output gate g, and let L_1, L_2, . . . , L_n be
homogeneous linear forms in the noncommuting variables x_1, x_2, . . . , x_n. Then C_g(L_1, L_2, . . . , L_n) is a
Π∗Σ circuit.
The following theorem is crucial to our PIT algorithm for +-regular circuits. It will allow us
to decompose +-regular circuits of +-depth d into +-regular circuits of smaller +-depth, preserving
homogeneous polynomial identities.
Theorem 5.7. Let T(z_1, z_2, . . . , z_s) be a noncommutative homogeneous degree-d polynomial over a
field F in noncommuting variables z_1, z_2, . . . , z_s. Let R_1, R_2, . . . , R_s be noncommutative homogeneous
degree-d′ polynomials in variables x_1, x_2, . . . , x_n over F such that {R_1, R_2, . . . , R_k} is a maximal linearly
independent subset of {R_1, R_2, . . . , R_s} over F, where

R_j = ∑_{i=1}^{k} α_{ji} R_i, k + 1 ≤ j ≤ s, α_{ji} ∈ F.
For fresh noncommuting variables y_1, y_2, . . . , y_k define the linear forms

ℓ_j = ∑_{i=1}^{k} α_{ji} y_i, k + 1 ≤ j ≤ s.

Then T(R_1, R_2, . . . , R_s) ≡ 0 if and only if T(y_1, y_2, . . . , y_k, ℓ_{k+1}, . . . , ℓ_s) ≡ 0.
Proof. The reverse implication is immediate: suppose T(y_1, y_2, . . . , y_k, ℓ_{k+1}, . . . , ℓ_s) ≡ 0. Then, by
substituting R_i for y_i, 1 ≤ i ≤ k, we obtain T(R_1, R_2, . . . , R_s) ≡ 0.

We now show the forward implication. Suppose T(R_1, R_2, . . . , R_s) ≡ 0. As R_1, R_2, . . . , R_k are linearly
independent over F we can find degree-d′ monomials m_1, m_2, . . . , m_k such that the k × k matrix B of their
coefficients is of full rank. More precisely, if β_{ji} is the coefficient of m_i in R_j then the matrix

B = (β_{ji})_{1 ≤ j,i ≤ k}

is of full rank.
Define polynomials

R′_j = ∑_{i=1}^{k} β_{ji} m_i, 1 ≤ j ≤ k, (5.1)

R′_j = ∑_{i=1}^{k} α_{ji} R′_i, k + 1 ≤ j ≤ s. (5.2)

Notice that T(R_1, R_2, . . . , R_s) ≡ 0 implies T(R′_1, R′_2, . . . , R′_s) ≡ 0. This is because the nonzero
monomials occurring in T(R′_1, R′_2, . . . , R′_s) are precisely the monomials from the set {m_1, m_2, . . . , m_k}^d
occurring in T(R_1, R_2, . . . , R_s) (with the same coefficients).
Replacing m_i by the variable y_i, 1 ≤ i ≤ k, transforms each R′_j to a linear form:

ℓ′_j = ∑_{i=1}^{k} β_{ji} y_i, for 1 ≤ j ≤ k,

and

ℓ′_j = ∑_{i=1}^{k} α_{ji} ∑_{q=1}^{k} β_{iq} y_q, for k + 1 ≤ j ≤ s.

Note that the coefficient of any monomial y_{i_1} y_{i_2} . . . y_{i_d} in T(ℓ′_1, ℓ′_2, . . . , ℓ′_k, ℓ′_{k+1}, . . . , ℓ′_s) is the same as
the coefficient of the corresponding monomial m_{i_1} m_{i_2} . . . m_{i_d} in T(R′_1, R′_2, . . . , R′_s), which is zero. Hence
T(ℓ′_1, ℓ′_2, . . . , ℓ′_k, ℓ′_{k+1}, . . . , ℓ′_s) ≡ 0. Now, since B is invertible, we can apply the linear map B⁻¹ to each
of the d positions in the polynomial T(ℓ′_1, ℓ′_2, . . . , ℓ′_k, ℓ′_{k+1}, . . . , ℓ′_s) and obtain T(y_1, y_2, . . . , y_k, ℓ_{k+1}, . . . , ℓ_s),
which must be identically zero by Proposition 3.1. This completes the proof.
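A minimal worked instance of Theorem 5.7 may help; the concrete values below are chosen here for illustration and do not appear in the paper. Take s = 2, d = 2, d′ = 1, k = 1:

```latex
T(z_1, z_2) = z_1 z_2 - z_2 z_1, \qquad R_1 = x_1, \qquad R_2 = 2x_1 \quad (\alpha_{21} = 2).
```

Then T(R_1, R_2) = 2x_1² − 2x_1² ≡ 0, and with ℓ_2 = 2y_1 we likewise get T(y_1, ℓ_2) = 2y_1² − 2y_1² ≡ 0, as the theorem asserts; conversely, any identity in y_1 and ℓ_2 lifts back by substituting R_1 for y_1.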
Outline of the algorithm
In the following lemma we will apply Theorem 5.7 to decompose +-regular circuits. Using this we will
be able to design a deterministic polynomial-time PIT algorithm for +-regular circuits.
Lemma 5.8. Suppose C is a +-regular circuit of size s, syntactic degree D, and +-depth d, computing a
polynomial in F⟨X⟩. Let L^×_2 = {g_1, g_2, . . . , g_m}, and suppose P_1, P_2, . . . , P_m are the polynomials
computed at the gates g_1, g_2, . . . , g_m, respectively. Assume, without loss of generality, that {P_1, P_2, . . . , P_t}
is a maximal F-linearly independent subset of {P_1, P_2, . . . , P_m}, and

P_j = ∑_{i=1}^{t} α_{ji} P_i, t + 1 ≤ j ≤ m.
Let C′ be the circuit obtained from C as follows:

(i) Delete all gates below L^×_2.
(ii) Replace g_1, g_2, . . . , g_m by input variables y_1, y_2, . . . , y_m, respectively.

Then C′(y_1, y_2, . . . , y_m) is a +-regular circuit of +-depth d − 1 and size at most s, computing a
homogeneous polynomial in y_1, y_2, . . . , y_m. Let C″ be the circuit obtained from C′ by replacing y_j by the
linear form ∑_{i=1}^{t} α_{ji} y_i, for each t + 1 ≤ j ≤ m. Then the circuit C is identically zero if and only if the
circuit C″ is identically zero.
Proof. The proof is an easy consequence of Theorem 5.7. Let P(x_1, x_2, . . . , x_n) ∈ F⟨X⟩ be the homogeneous
degree-D polynomial computed by C, where X = {x_1, x_2, . . . , x_n}, and let T(y_1, y_2, . . . , y_m) be the
polynomial computed by the circuit C′. Let P_1, P_2, . . . , P_m be the polynomials computed by the gates
g_1, g_2, . . . , g_m, respectively. Since C′ is also a +-regular circuit, T is homogeneous of degree, say, D′, and each
P_i is homogeneous of degree D″ = D/D′. It follows from the structure of C that

P(x_1, x_2, . . . , x_n) = T(P_1, P_2, . . . , P_m).

Now, applying Theorem 5.7 to the polynomials P, T, P_1, P_2, . . . , P_m, it follows that the circuit C is
identically zero if and only if

C′(y_1, y_2, . . . , y_t, ∑_{i=1}^{t} α_{t+1,i} y_i, . . . , ∑_{i=1}^{t} α_{mi} y_i) ≡ 0.
Given a size-s +-regular circuit C of +-depth d, we can apply Lemma 5.8 to transform C into another
+-regular circuit C′ of size s and +-depth d − 1 such that

C ≡ 0 if and only if C′ ≡ 0.
The pseudo-code for this procedure, Prune-plus-depth, is given in Algorithm A. This procedure
runs in polynomial time assuming that we have a deterministic polynomial-time subroutine Basis that
computes a maximum F-linearly independent subset of a set of polynomials {P_1, P_2, . . . , P_m} ⊂ F⟨X⟩,
where each P_i is given by a Π∗Σ circuit of size at most s. Without loss of generality, suppose
{P_1, P_2, . . . , P_t} is the maximum linearly independent subset computed by Basis. We will assume that the
subroutine Basis also computes scalars α_{ji} ∈ F such that

P_j = ∑_{i=1}^{t} α_{ji} P_i, t + 1 ≤ j ≤ m.
In the next subsection we will describe subroutine Basis, which will complete the overall algorithm.
Prune-plus-depth

Input: A +-regular circuit C of degree D, size s, and +-depth d.
Output: A +-regular circuit C′ of +-depth d − 1 and size at most s such that C ≡ 0 iff C′ ≡ 0.

1. Let L^×_2 = {g_1, g_2, . . . , g_m}, and let P_i be the polynomial computed at g_i, 1 ≤ i ≤ m.

2. Notice that each P_i is computed by a Π∗Σ circuit. Using subroutine Basis we compute a
maximum linearly independent subset, say, {P_1, P_2, . . . , P_t} of {P_i | 1 ≤ i ≤ m}, and also
compute scalars α_{ji} ∈ F such that

P_j = ∑_{i=1}^{t} α_{ji} P_i, t + 1 ≤ j ≤ m.

3. Construct the pruned circuit C′ as follows: in circuit C remove all gates below L^×_2, and replace
g_i by y_i, 1 ≤ i ≤ t. These variables are the new inputs to C′.

4. Furthermore, in C, let the second +-layer gates be L^+_2 = {h_1, h_2, . . . , h_q}. These gates compute
linear combinations of the gates g_i, 1 ≤ i ≤ m. We do the rewiring so that in C′ the
gates h_1, h_2, . . . , h_q compute the appropriate linear combinations of y_1, y_2, . . . , y_t. Notice that
h_1, h_2, . . . , h_q become the first +-layer of C′.

5. Output C′.

Algorithm A
5.1 Linear independence testing of Π∗ Σ circuits
In this subsection we solve the above-mentioned linear independence testing problem. Namely, given as
input Π∗Σ circuits computing noncommutative polynomials P_1, P_2, . . . , P_m ∈ F⟨X⟩, we give a deterministic
polynomial-time algorithm that finds a maximal linearly independent subset A of these polynomials, and
expresses the others as F-linear combinations of the polynomials in A.
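On explicit coefficient vectors, the interface that Basis must provide (a maximal independent subset plus the coefficients expressing the remaining polynomials over it) is ordinary Gaussian elimination. The sketch below is our own illustration over the rationals; the paper's Basis must instead work with the P_i given as Π∗Σ circuits of possibly exponential degree, which is exactly the problem this subsection solves. The trick here is simply to track, for each dependent vector, its expansion over the chosen independent ones:

```python
from fractions import Fraction

def basis_with_coefficients(vectors):
    """Return (chosen, deps): `chosen` indexes a maximal linearly independent
    subset; deps[j] is a dict {i: c} with vectors[j] = sum_i c * vectors[i]."""
    vectors = [[Fraction(x) for x in v] for v in vectors]
    echelon, pivots, combos = [], [], []  # combos[r]: echelon[r] over chosen originals
    chosen, deps = [], {}
    for idx, v in enumerate(vectors):
        row, combo = v[:], {}
        # Reduce against echelon rows in increasing pivot order, so that a
        # cleared pivot column stays cleared by the later reductions.
        for r in sorted(range(len(echelon)), key=lambda r: pivots[r]):
            p = pivots[r]
            if row[p]:
                c = row[p] / echelon[r][p]
                row = [a - c * b for a, b in zip(row, echelon[r])]
                for i, cf in combos[r].items():
                    combo[i] = combo.get(i, Fraction(0)) + c * cf
        if any(row):
            # Independent: keep idx; the new echelon row is v minus the combo part.
            full = {i: -cf for i, cf in combo.items()}
            full[idx] = Fraction(1)
            chosen.append(idx)
            echelon.append(row)
            pivots.append(next(j for j, x in enumerate(row) if x))
            combos.append(full)
        else:
            deps[idx] = combo  # vectors[idx] = sum_i combo[i] * vectors[i]
    return chosen, deps

chosen, deps = basis_with_coefficients([[1, 2], [2, 4], [0, 1], [1, 3]])
assert chosen == [0, 2]
assert deps == {1: {0: 2}, 3: {0: 1, 2: 1}}
```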
We first need to analyze Π∗Σ circuits. Let {L_1, L_2, . . . , L_t} be the set of homogeneous linear forms (in the
noncommuting variables x_1, x_2, . . . , x_n) defined by the bottom Σ layers of the given Π∗Σ circuits computing
the polynomials P_i, 1 ≤ i ≤ m.

Without loss of generality, let {L_1, L_2, . . . , L_r} be a maximal subset of {L_1, L_2, . . . , L_t} such that L_i
and L_j are not scalar multiples of each other for 1 ≤ i < j ≤ r. Thus, for each L_i with i > r there is some
L_j with j ≤ r such that L_i is a scalar multiple of L_j. Therefore, we can express each P_i, up to a scalar
multiple, as a product of linear forms from L_1, . . . , L_r:

P_i = α_i L_{i_1} L_{i_2} . . . L_{i_D}, 1 ≤ i ≤ m, where each L_{i_u} ∈ {L_1, . . . , L_r} and α_i ∈ F.

Thus, each of these polynomials P_i is, up to a scalar multiple, a product of D many linear forms from
the set {L_1, L_2, . . . , L_r}.
Proposition 5.9. Suppose the polynomials P_i and P_j factorize as

P_i = α_i L_{i_1} L_{i_2} . . . L_{i_D},
P_j = α_j L_{j_1} L_{j_2} . . . L_{j_D},

where each L_{i_u}, L_{j_u} ∈ {L_1, . . . , L_r}. Then P_i and P_j are scalar multiples of each other iff L_{j_u} = L_{i_u}
for 1 ≤ u ≤ D.

Proof. The polynomials P_i and P_j are homogeneous degree-D polynomials in the free noncommutative
ring F⟨X⟩. Now, it turns out that homogeneous polynomials in F⟨X⟩ have unique factorization (see,
e. g., [5, Theorem 3.4]). Furthermore, the irreducible factors are all homogeneous and their order is also
uniquely determined (only scalar multiples can vary). The proposition immediately follows from this
observation.
In order to analyze Π∗ Σ circuits further, it is useful to think of a Π∗ Σ circuit computing a product
of linear forms from {L1 , L2 , . . . , Lr } as a multiplicative circuit over the letters {L1 , L2 , . . . , Lr }. More
formally, corresponding to the linear forms L1 , L2 , . . . , Lr we define an alphabet Γ = {a1 , a2 , . . . , ar } of r
letters, where ai stands for Li , 1 ≤ i ≤ r.
Clearly, a multiplicative circuit over (Γ∗ , .) computes a word in Γ∗ . Conversely, given a word w ∈ Γ∗
we could ask for its compression as a multiplicative circuit Cw . Such word compressions are well-studied
in the literature and are closely related to the Lempel–Ziv text compression [16, 21, 17]. Some basic
algorithmic results in this area turn out to be useful for our algorithm. We formally state these below.
Theorem 5.10 (Lohrey, Plandowski, Mehlhorn et al.). Let Γ = {a_1, a_2, . . . , a_r} be a finite alphabet.

• There is a deterministic polynomial-time algorithm that takes as input two multiplicative circuits
C_i and C_j over Γ and tests if the words computed by them are identical. If not, the algorithm returns
the leftmost index k where the two words differ.

• Given as input a word w over Γ, encoded by a multiplicative circuit C, the following tasks can be done in
deterministic polynomial time:

(i) Computing the length |w| of w.
(ii) Given an index k in binary, computing the k-th letter w[k].
(iii) Computing a multiplicative circuit C_{k,k′} for the subword w[k . . . k′], for indices k and k′ given
in binary. In particular, the circuit C_{k,k′} is of size polynomial in the size of C and the
bit-lengths of k and k′.
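For intuition, tasks (i) and (ii) are straightforward once each gate stores the length of the word it computes; a minimal sketch follows (our own toy encoding, not from the cited papers; the equality test of the first bullet is Plandowski's algorithm and is considerably more involved):

```python
class SLP:
    """Straight-line program: a multiplicative circuit over a free monoid.
    Gates are ('leaf', letter) or ('cat', left_gate_id, right_gate_id)."""
    def __init__(self):
        self.gates = []
        self.length = []  # length of the word computed at each gate

    def leaf(self, letter):
        self.gates.append(('leaf', letter))
        self.length.append(1)
        return len(self.gates) - 1

    def cat(self, left, right):
        self.gates.append(('cat', left, right))
        self.length.append(self.length[left] + self.length[right])
        return len(self.gates) - 1

    def letter_at(self, g, k):
        """k-th letter (1-based) of the word at gate g, in time O(circuit depth),
        without ever expanding the (possibly exponentially long) word."""
        while True:
            gate = self.gates[g]
            if gate[0] == 'leaf':
                return gate[1]
            _, left, right = gate
            if k <= self.length[left]:
                g = left
            else:
                k -= self.length[left]
                g = right

# The word (ab)^4 via repeated squaring: a circuit of size O(log |w|).
s = SLP()
a, b = s.leaf('a'), s.leaf('b')
g = s.cat(a, b)
for _ in range(2):
    g = s.cat(g, g)
assert s.length[g] == 8
assert ''.join(s.letter_at(g, k) for k in range(1, 9)) == 'abababab'
```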
Remark 5.11. For 1 ≤ i ≤ m, we transform the Π∗ Σ circuit computing Pi into a multiplicative circuit Ci
computing a word wi of length D in {a1 , a2 , . . . , ar }D as follows: replace linear form L j in the Π∗ Σ circuit
by letter ak if L j is a scalar multiple of Lk , for k ∈ [r]. Clearly, Ci has the same number of multiplication
gates as the Π∗ Σ circuit for Pi .
The following lemma is immediate.
Lemma 5.12.
• The polynomials Pi and Pj are scalar multiples of each other if and only if the multiplicative circuits
Ci and C j compute the same word over {a1 , a2 , . . . , ar } (i. e., iff wi = w j ).
• Thus, given as input Π∗ Σ circuits computing polynomials Pi and Pj , we can check if Pi and Pj are
scalar multiples of each other in deterministic polynomial time.
Proof. The first part of the lemma follows directly from Proposition 5.9. The second part follows
immediately from Theorem 5.10.
Our aim is to efficiently determine a maximal linearly independent subset A of P_1, P_2, . . . , P_m, and to
express each P_i ∉ A as a linear combination of polynomials in A. The difficulty is that the P_i, which are
given by Π∗Σ circuits of size s, can have degree exponential in s.
Our approach, which circumvents this difficulty, crucially requires a rank bound due to Saxena and
Seshadhri [25], obtained in their study of commutative ΣΠΣ arithmetic circuits. We recall the Saxena–
Seshadhri result. Consider a ΣΠΣ arithmetic circuit C′, where the top Σ gate has fanin k, all Π gates are of
fanin D, and each Π gate computes a product Q_i, 1 ≤ i ≤ k, of homogeneous F-linear forms in the commuting
variables y_1, y_2, . . . , y_n. The circuit C′ is said to be simple if gcd(Q_1, Q_2, . . . , Q_k) = 1. The circuit C′ is said to
be minimal if for every nonempty proper subset S ⊂ [k], the sum ∑_{i∈S} Q_i ≠ 0.
Theorem 5.13 (Saxena, Seshadhri). Let C′ be a simple and minimal ΣΠΣ arithmetic circuit of the form

C′(y_1, y_2, . . . , y_n) = ∑_{i=1}^{k} ∏_{j=1}^{D} L_{ij},

where the L_{ij} are homogeneous linear forms in the commuting variables y_1, y_2, . . . , y_n. If the polynomial computed
by C′ is identically zero then the rank of the set {L_{ij} | 1 ≤ i ≤ k, 1 ≤ j ≤ D} of all F-linear forms occurring
at the bottom Σ layer of the circuit is bounded by O(k² log D).
This line of research, analyzing the rank of linear forms occurring in ΣΠΣ identities, was initiated by
Dvir and Shpilka [10].
Noncommutative to commutative

To apply the rank bound of Theorem 5.13, we make the polynomials P_i, 1 ≤ i ≤ ℓ, commutative and
set-multilinear in the variables {x_{im} | 1 ≤ i ≤ n, 1 ≤ m ≤ D} as follows.
For each linear form L_j = ∑_{i=1}^{n} γ_{ji} x_i, 1 ≤ j ≤ r, define D many linear forms:

L′_{jm} = ∑_{i=1}^{n} γ_{ji} x_{im}, 1 ≤ m ≤ D,

where the x_{im}, for 1 ≤ i ≤ n and 1 ≤ m ≤ D, are new distinct commuting variables. Define the polynomial P̂_i
from P_i as

P̂_i = ∏_{m=1}^{D} L″_{im},

where L″_{im} = L′_{jm} if L_{im} = L_j, j ∈ [r]. In other words, we obtain the set-multilinear polynomial P̂_i from P_i
by replacing the m-th linear form, say L_j, with L′_{jm}, for 1 ≤ m ≤ D.³
The following observation follows easily from the above definition of the polynomials P̂i .
Lemma 5.14. Consider the set-multilinear polynomials P̂_i, 1 ≤ i ≤ m, defined above. A linear form L′
divides the gcd of {P̂_i | i ∈ S} for a subset S ⊆ [m] if and only if L′ = L′_{jm} for some j ∈ [r] and m ∈ [D],
and the linear form L_j occurs in the m-th position of each P_i, i ∈ S.

Proof. If L′ divides the gcd of {P̂_i | i ∈ S}, it follows that L′ = L″_{im} for some m ∈ [D], for each i ∈ S. But
given i, L″_{im} = L′_{jm} for some j ∈ [r]. It follows that L_j must occur at the m-th position in each of the
noncommutative products P_i. The reverse implication is also immediate.
The next lemma explains the importance of Theorem 5.13 for the problem of computing a maximum
independent subset of {P_i | i ∈ [m]}.

Lemma 5.15. Suppose the polynomial P_i can be expressed as an F-linear combination of the polynomials
P_1, P_2, . . . , P_{i−1}. Let

S = { j ∈ [i − 1] | deg(gcd(P̂_i, P̂_j)) ≥ D − c · m^4 },

where c > 0 is some absolute constant. Then P_i can be expressed as a linear combination of polynomials
from the subset {P_j | j ∈ S}.

Proof. Suppose P_i is expressible as an F-linear combination of P_1, P_2, . . . , P_{i−1}. Let S ⊆ [i − 1] be a
minimal subset such that we can write

P_i = ∑_{j∈S} γ_j P_j, with γ_j ≠ 0 for all j ∈ S.

Now, consider the set-multilinear circuit C′ defined by the sum of products

P̂_i − ∑_{j∈S} γ_j P̂_j.
3 The conversion to the set-multilinear polynomial is only for the sake of analysis and not for the actual algorithm.
By the minimality of the subset S, the circuit C′ is minimal. Suppose for some j ∈ S, P_i and P_j disagree on ρ
positions, where ρ > ℓ^4. That is, for more than ℓ^4 positions m, the linear forms occurring in the m-th
position in P_i and P_j are different. Let the gcd of the polynomials in the set {P̂_j | j ∈ S} ∪ {P̂_i} be P, and
let deg(P) = δ. By Lemma 5.14 it follows that

δ ≤ D − ρ.

Define polynomials Q_i = P̂_i/P and Q_j = P̂_j/P, j ∈ S. Notice that each Q_i is a product of D − δ linear
forms. Furthermore,

P̂_i − ∑_{j∈S} γ_j P̂_j = P · (Q_i − ∑_{j∈S} γ_j Q_j).

Consider the simple and minimal circuit C″ defined by the sum

Q_i − ∑_{j∈S} γ_j Q_j.

Clearly, C′ is zero iff C″ is zero. Since C″ is a set-multilinear circuit, the rank of the set of all its F-linear
forms is at least D − δ (as linear forms L′_{jm} and L′_{j′m′} for m ≠ m′ have disjoint support). Now, if C″ ≡ 0
then by the Saxena–Seshadhri rank bound [25] we have ρ ≤ D − δ ≤ O(ℓ² log(D − δ)). Hence, it follows
from the inequality log₂ x ≤ x^{1/2} that ρ ≤ c · ℓ^4, for some constant c > 0. Thus, for each j ∈ S, P_i and P_j
can disagree on at most c · ℓ^4 positions. This proves the lemma.
Using Theorem 5.10 we can efficiently determine whether two polynomials Pi and Pj, given by Π∗Σ circuits, disagree in at most, say, q positions, where q is given in unary.
Lemma 5.16. Let Pi and Pj be noncommutative homogeneous degree-D polynomials in F⟨X⟩, given as Π∗Σ circuits of size at most s, where the first +-layer computes homogeneous linear forms L1, L2, ..., Lr, no two of which are scalar multiples of each other. Then in deterministic time polynomial in s, n, and q we can determine whether Pi and Pj differ in at most q positions, and if so, we can also determine those positions and the linear forms at those positions.
Proof. The proof is a simple application of Theorem 5.10. First, as explained in Remark 5.11, we obtain size-s multiplicative circuits Ci and Cj over the letters ai corresponding to the linear forms Li, 1 ≤ i ≤ r. Let wi and wj be the length-D words computed by Ci and Cj, respectively. Applying the algorithm of Theorem 5.10 we can check whether Ci and Cj compute identical words, or find the leftmost index k such that wi[k] ≠ wj[k]. By Theorem 5.10 we can also compute circuits C′i and C′j for the suffixes wi[k+1 ··· D] and wj[k+1 ··· D] of wi and wj starting at the (k+1)-th position, and again look for the leftmost index where they differ. Running this iteration at most q + 1 times determines whether Pi and Pj differ in at most q positions, and if so, also finds those positions and the linear forms at those locations.
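The iteration in the proof above can be illustrated on plain, uncompressed words; the actual algorithm performs the same leftmost-mismatch queries on words represented by compressed multiplicative circuits (Theorem 5.10), which is what makes it efficient even for exponential D. The sketch below, with hypothetical function names, works under that simplification.

```python
# Toy sketch of the iteration in Lemma 5.16 on explicit words.  The real
# algorithm asks the same leftmost-mismatch queries on words given in
# compressed form; here the words are plain Python strings.

def disagreements(w1, w2, q):
    """Return the positions where w1 and w2 differ, with the letters there,
    or None if they differ in more than q positions."""
    assert len(w1) == len(w2)
    diffs, start = [], 0
    while True:
        # leftmost mismatch of the current suffixes
        k = next((i for i in range(start, len(w1)) if w1[i] != w2[i]), None)
        if k is None:
            return diffs
        diffs.append((k, w1[k], w2[k]))
        if len(diffs) > q:
            return None        # more than q disagreements: give up
        start = k + 1          # continue on the suffix past position k

assert disagreements("abcde", "abxdy", 2) == [(2, "c", "x"), (4, "e", "y")]
assert disagreements("abcde", "xyzde", 2) is None
```

As in the proof, at most q + 1 leftmost-mismatch queries are issued before the procedure either collects all disagreement positions or reports that there are more than q of them.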
Before we design the algorithm for finding a maximum independent subset of {Pi | i ∈ [m]}, we need
an observation which is a simple consequence of the Raz-Shpilka deterministic polynomial-time PIT
algorithm [23] for noncommutative algebraic branching programs (see Definition 1.4).
Lemma 5.17. Let P0, P′1, P′2, ..., P′s be homogeneous degree-d noncommutative polynomials in F⟨X⟩, each given by an ABP as input. Then, in deterministic polynomial time we can determine whether P0 can be expressed as an F-linear combination of the P′i, 1 ≤ i ≤ s, and if so, we can compute scalars αi ∈ F such that

P0 = ∑_{i=1}^{s} αi P′i.
Proof. Let B be the input ABP for P0, and let Bi be the given ABP for P′i, 1 ≤ i ≤ s. Let X = {x1, x2, ..., xn}.
For each monomial m ∈ X^d, let vm ∈ F^s denote the vector of coefficients of m in the polynomials P′i, 1 ≤ i ≤ s. More precisely, vm[i] is the coefficient of m in P′i, 1 ≤ i ≤ s. Let cm denote the coefficient of m in the polynomial P0. Then we observe that

P0 = ∑_{i=1}^{s} αi P′i

if and only if for all monomials m ∈ X^d

cm = ∑_{i=1}^{s} vm[i] · αi.   (5.3)

Equation (5.3) does not directly give an efficient algorithm, as there are n^d many equations to contend with. However, exploiting the fact that the P′i are given as input by ABPs, we will show that we can efficiently reduce the number of equations to polynomially many, which can then be efficiently solved for the αi.
Let y and z be two fresh noncommuting variables. Consider the noncommutative polynomial P̂ = ∑_{i=1}^{s} y P′i z. It is homogeneous of degree d + 2 and is computed by an ABP B̂ obtained by combining the ABPs Bi, 1 ≤ i ≤ s, with a new start node and sink node as follows:
• Create a new start node s0 and introduce directed edges from s0 to the start node of Bi labeled by
variable y for 1 ≤ i ≤ s.
• Create a new sink node t0 and introduce directed edges from the sink node of Bi to t0 labeled by
variable z for 1 ≤ i ≤ s.
Clearly, B̂ computes the polynomial P̂.
Based on Raz and Shpilka's work [23], we now define a Raz-Shpilka basis for layer i of the ABP B̂. Let the number of nodes at layer i be ni and let {p1, p2, ..., pni} be the polynomials computed at these nodes. Consider the n^i × ni matrix Li whose rows are indexed by degree-i monomials and whose j-th column, 1 ≤ j ≤ ni, is the coefficient vector of pj. A Raz-Shpilka basis for layer i is a set of at most ni degree-i monomials such that the corresponding rows of Li form a basis for the row space of Li. Notice that every degree-i monomial is identified with a row vector of Li. Following Raz and Shpilka [23], as also elaborated subsequently [6, Theorem 3.4], we can compute a Raz-Shpilka basis for every layer of the given ABP B̂ in deterministic polynomial time. In particular, consider a Raz-Shpilka basis M for the (d + 1)-th layer of B̂ computed by the algorithm. The (d + 1)-th layer of B̂ consists of the s sink nodes of
the ABPs B1, B2, ..., Bs. Thus, M is a collection of at most s monomials. Without loss of generality, we can assume that M = {ym1, ym2, ..., yms}, where mi ∈ X^d, 1 ≤ i ≤ s.
Since M is a Raz-Shpilka basis, it follows that for any monomial m ∈ X^d the coefficient vector vm is in the linear span of {vmi | 1 ≤ i ≤ s}. Next, we can compute the coefficient cmi of the monomial mi in the polynomial P0 (from the input ABP B) in deterministic polynomial time [23, 6]. It follows that α1, α2, ..., αs satisfy Equation (5.3) if and only if they satisfy the following s linear equations:

cmj = ∑_{i=1}^{s} vmj[i] · αi,   1 ≤ j ≤ s.   (5.4)

Now, using Gaussian elimination, in polynomial time we can check the feasibility of Equation (5.4) and compute a solution.
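The final step, solving the s equations (5.4), can be sketched with exact rational arithmetic. In the sketch below the coefficient vectors are explicit stand-ins for those the Raz-Shpilka computation would supply, and the function name is hypothetical.

```python
# A minimal sketch of the last step of Lemma 5.17: given the vectors
# v_{m_j} and the coefficients c_{m_j}, solve the system (5.4) for alpha
# by Gaussian elimination over Q, or report infeasibility.

from fractions import Fraction

def solve_linear_system(V, c):
    """Solve sum_i V[j][i] * alpha[i] = c[j] for all j; return a solution
    vector or None if the system is infeasible."""
    rows = [[Fraction(x) for x in row] + [Fraction(cj)] for row, cj in zip(V, c)]
    n = len(V[0])
    pivots, r = [], 0
    for col in range(n):
        piv = next((i for i in range(r, len(rows)) if rows[i][col]), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        rows[r] = [x / rows[r][col] for x in rows[r]]        # normalize pivot row
        for i in range(len(rows)):
            if i != r and rows[i][col]:
                f = rows[i][col]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append((r, col))
        r += 1
    if any(row[-1] for row in rows[r:]):                     # 0 = nonzero: infeasible
        return None
    alpha = [Fraction(0)] * n
    for i, col in pivots:
        alpha[col] = rows[i][-1]
    return alpha

# Toy instance: P0 = 2*P'_1 - P'_2, with V[j][i] = coeff of m_j in P'_i.
V = [[1, 0], [0, 1], [1, 1]]
assert solve_linear_system(V, [2, -1, 1]) == [2, -1]
assert solve_linear_system(V, [2, -1, 5]) is None
```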
We are now ready to prove the result of this subsection.
Theorem 5.18. Given as input Π∗ Σ circuits computing noncommutative polynomials P1 , P2 , . . . , Pm ∈
FhXi, there is a deterministic polynomial-time algorithm that will find a maximal linearly independent
subset A of the polynomials Pi , 1 ≤ i ≤ m, and also express the others as F-linear combinations of the Pi
in A.
Proof. The algorithm for computing a maximal linearly independent subset A of P1 , P2 , . . . , Pm works as
follows: Include P1 in A. For i ≥ 2, include Pi in A iff it is linearly independent of {P1 , P2 , . . . , Pi−1 }.
Thus, the problem boils down to checking if Pi is linearly independent of P1 , P2 , . . . , Pi−1 . Following
Lemma 5.15, let

S = { j ∈ [i − 1] | deg(gcd(P̂i, P̂j)) ≥ D − c · m⁴ },

where D is the degree of each of the polynomials Pi, 1 ≤ i ≤ m, and c > 0 is some absolute constant. By Lemma 5.15 it suffices to check whether Pi is linearly independent of {Pj | j ∈ S}.
By Lemma 5.16 we can efficiently compute the subset S, and the subset I ⊂ [D] of at most c · m⁵ positions such that Pi agrees with every Pj, j ∈ S, in all positions in [D] \ I. Define P′i = ∏_{u∈I} Liu and P′j = ∏_{u∈I} Lju for all j ∈ S. The following claim is immediate; it follows from the fact that for every position u ∈ [D] \ I the linear forms Liu and Lju, j ∈ S, are all identical.
Claim 5.19. Pi is linearly independent of {Pj | j ∈ S} if and only if P′i is linearly independent of {P′j | j ∈ S}.
Thus it suffices to check whether P′i is linearly independent of {P′j | j ∈ S}. Each of these polynomials is a product of at most c · m⁵ linear forms. Since a product of linear forms can clearly be seen as an ABP, we can apply the algorithm described in Lemma 5.17 to check this in polynomial time.
To conclude the overall proof we note that Algorithm B below can be applied to determine the
leftmost maximal linearly independent subset A of the input polynomials P1 , . . . , Pm and also express the
others as linear combinations of polynomials in A.
Linear-Span
Input: Π∗Σ circuits for polynomials P1, P2, ..., Pi.
Output: Decide whether Pi can be expressed as a linear combination of {P1, P2, ..., Pi−1}; if so, find such an expression.
1. Let Li, i ∈ [r], be the linear forms computed at the bottom +-layer of the input circuits, no two of which are scalar multiples of each other.
2. Find the corresponding multiplicative circuits Cj, j ≤ i, over the letters ai, i ∈ [r], corresponding to the Li. Using Lemma 5.16, determine the set S ⊆ [i − 1] of indices j such that Ci and Cj differ in at most c · m⁴ positions, and find the letters occurring in those positions.
3. Let I ⊂ [D] be the set of at most c · m⁵ positions where Pi differs from some Pj, j ∈ S, and compute the polynomials P′i and P′j, j ∈ S, by dropping the linear forms occurring in all other positions.
4. Using Lemma 5.17, determine whether P′i can be expressed as a linear combination of the P′j, j ∈ S. If so, the same linear expression holds for Pi and the Pj, j ∈ S, and is the output.
Algorithm B
Finally, we describe Algorithm C, which is the basis finding algorithm.
Basis
Input: Π∗ Σ circuits for polynomials P1 , P2 , . . . , Pm .
Output: A maximum F-linearly independent subset A of {P1 , P2 , . . . , Pm }.
1. A := {P1 }.
2. For i = 2 to m do: If Pi is linearly independent of {P1 , P2 , . . . , Pi−1 } then A := A ∪ {Pi }. (This
is tested using Algorithm B).
3. Output A.
Algorithm C
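For illustration, the greedy scan of Algorithm C can be sketched on explicit coefficient vectors standing in for the polynomials Pi. In the algorithm proper the independence test is Algorithm B; here it is an exact rank computation over the rationals, and the function names are hypothetical. The greedy invariant is the same: the span of the kept set A equals the span of everything scanned so far, so testing against A is equivalent to testing against {P1, ..., Pi−1}.

```python
# Sketch of the greedy scan of Algorithm C on explicit vectors over Q.

from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors, by Gaussian elimination over Q."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        piv = next((i for i in range(r, len(rows)) if rows[i][col]), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(r + 1, len(rows)):
            if rows[i][col]:
                f = rows[i][col] / rows[r][col]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def greedy_basis(P):
    """Keep P[i] iff it is independent of the vectors kept so far; since
    span(kept) = span(P[0..i-1]) throughout, this matches Algorithm C."""
    A = []
    for v in P:
        if rank(A + [v]) > len(A):   # A is independent, so rank(A) == len(A)
            A.append(v)
    return A

assert greedy_basis([[1, 0, 0], [2, 0, 0], [0, 1, 0], [1, 1, 0]]) == [[1, 0, 0], [0, 1, 0]]
```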
The main result of this section is now a simple consequence of Lemma 5.8.
Proof of Theorem 2.5. Let the input +-regular circuit C be of size s and +-depth d. By Lemma 5.8, using
Algorithm A we can transform, in deterministic polynomial time, C into a +-regular circuit C0 of size
bounded by s and +-depth d − 1 such that C ≡ 0 if and only if C0 ≡ 0.
By repeated application we finally obtain a ΣΠ∗Σ circuit Ĉ of size at most s, for which we need to do polynomial identity testing. The polynomial P̂ computed by Ĉ is of the form

P̂ = ∑_{i=1}^{m} αi Pi,   (5.5)

where each Pi is computed by a Π∗Σ circuit of size at most s, and Pm = ∏_{j=1}^{D} Ljm, where each Ljm is from the set of linear forms {L1, L2, ..., Lt}, t ≤ s, occurring at the bottom +-layer of circuit C.
We can now apply Theorem 5.18 to find a maximal linearly independent subset {Pj | j ∈ S}, S ⊆ [m], of {P1, P2, ..., Pm}, and express each Pi, i ∉ S, as a linear combination of polynomials in {Pj | j ∈ S}. Substituting these expressions into Equation (5.5) and collecting coefficients we obtain

P̂ = ∑_{j∈S} βj Pj,

which, by the linear independence of {Pj | j ∈ S}, is identically zero iff each βj is zero.
6 Black-box randomized PIT for homogeneous ΣΠ∗ Σ
In this section we give a randomized black-box PIT algorithm for ΣΠ∗ Σ circuits.
By Theorem 2.5 we can test if a given homogeneous ΣΠ∗ Σ circuit (white-box) is identically zero in
deterministic polynomial time. However, suppose we have only black-box access to a ΣΠ∗ Σ circuit C
computing a polynomial in F⟨X⟩. In other words, we can evaluate C on square matrices Mi substituted for the xi, 1 ≤ i ≤ n, where the cost of an evaluation is the dimension of the Mi. The results of Section 5 exploit access to the input circuit C, and hence do not apply. Specifically, C may compute a noncommutative polynomial of exponential degree, and it is not clear whether C ≡ 0 can be tested by evaluating C on matrices of polynomial dimension. Nor is the black-box PIT of Section 4 applicable, since ΣΠ∗Σ circuits can compute polynomials with a double-exponential number of monomials.
Suppose C is an s-sum P1 + P2 + ··· + Ps of D-products of linear forms in the variables X. That is,

Pi = Li1 Li2 ··· LiD,

where D may be exponential. Then we show that C ≢ 0 iff C is nonzero on random O(s) × O(s) matrices with entries from F or a suitably large extension of F. The proof is based on the notion of projected polynomials defined below.
6.1 Projected polynomials
Definition 6.1. Let P ∈ F⟨X⟩ be a homogeneous degree-D polynomial. For an index set I ⊆ [D], the I-projection of the polynomial P is the polynomial PI obtained by keeping the variables occurring in the positions indexed by I noncommuting. In all other positions we make the variables commuting, replacing xi with a corresponding commuting variable zi, 1 ≤ i ≤ n. Thus, the I-projected polynomial PI lies in F[Z]⟨X⟩, and the (noncommutative) degree of PI, which is its degree in the X variables alone, is |I|.
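To make the definition concrete, here is a tiny sketch, with a hypothetical function name, that computes I-projections of a homogeneous polynomial stored as an explicit dictionary of monomials. The polynomials in this section are far too large for such a representation, so this is purely illustrative.

```python
# Sketch of Definition 6.1 on explicitly listed monomials: a homogeneous
# degree-D polynomial is a dict mapping a D-tuple of variable indices
# (one per position) to its coefficient.  The I-projection keeps the
# positions in I noncommuting (an ordered tuple) and turns every other
# position into a commuting z-variable (an unordered multiset).

def projection(poly, I):
    proj = {}
    for mono, coeff in poly.items():
        nc = tuple(mono[j] for j in sorted(I))                           # ordered part
        comm = tuple(sorted(mono[j] for j in range(len(mono)) if j not in I))
        key = (nc, comm)
        proj[key] = proj.get(key, 0) + coeff
    return {k: c for k, c in proj.items() if c}

# x0*x1 - x1*x0 is a nonzero noncommutative polynomial, but its fully
# commutative image (I = {}) vanishes; keeping just position 0
# noncommuting already yields a nonzero projection.
P = {(0, 1): 1, (1, 0): -1}
assert projection(P, set()) == {}
assert projection(P, {0}) == {((0,), (1,)): 1, ((1,), (0,)): -1}
```

The example also illustrates why Lemma 6.2 below is needed: making variables commute can cause cancellations, and the lemma guarantees a small index set I whose projection avoids them.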
Lemma 6.2. Let P1, P2, ..., Ps ∈ F⟨X⟩ each be a product of D homogeneous linear forms

Pi = Li,1 Li,2 ··· Li,D,

where {Li,j : 1 ≤ i ≤ s, 1 ≤ j ≤ D} are linear forms in F⟨X⟩. Then, given any scalars β1, β2, ..., βs ∈ F, there exists a subset I ⊆ [D] of size at most s − 1 such that

∑_{i=1}^{s} βi Pi = 0 iff ∑_{i=1}^{s} βi Pi,I = 0,

where Pi,I ∈ F[Z]⟨X⟩ is the I-projection of the polynomial Pi.
Proof. The proof is by induction on s. The lemma clearly holds for the base case s = 1, because P1,∅, which is a product of linear forms in F[Z], is nonzero iff P1 is nonzero.
By the induction hypothesis, we assume that an index set of size at most s − 2 exists for any set of at most s − 1 polynomials in F⟨X⟩, each of which is a product of D homogeneous linear forms. The forward implication is obvious, because making variables commuting can only facilitate cancellations. We prove the reverse implication.
Suppose ∑_{i=1}^{s} βi Pi ≠ 0 for βi ∈ F, 1 ≤ i ≤ s. Let j0 ∈ [D] be the least index, if it exists, such that rank{L1,j0, ..., Ls,j0} > 1. If no such index exists then the Pi are all scalar multiples of each other. Then ∑_{i=1}^{s} βi Pi = αP1 for some α ∈ F, which is zero if and only if αP1,∅ is zero (by the base case), proving the implication.
We can assume, by renumbering the polynomials, that {L1,j0, ..., Lt,j0} is a maximal linearly independent set in {L1,j0, ..., Ls,j0}, where t > 1. Then,

Pi = ci · P · Li,j0 Li,j0+1 ··· Li,D,   1 ≤ i ≤ t,

Pi = ci · P · ( ∑_{k=1}^{t} γk^{(i)} Lk,j0 ) Li,j0+1 ··· Li,D,   t + 1 ≤ i ≤ s,

where ci ∈ F for 1 ≤ i ≤ s, γk^{(i)} ∈ F for 1 ≤ k ≤ t and t + 1 ≤ i ≤ s, and P ∈ F⟨X⟩ is a product of homogeneous linear forms (or a scalar). For 1 ≤ i ≤ s, let

P′i = ci ∏_{j=j0+1}^{D} Li,j.
We can then write

∑_{i=1}^{s} βi Pi = P · ∑_{i=1}^{t} βi Li,j0 P′i + P · ∑_{i=t+1}^{s} βi Li,j0 P′i.

Note that

P · ∑_{i=t+1}^{s} βi Li,j0 P′i = P · ( ∑_{i=t+1}^{s} βi ∑_{k=1}^{t} γk^{(i)} Lk,j0 P′i ).

Now, by rearranging terms, we have:

∑_{i=1}^{s} βi Pi = P · ∑_{k=1}^{t} Lk,j0 P″k,   where   (6.1)

P″k = βk P′k + βt+1 γk^{(t+1)} P′t+1 + ··· + βs γk^{(s)} P′s,   for 1 ≤ k ≤ t.   (6.2)

Now, the Lk,j0, 1 ≤ k ≤ t, are linearly independent. Let A be an invertible linear transformation such that A : Lk,j0 ↦ xk, 1 ≤ k ≤ t. Applying Proposition 3.1, consider the map Aj0 applied to the polynomial ∑_{i=1}^{s} βi Pi. By Proposition 3.1, Aj0(∑_{i=1}^{s} βi Pi) ≠ 0. Now, noting that Aj0 applies A to position j0 of the polynomial ∑_{i=1}^{s} βi Pi, we have:

Aj0( ∑_{i=1}^{s} βi Pi ) = P · ∑_{k=1}^{t} xk P″k ≠ 0.

Thus, not all P″k are zero. Without loss of generality, suppose P″1 ≠ 0. Since t > 1, by Equation (6.2), P″1 is a sum of at most s − 1 polynomials, each of which is a product of homogeneous linear forms. Hence, by the induction hypothesis, there is an index set I′ ⊆ {j0 + 1, ..., D} of size at most s − 2 such that

P″1,I′ = β1 P′1,I′ + βt+1 γ1^{(t+1)} P′t+1,I′ + ··· + βs γ1^{(s)} P′s,I′ ≠ 0.

Let I = I′ ∪ {j0}. Now consider the polynomial ∑_{i=1}^{s} βi Pi,I, which we claim is nonzero. By Equation (6.1), note that

∑_{i=1}^{s} βi Pi,I = P̂ · ∑_{k=1}^{t} Lk,j0 P″k,I′,

where P̂ is the commutative polynomial obtained by replacing xi by zi in P. Since P̂ is a product of linear forms it is nonzero. Thus, it suffices to argue that ∑_{k=1}^{t} Lk,j0 P″k,I′ is nonzero. Notice that ∑_{k=1}^{t} Lk,j0 P″k,I′ is a homogeneous noncommutative polynomial of noncommutative degree |I| in F[Z]⟨X⟩. By Proposition 3.1, applying the linear transformation A to the first position (because the original position j0 is now the first position), we observe that A1(∑_{k=1}^{t} Lk,j0 P″k,I′) = ∑_{k=1}^{t} xk P″k,I′. This sum is zero if and only if each P″k,I′ is zero. However, P″1,I′ is nonzero. This completes the induction step and the proof.
[Figure 3: The transition diagram for the variable xi, 1 ≤ i ≤ n. The automaton has states q0, q1, ..., qf = qk; each state qj−1 carries a self-loop labeled zi ξj, and the edge from qj−1 to qj is labeled xij.]
6.2 The black-box identity test
We now prove Theorem 2.7 which will yield a black-box randomized polynomial-time identity testing
algorithm for ΣΠ∗ Σ circuits.
Proof of Theorem 2.7. Theorem 2.7 is an easy consequence of Lemma 6.2.
Let C = ∑_{i=1}^{s} βi Ri ∈ F⟨X⟩ be the polynomial computed by the black-box ΣΠ∗Σ circuit, where each Ri is a product of D homogeneous linear forms. By Lemma 6.2 there is a set I ⊆ [D] of size at most s − 1 such that C = ∑_{i=1}^{s} βi Ri = 0 if and only if C̃ = ∑_{i=1}^{s} βi Ri,I = 0. As in Section 4, we can use a small nondeterministic automaton to guess this subset I of positions and substitute suitable commuting variables at all positions in [D] \ I. It will turn out that the transition matrices of this automaton for each variable xi give us a matrix substitution at which C is nonzero. We now present the details.
Let |I| = k ≤ s − 1. Consider the following nondeterministic (k + 1)-state finite automaton A, whose transition diagram for xi, 1 ≤ i ≤ n, is depicted in Figure 3. For positions in [D] \ I, the automaton uses the block variables Z = {zi : 1 ≤ i ≤ n} and ξ = {ξi : 1 ≤ i ≤ k + 1}, which are commuting variables. Let I = {i1, i2, ..., ik}, listed in increasing order. For the variable xi occurring in position ij (i. e., the j-th position guessed to be in I) the automaton substitutes xi by xij, 1 ≤ i ≤ n. These index variables Z′ = {xij : 1 ≤ i ≤ n, 1 ≤ j ≤ k} are also commuting variables.
Remark 6.3. Notice that in Lemma 6.2 the variables occurring in positions in I were left noncommuting. However, the automaton we construct replaces xi in position j ∈ I by the commuting variable xij. For homogeneous polynomials this transformation is known to preserve identities, by Proposition 3.2.
Let

Ri = Li,1 ··· Li,D   for all i ∈ [s].

Let Mxi be the (k + 1) × (k + 1) matrix corresponding to the variable xi, 1 ≤ i ≤ n. When we apply this matrix substitution to the variables in Ri, the (0, k)-th entry of the resulting matrix MRi is

R̂i = ∑_{(j1, j2, ..., jk) ∈ [D]^k} P1,j1−1 Pj1+1,j2−1 ···

where

P1,j1−1 = ( ∏_{j=1}^{j1−1} Li,j(Z) ) ξ1^{j1−1} Li,j1(Z′),   Pj1+1,j2−1 = ( ∏_{j=j1+1}^{j2−1} Li,j(Z) ) ξ2^{j2−j1−1} Li,j2(Z′),

and so on. For each i ∈ [s], the polynomial R̂i ∈ F[Z, ξ, Z′]. The (0, k)-th entry of the resulting matrix MC is

∑_{i=1}^{s} βi R̂i = ∑_{J ∈ [D]^k} PJ ξJ,

where ξJ = ξ1^{j1−1} ξ2^{j2−j1−1} ··· ξk^{D−jk} and PJ = ∑_{i=1}^{s} βi Pi,J.
By Lemma 6.2, we know that PI = ∑_{i=1}^{s} βi Pi,I ≠ 0. Thus ∑_{i=1}^{s} βi R̂i is nonzero, as the monomial sets for different PJ are disjoint (ensured by the terms ξJ). The degree of ∑_{i=1}^{s} βi R̂i is D. So if |F| is larger than D, it cannot vanish everywhere on F. This completes the proof of Theorem 2.7.
The randomized identity testing algorithm now follows by a simple random substitution for the variables in the commutative polynomial computed at the (0, k)-th entry of the resulting matrix MC. This completes the proof of Corollary 2.8.
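The matrix substitution of Figure 3 can be sketched end to end on a toy instance. Everything below is illustrative, with hypothetical function names: the commuting variables zi, ξj and the index variables are replaced by small exact integers (standing in for random field elements), each linear form is mapped to the corresponding (k + 1) × (k + 1) matrix, and the test reads off the (0, k) entry of the evaluated sum.

```python
# Toy sketch of the matrix substitution behind Theorem 2.7, using exact
# integer arithmetic instead of a random substitution over a finite field.

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add_scaled(A, B, c):
    n = len(A)
    return [[A[i][j] + c * B[i][j] for j in range(n)] for i in range(n)]

def var_matrix(i, k, z, xi, idx):
    """Transition matrix M_{x_i} of Figure 3: state j has a self-loop
    labeled z[i]*xi[j], and the edge j -> j+1 carries the index value."""
    M = [[0] * (k + 1) for _ in range(k + 1)]
    for j in range(k + 1):
        M[j][j] = z[i] * xi[j]
        if j < k:
            M[j][j + 1] = idx[(i, j)]
    return M

def linear_form_matrix(L, var_mats, k):
    """M_L = sum_i L[i] * M_{x_i} for a linear form given by coefficients."""
    M = [[0] * (k + 1) for _ in range(k + 1)]
    for i, c in L.items():
        M = mat_add_scaled(M, var_mats[i], c)
    return M

def evaluate(terms, n, k, z, xi, idx):
    """terms: list of (beta_i, [L_{i,1}, ..., L_{i,D}]); returns the
    (0, k) entry of sum_i beta_i * prod_j M_{L_{i,j}}."""
    var_mats = {i: var_matrix(i, k, z, xi, idx) for i in range(n)}
    total = 0
    for beta, forms in terms:
        M = [[int(a == b) for b in range(k + 1)] for a in range(k + 1)]  # identity
        for L in forms:
            M = mat_mul(M, linear_form_matrix(L, var_mats, k))
        total += beta * M[0][k]
    return total

# Example: n = 2, D = 3, s = 2, so k = s - 1 = 1 and the matrices are 2x2.
# R1 = x0 (x0 + x1) x1 and R2 = x0 x0 x1, hence R1 - R2 = x0 x1 x1 != 0.
x0, x1 = {0: 1}, {1: 1}
z, xi = {0: 2, 1: 3}, {0: 5, 1: 7}
idx = {(0, 0): 11, (1, 0): 13}
R1 = [x0, {0: 1, 1: 1}, x1]
R2 = [x0, x0, x1]
assert evaluate([(1, R1), (-1, R1)], 2, 1, z, xi, idx) == 0   # identical sum vanishes
assert evaluate([(1, R1), (-1, R2)], 2, 1, z, xi, idx) != 0   # nonzero sum is detected
```

In the algorithm proper the substituted values are chosen at random from F (or a large enough extension), and the nonzero case is detected with high probability by the Polynomial Identity Lemma.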
7 Conclusion
The Amitsur-Levitzki theorem is a cornerstone result in the theory of polynomial identities [22]. Our result that the dimension of the non-vanishing matrix algebra is at most logarithmic in the sparsity of the polynomial should be interesting even from a purely algebraic perspective.
The main open problem is to extend our technique to solve the identity testing problem for all
noncommutative circuits in randomized polynomial time (even in the white-box model). Our result for
+-regular circuits is a first step towards that. Finding a randomized black-box identity testing algorithm
for +-regular circuits is an interesting problem. We have such a result only for depth-three +-regular
circuits.
Acknowledgements
We are grateful to the referees for their detailed comments and suggestions which have helped us improve
the presentation. We also thank the anonymous STOC 2017 referees for their valuable comments. We
thank Markus Lohrey, Amir Shpilka, and Srikanth Srinivasan for their comments on an earlier version
of this work. Finally, we thank the editor-in-chief, László Babai, for pointing out Muller’s paper [18],
suggesting the name “Polynomial Identity Lemma,” and numerous other editorial comments.
References
[1] Manindra Agrawal, Rohit Gurjar, Arpita Korwar, and Nitin Saxena: Hitting-sets for ROABP and sum of set-multilinear circuits. SIAM J. Comput., 44(3):669–697, 2015. [doi:10.1137/140975103, arXiv:1406.7535]
[2] Noga Alon and Zoltán Füredi: Covering the cube by affine hyperplanes. Eur. J. Comb., 14:79–83, 1993. [doi:10.1006/eujc.1993.1011]
[3] Avraham Shimshon Amitsur and Jacob Levitzki: Minimal identities for algebras. Proc. Amer. Math. Soc., 1:449–463, 1950. [doi:10.2307/2032312]
[4] Vikraman Arvind, Pushkar S. Joglekar, Partha Mukhopadhyay, and S. Raja: Randomized polynomial time identity testing for noncommutative circuits. In Proc. 49th STOC, pp. 831–841. ACM Press, 2017. [doi:10.1145/3055399.3055442]
[5] Vikraman Arvind, Pushkar S. Joglekar, and Gaurav Rattan: On the complexity of noncommutative polynomial factorization. Inform. and Comput., 262(1):22–39, 2018. Preliminary version in MFCS'15. [doi:10.1016/j.ic.2018.05.009, arXiv:1501.00671]
[6] Vikraman Arvind, Partha Mukhopadhyay, and Srikanth Srinivasan: New results on noncommutative and commutative polynomial identity testing. Comput. Complexity, 19(4):521–558, 2010. Preliminary version in CCC'08. [doi:10.1007/s00037-010-0299-8, arXiv:0801.0514]
[7] Anurag Bishnoi, Pete L. Clark, Aditya Potukuchi, and John Schmitt: On zeros of a polynomial in a finite grid. Combin. Probab. Comput., 27(3):310–333, 2018. [doi:10.1017/S0963548317000566, arXiv:1508.06020]
[8] Andrej Bogdanov and Hoeteck Wee: More on noncommutative polynomial identity testing. In Proc. 20th IEEE Conf. on Computational Complexity (CCC'05), pp. 92–99. IEEE Comp. Soc. Press, 2005. [doi:10.1109/CCC.2005.13]
[9] Richard A. DeMillo and Richard J. Lipton: A probabilistic remark on algebraic program testing. Inform. Process. Lett., 7(4):193–195, 1978. [doi:10.1016/0020-0190(78)90067-4]
[10] Zeev Dvir and Amir Shpilka: Locally decodable codes with two queries and polynomial identity testing for depth 3 circuits. SIAM J. Comput., 36(5):1404–1434, 2007. Preliminary version in STOC'05. [doi:10.1137/05063605X]
[11] Michael A. Forbes and Amir Shpilka: Quasipolynomial-time identity testing of non-commutative and read-once oblivious algebraic branching programs. In Proc. 54th FOCS, pp. 243–252. IEEE Comp. Soc. Press, 2013. [doi:10.1109/FOCS.2013.34, arXiv:1209.2408]
[12] Laurent Hyafil: The power of commutativity. In Proc. 18th FOCS, pp. 171–174. IEEE Comp. Soc. Press, 1977. [doi:10.1109/SFCS.1977.31]
[13] Neeraj Kayal, Chandan Saha, and Ramprasad Saptharishi: A super-polynomial lower bound for regular arithmetic formulas. In Proc. 46th STOC, pp. 146–153. ACM Press, 2014. [doi:10.1145/2591796.2591847]
[14] Rudolf Lidl and Harald Niederreiter: Finite Fields. Encyclopedia of Mathematics and its Applications. Cambridge University Press, 2nd edition, 1996. [doi:10.1017/CBO9780511525926]
[15] Richard J. Lipton: The curious history of the Schwartz–Zippel Lemma. https://rjlipton.wordpress.com/2009/11/30/the-curious-history-of-the-schwartz-zippel-lemma/, 2009.
[16] Markus Lohrey: Equality testing of compressed strings. In Proc. 10th Int. Conf. on Combinatorics on Words (WORDS'15), pp. 14–26. Springer, 2015. [doi:10.1007/978-3-319-23660-5_2]
[17] Kurt Mehlhorn, R. Sundar, and Christian Uhrig: Maintaining dynamic sequences under equality tests in polylogarithmic time. Algorithmica, 17(2):183–198, 1997. Preliminary version in SODA'94. [doi:10.1007/BF02522825]
[18] David E. Muller: Application of boolean algebra to switching circuit design and to error detection. Trans. Inst. Radio Engineers Professional Group on Electronic Computers, EC-3:6–12, 1954. [doi:10.1109/IREPGELC.1954.6499441]
[19] Noam Nisan: Lower bounds for non-commutative computation (extended abstract). In Proc. 23rd STOC, pp. 410–418. ACM Press, 1991. [doi:10.1145/103418.103462]
[20] Øystein Ore: Über höhere Kongruenzen. Norsk Mat. Forenings Skrifter Ser. I, 7(15):27, 1922.
[21] Wojciech Plandowski: Testing equivalence of morphisms on context-free languages. In Proc. 2nd Europ. Symp. on Algorithms (ESA'94), pp. 460–470. Springer, 1994. [doi:10.1007/BFb0049431]
[22] Claudio Procesi: On the theorem of Amitsur–Levitzki. Israel J. Math., 207(1):151–154, 2015. [doi:10.1007/s11856-014-1118-8, arXiv:1308.2421]
[23] Ran Raz and Amir Shpilka: Deterministic polynomial identity testing in non-commutative models. Comput. Complexity, 14(1):1–19, 2005. Preliminary version in CCC'04. [doi:10.1007/s00037-005-0188-8]
[24] Irving S. Reed: A class of multiple-error-correcting codes and the decoding scheme. Transactions of the IRE Professional Group on Information Theory, 4(4):38–49, 1955. [doi:10.1109/TIT.1954.1057465]
[25] Nitin Saxena and C. Seshadhri: From Sylvester–Gallai configurations to rank bounds: Improved blackbox identity test for depth-3 circuits. J. ACM, 60(5):33:1–33:33, 2013. Preliminary version in FOCS'10. [doi:10.1145/2528403, arXiv:1002.0145]
[26] Wolfgang M. Schmidt: Equations over Finite Fields: An Elementary Approach. Volume 536 of Lecture Notes in Math. Springer, 1976. [doi:10.1007/bfb0080437]
[27] Jacob T. Schwartz: Fast probabilistic algorithms for verification of polynomial identities. J. ACM, 27(4):701–717, 1980. Preliminary version in EUROSAM'79. [doi:10.1145/322217.322225]
[28] Richard E. Zippel: Probabilistic algorithms for sparse polynomials. In Symbolic and Algebraic Computation (EUROSAM'79), volume 72 of LNCS, pp. 216–226. Springer, 1979. [doi:10.1007/3-540-09519-5_73]
AUTHORS

Vikraman Arvind
Professor
Institute of Mathematical Sciences (HBNI)
Chennai, India
arvind imsc res in
www.imsc.res.in/~arvind

Pushkar S. Joglekar
Assistant professor
Vishwakarma Institute of Technology
Pune, India
joglekar pushkar gmail com

Partha Mukhopadhyay
Associate professor
Chennai Mathematical Institute
Chennai, India
partham cmi ac in
www.cmi.ac.in/~partham

S. Raja
Assistant professor
Indian Institute of Technology Tirupati
Tirupati, India
raja iittp ac in
www.iittp.ac.in/dr-s-raja
ABOUT THE AUTHORS

VIKRAMAN ARVIND, called Arvind by friends and colleagues, graduated from the Indian Institute of Technology, Kanpur, India in 1988. His advisor was Somenath Biswas. He is currently professor at The Institute of Mathematical Sciences, Chennai, India. His research interests are broadly in computational complexity and algorithms, and especially in problems of an algebraic flavor. He likes reading; his favorite authors include G. K. Chesterton and Ray Bradbury, and he enjoys watching movies.
PUSHKAR JOGLEKAR is currently an assistant professor at Vishwakarma Institute of Technology, Pune, India. He received his Ph. D. in Theoretical Computer Science from the Institute of Mathematical Sciences, Chennai, India in 2011 under the supervision of V. Arvind. Hobbies: reading, mathematical puzzle solving, gardening.

PARTHA MUKHOPADHYAY graduated from the Institute of Mathematical Sciences, Chennai, India in 2009. His advisor was V. Arvind. Currently he is an associate professor at the Chennai Mathematical Institute, India. His research interests are mainly in algorithmic problems in the intersection of computational complexity and algebra. He enjoys reading, long-distance running, and playing table tennis.

S. RAJA graduated from the Institute of Mathematical Sciences, Chennai, India in 2017. His advisor was V. Arvind. His current research interests lie in arithmetic circuit complexity. He is presently an assistant professor in the Computer Science and Engineering department at IIT Tirupati.