DOKK Library

A Non-linear Time Lower Bound for Boolean Branching Programs

Author: Miklós Ajtai

License: CC-BY-ND-2.0
                                 Theory of Computing, Volume 1 (2005), pp. 149–176
                                             http://theoryofcomputing.org




             A Non-linear Time Lower Bound for
                Boolean Branching Programs
                                                                 Miklós Ajtai
                       Received: October 6, 2004; revised: May 5, 2005; published: October 5, 2005.




        Abstract: We give an exponential lower bound for the size of any linear-time Boolean
        branching program computing an explicitly given function. More precisely, we prove that
        for all positive integers k and for all sufficiently small ε > 0, if n is sufficiently large then
        there is no Boolean (or 2-way) branching program of size less than 2^{εn} which, for all inputs
        X ⊆ {0, 1, . . ., n − 1}, computes in time kn the parity of the number of elements of the set
        of all pairs ⟨x, y⟩ with the property x ∈ X, y ∈ X, x < y, x + y ∈ X.
            For the proof of this fact we show that if A = (a_{i,j})_{i,j=0}^{n−1} is a random n by n matrix over
        the field with 2 elements with the condition that “A is constant on each minor diagonal,”
        then with high probability the rank of each δn by δn submatrix of A is at least cδ |log δ|^{−2} n,
        where c > 0 is an absolute constant and n is sufficiently large with respect to δ .
            (A preliminary version of this paper has appeared in the Proceedings of the 40th IEEE
        Symposium on Foundations of Computer Science.)




ACM Classification: F.2.2, F.2.3
AMS Classification: 68Q17, 68Q15
Key words and phrases: complexity theory, lower bounds, space complexity, branching programs,
Hankel matrices, matrix rigidity


Authors retain copyright to their work and grant Theory of Computing unlimited rights
to publish the work electronically and in hard copy. Use of the work is permitted as
long as the author(s) and the journal are properly acknowledged. For the detailed
copyright statement, see http://theoryofcomputing.org/copyright.html.

© 2005 Miklós Ajtai                                                                    DOI: 10.4086/toc.2005.v001a008

1     Introduction
1.1     The statement of the main result
A Boolean (or 2-way) branching program is a finite directed acyclic graph (which may contain parallel
edges) with a unique source node, so that each non-sink node is labeled by one of the input variables
x0 , . . ., xn−1 , each non-sink node has outdegree two, each edge is labeled by an element of {0, 1} so that
the two outgoing edges of a non-sink node always get different labels, and each sink-node is labeled
by an element of {0, 1}. If an input is given we start from the unique source node and go along a path
according to the following rule. If we are at node v and the label of v is the variable xi then we leave v on
the unique outgoing edge whose label is the value of xi . This path will end in a sink node; the label of
the sink node is the output of the program at the given input, the length of the path is the computational
time at the given input, and the maximal length of a path in the graph that we may get from an input this
way is the length (or depth) of the branching program. The number of nodes in the graph is the size of
the branching program.
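As an illustration of the evaluation rule just described, here is a minimal Python sketch; the tuple encoding of the nodes is a hypothetical one chosen for this example, not part of the paper.

```python
# Toy evaluator for a Boolean branching program. Hypothetical encoding:
# a non-sink node is ("var", i, succ0, succ1) -- read variable x_i and move
# to node succ0 or succ1 according to its value; a sink node is ("out", b).
# program[0] is the unique source node.

def evaluate(program, x):
    """Run the branching program on the input bits x; return (output, time)."""
    node = program[0]
    time = 0                              # number of edges traversed
    while node[0] == "var":
        _, i, succ0, succ1 = node
        node = program[succ1 if x[i] else succ0]
        time += 1
    return node[1], time                  # label of the sink node reached

# A 5-node program computing x0 XOR x1:
xor = [("var", 0, 1, 2), ("var", 1, 3, 4), ("var", 1, 4, 3),
       ("out", 0), ("out", 1)]
assert evaluate(xor, [1, 0]) == (1, 2)
```

In the terminology above, this example program has size 5 and length (depth) 2.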
      This model describes a very general way of computing where the computational time measures the
number of accesses to the individual bits of the input and the size measures the number of different
states of the machine performing the computations. We do not measure the computational time needed
to determine the next state of our machine (that is, the next node in the graph along the path). We may
also think about this model as a random access machine whose input registers contain a single input bit,
with a working memory containing log2 M bits where M is the size of the branching program.
      Our goal is to give an explicit function which cannot be computed with a Boolean branching program
in linear time if the size of the branching program is at most 2^{εn}. The function has to assign to each {0, 1}-
sequence of length n a single {0, 1}-value. We identify the set of all {0, 1}-sequences of length n
with the set of all subsets of {0, 1, . . ., n − 1}, that is, our function f will assign to each subset X of
{0, 1, . . ., n − 1} a {0, 1}-value. For such a set X let f (X) be the parity of the number of elements of the
set of all pairs ⟨x, y⟩ with the property x ∈ X, y ∈ X, x < y, and x + y ∈ X. We will say that f (X) is the
parity of interior sums for the set X, where the expression “interior” refers to the fact that both the terms
in the sum and the sum itself must be in X.
      Our main result is that the parity of interior sums for a subset of {0, 1, . . ., n − 1} cannot be computed
with a Boolean branching program in linear time if the size of the branching program is at most 2^{εn}
(see Theorem 3.4).
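For concreteness, the parity of interior sums can be computed directly from its definition; a brute-force Python sketch (the theorem concerns the branching-program model, not the efficiency of this algorithm):

```python
def parity_of_interior_sums(X):
    """f(X): the parity of #{ (x, y) : x in X, y in X, x < y, x + y in X }."""
    S = set(X)
    count = sum(1 for x in S for y in S if x < y and x + y in S)
    return count % 2

# For X = {1, 2, 3} the only interior pair is (1, 2), since 1 + 2 = 3 is in X,
# so f(X) = 1.
assert parity_of_interior_sums({1, 2, 3}) == 1
```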

1.2     The history of the problem and related results
1.2.1    Boolean and R-way branching programs
One of the main goals of complexity theory is to describe explicitly given functions which cannot be
computed in certain computational models with a specified amount of resources. Branching programs
form one of the most general models of computation; e.g., random access machines with memory M
and running time t can be described by branching programs of size 2^M and time (depth) t. Because of
the generality and simplicity of the model and its mentioned close connection to the practical random
access models of computation, finding lower bounds for explicitly given functions in the branching pro-
gram model in general, and with the given values of parameters in particular (linear time and sublinear
memory) has always been considered a question of great importance.


     A generalization of the Boolean branching programs is the computational model of R-way branching
programs. Since most of the results that we use in our proofs were established for R-way branching
programs, we briefly describe this model. The computation is done in the same way, along a path of a
directed graph as in the Boolean case, with the following differences. A set Γ with R elements is given;
the elements of Γ are the possible values of the input variables. Each non-sink node has outdegree R, and
the outgoing edges are labeled by the elements of Γ so that the R outgoing edges have R different labels.
Each sink node is labeled by an element of Γ. If an input is given, that is, we assign an element of Γ to
each of the input variables x1 , . . ., xn , then we follow a path of the graph in the following way. We start
at the unique source node. If we are at node v and the label of node v is the variable xi then we leave v
on the unique outgoing edge whose label is the element of Γ assigned to the variable xi . This path will
end in a sink node. The element of Γ which is the label of this sink node is the output of the R-way
branching program. The value of R, in the results most interesting for us, is n^c where c is a constant.
(This corresponds to the random access machines where each register can contain c log2 n bits.)

1.2.2   Branching programs with many output bits, and the time segmentation method
The computational model of R-way branching programs was introduced by Borodin and Cook [8], who
proved a time-space trade-off for sorting n integers. This work also introduced a method for proving
lower bounds about R-way branching programs in the special case where the number of output bits is
relatively large compared to the time allowed for the computation. Several other lower bounds and time-
space trade-offs of similar nature were given, see e. g. Abrahamson [1, 2], Beame [5], Karchmer [11],
Reisch and Schnitger [12], and Yesha [14]. These lower bound proofs have a common high-level struc-
ture: the time is cut into short intervals, and we use the fact that during such an interval any
information that we can use about the past must be contained in the limited memory at the beginning
of the interval. In particular, if many output bits are produced in a single time interval, then these may
depend only on those input values which are accessed during this time interval and the content of the
memory at the beginning of the interval.

1.2.3   Lower bounds for explicit functions and decision problems
Using the same high level proof structure, and other new ideas, Beame, Saks, and Jayram¹ [6] gave
a lower bound on the computational time for an explicitly given function with a Boolean branching
program of size 2^{o(n)}. Namely, they proved that there is an ε > 0 so that the question whether the
quadratic form σ^T Qσ is zero (where σ is the input, a {0, 1}-vector of length n, and Q is the n × n
Sylvester matrix over the field with three elements) cannot be decided with a branching program of
length (1 + ε)n and of size 2^{o(n)}. (The proof shows that the theorem holds for ε = .0178.) This is the
best previously known lower bound in the direction of our main result. In the same paper they gave a
nonlinear lower bound on the length of an R-way branching program computing an explicitly defined
function, similar to the function they used in the Boolean case. More precisely they prove that for
all k there is an rk so that for all sufficiently large n there is an (explicitly given) 0-1 valued function
g(x1 , . . ., xn ) of n variables such that: (a) each variable takes its values from a set of size rk and (b)
there is no rk -way branching program of size n^c which computes g(x1 , . . ., xn ) in depth kn.
   1 T.S. Jayram, formerly Jayram S. Thathachar




    The author of the present paper proved [4] that the element distinctness problem (where each “ele-
ment” is the value of a variable) cannot be decided with an R-way branching program, for R = 2^{c log2 n},
in length linear in n if the size of the program is at most 2^{εn}, provided that c ≥ 2. (If the problem is to
find two elements whose Hamming distance is smaller than (1/4) c log2 n then for a similar lower bound on
the length the necessary restriction on the size is only 2^{εn log2 n}.) These proofs are based on the analysis
of certain combinatorial properties of the input, which are very similar to the combinatorial properties
used in [6]. The high level structure of the proof still follows the time segmentation idea described
earlier. Since the present proof uses some of the technical lemmata of [4], we will give later a more
detailed description of its techniques. At this point we sketch only some of the basic ideas of the proof
in [4]. Since these are also related to the proof methods of [6], this will show the additional ideas that
are needed to make the time segmentation method work when the number of output bits is small.




1.2.4   Lower bounds for binary functions and relations

It is shown in [4] that if a function f can be computed in linear time with the given restrictions on the
size then there are two large disjoint subsets, W1 ,W2 , of the set of the input variables and an input χ
with the following properties. For each i = 1, 2 we may change the input χ in many different ways by
changing the values of the variables in Wi only, so that the output does not change. Moreover, for i = 1, 2
we can select a large set of changes Yi so that even if we perform a change from Y1 (on the values of
the variables in W1 ) and another one from Y2 (on the values of the variables in W2 ) simultaneously, the
output remains unchanged.
      In the case of the element distinctness problem we are able to choose an input χ which meets these
requirements with the additional property that χ (which is a list of n integers) consists of pairwise distinct
integers. Therefore, if our branching program solves the element distinctness problem, its output is, say,
1. However, we can prove that for a fixed i = 1, 2 the inputs that we get from χ through the changes in
Yi take more than n^2/2 different values on the set of variables Wi . Therefore there will be an integer x so
that for both i = 1, 2, x will be a value of a variable from Wi if we perform a suitable change on χ from Yi .
Consequently, performing both suitable changes simultaneously, we get an input which contains x twice
and still the unchanged output is 1. This contradicts the assumption that the program solves the element
distinctness problem.
    Similar ideas are used for the other relations or functions in the mentioned lower bounds. In each
case we need a function F(x, y) or a relation R(x, y) with two variables so that if we can separately
change x and y in many different ways then among these changes there will be two so that performing
them simultaneously we are able to change the value of F(x, y) or R(x, y). If R is the equality predicate
then, as we have seen, this can be guaranteed if both sets of changes produce at least n^2/2 different input
values. In the case of the Hamming distance problem described above the situation is even better since
n^2/2 can be replaced by n^{1−ε′} for some small constant ε′ > 0 (see [4]). (This is the reason that the
proof of the lower bound for the Hamming distance problem is much simpler and gives a stronger result
than the proof for the element distinctness problem.)


1.2.5   Quadratic forms and rigidity

The binary function F(x, y) can also be defined by a quadratic form x^T By. Assume that, as before, x and y
independently run over large sets and now we want to guarantee that x^T By is not constant. Motivated
by similar considerations, quadratic forms were studied by Borodin, Razborov, and Smolensky [9],
Jayram [13], and Beame, Saks, and Thathachar [6]. The result in this direction that we will use in
our paper is the following. (This was proved in more general forms in [9], [13], and [6], and also
follows from the results of [10].) Suppose that the rank of the matrix B is r and x resp. y take
values independently from m1 - resp. m2 -dimensional subspaces of an m-dimensional vector space. If
m1 + m2 + r > 2m then the quadratic form x^T By is not constant.
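This criterion can be sanity-checked by brute force over F2, the field used later in the paper. The toy instance below (m = 4, B the identity so r = 4, subspaces of dimensions m1 = 3 and m2 = 2, hence m1 + m2 + r = 9 > 2m = 8) is a hypothetical example, not taken from [6] or [9]:

```python
# Vectors over F_2 are encoded as bitmasks; B is given by its rows (bitmasks).

def parity(v):
    return bin(v).count("1") % 2

def bilinear(B, x, y):
    """x^T B y over F_2: sum (mod 2) of (By)_i over the coordinates with x_i = 1."""
    return sum(parity(B[i] & y) for i in range(len(B)) if (x >> i) & 1) % 2

def span(basis):
    """All linear combinations over F_2 of the given basis vectors."""
    vecs = {0}
    for b in basis:
        vecs |= {v ^ b for v in vecs}
    return vecs

m = 4
B = [1 << i for i in range(m)]           # the identity matrix, rank r = 4
U1 = span([0b0001, 0b0010, 0b0100])      # an m1 = 3 dimensional subspace
U2 = span([0b0110, 0b1000])              # an m2 = 2 dimensional subspace
# m1 + m2 + r = 9 > 2m = 8, so x^T B y must take both values:
assert {bilinear(B, x, y) for x in U1 for y in U2} == {0, 1}
```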
      As we described earlier, in the lower bound proofs we can usually guarantee only that x and y
take values independently in some limited sense, namely we can apply independent changes to two
disjoint sets of variables. In [6] the mentioned property of the quadratic forms is applied in the following
way. The lower bound is proved for a function of the form x^T Ay where A is a suitably chosen, explicitly
given, matrix over a finite field. This matrix has the property that each δn × δn submatrix which does
not contain elements from the main diagonal is of rank at least δ^2 n where δ > 0 is a small constant.
(This may be considered as a rigidity property of the matrix A.) The input variables of the branching
program take values from the field F. We pick two large sets of independent changes on two disjoint
sets of variables of at least δ n elements. The submatrix of A formed from the corresponding rows and
columns will be the matrix B in the mentioned property of quadratic forms. This way it is possible to
guarantee that, roughly speaking, under independent changes on these sets of variables, the quadratic
form cannot remain constant, which makes the lower bound proof possible.
      In the present paper we will follow the same strategy for our proof with the following improvements.
We use two-way branching programs with a single output bit which creates three new problems. (1) It is
more difficult to prove the existence of the two disjoint sets of variables which admit many independent
changes that leave the output of the branching program unchanged. For this we use the machinery
worked out in [4]. (2) The explicitly given matrix of [6] is a Sylvester matrix over the field F and so the
size of the field must be at least n. We need something similar for F2 , the field with two elements. We
do not give an explicit construction for such a matrix, but a random construction which depends only
on a linear number of random bits which can be included in the input. (3) It is not enough for us if the
ranks of the δn × δn submatrices are δ^2 n; we need much larger ranks: what we prove will be
δ |log δ|^{−2} n.


1.2.6   Summary of the history of the problem

Summarizing the historical developments about the lower bound techniques for branching programs, we
can say that there were two parallel developments. The first is the time segmentation method which later
was supplemented by the technique of considering changes of the values of variables on two disjoint
sets: [8],[6],[4]. The second is the development of the algebraic techniques about quadratic forms based
on matrices with rigidity properties, providing explicitly defined functions which were suitable for the
lower bound proof techniques mentioned in the first direction of developments: [9],[13],[6]. The present
paper uses the techniques of both of these directions.


1.3   Subsequent developments
A preliminary version of this paper was published in [3] containing all of the essential elements of the
proofs presented here. Since then, the main result of this paper was further improved by Beame, Saks,
Sun, and Vee in [7] by making the time/space lower bounds sharper and generalizing the theorem for
the case of probabilistic branching programs. Their proofs use the results and techniques of the present
paper (together with methods of different nature).


2     Overview of the proof
2.1   A lower bound for a nonexplicit function
Our proof in the present paper uses a technical lemma of the element distinctness result. As we have
mentioned already in the introduction, it is shown in [4] that if a function f can be computed in linear
time with the given restrictions on the size then there are two large disjoint subsets, W1 ,W2 , of the set of
the input variables and an input χ so that for each i = 1, 2 we may change the input χ in many different
ways by changing only the values of the variables in Wi so that the output does not change; moreover
these changes can be performed simultaneously on W1 and W2 so that the output still does not change.
The ratio between the sizes of the sets Wi and the logarithm of the number of changes has a crucial
importance in the proofs of the present paper. (A precise statement of this result is given in Lemma 3.5
below.)
     We use this result to show that a quadratic form (which is not given explicitly) cannot be computed
in linear time. The algebraic part of this proof (Lemma 3.11) is a theorem proved by Borodin, Razborov,
and Smolensky [9] (and in more general forms by Jayram [13] and Beame, Saks, and Jayram [6]). We
reduce the problem of giving a quadratic form with the required properties to a question about the ranks
of the submatrices (or minors) of the matrix generating the quadratic form in a similar way as is done
in [6]. In both cases the goal is to get a matrix A so that each ⌊δn⌋ by ⌊δn⌋ submatrix of the matrix A
has rank at least ψ(δ)n, for each δ > 0, provided that n is sufficiently large with respect to δ, where
the function ψ should be as large as possible. The Sylvester matrices used in [6] are explicitly given
examples of such matrices with ψ(δ) = δ^2 , provided that we consider only submatrices that do not
contain any elements of the main diagonal. (This restriction does not affect the applicability of the
matrix to the lower bound proof.)

2.2   Decreasing the randomness needed
Definition 2.1. We will call an n × n matrix A = (ai, j ) a Hankel matrix if ∀i, j, k, l ∈ {0, 1, . . ., n − 1}, i +
j = k + l implies ai, j = ak,l . In other words A is a Hankel matrix iff it is constant across minor diagonals.

Remark 2.2. A Hankel matrix is determined by only 2n − 1 suitably chosen entries, e.g. by entries of
the first row and last column. If a matrix is constant along all diagonals it is called a Toeplitz matrix.
Reversing the ordering of the rows creates a one-to-one correspondence between Toeplitz matrices and
Hankel matrices. Therefore all of our results concerning the ranks of Hankel matrices remain valid for
Toeplitz matrices as well.
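A Hankel matrix is thus determined by a vector y of 2n − 1 bits via a_{i,j} = y_{i+j}, the parametrization A(y) used later in Section 2.3. A small Python sketch of this correspondence:

```python
def hankel(y):
    """The n-by-n Hankel matrix with entries a[i][j] = y[i + j],
    determined by the 2n - 1 values y[0], ..., y[2n - 2]."""
    n = (len(y) + 1) // 2
    return [[y[i + j] for j in range(n)] for i in range(n)]

A = hankel([1, 0, 1, 1, 0])                       # n = 3
# A is constant on each minor diagonal: a[i][j] depends only on i + j.
assert all(A[i][j] == A[k][l]
           for i in range(3) for j in range(3)
           for k in range(3) for l in range(3) if i + j == k + l)
# Reversing the order of the rows yields a Toeplitz matrix
# (constant along the ordinary diagonals).
T = A[::-1]
assert all(T[i][j] == T[i + 1][j + 1] for i in range(2) for j in range(2))
```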


     We show that if A is a random n by n Hankel matrix over the field with 2 elements, with uniform
distribution on the set of all such matrices, then with high probability the described property about the
ranks of the submatrices holds with ψ(δ) = cδ |log δ|^{−2} for an absolute constant c > 0. As a conse-
quence, using also the mentioned lemma from [4], we are able to show that if Ã is the matrix that we
get from A by replacing each entry in the main diagonal and above by 0, then the quadratic form ⟨Ãx, x⟩,
where x is the input vector, cannot be computed with a branching program of linear length and size at
most 2^{εn}.



2.3   From a non-explicit function to an explicit function

Of course this is not an explicitly given function; we only know that the lower bound holds for almost all
matrices. However, we got the matrix by randomizing only 2n − 1 bits. Therefore if we include these bits
in the input, then we get an explicitly given problem with 3n − 1 input variables, for which the described
tradeoff holds between the length and size of any branching program computing the quadratic form.
In other words, if A(y), y = ⟨y0 , . . ., y2n−2 ⟩, denotes the Hankel matrix with a_{i,j} = y_{i+j} , then ⟨Ã(y)x, x⟩
cannot be computed in the given length and size from the input ⟨x, y⟩.



2.4   Obtaining a lower bound for the parity of interior sums problem

Assume now that A = (a_{i,j}) is a fixed Hankel matrix so that ⟨Ãx, x⟩ cannot be computed with a branching
program with the given restrictions. Suppose that x = ⟨x0 , . . ., xn−1 ⟩ and X = {i | xi = 1}, and


                                D = {i + j | a_{i,j} = 1, i, j ∈ {0, . . ., n − 1}} .


It is easy to see that ⟨Ãx, x⟩ is the parity of the number of all pairs ⟨i, j⟩, i ∈ X, j ∈ X, with the property
i < j and i + j ∈ D.
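This identity is easy to confirm by exhaustive search on small instances; a Python sketch (the quadratic form is computed here directly from the strictly lower-triangular part of A):

```python
from itertools import product

def check_identity(n, y):
    """For the Hankel matrix a[i][j] = y[i+j], verify over all inputs x that
    the quadratic form given by the strictly lower-triangular part of A equals
    the parity of #{(i, j) : i, j in X, i < j, i + j in D}."""
    D = {s for s in range(2 * n - 1) if y[s] == 1}
    for x in product((0, 1), repeat=n):
        X = {i for i in range(n) if x[i] == 1}
        # <Ã x, x>: sum a[i][j] x_i x_j over the entries below the main diagonal
        lhs = sum(y[i + j] * x[i] * x[j] for i in range(n) for j in range(i)) % 2
        rhs = sum(1 for i in X for j in X if i < j and i + j in D) % 2
        assert lhs == rhs
    return True

assert check_identity(3, [1, 0, 1, 1, 0])
```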
     This will already imply that if two subsets X, Y of the set {1, 2, . . ., 2n} are given, then the problem
of computing the parity of the number of elements of the set of all pairs ⟨i, j⟩ with the property i ∈ X,
 j ∈ X, i < j, i + j ∈ Y cannot be solved by a branching program of linear length and of size at most
2^{εn}. (The set D defined above will play the role of Y.) It will not be difficult to make a single set from
the two sets X, Y, by taking into account the sizes of their elements, and so we will get that the task of
computing, given as input an X ⊆ {1, 2, . . ., n}, the parity of the number of elements of the set of all pairs
⟨i, j⟩ with the property i ∈ X, j ∈ X, i < j, i + j ∈ X cannot be accomplished by a branching program of
linear length and of size at most 2^{εn}.
   Finally we note that our results about random Hankel matrices remain true over any field with ap-
propriate modifications. (See the remarks after Lemma 4.5, Lemma 4.7 and Lemma 4.9. See also the
comment about the applicability of these modified versions to generalizations of Theorem 3.4 in the
proof of Lemma 3.11.)


3     The reduction of the lower bound to a problem about Hankel matrices
3.1     The statement of Theorem 3.4
In this section we reduce the problem of giving a lower bound for the time needed to solve the problem
described in the introduction to the existence of a matrix A which can be constructed from n bits with
the property that each large submatrix of A also has relatively large rank.

Definition 3.1. If X,Y are sets then Func(X,Y ) will denote the set of all functions, defined on X, taking
values in Y .

    The branching program that we define below is what is usually called a (deterministic) Boolean
or 2-way branching program, indicating that the input variables take their values from a set of size 2.

Definition 3.2. A branching program B with n input variables x0 , . . ., xn−1 is a five-tuple

                                       ⟨G, start, out, var, val⟩ ,

with the following properties

 (a). G is a finite directed acyclic graph, which may contain parallel edges
 (b). start is the unique source node of G,
 (c). var is a function defined on the non-sink nodes of G with values in the set {x0 , . . ., xn−1 } of
      variables,
 (d). out is a function defined on the set of sink nodes of G with values in {0, 1},
 (e). val is a function defined on the set of edges with values in {0, 1},
    (f). each non-sink node has out-degree 2, and the function val takes different values on the two
         outgoing edges.

    An input for the branching program B is a {0, 1}-assignment of the variables xi . (Instead of such an
assignment we usually will think about an input as a {0, 1}-valued function η defined on {0, 1, . . ., n − 1}
where η(i) is the value of xi .) If an input is given, then starting from start we go along a path in the
graph in the following way. When we are at a non-sink node v then we look at the value of the variable
var(v) and leave the node along the edge e where the value of val(e) is the same as the value of this
variable. Since the graph is acyclic and finite, this way we will reach a sink-node w. out(w) will be the
output of the branching program at the given input. The number of edges along the path determined this
way by the input is the computational time of the branching program at the given input. The maximal
computational time for the set of all inputs (that is, the maximal length of all paths arising from an input
in the given way) is the length of the branching program. The size of the branching program is the
number of nodes of G.

Definition 3.3. Assume that X is a subset of {0, . . ., n − 1}. N+ (X) will denote the number of all pairs
x, y ∈ X, x < y so that x + y ∈ X.


   The following theorem is the main result of the present paper. It states that the parity of the interior
sums of a subset of {0, 1, . . ., n − 1} cannot be determined by a branching program of size 2^{εn} in linear
time.

Theorem 3.4. For all positive integers k, if ε > 0 is sufficiently small and n is sufficiently large then
there is no branching program B with n inputs, of length at most kn and of size at most 2^{εn}, which for
all inputs η computes the parity of N+ (Xη ) where Xη = {i ∈ {0, 1, . . ., n − 1} | η(i) = 1}.


3.2    Results from earlier works
In the proof we will use the following lemma, Lemma 3.5, which is a consequence of Lemma A1 proved
in [4] (called Lemma 9 in that paper). The proof of Lemma 3.5 from Lemma A1 is almost identical to
the proof of Theorem 4 from Lemma 9 in [4] and does not require any new ideas. We describe this proof
(of Lemma 3.5 from Lemma A1) in the last section.
    The remaining part of the paper, starting with Lemma 3.7, is self-contained. We begin with the
definitions needed to understand the statement of Lemma 3.5.

Definitions.

   1. An input (of a branching problem with n input variables) is a function χ defined on {0, 1, . . ., n−1}
      with values in {0, 1}. A partial input is a function η defined on a subset of {0, 1, . . ., n − 1} with
      values in {0, 1}.
   2. Assume that χ is an input and η is a partial input. Then χ ◦ η will denote the input which is
      identical to η on domain(η) and identical to χ on domain(χ) \ domain(η).
   3. If δ ∈ {0, 1} and B is a branching program, then B^{−1}(δ) will denote the set of all inputs η so that
      the output of B at input η is δ .

Lemma 3.5. For all positive integers k, if σ1 > 0 is sufficiently small with respect to k, σ2 > 0 is
sufficiently small with respect to σ1 , ε > 0 is sufficiently small with respect to σ2 , n is sufficiently large
with respect to ε, B is a branching program with n inputs of length at most kn and of size at most 2^{εn},
and δ ∈ {0, 1} is so that |B^{−1}(δ)| ≥ 2^{n−1}, then there exist a χ ∈ B^{−1}(δ), λ ∈ (σ2 , σ1 ), µ ∈ (σ2 , σ1 ),
Wi ⊆ {0, 1, . . ., n − 1}, i = 1, 2, and sets of partial inputs Yi , i = 1, 2, defined on Wi satisfying the following
conditions:

(1). for all i ∈ W1 and j ∈ W2 we have i < j,

(2). |W1 | = |W2 | = µn,

(3). |Y1 |, |Y2 | ≥ 2^{µn−λn} ,

(4). µ^{1+1/(100k)} ≥ 2λ , and

(5). for all η1 ∈ Y1 , η2 ∈ Y2 , we have (χ ◦ η1 ) ◦ η2 ∈ B^{−1}(δ).


3.3     Branching programs and matrix rigidity
Definition 3.6. Assume that A is an n by n matrix over the field F and f is a real-valued function defined
on (0, 1]. We say that the matrix A is f -rigid if for each q = 1, . . ., n and for each q by q submatrix B of
A we have that the rank of B is at least f (q/n) · n.
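Verifying f-rigidity outright means examining exponentially many submatrices, but the underlying primitive, rank over F2, is plain Gaussian elimination. A sketch with rows encoded as integer bitmasks (a hypothetical encoding, bit j standing for column j):

```python
def rank_gf2(rows):
    """The rank over F_2 of a matrix whose rows are given as integer bitmasks."""
    pivots = {}                          # leading-bit position -> reduced row
    rank = 0
    for row in rows:
        while row:
            lead = row.bit_length() - 1  # position of the leading 1
            if lead in pivots:
                row ^= pivots[lead]      # eliminate the leading 1 and continue
            else:
                pivots[lead] = row       # a new pivot: the row is independent
                rank += 1
                break
    return rank

rows = [0b101, 0b110, 0b011]             # a 3-by-3 matrix over F_2
assert rank_gf2(rows) == 2               # row0 ^ row1 == row2
```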

      The proof of Theorem 3.4 is based on Lemma 3.7 and Lemma 3.9 described below.

Lemma 3.7. There is a δ > 0 so that, for all γ > 0, if the function g(x) is defined by g(x) = δx |log x|^{−2}
if x ∈ (γ, 1/2) and g(x) = 0 otherwise, then for each sufficiently large positive integer n there is an n by n
Hankel matrix A over F2 , so that A is g-rigid.

Remark 3.8. It would be much easier to prove the lemma with g(x) = δx^2 , though this is not enough
for the present application.

    We will prove Lemma 3.7 in the next section; more precisely, we will prove (Theorem 4.2) that a
random matrix A taken with uniform distribution on the set of all Hankel matrices meets the requirements
of the Lemma with high probability.

Definitions.

   1. Assume that η is a function with values in {0, 1} defined on {0, 1, . . ., n − 1}. uη will denote the
      n-dimensional vector ⟨η(0), . . ., η(n − 1)⟩.
   2. The inner product of the n-dimensional vectors u, v will be denoted by u · v.
   3. Assume that A = (a_{i,j})_{i,j=0}^{n−1} is an n by n matrix. Ã will denote the n by n matrix that we get
      from A by keeping every entry of A below the main diagonal and replacing all other entries by 0.
      In other words Ã = (b_{i,j})_{i,j=0}^{n−1} , where b_{i,j} = a_{i,j} for all i > j and b_{i,j} = 0 for all i ≤ j.

Lemma 3.9. For all positive integers k, if σ1 > 0 is sufficiently small with respect to k, σ2 > 0 is
sufficiently small with respect to σ1 , ε > 0 is sufficiently small with respect to σ2 , and n is sufficiently
large with respect to ε, then the following holds. Assume that the function f is defined on (0, 1] by
f (x) = x^{1+1/(100k)} if x ∈ (σ2 , σ1 ) and f (x) = 0 otherwise. If A is an f -rigid n by n matrix over F2 then
there is no branching program B with n inputs of length at most kn and of size at most 2^{εn} which, for all
inputs η, computes Ãuη · uη .

Remark 3.10. We use the matrix à instead of A in the expression Ãuη · uη at the conclusion of the
lemma, since over a field of characteristic 2 and for a symmetric matrix A, almost all of the terms of
Auη · uη will have 0 coefficients.
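The degeneration mentioned in the remark can be verified in one line; a short computation (ours, under the paper's conventions):

```latex
Au\cdot u \;=\; \sum_i a_{ii}u_i^2 \;+\; \sum_{i<j}\,(a_{ij}+a_{ji})\,u_i u_j
         \;\equiv\; \sum_i a_{ii}u_i \pmod 2 ,
```

since a_{ij} = a_{ji} and u_i^2 = u_i over F_2. Thus Au · u collapses to a linear function of u and carries no information about the off-diagonal entries, while Ãu · u = ∑_{i>j} a_{ij} u_i u_j retains exactly one copy of each cross term.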

Proof of Lemma 3.9. Assume that, contrary to our statement, there is a branching program B with the
given properties which computes Ãu_η · u_η. We apply Lemma 3.5 with the given values of k, σ_1, σ_2, ε, n
and with the given B. According to Lemma 3.5 there exist δ ∈ {0, 1}, χ ∈ B^{−1}(δ), λ, µ ∈ (σ_2, σ_1),
and W_i, Y_i, i = 1, 2, with the properties listed in Lemma 3.5. Let v = ⟨v_0, . . ., v_{n−1}⟩ be an n-dimensional
vector over F_2 defined in the following way. For all i ∉ W_1 ∪ W_2 let v_i = χ(i) and for all i ∈ W_1 ∪ W_2

Theory of Computing, Volume 1 (2005), pp. 149–176

let v_i = 0. Recall that for i = 1, 2, Y_i is a set of functions from W_i to {0, 1}. For all ξ ∈ Y_1 ∪ Y_2 we
define a vector w^{(ξ)} = ⟨w_0^{(ξ)}, . . ., w_{n−1}^{(ξ)}⟩: if i ∈ domain(ξ) then w_i^{(ξ)} = ξ(i), and if
i ∉ domain(ξ) then w_i^{(ξ)} = 0.
Let g_i be the following function defined on Y_i: for all ξ ∈ Y_i, g_i(ξ) = Ã(v + w^{(ξ)}) · (v + w^{(ξ)}). Since the
functions g_i take at most two different values, there are Y_i′ ⊆ Y_i so that |Y_i′| ≥ (1/2)|Y_i| and g_i is constant on
Y_i′ for i = 1, 2. Assume now that ξ_1 ∈ Y_1′, ξ_2 ∈ Y_2′ and let η = (χ ◦ ξ_1) ◦ ξ_2. By Lemma 3.5, η ∈ H and
therefore
therefore

                    Ãuχ · uχ = Ãuη · uη = Ã(v + w(ξ1 ) + w(ξ2 ) ) · (v + w(ξ1 ) + w(ξ2 ) )
                              = −Ãv · v + g1 (ξ1 ) + g2 (ξ2 ) + Ãw(ξ1 ) · w(ξ2 ) + Ãw(ξ2 ) · w(ξ1 ) .

Ãu_χ · u_χ and Ãv · v do not depend on the choices of ξ_1, ξ_2. By the definition of Y_1′ and Y_2′, g_1(ξ_1) + g_2(ξ_2)
is constant on Y_1′ × Y_2′. These facts imply that Ãw^{(ξ_1)} · w^{(ξ_2)} + Ãw^{(ξ_2)} · w^{(ξ_1)}, as a function of ξ_1, ξ_2, is also
constant on Y_1′ × Y_2′. Condition (1) and the definition of Ã imply that Ãw^{(ξ_2)} · w^{(ξ_1)} is identically 0 on
Y_1′ × Y_2′; therefore Ãw^{(ξ_1)} · w^{(ξ_2)} is constant on Y_1′ × Y_2′. Let V_0 be the vector space of all F_2-valued functions
defined on {0, . . ., n − 1}, and let V_i, i = 1, 2, be the subspace of functions that vanish outside W_i. The
dimension of V_i is µn. We may assume that Y_i, Y_i′ ⊆ V_i. Let ι_1 be the natural embedding of V_1 into V_0
and let π_2 be the orthogonal projection of V_0 onto V_2. B will be the linear map of V_1 into V_2 defined by
Bx = π_2 Ã ι_1 x. For all ξ_1 ∈ V_1, ξ_2 ∈ V_2 we have Ãw^{(ξ_1)} · w^{(ξ_2)} = Bξ_1 · ξ_2. If we fix the bases in both V_1 and
V_2 which consist of those functions which take the value 1 at exactly one point and 0 everywhere else,
then the matrix of B is a submatrix of Ã consisting of those entries whose column numbers are in W_1 and
row numbers are in W_2. By Condition (1) this submatrix of Ã is identical to the corresponding submatrix
of A. Therefore, by the f-rigidity of A, the rank of B is at least µ^{1+1/(100k)} n. We apply Lemma 3.11 (below)
with V_1, V_2, m → µn, X → Y_1′, Y → Y_2′ and B. Condition (3) implies that

                                              |Y_i′| ≥ (1/2)|Y_i| ≥ 2^{µn − λn − 1} .

Therefore, according to Lemma 3.11, the fact that Bξ_1 · ξ_2 is constant on Y_1′ × Y_2′ implies that
2(µn − λn − 1) + µ^{1+1/(100k)} n ≤ 2µn. This is however impossible since, by Condition (4), µ^{1+1/(100k)} > 2λ
and n is sufficiently large.

    The following lemma, in more general forms, is proved in [9], [13], [6], and also follows from the
results of [10]. To make the paper more self-contained we provide a proof.

Lemma 3.11. Assume that V_1, V_2 are m-dimensional vector spaces over the field F_2, X ⊆ V_1, Y ⊆ V_2,
|X| ≥ 2^{m_1}, |Y| ≥ 2^{m_2}, and B is a linear map of V_1 into V_2 so that the rank of B is at least r. If
m_1 + m_2 + r > 2m then the function Bx · y, x ∈ X, y ∈ Y, is not constant on X × Y.

Proof of Lemma 3.11. Let x_0 be an arbitrary but fixed element of X and let X′ = {x − x_0 | x ∈ X}. Clearly
|X| = |X′|, and if Bx · y is constant on X × Y then Bx · y is identically 0 on X′ × Y. Therefore it is
enough to prove that the assumptions of the lemma imply that Bx · y is not identically 0 on X′ × Y.
Assume that, contrary to our assertion, it is identically 0. Let H be the subspace in V_1 generated by
X′ and G be the subspace in V_2 generated by Y. We have BH · G = 0, that is, the subspaces BH and
G are orthogonal. Therefore dim(BH) + dim(G) ≤ m, where dim(W) denotes the dimension of the
subspace W. Since the rank of B is at least r, we have that dim(BH) ≥ dim(H) − (m − r). We have


dim(H) − (m − r) + dim(G) ≤ m. The lower bounds on the sizes of the sets X, Y imply the following lower
bounds on the dimensions of the subspaces generated by them: dim(H) ≥ m_1, dim(G) ≥ m_2. This simply
follows from the fact that a d-dimensional subspace has 2^d elements. We get m_1 − (m − r) + m_2 ≤ m,
that is, m_1 + m_2 + r ≤ 2m, in contradiction to our assumption.
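Lemma 3.11 is easy to confirm experimentally for small m. The sketch below (Python; vectors over F_2 are packed into integer bitmasks, a representation of our choosing) checks that whenever m_1 + m_2 + r > 2m, the form Bx · y is indeed non-constant:

```python
import random

def gf2_rank(rows):
    """Rank over F_2 of a matrix whose rows are packed into integers."""
    basis = []
    for r in rows:
        for b in basis:
            r = min(r, r ^ b)
        if r:
            basis.append(r)
    return len(basis)

def apply_map(B, x):
    """Bx for a linear map given by its rows (bitmasks)."""
    return sum(1 << i for i, row in enumerate(B) if bin(row & x).count("1") % 2)

def dot(u, v):
    return bin(u & v).count("1") % 2

random.seed(1)
m, m1, m2 = 4, 3, 3
for _ in range(200):
    B = [random.randrange(1 << m) for _ in range(m)]
    r = gf2_rank(B)
    X = random.sample(range(1 << m), 1 << m1)   # |X| = 2^{m1}
    Y = random.sample(range(1 << m), 1 << m2)   # |Y| = 2^{m2}
    if m1 + m2 + r > 2 * m:                     # hypothesis of Lemma 3.11
        values = {dot(apply_map(B, x), y) for x in X for y in Y}
        assert values == {0, 1}                 # Bx . y is not constant on X x Y
```

The `gf2_rank` routine is the standard XOR-basis computation; it reappears whenever ranks of F_2-matrices are needed below.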

Remark 3.12. The lower bounds on dim(H) and dim(G) remain true even if the field has characteristic
different from 2, but we assume that the elements of X and Y have only {0, 1} coefficients in a suitably
chosen basis of V1 and V2 . See Lemma 7 of [13]. This is important for the generalization of Theorem 3.4
for fields with other characteristics.

3.4   The proof of the main result
Proof of Theorem 3.4. Assume that, contrary to our assertion, there is a branching program B with the
given parameters which computes the parity of N_+(X). Let m = ⌊n/10⌋. We apply Lemma 3.9 with n → m,
k → ck, where c is a sufficiently large absolute constant, and ε → ε/2. Assume that σ_1, σ_2 are picked with
the properties described in the lemma.
    Let g be the function defined in Lemma 3.7. Applying Lemma 3.7 with n → m, γ → σ_2 we get that
there is an m by m g-rigid matrix A = (a_{i,j}) over F_2. If σ_1 is sufficiently small with respect to δ, then A
will be f-rigid as well. Therefore by Lemma 3.9 there is no branching program of size at most 2^{(ε/2)m} which
computes Ãu_ζ · u_ζ in time ckm for all ζ, where ζ is an F_2-valued function defined on {0, 1, . . ., m − 1}.
Let D = {i + j | ai, j = 1} and

                                 Xζ = {i ∈ {0, 1, . . ., m − 1} | ζ (i) = 1} .

For any pair of sets of integers X, Z let N+ (X, Z) be the number of pairs x, y, x < y so that x ∈ X, y ∈ X
and x + y ∈ Z. The statement of Lemma 3.9 in our case is that the parity of N+ (Xζ , D) cannot be decided
by a branching program with the given restrictions on its parameters. We show that this problem can be
reduced to the problem of determining the parity of N+ (Xη ) for a suitably chosen η ∈ Func(n, 2) in a
way which can be implemented by a linear-time branching program. Therefore our indirect hypothesis
will contradict Lemma 3.9. The function η is defined in the following way.
    We define first two sets U1 ,U2 . U1 = 2m + Xζ , U2 = 4m + D. Let η be the unique element of
Func({0, 1, . . ., n − 1}, {0, 1}) so that Xη = U1 ∪ U2 . Clearly, if x, y ∈ Xζ , x < y, and x + y ∈ D then
2m + x ∈ X_η, 2m + y ∈ X_η, 2m + x < 2m + y, and (2m + x) + (2m + y) ∈ X_η. Conversely, assume that
z, w ∈ X_η, z < w, and z + w ∈ X_η. It is easy to see that this implies z, w ∈ {2m, . . ., 3m − 1} and therefore
z − 2m, w − 2m ∈ X_ζ, z − 2m < w − 2m, and (z − 2m) + (w − 2m) ∈ D. Therefore N_+(X_ζ, D) = N_+(X_η).
We claim that each value of η can be computed in constant time by a branching program, and to do
this the size of our program must be increased only by a factor of two since the extra memory needed
for this step is only one bit. Indeed, assume that we want to determine the value of η(i) for some
i ∈ {0, 1, . . . , n − 1}. If i ∈ {2m, . . ., 3m − 1}, the program decides whether i ∈ U_1 by checking whether
ζ(i − 2m) = 1; if the answer is no then η(i) = 0, and if it is yes then η(i) = 1. If i ∈ {4m, . . ., 6m − 2},
the program has to decide whether i ∈ U_2; since D is fixed (it is determined by the matrix A), this can
be decided by checking whether i − 4m ∈ D, and again the answer determines η(i). For all other values
of i we have η(i) = 0. Therefore we have reduced the problem of determining the parity of N_+(X_ζ, D) to
the problem of determining the parity of N_+(X_η).
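The identity N_+(X_ζ, D) = N_+(X_η) behind this reduction can be checked mechanically. A small sketch (Python; the helper name `nplus` is ours):

```python
import random

def nplus(X, Z):
    """N+(X, Z): the number of pairs x < y from X with x + y in Z."""
    xs = sorted(X)
    return sum(1 for i, x in enumerate(xs) for y in xs[i + 1:] if x + y in Z)

random.seed(2)
m = 10
for _ in range(200):
    X_zeta = {i for i in range(m) if random.random() < 0.5}
    D = {s for s in range(2 * m - 1) if random.random() < 0.5}
    # the embedding used in the proof: U1 = 2m + X_zeta, U2 = 4m + D
    X_eta = {2 * m + x for x in X_zeta} | {4 * m + d for d in D}
    assert nplus(X_zeta, D) == nplus(X_eta, X_eta)
```

The shifts by 2m and 4m keep the three kinds of pair sums (inside U_1, across U_1 and U_2, inside U_2) in disjoint ranges, which is exactly why only the intended pairs survive.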



4     Random Hankel matrices
4.1     The statement of the result
In this section we show that with a positive probability all large submatrices of a random Hankel matrix
have relatively large ranks.

Definition 4.1. The field with q elements will be denoted by Fq .

Theorem 4.2. There exists a c_1 > 0 so that, for all c_2 > 0, if n is sufficiently large then the following
holds: Assume that A = (a_{i,j}), i = 0, . . ., n − 1, j = 0, . . ., n − 1, is a random n by n Hankel matrix over
F_2, taken with uniform distribution on the set of all such matrices. Then with probability greater than
1/2, A has the following property:

(6). Suppose S = {s_0, . . ., s_{q−1}}, T = {t_0, . . ., t_{q−1}} are subsets of {0, . . ., n − 1} with q elements, where
c_2 n < q < n/2, and B_{S,T} = (a_{s_i, t_j}), i = 0, . . ., q − 1, j = 0, . . ., q − 1, is the submatrix of A consisting of those
entries whose row numbers are in S and column numbers are in T. Then the rank of B_{S,T} is at least
c_1 |log(q/n)|^{−2} q.

4.2     Sketch of the proof
4.2.1    A natural but unsuccessful attempt
The most natural way to prove the statement of the theorem would be the following. Assume that
the sets S, T are fixed. We give an upper bound M on the probability p_{S,T} of the event that, for the
randomization of A, the matrix B_{S,T} defined for the fixed sets S and T has rank smaller than c_1 |log(q/n)|^{−2} q.
If M multiplied by the number of choices for the pair ⟨S, T⟩ is smaller than 1/2 then the assertion of the
theorem clearly holds.
    Unfortunately a proof of this type cannot work. Indeed, if q = cn then the number of pairs ⟨S, T⟩ is
about 2^{2c(log(1/c))n}. On the other hand, for a fixed pair S, T, in the worst case, the number of minor diagonals
of A intersected by S × T can be as small as 2cn − 1. Each of the choices of 0s and 1s in A on these
diagonals is equally probable, so the probability that we get rank smaller than c_1 |log(q/n)|^{−2} q is at least
2^{−2cn+1}. (It is not 0 since, e.g., the 0 matrix has such a small rank.) Since the absolute value of the
exponent in the number of pairs is greater by a factor of |log(1/c)| than in the upper bound M, the product
cannot be smaller than 1/2 if c is a small constant.
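The exponent comparison in this counting argument can be checked numerically; a short sketch (Python, with illustrative parameter values of our choosing):

```python
import math

n = 10_000
for c in (0.01, 0.02, 0.05):
    q = int(c * n)
    # log2 of the number of pairs <S, T>: C(n, q)^2, about 2^{2 c log(1/c) n}
    log2_pairs = 2 * math.log2(math.comb(n, q))
    # log2 of the worst-case lower bound on p_{S,T}: only 2q - 1 minor
    # diagonals may meet S x T, and e.g. the all-zero choice gives low rank
    log2_prob = -(2 * q - 1)
    # the union bound "number of pairs x probability" is useless:
    # its logarithm is positive, so the product exceeds 1
    assert log2_pairs + log2_prob > 0
```

The extra factor log(1/c) in the first exponent is exactly what the denser family 𝒮′ of the next subsection is designed to eliminate.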

4.2.2    Reducing the number of relevant submatrices
The main problem with the argument described above was that the number of pairs ⟨S, T⟩ is too large
compared to the number of relevant minor diagonals. Let 𝒮 be the set of these pairs, that is, the set of
all pairs ⟨S, T⟩ so that S, T ⊆ {0, 1, . . ., n − 1} and |S| = |T| = q. We will be able to avoid the mentioned
difficulty in the following way. Instead of working with the elements of 𝒮, we will consider a smaller set
𝒮′ consisting of pairs ⟨S′, T′⟩ so that |S′| ≤ q, |T′| ≤ q and with the property that for all ⟨S, T⟩ ∈ 𝒮 there
is a ⟨S′, T′⟩ ∈ 𝒮′ so that S′ ⊆ S and T′ ⊆ T. (We will refer to this property by saying that 𝒮′ is dense in
𝒮.)


    It is enough to show that for all ⟨S′, T′⟩ ∈ 𝒮′ the rank of B_{S′,T′} is at least c_1 |log(q/n)|^{−2} q. Indeed,
since 𝒮′ is dense in 𝒮, each matrix B_{S,T} with ⟨S, T⟩ ∈ 𝒮 has a submatrix B_{S′,T′} with ⟨S′, T′⟩ ∈ 𝒮′, and so
rank(B_{S,T}) ≥ rank(B_{S′,T′}) ≥ c_1 |log(q/n)|^{−2} q.
    We will define the set 𝒮′ by constructing a function F defined on 𝒮 so that for each ⟨S, T⟩ ∈ 𝒮,
F(⟨S, T⟩) = ⟨S′, T′⟩ with S′ ⊆ S, T′ ⊆ T. Clearly if such an F is given and 𝒮′ = {F(⟨S, T⟩) | ⟨S, T⟩ ∈ 𝒮},
then 𝒮′ is dense in 𝒮. We also have to make sure that |𝒮′| is small and that we are able to give a good
upper bound, for each fixed ⟨S′, T′⟩ ∈ 𝒮′, on the probability that the rank of the matrix B_{S′,T′} is smaller
than c_1 |log(q/n)|^{−2} q.


4.2.3   The rank of an enlarged submatrix

First we describe our method of estimating the probability that the rank of a submatrix B_{S,T} of A is
small for a fixed pair ⟨S, T⟩. This will be based on the following observation. Assume that a pair ⟨S, T⟩,
S, T ⊆ {0, 1, . . ., n − 1}, is fixed, and s > max S = max_{x∈S} x, t > max T, S_1 = S ∪ {s}, and T_1 = T ∪ {t}.
Then, with probability at least 1/2 for the randomization of A, we have that the rank of B_{S_1,T_1} is strictly
greater than the rank of B_{S,T}. We will prove this statement in the following way. For all k = 0, 1, . . ., 2n − 2,
let D_k be the minor diagonal of A containing the entries a_{i,j} with i + j = k. We show that if the values
of the entries of A are fixed on all minor diagonals D_k with k < s + t, then out of the two possible
definitions of A on the minor diagonal D_{s+t}, at least one will yield a matrix B_{S_1,T_1} with the property that
rank(B_{S_1,T_1}) > rank(B_{S,T}). The proof of this fact is a simple argument in linear algebra as described in
the proof of Lemma 4.5.
    Lemma 4.5 itself is a slight generalization of this assertion: instead of adding a single new element
to S and another single element to T, we add a set of new elements S̃ to S so that max S < min S̃,
and a set of new elements T̃ to T so that max T < min T̃. The resulting enlarged sets S_1 = S ∪ S̃,
T_1 = T ∪ T̃ then have the following property. If the values of the entries of A are fixed on all minor diagonals
D_k with k ≤ max S + max T, then for the randomization of A on the remaining minor diagonals we have
that, with probability at least 1 − 2^{−|S̃+T̃|}, rank(B_{S_1,T_1}) > rank(B_{S,T}). Indeed, there are |S̃ + T̃| minor
diagonals which contain an entry a_{s,t} of A with s ∈ S̃ and t ∈ T̃. According to the already described
special case, the randomization of the values of the entries on each of these diagonals will lead to the
required increase of the rank with probability at least 1/2. Since these randomizations are independent,
we get that the rank increases with probability at least 1 − 2^{−|S̃+T̃|}.
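For very small parameters this probability bound can be verified by exhausting all Hankel matrices; averaging over the fixed diagonals gives the same bound unconditionally, which is what the enumeration below checks. A sketch (Python; n = 5, with sets chosen by us so that |S̃ + T̃| = 4):

```python
from itertools import product

def gf2_rank(rows):
    basis = []
    for r in rows:
        for b in basis:
            r = min(r, r ^ b)
        if r:
            basis.append(r)
    return len(basis)

def sub_rank(h, S, T):
    """Rank over F_2 of sub(diag(h, n, n), S, T), rows packed as bitmasks."""
    rows = []
    for i in S:
        rows.append(sum(1 << k for k, j in enumerate(T) if h[i + j]))
    return gf2_rank(rows)

n = 5
S, T = [0, 1], [0, 1]
S1, T1 = [0, 1, 2, 3], [0, 1, 2, 4]                # new elements come after max S, max T
sumset = {s + t for s in (2, 3) for t in (2, 4)}   # S~ + T~ = {4, 5, 6, 7}
good = sum(1 for h in product((0, 1), repeat=2 * n - 1)
           if sub_rank(h, S1, T1) > sub_rank(h, S, T))
# the rank grows with probability at least 1 - 2^{-|S~ + T~|}
assert good / 2 ** (2 * n - 1) >= 1 - 2 ** (-len(sumset))
```

Each Hankel matrix is encoded by the 2n − 1 bits of h, one per minor diagonal, exactly as in the paper.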


4.2.4   Partitioning the rows and columns

What we have done so far is only good for estimating the probability of rank(BS1 ,T1 ) > rank(BS,T ) for
some pairs hS, T i, hS1 , T1 i where S ⊆ S1 , T ⊆ T1 . To get a lower bound on the probability of rank(BS,T ) >
R for some integer R, we will partition S into subsets S1 , . . ., Sl and T into subsets T1 , . . ., Tl so that
max Si < min Si+1 and max Ti < min Ti+1 for all i = 1, . . ., l − 1. If rank(BS,T ) ≤ R = l − r then there
must be at least r distinct elements i of {1, . . ., l} with the property rank(BΓi ,Λi ) = rank(BΓi+1 ,Λi+1 ), where
Γ j = S1 ∪ . . . ∪ S j and Λ j = T1 ∪ . . . ∪ T j for j = 1, . . ., l. We will denote by E the set of all integers
i ∈ {1, . . ., l} with this property. Then




(7). the probability of the event that we get equalities for every element of this set is at most

                                                  2^{−∑_{i∈E} |S_i + T_i|} .

     If we add these upper bounds for all of the possible choices for E, that is for all subsets of {1, . . ., l}
with r elements, then we get an upper bound on the probability of rank(BS,T ) ≤ l − r. (This upper bound
is formulated in a slightly more general form in Lemma 4.7.) We will use this estimate for each fixed
choice for hS, T i ∈ S0 with suitable choices of the partitions S1 , . . ., Sl , T1 , . . ., Tl .

4.2.5   The choice of the partitions and the submatrices
Our remaining task is to define the function F so that

                                        𝒮′ = {F(⟨X_1, X_2⟩) | ⟨X_1, X_2⟩ ∈ 𝒮}

is dense in 𝒮, select a pair of partitions for each ⟨S, T⟩ ∈ 𝒮′, and then add the corresponding upper
bounds (with R = c_1 q |log(q/n)|^{−2}) for each ⟨S, T⟩ ∈ 𝒮′. The upper bounds will not depend on the choice
of ⟨S, T⟩ ∈ 𝒮′, so we will have to prove that the common upper bound multiplied by |𝒮′| is at most 1/2.
     When we define F(⟨X_1, X_2⟩) for some ⟨X_1, X_2⟩ ∈ 𝒮, we will already have in mind the task of choosing
suitable partitions of S and T, where ⟨S, T⟩ = F(⟨X_1, X_2⟩). We give here a somewhat simplified definition
of F; the final definition will be provided in the proof of Lemma 4.9. Let t be a positive integer which
is a large constant. We assume now, for the sake of simplicity, that t^2 | q. For j = 1, 2 we partition
X_j into q/t^2 subsets K_1^{(j)}, . . ., K_{q/t^2}^{(j)}, each containing exactly t^2 elements, so that
max K_i^{(j)} < min K_{i+1}^{(j)} for i = 1, . . ., q/t^2 − 1. Clearly these properties uniquely determine both partitions.
    For each fixed i = 1, . . ., q/t^2 we pick sets J_i^{(j)} ⊆ K_i^{(j)}, j = 1, 2, with exactly t elements so that
|J_i^{(1)} + J_i^{(2)}| is maximal. Since the sets J_i^{(j)} have t elements, this maximum is at most t^2. We will show
(Lemma 4.3) that, since we pick the sets J_i^{(j)} from sets of size t^2, this upper bound can be attained,
and so |J_i^{(1)} + J_i^{(2)}| = t^2 for all i = 1, 2, . . ., q/t^2. Let

                                                    Z_j = ⋃_{i=1}^{q/t^2} J_i^{(j)}

for j = 1, 2. Now we define F by F(⟨X_1, X_2⟩) = ⟨Z_1, Z_2⟩. For j = 1, 2 we will use the partition
J_1^{(j)}, . . ., J_{q/t^2}^{(j)} of the set Z_j when estimating prob(rank(B_{Z_1,Z_2}) ≤ c_1 |log(q/n)|^{−2} q). The inequality of Con-
dition (7) gives a good upper bound which can be easily evaluated (as a function of q, t and n), since in
the exponent the value of the expression −|J_i^{(1)} + J_i^{(2)}| is −t^2, and the number of exceptional sets E can
also be estimated without any problems. Finally, the number of possible pairs ⟨Z_1, Z_2⟩ is at most \binom{n}{q/t}^2.
These estimates lead to the conclusion of the theorem.

4.2.6   Why did it work?
From the description of the necessary estimates at the end of the last paragraph it is not clear what
made it possible to get a good enough upper bound on the probabilities prob(rank(BZ1 ,Z2 ) ≤ R), where


R = c_1 |log(q/n)|^{−2} q, compared to the number of pairs ⟨Z_1, Z_2⟩. It is true that the number of pairs became
smaller, since the sizes of the sets Z_j are smaller by a factor of t than the sizes of the sets X_j, but for
smaller sets the upper bounds on the probabilities can be larger. Why is it that we gained more on the
number of sets than we lost on the upper bounds on the probabilities?
       The answer is that the upper bound did not depend on the sizes of the sets Z_j but depended only on
the common size of the sets J_i^{(1)} + J_i^{(2)}, which was t^2. The corresponding quantity for the pair ⟨X_1, X_2⟩ is
|K_i^{(1)} + K_i^{(2)}|. Since we do not have any assumption about the sets K_i^{(1)}, K_i^{(2)} other than that their sizes are
t^2, in the worst case |K_i^{(1)} + K_i^{(2)}| can be as small as 2t^2 − 1. Therefore, although the sizes of the sets went
down by a factor of t, the critical quantity in the upper bounds remained essentially unchanged. This
guaranteed that we won more on decreasing the number of pairs than we lost on increasing the upper
bound of the probabilities. To formulate the same phenomenon in the language of minor diagonals we
may say that although the ratio of the sizes of the sets X_j and Z_j is t, if we consider the number of
minor diagonals intersecting the subsets K_i^{(1)} × K_i^{(2)} resp. J_i^{(1)} × J_i^{(2)}, the ratio, at least in certain cases, is
at most 2.
       This completes the sketch of the proof of the theorem. In the remaining part of the section we give
a detailed proof of the lemmata mentioned above and of the theorem.

4.3    The proof of Theorem 4.2
Lemma 4.3. Assume that t is a positive integer and U, V are sets of integers with |U| = |V| = t^2. Then
there are U′ ⊆ U, V′ ⊆ V, so that |U′| = |V′| = t and |U′ + V′| = t^2.

Remark 4.4. As the proof will show, the lemma remains true if we replace |U| = |V| = t^2 by the weaker
assumption |U| = |V| = t^2 − t + 1.

Proof of Lemma 4.3. We have to select the subsets U′, V′ of U and V so that each has exactly t elements
and all of the t^2 sums u + v, u ∈ U′, v ∈ V′, are different. Suppose that this does not hold for some
selection of U′, V′, that is, u + v = ū + v̄ for some suitably chosen u, ū ∈ U′, v, v̄ ∈ V′ with u ≠ ū. This
would imply that u − ū = v̄ − v, and so the sets (U′ − U′)^+ and (V′ − V′)^+ are not disjoint, where for a set
of integers X, (X)^+ = {x ∈ X | x > 0}. Therefore it is sufficient (and also necessary) to prove that there
exist U′ ⊆ U, V′ ⊆ V so that |U′| = |V′| = t and (U′ − U′)^+ ∩ (V′ − V′)^+ = ∅.
     Let U = {u_0, . . ., u_{t^2−1}}, V = {v_0, . . ., v_{t^2−1}} so that u_0 < . . . < u_{t^2−1} and v_0 < . . . < v_{t^2−1}. We define
the integers m_u, m_v by

   m_u = min {u_{i+t−1} − u_i | i = 0, 1, . . ., t^2 − t}       and       m_v = min {v_{i+t−1} − v_i | i = 0, 1, . . ., t^2 − t} .

Suppose that, e.g., m_u ≤ m_v, and let s be an integer with m_u = u_{s+t−1} − u_s. We claim that the choice
U′ = {u_s, u_{s+1}, . . ., u_{s+t−1}}, V′ = {v_{jt} | j = 0, 1, . . ., t − 1} meets our requirements. Indeed, if v_{jt}, v_{kt} ∈ V′
and j < k then v_{kt} − v_{jt} ≥ v_{(j+1)t} − v_{jt} > v_{jt+t−1} − v_{jt} ≥ m_v ≥ m_u, and therefore each element of (V′ − V′)^+
is strictly greater than m_u. On the other hand, U′ ⊆ [u_s, u_{s+t−1}] = [u_s, u_s + m_u]; therefore (U′ − U′)^+
contains only integers not greater than m_u. Consequently (U′ − U′)^+ ∩ (V′ − V′)^+ = ∅.
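The selection in the proof is effective; a direct transcription into Python (the function name is ours) together with a check of the conclusion on random inputs:

```python
import random

def select_spread(U, V, t):
    """Given |U| = |V| = t^2, return U', V' of size t with |U' + V'| = t^2,
    following the proof of Lemma 4.3."""
    U, V = sorted(U), sorted(V)
    mu, iu = min((U[i + t - 1] - U[i], i) for i in range(t * t - t + 1))
    mv, iv = min((V[i + t - 1] - V[i], i) for i in range(t * t - t + 1))
    if mu <= mv:
        # t consecutive elements of U (small spread), every t-th element of V
        return U[iu:iu + t], V[::t][:t]
    return U[::t][:t], V[iv:iv + t]

random.seed(3)
for t in (2, 3, 4):
    U = random.sample(range(10 ** 6), t * t)
    V = random.sample(range(10 ** 6), t * t)
    Up, Vp = select_spread(U, V, t)
    assert len({u + v for u in Up for v in Vp}) == t * t   # all t^2 sums distinct
```

The two branches mirror the two symmetric cases m_u ≤ m_v and m_u > m_v of the proof.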

Definitions.


   1. func(n, 2) will denote the set of all functions defined on {0, . . ., n − 1} with values in F2 . Similarly,
      func([l, n), 2) will denote the set of all functions defined on the interval [l, n) = {l, . . ., n − 1} with
      values in F2 .
   2. Assume that n1 , n2 are positive integers and f ∈ func(n1 + n2 − 1, 2). Then diag( f , n1 , n2 ) will be
      the n1 by n2 matrix (di, j ), i = 0, . . ., n1 − 1, j = 0, 1, . . ., n2 − 1, where di, j = f (i + j).
   3. Assume that n_1, n_2, k_1, k_2 are positive integers, n_1 > k_1, n_2 > k_2, f ∈ func(k_1 + k_2, 2), and g is
      taken with uniform distribution from the set func([k_1 + k_2, n_1 + n_2 − 1), 2). Φ(n_1, n_2, f) will be a
      random variable whose value is diag(f ∪ g, n_1, n_2) (where f ∪ g is the unique common extension of
      f and g to [0, n_1 + n_2 − 1)). Φ(n_1, n_2) will denote the random variable whose value is diag(h, n_1, n_2),
      where h is taken with uniform distribution from the set func(n_1 + n_2 − 1, 2).
   4. Suppose A = (a_{i,j}), i = 0, . . ., n_1 − 1, j = 0, . . ., n_2 − 1, is an n_1 by n_2 matrix and S ⊆ {0, 1, . . ., n_1 −
      1}, T ⊆ {0, 1, . . ., n_2 − 1}. Then sub(A, S, T) will denote the |S| by |T| matrix consisting of those
      entries of A which have row numbers in S and column numbers in T.
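Definitions 2 and 4 translate directly into code; a minimal sketch (Python, representing an element of func(k, 2) as a list of bits):

```python
def diag(f, n1, n2):
    """diag(f, n1, n2): the n1 x n2 matrix with d_{i,j} = f(i + j)."""
    return [[f[i + j] for j in range(n2)] for i in range(n1)]

def sub(A, S, T):
    """sub(A, S, T): the submatrix with row numbers in S and column numbers in T."""
    return [[A[i][j] for j in sorted(T)] for i in sorted(S)]

f = [0, 1, 1, 0, 1, 0, 1]                    # an element of func(4 + 4 - 1, 2)
A = diag(f, 4, 4)
# every minor diagonal i + j = const is constant, i.e. A is a Hankel matrix
assert all(A[i][j] == A[i + 1][j - 1] for i in range(3) for j in range(1, 4))
assert sub(A, [0, 2], [1, 3]) == [[f[1], f[3]], [f[3], f[5]]]
```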

Lemma 4.5. Assume that n1 , n2 , k1 , and k2 are positive integers, k1 < n1 , k2 < n2 , f is a function on
{0, 1, . . ., k1 + k2 − 1} with values in F2 , S ⊆ {0, 1, . . ., n1 − 1}, T ⊆ {0, 1, . . ., n2 − 1}, and

                             |(S ∩ {k1 , . . ., n1 − 1}) + (T ∩ {k2 , . . ., n2 − 1})| ≥ m .

Then with probability at least 1 − 2−m the following holds: the rank of the matrix sub(Φ(n1 , n2 , f ), S, T )
is greater than the rank of the matrix

                        sub(Φ(n1 , n2 , f ), S ∩ {0, 1, . . ., k1 − 1}, T ∩ {0, 1, . . ., k2 − 1}) .

Remark 4.6. If we define random Hankel matrices over an arbitrary field F so that the random entries
of the Hankel matrices are picked from a finite subset D of F with uniform distribution, then our Lemma
remains true if we substitute 1 − |D|−m for the probability 1 − 2−m . (Naturally we also have to modify
the definition on Φ(n1 , n2 , f ), since in this case f is a function whose values are in the set D.)

Proof of Lemma 4.5. Let Φ(n_1, n_2, f) = (ϕ_{i,j}), i = 0, . . ., n_1 − 1, j = 0, . . ., n_2 − 1. For each ℓ = 0, 1, 2, . . .
let S_ℓ = S ∩ {0, 1, . . ., ℓ}, T_ℓ = T ∩ {0, 1, . . ., ℓ}. For each i ∈ S, j ∈ T, w_{i,j} will be the function defined
on T_j by w_{i,j}(x) = ϕ_{i,x} for all x ∈ T_j. Let r be the rank of the matrix sub(Φ(n_1, n_2, f), S_{k_1−1}, T_{k_2−1}); r is
the dimension of the vector space generated by the functions w_{i,k_2−1}, i ∈ S_{k_1−1}. Suppose that S̄ ⊆ S_{k_1−1},
|S̄| = r, so that the set of functions W = {w_{i,k_2−1} | i ∈ S̄} is linearly independent.
     According to the definition of Φ(n_1, n_2, f), we have to randomize a function g with values in F_2
which is defined on the interval [k_1 + k_2, n_1 + n_2 − 1). We randomize the values of g sequentially for each
x ∈ [k_1 + k_2, n_1 + n_2 − 1) ∩ (S + T). Assume that x ∈ [k_1 + k_2, n_1 + n_2 − 1) and g(y) has been randomized
already for all y < x. Suppose that for a suitably chosen i ∈ S ∩ {k_1, . . ., n_1 − 1} and j ∈ T ∩ {k_2, . . ., n_2 − 1}
we have i + j = x. By the assumption of the lemma this will happen for at least m different values of x.
Therefore, it is enough to show that for such an x the following holds with probability at least 1/2: the
function w_{i,j} is linearly independent from the set of functions H = {w_{l,j} | l ∈ S̄}. (Such an independence
obviously implies that the rank of the matrix sub(Φ(n_1, n_2, f), S, T) is greater than |S̄| = r.) Before the
randomization of g(x), the function w_{i,j} is known at every point of T_j with the exception of j. Since there


are two possibilities for the value of w_{i,j} at j, we have two candidate functions u, v; hence for the randomization of
g(x) we have P(w_{i,j} = u) = P(w_{i,j} = v) = 1/2. Consequently, it is enough to show that at least one of the
two vectors u, v is linearly independent from the set H. Indeed, if both are linearly dependent, then their
difference is also linearly dependent on them, that is, u − v = ∑_{s∈S̄} γ_s w_{s,j}, where γ_s ≠ 0 for at least one
s ∈ S̄. We show that this is impossible. Indeed, u − v is a function on T_j which is zero everywhere but at
j, and (u − v)(j) = 1. Consequently j ≥ k_2 implies that the restriction of u − v to T_{k_2−1} is 0. Therefore we
get that ∑_{s∈S̄} γ_s w_{s,k_2−1} = 0. The functions w_{s,k_2−1} are linearly independent, so we have γ_s = 0 for all s ∈ S̄,
in contradiction to our assumption.
Lemma 4.7. Assume that for each j = 1, 2, I_1^{(j)}, . . ., I_l^{(j)} is a partition of the interval [0, n) into
pairwise disjoint subintervals, S^{(1)}, S^{(2)} ⊆ {0, 1, . . ., n − 1}, and |(S^{(1)} ∩ I_i^{(1)}) + (S^{(2)} ∩ I_i^{(2)})| ≥ m_i for all
i = 1, . . ., l. Then, for any positive integer r, the probability that the rank of sub(Φ(n, n), S^{(1)}, S^{(2)}) is not
greater than l − r is at most
                                                   ∑_{1 ≤ i_1 < ··· < i_r ≤ l} 2^{−m_{i_1} − ... − m_{i_r}} .

Remark 4.8. If we define the random Hankel matrix Φ(n, n) over an arbitrary field F in the way de-
scribed in Remark 4.6 (just after Lemma 4.5), then our lemma remains true if we substitute |D|^{−m_{i_1} − ... − m_{i_r}}
for 2^{−m_{i_1} − ... − m_{i_r}} in the last expression of the lemma.
Proof of Lemma 4.7. Assume that for all j = 1, 2, x ∈ I_i^{(j)}, y ∈ I_{i+1}^{(j)} implies x < y, and assume further
that for all j = 1, 2, i = 1, . . ., l, I_i^{(j)} = [b_{j,i}, b_{j,i+1}). Let

                                          S_i^{(j)} = S^{(j)} ∩ ⋃_{k=1}^{i} I_k^{(j)} = S^{(j)} ∩ [0, b_{j,i+1}) ,

and let Φ_i = sub(Φ(n, n), S_i^{(1)}, S_i^{(2)}); let Φ_0 be the empty matrix, of rank 0. If the rank of Φ_l =
sub(Φ(n, n), S^{(1)}, S^{(2)}) is not greater than l − r then, since the rank never decreases along the chain
Φ_0, Φ_1, . . ., Φ_l, there are r integers 1 ≤ i_1 < . . . < i_r ≤ l so that the rank of Φ_{i_t−1} and Φ_{i_t} is the same
for t = 1, . . ., r. We show that for each fixed i_1, . . ., i_r the probability of this event is at most 2^{−m_{i_1} − ... − m_{i_r}},
which clearly implies the statement of the lemma. Suppose that i_1, . . ., i_r are fixed. According to the definition of
Φ(n, n), we randomize an h ∈ func(2n − 1, 2). We pick the values of h on [0, 2n − 1) sequentially. Assume
that for some t ∈ {1, . . ., r} the values h(0), . . ., h(b_{1,i_t} + b_{2,i_t} − 1) have already been fixed. We define a
function f on the set {0, . . ., b_{1,i_t} + b_{2,i_t} − 1} by f(y) = h(y) for all such y. Now
we randomize the values of h(x) for all x = b_{1,i_t} + b_{2,i_t}, . . ., b_{1,i_t+1} + b_{2,i_t+1} − 1. We apply Lemma 4.5 to
this part of the randomization with n_j → b_{j,i_t+1}, k_j → b_{j,i_t} for j = 1, 2, S → S_{i_t}^{(1)}, T → S_{i_t}^{(2)}, m → m_{i_t}, and
with the function f defined above. We get that the probability of the event rank(Φ_{i_t−1}) = rank(Φ_{i_t}) is at most
2^{−m_{i_t}}. Since these estimates concern disjoint parts of the randomization, the probability that rank(Φ_{i_t−1}) = rank(Φ_{i_t}) for all t = 1, . . ., r is at most
2^{−m_{i_1} − ... − m_{i_r}}.

    The following lemma will be used to give an estimate on the probability that the rank of a matrix
sub(A, S, T ) is at least R where the sets S, T ⊆ {0, 1, . . ., n − 1} are fixed and A is a random Hankel
matrix over F2 . Since the statement of the lemma depends on many parameters we restate their roles
as described in the “sketch of the proof” of Theorem 4.2. The sizes of the sets S, T are the same:

                             T HEORY OF C OMPUTING, Volume 1 (2005), pp. 149–176                                       166
               A N ON - LINEAR T IME L OWER B OUND FOR B OOLEAN B RANCHING P ROGRAMS

|S| = |T| = q. We may think of t as a large constant, although the lemma requires only t^2 < q. In the
"sketch of the proof" (using the notation S → X1, T → X2) we made the simplifying assumption that t^2 | q,
and partitioned both S and T into q/t^2 subsets, each of size t^2. Without the assumption t^2 | q, we will
first take subsets of both S and T with t^2⌊q/t^2⌋ elements and partition these subsets into classes, each of size
t^2. Therefore Q = ⌊q/t^2⌋ of the lemma refers to the number of classes in these partitions. In the estimate
given in the lemma the factor 2^{−(Q−R+1)t^2} is an upper bound on the probability that if we select R − 1
pairs of corresponding classes from the two partitions, then the remaining ones do not increase the rank
of sub(A, S, T) in the sense explained in the "sketch". The factor $\binom{Q}{Q-R+1}$ is the number of ways that we
can select these R − 1 pairs from the Q pairs of classes. The factor $\binom{n}{Qt}^2$ has the following meaning. In
the proof we consider all of the pairs of subsets S′ ⊆ S, T′ ⊆ T and estimate the probability that for at
least one of them the rank of sub(A, S′, T′) will be smaller than R. Then we multiply this estimate by the
number of possible pairs of sets S′, T′. Since we select S′, T′ with |S′| = |T′| = Qt (they contain exactly
t elements from each class) the number of possible selections is at most $\binom{n}{Qt}^2$.

Lemma 4.9. Assume that n, q, R, t are positive integers, t^2 < q < n and R < ⌊q/t^2⌋. Suppose further that
A = {a_{i,j}}, i = 0, …, n − 1, j = 0, …, n − 1 is a random n by n Hankel matrix over F2, taken with uniform
distribution on the set of all such matrices. Let p be the probability of the following event:

(8). for all S ⊆ [0, n), T ⊆ [0, n), |S| = |T| = q the rank of the matrix sub(A, S, T) is at least R.
    Then
$$p \ge 1 - \binom{n}{Qt}^{2} \binom{Q}{Q-R+1} 2^{-(Q-R+1)t^{2}}$$
where Q = ⌊q/t^2⌋.

Remark 4.10. This lemma also remains true with some modifications over an arbitrary field if we
randomize the Hankel matrix A according to the distribution described in the remark after Lemma 4.5.
Namely we have to substitute |D|^{−(Q−R+1)t^2} for 2^{−(Q−R+1)t^2} in the last expression of the lemma.
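To make the objects of Lemma 4.9 concrete, here is a small Python sketch (ours, not from the paper): it builds an n by n Hankel matrix over F2 from the 2n − 1 bits h(0), …, h(2n − 2) and computes the rank of a submatrix sub(A, S, T) by Gaussian elimination over F2.

```python
import random

def hankel_from_h(h):
    """n x n Hankel matrix over F2: entry (i, j) is h[i + j], so the matrix
    is constant on each minor diagonal and is determined by 2n - 1 bits."""
    n = (len(h) + 1) // 2
    return [[h[i + j] for j in range(n)] for i in range(n)]

def rank_f2(masks):
    """Rank over F2 of rows encoded as integer bitmasks."""
    pivots = {}                    # leading-bit position -> reduced row
    for m in masks:
        while m:
            lead = m.bit_length() - 1
            if lead in pivots:
                m ^= pivots[lead]  # eliminate the leading bit
            else:
                pivots[lead] = m   # new pivot row
                break
    return len(pivots)

def sub_rank(A, S, T):
    """rank of sub(A, S, T): rows indexed by S, columns by T."""
    cols = sorted(T)
    return rank_f2([sum(A[i][j] << k for k, j in enumerate(cols))
                    for i in sorted(S)])

# A random Hankel matrix, as in the lemma:
rng = random.Random(0)
n = 64
A = hankel_from_h([rng.randrange(2) for _ in range(2 * n - 1)])
S = rng.sample(range(n), 16)
T = rng.sample(range(n), 16)
print(sub_rank(A, S, T))   # typically close to 16 for a random Hankel matrix
```

Running this for many random choices of S and T gives an empirical feel for the event (8), whose probability the lemma bounds from below.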

Proof of Lemma 4.9. We will define a function F on the set of all ordered pairs ⟨X1, X2⟩ with X_j ⊆
{0, …, n − 1} for j = 1, 2, |X1| = |X2| = q. Before getting into the details of this definition, recall
from the "sketch of proof" of Theorem 4.2 that, roughly speaking, we get F(⟨X1, X2⟩) in the following
way. First we partition both X1 and X2 into classes of size t^2. Then we select subsets J_i^{(1)}, J_i^{(2)} from
each pair of corresponding classes with the property that |J_i^{(1)}| = |J_i^{(2)}| = t, |J_i^{(1)} + J_i^{(2)}| = t^2. The pair
⟨⋃_i J_i^{(1)}, ⋃_i J_i^{(2)}⟩ is the value of F.
     Each value of the function will be a pair ⟨Z1, Z2⟩ so that Z1, Z2 ⊆ {0, …, n − 1} and |Z1| = |Z2| =
⌊q/t^2⌋t. The definition is the following. Assume that the pair ⟨X1, X2⟩ is given with the described prop-
erties. For each j = 1, 2 we pick pairwise disjoint subsets K_1^{(j)}, …, K_Q^{(j)} of X_j, where Q = ⌊q/t^2⌋, so that
|K_i^{(j)}| = t^2 for all j = 1, 2, i = 1, …, Q, and x ∈ K_i^{(j)}, y ∈ K_{i′}^{(j)} implies x < y for all j = 1, 2, 1 ≤ i < i′ ≤ Q.
(By the definition of Q this is possible.)
     Assume now that an i = 1, …, Q is fixed. We apply Lemma 4.3 with U → K_i^{(1)}, V → K_i^{(2)}. Let U′, V′
be the sets whose existence is stated in Lemma 4.3 and let J_i^{(1)} = U′, J_i^{(2)} = V′. Finally let Z_j = ⋃_{i=1}^{Q} J_i^{(j)}



for j = 1, 2 and let F(⟨X1, X2⟩) = ⟨Z1, Z2⟩. Clearly the pair ⟨Z1, Z2⟩ satisfies the conditions described
above as well as the following additional properties:

 (a). for all j = 1, 2, J_1^{(j)}, …, J_Q^{(j)} is a partition of Z_j, |J_i^{(j)}| = t for all i = 1, …, Q,

 (b). for all j = 1, 2, 1 ≤ i < i′ ≤ Q, x ∈ J_i^{(j)}, y ∈ J_{i′}^{(j)} implies x < y,

 (c). for all j = 1, 2 and i = 1, …, Q we have Z_j ⊆ X_j and |J_i^{(1)} + J_i^{(2)}| = t^2.
Assume now that ⟨Z1, Z2⟩ ∈ range(F). We estimate the probability ρ_{Z1,Z2} of the event that the rank of
the matrix sub(A, Z1, Z2) is smaller than R.
     We apply Lemma 4.7 with l → Q, I_i → J_i^{(j)}, S^{(1)} → Z1, S^{(2)} → Z2, m_i → t^2, r → Q − R + 1. We get
that ρ_{Z1,Z2} is at most $\binom{Q}{Q-R+1} 2^{-(Q-R+1)t^2}$. Therefore, using that |Z_j| = Qt, we get that the probability
that the rank of the matrix sub(A, Z1, Z2) is smaller than R for at least one ⟨Z1, Z2⟩ ∈ range(F) is at most
$$|\mathrm{range}(F)| \binom{Q}{Q-R+1} 2^{-(Q-R+1)t^{2}} \le \binom{n}{Qt}^{2} \binom{Q}{Q-R+1} 2^{-(Q-R+1)t^{2}}.$$
For each pair S, T, with the properties given in the lemma, if F(⟨S, T⟩) = ⟨Z1, Z2⟩, then Z1 ⊆ S, Z2 ⊆ T,
and this implies that rank(sub(A, S, T)) ≥ rank(sub(A, Z1, Z2)), so we have the same upper bound on the
probability that the rank of sub(A, S, T) is smaller than R.

Proof of Theorem 4.2. Assume that θ > 0 is sufficiently small, c1 > 0 is sufficiently small with respect
to θ, and c2 > 0. Suppose further that n is sufficiently large and c2 n < q ≤ n. We apply Lemma 4.9 with
n, q, R = c1 |log(q/n)|^{−1} q, and t = ⌊θ^{−1}|log(q/n)|⌋. We get that the probability p that the rank of
sub(A, S, T) is at least R (for all S, T with |S| = |T| = q) satisfies
$$p \ge 1 - \binom{n}{Qt}^{2} \binom{Q}{Q-R+1} 2^{-(Q-R+1)t^{2}}$$
where Q = ⌊q/t^2⌋. We show that
$$\binom{n}{Qt}^{2} \binom{Q}{Q-R+1} 2^{-(Q-R+1)t^{2}}$$
is at most 1/2 by giving upper bounds on its factors. As we have already remarked at the end of the sketch
of the proof, the crucial fact that leads to the desired result is that in the exponent of 2 we have the factor
t^2 and not only t. We will see that in the actual estimates this t^2 makes it possible to get the strong upper
bound we need.
     We will use that if 0 < α < 1/2, n is sufficiently large, and x ≤ n then
$$\binom{x}{\alpha n} \le e^{2\alpha n \log \frac{1}{\alpha}} .$$
Let γ = q/n and λ = Qt^2/q. Clearly c2 < γ < 1 and 1/2 < λ ≤ 1. Hence
$$\binom{n}{Qt} = \binom{n}{\gamma\lambda t^{-1} n} \le e^{2\gamma\lambda t^{-1} n \log(\gamma^{-1}\lambda^{-1} t)} = e^{2\gamma\lambda t^{-1} n (\log\gamma^{-1} + \log\lambda^{-1} + \log t)} .$$

Using that t^{−1} log γ^{−1} = θ, t^{−1} log λ^{−1} ≤ t^{−1} log 2 ≤ t^{−1} ≤ θ, and t^{−1} log t ≤ t^{−1/2} ≤ θ^{1/2} we get that
$$\binom{n}{Qt}^{2} \le e^{4\gamma\lambda(\theta + \theta + \theta^{1/2})n} \le 2^{\frac{1}{20}\gamma\lambda n}$$

and
$$\binom{Q}{Q-R+1} \le 2^{Q} = 2^{\gamma\lambda t^{-2} n} \le 2^{\frac{1}{20}\gamma\lambda n}$$
if θ is sufficiently small. Moreover,
$$2^{-(Q-R+1)t^{2}} \le 2^{-\frac{1}{8}Qt^{2}} = 2^{-\frac{1}{8}\gamma\lambda t^{-2} t^{2} n} = 2^{-\frac{1}{8}\gamma\lambda n} .$$

These inequalities imply that
$$\binom{n}{Qt}^{2} \binom{Q}{Q-R+1} 2^{-(Q-R+1)t^{2}} \le 2^{\frac{1}{20}\gamma\lambda n + \frac{1}{20}\gamma\lambda n - \frac{1}{8}\gamma\lambda n} \le 2^{-(\frac{1}{8} - \frac{1}{10})\gamma\lambda n} < \frac{1}{2}$$
if n is sufficiently large. (Here we use that c2 < γ and 1/2 < λ.)
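The final estimate can also be checked numerically for concrete parameters. The following Python sketch (ours; the parameter values are illustrative, not the ones forced by the proof) evaluates the bound of Lemma 4.9 with exact integer arithmetic and tests whether it is below 1/2:

```python
from math import comb

def failure_bound_lt_half(n, q, t, R):
    """True iff binom(n, Qt)^2 * binom(Q, Q-R+1) * 2^{-(Q-R+1) t^2} < 1/2,
    where Q = floor(q / t^2), i.e. the failure-probability bound of
    Lemma 4.9 is below one half."""
    Q = q // (t * t)
    assert t * t < q < n and R < Q
    lhs = comb(n, Q * t) ** 2 * comb(Q, Q - R + 1)
    # bound < 1/2  <=>  2 * lhs < 2^{(Q-R+1) t^2}, compared exactly:
    return 2 * lhs < 2 ** ((Q - R + 1) * t * t)

# With t large enough relative to q, the exponent (Q-R+1) t^2 dominates:
print(failure_bound_lt_half(10000, 2500, 20, 2))   # True: bound < 1/2
print(failure_bound_lt_half(10000, 2500, 10, 3))   # False: t too small here
```

This mirrors the role of the factor t^2 stressed in the proof: doubling t roughly quadruples the subtracted exponent while only mildly increasing the binomial factors.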


5     The proof of Lemma 3.5
5.1    A lemma about disjoint sets of variables
In this section we prove Lemma 3.5 using Lemma 9 of [4]. (This is the most important technical lemma
of that paper, with a long proof.) We reformulate this result from [4] below as Lemma A1 to make it
consistent with the terminology of the present paper. In the proof we will also use other lemmata from
[4]; we will formulate them as Lemma A2, Lemma A3, and Lemma A4 in the present paper. These
latter three lemmata have short proofs (given in [4]) using only the definitions of the concepts contained
in their statements. The following definitions are needed for the statement of Lemma A1.

Definitions.

    1. Assume that B is a branching program with n input variables and η is an input for B. (Recall that
       an input is a {0, 1}-valued function defined on {0, 1, . . ., n − 1} with the meaning that the value
       η(i) is assigned to the variable xi .) At input η the branching program follows a path in the directed
       graph G as described in the definition of a branching program. We associate a time (a nonnegative
       integer) with each node of this path. If the path is v_0, v_1, …, v_i, where v_0 is the source node and v_i
       is a sink node, then for all integers t ∈ [0, i], we will say that the program is at node v_t at time t
       with respect to input η. We will use the notation state(t, η) = v_t.
    2. Assume that state(t, η) = vt , and var(vt ) = xi . In this case we will say that the program accesses
       the variable xi at time t.
    3. An input η is visible if each variable xi , i = 0, 1, . . ., n − 1 is accessed at some time during the
       computation performed at input η.


   4. Assume that B is a branching program so that the path associated with each input is of length l.
      In this case we will say that for each input the length of the program is l.
     Additional assumptions about B. Without loss of generality we will make the following two addi-
tional assumptions about the branching program B in the proof of Lemma 3.5.
     (a) We assume that every input is visible. Indeed, we can modify the branching program B so that
the new branching program B′ first reads the value of each variable and then continues with the original
computation of B. The length and the size of the program are thus increased only by n. Moreover, if
Lemma 3.5 holds for B′ then clearly it also holds for B since, apart from the size and the depth of the
program, Lemma 3.5 treats the program as a black box: it speaks only about the function defined by the
branching program.
     (b) We assume that, independently of the input, the length of the branching program is exactly kn,
that is, for each input η the program reaches a sink node at time kn. This is not an essential restriction
because there is another program B′, whose size is larger than the size of B by only a factor of at most
n^2, so that program B′ works exactly the same way as program B but also counts the time, and when B
reaches a sink node v then it works further till time kn when it gives the same output as the output of B at
node v. (We may assume that at each time t in this new additional time interval, the branching program
B′ accesses, e.g., the variable x_0.)
     As a consequence of this second assumption, for each fixed input η, the function state(t, η) is
defined for all t = 0, 1, …, kn and the branching program accesses a variable at each time t for t =
0, 1, …, kn − 1.
Definitions.
   1. Suppose that σ is a real number with σ ∈ (0, 1/2). We assume that a partition of the set {0, 1, …, kn −
      1} into intervals is fixed so that the length of each interval is between σn and 2σn. I^{(σ)} will denote
      the set of these intervals. If the choice of σ is clear from the context, we will omit the superscript
      σ.
   2. Assume that T ⊆ {0, 1, . . ., kn − 1} is a set of integers. The set of all integers i ∈ {0, 1, . . ., n − 1},
      so that the input variable xi is accessed by the branching program at some t ∈ T , at input η, will
      be denoted by register(T, η). The set of all integers j in register(T, η) so that the value of
      variable x j is not accessed at any time outside T at input η will be denoted core(T, η). Clearly
      core(T, η) ⊆ register(T, η).
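The definitions of state(t, η), register(T, η), and core(T, η) can be mirrored directly in code. A minimal Python sketch (ours; encoding a program as a dictionary of nodes is our assumption, not the paper's formalism):

```python
def run(program, source, eta):
    """Follow the path of a branching program on input eta.
    `program` maps a node to (variable index, successor-if-0, successor-if-1);
    sink nodes are absent from `program`. The returned path gives
    state(t, eta) = path[t], and the variable accessed at time t is
    program[path[t]][0]."""
    path = [source]
    while path[-1] in program:
        i, succ0, succ1 = program[path[-1]]
        path.append(succ1 if eta[i] else succ0)
    return path

def register(program, path, T):
    """register(T, eta): variables accessed at some time t in T."""
    return {program[path[t]][0] for t in T if t < len(path) - 1}

def core(program, path, T):
    """core(T, eta): variables accessed during T and never outside T."""
    outside = [t for t in range(len(path) - 1) if t not in T]
    return register(program, path, T) - register(program, path, outside)

program = {'a': (0, 'b0', 'b1'),     # source: reads x0
           'b0': (1, 's0', 's1'),    # reads x1; s0, s1 are sinks
           'b1': (1, 's1', 's0')}    # this program computes x0 XOR x1
path = run(program, 'a', [1, 1])
print(path)                          # ['a', 'b1', 's0']
print(core(program, path, {0}))      # {0}: x0 is read only at time 0
```

Here every input is visible (both variables are always read), matching the additional assumption (a) above.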
Remark 5.1. The notation register(T, η) was motivated by the fact that, in [4], instead of branching
programs we work with random access machines, and so instead of reading the values of variables the
machine reads the content of registers. To make the two papers more compatible we did not change this
notation.
Definitions.
   1. If a σ > 0 is given, F ⊆ I^{(σ)}, and χ is an input, then stem(F, χ) will denote the restriction of χ
      onto {0, 1, …, n − 1}\core(F, χ).
   2. Suppose that T ⊆ {0, . . ., kn − 1}. We say that x is at the right border of T if x ∈
                                                                                          / T and x − 1 ∈ T .
      The set of those integers which are at the right border of T will be denoted by right(T ).


   3. Suppose that T ⊆ {0, . . ., kn − 1} and χ is an input. Let f be a function defined on the set
      right(T ), so that for all t ∈ right(T ) we have f (t) = state(t, χ). We will call f the right-
      state function of the set T at input χ and will denote it by rstateT,χ .
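The right border and the right-state function translate just as directly; a short Python sketch (ours), where a computation path stands in for state(·, χ):

```python
def right_border(T):
    """right(T): all x with x not in T and x - 1 in T."""
    return {x + 1 for x in T if x + 1 not in T}

def rstate(path, T):
    """The right-state function of T at an input whose computation path is
    `path`: it records state(t, .) for every t on the right border of T."""
    return {t: path[t] for t in right_border(T) if t < len(path)}

print(right_border({0, 1, 2, 5, 6}))   # {3, 7}
```

Note that the right border of a union of intervals has one element per maximal run of consecutive times, which is why fixing rstate costs only a small amount of information.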

Remark 5.2. The significance of the set core(T, χ), the right border of T, and the function rstate_{T,χ}
is the following. Assume that starting from the input χ we change the value of some of the variables in
core(T, χ) in a way that for the new input χ′ we have rstate_{T,χ} = rstate_{T,χ′}. Then the output of the
program remains unchanged.

Lemma A1. For all positive integers k, if σ > 0 is sufficiently small with respect to k, ε > 0 is sufficiently
small with respect to σ, n is sufficiently large with respect to ε, B is a branching program with n input
variables, B is of size at most 2^{εn}, for each input the length of the program is kn, and G is a set of visible
inputs, then the following holds. There exist κ > σ, F1, F2, f1, f2, H satisfying the following conditions:

(9). H ⊆ G and |H| ≥ 2^{−κn}|G|,

(10). F1, F2 are disjoint subsets of I^{(σ)},

(11). for all i = 1, 2 and j = 3 − i, if χ, ξ ∈ H and stem(F_i, χ) = stem(F_i, ξ), then core(F_j, χ) =
core(F_j, ξ),

(12). |core(F_i, χ)| ≥ κ^τ n for all χ ∈ H and i = 1, 2, where τ = 1 − 1/(50k),

(13). rstate_{⋃F_i, χ} = f_i for all χ ∈ H, i = 1, 2,

(14). κ < 2^{−|log σ|^{1/4}}.

Motivation. The intuitive meaning of Lemma A1 is the following. Suppose that a branching program
works in linear time. Then, if we segment the time into intervals of length about σn, it is possible to
select two disjoint sets of intervals F1 and F2 so that in each of them, for a large number of inputs χ, we
access many variables (the ones in core(F_i, χ)) that are not accessed anywhere else. Moreover the
sets F1 and F2 can be selected with the additional property described below. If the state of the branching
program is fixed at the right borders of F1 and F2 (Condition (13)), then core(F1, χ) and core(F2, χ)
are independent from each other in the following sense. In order to know what core(F_i, χ) is, we do
not have to know the values of the variables in core(F_{3−i}, χ) (Condition (11)). This last condition is the
crucial part of the lemma; everything else in it can be proved by a simple counting argument.

Remarks.

   1. Condition (14) was not included in the original statement of the lemma in [4] but its proof clearly
      implies it. The exact form of the upper bound on κ is not important for us; any upper bound of the
      type κ < g(σ) where lim_{x→0} g(x) = 0 would be sufficient for the proof of Lemma 3.5.

   2. We have changed the notation of the original lemma (by substituting κ for λ) to make it more
      compatible with the notation of Lemma 3.5.


   3. The lemma in [4] was originally formulated for random access machines, however in the case
      when the possible contents of the input registers form a set with two elements, the notion of the
      random access machines used there is identical to the notion of (2-way) branching programs.
      Therefore Lemma A1 is a special case of Lemma 9 of [4].

   4. There is a slight difference between the notation of the two papers: in [4] an input is a function
      defined on the set {1, . . ., n} while in the present paper it is defined on {0, 1, . . ., n − 1}.

   5. The proof of Lemma 3.5 from Lemma A1 is almost identical to the proof of Theorem 4 of [4]
      from Lemma 9 of the same paper.

5.2   Reducing Lemma 3.5 to Lemma A1
As we pick the values of the various parameters in Lemma 3.5 we will describe the values of the param-
eters of Lemma A1 when we use it in our proof.
    Assume that k is given (we will apply Lemma A1 with the same value of k). Now we pick σ1 and
σ2 so that σ1 is sufficiently small with respect to k and σ2 is sufficiently small with respect to σ1 . Let
σ = 3σ2. Let ε > 0 be sufficiently small with respect to σ2, let n be sufficiently large with respect to ε,
and let B be a branching program of length kn (for each input) and size at most 2^{εn}. (ε, B and n are the
same in the two lemmata.) We pick δ ∈ {0, 1} so that |B^{−1}(δ)| ≥ 2^{n−1}. Let G = B^{−1}(δ) in Lemma A1.
(As we noted earlier we may assume that every input of B is visible, so G meets this requirement of
Lemma A1.) Now we pick κ, F1 , F2 , f1 , f2 , H with the properties listed in Lemma A1.
    As a first step in the proof of Lemma 3.5 we prove that there is a χ̄ ∈ H so that for each i = 1, 2 the
following condition is satisfied:

(15). assume that s_i = |core(F_i, χ̄)| and Ȳ_i is the set of all partial inputs η defined on core(F_i, χ̄) so that
χ̄ o η ∈ H; then |Ȳ_i| ≥ (1/6) 2^{−κn} 2^{s_i}.

     For the proof we use the following two lemmata from [4]. The first one, Lemma A2, is Lemma 10 in
[4]; the second one, Lemma A3, is Proposition 3 in that paper.

Lemma A2. Suppose that F ⊆ I, χ, ξ are inputs with stem(F, χ) ≠ stem(F, ξ) and rstate_{⋃F, χ} =
rstate_{⋃F, ξ}. Then there is an x ∈ domain(stem(F, χ)) ∩ domain(stem(F, ξ)) so that χ(x) ≠ ξ(x).

Lemma A3. Assume that A ⊆ A′ are finite sets, P is a partition of A, P′ is a partition of A′, each class
of P is contained in a single class of P′, and d = |A||A′|^{−1}. Then for all λ > 0, there are at most λ|A|
elements x of A so that if C, C′ are the unique P, P′ classes containing x then |C||C′|^{−1} ≤ λd.
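Lemma A3 is a pure counting statement and is easy to test experimentally. A Python sketch (ours; encoding partitions as lists of frozensets is our convention, not notation from [4]):

```python
from fractions import Fraction

def low_density_elements(A, A2, P, P2, lam):
    """Elements x of A whose P-class C has density |C| / |C'| at most
    lam * d inside the unique P'-class C' containing it, with d = |A| / |A'|.
    Lemma A3 bounds their number by lam * |A|."""
    d = Fraction(len(A), len(A2))
    bad = set()
    for C in P:
        C2 = next(D for D in P2 if C <= D)   # the P'-class containing C
        if Fraction(len(C), len(C2)) <= lam * d:
            bad |= C
    return bad

A2 = frozenset(range(10))                    # A'
A = frozenset(range(5))                      # A, with d = 1/2
P = [frozenset({0}), frozenset({1, 2, 3, 4})]
P2 = [A2]                                    # a single P'-class
bad = low_density_elements(A, A2, P, P2, Fraction(1, 3))
print(len(bad) <= Fraction(1, 3) * len(A))   # True, as Lemma A3 predicts
```

Below, the lemma is applied with λ = 1/3, so at least a third of H avoids the low-density classes for each i.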

     As a first step in proving the existence of a χ̄ ∈ H so that for all i = 1, 2 Condition (15) is satis-
fied, we fix an i ∈ {1, 2} and give a lower bound on the number of inputs χ̄ ∈ H satisfying Condition (15)
(with this fixed i). We define a partition T_i of H in the following way. For all χ, ξ ∈ H, ξ, χ belong to the
same class iff stem(F_i, χ) = stem(F_i, ξ). It is a consequence of this definition that if χ and ξ do not
belong to the same class of T_i, then the functions stem(F_i, χ) and stem(F_i, ξ) must be different (for
the fixed value of i). Since the domains of these two functions, that is, {0, 1, …, n − 1}\core(F_i, χ)
and {0, 1, …, n − 1}\core(F_i, ξ) are not necessarily identical, in principle it would be possible for the

functions stem(F_i, χ) and stem(F_i, ξ) to be compatible, that is, to be identical on the intersection of
their domains. However Condition (13) of Lemma A1 and Lemma A2 imply that this can never happen,
that is,

(16). functions that belong to different classes of T_i are not compatible.

       We will denote by H′ the set of all inputs ζ so that there is a χ ∈ H with the property that ζ is an
extension of stem(F_i, χ). For each fixed χ ∈ H let W_χ be the set of all ζ ∈ H′ so that ζ is an extension
of stem(F_i, χ). Condition (16) implies that the sets W_χ, χ ∈ H (we take each of them only once) form a
partition T_i′ of H′. Clearly each class of T_i is contained in exactly one class of T_i′.
       We want to apply Lemma A3 with A → H, A′ → H′, P → T_i, P′ → T_i′, and λ → 1/3. Since, by the
definition of G, we have |G| ≥ 2^{n−1}, Condition (9) of Lemma A1 implies that |H| ≥ 2^{−κn} 2^{n−1}. Obviously
|H′| ≤ 2^n and so d = |H||H′|^{−1} ≥ (1/2) 2^{−κn}. Therefore, according to Lemma A3, for at most (1/3)|H| inputs χ
from the set H the following condition is satisfied: χ belongs to a class in T_i whose density in the unique
class of T_i′ containing it is at most (1/3)d. Let X_i be the set of all inputs χ ∈ H with this property.
Since |X_i| ≤ (1/3)|H| for both i = 1 and i = 2 we have that |H\(X1 ∪ X2)| ≥ (1/3)|H|. Let χ̄ ∈ H\(X1 ∪ X2). The
definition of X_i implies that for all i = 1, 2, χ̄ belongs to a class of T_i whose density in the corresponding
class of T_i′ is greater than (1/3)d ≥ (1/6) 2^{−κn}. Since each class of T_i′ contains exactly 2^{s_i} elements this
implies that χ̄ meets the requirements of Condition (15).
      Assume that χ̄ is fixed with Condition (15) and Ȳi , i = 1, 2 are the sets defined in the description of
that property. We will prove the following:

(17). for all η_i ∈ Ȳ_i, i = 1, 2 we have (χ̄ o η1) o η2 ∈ B^{−1}(δ).

    For the proof of this fact we use the following lemma which is Lemma 2 in [4]:

Lemma A4. Assume that χ is an input, η1, η2 are partial inputs, T1, T2 ⊆ {0, 1, …, nk − 1}. If χ, η1, η2,
T1, and T2 satisfy the following conditions, then B(χ) = B((χ o η1) o η2).

(18). domain(η1) and domain(η2) are disjoint,

(19). T1 and T2 are disjoint,

(20). for all i = 1, 2 we have domain(η_i) ⊆ core(T_i, χ),

(21). for all i = 1, 2 we have rstate_{T_i, χ} = rstate_{T_i, χ o η_i},

(22). for all i, j ∈ {1, 2}, i ≠ j we have domain(η_i) ∩ register(T_j, χ o η_j) = ∅.

     To prove that Condition (17) is satisfied by χ̄, we show that the assumptions of Lemma A4 hold with
χ → χ̄, η1, η2, T1 → ⋃F1, and T2 → ⋃F2.
     Condition (18). By the definitions of η_i and the function core we have domain(η_i) = core(F_i, χ̄)
for i = 1, 2, and each variable in core(F_i, χ̄) is accessed only at times in ⋃F_i. Condition (10) of
Lemma A1 implies that ⋃F1 ∩ ⋃F2 = ∅, so domain(η1) and domain(η2) are disjoint.
     Condition (19). This is a consequence of Condition (10) of Lemma A1.
     Condition (20). By the definition of η_i this holds with equality.
     Condition (21). This follows from χ̄, χ̄ o η_i ∈ H and Condition (13) of Lemma A1.


    Condition (22). Assume that i, j ∈ {1, 2}, i ≠ j are fixed. We have domain(η_i) = core(F_i, χ̄).
Condition (11) of Lemma A1 and the fact χ̄ o η_j ∈ H together imply that core(F_i, χ̄) = core(F_i, χ̄ o η_j).
Therefore domain(η_i) = core(F_i, χ̄ o η_j). Since ⋃F1 ∩ ⋃F2 = ∅, at input χ̄ o η_j and at times belonging to the
set ⋃F_j we can never access a variable in core(F_i, χ̄ o η_j), and consequently

$$\mathrm{domain}(\eta_i) \cap \mathrm{register}\Big(\bigcup F_j,\ \bar\chi \,o\, \eta_j\Big) = \emptyset\, .$$


       Since all of the assumptions of Lemma A4 hold, its conclusion must hold as well and so χ̄ satisfies
Condition (17).
       We will need the following observation to conclude the proof. Let core(F_i, χ̄) = S_i. For any i = 1, 2
and for any X ⊆ S_i, there is an Ȳ_i(X) ⊆ Ȳ_i so that η(x) = ζ(x) for all η, ζ ∈ Ȳ_i(X), x ∈ S_i\X, and
|Ȳ_i(X)| ≥ (1/6) 2^{−κn} 2^{|X|}. Indeed, we may partition the elements of Ȳ_i into disjoint classes according to the
values of its elements on the set S_i\X. Since there are at most 2^{s_i−|X|} classes, at least one class must
contain at least 2^{−s_i+|X|} |Ȳ_i| elements. Ȳ_i(X) will be such a class. This and the lower bound on |Ȳ_i|
formulated in Condition (15) imply |Ȳ_i(X)| ≥ (1/6) 2^{−κn} 2^{|X|}.
       By Condition (12) of Lemma A1 we have |S_i| ≥ κ^τ n for i = 1, 2. Let r = [(1/2)κ^τ n]. Let z_i be the r-th
smallest element of S_i and assume that e.g. z1 ≤ z2. Let W1 be the set of the r smallest elements of S1 and
let W2 be the set of the r largest elements of S2. Let Y_i = Ȳ_i(W_i) for i = 1, 2. According to our previous
observation we have

(23). for all i = 1, 2, |Y_i| ≥ (1/6) 2^{|W_i|−κn}.

    By the definitions of r, z_i, and W_i, Condition (1) is satisfied by W1 and W2. We claim that the other
requirements of the lemma are also met by the following choice of the various parameters. We pick two
partial inputs ζ1 ∈ Y1, ζ2 ∈ Y2 in an arbitrary way. Let χ = (χ̄ o ζ1) o ζ2, λ = 2κ, and µ = |W1|n^{−1} =
|W2|n^{−1}. (W_i, Y_i have already been defined.)
    The definitions of σ1, σ2, ε, and δ at the beginning of the proof of Lemma 3.5 show that the require-
ments of the lemma, stated before the conditions λ ∈ (σ2, σ1), µ ∈ (σ2, σ1), are met. λ ∈ (σ2, σ1) is an
immediate consequence of λ = 2κ, the inequalities σ < κ and κ ≤ 2^{−|log σ|^{1/4}}, and the fact that we choose
σ2 = (1/3)σ so that it is sufficiently small with respect to σ1.
    The fact that µ ∈ (σ2, σ1) is a consequence of the following facts: τ = 1 − 1/(50k) (cf. Condition (12) of
Lemma A1), µ = |W_i|n^{−1}, |W_i| = [(1/2)κ^τ n], σ < κ ≤ 2^{−|log σ|^{1/4}}, σ = 3σ2, and σ2 is sufficiently small with
respect to σ1. Indeed σ = 3σ2 is sufficiently small with respect to σ1 (for a fixed k), and so
$$\mu = \left[\tfrac{1}{2}\kappa^{\tau} n\right] n^{-1} \le \tfrac{1}{2}\kappa^{\tau} \le 2^{-|\log\sigma|^{1/4}(1-\frac{1}{50k})} < \sigma_1 .$$

On the other hand
$$\mu = \left[\tfrac{1}{2}\kappa^{\tau} n\right] n^{-1} > \tfrac{1}{3}\kappa^{\tau} \ge \tfrac{1}{3}\sigma^{1-\frac{1}{50k}} \ge \tfrac{1}{3}\sigma = \sigma_2$$
and so µ ∈ (σ2, σ1).
   We have already seen that Condition (1) of Lemma 3.5 is satisfied.


    Condition (2) of Lemma 3.5 is a consequence of the definition of µ.
    Condition (3). Using the inequality of Condition (23) we get |Y_i| ≥ (1/6) 2^{|W_i|−κn} ≥ 2^{|W_i|−λn} = 2^{µn−λn}.
    Condition (4). By the definition of r = |W_i| = µn we have µn = [(1/2)κ^τ n] and so
$$\mu \ge \tfrac{1}{3}\kappa^{\tau} = \tfrac{1}{3}\left(\tfrac{\lambda}{2}\right)^{\tau} = \tfrac{1}{3}\left(\tfrac{\lambda}{2}\right)^{1-\frac{1}{50k}} .$$
Therefore
$$\mu^{1+\frac{1}{100k}} \ge \left(\tfrac{1}{3}\right)^{1+\frac{1}{100k}} \left(\tfrac{\lambda}{2}\right)^{(1-\frac{1}{50k})(1+\frac{1}{100k})} \ge 2\lambda .$$
(Here we used that, by Condition (14), both κ and λ > 0 are sufficiently small with respect to k.)
    Condition (5) of Lemma 3.5 is a consequence of Condition (17) and the definitions of χ and Y_i.
These definitions imply that (χ o η1) o η2 = (χ̄ o η1′) o η2′ where η_i′ = η_i ∪ ζ_i|_{S_i\W_i} ∈ Ȳ_i.


References
 [1] * K. ABRAHAMSON: Time-space tradeoffs for branching programs contrasted with those for
     straight-line programs. In 27th IEEE FOCS. IEEE Computer Society, 1986. 1.2.2

 [2] * K. ABRAHAMSON: Time-space tradeoffs for algebraic problems on general sequential ma-
     chines. Journal of Computer and System Sciences, 43:269–289, 1991. [JCSS:10.1016/0022-
     0000(91)90014-V]. 1.2.2

 [3] * M. AJTAI: A non-linear time lower bound for boolean branching programs. In 40th IEEE FOCS,
     pp. 60–70. IEEE Computer Society, 1999. [FOCS:10.1109/SFFCS.1999.814578]. 1.3

 [4] * M. AJTAI: Determinism versus non-determinism for linear time RAMs with memory restric-
     tions. Journal of Computer and System Sciences, 65:2–37, 2002. [JCSS:10.1006/jcss.2002.1821].
     1.2.3, 1.2.4, 1.2.5, 1.2.6, 2.1, 2.2, 3.2, 5.1, 5.1, 1, 3, 4, 5, 5.2, 5.2

 [5] * P. BEAME: General sequential time-space tradeoff for finding unique elements. SIAM J.
     Computing, 20(2):270–277, 1991. [SICOMP:20/0220017]. 1.2.2

 [6] * P. BEAME, M. SAKS, AND T. S. JAYRAM: Time-space tradeoffs for branching programs.
     Journal of Computer and System Sciences, 63(4):542–572, 2001. [JCSS:10.1006/jcss.2001.1778].
     1.2.3, 1.2.5, 1.2.6, 2.1, 3.3

 [7] * P. BEAME, M. SAKS, X. SUN, AND E. VEE: Time-space tradeoff lower bounds for
     randomized computation of decision problems. Journal of the ACM, 50(2):154–195, 2003.
     [JACM:10.1145/636865.636867]. 1.3

 [8] * A. BORODIN AND S. A. COOK: A time-space tradeoff for sorting on a general sequential model
     of computation. SIAM J. Computing, 11:287–297, 1982. [SICOMP:11/0211022]. 1.2.2, 1.2.6

 [9] * A. BORODIN, A. A. RAZBOROV, AND R. SMOLENSKY: On lower bounds for read-k-times
     branching programs. Computational Complexity, 3:1–18, 1993. 1.2.5, 1.2.6, 2.1, 3.3

[10] * B. CHOR AND O. GOLDREICH: Unbiased bits from sources of weak randomness and proba-
     bilistic communication complexity. In 26th IEEE FOCS, pp. 429–442. IEEE Computer Society,
     1985. 1.2.5, 3.3

[11] * M. KARCHMER: Two time-space tradeoffs for element distinctness. Theoretical Computer
     Science, 47:237–246, 1986. [TCS:10.1016/0304-3975(86)90150-7]. 1.2.2

[12] * S. REISCH AND G. SCHNITGER: Three applications of Kolmogorov complexity. In 23rd IEEE
     FOCS, pp. 45–52. IEEE Computer Society, 1982. 1.2.2

[13] * J. S. THATHACHAR: On separating the read-k-times branching program hierarchy. In 30th ACM
     STOC, pp. 653–662. ACM, 1998. [STOC:10.1145/276698.276881]. 1.2.5, 1.2.6, 2.1, 3.3, 3.12

[14] * Y. YESHA: Time-space tradeoffs for matrix multiplication and the discrete Fourier transform
     on any general sequential random-access computer. Journal of Computer and System Sciences,
     29:183–197, 1984. [JCSS:10.1016/0022-0000(84)90029-1]. 1.2.2


AUTHOR

     Miklós Ajtai
     IBM Almaden Research Center
     ajtai almaden ibm com


ABOUT THE AUTHOR

      MIKLÓS AJTAI received his Ph.D. from the Hungarian Academy of Sciences in 1975. His
         advisor was András Hajnal. He has worked in the following areas: axiomatic set theory
         (independence proofs), lattice theory (posets with meet and join), combinatorics, the
         theory of random graphs, complexity theory, sorting networks, and the theory of lattices
         (n-dimensional grids) and their applications to complexity theory and cryptography. He
         is a member of the Hungarian Academy of Sciences and was an invited speaker at the
         ICM in 1998. He received the Knuth Prize in 2003 and the IBM Corporate Award in 2000.



