T HEORY OF C OMPUTING, Volume 10 (16), 2014, pp. 421–451
www.theoryofcomputing.org
How Many Bits
Can a Flock of Birds Compute?
Bernard Chazelle∗
Received August 3, 2012; Revised October 30, 2014; Published November 13, 2014
Abstract: We derive a tight bound on the time it takes for a flock of birds to reach
equilibrium in a standard model. Birds navigate by constantly averaging their velocities with
those of their neighbors within a fixed distance. It is known that the system converges after
a number of steps no greater than a tower-of-twos of height logarithmic in the number of
birds. We show that this astronomical bound is actually tight in the worst case. We do so by
viewing the bird flock as a distributed computing device and deriving a sharp estimate on the
growth of its busy-beaver function. The proof highlights the use of spectral techniques in
natural algorithms.
ACM Classification: F.2.0
AMS Classification: 68W25
Key words and phrases: natural algorithms, dynamical systems, bird flocking, busy-beaver function
1 Introduction
In a standard flocking model, n birds fly in the sky and modify their velocities at each time step by
averaging them with those of their neighbors within a fixed distance (Figure 1). It has been shown that
the system always tends to equilibrium and that the relaxation time is bounded by a tower-of-twos of
height O(log n), which is “two to the two to the two. . . ” repeated on the order of log n times [2]. Past
that worst-case limit, equilibrium is approached exponentially fast: the birds fly with constant velocities
subject to fast-decaying deviations. We show here that, oddly enough, this astronomical upper bound is in
fact optimal.
∗ This work was supported in part by NSF grants CCF-0634958 and CCF-0832797.
© 2014 Bernard Chazelle
Licensed under a Creative Commons Attribution License (CC-BY). DOI: 10.4086/toc.2014.v010a016
Figure 1: Each bird averages its velocity with those of its neighbors within a fixed radius.
This work is part of an effort to understand the worst-case behavior of bird flocking viewed as a
natural algorithm. Because the actual biology of bird flying is beyond anybody’s grasp at this point,
a reasonable approach is to specify an idealized model of the phenomenon that, one hopes, captures
some essential aspect of the behavior. In this case, the model tries to express the birds’ uncanny ability
to reach consensus about their velocities despite the ever-changing topology of their communication
channels. To prove that the time to reach equilibrium can be very high, one must carefully engineer a
starting configuration of the birds and show how their behavior remains essentially nontrivial for a long
time. What does this mean? Because the velocities and positions of the birds can be assumed to be
rational numbers, the evolution of the group can be modeled by a distributed algorithm operating over the
rationals. Since it is known that the flocking algorithm will always converge, the question is: How long
can it take in the worst case?
As it happens, this is equivalent to asking how many bits are required to encode the final stationary
velocity of the birds. If three birds are initialized with m bits each, none of them can compute a single
extra bit on their own: together, however, they can produce 2^{Ω(m)} brand-new bits! It is in this sense that
one can talk about the number of bits “computed” by a bird flock. This is similar to what is known in
computational complexity as the “busy-beaver problem”: given a program, how long can it run and halt
for an input of size n? In our case, the birds never stop flying so, technically, the program never halts. The
creation of new flocks does, however. Natural algorithms, by definition, never stop. Their busy-beaver
functions, therefore, merely count how much time can elapse before their behaviors become trivial, i. e.,
fixed or asymptotically periodic.
The model and the main result. Most models of bird flocking studied in the literature follow the boids
framework of Reynolds [9, 12]. Roughly, birds try to (i) align their headings, (ii) stay grouped together,
and (iii) avoid collision. Applying all three rules together produces spectacular visuals: unsurprisingly,
all of the CGI bird flocking animations in Hollywood are based on them. Analyzing their joint behavior has
proven to be a challenge, however. Of all three rules, the first one drives the dynamics, so it is customary
to dismiss the other two as merely corrective: indeed, the bulk of the mathematical work on bird flocking
has focused on variants of rule (i) [2–8, 10–13]. This bold choice was validated recently by the empirical
findings of the STARFLAG project, the most comprehensive experimental bird flocking investigation to
date [1].
The model is easy to describe. We give a formal description below but a few words of explanation
suffice to tell the whole story. A bird is represented by a point in Euclidean 3-space along with a
three-dimensional vector indicating its velocity. For a system of n birds, the initial condition is thus
entirely specified by 6n numbers. The time t is discrete, so the position of a bird i at time t is given by its
placement xi at time t − 1 shifted by its current velocity vi ; in other words, the bird’s location xi becomes
xi + vi . To update vi , we form an undirected graph, called the flocking network, by connecting any two
birds within unit distance by an edge, and we express the new velocity of bird i by averaging its current
velocity vi with those of its neighbors in the graph. The system reaches equilibrium when the flocking
network no longer changes. The terminology is justified by the fact that, once the network settles on a
final configuration, the birds soon begin to fly (essentially) in a straight line at constant speed: from that
point on, nothing of interest happens. The model is robust in that the averaging can be weighted, if so
desired, and a moderate amount of decaying noise can be tolerated as well. The connected components of
the flocking network are called the flocks of the system. They will typically fragment and merge in erratic
ways, and it is this evolving topology that makes such systems hard to analyze.
Indeed, if the flocking network were fixed once and for all, the velocities would evolve by iterated
averaging in a manner lending itself to the theory of random walks. The crux of the matter is that the
flocking network may be constantly changing. If the changes were random, some of the tools for Markov
chains could be rescued, but the difficulty is that the changes are endogenous. There is a feedback loop
from the positions of the birds, which determine the flocking network, which in turn specifies the evolution
of the velocities, which then determine the new bird positions. To show that the system always reaches
equilibrium calls on tools from different areas [2]: combinatorics, linear algebra, circuit complexity,
computational geometry, even elimination theory—to determine whether two birds will ever be within
unit distance of each other requires root separation bounds for various characteristic polynomials.
In the field of dynamics, attraction to a fixed point is usually established by exhibiting a Lyapunov
function. Because of the changing topology of the network, this approach runs into all sorts of problems
here. Fortunately, proving the matching lower bound can be done entirely within the language of linear
algebra. Once we engineer a starting configuration for the birds, to bound the relaxation time is a matter
of monitoring the evolution of various Fourier coefficients as flocks merge together. Of course, it is the
occasional presence of nonlinearities that causes the astronomical delay we seek. In the construction,
the number of nonlinearities can be kept remarkably small (in fact, less than the number of birds). To
see why Fourier analysis arises naturally, it is helpful to compare the transmission of information among
the birds to the diffusion of heat in a medium. By choosing the right topology, we can ensure that the
eigenfunctions of the Laplacian form a nice, simple set of harmonics.
We define the model formally. Given n birds represented at time t by their position vector x(t) =
(x1 (t), . . . , xn (t)) ∈ (R3 )n , the flocking network at time t links any two birds (the nodes) within unit
distance of each other. The Laplacian L_t at time t is the n-by-n matrix such that (L_t)_{ii} is the number of birds within distance 1 of bird i and, for i ≠ j, (L_t)_{ij} is −1 if birds i, j are within unit distance and 0 otherwise.
For any t > 0,
$$
\begin{cases}
x(t) = x(t-1) + v(t)\,; \\
v(t+1) = (P_t \otimes I_3)\, v(t)\,,
\end{cases} \tag{1.1}
$$
where P_t = I_n − C_t L_t and C_t is a diagonal matrix with positive rational entries. The system is initialized
by fixing x(0) and v(1). The Kronecker product is used to make the stochastic matrix Pt act on each
coordinate axis separately. We state the main result of this paper next and give the proof in Section 3. We
set the grounds for it in Section 2.
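The update rule (1.1) can be sketched in a few lines of Python. This is a minimal float sketch, not the paper's exact rational computation; the choice C_t = diag(1/(deg_i + 1)) below is just one valid weighting that makes P_t stochastic.

```python
import numpy as np

def flocking_step(x, v):
    """One step of the dynamics: build the flocking network from the current
    positions, average velocities over it, then move every bird."""
    n = len(x)
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    adj = (dist <= 1.0) & ~np.eye(n, dtype=bool)   # flocking network
    deg = adj.sum(axis=1)
    L = np.diag(deg) - adj                          # Laplacian L_t
    C = np.diag(1.0 / (deg + 1))                    # one valid choice of C_t
    P = np.eye(n) - C @ L                           # stochastic matrix P_t
    # applying P to each coordinate axis separately is exactly (P ⊗ I_3) v
    return x + v, P @ v

# two birds within unit distance average their velocities; a far-away
# third bird keeps its own
x = np.array([[0.0, 0, 0], [0.5, 0, 0], [5.0, 0, 0]])
v = np.array([[1.0, 0, 0], [-1.0, 0, 0], [0.0, 1, 0]])
x1, v1 = flocking_step(x, v)
```

In this toy run the two nearby birds, having equal and opposite velocities, come to a joint halt after one averaging step, while the isolated bird is unaffected.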
Theorem 1.1. There exists an initial configuration of n birds in R3 requiring O(log n) bits per bird to
specify such that the flocking network is still changing after a number of steps equal to a tower-of-twos of
height Ω(log n).
The initial configuration of a set of birds refers to an assignment of six rational numbers to each bird to
specify their initial positions and velocities. The lower bound above is the best possible [2].
Time to equilibrium. Our result is a worst-case lower bound. What about the matching upper bound?
One proves that a bird system converges by showing that (i) the flocking network freezes within
finite time and (ii) the birds travel with constant velocity from that point on, while subject to damped
oscillations decaying exponentially fast [2]. The latter is easy to prove. Indeed, once the network becomes
static, the system becomes a coupled oscillator dual to a Markov chain. The challenging part is (i): to
show that the flocking network converges to a fixed graph. The proof proceeds in two steps. First, it is
shown that the network must at some point cease to fragment, with all subsequent (nonlinear) events
consisting of flock merges. Second, the last such event is shown to occur within a number of steps equal
to a tower-of-twos of logarithmic height [2]. In this paper we establish that this unusual bound is actually
optimal.
What can possibly account for such astronomical delays? Imagine two flocks flying almost parallel to
each other and headed toward collision: the smaller the angle the longer it will take for the two flocks
to merge. Picture now a whole set of such flocks merging two-by-two so as to become increasingly
parallel to one another after each merge. A careful choice of initial conditions can lead to delays growing
exponentially between consecutive merges. By scheduling the latter in a balanced fashion, we can ensure
that each bird witnesses about log n of them: the tower-of-twos lower bound follows directly. Although
birds fly in three dimensions, the construction can be achieved within a fixed plane (X,Y ). In fact, the
motion along the Y axis is identical for all birds, so it suffices to focus on the X coordinates. For that
reason, it will be helpful to leave birds aside momentarily and use the one-dimensional imagery of trains
colliding on a railroad track. The reason for doing this is that we can lift a system of colliding trains to
higher dimension to form flocking birds. The transformation is entirely straightforward.
Spectral shift. Imagine n railroad cars moving on a train track. Whenever two of them collide, they
become attached with a spring which acts mechanically to set the resulting two-car train’s stationary
velocity to some average of the two individual speeds. Trains can in turn collide to form larger ones:
after the collision, each train keeps averaging its velocity with its immediate neighbors on both sides (if
any). What is the lowest nonzero speed that can be thus achieved? (Note that the dissipative nature of the
dynamics rules out speedups.) Viewing the process as a distributed computation raises a busy-beaver
type question. Assuming that each of the n cars is given an initial velocity encodable as an O(log n)-bit
rational, the initial speed of any car can be no smaller than inverse polynomial in n. In fact, as one
will easily see, the mass center of the train resulting from a collision between two cars cannot have a
nonzero speed lower than inverse polynomial as well. It is only through repeated collisions between
bigger and bigger trains that the speed can begin to drop substantially. And when it does, the dropping
can be precipitous. Indeed, the lowest nonzero speed of an n-car train is inversely proportional to a
tower-of-twos of height logarithmic in n.
This explosive decay points to an intriguing phenomenon of spectral shift: a collision between two
trains cancels the highest eigenvalue of each one and shifts the remaining spectrum upward. A quick
word of explanation. A moving n-car train can be regarded as a vibrating string with n harmonics. The
highest one determines the speed of the whole train, while the other ones tell us how the motions of the
individual cars deviate from the average. Because the system is dissipative (all but one of the eigenvalues
are of magnitude less than 1), the nondominant eigenmodes decay exponentially fast. When two trains
A and B hit each other, the new train C (of twice the size) acquires its eigenvalues from A and B: the
two average speeds are made to be exactly opposite, so the highest modes cancel each other perfectly. As a
result, all the modes of C are linear combinations of the nondominant modes of the smaller ones; in other
words, if the colliding trains have Fourier coefficients of the form (a1 , a2 , . . . , ak ) and (−a1 , b2 , . . . , bk ),
then the new train C has a spectrum (c1 , . . . , c2k ), where the ci are linear combinations of a2 , . . . , ak and
b2 , . . . , bk . Note the absence of a1 in the formation of c1 : this is the spectral shift in action. Because of
dissipation, at the time of collision, the coefficients ai , bi (i > 1) will be much smaller than |a1 |. This is
easy to achieve so as to get a one-shot exponential boost. The tricky part is to arrange for the shift to kick
in over repeated collisions: too much symmetry in the initial configuration of the cars brings the trains to
a halt; too little fails to clear the dominant modes—a precondition for any spectral shift—and makes the
exponential boost unsustainable over several collisions.
The train model attempts to represent a one-dimensional projection of the birds. Once two trains
collide, they are assumed to stay attached forever. If we lift the construction to model birds, however,
this “glueing” provision no longer holds and cohesion must be checked. It is imperative that birds
corresponding to attached trains should remain connected on their own, i. e., stay within a flock by sheer
virtue of the distance-based attachment rule of the flocking network. Unlike trains, whose cohesion is
enforced exogenously, flocks may fragment and an integrity analysis is necessary to establish that they
do not: this task is necessary but merely of technical interest, so we deal with it separately in Section 3.
Meanwhile, the main idea of the construction, the spectral shift, is entirely contained in the analysis of
slow-train systems given in Section 2.
Number encodings. The model relies on the ability of birds to update their velocities with arbitrarily
high precision. (All computations are over the rationals so there is no need for real-value computation and
all bit lengths are finite.) Is this a realistic assumption? Our results are likely to be mostly of theoretical
interest but large precision cannot be the reason. To see why, an analogy is helpful. The relevance of
chaos theory hinges on precisely the same point. Chaos is meaningful, strictly speaking, only if physics
operates with infinite precision, which of course it does not. The pertinence of the concept is that its
underlying motivation—sensitivity to initial conditions—still matters over a finite time horizon, hence
Figure 2: The mass center of the train moves at constant speed, which is given by its stationary velocity,
i. e., the lowest mode of the system. The speeds of the individual cars, given by the higher modes,
converge exponentially fast to the stationary value.
with bounded precision. In other words, the mathematical framework of chaos is an idealization of a
phenomenon that is still present with imprecise computations. The same is true of the flocking model.
Even with finite precision, the slowdown induced by the spectral shift can occur and delay convergence:
the assumption of perfect accuracy is merely a convenient mathematical idealization. If the model has a
serious weakness, it is not infinite precision but determinism: no one knows what happens if we inject
noise with constant entropy rate into the system. This is a very interesting open problem.
2 Slow train coming
Picture n railroad cars on a track, all separated from one another by at least a fixed distance. At time 0,
give each one a little kick, some to the left, others to the right. There is no friction on the track, so the
cars will move at constant speed until they start to hit one another. Should this happen, the collisions
will be softened by the presence of a spring at the right end of each car (Figure 2). While absorbing the
shock, the spring latches on to the other car. At that moment, the two colliding cars get attached to form a
two-car train. Because of the spring, now attached to both cars, the train forms a coupled oscillator. The
mass center of the two-car train will move at a constant speed equal to the average of the two individual
velocities at the moment of impact. (We take liberties with the physics of the construction and emphasize
that the analogy is only of mathematical interest.) In spectral terms, this is the lowest mode of the system
and the speed of the mass center forms the stationary velocity of the train: it is associated with the
principal eigenvector whose corresponding eigenvalue is 1, which explains why the mass center moves at
constant speed.
Because of the coil spring, the coupled oscillator is damped, so the velocities of the individual cars
will deviate from the stationary one by an error term decaying exponentially fast. (This is dual to an
ergodic Markov chain with two states.) For a brief moment, the two cars might jerk back and forth a little,
reeling from the shock, but very quickly the stationary velocity will assert its dominance and both cars
will proceed to move in the same direction. Further collisions can happen, resulting in multicar trains
fastened together by their springs. (Recall that a spring locks onto any car it hits.) The mass center of the
three-car train in Figure 2 has constant velocity but, though still fast-decaying, the oscillations of the cars’
individual speeds, being now coupled, become increasingly complex. In our discrete model, the speeds of
adjacent cars are being averaged at each time step, so these oscillations are governed by a rapidly mixing
Markov chain. In the language of distributed computing, this is a consensus system: messages about
the cars’ current speeds are passed around up and down the train to cause endless readjustment toward
consensual agreement on a common speed. The twist is that the trains keep colliding and hence growing,
with more and more cars getting attached to them. To investigate how slowly they can go, it is useful
to distinguish the number of bits used in the initial conditions from the number of cars; therefore, we
assume that the initial positions and velocities are encoded as rationals over O(log m) bits per car, for
large enough m.
2.1 A small example
Two railroad cars hurtling toward each other at the same speed collide to produce a two-car train with
zero stationary velocity: the train oscillates around its mass center, which does not move. Obviously, we
must initialize the two cars with distinct speeds if only to keep the system moving. A moment’s reflection
shows that, regardless of how we do it, the stationary velocity cannot be smaller than 1/poly(m). With
three cars (n = 3), however, it can already be as small as 1/exp(m). We explain why next. A pattern
thus seems to emerge: with each new car, the stationary velocity is (inversely) exponentiated, leading
to a tower-of-twos of height linear in the number of cars. But this is not what happens: indeed, we
know from [2] that the height cannot be superlogarithmic. Why? At this point, it is useful to build some
intuition by working out the case n = 3 in detail:
A THREE-CAR TRAIN
We show how, with only O(log m) bits of input, a three-car train can compute a rational number with
more than m bits: an exponential expansion. For concreteness, assume the cars are 10 times longer
than the unit-length springs. Cars a and b are separated by a distance of 1, so that they’re joined into a
two-car train. We give them an initial velocity of va = 4/m and vb = −2/m. We choose a (1/3, 2/3)
coupling action for the spring—just about any choice works—so that at the next step the velocity of a
becomes va → va /3 + 2vb /3 and that of b becomes vb → 2va /3 + vb /3. At time t > 0, the velocities
of a, b satisfy:
$$
\begin{pmatrix} v_a^t \\ v_b^t \end{pmatrix}
= 3^{-t} \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}^{t} \begin{pmatrix} v_a \\ v_b \end{pmatrix}
= \frac{1}{2}(v_a + v_b)\begin{pmatrix} 1 \\ 1 \end{pmatrix}
+ \frac{1}{2}(-3)^{-t}(v_a - v_b)\begin{pmatrix} 1 \\ -1 \end{pmatrix}.
$$
The second equality follows from direct diagonalization. The ab-train has a stationary velocity of
(1/2)(va + vb ) = 1/m. The third car, c, starts to the right of the ab-train with a speed of vc = −3/m.
Its initial position puts its left end at a distance about 4 from the right spring attached to b. We can
thus easily ensure that the impending collision between ab and c, hurtling toward each other at a
relative speed of 1/m − (−3/m) = 4/m will happen at time t = m. The coupling action for the train
abc is given by: va → va /3 + 2vb /3; vb → va /3 + vb /3 + vc /3; and vc → 2vb /3 + vc /3. Assuming
that m is large and even, the post-collision velocity vector is:
$$
\begin{pmatrix} v_a^{t+s} \\ v_b^{t+s} \\ v_c^{t+s} \end{pmatrix}
= 3^{-s} \begin{pmatrix} 1 & 2 & 0 \\ 1 & 1 & 1 \\ 0 & 2 & 1 \end{pmatrix}^{s}
\begin{pmatrix} v_a^t \\ v_b^t \\ v_c^t \end{pmatrix},
\qquad \text{where} \qquad
\begin{pmatrix} v_a^t \\ v_b^t \\ v_c^t \end{pmatrix}
= \frac{1}{m} \begin{pmatrix} 1 + 3^{1-m} \\ 1 - 3^{1-m} \\ -3 \end{pmatrix}. \tag{2.1}
$$
Since (1, 2, 1) is a principal left eigenvector for the abc-train, the stationary velocity is, as claimed, a
rational whose binary expansion is more than m bits long, i. e.,
$$
\frac{1}{4}\,(1, 2, 1)\,(v_a^t, v_b^t, v_c^t)^T = -\,\frac{3^{1-m}}{4m}.
$$
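The calculation above can be replayed mechanically with exact rational arithmetic (a sketch for a modest even m; `Fraction` plays the role of the birds' exact rational updates): run the ab-train for the m steps before impact, then evaluate the stationary velocity of the abc-train.

```python
from fractions import Fraction

m = 10                                  # any large even m works; 10 keeps it quick
va, vb = Fraction(4, m), Fraction(-2, m)
for _ in range(m):                      # ab-train averaging until the collision at t = m
    va, vb = va/3 + 2*vb/3, 2*va/3 + vb/3
vc = Fraction(-3, m)                    # car c arrives at relative speed 4/m

# stationary velocity of abc: inner product with the left eigenvector (1, 2, 1)/4
stat = (va + 2*vb + vc) / 4
assert stat == -Fraction(3)**(1 - m) / (4*m)   # = -3^(1-m)/(4m), as claimed
```

With O(log m) bits of input, the output -3^(1-m)/(4m) already needs more than m bits to write down.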
In this example, the exponentially small stationary velocity is the product of a careful balance: (i) the
stationary velocities of ab and c cancel out perfectly; (ii) the decaying energy of the lower modes—
signalled by the 31−m term in (2.1)—is transferred to the lowest mode to set the stationary velocity of the
abc-train. This is called a spectral shift [2]. Parts (i) and (ii) are in tension: the first calls for symmetry
while the second must avoid undesired cancellations among the nontrivial frequencies of the spectrum.
For example, suppose we tried to generalize this construction to n = 4 by setting ab on a collision course
with its mirror image cd. This would be perfect for (i) but disastrous for (ii), as the excessive symmetry
would bring the abcd-train to a halt. The trick is to inject enough symmetry to kill off the lowest mode of
each new train formation but not so much that it kills the higher modes too. Condition (i) calls for setting
up head-on collisions between trains of the same size. These collisions will thus follow the bottom-up
pattern of a complete binary tree with n leaves.
The example above shows that choosing the right initial velocities is the name of the game. The actual
initial placement of the cars, if not quite arbitrary, is straightforward: essentially, we need to position
the cars far enough apart so that two trains have a chance to travel at least distance 1 before they
collide. This suggests factoring out all positions and analyzing a “consensus” system defined entirely
by the repeated averaging of the velocities: this way we can analyze the spectral shift in dimension n
instead of 2n. It will then be an (easy) matter of restoring the initial placements and lifting the system to
two dimensions. In the end, we will show how to position the individual train cars and give them the
right initial velocities so that the time elapsed before all n cars are joined together into a single train is a
tower-of-twos of height proportional to log n.
2.2 The characteristic time
Let v = (v1 , . . . , vn ) be a vector in Qn , where n is a power of two and each vi is encoded as a ratio of two
O(log n)-bit integers. We denote by T the complete binary tree with n leaves, and we label the i-th leaf
from the left vi . Write
$$
P_j = \frac{1}{3}
\begin{pmatrix}
1 & 2 & 0 & 0 & \cdots & 0 \\
1 & 1 & 1 & 0 & \cdots & 0 \\
\vdots & \ddots & \ddots & \ddots & & \vdots \\
0 & \cdots & 0 & 1 & 1 & 1 \\
0 & \cdots & 0 & 0 & 2 & 1
\end{pmatrix}
\qquad (\text{a } 2^j \times 2^j \text{ matrix}). \tag{2.2}
$$
This is the transition matrix of an ergodic reversible random walk on a graph consisting of a path of 2^j
nodes, so P_j^∞ = lim_{k→∞} P_j^k is the rank-one matrix 1π^T, with
$$
\pi = \frac{1}{2^j - 1}\left(\tfrac{1}{2}, 1, \ldots, 1, \tfrac{1}{2}\right)^T.
$$
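The limit P_j^∞ = 1π^T can be checked concretely by raising a small instance of (2.2) to a high power (a numerical sketch; the helper name `path_chain` and the instance size 2^j = 4 are ours):

```python
import numpy as np

def path_chain(m):
    """The m-by-m transition matrix of (2.2): rows (1,2)/3 and (2,1)/3 at the
    two ends, (1,1,1)/3 straddling the diagonal in between."""
    P = np.zeros((m, m))
    P[0, :2] = [1/3, 2/3]
    P[-1, -2:] = [2/3, 1/3]
    for i in range(1, m - 1):
        P[i, i-1:i+2] = 1/3
    return P

m = 4                                   # 2^j cars with j = 2
P = path_chain(m)
pi = np.full(m, 1.0); pi[0] = pi[-1] = 0.5; pi /= m - 1
assert np.allclose(pi @ P, pi)          # pi is the stationary distribution
Plim = np.linalg.matrix_power(P, 200)   # P_j^k for large k
assert np.allclose(Plim, np.outer(np.ones(m), pi))   # rank-one limit 1 pi^T
```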
We define two vectors for each node a of height j:
• If j = 0, set v^a = v̄^a = v_i, where a is the i-th leaf of T from the left.
• If j > 0, set v^a = (v̄^b, v̄^c) ∈ Q^{2^j} and v̄^a = P_j^{θ_a} v^a, where b (resp. c) is the left (resp. right) child of a and θ_a = ‖P_j^∞ v^a‖_∞^{−1}. (We ignore rounding issues, which are inconsequential, and simply assume that θ_a is an integer without further notice.)
The leaves of the subtree rooted at a correspond to consecutive cars along the track. Together these
cars form the train associated with node a: its stationary velocity, π^T v^a, is given by any of the (equal)
coordinates of P_j^∞ v^a. The value of θ_a, therefore, is roughly the time it takes for train a to travel a distance
of 1 at its stationary speed. We call θ_root the characteristic time of the system. How large can it be and
still be finite? We define the “tower-of-twos” function by 2↑↑1 = 2 and 2↑↑n = 2^{2↑↑(n−1)} for n > 1.
Theorem 2.1. The characteristic time can be as large as 2↑↑ log n, where n is any large enough power
of two.
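The tower-of-twos function just defined can be written out in two lines (a small illustration of the notation, in Python):

```python
def tower(h):
    """2↑↑h: a tower of twos of height h."""
    return 2 if h == 1 else 2 ** tower(h - 1)

# the first few values already make the point
assert [tower(h) for h in range(1, 5)] == [2, 4, 16, 65536]
```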
Proof. Set n = 2^ν for ν assumed large enough, and define
$$
s_k = -\,n^{-c} \prod_{l=1}^{\nu-1} (-1)^{b_l},
$$
where c > 0 is a large enough integer constant and ∑_{l=0}^{ν−1} b_l 2^l is the binary expansion of k = 0, …, n − 1.
We define the n-dimensional vector v as follows:
$$
v = \left(s_0, -2s_0, s_2, -2s_2, \ldots, s_{n-2}, -2s_{n-2}\right).
$$
This definition has a simple interpretation. Indeed, it can be arrived at by starting with the initial
assignment
$$
v = \left(-n^{-c}, 2n^{-c}, -n^{-c}, 2n^{-c}, \ldots, -n^{-c}, 2n^{-c}\right) \tag{2.3}
$$
and then, while interpreting these coordinates as labels assigned to the leaves of T, applying the following
algorithm: for every node of T that is a right child but not a leaf, flip the signs of the labels of the leaves
of the subtree rooted at it. Note that labels might be flipped more than once. In general, replacing a vector
va by −va for any right child a of positive height is called a flip (more on which below).
This alternative view of the dynamics makes the analysis easier, so we adopt it. By symmetry, we may
focus the analysis on the left spine a_1, …, a_ν, where a_j is the leftmost node of height j. Note that v^{a_j} is
of the form (∗, −∗). Quite clearly, θ_a depends only on the height j of a, so we refer to it by θ_j. We are
only interested in the case θ_j < ∞, a property that we show below always holds.
2.3 The modes of the system
The dominant left eigenvector of P_j is (1, 2, …, 2, 1) up to scaling, so the first Fourier coefficient is
$$
m_{a_j} = \frac{1}{2^j - 1}\left(\tfrac{1}{2}, 1, \ldots, 1, \tfrac{1}{2}\right) v^{a_j} \tag{2.4}
$$
and satisfies
$$
m_{a_j} = \left\| P_j^\infty v^{a_j} \right\|_\infty = \theta_j^{-1}. \tag{2.5}
$$
Technically, this is the Fourier coefficient of index 0 for v^{a_j} with respect to the additive group of the
integers modulo 2^{j+1} − 2, from which the structure can be derived by folding the cycle of length 2^{j+1} − 2
into an interval of size 2^j. This gives us the average speed of the joint group of the 2^j leftmost railroad cars.
Elementary trigonometry shows that, for k = 1, …, m = 2^j,
$$
u_k = \left(1, \cos\frac{\pi(k-1)}{m-1}, \ldots, \cos\frac{\pi(k-1)(m-2)}{m-1}, (-1)^{k-1}\right)^T
$$
is the unique (up to scaling) right eigenvector of P_j for the eigenvalue
$$
\lambda_k = \frac{1}{3}\left(1 + 2\cos\frac{\pi(k-1)}{m-1}\right).
$$
Write
$$
\pi = \frac{1}{2^j - 1}\left(\tfrac{1}{2}, 1, \ldots, 1, \tfrac{1}{2}\right)^T
\qquad \text{and} \qquad
C_j = \frac{1}{3}\,\mathrm{diag}\left(2, 1, \ldots, 1, 2\right), \tag{2.6}
$$
both of dimension 2^j.
For s ≥ 1, we diagonalize the matrix P_j^s = 1π^T + Q_j^s, where
$$
Q_j^s = \sum_{k=2}^{2^j} \lambda_k^s\, C_j^{1/2} w_k w_k^T C_j^{-1/2}, \tag{2.7}
$$
where the k-th right eigenvector C_j^{1/2} w_k of P_j is proportional to u_k, with the normalization condition
‖w_k‖_2 = 1. It follows that, for any 1 < k ≤ 2^j,
$$
\lambda_k = \frac{1}{3} + \frac{2}{3}\cos\frac{\pi(k-1)}{2^j - 1}\,,
\qquad
w_k = \delta_k \left(\frac{1}{\sqrt{2}}, \cos\frac{\pi(k-1)}{2^j - 1}, \ldots, \cos\frac{\pi(k-1)(2^j - 2)}{2^j - 1}, \frac{(-1)^{k-1}}{\sqrt{2}}\right)^T,
$$
where δ_k = √2 (2^j − 1)^{−1/2} for 1 < k < 2^j and δ_{2^j} = (2^j − 1)^{−1/2}. By the triangle inequality and the
submultiplicativity of the Frobenius norm, for any z,
$$
\left\| Q_j^s z \right\|_2
\le |\lambda_2|^s \sum_{k>1} \left\| C_j^{1/2} w_k w_k^T C_j^{-1/2} z \right\|_2
\le |\lambda_2|^s \sum_{k>1} \left\| C_j^{1/2} \right\|_F \left\| C_j^{-1/2} \right\|_F \|z\|_2
\le 2^{2j+1} \left(\frac{1}{3} + \frac{2}{3}\cos\frac{\pi}{2^j - 1}\right)^{s} \|z\|_2\,.
$$
A Taylor series approximation for the cosine function around 0 shows that, for j, s ≥ 1 and any z,
$$
\left\| Q_j^s z \right\|_2 \le 2^{2j+1}\, e^{-\Omega(s 4^{-j})}\, \|z\|_2\,. \tag{2.8}
$$
It follows that
$$
P_j^s v^{a_j} = m_{a_j} \mathbf{1} + \sum_{k=2}^{2^j} \alpha_k(s)\, C_j^{1/2} w_k\,,
$$
where α_k(s) = λ_k^s w_k^T C_j^{−1/2} v^{a_j}. For k > 1, the k-th Fourier coefficient α_k(s) decays exponentially fast
with s, while the first one, m_{a_j}, remains constant.
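The decay of the nondominant modes is easy to observe numerically. The sketch below iterates the chain of (2.2) on an arbitrary vector and checks that, in the π-weighted 2-norm (where the reversible chain is a contraction on the complement of the dominant mode), the deviation from the first Fourier coefficient shrinks by a factor |λ_2| at every step (the inline chain construction and the instance size are our choices):

```python
import numpy as np

m = 8                                   # 2^j cars with j = 3
P = np.zeros((m, m))                    # the path chain of (2.2)
P[0, :2] = [1/3, 2/3]; P[-1, -2:] = [2/3, 1/3]
for i in range(1, m - 1):
    P[i, i-1:i+2] = 1/3
pi = np.full(m, 1.0); pi[0] = pi[-1] = 0.5; pi /= m - 1
lam2 = (1 + 2*np.cos(np.pi / (m - 1))) / 3    # largest nondominant |eigenvalue|

v = np.random.default_rng(0).standard_normal(m)
mean = pi @ v                                  # first Fourier coefficient
dev = lambda z: np.sqrt(pi @ (z - mean)**2)    # pi-weighted deviation norm
z = v
for _ in range(60):
    d_before = dev(z)
    z = P @ z
    assert abs(pi @ z - mean) < 1e-12          # dominant mode is preserved
    assert dev(z) <= lam2 * d_before + 1e-12   # higher modes contract
```

The per-step contraction by |λ_2| holds because the chain is reversible with respect to π, hence self-adjoint in the π-weighted inner product.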
2.4 The spectral shift in action
We can check by direct calculation that
$$
m_{a_2} = \tfrac{1}{2}\, n^{-c} (-3)^{-\theta_1},
$$
so
$$
v^{a_1} = n^{-c}\begin{pmatrix} -1 \\ 2 \end{pmatrix};
\qquad
\theta_1^{-1} = m_{a_1} = \tfrac{1}{2}\, n^{-c};
\qquad
m_{a_2} = \tfrac{1}{2}\, n^{-c}\, 3^{-\theta_1} \ge e^{-O(m_{a_1}^{-1})}. \tag{2.9}
$$
In general,
$$
v^{a_j} =
\begin{pmatrix}
P_{j-1}^{\theta_{j-1}} v^{a_{j-1}} \\[3pt]
-\,P_{j-1}^{\theta_{j-1}} v^{a_{j-1}}
\end{pmatrix}
=
\begin{pmatrix}
m_{a_{j-1}} \mathbf{1} + Q_{j-1}^{\theta_{j-1}} v^{a_{j-1}} \\[3pt]
-\,m_{a_{j-1}} \mathbf{1} - Q_{j-1}^{\theta_{j-1}} v^{a_{j-1}}
\end{pmatrix}. \tag{2.10}
$$
The averaging operator P_j cannot increase the ℓ∞-norm; therefore, for any j ≥ 1,
$$
\| v^{a_j} \|_2 \le 2^{j/2} \| v^{a_j} \|_\infty \le 2^{j/2} \| v^{a_1} \|_\infty = 2^{j/2+1} n^{-c}. \tag{2.11}
$$
The proof of the next result features the critically important cancellation of the stationary velocities m_{a_{j−1}}
of the two subgroups at height j − 1.

Lemma 2.2. For any j > 1, |m_{a_j}| ≤ e^{−Ω(θ_{j−1} 4^{−j})}.
Proof. The stationary distribution of the Markov chain P_{j−1} is normal to Q_{j−1}^{θ_{j−1}} v^{a_{j−1}}; hence, by (2.4, 2.10),
$$
\begin{aligned}
m_{a_j} &= \frac{1}{2^j - 1}\Big(\tfrac{1}{2}, 1, \ldots, 1, \tfrac{1}{2}\Big)\, v^{a_j}
= \frac{1}{2^j - 1}\Big(\tfrac{1}{2}, 1, \ldots, 1, \tfrac{1}{2}\Big)
\begin{pmatrix} Q_{j-1}^{\theta_{j-1}} v^{a_{j-1}} \\[3pt] -\,Q_{j-1}^{\theta_{j-1}} v^{a_{j-1}} \end{pmatrix} \\
&= \frac{1}{2^j - 1}\Big(\underbrace{\tfrac{1}{2}, 1, \ldots, 1, \tfrac{1}{2}}_{2^{j-1}},
\underbrace{\tfrac{1}{2}, 1, \ldots, 1, \tfrac{1}{2}}_{2^{j-1}}\Big)
\begin{pmatrix} Q_{j-1}^{\theta_{j-1}} v^{a_{j-1}} \\[3pt] -\,Q_{j-1}^{\theta_{j-1}} v^{a_{j-1}} \end{pmatrix}
+ \frac{1}{2^j - 1}\Big(0, \ldots, 0, \tfrac{1}{2}, \tfrac{1}{2}, 0, \ldots, 0\Big)
\begin{pmatrix} Q_{j-1}^{\theta_{j-1}} v^{a_{j-1}} \\[3pt] -\,Q_{j-1}^{\theta_{j-1}} v^{a_{j-1}} \end{pmatrix} \\
&= \frac{1}{2^{j+1} - 2}\Big[ \big(Q_{j-1}^{\theta_{j-1}} v^{a_{j-1}}\big)_{2^{j-1}} - \big(Q_{j-1}^{\theta_{j-1}} v^{a_{j-1}}\big)_{1} \Big].
\end{aligned}
$$
The subscripts 1 and 2^{j−1} index the coordinate being selected. By (2.11), therefore, ‖v^{a_{j−1}}‖_2 ≤ 2^{(j+1)/2} n^{−c}
and, by (2.8),
$$
|m_{a_j}| \le \frac{1}{2^j - 1}\left\| Q_{j-1}^{\theta_{j-1}} v^{a_{j-1}} \right\|_\infty
\le \frac{1}{2^j - 1}\left\| Q_{j-1}^{\theta_{j-1}} v^{a_{j-1}} \right\|_2
\le n^{2-c}\, e^{-\Omega(\theta_{j-1} 4^{-j})}.
$$
The cancellation of the two copies of m_{a_{j−1}} in the computation of m_{a_j} has the effect of making that
Fourier coefficient a linear combination of powers of higher eigenvalues. That part of the spectrum being
exponentially decaying, the corresponding spectral shift implies a similar exponential decay in the
first Fourier coefficients further up the tree T. By Lemma 2.2,
− j)
θ j ≥ eΩ(θ j−1 4 , (2.12)
for any j > 1. We also note that θ1 > nc by (2.9); therefore, by (2.12),
θ j > n4 θ j−1 , (2.13)
for any $j > 0$, with $\theta_0 = 1$. Let $\bar\theta_j = \sqrt{\theta_j}$; by (2.9), $\bar\theta_1 > 2$ and, for $j > 1$, $\bar\theta_j \ge 2^{\bar\theta_{j-1}}$; hence
$$\theta_j \ge \bar\theta_j \ge 2 \uparrow\uparrow j \qquad (j \le \log n),$$
which proves Theorem 2.1, provided that the first spectral coordinates $m_{a_j}$ never vanish. This is what we show in the next section. For future use, we state a weaker bound on $m_{a_j}$. By Lemma 2.2, for $j > 1$,
$$|m_{a_j}| \le e^{-\Omega(\theta_{j-1} 4^{-j})} \le e^{-\Omega(|m_{a_{j-1}}^{-1}|\, 4^{-j})}.$$
By (2.11), $|m_{a_j}| = |\pi^T v^{a_j}| \le \|v^{a_j}\|_\infty \le \|v^{a_j}\|_2 < n^{1-c}$. It then follows from (2.9) that
$$|m_{a_j}| < \begin{cases} n^{-c} & \text{if } j = 1; \\ n^{-c}\,|m_{a_{j-1}}| & \text{if } j > 1. \end{cases} \qquad (2.14)$$
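The recurrence (2.12)-(2.13) is what produces the tower-of-twos growth of the $\theta_j$: each level exponentiates the previous one. A tiny illustration of this growth rate in exact integer arithmetic (the helper `tower` is ours, not the paper's notation):

```python
# Iterating t -> 2^t, as in the bound on the theta_j, yields a tower of twos.
def tower(height: int, base: int = 2) -> int:
    t = 1
    for _ in range(height):
        t = base ** t
    return t

growth = [tower(h) for h in range(5)]
print(growth)  # -> [1, 2, 4, 16, 65536]
```

Already at height 5 the value is $2^{65536}$, a number with close to twenty thousand decimal digits.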
2.5 Nonvanishing dominant modes
We prove that $m_{a_j} \ne 0$. For $j \ge 1$, we define the $2^j$-by-$2^{j-1}$ matrix¹
$$F_j = P_j^{\theta_j}\left\{\begin{pmatrix} 1 \\ -1 \end{pmatrix} \otimes I_{2^{j-1}}\right\}.$$
We form $F_j$ by subtracting the right half of $P_j^{\theta_j}$ from its left half:
$$(F_j)_{k,l} = \big(P_j^{\theta_j}\big)_{k,l} - \big(P_j^{\theta_j}\big)_{k,\,l+2^{j-1}}.$$
By (2.10), for $j > 1$,
$$P_j^{\theta_j} v^{a_j} = P_j^{\theta_j}\begin{pmatrix} P_{j-1}^{\theta_{j-1}} v^{a_{j-1}} \\ -P_{j-1}^{\theta_{j-1}} v^{a_{j-1}} \end{pmatrix} = F_j\, P_{j-1}^{\theta_{j-1}} v^{a_{j-1}} = \prod_{i=j}^{2} F_i\; P_1^{\theta_1} v^{a_1}. \qquad (2.15)$$
1 Recall that ⊗ denotes the Kronecker product between two matrices.
Note that indices run down, as the products are not commutative. By (2.4), for $j > 1$,
$$m_{a_j} = \frac{1}{2^j-1}\,\overbrace{(\tfrac12,1,\ldots,1,\tfrac12)}^{2^j} \begin{pmatrix} P_{j-1}^{\theta_{j-1}} v^{a_{j-1}} \\ -P_{j-1}^{\theta_{j-1}} v^{a_{j-1}} \end{pmatrix} = \frac{1}{2(1-2^j)}\,\overbrace{(1,0,\ldots,0,1)}^{2^j} \begin{pmatrix} P_{j-1}^{\theta_{j-1}} v^{a_{j-1}} \\ -P_{j-1}^{\theta_{j-1}} v^{a_{j-1}} \end{pmatrix}$$
$$= \frac{1}{2(1-2^j)}\,\overbrace{(1,0,\ldots,0,1)}^{2^j} \begin{pmatrix} \prod_{i=j-1}^{2} F_i\, P_1^{\theta_1} v^{a_1} \\ -\prod_{i=j-1}^{2} F_i\, P_1^{\theta_1} v^{a_1} \end{pmatrix} = \frac{1}{2(1-2^j)}\; z_{j-1,1}^T \prod_{i=j-1}^{2} F_i\; P_1^{\theta_1} v^{a_1}, \qquad (2.16)$$
where $\prod_{i=j-1}^{2} F_i = 1$ if $j = 2$ (the product running down from $i$ to 2 makes sense only for $j \ge 3$) and
$$z_{j,k} \triangleq (\overbrace{1, 0, \ldots, 0, (-1)^k}^{2^j})^T.$$
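The folding identity (2.15) and the resulting formula (2.16) are exact matrix algebra, valid for any choice of exponents $\theta_i$, so they can be sanity-checked numerically with small surrogate exponents. In the sketch below, the explicit row pattern for $P_j$ (end rows $(1,2,0,\ldots)/3$ and $(\ldots,0,2,1)/3$, interior rows averaging three consecutive entries) is our assumption, consistent with the diagonalizations of $P_1$ and $P_2$ given later; the identity check itself does not depend on it:

```python
import numpy as np

def P(j):
    # Assumed 2^j x 2^j averaging chain: end birds weight their single
    # neighbor twice; interior birds average three consecutive velocities.
    N = 2 ** j
    M = np.zeros((N, N))
    M[0, :2] = [1, 2]
    M[-1, -2:] = [2, 1]
    for l in range(1, N - 1):
        M[l, l - 1:l + 2] = 1
    return M / 3

theta = {1: 5, 2: 7, 3: 4}   # small surrogate exponents: the identity is exact
# P_1^{theta_1} v^{a_1}; the n^{-c} scale factor is omitted (both sides linear).
prod = np.linalg.matrix_power(P(1), theta[1]) @ np.array([-1.0, 2.0])
err = 0.0
for j in range(2, 5):
    N = 2 ** j
    va = np.concatenate([prod, -prod])              # v^{a_j}, cf. (2.10)
    pi = np.ones(N); pi[0] = pi[-1] = 0.5; pi /= N - 1
    z = np.zeros(N // 2); z[0], z[-1] = 1.0, -1.0   # z_{j-1,1}
    rhs = (z @ prod) / (2 * (1 - 2 ** j))           # right-hand side of (2.16)
    err = max(err, abs(pi @ va - rhs))              # m_{a_j} = pi^T v^{a_j}
    if j in theta:                                  # extend the product (2.15)
        F = np.linalg.matrix_power(P(j), theta[j]) @ np.kron(
            np.array([[1.0], [-1.0]]), np.eye(N // 2))
        prod = F @ prod
```

The agreement is at machine precision, since both sides are the same bookkeeping of the stacked vector $(w; -w)$.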
We now look more closely at the structure of $F_j$, going back to the spectral decomposition of $P_j^{\theta_j}$. By (2.7), for $j \ge 1$,
$$\begin{cases} P_j^{\theta_j} = \mathbf{1}_{2^j}\,\pi^T + Q_j^{\theta_j}, \\[2pt] Q_j^{\theta_j} = \sum_{k=2}^{2^j} \mu_{j,k}\, u_{j,k}\,\big(u_{j,k} - \tfrac12 z_{j,k-1}\big)^T, \end{cases} \qquad (2.17)$$
where, for clarity, we subscript $\mathbf{1}$ to indicate its dimension; for any $j \ge 1$ and $1 < k \le 2^j$,
$$\begin{cases} \mu_{j,k} = \dfrac{\varepsilon_{j,k}}{2^j-1}\left(\dfrac13 + \dfrac23\cos\dfrac{\pi(k-1)}{2^j-1}\right)^{\theta_j}, & \text{with } \varepsilon_{j,k} = 2 \text{ if } k < 2^j \text{ and } \varepsilon_{j,2^j} = 1; \\[6pt] u_{j,k} = \Big(1,\ \cos\dfrac{\pi(k-1)}{2^j-1},\ \ldots,\ \cos\dfrac{\pi(k-1)(2^j-2)}{2^j-1},\ (-1)^{k-1}\Big)^T \in \mathbb{R}^{2^j}. \end{cases} \qquad (2.18)$$
Our algebraic approach requires bounds on eigenvalue gaps and on the Frobenius norm of $Q_j^{\theta_j}$. Note that $|\mu_{j,k}| < 1$ for all $j \ge 1$ and $k \ge 2$. We need much tighter bounds. Recall that $n$ is assumed large enough and define $\mu_{0,2} = 1$ for notational convenience.
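The eigenstructure behind (2.18) is easy to confirm numerically. The sketch below uses our reconstruction of the row pattern of $P_j$ (consistent with the diagonalizations of $P_1$ and $P_2$ appearing in Section 3); with that assumption, the cosine vectors $u_{j,k}$ are exact eigenvectors with eigenvalues $\frac13 + \frac23\cos\frac{\pi(k-1)}{2^j-1}$, and $(\frac12,1,\ldots,1,\frac12)/(2^j-1)$ is the stationary distribution:

```python
import numpy as np

j = 3
N = 2 ** j
Pj = np.zeros((N, N))
Pj[0, :2] = [1, 2]; Pj[-1, -2:] = [2, 1]   # assumed end rows
for l in range(1, N - 1):
    Pj[l, l - 1:l + 2] = 1                 # interior rows
Pj /= 3

# Eigenvalues claimed in (2.18): (1 + 2 cos(pi (k-1)/(2^j - 1)))/3.
lam = np.sort((1 + 2 * np.cos(np.pi * np.arange(N) / (N - 1))) / 3)
gap = np.abs(lam - np.sort(np.linalg.eigvals(Pj).real)).max()

# Right eigenvector u_{j,k} from (2.18), here for k = 2.
k = 2
u = np.cos(np.pi * (k - 1) * np.arange(N) / (N - 1))
mu_base = (1 + 2 * np.cos(np.pi * (k - 1) / (N - 1))) / 3
resid = np.abs(Pj @ u - mu_base * u).max()

# Stationary distribution pi = (1/2, 1, ..., 1, 1/2) / (2^j - 1).
pi = np.ones(N); pi[0] = pi[-1] = 0.5; pi /= N - 1
stat = np.abs(pi @ Pj - pi).max()
```

All three residuals come out at machine precision, matching the spectral data used throughout this section.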
Lemma 2.3. For any $j \ge 1$, both $\big|\mu_{j,2}/\mu_{j-1,2}^n\big|$ and $\big\|Q_j^{\theta_j}\big\|_F$ are less than $e^{-n^{1.5}}$; for $j > 1$ and $k > 2$, so is the ratio $|\mu_{j,k}/\mu_{j,2}|$.
θ 4
Proof. We leave the bound on kQ j j kF for last. If j = 1, then µ j,2 = (−3)−θ1 and, by (2.9), |µ j,2 | < e−n .
Since µ0,2 = 1, this proves the first upper bound for j = 1. Suppose now that j > 1. For 2 ≤ k ≤ 2 j ,
π(k − 1) π
1 + 2 cos j
≤ 1 + 2 cos j .
2 −1 2 −1
In view of the fact that $j \le \log n$ and, by (2.13), $\theta_j > n^4$, for all $k \ge 2$,
$$|\mu_{j,k}| \le |\mu_{j,2}| \le O(2^{-j})\, e^{-\Omega(\theta_j 4^{-j})} < e^{-n^{1.7}}. \qquad (2.19)$$
By (2.13),
$$|\mu_{j,2}| \le e^{-\Omega(\theta_j 4^{-j})} \le e^{-\Omega(n^4 \theta_{j-1} 4^{-j})} \le e^{-\Omega(n^2 \theta_{j-1})} < e^{-n^{1.5}}\,|\mu_{j-1,2}|^n.$$
The last inequality follows from the fact that $n$ is large enough and that, with $j > 1$, it follows from (2.18) that $2^{1-j}\, 3^{-\theta_{j-1}} \le |\mu_{j-1,2}| < 1$. To bound the ratio $|\mu_{j,k}/\mu_{j,2}|$ for $j > 1$ and $k > 2$, we begin with the case $j = 2$ and verify directly that $e^{-n^3}$ is a valid upper bound. Indeed,
$$\mu_{2,k} = \begin{cases} \big(\tfrac23\big)^{\theta_2+1} & \text{if } k = 2; \\ 0 & \text{if } k = 3; \\ (-1)^{\theta_2}\big(\tfrac13\big)^{\theta_2+1} & \text{if } k = 4. \end{cases}$$
Assume now that $j, k > 2$. Then
$$-1 \le 1 + 2\cos\frac{\pi(k-1)}{2^j-1} \le 1 + 2\cos\frac{2\pi}{2^j-1}.$$
Since $1 + 2\cos\frac{2\pi}{2^j-1} > 1$,
$$\left|1 + 2\cos\frac{\pi(k-1)}{2^j-1}\right| \le 1 + 2\cos\frac{2\pi}{2^j-1};$$
therefore,
$$\left|\frac{\mu_{j,k}}{\mu_{j,2}}\right| \le \left(\frac{1 + 2\cos\frac{2\pi}{2^j-1}}{1 + 2\cos\frac{\pi}{2^j-1}}\right)^{\theta_j} = \left(2\cos\frac{\pi}{2^j-1} - 1\right)^{\theta_j} = e^{-\Omega(\theta_j 4^{-j})} < e^{-n^{1.5}}.$$
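The middle equality rests on the trigonometric identity $1 + 2\cos 2a = (2\cos a - 1)(2\cos a + 1)$, which telescopes the ratio of the two cosine expressions; a quick numerical sanity check (illustration only):

```python
import math

# 1 + 2cos(2a) = 2(2cos^2 a - 1) + 1 = 4cos^2 a - 1 = (2cos a - 1)(2cos a + 1)
checks = all(
    abs((1 + 2 * math.cos(2 * a))
        - (2 * math.cos(a) - 1) * (2 * math.cos(a) + 1)) < 1e-12
    for a in (0.01 * k for k in range(1, 300))
)
```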
For all $j \ge 1$, by (2.17, 2.19), the submultiplicativity of the Frobenius norm and the triangle inequality,
$$\big\|Q_j^{\theta_j}\big\|_F \le \sum_{k=2}^{2^j} |\mu_{j,k}|\, \|u_{j,k}\|_2 \left(\|u_{j,k}\|_2 + \frac{\|z_{j,k-1}\|_2}{2}\right) \le 2^{O(j)}\,|\mu_{j,2}| \le 2^{O(j)}\, e^{-n^{1.7}} < e^{-n^{1.5}}.$$
For $j > 1$, we express $F_j$, the "folded" half of $P_j^{\theta_j}$, by subtracting the lower half of $u_{j,k} - (1/2)\, z_{j,k-1}$ from its upper half, forming
$$w_{j-1,k} \triangleq (\xi_1, \ldots, \xi_{2^{j-1}})^T - \frac{z_{j-1,k}}{2}, \qquad (2.20)$$
where
$$\xi_l = \cos\frac{\pi(k-1)(l-1)}{2^j-1} - \cos\frac{\pi(k-1)(2^{j-1}+l-1)}{2^j-1}.$$
It follows from (2.6, 2.17) that, for $j > 1$,
$$F_j = \frac{1}{2(1-2^j)}\,\mathbf{1}_{2^j}\, z_{j-1,1}^T + \sum_{k=2}^{2^j} \mu_{j,k}\, u_{j,k}\, w_{j-1,k}^T. \qquad (2.21)$$
To tackle the formidable product $\prod_i F_i$ in (2.15), we begin with an approximation $\prod_i G_i$, where
$$G_j = \frac{1}{2(1-2^j)}\,\mathbf{1}_{2^j}\, z_{j-1,1}^T + \mu_{j,2}\, u_{j,2}\, w_{j-1,2}^T. \qquad (2.22)$$
Setting $k = 2$, we find that
$$u_{j,2} = \Big(1,\ \cos\frac{\pi}{2^j-1},\ \ldots,\ \cos\frac{\pi(2^j-2)}{2^j-1},\ -1\Big)^T.$$
For $0 \le l < 2^{j-1}$,
$$\cos\frac{\pi l}{2^j-1} + \cos\frac{\pi(2^j - l - 1)}{2^j-1} = 0.$$
This extends to the case $j = 1$, so that, for any $j \ge 1$, we can highlight the symmetry of $u_{j,2}$:
$$\begin{cases} u_{j,2} = (u_1, \ldots, u_{2^{j-1}}, -u_{2^{j-1}}, \ldots, -u_1)^T; \\[2pt] u_l = \cos\dfrac{\pi(l-1)}{2^j-1}. \end{cases} \qquad (2.23)$$
For $k = 2$, we simplify $\xi_l$ into
$$\xi_l = \cos\frac{\pi(l-1)}{2^j-1} + \sin\frac{\pi(l-1/2)}{2^j-1},$$
for $1 \le l \le 2^{j-1}$, which shows that $\xi_l = \xi_{2^{j-1}+1-l}$; therefore, for $j > 1$,
$$\begin{cases} w_{j-1,2} = (w_1, \ldots, w_{2^{j-2}}, w_{2^{j-2}}, \ldots, w_1)^T; \\[2pt] w_1 = \tfrac12 + \sin\dfrac{\pi/2}{2^j-1}; \\[2pt] w_l = \cos\dfrac{\pi(l-1)}{2^j-1} + \sin\dfrac{\pi(l-\frac12)}{2^j-1} \quad (1 < l \le 2^{j-2}). \end{cases} \qquad (2.24)$$
By (2.22), for $j > 2$,
$$\prod_{i=j-1}^{2} G_i = \prod_{i=j-1}^{2}\left\{\frac{1}{2(1-2^i)}\,\mathbf{1}_{2^i}\, z_{i-1,1}^T + \mu_{i,2}\, u_{i,2}\, w_{i-1,2}^T\right\}. \qquad (2.25)$$
Expanding this product is greatly simplified by observing that, by (2.23, 2.24), for any $j \ge 1$,
$$\begin{cases} z_{j,1}^T\,\mathbf{1}_{2^j} = w_{j,2}^T\, u_{j,2} = 0; \\[2pt] z_{j,1}^T\, u_{j,2} = 2; \\[2pt] w_{j,2}^T\,\mathbf{1}_{2^j} \triangleq \gamma_j, \ \text{where } 2^{j-1} - 1 < \gamma_j < 2^{j+1} - 1. \end{cases} \qquad (2.26)$$
To prove the bounds on $\gamma_j$, we rely on (2.24),
$$\gamma_j = -1 + 2\sum_{l=1}^{2^{j-1}}\left(\cos\frac{\pi(l-1)}{2^{j+1}-1} + \sin\frac{\pi(l-1/2)}{2^{j+1}-1}\right),$$
and the fact that
$$\frac{\pi(l-1)}{2^{j+1}-1} \le \frac{\pi}{3} \qquad\text{and}\qquad \frac{\pi(l-1/2)}{2^{j+1}-1} < \frac{\pi}{2},$$
Figure 3: If $j$ is odd, the word $A$ is of the form $z(\mu u w)(\mathbf{1}z)(\mu u w)(\mathbf{1}z)\cdots(\mu u w)\,P_1^{\theta_1} v^{a_1}$.
from which the two inequalities in (2.26) follow readily. By (2.25), for $j > 2$,
$$z_{j-1,1}^T\left(\prod_{i=j-1}^{2} G_i\right) P_1^{\theta_1} v^{a_1} = z_{j-1,1}^T \prod_{i=j-1}^{2}\left\{\frac{1}{2(1-2^i)}\,\mathbf{1}_{2^i}\, z_{i-1,1}^T + \mu_{i,2}\, u_{i,2}\, w_{i-1,2}^T\right\} P_1^{\theta_1} v^{a_1}. \qquad (2.27)$$
If we drop all sub/superscripts and expand the scalar expression above, we find a sum of $2^{j-2}$ words $z\, a_{j-1} \cdots a_2\, P_1^{\theta_1} v^{a_1}$, where each $a_i$ is of the form $\mu u w$ or $\mathbf{1}z$ (suitably scaled). By (2.26), however, the only nonzero word is of the form
$$A = z(\mu u w)(\mathbf{1}z)(\mu u w)(\mathbf{1}z)\cdots P_1^{\theta_1} v^{a_1}.$$
This necessitates distinguishing between even and odd values of $j$.
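The orthogonality relations (2.26), which kill all but one word in the expansion, follow from the explicit formulas (2.23)-(2.24) and can be checked numerically; a sketch (illustration only, with $w_{j,2}$ built one level up, as in the definition of $\gamma_j$):

```python
import numpy as np

ok = True
for j in range(1, 9):
    N = 2 ** j
    z = np.zeros(N); z[0], z[-1] = 1.0, -1.0           # z_{j,1}
    u = np.cos(np.pi * np.arange(N) / (N - 1))         # u_{j,2}, cf. (2.23)
    l = np.arange(1, N + 1)
    D = 2 ** (j + 1) - 1                               # w_{j,2} comes from F_{j+1}
    w = np.cos(np.pi * (l - 1) / D) + np.sin(np.pi * (l - 0.5) / D)
    w[0] -= 0.5; w[-1] -= 0.5                          # subtract z_{j,2}/2
    gamma = w.sum()
    ok &= abs(z.sum()) < 1e-12                         # z^T 1 = 0
    ok &= abs(w @ u) < 1e-9                            # w^T u = 0 (sym x antisym)
    ok &= abs(z @ u - 2) < 1e-12                       # z^T u = 2
    ok &= 2 ** (j - 1) - 1 < gamma < 2 ** (j + 1) - 1  # bounds on gamma_j
```

For $j = 1$ this reproduces $w_{1,2} = (1,1)^T$ and $\gamma_1 = 2$, as used below.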
Case I (odd $j > 2$: Figure 3). It follows from (2.26) that
$$z_{j-1,1}^T \prod_{i=j-1}^{2} G_i = z_{j-1,1}^T\,\mu_{j-1,2}\, u_{j-1,2}\, w_{j-2,2}^T \prod_{\text{odd } i=j-2}^{3}\left\{\frac{1}{2(1-2^i)}\,\mathbf{1}_{2^i}\, z_{i-1,1}^T\,\mu_{i-1,2}\, u_{i-1,2}\, w_{i-2,2}^T\right\}$$
$$= 2\mu_{j-1,2}\, w_{j-2,2}^T \prod_{\text{odd } i=j-2}^{3}\left\{\frac{1}{1-2^i}\,\mathbf{1}_{2^i}\,\mu_{i-1,2}\, w_{i-2,2}^T\right\} = \alpha_j^{\text{odd}}\, w_{1,2}^T,$$
where
$$\alpha_j^{\text{odd}} = 2\,(-1)^{(j+1)/2}\,\mu_{2,2}\prod_{\text{odd } i=j-2}^{3}\frac{\gamma_i\,\mu_{i+1,2}}{2^i-1}. \qquad (2.28)$$
One must verify separately that this also holds for the case $j = 3$, where, by convention, the product $\prod_i$ evaluates to 1 because the indices do not go down. Recall that, by (2.9, 2.24), $w_{1,2} = (1,1)^T$ and $\|v^{a_1}\|_2 = \sqrt{5}\, n^{-c}$. By Lemma 2.3 and the submultiplicativity of the Frobenius norm,
$$\big|w_{1,2}^T\, Q_1^{\theta_1} v^{a_1}\big| \le \|w_{1,2}\|_2\, \big\|Q_1^{\theta_1}\big\|_F\, \|v^{a_1}\|_2 < e^{-n^{1.5}}.$$
By (2.17), it follows that
$$w_{1,2}^T\, P_1^{\theta_1} v^{a_1} = w_{1,2}^T\big(\mathbf{1}_2\,\pi^T + Q_1^{\theta_1}\big)\, v^{a_1} = v_1^{a_1} + v_2^{a_1} \pm O\big(e^{-n^{1.5}}\big) \qquad (2.29)$$
Figure 4: If $j$ is even, the word $A$ is of the form $z(\mu u w)(\mathbf{1}z)(\mu u w)\cdots(\mathbf{1}z)\,P_1^{\theta_1} v^{a_1}$.
and
$$A = z_{j-1,1}^T\left(\prod_{i=j-1}^{2} G_i\right) P_1^{\theta_1} v^{a_1} = \alpha_j^{\text{odd}}\, w_{1,2}^T\, P_1^{\theta_1} v^{a_1} = \alpha_j^{\text{odd}}\big(v_1^{a_1} + v_2^{a_1}\big) \pm O\big(|\alpha_j^{\text{odd}}|\, e^{-n^{1.5}}\big). \qquad (2.30)$$
Case II (even $j > 2$: Figure 4).
$$z_{j-1,1}^T \prod_{i=j-1}^{2} G_i = z_{j-1,1}^T \prod_{\text{odd } i=j-1}^{3}\left\{\mu_{i,2}\, u_{i,2}\, w_{i-1,2}^T\,\frac{1}{2(1-2^{i-1})}\,\mathbf{1}_{2^{i-1}}\, z_{i-2,1}^T\right\}$$
$$= z_{j-1,1}^T \prod_{\text{odd } i=j-1}^{3}\left\{\frac{1}{2(1-2^{i-1})}\,\mu_{i,2}\, u_{i,2}\,\gamma_{i-1}\, z_{i-2,1}^T\right\} = \beta_j\, z_{1,1}^T,$$
where
$$\beta_j = (-1)^{j/2+1}\prod_{\text{odd } i=j-1}^{3}\frac{\gamma_{i-1}\,\mu_{i,2}}{2^{i-1}-1}.$$
It follows that
$$A = z_{j-1,1}^T\left(\prod_{i=j-1}^{2} G_i\right) P_1^{\theta_1} v^{a_1} = \beta_j\, z_{1,1}^T\, P_1^{\theta_1} v^{a_1}.$$
By (2.17),
$$z_{1,1}^T\, P_1^{\theta_1} v^{a_1} = z_{1,1}^T\big(\mathbf{1}_2\,\pi^T + Q_1^{\theta_1}\big)\, v^{a_1} = z_{1,1}^T\, Q_1^{\theta_1} v^{a_1} = \mu_{1,2}\big(v_1^{a_1} - v_2^{a_1}\big); \qquad (2.31)$$
therefore,
$$A = \alpha_j^{\text{even}}\big(v_1^{a_1} - v_2^{a_1}\big), \qquad (2.32)$$
where
$$\alpha_j^{\text{even}} = (-1)^{j/2+1}\,\mu_{1,2}\prod_{\text{odd } i=j-1}^{3}\frac{\gamma_{i-1}\,\mu_{i,2}}{2^{i-1}-1}. \qquad (2.33)$$
This concludes the case analysis. Next, we still assume that $j > 2$ but we remove all restriction on parity. Recall that $G_i$ is only an approximation of $F_i$ and, instead of (2.27), we must contend with
$$z_{j-1,1}^T\left(\prod_{i=j-1}^{2} F_i\right) P_1^{\theta_1} v^{a_1} = z_{j-1,1}^T \prod_{i=j-1}^{2}\left\{\frac{1}{2(1-2^i)}\,\mathbf{1}_{2^i}\, z_{i-1,1}^T + \sum_{k=2}^{2^i}\mu_{i,k}\, u_{i,k}\, w_{i-1,k}^T\right\} P_1^{\theta_1} v^{a_1}. \qquad (2.34)$$
If, again, we look at the expansion of the product as a sum of words
$$B = z\, a_{j-1} \cdots a_2\, P_1^{\theta_1} v^{a_1},$$
then we see that each $B$-word is of the form
$$z(\mu u w)\{\mathbf{1}z, \mu u w\}\{\mathbf{1}z, \mu u w\}\{\mathbf{1}z, \mu u w\}\cdots P_1^{\theta_1} v^{a_1},$$
where $\mu, u, w$ are now indexed by $k$. Recall that previously the only word was of the form
$$A = z(\mu u w)(\mathbf{1}z)(\mu u w)(\mathbf{1}z)\cdots P_1^{\theta_1} v^{a_1}.$$
We show that $|A|$ always dominates $|B|$.
Lemma 2.4. For any $2 < j \le \log n$,
$$z_{j-1,1}^T\left(\prod_{i=j-1}^{2} F_i\right) P_1^{\theta_1} v^{a_1} = \begin{cases} (1+\varepsilon_n)\big(v_1^{a_1} + v_2^{a_1}\big)\,\alpha_j^{\text{odd}} & \text{if } j \text{ is odd;} \\[2pt] (1+\varepsilon_n')\big(v_1^{a_1} - v_2^{a_1}\big)\,\alpha_j^{\text{even}} & \text{else,} \end{cases}$$
where $\varepsilon_n, \varepsilon_n'$ are reals of absolute value $O(e^{-n})$.
Proof. Note that, by (2.26), $\gamma_i > 2^{i-1} - 1$ for any $i \ge 1$. Also, by (2.9), $v_1^{a_1} + v_2^{a_1} = n^{-c}$ and $v_2^{a_1} - v_1^{a_1} = 3n^{-c}$. It follows from (2.28, 2.30, 2.32, 2.33) that, for any $2 < j \le \log n$,
$$|A| = \left|z_{j-1,1}^T\left(\prod_{i=j-1}^{2} G_i\right) P_1^{\theta_1} v^{a_1}\right| \ge \frac{1}{n^{c+1}}\begin{cases} |\mu_{2,2}\,\mu_{4,2}\cdots\mu_{j-1,2}| & \text{if } j \text{ is odd;} \\ |\mu_{1,2}\,\mu_{3,2}\cdots\mu_{j-1,2}| & \text{else.} \end{cases} \qquad (2.35)$$
Let’s extend our notation by defining, for i > 1,
1 i −1
µi,1 = 2 (1 − 2 ) ;
ui,1 = 12i ;
wi−1,1 = zi−1,1 .
Then, any B-word is specified by an index vector (k j−1 , . . . , k2 ):
2
Bk j−1 ,...,k2 = wTj−1,1 ∏ µi,k ui,k wTi−1,k P1θ va .
i i i
1 1
i= j−1
Observe that the A-word we considered earlier is a particular B-word, i. e.,
A = B 2,1,2,1,. . . .
| {z }
j−2
Since we wish to show that all the other B-words are considerably smaller, we may ignore the settings of
ki that make a B-word vanish. All the conditions on the index vector are summarized here:
i
1 ≤ ki ≤ 2 ;
k j−1 6= 1 ; (2.36)
ki ki−1 6= 1 (2 < i < j) .
Figure 5: The top horizontal line represents $k_i = 1$. The white dots below the line correspond to $k_i = 2$. The $B$-word in white is brought into canonical form (black jagged line) by setting all the indices $k_i > 2$ to 2. This cannot cause the magnitude of $B$ to drop. We may also assume that the end result is not the $A$-word, as this would cause an exponential growth in line with the lemma.
By (2.18, 2.20), for all $i > 1$ and $k \ge 1$, $\|u_{i,k}\|_2 \le 2^{i/2}$ and, for $i, k \ge 1$, $\|w_{i,k}\|_2 \le 2^{i/2+O(1)}$; so, by Cauchy-Schwarz, for $i > 2$ and $k, l \ge 1$,
$$\big|w_{i-1,k}^T\, u_{i-1,l}\big| \le 2^{i+O(1)}.$$
Since $2 < j \le \log n$,
$$\big|w_{j-1,1}^T\, u_{j-1,k_{j-1}}\big| \prod_{i=j-2}^{2}\big|w_{i,k_{i+1}}^T\, u_{i,k_i}\big| < n^{2\log n};$$
therefore,
$$\big|B_{k_{j-1},\ldots,k_2}\big| \le n^{2\log n} \prod_{i=j-1}^{2}|\mu_{i,k_i}| \times \big|w_{1,k_2}^T\, P_1^{\theta_1} v^{a_1}\big|. \qquad (2.37)$$
We prove that all B-words are much smaller than A in absolute value.
Lemma 2.5. All $B$-words distinct from $A$ satisfy
$$\big|B_{k_{j-1},\ldots,k_2}\big| < e^{-n^{1.2}}\,|A|.$$
Proof. Since $P_1$ is stochastic, by (2.9),
$$\big|w_{1,k_2}^T\, P_1^{\theta_1} v^{a_1}\big| = O(\|v^{a_1}\|_\infty) = O(n^{-c}) < 1,$$
and the upper bound (2.37) becomes
$$\big|B_{k_{j-1},\ldots,k_2}\big| \le n^{2\log n} \prod_{i=j-1}^{2}|\mu_{i,k_i}|. \qquad (2.38)$$
Figure 6: We trace the index vectors of the $A$ and $B$-words from left to right until they diverge ($i = a$). In this case, $j$ is odd and the index vector of the $B$-word is $(2,1,2,1,2,2,1,2,2)$.
To maximize the right-hand side of (2.38), by Lemma 2.3, we may replace any instance of $k_i > 2$ by $k_i = 2$ (Figure 5). This does not contradict conditions (2.36) since no index is set to 1. Note the importance for this step of having all vectorial presence removed in (2.38). We may assume that the new $B$-word is not $A$, so its index vector is not of the form $(2,1,2,1,\ldots)$. Indeed, if we ended up with this pattern, and hence with $A$, obviously at least one index replacement would have taken place. By Lemma 2.3, any such replacement would cause an increase by a factor of at least $e^{n^{1.5}}$ and Lemma 2.5 would follow. So, we may assume now that $k_i \in \{1, 2\}$ and
$$(k_{j-1}, k_{j-2}, \ldots, k_2) \ne (2, 1, 2, 1, \ldots).$$
Scan the string $(k_{j-1}, \ldots, k_2)$ against $(2,1,2,1,\ldots)$ from left to right and let $k_a$ be the first character that differs (Figure 6). By (2.36), $k_{j-1} = 2$, so $2 \le a \le j-2$; hence $j > 3$; since we cannot have consecutive ones, $k_a = 2$ and $j - a$ is even. By (2.35, 2.38) and Lemma 2.3,
$$\frac{|B_{k_{j-1},\ldots,k_2}|}{|A|} \le \big(n^{c+1}\, n^{2\log n}\big)\,\frac{|\mu_{j-1,2}\,\mu_{j-2,1}\,\mu_{j-3,2}\cdots\mu_{a+1,2}\,\mu_{a,2}\,\mu_{a-1,k_{a-1}}\cdots\mu_{2,k_2}|}{|\mu_{j-1,2}\,\mu_{j-3,2}\cdots\mu_{a+1,2}\,\mu_{a-1,2}\,\mu_{a-3,2}\cdots|} \le n^{3\log n}\,\frac{|\mu_{j-2,1}\,\mu_{j-4,1}\cdots\mu_{a+2,1}\,\mu_{a,2}\,\mu_{a-1,k_{a-1}}\cdots\mu_{2,k_2}|}{|\mu_{a-1,2}\,\mu_{a-3,2}\cdots|}.$$
The first numerator mirrors the index vector of the $B$-word accurately. For the denominator, however, we use the lower bound of (2.35). The reason we can afford such a loose estimate is the presence of the factor $\mu_{a,2}$, which plays the central role in the calculation by drowning out all the other differences. Here are the details. All the coefficients $\mu$ are at most 1 in absolute value and, by Lemma 2.3, $|\mu_{a-1,2}| \le |\mu_{a-l,2}|$ for $l \ge 1$; therefore,
$$\frac{|B_{k_{j-1},\ldots,k_2}|}{|A|} \le n^{3\log n}\,\frac{|\mu_{a,2}|}{|\mu_{a-1,2}|^{\log n}} < n^{3\log n}\,\frac{|\mu_{a,2}|}{|\mu_{a-1,2}|^{n}} < n^{3\log n}\, e^{-n^{1.5}},$$
which proves Lemma 2.5.
There are fewer than $n^{\log n}$ $B$-words; so, by Lemma 2.5, their total contribution amounts to at most a fraction $n^{\log n} e^{-n^{1.2}}$ of $|A|$. In other words, by (2.34), for $j > 2$,
$$z_{j-1,1}^T\left(\prod_{i=j-1}^{2} F_i\right) P_1^{\theta_1} v^{a_1} = \big(1 \pm O(e^{-n})\big)\, z_{j-1,1}^T\left(\prod_{i=j-1}^{2} G_i\right) P_1^{\theta_1} v^{a_1},$$
Figure 7: Birds join in flocks of size 2, 4, 8, etc., each time flying in a direction closer to the Y-axis. The angle decreases exponentially at each level. The big arrow indicates the Markov chain corresponding to a four-bird flock.
and the proof of Lemma 2.4 follows from (2.9, 2.30, 2.32).
Recall from (2.16) that, for $j > 1$,
$$m_{a_j} = \frac{1}{2(1-2^j)}\; z_{j-1,1}^T \prod_{i=j-1}^{2} F_i\; P_1^{\theta_1} v^{a_1}.$$
We know from (2.26, 2.28, 2.33) that neither $\alpha_j^{\text{even}}$ nor $\alpha_j^{\text{odd}}$ is null. By Lemma 2.4, it then follows that the first Fourier coefficient $m_{a_j}$ never vanishes for $j > 2$. By (2.9), this is also the case for $j = 1, 2$, so the proof of Theorem 2.1 is complete.
3 A lower bound on the flocking time
Theorem 2.1 sets the grounds for a lower bound of 2 log cn (for constant c) on the convergence time of
a flock of birds. The idea is to lift the previous construction to higher dimension and interpret railroad
cars as birds and trains as flocks. While checking that the convergence time matches the characteristic
time of the corresponding consensus system is easy, to verify the integrity of the flocks takes some effort.
The n birds all start from the X-axis and fly in the (X,Y )-plane, merging in twos, fours, eights,
etc., in a pattern mirroring the tree T of the previous section (Figure 7). The horizontal motion slows
down drastically as time goes on, which creates a flocking time of 2 Θ(log n). The upper bound
in [2] tolerates a small amount of noise in the model, which we can use to simplify our construction: specifically, any given flock is allowed a single velocity flip, meaning that the vector $v^a$ associated with node $a$ becomes $-v^a$. The stochastic matrix used for flocking (1.1) is of the form $P_j$, as defined in (2.2), when applied to a flock of $2^j$ birds. The projection of the birds on the X-axis corresponds to the railroad car system discussed in the introduction: the springs of length 1 match the distance threshold for the
flocking network. Let $x(t)$ denote the vector $(x_1(t), \ldots, x_n(t))$ of bird positions from left to right, and write $v(t) = x(t) - x(t-1)$. The birds start at
$$x(0) = \big(0,\ 2/3,\ 2,\ 8/3,\ \ldots,\ 2l,\ 2l + 2/3,\ \ldots,\ n-2,\ n - 4/3\big)^T.$$
Let $t_1 = 0$ and $t_j = t_{j-1} + \theta_{j-1}$, for $j > 1$. The velocity vector satisfies $v(t_1) = v^{a_1}$ as set in (2.9) and $v(1) = P_1 v^{a_1}$; in general, for $t > 0$,
$$x(t) = x(0) + \sum_{s=0}^{t-1} P_1^s\, v(1),$$
hence
$$\begin{pmatrix} x_1(t) \\ x_2(t) \end{pmatrix} = \begin{pmatrix} x_1(0) \\ x_2(0) \end{pmatrix} + \sum_{s=0}^{t-1} P_1^s\big(P_1 v^{a_1}\big) = \begin{pmatrix} 0 \\ 2/3 \end{pmatrix} + \sum_{s=0}^{t-1} P_1^s \begin{pmatrix} n^{-c} \\ 0 \end{pmatrix}.$$
0
Diagonalizing P1 shows that, for any integer s > 0,
s 1 1 1 −s 1
P1 = 1 1 + (−3) 1 −1 .
2 1 2 −1
It follows that, for $0 = t_1 < t \le t_2$,
$$\begin{cases} x_1(t) = \dfrac{t}{2}\, n^{-c} + \dfrac12 n^{-c}\sum_{s=0}^{t-1}(-3)^{-s} = \dfrac12 n^{-c}\Big(t + \dfrac34 + \dfrac14(-3)^{1-t}\Big); \\[6pt] x_2(t) = \dfrac23 + \dfrac{t}{2}\, n^{-c} - \dfrac12 n^{-c}\sum_{s=0}^{t-1}(-3)^{-s} = \dfrac23 + \dfrac12 n^{-c}\Big(t - \dfrac34 - \dfrac14(-3)^{1-t}\Big). \end{cases} \qquad (3.1)$$
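The closed forms (3.1) are easy to sanity-check by iterating the two-bird chain directly; a sketch (an illustration, not part of the proof, taking $n = 10$ and $c = 2$ so that $n^{-c} = 0.01$):

```python
import numpy as np

n, c = 10, 2
eps = n ** (-c)
P1 = np.array([[1, 2], [2, 1]]) / 3
x = np.array([0.0, 2 / 3])        # x(0)
v = np.array([eps, 0.0])          # v(1) = P_1 v^{a_1}

err = 0.0
for t in range(1, 40):
    x = x + v                     # x(t) = x(t-1) + v(t)
    osc = 0.25 * (-3.0) ** (1 - t)
    x1 = 0.5 * eps * (t + 0.75 + osc)            # closed form (3.1)
    x2 = 2 / 3 + 0.5 * eps * (t - 0.75 - osc)
    d = 2 / 3 - 0.75 * eps * (1 - (-1 / 3.0) ** t)   # distance (3.2)
    err = max(err, abs(x[0] - x1), abs(x[1] - x2), abs((x[1] - x[0]) - d))
    v = P1 @ v                    # velocities are averaged at each step
```

The iteration and the closed forms agree to machine precision, including the oscillating distance predicted by (3.2).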
Let $B_i$ denote the bird associated with the $i$-th leaf of T. Note that $B_1$ always stays to the left of $B_2$ and their distance is
$$x_2(t) - x_1(t) = \frac23 - \frac34\, n^{-c}\left(1 - \Big(-\frac13\Big)^t\right). \qquad (3.2)$$
Left to their own devices, the two birds would slide to the right at speed $m_{a_1}$, plus or minus an exponentially vanishing term; their distance would oscillate around $2/3 - (3/4)n^{-c}$ and converge exponentially fast, with the oscillation created by the negative eigenvalue. This is what happens until the flock at $a_1$ begins to interact with its "sibling" flock to the right, $(B_3, B_4)$. The latter's velocity vector is $(-n^{-c}, 0)^T$ at time $t = 1$ and, for $t_1 < t \le t_2$,
$$\begin{cases} x_3(t) = 2 - \frac12 n^{-c}\big(t + \frac34 + \frac14(-3)^{1-t}\big); \\[2pt] x_4(t) = \frac83 - \frac12 n^{-c}\big(t - \frac34 - \frac14(-3)^{1-t}\big). \end{cases} \qquad (3.3)$$
The stationary velocity of $(B_3, B_4)$ is $-m_{a_1} = -(1/2)n^{-c}$, but the flock is not the mirror image of $(B_1, B_2)$, a situation that would destroy the lower bound construction. In particular, note that the diameter of the flock is
$$x_4(t) - x_3(t) = \frac23 + \frac34\, n^{-c}\left(1 - \Big(-\frac13\Big)^t\right), \qquad (3.4)$$
which exceeds that of $(B_1, B_2)$ for all $t > 0$. The diameters of both flocks oscillate around $2/3$ but in phase opposition: indeed, their sum remains constant. Both two-bird flocks drift toward each other
Figure 8: A four-bird flock flipping at time $t_2 + n^f$.
at distance $x_3(t) - x_2(t) = 4/3 - t\, n^{-c}$. (The linearity in $t$ is due to an accidental cancellation that will not occur for bigger flocks.) This implies that $t_2 = t_1 + \theta_1 = \lceil (1/3)\, n^c \rceil$. The two flocks link up at time $t_2$, and
$$1 - n^{-c} < x_3(t_2) - x_2(t_2) \le 1. \qquad (3.5)$$
Two issues arise: which way does the four-bird flock move and does it stay in one piece? Indeed, if the distance between $B_2$ and $B_3$ is too close to 1, can't it jump back up above 1 to cause the breakup of the flock? The issue of flock integrity deserves a separate treatment but, in this simple case, we easily verify it. We begin with the motion of the four-bird flock. At time $t_2$, the flock at $a_2$ is formed with the initial velocity
$$v^{a_2} = \begin{pmatrix} P_1^{t_2-t_1} v^{a_1} \\ -P_1^{t_2-t_1} v^{a_1} \end{pmatrix} = \begin{pmatrix} P_1^{\theta_1}\begin{pmatrix} -n^{-c} \\ 2n^{-c} \end{pmatrix} \\[4pt] -P_1^{\theta_1}\begin{pmatrix} -n^{-c} \\ 2n^{-c} \end{pmatrix} \end{pmatrix} = \frac12 n^{-c}\begin{pmatrix} 1 + (-3)^{1-\theta_1} \\ 1 - (-3)^{1-\theta_1} \\ -1 - (-3)^{1-\theta_1} \\ -1 + (-3)^{1-\theta_1} \end{pmatrix}. \qquad (3.6)$$
By (2.4), the stationary velocity for the four-bird flock is
$$m_{a_2} = \tfrac13\big(\tfrac12, 1, 1, \tfrac12\big)\, v^{a_2} = -\tfrac12 n^{-c}(-3)^{-\lceil n^c/3 \rceil}.$$
By Lemma 2.4, for $j > 2$,
$$m_{a_j} = \frac{1}{2(1-2^j)}\begin{cases} (1+\varepsilon_n)\big(v_1^{a_1} + v_2^{a_1}\big)\,\alpha_j^{\text{odd}} & \text{if } j \text{ is odd;} \\[2pt] (1+\varepsilon_n')\big(v_1^{a_1} - v_2^{a_1}\big)\,\alpha_j^{\text{even}} & \text{else.} \end{cases} \qquad (3.7)$$
It is imperative that $m_{a_j} > 0$. If it is negative, therefore, we must flip the velocity at $a_j$ instead of its sibling (the right child of its parent). The previous analysis holds as long as exactly one child "flips," irrespective of which one. Prior to flipping, the flocks of any siblings are translated copies of each other.
We amend the flipping rule so that the left (resp. right) sibling flips if the flocks move left (resp. right), so as to set up a head-on collision. (We refer the reader to [2] for a detailed explanation of why flipping conforms to the noisy flocking model behind the matching upper bound.)
We now turn to the issue of flock integrity. Instead of negating one of the siblings' velocity vectors at time $t_j$, we wait an extra $n^f$ steps, for some large enough constant $f$: the goal is to give the flock enough time to stabilize so it does not break apart because of the flip. By straightforward diagonalization, we find that, for any integer $s > 0$,
$$P_2^s = \frac16\begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}(1,2,2,1) + \frac16\Big(\frac23\Big)^s\begin{pmatrix} 2 \\ 1 \\ -1 \\ -2 \end{pmatrix}(1,1,-1,-1) + \frac16(-3)^{-s}\begin{pmatrix} 1 \\ -1 \\ 1 \\ -1 \end{pmatrix}(1,-2,2,-1); \qquad (3.8)$$
therefore, it follows from $x(t) = x(t_2) + \sum_{s=0}^{t-t_2-1} P_2^{s+1}\, v^{a_2}$ that, by (3.6),
$$\begin{pmatrix} x_1(t) \\ \vdots \\ x_4(t) \end{pmatrix} = \begin{pmatrix} x_1(t_2) \\ \vdots \\ x_4(t_2) \end{pmatrix} + m_{a_2}(t-t_2)\begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} + \frac18 n^{-c}\begin{pmatrix} 11 \\ 5 \\ -5 \\ -11 \end{pmatrix} + n^{-c}\Big(\frac23\Big)^{t-t_2+1}\begin{pmatrix} -2 \\ -1 \\ 1 \\ 2 \end{pmatrix} + \frac{1}{24} n^{-c}(-3)^{t_2-t}\begin{pmatrix} -1 \\ 1 \\ -1 \\ 1 \end{pmatrix}. \qquad (3.9)$$
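The rank-one decomposition (3.8) is exact for every integer $s \ge 1$ and is straightforward to verify numerically for the four-bird chain $P_2$ (whose rows we reconstruct from the diagonalization itself; illustration only — the 0-eigenvalue mode of $P_2$ is absent, which is why the formula needs $s > 0$):

```python
import numpy as np

P2 = np.array([[1, 2, 0, 0],
               [1, 1, 1, 0],
               [0, 1, 1, 1],
               [0, 0, 2, 1]]) / 3.0   # four-bird averaging chain

def rhs(s):
    # The three rank-one terms of (3.8).
    return (np.outer([1, 1, 1, 1], [1, 2, 2, 1]) / 6
            + (2 / 3) ** s * np.outer([2, 1, -1, -2], [1, 1, -1, -1]) / 6
            + (-3.0) ** (-s) * np.outer([1, -1, 1, -1], [1, -2, 2, -1]) / 6)

err38 = max(np.abs(np.linalg.matrix_power(P2, s) - rhs(s)).max()
            for s in range(1, 10))
```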
It follows from (3.2, 3.4) that, for $t > t_2$, both $x_2(t) - x_1(t)$ and $x_4(t) - x_3(t)$ are $2/3 \pm O(n^{-c})$; therefore, the two end edges of the four-bird flock are safe, which we define as being of length less than 1 (so as to belong to the flocking network) but greater than 1/2 (so as to avoid edges joining nonconsecutive birds). The middle edge between $B_2$ and $B_3$ is more problematic. Its length is
$$x_3(t) - x_2(t) = x_3(t_2) - x_2(t_2) - \frac{1}{12}\, n^{-c}\left(15 - 16\Big(\frac23\Big)^{t-t_2} + (-3)^{t_2-t}\right).$$
We can verify that
$$15 - 16\Big(\frac23\Big)^{t-t_2} + (-3)^{t_2-t} \ge 0,$$
for all $t > t_2$, which, by (3.5), shows that the distance between the two middle birds stays at $1 - \Theta(n^{-c})$,
which is also $1 - |m_{a_1}|$. This points to a general principle crucial to the integrity of the flocks: when
two of them meet, the new edge’s length stays away from 1 (the breaking point) at the next step by a
margin roughly equal to the stationary velocity of the colliding flocks. This new stationary velocity is
exponentially smaller than the previous one, so local vibrations are insufficient to make edges unsafe
(either bigger than 1 or less than 1/2). Because of the polynomial-time mixing of the Markov chains,
delaying flips by n f steps ensures that individual velocities are roughly equal to the new stationary
velocity, so changing their signs cannot jeopardize edge safety. We flesh out the details in the next lemma,
which concludes the lower bound proof.
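The first merge of the construction can be simulated directly, which illustrates both the link-up time $t_2 = \lceil n^c/3 \rceil$ from (3.5) and the safety of the middle edge. The sketch below (ours, with $n = 10$, $c = 2$) runs the two two-bird flocks under $P_1$ until the middle distance drops to 1, then switches to the four-bird chain $P_2$:

```python
import numpy as np

n, c = 10, 2
eps = n ** (-c)
P1 = np.array([[1, 2], [2, 1]]) / 3
P2 = np.array([[1, 2, 0, 0], [1, 1, 1, 0], [0, 1, 1, 1], [0, 0, 2, 1]]) / 3
split = np.block([[P1, np.zeros((2, 2))], [np.zeros((2, 2)), P1]])

x = np.array([0.0, 2 / 3, 2.0, 8 / 3])   # x(0) for the first four birds
v = np.array([eps, 0.0, -eps, 0.0])      # v(1), cf. (3.1) and (3.3)

t_link, mid = None, []
for t in range(1, 300):
    x = x + v                            # x(t) = x(t-1) + v(t)
    linked = x[2] - x[1] <= 1.0          # middle edge joins the network
    if linked and t_link is None:
        t_link = t
    if linked:
        mid.append(x[2] - x[1])
    v = (P2 if linked else split) @ v    # average within the current flocks
```

In this run the flocks link at $t = 34 = \lceil n^c/3 \rceil$ with middle distance in $(1 - n^{-c}, 1]$, and the middle edge then remains strictly between 1/2 and 1, in line with Lemma 3.1 below.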
Lemma 3.1. Any two adjacent birds within the same flock lie at a distance between 0.58 and 1. This holds over the entire lifetime of the flock, whether it flips or not.

Proof. For notational convenience, put $m_{a_0} = (1/4)\, n^{-5}$ and define $h(i)$ as the height of the nearest common ancestor of the two leaves associated with birds $B_i$ and $B_{i+1}$; e.g., $h(1) = 1$ and $h(2) = 2$. We prove by induction on $j$ that, for any $1 \le j < \log n$, $t_j \le t \le t_{j+1}$, and $1 \le i < 2^j$,
$$1 - \frac53\big(n^5 + j n^4\big)\big|m_{a_{h(i)-1}}\big| \le \mathrm{DIST}_t(B_i, B_{i+1}) \le \begin{cases} 1 & \text{if } i = 2^{j-1} \text{ and } t = t_j; \\[2pt] 1 - \frac14\Big(1 - \frac{j}{n}\Big)\big|m_{a_{h(i)-1}}\big| & \text{else.} \end{cases} \qquad (3.10)$$
Recall that $a_0, a_1$, etc., constitute the left spine of the merge tree T. By (2.14), the upper and lower bounds above fall between 0.58 and 1, so satisfying them implies the integrity of the flocks along the spine: indeed, the upper bound ensures the existence of the desired edges, while the lower bound, being greater than 1/2, rules out edges between nonconsecutive birds. Before we proceed with the proof, we should explain why the upper bound of (3.10) distinguishes between two cases. In general, once two consecutive birds are joined in a flock, they stay forever at a distance strictly less than 1. There is only one exception to this rule: at the time $t$ when they join, the only assurance we can give is that their distance does not exceed 1; it could actually be equal to 1, hence the difficulty of a nontrivial upper bound when $t = t_j$ and $i = 2^{j-1}$.

Relations (3.10) show that the time $\theta_j$ between the formation of the flock at $a_j$ and its next collision at $a_{j+1}$ is proportional to the reciprocal of the stationary velocity $|m_{a_j}|$; note that choosing $c$ large enough makes the delay $n^f$ inconsequential. This implies that the earlier setting of $\theta_j$ in (2.5) must now be understood up to a constant factor, i.e., as $\theta_j = \Theta\big(|m_{a_j}^{-1}|\big)$; the previous analysis still holds. For the case $j = 1$, we observe that, for $0 \le t \le t_2$, by (3.2, 3.4),
$$\frac23 - n^{-c} \le x_2(t) - x_1(t) \le x_4(t) - x_3(t) \le \frac23 + n^{-c}.$$
Assume now that $j \ge 2$. By applying successively (2.8, 2.11, 2.14), we find that
$$\big\|Q_{j-1}^{\theta_{j-1}} v^{a_{j-1}}\big\|_2 \le 2^{2j-1}\, e^{-\Omega(\theta_{j-1} 4^{1-j})}\,\|v^{a_{j-1}}\|_2 \le e^{-2n-\Omega(n^{-2}/|m_{a_{j-1}}|)} < \big|m_{a_{j-1}}\big|\, e^{-2n}.$$
By (2.10),
$$v^{a_j} = \pm\begin{pmatrix} P_{j-1}^{\theta_{j-1}} v^{a_{j-1}} \\ -P_{j-1}^{\theta_{j-1}} v^{a_{j-1}} \end{pmatrix} = \pm\, m_{a_{j-1}}\begin{pmatrix} 1 \\ -1 \end{pmatrix} \otimes \mathbf{1}_{2^{j-1}} \pm \begin{pmatrix} Q_{j-1}^{\theta_{j-1}} v^{a_{j-1}} \\ -Q_{j-1}^{\theta_{j-1}} v^{a_{j-1}} \end{pmatrix}.$$
The $\pm$ leaves open the possibility of a flip of either type (left or right sibling) before the $2^{j-1}$-bird flocks join at time $t_j$. As we saw earlier, the choice of type ensures that the flock with the lower-indexed birds drifts to the right while its sibling, with the higher-indexed birds, flies to the left; hence the certainty that, after flipping, the "fixed" part of the velocity vector $v^{a_j}$ is of the form $|m_{a_{j-1}}|\,(1,-1)^T \otimes \mathbf{1}_{2^{j-1}}$. (In fact, to achieve this is the sole purpose of flipping.) It follows that
$$v^{a_j} = \big|m_{a_{j-1}}\big|\begin{pmatrix} 1 \\ -1 \end{pmatrix} \otimes \mathbf{1}_{2^{j-1}} + \zeta, \qquad\text{with } \|\zeta\|_2 < \big|m_{a_{j-1}}\big|\, e^{-n}. \qquad (3.11)$$
For $1 \le i < 2^j$, define
$$\chi_i = (\overbrace{0, \ldots, 0, -1}^{i}, 1, 0, \ldots, 0)^T \in \mathbb{R}^{2^j}.$$
By (2.7), for $s \ge 1$,
$$\chi_i^T P_j^s v^{a_j} = m_{a_j}\,\chi_i^T \mathbf{1}_{2^j} + \chi_i^T Q_j^s v^{a_j} = \chi_i^T Q_j^s v^{a_j};$$
hence, for $t_j < t \le t_{j+1}$,
$$\mathrm{DIST}_t(B_i, B_{i+1}) = \mathrm{DIST}_{t_j}(B_i, B_{i+1}) + \sum_{s=1}^{t-t_j}(-1)^{f(s)}\,\chi_i^T Q_j^s v^{a_j}, \qquad (3.12)$$
where $f(s) = 1$ if there is a flip and $s > n^f$, and $f(s) = 0$ otherwise. Note that there is no risk, in using $\mathrm{DIST}_t(B_i, B_{i+1})$ instead of the signed version $x_{i+1}(t) - x_i(t)$, that birds might cross unnoticed: indeed, the bound in (2.11) applies to all the velocities, so that distances cannot change by more than $O(n^{1-c})$ in one step. This implies that a change of sign for $x_{i+1}(t) - x_i(t)$ would be preceded by the drop of $\mathrm{DIST}_t(B_i, B_{i+1})$ below 1/2 and a violation of (3.10). By Cauchy-Schwarz and (2.8, 3.11),
$$\big|\chi_i^T Q_j^s \zeta\big| \le \sqrt{2}\,\big\|Q_j^s \zeta\big\|_2 \le \sqrt{2}\; 2^{2j+1}\, e^{-\Omega(s 4^{-j})}\,\|\zeta\|_2 \le O(n^2)\, e^{-n-\Omega(s/n^2)}\,\big|m_{a_{j-1}}\big|;$$
and, since $n$ is assumed large enough, for $s \ge 1$,
$$\big|\chi_i^T Q_j^s \zeta\big| < e^{-\frac12 n - s n^{-3}}\,\big|m_{a_{j-1}}\big|. \qquad (3.13)$$
Likewise,
$$\big|\chi_i^T Q_j^s v^{a_j}\big| \le \sqrt{2}\,\big\|Q_j^s v^{a_j}\big\|_2 \le n^{1.45}\, e^{-\Omega(s/n^2)}\,\big\|v^{a_j}\big\|_2 \le n^{1.45}\, e^{-\Omega(s/n^2)}\big(\big|m_{a_{j-1}}\big|\sqrt{n} + \|\zeta\|_2\big).$$
For $s \ge 1$ and $1 \le i < 2^j$, by (3.11),
$$\big|\chi_i^T Q_j^s v^{a_j}\big| \le n^2\,\big|m_{a_{j-1}}\big|\, e^{-\Omega(s/n^2)}. \qquad (3.14)$$
Recall that $j \ge 2$. To prove (3.10), we distinguish between two cases: whether the birds $B_i, B_{i+1}$ are joined at node $a_j$ or earlier.

Case I ($i = 2^{j-1}$): The edge $(i, i+1)$ is created at node $a_j$ and $h(i) = j$, where $2 \le j < \log n$ (Figure 9). We begin with the case $t = t_j$. By construction, the upper bound in (3.10) is equal to 1. To establish the
Figure 9: The birds $B_i$ and $B_{i+1}$ are joined together at time $t_j$.
lower bound, we observe that at time $t_j - 1$ the two middle birds were more than one unit of distance apart. By the expression of the velocity given in (3.11), which expresses the displacement right before $t_j$, neither bird moved by more than $(1 + e^{-n})\,|m_{a_{j-1}}|$ in that one step; therefore,
$$\mathrm{DIST}_{t_j}(B_i, B_{i+1}) > 1 - 3\,\big|m_{a_{j-1}}\big|, \qquad (3.15)$$
which exceeds the lower bound of (3.10), i.e., $1 - \frac53\big(n^5 + j n^4\big)\,|m_{a_{h(i)-1}}|$.
Assume now that $t_j < t \le t_{j+1}$. Observe that
$$\frac{1}{2^j-1}\,\overbrace{(\tfrac12, 1, \ldots, 1, \tfrac12)}^{2^j}\left\{\begin{pmatrix} 1 \\ -1 \end{pmatrix} \otimes \mathbf{1}_{2^{j-1}}\right\} = 0.$$
The $i$-th row of $P_j$ is the same as the $(2^j + 1 - i)$-th row read backwards. This type of symmetry is closed under multiplication, so it is also true of $P_j^s$. By (2.7), for any $s \ge 0$, it then follows that
$$Q_j^s\left\{\begin{pmatrix} 1 \\ -1 \end{pmatrix} \otimes \mathbf{1}_{2^{j-1}}\right\} = P_j^s\left\{\begin{pmatrix} 1 \\ -1 \end{pmatrix} \otimes \mathbf{1}_{2^{j-1}}\right\} = \big(b_1^{(s)}, \ldots, b_{2^{j-1}}^{(s)}, -b_{2^{j-1}}^{(s)}, \ldots, -b_1^{(s)}\big)^T.$$
The following recurrence relation holds: $b_i^{(0)} = 1$; for $s \ge 0$, we get the identities below for $l \le 2^{j-1}$, plus an antisymmetric set for $l > 2^{j-1}$:
$$b_l^{(s+1)} = \frac13\begin{cases} b_1^{(s)} + 2 b_2^{(s)} & \text{if } l = 1; \\ b_{l-1}^{(s)} + b_l^{(s)} + b_{l+1}^{(s)} & \text{if } 1 < l < 2^{j-1}; \\ b_{2^{j-1}-1}^{(s)} & \text{if } l = 2^{j-1}; \\ -b_{2^{j-1}-1}^{(s)} & \text{if } l = 2^{j-1}+1; \\ -b_{2^j+2-l}^{(s)} - b_{2^j+1-l}^{(s)} - b_{2^j-l}^{(s)} & \text{if } 2^{j-1}+1 < l < 2^j; \\ -b_1^{(s)} - 2 b_2^{(s)} & \text{if } l = 2^j. \end{cases}$$
We find by induction that
$$b_1^{(s)} \ge \cdots \ge b_{2^{j-1}}^{(s)} \ge 3^{-s};$$
therefore,
$$\chi_{2^{j-1}}^T\, Q_j^s\left\{\begin{pmatrix} 1 \\ -1 \end{pmatrix} \otimes \mathbf{1}_{2^{j-1}}\right\} = -2\, b_{2^{j-1}}^{(s)} < -3^{-s}. \qquad (3.16)$$
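The antisymmetric profile and the $3^{-s}$ lower bound behind (3.16) can be confirmed numerically by applying the chain $P_j$ (our assumed row pattern, consistent with the recurrence above) to $(1,-1)^T \otimes \mathbf{1}$; a sketch for $j = 4$:

```python
import numpy as np

j = 4
N = 2 ** j
Pj = np.zeros((N, N))
Pj[0, :2] = [1, 2]; Pj[-1, -2:] = [2, 1]
for l in range(1, N - 1):
    Pj[l, l - 1:l + 2] = 1
Pj /= 3

b = np.concatenate([np.ones(N // 2), -np.ones(N // 2)])  # (1,-1) tensor 1
ok = True
for s in range(1, 30):
    b = Pj @ b
    top = b[:N // 2]
    ok &= bool(np.allclose(b, -b[::-1]))        # antisymmetric profile
    ok &= bool(np.all(np.diff(top) <= 1e-12))   # b_1 >= ... >= b_{2^{j-1}}
    ok &= bool(top[-1] >= 3.0 ** (-s) - 1e-12)  # lower bound 3^{-s}
```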
Considering that the two middle birds in the flock for $a_j$ get attached in the flocking network at time $t_j$, $\mathrm{DIST}_{t_j}(B_i, B_{i+1}) \le 1$. Assume that the flock at $a_j$ does not undergo a flip. Then, by (3.11, 3.12, 3.13), for $t_j < t \le t_{j+1}$,
$$\begin{aligned} \mathrm{DIST}_t(B_i, B_{i+1}) &\le 1 + \sum_{s=1}^{t-t_j}\chi_i^T Q_j^s v^{a_j} \\ &\le 1 + \big|m_{a_{j-1}}\big|\sum_{s=1}^{t-t_j}\chi_{2^{j-1}}^T Q_j^s\left\{\begin{pmatrix} 1 \\ -1 \end{pmatrix} \otimes \mathbf{1}_{2^{j-1}}\right\} + \sum_{s=1}^{t-t_j}\big|\chi_{2^{j-1}}^T Q_j^s\,\zeta\big| \\ &\le 1 - \frac13\,\big|m_{a_{j-1}}\big| + \sum_{s\ge1}\big|\chi_{2^{j-1}}^T Q_j^s\,\zeta\big| \\ &\le 1 - \frac13\,\big|m_{a_{j-1}}\big| + \big|m_{a_{j-1}}\big|\sum_{s\ge1} e^{-\frac12 n - s n^{-3}} \\ &< 1 - \frac13(1-o(1))\,\big|m_{a_{j-1}}\big| = 1 - \frac13(1-o(1))\,\big|m_{a_{h(i)-1}}\big|, \end{aligned}$$
which proves the upper bound in (3.10) for $i = 2^{j-1}$. The negative geometric series we obtain from (3.16) reflects the "inertia" of the two flocks as they collide and penetrate each other's zone of influence before being stabilized. Suppose now that the flock at $a_j$ undergoes a flip at time $t_j + n^f$. The previous analysis holds for $t_j < t \le t_j + n^f$; so assume that $t_j + n^f < t \le t_{j+1}$. By (3.14) and $h(i) = j$,
$$\sum_{s=1}^{t-t_j-n^f}\big|\chi_i^T Q_j^{s+n^f} v^{a_j}\big| \le \sum_{s=1}^{t-t_j-n^f} n^2\,\big|m_{a_{h(i)-1}}\big|\, e^{-\Omega((s+n^f)\, n^{-2})} = o\big(\big|m_{a_{h(i)-1}}\big|\big).$$
By (3.12), therefore,
$$\begin{aligned} \mathrm{DIST}_t(B_i, B_{i+1}) &= \mathrm{DIST}_{t_j}(B_i, B_{i+1}) + \sum_{s=1}^{n^f}\chi_i^T Q_j^s v^{a_j} - \sum_{s=n^f+1}^{t-t_j}\chi_i^T Q_j^s v^{a_j} \\ &\le \mathrm{DIST}_{t_j+n^f}(B_i, B_{i+1}) + \sum_{s=1}^{t-t_j-n^f}\big|\chi_i^T Q_j^{s+n^f} v^{a_j}\big| \\ &< 1 - \frac13(1-o(1))\,\big|m_{a_{h(i)-1}}\big| + o\big(\big|m_{a_{h(i)-1}}\big|\big) < 1 - \frac14\,\big|m_{a_{h(i)-1}}\big|. \end{aligned}$$
This establishes the upper bound in (3.10) for $i = 2^{j-1}$, whether there is a flip or not. We prove the lower
Figure 10: The birds $B_i$ and $B_{i+1}$ are joined earlier than $t_j$.
bound as follows. By (3.12, 3.14, 3.15), for $t_j < t \le t_{j+1}$,
$$\begin{aligned} \mathrm{DIST}_t(B_i, B_{i+1}) &\ge 1 - 3\,\big|m_{a_{j-1}}\big| - \sum_{s=1}^{t-t_j}\big|\chi_i^T Q_j^s v^{a_j}\big| \\ &\ge 1 - 3\,\big|m_{a_{j-1}}\big| - n^2\,\big|m_{a_{j-1}}\big|\sum_{s\ge1} e^{-\Omega(s/n^2)} \\ &\ge 1 - n^5\,\big|m_{a_{j-1}}\big| = 1 - n^5\,\big|m_{a_{h(i)-1}}\big|. \end{aligned}$$
Note that this derivation still holds if the flock "flips," i.e., reverses the sign of $Q_j^s v^{a_j}$. This establishes (3.10) for $i = 2^{j-1}$.
Case II ($i < 2^{j-1}$): This implies that $h(i) < j$ (Figure 10). Recall that $j \ge 2$. We omit the case $i > 2^{j-1}$, which is treated similarly. The case $t = t_j$ follows by induction² for $j' = j-1$ and $t = t_{j'+1}$. Note that $t \ne t_{j-1}$, so the inductive use of (3.10) does not take us to the upper bound of 1; it actually provides stronger bounds, as $j' < j$. Assuming that $t_j < t \le t_{j+1}$, by (3.12, 3.14),
$$\big|\mathrm{DIST}_t(B_i, B_{i+1}) - \mathrm{DIST}_{t_j}(B_i, B_{i+1})\big| \le \sum_{s\ge1}\big|\chi_i^T Q_j^s v^{a_j}\big| \le n^2\,\big|m_{a_{j-1}}\big|\sum_{s\ge1} e^{-\Omega(s/n^2)} \le O\big(n^4\,\big|m_{a_{j-1}}\big|\big).$$
We apply (3.10) inductively once more for $j' = j-1$ and $t = t_{j'+1}$:
$$1 - \frac53\big(n^5 + (j-1)\, n^4\big)\big|m_{a_{h(i)-1}}\big| \le \mathrm{DIST}_{t_j}(B_i, B_{i+1}) \le 1 - \frac14\Big(1 - \frac{j-1}{n}\Big)\big|m_{a_{h(i)-1}}\big|;$$
hence, for $t_j < t \le t_{j+1}$,
$$1 - \frac53\big(n^5 + (j-1)\, n^4\big)\big|m_{a_{h(i)-1}}\big| - O\big(n^4\,\big|m_{a_{j-1}}\big|\big) \le \mathrm{DIST}_t(B_i, B_{i+1}) \le 1 - \frac14\Big(1 - \frac{j-1}{n}\Big)\big|m_{a_{h(i)-1}}\big| + O\big(n^4\,\big|m_{a_{j-1}}\big|\big).$$
²Defining the induction invariant over the interval $[t_j, t_{j+1}]$ instead of $(t_j, t_{j+1}]$ shortens the presentation.
Because $j > h(i)$, by (2.14), $|m_{a_{j-1}}| < n^{-c}\,|m_{a_{h(i)-1}}|$ for $h(i) > 1$. In the case $h(i) = 1$,
$$
m_{a_{j-1}} \;\le\; m_{a_1} \;<\; n^{-c} \;\le\; 4n^{-6}\, m_{a_0} \;=\; n^{-11}\,,
$$
with $c$, as usual, assumed large enough. This shows that, in all cases,
$$
m_{a_{j-1}} \;<\; 4n^{-6}\, m_{a_{h(i)-1}}\,;
$$
hence (3.10). Since sums involving velocities are immediately taken with absolute values, the same derivation can be repeated verbatim in the case of a flip.
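The bound $m_{a_{j-1}} < 4n^{-6}\, m_{a_{h(i)-1}}$ is what renders the $O(n^4 m_{a_{j-1}})$ error terms harmless: after substitution they become $O(n^{-2} m_{a_{h(i)-1}})$, a vanishing fraction of the main term. A quick numerical sketch, with a hypothetical big-O constant `C` and an illustrative stand-in value for $m_{a_{h(i)-1}}$:

```python
C = 10.0  # hypothetical constant hidden in the O(n^4 m_{a_{j-1}}) terms

def relative_error(n, m=1e-6):
    """The O(n^4 m_{a_{j-1}}) term measured against m_{a_{h(i)-1}} = m."""
    m_j = 4 * n**-6 * m        # m_{a_{j-1}} < 4 n^{-6} m_{a_{h(i)-1}}
    return C * n**4 * m_j / m  # roughly 4*C / n^2

for n in (100.0, 1000.0, 10000.0):
    assert relative_error(n) < 0.01  # negligible next to the main term
assert relative_error(1000.0) < relative_error(100.0)  # and it vanishes as n grows
```

The chosen values of `C`, `n`, and `m` are for illustration only; the argument in the text needs nothing beyond the $n^{-2}$ decay.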
Acknowledgments
I wish to thank the referees for their detailed, helpful comments.
References

[1] Andrea Cavagna, Alessio Cimarelli, Irene Giardina, Giorgio Parisi, Raffaele Santagati, Fabio Stefanini, and Massimiliano Viale: Scale-free correlations in starling flocks. Proc. National Academy of Sciences, 107(26):11865–11870, 2010. [doi:10.1073/pnas.1005766107]

[2] Bernard Chazelle: The convergence of bird flocking. J. ACM, 61(4/21):1–35, 2014. Preliminary version in SoCG’10. [doi:10.1145/2629613, arXiv:0905.4241]

[3] Felipe Cucker and Stephen Smale: Emergent behavior in flocks. IEEE Trans. on Automatic Control, 52(5):852–862, 2007. [doi:10.1109/TAC.2007.895842]

[4] Julien M. Hendrickx and Vincent D. Blondel: Convergence of different linear and nonlinear Vicsek models. In Proc. 17th Internat. Symp. on Mathematical Theory of Networks and Systems (MTNS’06), pp. 1229–1240, 2006. Corrected version available at UCL.

[5] Ali Jadbabaie, Jie Lin, and A. Stephen Morse: Coordination of groups of mobile autonomous agents using nearest neighbor rules. IEEE Trans. on Automatic Control, 48(6):988–1001, 2003. [doi:10.1109/TAC.2003.812781]

[6] Meng Ji and Magnus Egerstedt: Distributed coordination control of multiagent systems while preserving connectedness. IEEE Trans. on Robotics, 23(4):693–703, 2007. [doi:10.1109/TRO.2007.900638]

[7] Luc Moreau: Stability of multiagent systems with time-dependent communication links. IEEE Trans. on Automatic Control, 50(2):169–182, 2005. [doi:10.1109/TAC.2004.841888]

[8] Reza Olfati-Saber: Flocking for multi-agent dynamic systems: algorithms and theory. IEEE Trans. on Automatic Control, 51(3):401–420, 2006. [doi:10.1109/TAC.2005.864190]

[9] Craig W. Reynolds: Flocks, herds and schools: A distributed behavioral model. SIGGRAPH Comput. Graph., 21(4):25–34, 1987. [doi:10.1145/37402.37406]

[10] Alireza Tahbaz-Salehi and Ali Jadbabaie: On recurrence of graph connectivity in Vicsek’s model of motion coordination for mobile autonomous agents. In American Control Conference (ACC’07), pp. 699–704. IEEE Comp. Soc. Press, 2007. [doi:10.1109/ACC.2007.4282958]

[11] Gongguo Tang and Lei Guo: Convergence of a class of multi-agent systems in probabilistic framework. J. Systems Science & Complexity, 20(2):173–197, 2007. [doi:10.1007/s11424-007-9016-3]

[12] Herbert G. Tanner, Ali Jadbabaie, and George J. Pappas: Flocking in fixed and switching networks. IEEE Trans. on Automatic Control, 52(5):863–868, 2007. [doi:10.1109/TAC.2007.895948]

[13] Tamás Vicsek, András Czirók, Eshel Ben-Jacob, Inon Cohen, and Ofer Shochet: Novel type of phase transition in a system of self-driven particles. Phys. Rev. Lett., 75(6):1226–1229, 1995. [doi:10.1103/PhysRevLett.75.1226]
AUTHOR
Bernard Chazelle
Eugene Higgins Professor of Computer Science
Department of Computer Science
Princeton University
Princeton, NJ 08540
chazelle cs princeton edu
http://www.cs.princeton.edu/~chazelle/
ABOUT THE AUTHOR
Bernard Chazelle is Eugene Higgins Professor of Computer Science at Princeton
University, where he has been on the faculty since 1986. He is currently on leave as an
Addie and Harold Broitman Member at the Institute for Advanced Study, Princeton, New
Jersey.