Theory of Computing, Volume 15 (2), 2019, pp. 1–31
www.theoryofcomputing.org
Time Bounds for Streaming Problems
Raphaël Clifford∗ Markus Jalsenius† Benjamin Sach
Received March 13, 2017; Revised January 11, 2018; Published September 7, 2019
Abstract: We give tight cell-probe bounds for the time to compute convolution, multiplication and Hamming distance in a stream. The cell-probe model is a particularly strong computational model and subsumes, for example, the popular word RAM model.
• We first consider online convolution where the task is to compute the inner product
between a fixed n-dimensional vector and a vector of the n most recent values from a
stream. One symbol of the stream arrives at a time and then each output symbol must
be computed before the next input symbol arrives.
• Next we show bounds for online multiplication of two n-digit numbers where one of
the multiplicands is known in advance and the other arrives one digit at a time, starting
from the lower-order end. When a digit arrives, the task is to compute a single new
digit from the product before the next digit arrives.
• Finally we look at the online Hamming distance problem where the Hamming distance
is computed instead of the inner product.
For each of these three problems, we give a lower bound of Ω((δ/w) log n) time on average per output symbol, where δ is the number of bits needed to represent an input symbol and w is the cell or word size. We argue that these bounds are in fact tight within the cell-probe model.
ACM Classification: F.2.2
AMS Classification: 68Q17
Key words and phrases: lower bounds, cell probe, streaming algorithms, online algorithms
Preliminary versions of these results first appeared in the Proceedings of the 38th International Colloquium on Automata,
Languages and Programming (ICALP), 2011 [7] and the Proceedings of the 24th ACM-SIAM Symposium on Discrete
Algorithms (SODA), 2013 [8].
∗ Supported by EPSRC Fellowship EP/J019283/1.
† Supported by EPSRC Fellowship EP/J019283/1.
© 2019 Raphaël Clifford, Markus Jalsenius and Benjamin Sach
Licensed under a Creative Commons Attribution License (CC-BY). DOI: 10.4086/toc.2019.v015a002
1 Introduction
We consider the complexity of three related and fundamental problems: computing the convolution of
two vectors, multiplying two integers, and computing the Hamming distance between two strings. We
study these problems in an online or streaming context and provide matching upper and lower bounds in
the cell-probe model. Time lower bounds in the cell-probe model also hold for the popular word-RAM model, in which many of today's algorithms are expressed.
The importance of these problems is hard to overstate. The integer multiplication and convolution
problems have played a central role in modern algorithm design and theory. The question of how to
compute the Hamming distance efficiently has a rich literature, spanning many of the most important
fields in computer science. Within the theory community, communication complexity based lower
bounds and streaming model upper bounds for the Hamming distance problem have been the subject
of particularly intense study [10, 35, 17, 19, 4, 5]. This previous work has, however, almost exclusively focused on providing resource bounds either in terms of space or bits of communication, rather than time complexity.
We begin by introducing the problems and stating our results. In the following problem definitions
and throughout, we write [q] to denote the set {0, . . . , q − 1}, where q is a positive integer and a parameter
of the problem.
Problem 1.1 (Online convolution). For a fixed vector F ∈ [q]ⁿ of length n, we consider a stream in
which numbers from [q] arrive one at a time. For each arriving number, before the next number arrives,
we compute the inner product (modulo q) of F and the vector that consists of the most recent n numbers
of the stream.
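To make the problem concrete, here is a minimal Python reference implementation of online convolution (our own naive sketch; it spends Θ(n) work per arrival and is far from the cell-probe bounds discussed in this paper):

```python
from collections import deque

def online_convolution(F, stream, q):
    """Naive reference for Problem 1.1: after each arrival, output the
    inner product (mod q) of F with the n most recent stream values."""
    n = len(F)
    S = deque([0] * n, maxlen=n)   # the n most recent values; zeros before the stream starts
    for x in stream:
        S.append(x)                # new value enters on the right, oldest falls off
        yield sum(f * s for f, s in zip(F, S)) % q
```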
Theorem 1.2. In the cell-probe model with w bits per cell, for any positive integers q and n, and any randomised algorithm solving the online convolution problem, there exist instances such that the expected amortised time is Ω((δ/w) log n) per arriving value, where δ = ⌈log₂ q⌉.
Problem 1.3 (Online multiplication). Given two numbers F, X ∈ [qⁿ], where q is the base and n is the number of digits per number, we want to compute the n least significant digits of the product of F and X, in base q. We must do this under the constraint that only F is known in advance and the digits of X arrive one at a time, starting from the lower-order end. When the i-th digit of X arrives, before the (i + 1)-th digit arrives, we compute the i-th digit of the product.
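As a sketch (ours, naive): since adding x_j · q^j for j > i to X only changes digits above position i, the i-th digit of the product is already determined by the first i + 1 digits of X, so it can be read off a running product:

```python
def online_multiplication(F_digits, x_digits, q):
    """Naive reference for Problem 1.3: F is known in advance; digits of X
    arrive lowest-order first, and we emit the i-th digit of F*X on the
    i-th arrival.  Digit i of F*X depends only on X's digits 0..i."""
    F = sum(f * q**i for i, f in enumerate(F_digits))  # F as an integer
    X = 0
    for i, x in enumerate(x_digits):
        X += x * q**i
        yield (F * X // q**i) % q  # the i-th base-q digit of the product
```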
Theorem 1.4. In the cell-probe model with w bits per cell, for any positive integers q and n, and any randomised algorithm solving the online multiplication problem in base q, there exist instances such that computing the n least significant digits of the product takes Ω((δ/w)n log n) expected time, where δ = ⌈log₂ q⌉.
Problem 1.5 (Online Hamming distance). For a fixed string F of length n, we consider a stream in
which symbols from the alphabet [q] arrive one at a time. For each arriving symbol, before the next
symbol arrives, we compute the Hamming distance between F and the last n symbols of the stream.
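A matching naive Python sketch (ours) of the online Hamming distance problem:

```python
from collections import deque

def online_hamming(F, stream):
    """Naive reference for Problem 1.5: after each arrival, output the
    Hamming distance between F and the n most recent stream symbols."""
    n = len(F)
    S = deque([None] * n, maxlen=n)  # None: positions not yet filled never match
    for x in stream:
        S.append(x)
        yield sum(1 for f, s in zip(F, S) if f != s)
```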
Theorem 1.6. In the cell-probe model with w bits per cell, for any positive integers q and n, and any randomised algorithm solving the online Hamming distance problem, there exist instances such that the expected amortised time is Ω((δ/w) log n) per arriving value, where δ = ⌈min{log₂ q, log₂ n}⌉.
Our Hamming distance lower bound also implies a matching lower bound for any problem to which Hamming distance can be reduced. The most straightforward of these is online L₁ distance computation, where the task is to compute the L₁ distance between a fixed vector of integers and the last n numbers in the stream. A suitable reduction was shown in [26]. The expected amortised cell-probe complexity for the online L₁ distance problem is therefore also Ω((δ/w) log n) per output symbol.
One of our main technical contributions is to extend methods designed to give lower bounds on dynamic data structures to the seemingly distinct field of online algorithms. When δ = w, for example, we have Ω(log n) lower bounds for all three problems. In particular, for online multiplication and convolution, these lower bounds match the best currently known offline upper bounds in the RAM model. As we discuss in Section 1.1, this may be the highest lower bound that can be proved for all the problems we consider without a further breakthrough.
In order to prove our lower bounds we show the existence of probability distributions on the inputs
for which we can prove lower bounds on the expected running time of any deterministic algorithm. By
Yao’s minimax principle [36] this immediately implies that for every (randomised) algorithm there is a
worst-case input such that the (expected) running time is equally high. Therefore our lower bounds hold
equally for randomised algorithms as for deterministic ones.
The lower bounds we give are also tight within the cell-probe model. This can be seen by application of reductions described in [12, 6]. It was shown there that any offline algorithm for convolution [6] or multiplication [12] can be converted to an online one with at most an O(log n) factor overhead. For details of these reductions we refer the reader to the original papers. In our case, the same approach also allows us to directly convert any cell-probe algorithm from an offline to an online setting. An offline cell-probe algorithm for convolution, multiplication or Hamming distance could first read the whole input, then compute the answer. This takes O((δ/w)n) cell probes. We can therefore derive online cell-probe algorithms which take only O((δ/w)n log n) probes over n input symbols, hence O((δ/w) log n) (amortised) probes per output. This upper bound matches the new lower bounds we give. We summarise this in the following corollary.
Corollary 1.7. The expected amortised cell-probe complexity of the online convolution, multiplication, Hamming distance and L₁-distance problems is Θ((δ/w) log n) per arriving value.
One consequence of our results is the first strict separation between the complexity of exact and
approximate pattern matching. Online exact matching can be solved in constant time [15] per new input
symbol and our new lower bound proves for the first time that this is not possible for Hamming distance.
Another consequence of our results is a new separation between the time complexity of online
exact matching and any convolution-based online pattern matching algorithm. Convolution has played
a particularly important role in the field of combinatorial pattern matching where many of the fastest
algorithms rely crucially for their speed on the use of fast Fourier transforms (FFTs) to perform repeated
convolutions. These methods have also been extended to allow searching for patterns in rapidly processed
data streams [6, 9].
1.1 Previous results and upper bounds in the RAM model
Almost all previous algorithmic work for exact Hamming distance computation has considered the
problem in an offline setting. Given a pattern P and a text T of length m and n respectively, the
best current deterministic upper bound for offline Hamming distance computation is an O(n√(m log m))-time algorithm based on convolutions [2, 22]. In [21] a randomised algorithm was given that approximates the Hamming distances to within a (1 + ε) factor and takes O((n/ε²) log² n) time; this was subsequently modified in [18] to O((n/ε³) log n). Particular interest
has also been paid to a bounded version of this problem called the k-mismatch problem. Here a bound k
is given and we need only report the Hamming distance if it is less than or equal to k. In [23], an O(nk)-time algorithm was given that is not convolution based and uses O(1)-time lowest common ancestor (LCA) operations on the suffix tree of P and T. This was then improved to O(n√(k log k)) time by a method that combines LCA queries, filtering and convolutions [3].
The best time complexity lower bounds for online multiplication of two n-bit numbers were given in 1974 by Paterson, Fischer and Meyer. They presented an Ω(log n) lower bound for multitape Turing
machines [31] and also gave an Ω(log n/ log log n) lower bound for the bounded activity machine (BAM).
The BAM, which is a strict generalisation of the Turing machine model but which has nonetheless largely
fallen out of favour, attempts to capture the idea that future states can only depend on a limited part of the
current configuration. To the authors’ knowledge, there has been no progress on cell-probe lower bounds
for online multiplication, convolution or Hamming distance previous to the work we present here.
There have however been attempts to provide offline lower bounds for the related problem of
computing the FFT. In [28] Morgenstern gave an Ω(n log n) lower bound conditional on the assumption
that the underlying field of the transform is the complex numbers and that the modulus of any complex
numbers involved in the computation is at most one. Papadimitriou gave the same Ω(n log n) lower bound
for FFTs of length a power of two, this time excluding certain classes of algorithms including those
that rely on linear mathematical relations among the roots of unity [30]. This work had the advantage
of giving a conditional lower bound for FFTs over more general algebras than was previously possible,
including for example finite fields. In 1986, Pan [29] showed that another class of algorithms having a
so-called synchronous structure must require Ω(n log n) time for the computation of both the FFT and
convolution.
The fastest known algorithms for both offline integer multiplication and convolution in the word-RAM model take O(n log n) time by a well-known application of a constant number of FFTs. As a
consequence our online lower bounds for these two problems match the best known time upper bounds
for the offline problem. As we discussed above, our lower bounds for all three problems are also tight
within the cell-probe model for the online problems.
The question now naturally arises as to whether one can find higher lower bounds in the RAM model. This is an interesting question as there remains a gap between the best known time upper bounds
provided by existing algorithms and the lower bounds that we give within the cell-probe model. However,
as we mention above, any offline algorithm for convolution, Hamming distance or multiplication can be
converted to an online one with at most an O(log n) factor overhead [12, 6]. As a consequence, a higher
lower bound than Ω(log n) for any of these problems would immediately imply a superlinear lower bound
for the offline version of the corresponding problem. This would be a truly remarkable breakthrough in
the field of computational complexity as no such offline lower bound is known even for the canonical
NP-complete problem SAT.
Our only alternative route to find tight time bounds would be to find better upper bounds for the
online problems. For the case of online multiplication at least, where the fastest online RAM algorithm
takes O(log2 n) time per arriving pair of digits, this has been an open problem since at least 1973 and has
so far resisted our best attempts. On the other hand, for online Hamming distance, while our lower bound
is tight within the model, it is still distant from the time complexity of the fastest known RAM algorithms.
The best known online complexity is O(√(n log n)) time per arriving symbol [6]. An improvement of
the upper bound for Hamming distance computation to meet our new lower bound would also have
significant implications. A reduction that is now regarded as folklore tells us that any O(f(n))-time algorithm for computing the Hamming distance between a pattern and all substrings of a text, assuming a pattern of length n and a text of length 2n, implies an O(f(n²))-time algorithm for multiplying binary (n × n)-matrices over the integers. Therefore an O(log n)-time online Hamming distance algorithm would imply an O(n log n)-time offline Hamming distance algorithm, which would in turn imply an O(n² log n)-time algorithm for binary matrix multiplication. Although such a result would arguably be less shocking than
a proof of a superlinear offline lower bound for Hamming distance computation, it would nonetheless be
a significant breakthrough in the complexity of a classic and much studied problem.
1.2 The cell-probe model
Our bounds hold in the cell-probe model which is a particularly strong computational model that
was introduced originally by Minsky and Papert [27] in a different context and subsequently by
Fredman [13] and Yao [37]. A data structure in the cell-probe model consists of a set of memory cells,
each storing w bits. When presented with an update, the data structure reads and updates a number of
these cells. Similarly, when presented with a query, the data structure reads cells and from their contents,
returns the desired answer. These reads and updates are referred to as cell probes and the cost of an
update or query operation is simply the number of cells that are probed. As is typical, we will require that
the cell size w is at least of order log n bits. This allows each cell, or a constant number of cells, to hold
the address of any location in memory.
This abstraction makes the model very strong, subsuming for instance the popular word-RAM model.
In the word-RAM model certain operations on words, such as addition, subtraction and possibly multiplication, take constant time (see for example [16] for a detailed introduction). Here a word corresponds to a
cell. Any time lower bound in the cell-probe model with w-bit cells gives an asymptotically equal time
lower bound in the word-RAM model with w-bit words. This is because each constant-time operation in
the word-RAM model only probes a constant number of cells.
The generality of the cell-probe model makes it particularly attractive for establishing lower bounds
for dynamic data structure problems and many such results have been given in the past couple of decades.
The approaches taken had historically been based only on communication complexity arguments and the
chronogram technique of Fredman and Saks [14]. However, in 2004, a breakthrough led by Pǎtraşcu and Demaine gave us the tools to seal the gaps for several data structure problems [34] as well as
giving the first Ω(log n) lower bounds. The new technique is based on information-theoretic arguments
that we also deploy here. Pǎtraşcu and Demaine also presented ideas which allowed them to express
more refined lower bounds such as trade-offs between updates and queries of dynamic data structures.
For a list of data structure problems and their lower bounds using these and related techniques, see for example [32]. A new lower bound of Ω((log n/ log log n)²) was given by Larsen in 2012 for the cell-probe complexity of performing queries in the dynamic range counting problem [24]. This result holds under the natural assumptions of Θ(log n)-size words and polylogarithmic time updates and is another exciting breakthrough in the field of cell-probe complexity. In a recent significant advance for
the field, an Ω((√(log n)/ log log n)³) time lower bound for the unweighted version of dynamic range counting was given which holds even over F₂ [25].
1.3 Technical contributions
We use one of the most important techniques for proving data structure lower bounds called the information
transfer method of Pǎtraşcu and Demaine [33, 34]. For a pair of adjacent intervals of arriving values in
the stream, the information transfer is the set of memory cells that are written during the first interval and
read in the next interval. These cells must contain all the information from the updates during the first
interval that the algorithm needs in order to produce correct output in the next interval. If one can prove
that this quantity is large for many pairs of intervals then the desired lower bound follows. To do this we
relate the size of the information transfer to the conditional entropy of the output in the relevant interval.
The main task of proving lower bounds reduces to that of devising a hard input distribution for which
output symbols have high entropy conditioned on selected previous values of the input.
Although the use of information transfer to provide time lower bounds for data structure problems is
not new, applying the method to our new online setting has required a number of new insights. At the
simplest level, where a standard data structure problem has a number of different possible queries, in our
setting there is only one query which is to return the latest result as soon as a new symbol arrives. As a
result we provide a complete description of the information transfer method in a form which is relevant to
this different setting. At a more detailed mathematical level, perhaps the most surprising innovation we
present is a new relationship between the Hamming distance, vector sums and constant-weight binary
cyclic codes.
For the three problems we consider, our key contribution is the design of a fixed vector or string F which, together with some random distribution over possible input streams, provides a lower bound for the information transfer between successive intervals. For the convolution and multiplication problems we
show that a randomly picked F has a good chance of being suitable for proving the lower bounds. We also
give an explicit description of a particular F for which the lower bounds are obtained when the values of
the input stream are drawn independently and uniformly at random. The vector F is easy to describe and
naturally yields large conditional entropy of the output symbols for intervals of power-of-two lengths.
The results of the convolution and multiplication problems can be seen as a first step towards the lower
bound for the Hamming distance problem. Here the string F is derived by a sequence of transformations.
These start with binary cyclic codes and go via binary vectors with many distinct sums and an intermediate
string to finally arrive at F itself. The use of such a purposefully designed input departs from the closely
related work of the convolution and multiplication lower bounds and also from much of the lower bound
literature where simple uniform distributions over the whole input space often suffice.
The central fact that enabled a lower bound to be proven for the online convolution problem is that the inner product between a vector and successive suffixes of the stream reveals a lot of information about the history of the stream. Establishing a similar result for the online Hamming distance problem appears,
however, to be considerably more challenging for a number of reasons. The first and most obvious is that
the amount of information one gains by comparing whether two, potentially large, symbols are equal is at
most one bit, as opposed to O(log n) bits for multiplication. The second is that the particularly simple
worst-case vector F of the convolution problem greatly eased the resulting analysis. We have not been
able to find such a simple fixed string for the Hamming distance problem and our proof of the existence of
a hard instance is non-constructive and involves a number of new insights, combining ideas from coding
theory and additive combinatorics.
When computing the Hamming distance there is a balance between the number of symbols being used
and the length of the strings. For large alphabets and short strings, one would expect a typical Hamming
distance to be close to the length of the string on random input symbols and therefore to provide very little
information about the random string itself. This suggests that the length of the strings must be sufficiently
long in relation to the alphabet size to ensure that the entropy of the output symbols is large, as required
by the information transfer method. At first glance, it is not immediately obvious that large entropy can
be obtained unless the fixed string F is exponentially larger than the alphabet size. This potentially poses
another problem for the information transfer method, namely that a word size w of order log n would
be much larger than δ (the number of bits needed to represent a symbol), making a log n lower bound
impossible to achieve.
Our main technical contribution is to show that fixed strings of length only polynomial in the size
of the alphabet exist which provide output symbols of sufficiently high entropy. Such strings, when
combined with a suitable input distribution maximising the number of distinct Hamming distance output
sequences, give us the overall lower bound. We design a fixed string F with this desirable property in
such a way that there is a one-to-one mapping between many of the different possible input streams and
the computed Hamming distances. This in turn implies large entropy. The construction of F is non-trivial
and we break it into smaller building blocks, reducing our problem to a purely combinatorial question
relating to vector sums. That is, given a relatively small set V of vectors of length m, how many distinct vector sums can be obtained by choosing m vectors from V and adding them? We show that even if we are restricted to picking vectors only from subsets of V, there exists a V such that the number of distinct vector sums is m^{Ω(m)}. We believe this result is interesting in its own right. Our proof for the combinatorial
problem is non-constructive and probabilistic, using constant-weight cyclic binary codes to prove that
there is a positive probability of the existence of a set V with the desired property.
1.4 Organisation
In Section 2 we introduce notation and describe the setup for proving the lower bounds. In Section 3 we
prove the lower bounds for all three problems that we consider. The proofs hinge on a set of lemmas which
give lower bounds related to the entropy of the outputs of the problems considered. These lemmas are
proven separately in subsequent sections. In Section 4 we deal with the lemmas related to the convolution
problem, and in Section 5 we deal with the lemmas related to the multiplication problem. Finally, in
Sections 6, 7 and 8 we prove the lemma related to the Hamming distance problem.
2 Basic setup for the lower bounds
In this section we introduce notation and concepts that are used heavily in the lower bound proofs. For
an array, vector or string A of length n and i, j ∈ [n], we write A[i] to denote the value at position i, and, where j ≥ i, A[i, j] denotes the length-(j − i + 1) subarray of A starting at position i. All logarithms are in
base two. We first introduce a unifying framework for the problems we consider.
2.1 The framework
There is a fixed array F and an array S which is referred to as the stream. Both F and S are of length n and over the set [q] of integers, and we let δ = ⌈log q⌉ denote the number of bits required to encode a value from [q]. The value q, or alternatively δ, is a parameter of the problem. The problem is to maintain S subject to an update operation UPDATE(x) which takes a symbol x ∈ [q], modifies S by appending x to the right of the rightmost symbol S[n − 1] and removing the leftmost symbol S[0], and then outputs the value of a function of F and the updated S. In the convolution problem the output is the inner product of F and S, that is ∑_{i∈[n]} F[i] · S[i], and in the Hamming distance problem the output is the number of positions i ∈ [n] such that F[i] ≠ S[i].
We let U ∈ [q]ⁿ denote the update array which describes a sequence of n UPDATE operations. That is, for each t ∈ [n], the operation UPDATE(U[t]) is performed. We will usually refer to t as the arrival of the value U[t]. Observe that just after arrival t, the values U[t + 1, n − 1] are still not known to the algorithm. Finally, we let the length-n array A denote the outputs such that for t ∈ [n], A[t] is the output of UPDATE(U[t]).
In the multiplication problem we let F denote one of the two operands to be multiplied, hence F is
fixed and known in advance by the algorithm. Specifically we let F[i] denote the i-th least significant
digit. We let U be the unknown operand so that U[t] is its t-th least significant digit. Prior to the arrival of
the first digit U[0], the stream S contains only zeros. The output A[t] is the t-th digit in the product of F
and S, which is a function of F and U[0,t] as required.
2.2 Hard distributions
Our lower bounds hold for any randomised algorithm on its worst case input. This will be achieved by
applying Yao's minimax principle [36]. That is, we develop lower bounds that hold for any deterministic algorithm on some random input. The basic approach is as follows: we devise a fixed array F and
describe a probability distribution for n new values arriving in the stream S. We then obtain a lower bound
on the expected running time for any deterministic algorithm over these arrivals. Due to the minimax
principle, the same lower bound must then hold for any randomised algorithm on its own worst case input.
The amortised bound is obtained by dividing by n.
From this point onwards we consider an arbitrary deterministic algorithm running with some fixed
array F on a random input of n values. The algorithm may depend on F. We refer to the choice of F and
distribution on U as a hard distribution since it is used to show a lower bound.
2.3 Information transfer
The information transfer tree, denoted T, is a balanced binary tree over n leaves. To avoid technicalities we assume that n is a power of two. The leaves of T, from left to right, represent the arrivals t from 0 to n − 1. For a node v of T, we let ℓv denote the number of leaves in the subtree rooted at v. An internal node v is associated with three arrivals, t0, t1 and t2. Here t0 is the arrival represented by the leftmost leaf in the subtree rooted at v, similarly t2 = t0 + ℓv − 1 is the rightmost such leaf and t1 = t0 + ℓv/2 − 1 is in the middle. That is, the intervals [t0, t1] and [t1 + 1, t2] span the left and right subtrees of v, respectively. For example, in Figure 1, the node labelled v is associated with the intervals [16, 23] and [24, 31].
Figure 1: An information transfer tree T with n = 32 leaves. For the node labelled v, the arrival times are t0 = 16, t1 = 23 and t2 = 31.
We define the subarray Uv = U[t0, t1] to represent the ℓv/2 values arriving in the stream during the arrival interval [t0, t1], and we define the subarray Av = A[t1 + 1, t2] to represent the ℓv/2 output symbols produced during the arrival interval [t1 + 1, t2].
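For illustration, a small Python sketch (ours) that enumerates, for every internal node of T, the three arrivals t0, t1, t2 and hence the intervals covered by Uv and Av:

```python
def internal_nodes(n):
    """Yield (t0, t1, t2) for every internal node of the information
    transfer tree T over n leaves (n a power of two): the node spans
    arrivals [t0, t2], with Uv = U[t0, t1] and Av = A[t1 + 1, t2]."""
    size = 2                          # l_v, the number of leaves under the node
    while size <= n:
        for t0 in range(0, n, size):
            yield (t0, t0 + size // 2 - 1, t0 + size - 1)
        size *= 2

# With n = 32 this yields, among others, (16, 23, 31) -- the node v of Figure 1.
```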
We define Ũv to be the concatenation of U[0, t0 − 1] and U[t2 + 1, n − 1]. That is, Ũv contains all symbols of U except for those in Uv. When Ũv is fixed to some constant ũv and Uv is random, we write H(Av | Ũv = ũv) to denote the conditional entropy of Av under the fixed Ũv.
We define the information transfer of a node v of T, denoted Iv, to be the set of memory cells c such that c is probed during the interval [t0, t1] and also probed in [t1 + 1, t2]. The cells in the information transfer Iv therefore contain all the information about the values in Uv that the algorithm uses in order to correctly produce the output symbols Av.
By adding up the sizes of the information transfers Iv over the internal nodes v of T, we get a lower bound on the number of cell probes, that is, a lower bound on the total running time of the algorithm. To see this it is important to observe that a particular cell probe is counted only once. Suppose that the cell c ∈ Iv for some node v. Let p be the first probe of c in the arrival interval [t1 + 1, t2]. By including the cell c ∈ Iv in the cell probe count we are in fact counting the probe p. Now observe that p cannot be counted in the information transfer Iv′ of any node v′ that is a proper descendant or ancestor of v.
Since the concept of the size of the information transfer is central to the lower bound proofs, we define as a shorthand Iv = |Iv| to denote the size of the information transfer.
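To make the definition concrete, the following Python sketch (ours) computes Iv from a hypothetical trace that records which cells an algorithm probes at each arrival:

```python
def information_transfer(probes, t0, t1, t2):
    """probes[t] is the set of cell addresses probed at arrival t.  Returns
    I_v: the cells probed during [t0, t1] and probed again in [t1+1, t2]."""
    first = set().union(*(probes[t] for t in range(t0, t1 + 1)))
    second = set().union(*(probes[t] for t in range(t1 + 1, t2 + 1)))
    return first & second
```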
Definition 2.1 (Large expected information transfer). A node v of T has large information transfer if

E[Iv] ≥ (k · δ · ℓv)/w ,

where k is a constant that depends on the problem and input distribution.
Our aim is to show that a substantial proportion of the nodes of T have large information transfer. As we will see in Section 3, this will be achieved by relating the size of the information transfer Iv to the entropy of the output symbols Av.
3 Overall proofs of the lower bounds
In this section we give the overall proofs for our lower bound results. Consider any node v in T. Suppose that Ũv is fixed arbitrarily but the symbols in Uv are randomly drawn in accordance with the distribution on U, conditioned on the fixed value of Ũv. This induces a distribution on the output symbols Av. If the entropy of Av is large, conditioned on the fixed Ũv, then any algorithm must probe many cells in order to correctly produce the output symbols Av, as it is only through the information transfer Iv that the algorithm can know anything about Uv. We will begin in Section 3.1 by making this claim precise by giving a problem-independent upper bound on H(Av | Ũv = ũv) in terms of Iv. We will then give lower bounds on H(Av | Ũv = ũv) in Section 3.2 for each problem we consider. The proofs of these entropy lower bounds form the heart of our contributions and are deferred to Sections 4 onwards. In Section 3.3, we combine the upper and lower bounds on the entropy to show that many nodes of T have large information transfer. Finally, we calculate our final lower bounds in Section 3.4 by summing over all nodes with large information transfer as discussed in Section 2.3 above.
3.1 An upper bound on the entropy
Towards showing that high conditional entropy H(Av | Ũv = ũv) implies large information transfer, we use the information transfer Iv to describe an encoding of the output symbols Av. The following lemma gives a direct relationship between the size of the information transfer Iv and the entropy. The lemma was originally stated in [34] but for completeness we restate it here in our notation and provide a full proof.

Lemma 3.1 (Pǎtraşcu and Demaine [34]). Under the assumption that the address of any cell can be specified in w bits, for any node v of the information transfer tree T, the entropy

H(Av | Ũv = ũv) ≤ w + 2w · E[Iv | Ũv = ũv] .
Proof. The expected length of any encoding of Av, conditioned on Ũv, is an upper bound on the conditional entropy of Av. We use the information transfer Iv as an encoding in the following way. For every cell c ∈ Iv we store the address of c, which takes at most w bits under the assumption that a cell can hold the address of any cell in memory. We also store the contents of c, which takes w bits. In total this requires 2w · Iv bits. We will use the algorithm, which is fixed, and the fixed values of Ũv as part of the decoder to obtain Av from the encoding. Since the encoding is of variable length we also store the size of the information transfer, which requires at most w additional bits.
In order to prove that the described encoding of Av is valid, we now describe how to decode it. First we simulate the algorithm on the fixed input Ũv from the first arrival of U[0] until just before the first value in Uv arrives. We then skip over all input symbols in Uv and resume simulating the algorithm from the beginning of the interval where Av is computed until the last value in Av has been obtained. For every cell being read, we check whether it is contained in the information transfer Iv by looking up its address in the encoding. If it is, its contents are fetched from the encoding. If not, its contents are available from simulating the algorithm on the fixed input symbols. Observe that it suffices to store a cell in the information transfer only the first time it is probed, as the decoder remembers every cell it has already accessed.
3.2 Lower bounds on the entropy
Lemma 3.1 above provides a direct way to obtain a lower bound on the expected size of the information transfer given a lower bound on the conditional entropy H(Av | Ũv = ũv). To show that a node has large information transfer we introduce the following definition.
Definition 3.2 (High-entropy node). A node v in T is a high-entropy node if there is a positive constant k such that for any fixed ũv,

H(Av | Ũv = ũv) ≥ k · δ · ℓv .
To put this bound in perspective, note that the maximum conditional entropy of Av is bounded by the entropy of Uv, which is at most δ · (ℓv/2) and obtained when the values of Uv are independent and uniformly drawn from [q]. Thus, the conditional entropy associated with a high-entropy node is the highest possible up to some constant factor. Establishing high-entropy nodes is the main contribution of this paper and the results are given in the following lemmas.
Lemma 3.3. For the convolution problem, suppose that U is chosen uniformly at random from [q]ⁿ, where q is a prime. For any v ∈ T, at least a (1 − 1/q)-fraction of all F ∈ [q]ⁿ have the property that v is a high-entropy node.
The proof of the above lemma is given in Section 4 and relies on properties of Toeplitz matrices over a finite field of q elements. The proof does not give explicit descriptions of fixed arrays F for which nodes are high-entropy nodes. In the proof of the next lemma, however, we show that there exists a particular array F for which high-entropy nodes are obtained. This F is a 0/1-array and is easy to describe: zeroes everywhere except at power-of-two positions from the right-hand end. The proof is given in Section 4.
Lemma 3.4. For the convolution problem there exists a fixed array F ∈ [q]ⁿ such that when U is chosen uniformly at random from [q]ⁿ, all v ∈ T are high-entropy nodes.
Before we give the lemmas concerning online multiplication, recall that in this problem there is a
fixed operand F multiplied with an operand U for which digits arrive one at a time.
Lemma 3.5. For the online multiplication problem, suppose that the operand U is chosen uniformly at random from [qⁿ]. For any v ∈ T, at least half of all operands F ∈ [qⁿ] have the property that v is a high-entropy node.
The proof of Lemma 3.5 is given in Section 5. Similarly to the convolution problem we also give an
explicit description of a number F for which high-entropy nodes are obtained. This number resembles
the fixed array that we described above for the convolution problem. The proof of the next lemma is also
given in Section 5.
Lemma 3.6. For the online multiplication problem there exists a fixed operand F ∈ [qⁿ] such that when U is chosen uniformly at random from [qⁿ], all v ∈ T are high-entropy nodes.
Finally, for the Hamming distance problem we show that there exists an F and distribution for U such
that sufficiently many nodes are high-entropy nodes. The proof of the next lemma is rather involved and
is given over Sections 6 to 8.
Lemma 3.7. For the Hamming distance problem there exists a hard distribution with a fixed F and random U such that any node v ∈ T for which ℓv ≥ h · √n is a high-entropy node, where h is a positive constant.
In the proof of Lemma 3.7 we demonstrate that there exists a very specific set of strings such that when F is drawn randomly from this set, there is a non-zero probability of picking an F for which many nodes are high-entropy nodes. Unlike the convolution and multiplication problems, the distribution for U is not uniform over all strings in [q]ⁿ.
3.3 Lower bounds on the information transfer
In the previous section we gave a series of lemmas saying that for all three problems we consider, there
are instances for which many nodes of T are high-entropy nodes. In this section we combine these results
with the entropy upper bound of Lemma 3.1 to show that many nodes have large information transfer.
The following lemmas match the lemmas of the previous section. We start with the convolution problem.
Lemma 3.8. For the convolution problem where both F and U are chosen uniformly at random from [q]ⁿ, and q is a prime, every v ∈ T has large information transfer.
Proof. By combining Lemmas 3.1 and 3.3 we have that for any v ∈ T under fixed Ũv, at least half of all F ∈ [q]ⁿ imply that v is a high-entropy node, that is,

k · δ · ℓv ≤ w + 2w · E[Iv | Ũv = ũv] ,

where k is the constant from Definition 3.2 of a high-entropy node. Rearranging terms gives

E[Iv | Ũv = ũv] ≥ (δ · ℓv)/(2k · w) − 1/2 .

We remove the conditioning by taking the expectation over Ũv under a random U. When F is chosen uniformly at random from [q]ⁿ we therefore have

E[Iv] ≥ (δ · ℓv)/(4k · w) − 1/4 ,

hence v has large information transfer.
Similarly to Lemma 3.8, we combine Lemmas 3.1 and 3.4 to obtain the following property for the
case where F is a fixed string and not randomly chosen.
Lemma 3.9. For the convolution problem there exists a hard distribution where F is fixed and U is chosen uniformly at random from [q]ⁿ, such that every v ∈ T has large information transfer.
Proof. Similarly to the proof of Lemma 3.8 we combine Lemmas 3.1 and 3.4 to obtain, for all v ∈ T under fixed Ũv,

E[Iv | Ũv = ũv] ≥ (δ · ℓv)/(2k · w) − 1/2 ,

where k is the constant from Definition 3.2 of a high-entropy node. The conditioning is removed by taking the expectation over Ũv under a random U.
The proofs of the following two lemmas, in which we establish large information transfer for the multiplication problem, are similar to the proofs of the previous two lemmas, except that here we combine Lemma 3.1 with Lemmas 3.5 and 3.6, respectively.
Lemma 3.10. For the online multiplication problem where both operands are chosen uniformly at random from [qⁿ], every v ∈ T has large information transfer.
Lemma 3.11. For the online multiplication problem there exists a fixed operand in [qⁿ] such that when the other operand is chosen uniformly at random from [qⁿ], every v ∈ T has large information transfer.
Finally, large information transfer is also established for the Hamming distance problem. The proof of the next lemma is identical to the proof of Lemma 3.9, except that we combine Lemma 3.1 with Lemma 3.7 instead, and restrict the nodes v to those for which ℓv is greater than a constant times √n.
Lemma 3.12. There exists a hard distribution for the Hamming distance problem such that every v ∈ T for which ℓv ≥ h · √n has large information transfer, where h is a positive constant.
3.4 Obtaining the cell-probe lower bounds
Now that we have established large information transfer for sufficiently many nodes of T we are ready to
prove the lower bounds of Theorems 1.2, 1.4 and 1.6.
For both the convolution and multiplication problems, large information transfer has been established for every node v of T, whereas for the Hamming distance problem, large information transfer has only been established where ℓv ≥ h · √n, where h is a positive constant. In order to unify the presentation of the proofs we restrict the summation of Iv to nodes for which ℓv ≥ h · √n. Let V denote this set of nodes. We have

E[∑_{v∈T} Iv] ≥ E[∑_{v∈V} Iv] = ∑_{v∈V} E[Iv] ≥ ∑_{v∈V} (k · δ · ℓv)/w = (k′ · δ · n · log n)/w ,   (3.1)

where k is the constant from Definition 2.1 of large information transfer and k′ is a new suitable constant. The first equality follows by linearity of expectation and the second inequality follows by Lemmas 3.8 to 3.12, respectively. The last equality follows from the fact that

∑_{v∈T, ℓv ≥ h·√n} ℓv ∈ Θ(n log n) .
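The last fact can be checked level by level in T; a short derivation (ours) in LaTeX notation:

```latex
% Nodes at height j have \ell_v = 2^j and there are n/2^j of them, so each
% level of T contributes exactly n to the sum.  The levels surviving the
% restriction \ell_v \ge h\sqrt{n} are those with j \ge \log_2(h\sqrt{n}),
% of which there are (1/2)\log_2 n - O(1).
\sum_{\substack{v \in T \\ \ell_v \ge h\sqrt{n}}} \ell_v
  = \sum_{j=\lceil \log_2(h\sqrt{n}) \rceil}^{\log_2 n} \frac{n}{2^j} \cdot 2^j
  = n \cdot \Bigl( \tfrac{1}{2}\log_2 n - O(1) \Bigr)
  \in \Theta(n \log n) .
```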
Since the running time is bounded by the number of cell probes we have from Equation (3.1) that
the expected running time for any deterministic algorithm solving the convolution, multiplication or
Hamming distance problem, respectively, on n random input symbols is
Ω((δ · n · log n)/w) .
By Yao’s minimax principle, as discussed in Section 2, this implies that any randomised algorithm on its
worst case input has the same lower bound on its expected running time. The amortised time per arriving
value is obtained by dividing the running time by n. This concludes the proofs of Theorems 1.2, 1.4
and 1.6.
4 Hard distributions for the convolution problem
In this section we prove Lemmas 3.3 and 3.4, that is, we show that there are instances of the convolution problem such that the conditional entropy of the output symbols Av is large, where all input symbols but Uv are fixed.
We begin by proving Lemma 3.3 because the proof is straightforward and the description of the hard
distribution is simple: pick the input symbols U uniformly at random from [q]ⁿ. As to the choice of F we
only argue that a large fraction of all length-n arrays have the desired entropy lower bound. In Section 4.2
we will specify a particular F with this property, which will lead to a proof of Lemma 3.4.
4.1 Entropy lower bound over all arrays F
We now prove Lemma 3.3. Let v be any internal node of T and let tv ∈ [n] denote the arrival time of Uv[0]. Let ℓ = ℓv/2. For i ∈ [ℓ], the i-th output in Av can be broken into two sums Ai and Ãi, such that Av[i] = Ai + Ãi, where

Ai = ∑_{j∈[ℓ]} F[n − 1 − (ℓ + i) + j] · Uv[j]

is the contribution from the alignment of F with Uv, and Ãi is the contribution from the alignments that do not include Uv. Hence Ãi is constant under fixed Ũv. We define M_{F,ℓ} to be the ℓ × ℓ matrix with entries M_{F,ℓ}(i, j) = F[n − 1 − (ℓ + i) + j]. That is,
M_{F,ℓ} =
⎡ F[n−ℓ−1]   F[n−ℓ]      F[n−ℓ+1]   ···   F[n−2]    ⎤
⎢ F[n−ℓ−2]   F[n−ℓ−1]    F[n−ℓ]     ···   F[n−3]    ⎥
⎢ F[n−ℓ−3]   F[n−ℓ−2]    F[n−ℓ−1]   ···   F[n−4]    ⎥
⎢    ⋮           ⋮           ⋮       ⋱       ⋮      ⎥
⎣ F[n−2ℓ]    F[n−2ℓ+1]   F[n−2ℓ+2]  ···   F[n−ℓ−1]  ⎦
Observe that M_{F,ℓ} is a Toeplitz matrix (or “upside down” Hankel matrix) since it is constant on each descending diagonal from left to right. It follows that

M_{F,ℓ} × (Uv[0], Uv[1], . . . , Uv[ℓ−1])ᵀ = (A0, A1, . . . , A_{ℓ−1})ᵀ ,   (4.1)
which describes a system of linear equations. Since output symbols are given modulo q, where q is assumed to be a prime, we operate in the finite field Z/qZ. It has been shown in [20] that for any ℓ, out of all the ℓ × ℓ Toeplitz matrices over a finite field of q elements, a fraction of exactly (1 − 1/q) is non-singular. This fact was actually already established in [11] almost 40 years earlier but incidentally reproved in [20]. Thus, a (1 − 1/q)-fraction of all F has the property that all the ℓ input symbols in Uv can be uniquely determined from the output symbols in Av. Since the induced distribution for Uv under any fixed Ũv is the uniform distribution on [q]^ℓ, the conditional entropy

H(Av | Ũv = ũv) = ℓ · log₂ q ≥ (δ · ℓv)/2 ,

where δ = ⌈log₂ q⌉. This concludes the proof of Lemma 3.3.
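The (1 − 1/q) fraction is easy to check empirically. The following Python sketch (ours) samples random ℓ × ℓ Toeplitz matrices over Z/qZ and estimates the non-singular fraction by Gaussian elimination:

```python
import random

def nonsingular_mod_q(M, q):
    """Test invertibility over the field Z/qZ (q prime) by elimination."""
    M = [row[:] for row in M]
    n = len(M)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] % q != 0), None)
        if pivot is None:
            return False
        M[col], M[pivot] = M[pivot], M[col]
        inv = pow(M[col][col], q - 2, q)      # inverse by Fermat's little theorem
        for r in range(col + 1, n):
            f = M[r][col] * inv % q
            for c in range(col, n):
                M[r][c] = (M[r][c] - f * M[col][c]) % q
    return True

def toeplitz_nonsingular_fraction(ell, q, trials=1000):
    """Estimate the fraction of nonsingular ell x ell Toeplitz matrices mod q;
    by [11, 20] the true value is exactly 1 - 1/q."""
    hits = 0
    for _ in range(trials):
        d = [random.randrange(q) for _ in range(2 * ell - 1)]   # the diagonals
        M = [[d[ell - 1 - i + j] for j in range(ell)] for i in range(ell)]
        hits += nonsingular_mod_q(M, q)
    return hits / trials
```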
Figure 2: Three alignments of F = Kn and the stream S: (1) the last value of Uv has just arrived, (2) half of the output symbols in Av have been outputted, and (3) all output symbols in Av have been outputted.
4.2 Entropy lower bound with a fixed array F
We now prove Lemma 3.4 by demonstrating that it is possible to design a fixed array F such that for all
nodes v ∈ T, a large portion of the values in Uv can be uniquely determined from the output symbols Av .
Since U is drawn uniformly at random from [q]ⁿ, this implies large entropy of the output symbols Av.
The fixed array F that we consider consists of stretches of 0s interspersed with 1s. The distance between two successive 1s is an increasing power of two, ensuring that for half of the alignments of F and S in the
arrival interval where Av is computed, all but exactly one element of Uv are simultaneously aligned with a
0 in F, hence not contributing to the outputted inner product of F and S. We define Kn ∈ [2]ⁿ such that

Kn[0], Kn[1], . . . , Kn[n − 1] = . . . 000000000100000000000000010000000100010110 ,
where commas between elements on the right hand side have been omitted, or formally,
Kn[i] =  1, if n − 1 − i is a power of two;
         0, otherwise.
The hard distribution for Lemma 3.4 is F = Kn with the input symbols U drawn uniformly at random from [q]ⁿ.
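A direct Python sketch (ours) of the construction of Kn:

```python
def K(n):
    """K_n[i] = 1 if n - 1 - i is a power of two, and 0 otherwise."""
    is_pow2 = lambda x: x > 0 and x & (x - 1) == 0
    return [1 if is_pow2(n - 1 - i) else 0 for i in range(n)]

# ''.join(map(str, K(16))) == '0000000100010110'
```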
Let v be any node of T and consider Figure 2, which illustrates three alignments of F and S, denoted (1), (2) and (3), respectively. At alignment (1), the last value of Uv has just arrived in the stream. At alignment (2), half of the output symbols in Av have been outputted. At alignment (3), all output symbols in Av have been outputted. The key observation is that between alignments (2) and (3), exactly one input symbol x of Uv is aligned with a 1 in F, hence x can be uniquely determined from the corresponding output symbol. Thus, over all output symbols Av, a total of ℓv/4 values of Uv can be determined, implying that the entropy of Av must be at least δ · ℓv/4, where δ = ⌈log₂ q⌉. We now formalise this reasoning.
Using the definition of ℓ = ℓv/2 and the matrix M_{F,ℓ} above, recall that entry M_{F,ℓ}(i, j) = F[n − 1 − (ℓ + i) + j]. Thus, M_{F,ℓ}(i, j) = 1 if and only if

n − 1 − (n − 1 − (ℓ + i) + j) = ℓ + i − j

is a power of two. Since ℓ is a power of two it follows that for row i ∈ {ℓ/2, . . . , ℓ − 1} there can be at most one entry with the value 1. More precisely,

M_{F,ℓ}(i, j) =  1, if j = i;
                 0, otherwise.
Figure 3: An illustration of A = U × F. Digits of U arrive one at a time, where U[0] is the low-order digit that arrives first.
From the system of linear equations in Equation (4.1) it follows that for i ∈ {ℓ/2, . . . , ℓ − 1}, Ai = Uv[i]. Since the induced distribution for Uv under any fixed Ũv is the uniform distribution on [q]^ℓ, the conditional entropy

H(Av | Ũv = ũv) = (ℓ · log₂ q)/2 ≥ (δ · ℓv)/4 ,

where δ = ⌈log₂ q⌉. This concludes the proof of Lemma 3.4.
5 Hard distributions for the multiplication problem
In this section we prove Lemmas 3.5 and 3.6, that is, we show that there are instances of the online multiplication problem such that the conditional entropy of the output symbols Av is large, where all
input symbols but Uv are fixed. For the purposes of proving a lower bound we assume that all digits of
the operand F are available at any time whereas the digits of the operand U arrive one at a time. Figure 3
illustrates U × F, where U[0] and F[0] are the least significant digits and the product A is capped at n
digits.
The following property of multiplying binary numbers was established by Paterson, Fischer and
Meyer [31]. The lemma is stated in our notation, but the translation from the original notation of [31] is
straightforward.
Lemma 5.1 (Corollary of Lemma 5 in [31]). Suppose q = 2. Let v be any node of T and fix the digits of Ũv arbitrarily. At least half of all F[0, ℓv − 1] ∈ [q]^{ℓv} (the first ℓv digits of F) have the property that any value of Av can arise from at most four distinct Uv.
Although Lemma 5.1 applies only to binary numbers, it naturally scales to any q that is a power of
two. To see this, observe that the property holds for any v, and a sequence of digits in base q is after all
just a bit sequence.
Corollary 5.2. Lemma 5.1 holds for any q that is a power of two.
We use the above corollary to prove Lemma 3.5. Let v be any node of T. At least half of all F ∈ [qⁿ] have the property that Uv can be determined up to a set of four possible values given the output symbols in Av. Since the induced distribution for Uv under any fixed Ũv is the uniform distribution on [q]^ℓ (the digits of Uv), the conditional entropy

H(Av | Ũv = ũv) ≥ log₂(q^{ℓv/2}/4) ≥ (δ · ℓv)/2 − 2 ,

where δ = log₂ q. This concludes the proof of Lemma 3.5.
In order to prove Lemma 3.6 we specify a fixed F which, together with the uniform distribution for U, gives the desired entropy lower bound. Similarly to the array Kn from Section 4.2 we define K_{q,n} to be the largest number in [qⁿ] such that the i-th bit in the binary expansion of K_{q,n} is 1 if and only if i is a power of two (starting with i = 0 at the lower-order end). Thus, the binary expansion of K_{q,n} is the reverse of K_{n log₂ q}. For example, suppose that q = 16 (i. e., hex) and n = 8. Then K_{16,8} = 10116 in base 16, or 65,814 in decimal, since the binary expansion of K_{16,8} is
0000 0000 0000 0001 0000 0001 0001 0110 ,

where each group of four bits corresponds to one hexadecimal digit: 0, 0, 0, 1, 0, 1, 1, 6.
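A short Python sketch (ours) of this construction, matching the worked example above:

```python
def K_qn(q, n):
    """K_{q,n}: the number in [q**n] whose binary expansion has a 1 at bit i
    exactly when i is a power of two (bit 0 is the low-order end); q must
    be a power of two."""
    bits = n * (q.bit_length() - 1)   # total number of bits, n * log2(q)
    value, i = 0, 1
    while i < bits:
        value |= 1 << i
        i *= 2
    return value

assert K_qn(16, 8) == 65814           # = 0x10116, as in the example above
```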
Paterson, Fischer and Meyer [31] also studied the multiplication of binary numbers where one operand
is fixed. The following property was given in [31], here translated into our notation.
Lemma 5.3 (Lemma 1 of [31]). Suppose q = 2 and F = K_{q,n}. Let v be any node of T and fix the digits of Ũv arbitrarily. Any value of Av can arise from at most two distinct Uv.
Similarly to Lemma 5.1 and from our definition of Kq,n , the above lemma scales to any q that is a
power of two.
Corollary 5.4. Lemma 5.3 holds for any q that is a power of two.
We use the above corollary to prove Lemma 3.6, where F = K_{q,n}. Let v be any node of T. By Corollary 5.4, the value of Uv can be determined up to a set of two possible values given the output symbols in Av. Since the induced distribution for Uv under any fixed Ũv is the uniform distribution on [q]^ℓ (the digits of Uv), the conditional entropy

H(Av | Ũv = ũv) ≥ log₂(q^{ℓv/2}/2) ≥ (δ · ℓv)/2 − 1 ,

where δ = log₂ q. This concludes the proof of Lemma 3.6.
6 Hard distribution for the Hamming distance problem
In this section we prove Lemma 3.7, that is, we show that there are instances of the Hamming distance problem such that the conditional entropy of the output symbols Av is large, where all input symbols but Uv are fixed. We will show this property for nodes in the upper part of the tree T, namely nodes v such that the number of leaves ℓv is greater than some constant times √n.
Unlike the hard distributions we gave for the convolution and multiplication problems, we will not
give an explicit description of the array F for which the Hamming distance lower bound holds. We
only show the existence of such an F. Further, for both the convolution and multiplication problems we
showed that the lower bound was obtained for a majority of all F, where U was chosen uniformly at random from [q]ⁿ. For the Hamming distance problem we will instead show that there exists an F and some particular subset of [q]ⁿ such that when U is drawn uniformly at random from this subset, we obtain the desired lower bound.
6.1 Terminology, choice of q and rounding issues
We will refer to the input arrays, including F and U, as strings, and the set [q] as the alphabet. The values
of the alphabet are referred to as symbols.
Unlike the convolution and multiplication problems, for the Hamming distance problem there is no
benefit in having an alphabet of size greater than n, the length of F. Our hard distribution is constructed
such that with an alphabet of size q, n has to be at least q³. From now on we assume that n ≥ q³. Observe that whenever n is polynomial in q, the number of bits needed to represent a symbol is δ ∈ Θ(log n).
We will often treat various roots of integers as integers. For example, we may say that some string of length q^{3/2} is the concatenation of q smaller strings, each of length q^{1/2}. This is of course only possible
whenever these numbers are integers, which is not the case for arbitrary q. One could overcome this
problem by adjusting the values with appropriate floors and ceilings, as well as introducing padding
symbols where necessary, but this would without doubt clutter the presentation. We have decided to keep
it simple by treating any root of any integer as an integer, and assuming that everything adds up nicely.
This is only to keep the presentation clean and it should be obvious from the context that this has no
impact on the asymptotic behaviour.
6.2 The overall structure of the fixed string F
Recall the definition of the array Kn ∈ {0, 1}ⁿ from Section 4.2, which consists of 0s everywhere except at power-of-two positions from the right-hand end. A hard distribution for the convolution problem was given by setting F to Kn and choosing U uniformly at random from [q]ⁿ. Recall Figure 2 which illustrates why we chose this hard distribution: for each output symbol in the second half of Av, that is between the alignments marked (2) and (3) in the figure, exactly one input symbol of Uv is aligned with a 1 in F and all other input symbols of Uv are aligned with 0. Thus, the second half of Uv can be uniquely determined from the output symbols Av.
To show a lower bound for the Hamming distance problem we devise a string F that resembles Kn .
First we introduce an auxiliary string R of length Θ(q^{3/2}). We will use r = (q/2)^{3/2} as a shorthand for the length of R. We will give the details of R later but will highlight an important property of it below. We
obtain F from Kn by first replacing each 0 by a symbol that we denote ?. The symbol ? will never occur
in the stream, hence will always generate a mismatch. We then replace every length-r substring starting
at a 1 with a copy of R. Any 1 that is closer than r positions from the right-hand end of F is replaced by a
?-symbol instead. Figure 4 illustrates F.
Figure 4: The string F has a copy of R starting at each position n − 1 − i where i ≥ |R| is a power of two.
All other positions have the symbol ? which only occurs in F and not in the stream.
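The construction of F just described is mechanical, so we include a small illustrative sketch. This is our illustration, not part of the formal development; F and R are represented as lists of symbols, and '?' stands for the mismatch symbol:

```python
def build_F(n, R):
    # Our sketch of the construction of F in Section 6.2.
    # F starts as K_n with every 0 replaced by the mismatch symbol '?';
    # a copy of R is then written at each 1 that is at least |R| = r
    # positions from the right-hand end, and the remaining 1s become '?'.
    r = len(R)
    F = ['?'] * n
    i = 1
    while i < n:
        if i >= r:
            pos = n - 1 - i      # power-of-two position from the right
            F[pos:pos + r] = R   # copies of R never overlap since i >= r
        i *= 2
    return F
```

Note that a copy written at position n − 1 − i ends at position n − 1 − (i − r) ≤ n − 1 exactly when i ≥ r, which is the condition in the construction.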
Figure 5: HamArray(R, U′) contains the Hamming distances between R and every length-r substring of
U′ as R slides along U′.
6.3 Properties of the string R and Hamming arrays
The string R will play the same role as the value 1 in K_n did for the convolution problem, namely it
will allow us to uniquely determine symbols from U. To see how, we first introduce the notion of a
Hamming array, illustrated in Figure 5. For a string U′ of length 2r, we write HamArray(R, U′) to denote
the length-(r + 1) array such that for i ∈ [r + 1], HamArray(R, U′)[i] is the Hamming distance between
R and U′[i, i + r − 1]. That is, HamArray(R, U′) contains the Hamming distances between R and every
length-r substring of U′.
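A naive computation of a Hamming array pins down the indexing. The following is our sketch (0-based indexing, quadratic time); it is only meant to make the definition concrete:

```python
def ham(a, b):
    # Hamming distance between two equal-length sequences.
    return sum(x != y for x, y in zip(a, b))

def ham_array(R, U):
    # HamArray(R, U): the Hamming distance between R and every
    # length-r substring of U, where len(U) == 2 * len(R) == 2 * r.
    r = len(R)
    assert len(U) == 2 * r
    return [ham(R, U[i:i + r]) for i in range(r + 1)]
```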
To see the resemblance with a 1 in K_n, we give the following lemma. The proof is non-trivial and
deferred to Section 7.3. A high-level explanation of the lemma is given immediately after its statement.

Lemma 6.1. There exists a constant k > 0 such that for any r there is a length-r string R ∈ [q]^r, where
q = 2r^{2/3}, such that

|{ HamArray(R, U′) | U′ ∈ [q]^{2r} }| ≥ q^{kr}.
The lemma says that there is a string R such that over all possible U′ of length 2|R|, one can obtain
q^{Θ(r)} distinct Hamming arrays. Since there are only q^{2r} possible values of U′, this means that a non-
negligible fraction of all U′ can be put in one-to-one correspondence with Hamming arrays. Thus, as
symbols in U_v slide past an R in a similar fashion to symbols in U_v sliding past a 1 in K_n in the hard
distribution for the convolution problem, we can infer a substantial portion of the symbols of U_v from the
output symbols A_v, hence obtain large entropy. We formalise this in the next section and explain how the
lower bound is obtained.
6.4 The hard distribution and obtaining the lower bound
Relying on Lemma 6.1 above we will now describe a hard distribution for the Hamming distance problem
and use it to prove Lemma 3.7, which is the purpose of Section 6.
Figure 6: Three alignments of F and the stream S: (1) the last value of U_v has just arrived, (2) half of the
outputs in A_v have been output, and (3) all outputs in A_v have been output. The string U_v is here the
concatenation of U′_1, . . . , U′_m ∈ U_R, where m = ℓ_v/(4r).
Given a string R ∈ [q]^r, we let U_R ⊆ [q]^{2r} be any largest set of length-(2r) strings such that for any
two distinct strings U′_1, U′_2 ∈ U_R,

HamArray(R, U′_1) ≠ HamArray(R, U′_2).

To uniquely specify a string in U_R we need log₂|U_R| bits. By Lemma 6.1 we have that there exists an R
such that log₂|U_R| ∈ Θ(r log q).
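For intuition, U_R can be computed by brute force on toy instances by keeping one representative string per distinct Hamming array. This is our sketch, reusing the ham_array helper above; it is exponential in r and only usable for tiny q and r:

```python
from itertools import product

def representatives(R, q):
    # One representative U' per distinct HamArray(R, U'); the number
    # of representatives returned equals |U_R|. Toy use only.
    r = len(R)
    reps = {}
    for U in product(range(q), repeat=2 * r):
        reps.setdefault(tuple(ham_array(R, U)), U)
    return list(reps.values())
```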
For the hard distribution we use F from above with an R that has the properties of Lemma 6.1. The
input U is given by concatenating n/(2r) strings drawn independently and uniformly at random from U_R.
Similarly to Figure 2 we can now illustrate how strings from U_R slide past R during the second
half of the output symbols in A_v, where v is any node of T such that ℓ_v ≥ h·√n. Here h is the positive
constant from Lemma 3.7. We defer picking the value of h until later. For any such node, we have that
ℓ_v > h·r because we assumed that n ≥ q^3 and the definition of r implies that q^3 > r^2. In Figure 6 we
have illustrated U_v as the concatenation of random strings U′_1, . . . , U′_m drawn from U_R, where m = ℓ_v/(4r).
Between alignments (2) and (3) in the figure, the second half of the substrings U′_i of U_v slide in turn past R,
and from the outputs in A_v we can infer HamArray(R, U′_i) for each such U′_i. By construction of U_R this
allows us to uniquely determine the strings U′_i. Thus, over all outputs A_v, a total of approximately m/2
substrings U′_i of U_v can be completely determined. We pick h above to be sufficiently large so that, even
compensating for border cases, the number of substrings U′_i of U_v that can be determined is always at least
one. This implies that the entropy of A_v must, by Lemma 6.1, be at least Θ((m/2) · r log q) = Θ(ℓ_v · δ),
where δ = ⌈log₂ q⌉. This concludes the proof of Lemma 3.7.
7 A string with many different Hamming arrays
In this section we prove Lemma 6.1, that is, we show that there exists a string R which gives many different
Hamming arrays. This is arguably the most technically detailed part of our lower bound proofs. To recap,
we claim that for any r there exists a string R ∈ [q]^r with q = 2r^{2/3} which permits at least q^{kr} distinct
Hamming arrays over all strings in [q]^{2r}, where k is a constant. Next we describe the
overall structure of an R with this property.
Figure 7: An example of the string R of length r = µ^3, which is the concatenation of the µ^2 strings
ρ_0, . . . , ρ_{µ^2−1}, where ρ_i ∈ {?, i}^µ.
Figure 8: Setting symbols of U′ yields a large set of possible Hamming distance outputs.
7.1 The structure of R
To shorten notation it will be convenient to introduce the variable µ as a shorthand for r^{1/3}. Hence R has
length r = µ^3 and q = 2µ^2. The string R is constructed by concatenating µ^2 substrings, each of length µ.
For i ∈ [µ^2] we let ρ_i denote the i-th substring of R, that is

R = ρ_0 ρ_1 · · · ρ_{µ^2−1}.

Each substring ρ_i can only contain symbols from the set {?, i}, where ? is the special symbol that will not
occur in the stream. Therefore, the total number of distinct symbols in R is at most µ^2 + 1 ≤ q as claimed.
Figure 7 illustrates an example of R.
The purpose of the substrings ρ_i is to support a reduction from vector addition to Hamming arrays
that we explain next.
7.2 Vector sums and Hamming arrays
Before we describe the full reduction in the following section we begin by giving some high-level
intuition. We first introduce some notation. Let V = {v_0, v_1, v_2, . . . , v_{µ^2−1}} be a multiset of vectors where
v_i ∈ {0, 1}^µ. We can define a correspondence between the length-µ substring ρ_i of R and the i-th vector
v_i as follows. The j-th symbol of ρ_i equals i if the j-th component of v_i is 1 and ? otherwise. For example,
ρ_2 = 2??22 from Figure 7 corresponds to the vector v_2 = (1, 0, 0, 1, 1).
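This correspondence is mechanical, as the following sketch shows. It is our illustration only; STAR is a placeholder for the symbol ?:

```python
STAR = -1  # placeholder for '?', which never occurs in the stream

def build_R_from_vectors(vectors):
    # The j-th symbol of rho_i is i where vectors[i][j] == 1, else '?'.
    R = []
    for i, v in enumerate(vectors):
        R.extend(i if bit else STAR for bit in v)
    return R

# Example: the vector (1, 0, 0, 1, 1) in slot 2 yields rho_2 = 2??22.
```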
We will now see that this correspondence between vectors in V and substrings of R can be used to
encode elementwise sums of vectors in V in the Hamming array for R. Consider the string U′ ∈ [q]^{2r}
in Figure 8 as an illustrative example. Here the string U′ contains the other special symbol that we
introduce, denoted ⋄. This symbol does not occur in R, hence will always mismatch. In the figure we see
that all positions of U′ have the symbol ⋄, except for three positions where the symbols are 0, 5 and 7,
respectively. The positions holding these symbols are chosen such that in the first alignment between R
and U′, marked (1), the symbols 0, 5 and 7 sit immediately after ρ_0, ρ_5 and ρ_7 in R, respectively. As R
slides µ steps to the right towards the alignment marked (2), the symbols 0, 5 and 7 of U′ will generate
matches whenever they are aligned with their corresponding symbols in R. Thus, for i ∈ {1, . . . , µ},

HamArray(R, U′)[i] = r − (v_0 + v_5 + v_7)[i],

where (v_0 + v_5 + v_7)[i] is the i-th component of the sum of the vectors v_0, v_5 and v_7. In other words, from
HamArray(R, U′)[1, µ] we can uniquely determine the sum v_0 + v_5 + v_7. As shorthand we will think of
HamArray(R, U′)[1, µ] as encoding the sum v_0 + v_5 + v_7.
The idea above can be repeated by populating U′ with more symbols from [µ^2]. As an example we
have added the symbols 1 and 2, and another copy of 5 to U′, which gives the string denoted U″ in the figure.
As R slides another µ steps to the right, HamArray(R, U″)[µ + 1, 2µ] encodes (uniquely specifies) the
sum v_1 + v_2 + v_5.
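Under explicit indexing assumptions, this encoding can be checked mechanically. The sketch below is ours; it reuses ham_array and build_R_from_vectors from above, with DIAMOND as a placeholder for ⋄, and `chosen` a set of distinct vector indices. With our 0-based sliding convention, window entry t reads component µ − t of the sum:

```python
DIAMOND = -2  # placeholder for the stream-only mismatch symbol

def encode_and_check(vectors, chosen, mu):
    # Place symbol i immediately after rho_i (under the initial
    # alignment) for each i in `chosen`, then verify that entries
    # 1..mu of HamArray(R, U') encode the elementwise vector sum.
    R = build_R_from_vectors(vectors)  # each vector has length mu
    r = len(R)
    U = [DIAMOND] * (2 * r)
    for i in chosen:
        U[(i + 1) * mu] = i  # the position just after rho_i
    H = ham_array(R, U)
    s = [sum(vectors[i][j] for i in chosen) for j in range(mu)]
    for t in range(1, mu + 1):
        assert H[t] == r - s[mu - t]  # entry t <-> component mu - t
```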
Observe that as we populate U′ with symbols, positions become blocked. For example, we could
not have added symbols to U′ to define an alternative U″ which instead encodes the sum v_1 + v_2 + v_4 in
HamArray(R, U″)[µ + 1, 2µ], since the position where the 4 would have to be set is already occupied
by a 5. Observe however that setting symbols of U″ as above generates matches only in the intended
length-µ window of the Hamming array. Thus, we have full control of which vector sums we want to
encode, under the constraint that positions become blocked, limiting the choice of vectors.
The conclusion thus far is that vector sums have a direct correspondence with the Hamming array.
Next we take the ideas from above further and show that if there exists a pool of µ^2 vectors such that
many different vector sums can be obtained when adding µ vectors from the pool, then the number of
distinct HamArray(R, U′) one can obtain is large. This would prove Lemma 6.1.
7.3 The string R and the proof of Lemma 6.1
Before we state the next lemma which will be used to prove Lemma 6.1, we introduce some basic
notation that we will use when reasoning about multisets of vectors. Let X be a multiset of vectors.
Consider an arbitrary ordering of the elements of X and refer to X[i] as the i-th element of X. We
use the term sub-multiset of X to denote any multiset obtained from X by removing zero or more
elements. We will use the notation ⊑ to denote the sub-multiset relation so that we have, for example,
{1, 1, 4, 5, 5} ⊑ {1, 1, 1, 4, 4, 5, 5, 7, 8}. We can now introduce the required lemma.
Lemma 7.1. For any µ > 40 such that µ − 1 is a prime, there exists a multiset V of vectors from {0, 1}^µ
such that |V| = µ(µ − 1) and for any sub-multiset V′ ⊑ V of size at least (63/64)|V|,

|{ w_1 + · · · + w_µ | {w_1, . . . , w_µ} ⊑ V′ }| ≥ µ^{µ/10}.
The lemma is proved in Section 8 and we will now use it to construct an R that proves Lemma 6.1.
The introduction of a sub-multiset V′ in the lemma above is to reflect the fact that positions of U′ get
blocked as we populate it with symbols. We will see next that at any step, a fraction of at most 1/64 of
the µ^2 vectors are blocked.
Suppose that V = {v_0, . . . , v_{µ(µ−1)−1}} is a multiset of length-µ vectors over {0, 1} with the properties
of Lemma 7.1. That is, we assume that µ > 40 and µ − 1 is a prime. Again, as discussed in Section 6.1,
we can always tweak relevant values in order to meet these criteria.
The string R is simply chosen such that for i ∈ [µ(µ − 1)], the substring ρ_i corresponds to the vector v_i
of V as described at the start of Section 7.2. For i ∈ {µ(µ − 1), . . . , µ^2 − 1}, the substring ρ_i consists of µ
copies of ?, as we will ignore these substrings anyway. In order to show that this R proves Lemma 6.1 we will populate
a length-(2r) string U′ with symbols and show how length-µ subarrays of HamArray(R, U′) correspond
to vector sums of µ vectors chosen arbitrarily from a sub-multiset of V. The process for constructing a
string U′ is as follows:
1. Set all 2µ^3 positions of U′ to the symbol ⋄.

2. Align R with the left half of U′ as illustrated in Figure 5.

3. Let V′ ⊑ V be the multiset of vectors that are not blocked. Initially this means that V′ = V, but as we
return to this step, V′ shrinks.

4. Choose any sub-multiset {w_1, . . . , w_µ} ⊑ V′ and set their corresponding positions in U′ accordingly.
For any v_i ∈ {w_1, . . . , w_µ} the corresponding position in U′ is the one which is currently aligned
with the position immediately after substring ρ_i in R. This position is set to the symbol i.

5. Slide R by µ steps along U′. Over these alignments, HamArray(R, U′) uniquely specifies the vector
sum w_1 + · · · + w_µ. Steps 3–5 are referred to as a round.

6. Repeat from Step 3 for a total of (µ − 1)/64 rounds. Observe that a total of µ(µ − 1)/64 =
(1/64)|V| vectors become blocked, hence |V′| is always at least (63/64)|V|.

7. Slide R by one single step along U′. This will offset all previously blocked vectors and allow us to
start over again at Step 3 as if no vectors are blocked. This is repeated until this step is reached for
the µ-th time. At that point the offsetting of blocked vectors has cycled and previously set positions
of U′ are yet again blocked. (A small sketch of this schedule is given below.)
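The bookkeeping in the schedule above is easy to mistrust, so here is our arithmetic sketch of it; it only tracks the number of steps R slides, not the blocked positions themselves:

```python
def total_slide(mu):
    # mu cycles; each cycle runs (mu - 1) / 64 rounds of a mu-step
    # slide (Steps 3-5), and each visit to Step 7 adds one more step.
    rounds_per_cycle = (mu - 1) // 64
    slide = mu * rounds_per_cycle * mu + (mu - 1)
    assert slide < mu ** 3  # R never slides past the end of U'
    return slide
```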
Populating U′ according to the procedure above means that R is shifted by a total of

µ · ((µ − 1)/64) · µ + (µ − 1) = µ^3/64 − µ^2/64 + µ − 1 < r

steps. Over these steps we have by Lemma 7.1 that for each length-µ subarray of HamArray(R, U′) that
corresponds to a vector sum, there is a choice of at least µ^{µ/10} distinct values. Thus, when µ > 40, the
number of distinct HamArray(R, U′) is at least

(µ^{µ/10})^{µ(µ−1)/64} = µ^{(µ^3−µ^2)/640} ≥ µ^{µ^3/656} = r^{(1/3)·(r/656)} = r^{kr},

where k = 1/1968. This concludes the proof of Lemma 6.1.
8 Vector sets with many distinct sums
In this section, we prove Lemma 7.1. We first rephrase the lemma slightly by introducing some notation.
For any multiset V′ of vectors from {0, 1}^µ, we define

Sum(V′) = |{ w_1 + · · · + w_µ | {w_1, . . . , w_µ} ⊑ V′ }|

to be the size of the set of distinct vector sums one can obtain by summing the vectors of size-µ sub-
multisets of V′. Addition is elementwise and over the integers. Lemma 7.1 says that there exists a multiset
V of vectors from {0, 1}^µ such that |V| = µ(µ − 1) and for any sub-multiset V′ ⊑ V of size at least
(63/64)|V|, we have that Sum(V′) ≥ µ^{µ/10}.
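On toy inputs, Sum(·) can be evaluated directly, which helps build intuition for the lemma. This brute-force sketch is ours (exponential, tiny µ only; it assumes every vector in V has length µ):

```python
from itertools import combinations

def sum_count(V, mu):
    # Sum(V): the number of distinct elementwise sums over all size-mu
    # sub-multisets of V, i.e., over all choices of mu of its elements.
    sums = set()
    for idx in combinations(range(len(V)), mu):
        sums.add(tuple(sum(V[i][j] for i in idx) for j in range(mu)))
    return len(sums)
```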
Our approach will be an application of the probabilistic method. Specifically, we will show that when
the vectors of V are chosen independently and uniformly at random from {0, 1}^µ, the expected value

E[Sum(V)] ≥ (1/2)(µ − 1)^{µ/9}.

Thus, there must exist a V such that Sum(V) ≥ (µ − 1)^{µ/9}/2. Given such a V, we then show that for
every sub-multiset V′ ⊑ V such that |V′| ≥ (63/64)|V|, Sum(V′) ≥ µ^{µ/10}.
8.1 Vectors and codes
We now describe a connection between vectors and codes which we will use in our analysis to lower
bound the number of distinct vector sums, Sum(V), that can be obtained from a vector set V. We will
require the following lemma from the field of coding theory. The lemma is tailored for our needs and is
a special case of “Construction II” in [1]. For our purposes, a constant-weight
binary cyclic code can be seen simply as a set of bit strings (codewords) with two additional properties: the
first is that all codewords have constant Hamming weight µ, i. e., they have exactly µ 1s, and the second
property is that any cyclic shift of a codeword is also a codeword.
Lemma 8.1 ([1]). For any µ ≥ 4 such that µ − 1 is a prime and any odd γ ∈ [µ], there is a
constant-weight binary cyclic code with (µ − 1)^γ codewords of length µ(µ − 1) and
Hamming weight µ such that any two codewords have Hamming distance at least 2(µ − γ).
Let C̃ be the binary code that contains all codewords of length µ(µ − 1) with Hamming weight µ.
We can think of a codeword of C̃ as representing a size-µ sub-multiset X ⊑ V such that the i-th vector of V
(under any enumeration of the elements of V) is in X if and only if position i of the codeword is 1. That
is, C̃ represents all possible sub-multisets of V of size µ. To shorten notation, we refer to c̃ ∈ C̃ as both a
codeword and a sub-multiset of µ vectors from V.
Suppose that µ ≥ 4 and µ − 1 is a prime. We let C ⊆ C̃ be a cyclic code of size (µ − 1)^γ, where γ is
any odd integer in the interval [µ/9, µ/8], such that the Hamming distance between any two codewords
in C is at least 7µ/4. The existence of such a C is guaranteed by Lemma 8.1 since 2(µ − µ/8) = 7µ/4.
Observe that every codeword of C has Hamming weight µ.
For c ∈ C we define the ball Ball(c) to be the set of bit strings in C̃ at Hamming distance at most
µ/16 from c. Formally,

Ball(c) = { c̃ | c̃ ∈ C̃ and the Hamming distance between c and c̃ is at most µ/16 }.

Observe that the |C| balls are all disjoint since the Hamming distance between any two codewords in
C is at least 7µ/4. In particular, for any c ∈ C, Ball(c) ∩ C = {c}. We have that for any c ∈ C, using
the fact that (a choose b) ≤ (ae/b)^b,

|Ball(c)| ≤ (µ choose µ/16) · (|V| choose µ/16) ≤ ((µe · |V|e)/(µ/16)^2)^{µ/16} ≤ (µ/16)^{µ/16}.
For c̃ ∈ C̃ we write sum(c̃) to denote the vector in [µ + 1]^µ obtained by adding the µ vectors in the
vector set c̃, that is, sum(c̃) is the vector sum of the vectors represented by c̃.
Towards proving Lemma 7.1 we will show that when the vectors of V are chosen uniformly at random,
we expect more than half of all |C| balls to have the property that for every c̃ in the ball, sum(c̃) can only
be obtained by summing vectors from that ball.
8.2 Choosing the vectors in V
So far we have not discussed the choice of vectors in V. Initially we consider choosing the vectors
independently and uniformly at random from {0, 1}^µ. We will first show that

E[Sum(V)] ≥ (1/2)(µ − 1)^{µ/9},

then we will fix V and show that this fixed V has the property of Lemma 7.1.
Consider any two distinct c_1, c_2 ∈ C and the corresponding balls Ball(c_1) and Ball(c_2), which
are disjoint subsets of C̃. For any c̃_1 ∈ Ball(c_1) and c̃_2 ∈ Ball(c_2), we now analyse the probability
that sum(c̃_1) = sum(c̃_2). From the definitions above it follows that c̃_1 and c̃_2 must differ on at least
7µ/4 − 2(µ/16) ≥ µ positions, implying that the two vector sets c̃_1 and c̃_2 have at most µ/2 vectors in
common, thus at least µ/2 of the vectors in c̃_1 are not in c̃_2. Let w_1, . . . , w_{µ/2} denote an arbitrary choice
of µ/2 of those vectors. For i ∈ [µ] we can write the i-th component of sum(c̃_1) as

sum(c̃_1)[i] = w_1[i] + · · · + w_{µ/2}[i] + x[i],

where the vector x does not depend on w_1, . . . , w_{µ/2}. In order to have sum(c̃_1) = sum(c̃_2) we must have

w_1[i] + · · · + w_{µ/2}[i] = sum(c̃_2)[i] − x[i]

for each i ∈ [µ]. Since the vectors are picked independently and uniformly at random from {0, 1}^µ, the
most likely value of w_1[i] + · · · + w_{µ/2}[i] is µ/4. The probability that this sum equals µ/4 is

Pr[w_1[i] + · · · + w_{µ/2}[i] = µ/4] = (µ/2 choose µ/4) · (1/2)^{µ/2} ≤ (µ/2)^{−1/2},
where the inequality follows from the fact that for any a, (a choose a/2) ≤ 2^a/√a. Thus, the probability that
sum(c̃_1) = sum(c̃_2), that is, sum(c̃_1)[i] = sum(c̃_2)[i] for all i ∈ [µ], is

Pr[sum(c̃_1) = sum(c̃_2)] ≤ ((µ/2)^{−1/2})^µ = (µ/2)^{−µ/2}. (8.1)
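The binomial fact used in the step above is standard; it can also be sanity-checked numerically, as in this small sketch of ours:

```python
from math import comb, sqrt

# Check that C(a, a/2) <= 2**a / sqrt(a) for even a up to 200.
for a in range(2, 201, 2):
    assert comb(a, a // 2) <= 2 ** a / sqrt(a)
```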
For two distinct c_1, c_2 ∈ C, we define the indicator random variable

I(c_1, c_2) = 0 if sum(c̃_1) = sum(c̃_2) for some c̃_1 ∈ Ball(c_1) and c̃_2 ∈ Ball(c_2), and I(c_1, c_2) = 1 otherwise.
Taking the union bound over all c̃_1 ∈ Ball(c_1) and c̃_2 ∈ Ball(c_2), and using the probability bound in
Equation (8.1), we have

Pr[I(c_1, c_2) = 0] ≤ |Ball(c_1)| · |Ball(c_2)| · (µ/2)^{−µ/2} (8.2)
≤ (µ/16)^{2(µ/16)} · (µ/2)^{−µ/2}
≤ (1/µ^3)^{µ/8}.
For any c_1 ∈ C, we now define the indicator random variable

I′(c_1) = 0 if I(c_1, c_2) = 0 for some c_2 ∈ C \ {c_1}, and I′(c_1) = 1 otherwise.

That is, I′(c_1) = 1 if and only if sum(c̃_1) ≠ sum(c̃_2) for every c̃_1 ∈ Ball(c_1) and every c̃_2 in a different
ball. In other words, each sum sum(c̃_1) corresponding to a c̃_1 ∈ Ball(c_1) is unique to this
ball. It is possible, however, that sum(c̃_1) = sum(c̃_2) if c̃_2 is in the same ball as c̃_1. We say that Ball(c_1)
is good if and only if I′(c_1) = 1.
Taking the union bound over all c_2 ∈ C, and using Equation (8.2) and the fact that |C| ≤ µ^{µ/8}, we
have

Pr[I′(c_1) = 0] ≤ Σ_{c_2 ∈ C \ {c_1}} Pr[I(c_1, c_2) = 0] ≤ |C| · (1/µ^3)^{µ/8} ≤ µ^{µ/8} · (1/µ^3)^{µ/8} ≤ 1/2.
By linearity of expectation we have that the expected number of good balls is

E[Σ_{c ∈ C} I′(c)] ≥ |C|/2.

The conclusion is that there is a multiset V of vectors for which at least |C|/2 balls are good. As
|C| ≥ (µ − 1)^{µ/9} it then follows that, as required,

Sum(V) ≥ (1/2)(µ − 1)^{µ/9}.
8.3 Many distinct sums for subsets of V
Let V be a multiset of vectors such that the number of good balls is at least |C|/2, in accordance with the
conclusion of the previous section. As observed above, Sum(V) ≥ (µ − 1)^{µ/9}/2. It remains to show
that for any sub-multiset V′ ⊑ V of size (63/64)|V|, Sum(V′) is also large.
Over all codewords in C, seen as bit strings, the total number of 1s is |C|µ. Since C is cyclic, the
number of codewords in C that have a 1 in position i ∈ [|V|] is the same as the number of codewords that
have a 1 in position j, for any j ∈ [|V|]. Thus, for each one of the |V| positions there are exactly |C|µ/|V|
codewords in C with a 1 in that position.
Let V′ ⊑ V be of size (63/64)|V|. Let J be the set of |V|/64 positions that correspond to the vectors
of V that are not in V′. We will now modify the codewords of C as follows. For each j ∈ J and codeword
c ∈ C we set c[j] to 0. The total number of 1s across all codewords in C is therefore reduced from |C|µ
by exactly

(|V|/64) · (|C|µ/|V|) = |C|µ/64.

The number of codewords of C that have lost µ/16 or more 1s is therefore at most

(|C|µ/64) / (µ/16) = |C|/4.
Let C′ ⊆ C be the set of codewords c that have lost fewer than µ/16 1s and for which Ball(c) is good.
Since there are at least |C|/2 good balls, |C′| ≥ |C|/4.
Let the code C″ be obtained from C′ by replacing, for each codeword in C′, every removed 1 with
a 1 at some other arbitrary position that is not in J. Thus, every c″ ∈ C″ has Hamming weight µ and
belongs to the good ball Ball(c′), where c″ was obtained from c′ ∈ C′. Hence |C″| = |C′| ≥ |C|/4. Every
codeword c″ ∈ C″ corresponds to a sub-multiset of V of size µ. Crucially, this sub-multiset only
contains vectors from the sub-multiset V′. Further, from the definition of a good ball we have that each
codeword in C″ corresponds to a sub-multiset of V′ with a distinct vector sum. As |C″| ≥ |C|/4 we have
that at least |C|/4 distinct vector sums can be obtained by adding µ vectors from V′. Thus,

Sum(V′) ≥ (1/4)(µ − 1)^{µ/9} ≥ µ^{µ/10}

when µ > 40. This completes the proof of Lemma 7.1.
References
[1] Nguyen Quang A, László Győrfi, and James L. Massey: Constructions of binary constant-weight
cyclic codes and cyclically permutable codes. IEEE Trans. Inform. Theory, 38(3):940–949,
1992. [doi:10.1109/18.135636] 24

[2] Karl R. Abrahamson: Generalized string matching. SIAM J. Comput., 16(6):1039–1051, 1987.
[doi:10.1137/0216067] 4
[3] Amihood Amir, Moshe Lewenstein, and Ely Porat: Faster algorithms for string matching
with k mismatches. J. Algorithms, 50(2):257–275, 2004. Preliminary version in SODA’00.
[doi:10.1016/S0196-6774(03)00097-X] 4

[4] Joshua Brody, Amit Chakrabarti, Oded Regev, Thomas Vidick, and Ronald de Wolf: Better
gap-Hamming lower bounds via better round elimination. In Proc. 14th Internat. Workshop on
Randomization and Computation (RANDOM’10), pp. 476–489, 2010.
[doi:10.1007/978-3-642-15369-3_36, arXiv:0912.5276] 2

[5] Amit Chakrabarti and Oded Regev: An optimal lower bound on the communication
complexity of gap-Hamming-distance. SIAM J. Comput., 41(5):1299–1317, 2012. Preliminary
version in STOC’11. [doi:10.1137/120861072] 2

[6] Raphaël Clifford, Klim Efremenko, Benny Porat, and Ely Porat: A black box for
online approximate pattern matching. Inform. and Comput., 209(4):731–736, 2011. Preliminary
version in CPM’08. [doi:10.1016/j.ic.2010.12.007] 3, 4, 5

[7] Raphaël Clifford and Markus Jalsenius: Lower bounds for online integer multiplication
and convolution in the cell-probe model. In Proc. 38th Internat. Colloq. on Automata, Languages
and Programming (ICALP’11), pp. 593–604. Springer, 2011. [doi:10.1007/978-3-642-22006-7_50,
arXiv:1101.0768] 1

[8] Raphaël Clifford, Markus Jalsenius, and Benjamin Sach: Tight cell-probe bounds for
online Hamming distance computation. In Proc. 24th Ann. ACM-SIAM Symp. on Discrete Algorithms
(SODA’13), pp. 664–674. ACM Press, 2013. [doi:10.1137/1.9781611973105.48, arXiv:1207.1885]
1

[9] Raphaël Clifford and Benjamin Sach: Pattern matching in pseudo real-time. J. Discrete
Algorithms, 9(1):67–81, 2011. Preliminary version in CPM’10. [doi:10.1016/j.jda.2010.09.005] 3

[10] Graham Cormode, Mayur Datar, Piotr Indyk, and S. Muthukrishnan: Comparing
data streams using Hamming norms (how to zero in). IEEE Trans. on Knowl. and Data Eng.,
15(3):529–540, 2003. Preliminary version in VLDB’02. [doi:10.1109/TKDE.2003.1198388] 2

[11] David E. Daykin: Distribution of bordered persymmetric matrices in a finite field. J. reine angew.
Math., 203:47–54, 1960. [doi:10.1515/crll.1960.203.47] 14

[12] Michael J. Fischer and Larry J. Stockmeyer: Fast on-line integer multiplication. J.
Comput. System Sci., pp. 317–331, 1974. Preliminary version in STOC’73. [doi:10.1016/S0022-
0000(74)80047-4] 3, 4

[13] Michael L. Fredman: Observations on the complexity of generating quasi-Gray codes. SIAM J.
Comput., 7(2):134–146, 1978. [doi:10.1137/0207012] 5

[14] Michael L. Fredman and Mike E. Saks: The cell probe complexity of dynamic data structures.
In Proc. 21st STOC, pp. 345–354, 1989. [doi:10.1145/73007.73040] 5
[15] Zvi Galil: String matching in real time. J. ACM, 28(1):134–149, 1981.
[doi:10.1145/322234.322244] 3

[16] Torben Hagerup: Sorting and searching on the word RAM. In Proc. 15th Symp. Theoretical
Aspects of Comp. Sci. (STACS’98), pp. 366–398, 1998. [doi:10.1007/BFb0028575] 5

[17] Wei Huang, Yaoyun Shi, Shengyu Zhang, and Yufan Zhu: The communication
complexity of the Hamming distance problem. Inform. Process. Lett., 99(4):149–153, 2006.
[doi:10.1016/j.ipl.2006.01.014] 2

[18] Piotr Indyk: Faster algorithms for string matching problems: Matching the convolution bound.
In Proc. 39th FOCS, pp. 166–173, 1998. [doi:10.1109/SFCS.1998.743440] 4

[19] T. S. Jayram, Ravi Kumar, and D. Sivakumar: The one-way communication complexity of
Hamming distance. Theory of Computing, 4(6):129–135, 2008. [doi:10.4086/toc.2008.v004a006] 2

[20] Erich Kaltofen and Austin A. Lobo: On rank properties of Toeplitz matrices over finite
fields. In Proc. International Symp. on Symbolic and Algebraic Computation (ISSAC’96), pp.
241–249, 1996. [doi:10.1145/236869.237081] 14

[21] Howard J. Karloff: Fast algorithms for approximately counting mismatches. Inform. Process.
Lett., 48(2):53–60, 1993. [doi:10.1016/0020-0190(93)90177-B] 4

[22] S. Rao Kosaraju: Efficient string matching. Manuscript, 1987. 4

[23] Gad M. Landau and Uzi Vishkin: Efficient string matching with k mismatches. Theoret.
Comput. Sci., 43:239–249, 1986. [doi:10.1016/0304-3975(86)90178-7] 4

[24] Kasper Green Larsen: The cell probe complexity of dynamic range counting. In Proc. 44th
STOC, pp. 85–94, 2012. [doi:10.1145/2213977.2213987, arXiv:1105.5933] 5

[25] Kasper Green Larsen, Omri Weinstein, and Huacheng Yu: Crossing the logarith-
mic barrier for dynamic Boolean data structure lower bounds. In Proc. 50th STOC, 2018.
[doi:10.1145/3188745.3188790, arXiv:1703.03575] 6

[26] Ohad Lipsky and Ely Porat: L1 pattern matching lower bound. Inform. Process. Lett.,
105(4):141–143, 2008. Preliminary version in SPIRE’05. [doi:10.1016/j.ipl.2007.08.011] 3

[27] Marvin Minsky and Seymour Papert: Perceptrons: An Introduction to Computational
Geometry. MIT Press, 1969. 5

[28] Jacques Morgenstern: Note on a lower bound on the linear complexity of the fast Fourier
transform. J. ACM, 20(2):305–306, 1973. [doi:10.1145/321752.321761] 4

[29] Victor Ya. Pan: The trade-off between the additive complexity and the asynchronicity of
linear and bilinear algorithms. Inform. Process. Lett., 22(1–2):11–14, 1986. [doi:10.1016/0020-
0190(86)90035-9] 4
[30] Christos H. Papadimitriou: Optimality of the Fast Fourier Transform. J. ACM, 26(1):95–102,
1979. [doi:10.1145/322108.322118] 4

[31] Mike S. Paterson, Michael J. Fischer, and Albert R. Meyer: An improved overlap
argument for on-line multiplication. In SIAM-AMS Proceedings, volume 7, pp. 97–111. Amer. Math.
Soc., 1974. 4, 16, 17

[32] Mihai Pătraşcu: Lower bound techniques for data structures. Ph. D. thesis, MIT, 2008. ACM
DL. 5

[33] Mihai Pătraşcu and Erik D. Demaine: Tight bounds for the partial-sums problem. In Proc.
15th Ann. ACM-SIAM Symp. on Discrete Algorithms (SODA’04), pp. 20–29. ACM Press, 2004.
ACM DL. 6, 30

[34] Mihai Pătraşcu and Erik D. Demaine: Logarithmic lower bounds in the cell-probe
model. SIAM J. Comput., 35(4):932–963, 2006. Preliminary versions in [33] and STOC’04.
[doi:10.1137/S0097539705447256, arXiv:cs/0502041] 5, 6, 10

[35] David Woodruff: Optimal space lower bounds for all frequency moments. In Proc. 15th Ann.
ACM-SIAM Symp. on Discrete Algorithms (SODA’04), pp. 167–175. ACM Press, 2004. ACM DL. 2

[36] Andrew Chi-Chih Yao: Probabilistic computations: Toward a unified measure of com-
plexity. In Proc. 18th FOCS, pp. 222–227, 1977. [doi:10.1109/SFCS.1977.24] 3, 8

[37] Andrew Chi-Chih Yao: Should tables be sorted? J. ACM, 28(3):615–628, 1981.
[doi:10.1145/322261.322274] 5
AUTHORS
Raphaël Clifford
Reader in Algorithm Design
University of Bristol, UK
raphael.clifford bristol ac uk
Markus Jalsenius
Telia Company AB
Solna, Sweden
markus.jalsenius gmail com
Benjamin Sach
Alan Turing Institute
London, UK
bsach turing ac uk
ABOUT THE AUTHORS
Raphaël Clifford graduated from Oxford University in 1995 and completed his Ph. D. un-
der the supervision of Marek Sergot at Imperial College London. His early work focused
on combinatorial pattern matching problems. He now works on streaming algorithms and
on conditional and unconditional lower bounds for data structure problems.

Markus Jalsenius completed his Ph. D. in Computer Science at the University of Liv-
erpool in 2008 under the supervision of Leslie Ann Goldberg and has been a Research
Associate at both the University of Liverpool and the University of Bristol. Markus is
now working as a Data Scientist at Telia Company. The present work was completed
while Markus was working at the University of Bristol.

Benjamin Sach completed his Ph. D. in Computer Science at the University of Bristol
in 2011 under the supervision of Raphaël Clifford. He was an EPSRC fellow at the
University of Warwick and subsequently a Lecturer at the University of Bristol. Ben
now works as a Data Scientist with the Alan Turing Institute. The present work was
completed while Ben was working at the University of Bristol.