The Gram--Schmidt Walk: A Cure for the Banaszczyk Blues

Authors Nikhil Bansal, Daniel Dadush, Shashwat Garg, Shachar Lovett,
Plaintext
                           T HEORY OF C OMPUTING, Volume 15 (21), 2019, pp. 1–27
                                        www.theoryofcomputing.org




                    The Gram–Schmidt Walk:
                 A Cure for the Banaszczyk Blues
    Nikhil Bansal∗                 Daniel Dadush†                  Shashwat Garg‡         Shachar Lovett§
               Received July 24, 2018; Revised December 12, 2019; Published December 23, 2019




       Abstract: A classic result of Banaszczyk (Random Str. & Algor. 1997) states that given any
       n vectors in Rm with `2 -norm at most 1 and any convex body K in Rm of Gaussian measure
       at least half, there exists a ±1 combination of these vectors that lies in 5K. Banaszczyk’s
       proof of this result was non-constructive and it was open how to find such a ±1 combination
       in polynomial time. In this paper, we give an efficient randomized algorithm to find a ±1
       combination of the vectors which lies in cK for some fixed constant c > 0. This leads to new
       efficient algorithms for several problems in discrepancy theory.

ACM Classification: F.2.2, G.3
AMS Classification: 68W20
Key words and phrases: discrepancy, random walks


1     Introduction
Let (X, S) be a finite set system, with X = {1, 2, . . . , n} and S = {S1 , S2 , . . . , Sm } a collection of subsets
of X. Given a two-coloring x : X → {−1, 1}, the discrepancy of a set S is defined as |x(S)| where

     A conference version of this paper appeared in the Proceedings of the 50th ACM Symposium on Theory of Computing,
2018.
   ∗ Supported by an ERC consolidator grant 617951 and NWO VICI grant 639.023.812.
   † Supported by NWO Veni grant 639.071.510.
   ‡ Supported by NWO grant 022.005.025.
   § Supported by an NSF CCF award 1614023.



© 2019 Nikhil Bansal, Daniel Dadush, Shashwat Garg and Shachar Lovett
c b Licensed under a Creative Commons Attribution License (CC-BY)                        DOI: 10.4086/toc.2019.v015a021
              N IKHIL BANSAL , DANIEL DADUSH , S HASHWAT G ARG AND S HACHAR L OVETT

x(S) = ∑i∈S x(i), and measures the imbalance between the number of elements in S colored −1 and 1.
The discrepancy of the set system (X, S) is defined as

                                      disc(S) =      min    max |x(S)|
                                                  x:X→{−1,1} S∈S

and is the minimum imbalance achieved by all the sets in S over all possible two-colorings.
   Discrepancy is defined more generally for any m × n matrix A as

                                        disc(A) =     min       kAxk∞ .
                                                    x∈{−1,1}n

That is, minimum achievable `∞ -norm of the vector Ax over all two-colorings x of the columns of A. This
can be seen as a vector balancing problem: given vectors v1 , . . . , vn ∈ Rm (specified by the columns of A),
find a two-coloring x : [n] → {−1, 1} to minimize k ∑ni=1 x(i)vi k∞ . The set system view mentioned earlier
corresponds to the special case where A is the incidence matrix of the set system with columns indexed
by elements in X and rows by sets in S.
    Discrepancy is a widely studied topic and has applications to many areas in mathematics and computer
science. In particular in computer science, it arises naturally in computational geometry, data structure
lower bounds, rounding in approximation algorithms, combinatorial optimization, communication com-
plexity, pseudorandomness and differential privacy. For more on these connections we refer the reader to
[14, 27, 30].
    One of the earliest techniques employed in discrepancy theory was linear algebraic in nature and
similar to the well-known iterated-rounding technique [9, 12, 22]. Though this technique gave surprisingly
goodbounds for some problems in discrepancy theory, there remained a big gap between the bounds
obtained and the lower bounds known for these problems.
    A huge breakthrough was made in the early 80’s with Beck’s partial-coloring method [11], that was
further refined by Spencer to the entropy method [35]. A similar approach based on ideas from convex
geometry was developed independently by Gluskin [19]. Roughly speaking, this method guarantees the
existence of a coloring of a constant fraction of the elements where every set in the set system incurs
a low discrepancy. This is then repeated O(log n) times in order to get a coloring of all the n elements.
The partial-coloring method led to improved bounds for many problems in discrepancy theory and in
particular, it led to the famous “six standard deviations” theorem of Spencer [35]: given a set system with
                                                                        √
n sets and n points, there exists a coloring of discrepancy at most 6 n. This bound is tight up to constant
factors.
    While the original proofs of the partial-coloring method were based on the pigeonhole principle
and were non-algorithmic, several new algorithmic versions of the partial-coloring method have been
developed over the past few years [3, 25, 34, 20, 16]. In particular, all known applications of partial-
coloring [35, 27] can now be made algorithmic. These ideas have also led to new results in approximation
algorithms [33, 4, 8, 32].
    Despite its huge success, the partial-coloring method gives suboptimal bounds for many problems.
The reason is that the partial-coloring step only colors a constant fraction of the points and hence must
be repeated O(log n) times before all the n points are colored. The discrepancies incurred at each of
these steps are independent of each other and in fact can add up adversarially. Consider for instance the
long-standing Beck–Fiala problem [12] about discrepancy of low-degree set systems, where every point

                       T HEORY OF C OMPUTING, Volume 15 (21), 2019, pp. 1–27                                2
                  T HE G RAM –S CHMIDT WALK : A C URE FOR THE BANASZCZYK B LUES

                                                                                     √
lies in at most t sets. Here, one round of partial-coloring ensures
                                                                √ a discrepancy of O( t) to every set, but
over all the O(log n) rounds the discrepancy ends up being O( t log n).√On the other hand, the Beck–Fiala
conjecture is that the discrepancy of such set systems should be O( t). Such logarithmic-factor gaps
also exist in several other problems in discrepancy theory for similar reasons.

Banaszczyk’s method. One of the key results in discrepancy theory is the following result by Ba-
naszczyk [1].
Theorem 1.1 (Banaszczyk). Given any convex body K ⊆ Rm of Gaussian measure γm (K) ≥ 1/2, and
vectors v1 , . . . , vn ∈ Rm with `2 -norm at most 1, there exists a coloring x : [n] → {−1, 1} such that
∑ni=1 x(i)vi ∈ cK, where c is an absolute constant. In particular, c = 5 suffices.
    Here the Gaussian measure γm (S) of any measurable set S ⊆ Rm is defined as
                                                            1
                                                     Z
                                                                       2
                              γm (S) = Pr[g ∈ S] =            m/2
                                                                  e−kyk /2 dy
                                                     y∈S (2π)

where g is a standard Gaussian random vector in Rm .
     In contrast to the partial-coloring method, Theorem 1.1 gives a full coloring directly, resulting in
improved bounds for several problems. For instance, Banaszczyk’s result implies a discrepancy bound
       √                                                √
of O( t log n) for the Beck–Fiala problem and an O( log m) bound for the Komlós problem (defined
                                                           √
in Section 1.2). The latter follows by noting that an O( log m) scaling of the hypercube [−1, 1]m has
                                                                    √
Gaussian volume at least half. This was further improved to O( log n) in [10] (standard reductions give
                                                                                       √
n ≤ m [12]). Matoušek et al. [28] used it to bound the hereditary discrepancy with O( log m) factor of a
certain matrix factorization norm, and Larsen [21] used it to give update-query tradeoffs for dynamic data
structures. Theorem 1.1 was also used in a very interesting way in a later work by Banaszczyk [2] to
show improved bounds for several variants of the Steinitz problem. Recently, Nikolov [31] used this to
obtain improved bounds for Tusnády’s problem.
     Banaszczyk’s proof is highly elegant and based on deep ideas from convex geometry. However,
his approach is non-algorithmic and finding an algorithmic version of Theorem 1.1 has been a major
challenge in discrepancy theory [34, 30, 15]. Partial progress was made on this recently [5, 7, 23] and
algorithms achieving the same bounds as Banaszczyk for the Komlós problem and for the Steinitz problem
in the `∞ -norm were obtained. Roughly, these results correspond to the case when the convex body K in
Theorem 1.1 is a scaling of the hypercube, and the question about general convex bodies remained open.
     In a recent paper, Dadush et al. [15] reformulated Banaszczyk’s result in terms of certain sub-
Gaussian distributions and reduced the question of finding an algorithmic version of Banaszczyk’s result
to constructing such sub-Gaussian distributions. To state this result, we first need some definitions. A
random vector Y taking values in Rm is said to be sub-Gaussian with parameter σ (or σ -sub-Gaussian) if
for all θ ∈ Rm ,                           h        i     2       2
                                          E ehθ ,Y i ≤ e(σ /2)kθ k2 .
Roughly, the entries of Y are approximately independent while having Gaussian-like marginals. Observe
that a standard Gaussian random vector is 1-sub-Gaussian.
    We can now state the result of [15]. Let (P1) denote the following statement:


                       T HEORY OF C OMPUTING, Volume 15 (21), 2019, pp. 1–27                            3
                N IKHIL BANSAL , DANIEL DADUSH , S HASHWAT G ARG AND S HACHAR L OVETT

    “Let v1 , . . . , vn ∈ Rm be vectors with `2 -norm at most 1 and let x0 ∈ [−1, 1]n . Then there
    exists a distribution D on {−1, 1}n , such that for x sampled from D, the random variable
                                                                                                                         (P1)
    ∑ni=1 (x(i) − x0 (i))vi is σ -sub-Gaussian, for some absolute constant σ > 0. Moreover, for
    every i for which x0 (i) ∈ {−1, 1} we have x(i) = x0 (i) with probability one.”


Theorem 1.2 (Dadush et al.). Theorem 1.1 (up to the exact value of c) is equivalent to the statement (P1).
   More precisely, for v1 , . . . , vn ∈ Rm of `2 -norm at most 1 and a convex body K ⊆ Rm of Gaussian
measure at least 1/2, there exists x0 ∈ [−1, 1]n such that for x(1), x(2), . . . , x(n) with properties as
promised by (P1).

   (i) ∑ x0 (i)vi ∈ K,

  (ii) Prx∼D [∑ni=1 x(i)vi ∈ cK] ≥ 1/2,

where D is as in (P1) and c := c(σ ) is an absolute constant.
   Furthermore, for a symmetric convex body (i. e., K = −K), the choice x0 = 0 suffices, and for a
general convex body given by a membership oracle, x0 can be computed in expected polynomial time.

    The above theorem implies that to get a constructive version of Theorem 1.1 for any convex body it
suffices to give an algorithm that can efficiently sample from the σ -sub-Gaussian distributions in (P1).
Furthermore, restricting the sampler to the choice x0 = 0, gives a universal algorithm for finding colorings
that land inside any symmetric convex body of Gaussian measure at least 1/2.
    The idea behind Theorem 1.2 is the following. Suppose first that K is symmetric about the origin.
Then as γm (K) ≥ 1/2, a random Gaussian vector G satisfies Pr[G ∈ K] ≥ 1/2, or equivalently, Pr[kGkK ≤
1] ≥ 1/2 where k · kK is the norm having K as its unit ball. By standard tail bounds, this gives that
E[kGkK ] = O(1). Theorem 1.2 now follows with x0 = 0 by the following theorem of Talagrand, together
with Markov’s inequality.

Theorem 1.3 (Talagrand [38]). Let K ⊂ Rm be a symmetric convex body and Y ∈ Rm be a σ -sub-Gaussian
random vector. Then for the standard Gaussian in G ∈ Rm , we have that

                                               E[kY kK ] ≤ σ · c · E[kGkK ]

where c is an absolute constant.

    Next, suppose that K is non-symmetric but satisfies the slightly stronger condition1 γm (K) ≥ 3/4.
Then the body K ∩ (−K) is convex and symmetric and has Gaussian measure at least 1/2 (as γm (K c ) =
γ((−K)c ) ≤ 1/4), and we can apply the argument above to K ∩ (−K). For non-symmetric bodies K with
Gaussian measure 1/2, Dadush et al. [15] show that one can compute a well chosen center x0 in K and
reduce to the symmetric case by symmetrizing K around x0 .
   1 As far as we know, for all known applications of Banaszczyk’s result, it makes no qualitative difference whether γ (K) ≥ 3/4
                                                                                                                       m
or γm (K) ≥ 1/2.


                           T HEORY OF C OMPUTING, Volume 15 (21), 2019, pp. 1–27                                                4
                   T HE G RAM –S CHMIDT WALK : A C URE FOR THE BANASZCZYK B LUES

1.1   Main result
In this paper we give the first efficient algorithm for obtaining Banaszczyk’s result (Theorem 1.1), which
also yields a new constructive proof, by providing a new random walk procedure to sample from the
requisite sub-Gaussian coloring distributions. We dub this procedure the Gram–Schmidt walk and state
its guarantees below:
Theorem 1.4 (Gram–Schmidt Walk). There is a polynomial-time randomized algorithm that, given as
input vectors v1 , . . . , vn ∈ Rm of `2 -norm at most 1 and x0 ∈ [−1, 1]n , outputs a coloring x√∈ {−1, 1}n
such that the random variable ∑ni=1 (x(i) − x0 (i))vi is sub-Gaussian with parameter σ = 40 ≈ 6.32.
Moreover, if |x0 (i)| = 1 then x(i) = x0 (i) with probability 1.
     Our algorithm in fact runs in time of O(n · (n + m)ω ), where ω is the exponent of matrix multiplication.
In particular, it runs in n iterations (one variable gets colored at each iteration), and in each iteration the
most expensive step is solving a linear system in n variables and m equations, which can be done in time
O((n + m)ω ).
     Previous works on the problem either achieved O(1)-sub-Gaussanity only for coordinate directions
                         √
[5, 7], or achieved O( log n)-sub-Gaussianity in all directions [15]. In contrast, Theorem 1.4 gives
O(1)-sub-Gaussianity guarantee in all directions. In contrast to the algorithms in [5, 7], where the steps
for the walk were generated by solving a semidefinite program at each step, the walk steps for our sampler
require only basic linear algebra, namely, the Gram–Schmidt orthogonalization. The idea for the walk
was inspired by the constructive proof of Dadush et al. [15] for the existence of solutions to the Komlós
vector coloring program of Nikolov [29], where the Gram–Schmidt orthogonalization plays a crucial role
in the analysis.

1.2   Other results
Theorem 1.4 directly gives an algorithm for previous applications of Banaszczyk’s result.

Komlós problem. This is a generalization of the Beck–Fiala problem and is defined as follows: given
an m × n matrix A with columns of `2 -norm at most one, find a coloring x ∈ {−1, 1}n to minimize
disc(A) = kAxk∞ . The Komlós conjecture [36] states that disc(A) = O(1) and is a generalisation of the
                                                          √
Beck–Fiala conjecture. Theorem 1.1 directly gives an O( log n) bound for the Komlós problem [1].
While algorithms to find such a coloring were recently given in [5, 7, 23], Theorem 1.4 gives another,
more direct, algorithm to find such a coloring.

` p discrepancy. For p ∈ [1, ∞), the ` p discrepancy of an m × n matrix A under a coloring x is defined as
                                                       1/p
                                               1      p
                                                 kAxk p      .
                                               m
Matoušek [26] showed an ` p discrepancy bound of O(pt 1/2 ) for the Beck–Fiala problem, using partial-
coloring methods. An improved bound of O(p1/2 ) for the more general Komlós setting follows directly
from Banaszczyk’s result (and a standard estimate of the Gaussian measure of the ` p -ball). Our result
gives an algorithmic version of this bound.

                        T HEORY OF C OMPUTING, Volume 15 (21), 2019, pp. 1–27                                5
              N IKHIL BANSAL , DANIEL DADUSH , S HASHWAT G ARG AND S HACHAR L OVETT

Corollary 1.5. There is an efficient randomized algorithm which given a matrix A with n columns of
                                                                                              √
`2 -norm at most 1, and p ∈ [1, ∞), finds a {−1, 1}n coloring with expected ` p discrepancy O( p).

Remark 1.6. The single algorithm given by Theorem 1.4 produces a (random) coloring that simulta-
neously achieves this bound for every p ∈ [1, ∞). In contrast, the coloring in Banaszczyk’s approach
depends on the body K which is different for different values of p.

Discrepancy relative to γ2 -norm. Given an m × n matrix A and a set J ⊆ [n], let A|J denote the m × |J|
matrix restricted to columns of A in J. The hereditary discrepancy of A is defined as

                                        herdisc(A) = max disc(A|J ).
                                                       J⊆[n]

Hereditary discrepancy is often a better measure of the complexity of a set system than the discrepancy.
It is also rather well behaved; while no polynomial time algorithm can distinguish between set systems
                                                  √
with zero discrepancy and set systems with O( n) discrepancy (assuming P 6= NP) [13], hereditary
discrepancy can be efficiently approximated. It was shown in [28] that for any matrix A,
                                                                    p
                            Ω(γ2 (A)/ log m) ≤ herdisc(A) ≤ O(γ2 (A) log m),                       (1.1)

where γ2 (A) is a matrix factorization norm that can be efficiently computed by a semidefinite program.
    The proof for the upper bound above was non-constructive in the sense that, given any J ⊆ [n], it was
not known how to efficiently find a coloring x : J → {−1, 1} with discrepancy of A|J bounded by the right
hand side of (1.1). Using Theorem 1.4, we get an algorithm to find such a coloring.

Corollary 1.7. There exists an efficient randomized algorithm that, given any m × n matrix A and J ⊆ [n],
returns a coloring x : J → {−1, 1} such that with constant probability,
                                                         p
                                      kA|J xk∞ = O(γ2 (A) log m).

A generalization of Banaszczyk’s result. We also give a generalization of Theorem 1.1. Let Bm
                                                                                            2
denote the Euclidean ball in Rm of radius 1 and centered at the origin.

Theorem 1.8. Let S1 , S2 , . . . , Sn be sets such that for each i ∈ [n], Si ⊆ Bm
                                                                                2 and 0 lies in the convex hull
of Si . Then for any convex body K with γm (K) ≥ 1/2, there exist vectors vi ∈ Si such that ∑ni=1 vi ∈ cK,
where c > 0 is an absolute constant. Moreover, there is an efficient algorithm to find these vectors.

     Theorem 1.1 is the special case of the above theorem, obtained by taking the sets Si = {−vi , vi } for
each i ∈ [n]. We will constructively reduce Theorem 1.8 to Theorem 1.1, which implies that an algorithm
for the latter also gives an algorithm for the former. Similar generalizations for several other problems in
discrepancy theory are mentioned in [9].

1.3   Organization of the paper
We state the algorithm for Theorem 1.4 in Section 2. The analysis is in Section 3, with a sketch of the
main ideas in Section 3.3. The applications are discussed in Section 4.

                        T HEORY OF C OMPUTING, Volume 15 (21), 2019, pp. 1–27                                6
                   T HE G RAM –S CHMIDT WALK : A C URE FOR THE BANASZCZYK B LUES

2    Algorithm description: The Gram–Schmidt walk
The algorithm will proceed in time steps t = 1, 2, . . . , n, and will maintain a fractional coloring xt ∈ [−1, 1]n
at all time steps. Recall that x0 is an arbitrary initial fractional coloring. Let xt denote the coloring at the
end of time step t. An element i is called alive at time t if |xt−1 (i)| < 1 and frozen (or fixed) otherwise.
Let At ⊆ [n] denote the set of alive elements at time t.
     We now give some notation to describe the update step at each time t. Let n(t) ∈ At denote the
largest indexed element that is alive at time t. We call n(t) the pivot element at time t. Let Vt denote the
subspace spanned by the vectors {vi : i ∈ At , i 6= n(t)} and let v⊥ (t) := ΠVt⊥ vn(t) denote the projection of
vn(t) orthogonal to Vt .
     We now describe the algorithm formally.

    Algorithm description:
    Input: vectors v1 , . . . , vn ∈ Rm of `2 -norm at most one; an initial coloring x0 ∈ [−1, 1]n .
    Output: coloring x ∈ {−1, 1}n .

       1. Given the initial coloring x0 , initialize A1 = {i ∈ [n] : |x0 (i)| < 1}, n(1) = max {i ∈ A1 } and
          t = 1.

       2. While At 6= 0,
                      / do the following:

            (a) Compute an update direction ut = (ut (1), . . . , ut (n)) ∈ Rn as follows:
                   • if i ∈
                          / At set ut (i) = 0.
                   • if i = n(t) set ut (i) = 1.
                   • for i ∈ At \ {n(t)} set ut (i) to satisfy v⊥ (t) = vn(t) + ∑i∈At \{n(t)} ut (i)vi , where
                     v⊥ (t) is the projection of vn(t) on the subspace orthogonal to the span of the
                     vectors {vi : i ∈ At , i 6= n(t)}.
            (b) Let δt− < 0 < δt+ be the unique negative and positive solutions for δ , respectively, for
                the equation maxi∈At |xt−1 (i) + δ ut (i)| = 1. Update the coloring xt−1 randomly as

                                                       xt = xt−1 + δt ut

                 where δt ∈ {δt− , δt+ } is chosen randomly as
                                                 
                                                 δ − with probability +δt+ − ,
                                                    t                 δt −δt
                                            δt =                           −
                                                 δt+ with probability +−δt − .
                                                                            δt −δt


            (c) Update At+1 = {i ∈ [n] : |xt (i)| < 1}, n(t + 1) = max{i ∈ At+1 } and t ← t + 1.

       3. Output xt .

   Note that v⊥ (t) depends on both At and vn(t) , and may change between time steps (if At changes)
even if n(t) remains the same. From the perspective of the walk, v⊥ (t) will correspond to the direction

                         T HEORY OF C OMPUTING, Volume 15 (21), 2019, pp. 1–27                                   7
                N IKHIL BANSAL , DANIEL DADUSH , S HASHWAT G ARG AND S HACHAR L OVETT

that the output of the random walk will move in during timestep t. In other words, the update direction
ut , in the step 2(a), ensures that Aut = v⊥ (i). The step 2(b) is the standard randomized pipage-rounding
step [37, 18], where the algorithm updates the coloring randomly along the direction ut , while ensuring
that the update has mean-zero, and xt stays in [−1, 1]n and at least one additional coordinate of xt reaches
−1 or 1. To justify the name of the walk, note that v⊥ (t) is simply the last vector of the Gram–Schmidt
orthogonalization of the ordered sequence of vectors (vi )i∈At .


3     Algorithm analysis
We proceed now to the analysis of our algorithm. In the first subsection we develop some preliminaries
which will be helpful later. The main ideas and the roadmap of the analysis are in Section 3.3.

3.1    Preliminaries
To bound the discrepancy, we will use a concentration inequality which is a variant of Freedman’s
inequality for martingales [17]. The following lemma will be useful.

Lemma 3.1. Let X be a random variable such that X ≤ 1. Then for any λ > 0,
                                                                    
                          E[eλ X ] ≤ exp λ E[X] + (eλ − λ − 1)E[X 2 ] .

Proof. Let
                                                            eλ x − λ x − 1
                                                  f (x) =
                                                                  x2
where we set f (0) = λ 2 /2. It can be verified that f (x) is increasing for all x. This implies eλ x ≤
f (1)x2 + 1 + λ x for any x ≤ 1. Taking expectations, this becomes
                                                                                                                   2
       E[eλ X ] ≤ 1 + E[λ X] + f (1)E[X 2 ] = 1 + λ E[X] + (eλ − λ − 1)E[X 2 ] ≤ eλ E[X]+(e −λ −1)E[X ]
                                                                                                       λ




where the last inequality uses the fact that 1 + x ≤ ex .

    We will use the following concentration inequality to bound the discrepancy, which is a slight variant
of Freedman’s inequality for martingales [17].

Lemma 3.2. Let X1 , . . . , Xn and Z0 , Z1 , . . . , Zn be random variables that satisfy

    1. Z0 is deterministic,

    2. Zt − Zt−1 ≤ Xt for all t = 1, . . . , n with probability one,

    3. Xt ≤ 1 for all t = 1, . . . , n with probability one, and

    4. For all t = 1, . . . , n, and for all admissible values z1 , . . . , zt−1 for Z1 , . . . , Zt−1 , with probability one
       it holds that E[Xt + Xt2 | Z1 = z1 , . . . , Zt−1 = zt−1 ] ≤ 0.

                           T HEORY OF C OMPUTING, Volume 15 (21), 2019, pp. 1–27                                           8
                    T HE G RAM –S CHMIDT WALK : A C URE FOR THE BANASZCZYK B LUES

Then
                                                     E[eZn ] ≤ eZ0 .
Proof. Let λ > 0 be a real number to be determined later. We shorthand as Et−1 [·] the conditional
expectation E[·|Z1 , . . . , Zt−1 ]. We first bound Et−1 [eλ Zt ] for which we observe the following:
      h     i                    h            i
 Et−1 eλ Zt = eλ Zt−1 Et−1 eλ (Zt −Zt−1 )
                                 h    i
              ≤ eλ Zt−1 Et−1 eλ Xt
                                                                    
              ≤ eλ Zt−1 exp λ Et−1 [Xt ] + (eλ − λ − 1)Et−1 Xt2             (using Lemma 3.1)
                                                          
              ≤ eλ Zt−1 exp (eλ − 2λ − 1)Et−1 Xt2                           (using Et−1 [Xt ] ≤ −Et−1 [Xt2 ]). (3.1)
                                                         


Set λ > 0 to be the (unique) solution of eλ − 2λ − 1 = 0. We first claim that λ ≥ 1. This follows as
ex ≤ 1 + x + x2 for x ≤ 1, and hence for 0 < λ < 1, eλ − 2λ − 1 ≤ λ 2 − λ < 0.
    Using this λ in (3.1) gives Et−1 [eλ Zt ] ≤ eλ Zt−1 . As
                                         E[eλ Zn ] = E0 E1 . . . En−1 [eλ Zn ],
by induction, we have that E[eλ Zn ] ≤ E[eλ Z0 ] = eλ Z0 , as Z0 is deterministic.
    As λ ≥ 1, by Jensen’s inequality E[eZn ] ≤ E[eλ Zn ]1/λ ≤ eZ0 , as needed.
Remark 3.3. Condition (4) in Lemma 3.2 can be improved to Et−1 [Xt + cXt2 ] ≤ 0 for c ≥ e − 2, as the
analysis only needs that the non-negative solution λ of eλ − (c + 1)λ − 1 = 0 should satisfy λ ≥ 1.

3.2    Notation and preliminary observations
We can now start with the analysis of the algorithm. Since at the end of each time step t, at least one more
element gets colored −1 or 1, the algorithm clearly terminates in at most n steps. To simplify notation, if
the algorithm terminates after t < n steps and outputs xt , we set xt+1 = . . . = xn := xt . As we maintain
kxt k∞ ≤ 1 for all time steps t, we see that xn ∈ {−1, 1}n .
     It should also be noted in step (2.a) of the algorithm that such a direction ut always exists. This is
because vn(t) − v⊥ (t) lies in the subspace Vt and hence there always exist ut (i)’s such that
                                       vn(t) − v⊥ (t) = −        ∑          ut (i)vi .
                                                             i∈At \{n(t)}

This then gives v⊥ (t) = vn(t) + ∑i∈At \{n(t)} ut (i)vi = ∑i∈At ut (i)vi . As ut (i) = 0 if i ∈
                                                                                              / At , we obtain that
                                                            n
                                                v⊥ (t) = ∑ ut (i)vi .                                           (3.2)
                                                           i=1
    We now focus on showing the sub-Gaussian bound in Theorem 1.4. Henceforth, we fix a vector
θ ∈ Rm with respect to which we want to show sub-Gaussianity. Let A be the m × n matrix whose columns
are given by v1 , . . . , vn . Define Y := ∑ni=1 (x(i) − x0 (i))vi and let
                                                n                              n
                    disc(θ ) = hθ ,Y i = hθ , ∑ (x(i) − x0 (i))vi i = ∑ (x(i) − x0 (i))hθ , vi i
                                               i=1                            i=1


                         T HEORY OF C OMPUTING, Volume 15 (21), 2019, pp. 1–27                                        9
              N IKHIL BANSAL , DANIEL DADUSH , S HASHWAT G ARG AND S HACHAR L OVETT

and hence hθ ,Y i can be seen as the discrepancy of the “row” θ T A, which has hθ , vi i as its i-th entry.
   Let us denote the respective signed discrepancy at the end of time step t by
                                           n                         n
                         disct := hθ , ∑ (xt (i) − x0 (i))vi i = ∑ (xt (i) − x0 (i))hθ , vi i,
                                       i=1                       i=1

let
                                               ∆t x := xt − xt−1 = δt ut
denote the coloring update at time t, and let
                                                         n                       n
                       ∆t disc := disct − disct−1 = ∑ hθ , vi i∆t x(i) = δt ∑ hθ , vi iut (i)
                                                        i=1                    i=1

                                                                                                  2   2
be the change in discrepancy at time t. Our end goal is to show that E[ediscn ] ≤ e(σ /2)kθ k2 , for σ = O(1)
to be computed later.
    A key observation is that the change in discrepancy at time t depends only on the vector v⊥ (t).

Lemma 3.4. At each time step t, ∆t disc = δt hθ , v⊥ (t)i.

Proof. The change in discrepancy at time t is
                                       n                         n
                       ∆t disc = δt ∑ hθ , vi iut (i) = δt hθ , ∑ ut (i)vi i = δt hθ , v⊥ (t)i,
                                     i=1                        i=1

where the last equality follows by (3.2).

3.3   Main ideas
Before we give the technical details, we describe the main ideas of the analysis, which are simple, and
then give a roadmap of the analysis.
    First, as our algorithm can start with any initial coloring x0 , we may assume without loss of generality
that the vectors v1 , . . . , vn are linearly independent. Otherwise, we can apply a standard preprocessing
step that removes linear dependencies, by finding some linear dependency among the vectors and making
an update step which incurs zero discrepancy.
    Our algorithm can be viewed as a randomized extension of the above dependent rounding approach.
Suppose for a moment that the element that got colored at each time step was the pivot. That is, the
elements got colored in the order n, n − 1, . . . , 1. Then, at time t, the pivot is n(t) = n − t + 1 and the
vectors v⊥ (t) can be described as follows. Let w1 , . . . , wn be the orthonormal vectors obtained by applying
the Gram–Schmidt orthonormalization procedure (GS) on the vectors v1 , . . . , vn in that order. That is,
w1 = v1 /kv1 k and for i > 1, wi is the projection of vi orthogonal to v1 , . . . , vi−1 , normalized to have unit
norm. By our assumption that the vectors vi are linearly independent, each wi is non-zero. It is easily
checked that v⊥ (t) = hvn(t) , wn(t) iwn(t) . Another observation that will be useful later is that wn(t) (and
hence v⊥ (t)) depends only on the set {v1 , . . . , vn(t)−1 } and not the particular order in which GS is applied
to this set.

                        T HEORY OF C OMPUTING, Volume 15 (21), 2019, pp. 1–27                                  10
                   T HE G RAM –S CHMIDT WALK : A C URE FOR THE BANASZCZYK B LUES

    Now δt is a mean-zero random variable which is independently chosen at each time t. Also |δt | ≤ 2
for all t (see Lemma 3.7). This suggests that the moment generating function of the discrepancy is
                            h        i    h n           ⊥
                                                             i         n        ⊥    2
                          E edisc(θ ) = E e∑t=1 δt hθ ,v (t)i ≤ eO(1)·∑t=1 hθ ,v (t)i .

                              2
But this is at most eO(1)·kθ k2 as

                     ∑hθ , v⊥ (t)i2 = ∑hθ , hvn(t) , wn(t) iwn(t) i2 ≤ ∑hθ , wn(t) i2 ≤ kθ k22 .
                      t                 t                                     t

Here the first inequality follows as |hvn(t) , wn(t) i| ≤ 1 as the wi ’s are orthonormal unit vectors and kvi k2 ≤ 1
for each i. The second inequality follows as ∑i hθ , wi i2 ≤ kθ k22 for any orthonormal collection of vectors
wi .
     There are two issues that need to be addressed in the simplified description above. First, non-pivot
elements may get colored sometimes. That is, the variables may not get colored in the order n, n − 1, . . . , 1.
The key point of the analysis is to show that this only makes the problem easier. To make this a bit more
precise, let us view ∑i hθ , wi i2 ≤ kθ k22 as the energy budget initially available to us. If some non-pivot
element xk is colored at some time t, then the GS procedure (without vk ) will produce a different set of
orthonormal vectors {w0i }. However we can bound the increase

                                             hθ , w0n(t) i2 − hθ , wn(t) i2

in the pivot’s energy by the amount hθ , wk i2 which was available to us, but we will never use it anymore
as k will never be a pivot once it is colored. The formal analysis later on will divide the time steps into
phases where the pivot element remains the same during a phase and consider the evolution of v⊥ (t) in
each phase.
     A second technical issue is that to obtain a sub-Gaussian distribution for all θ ∈ Rm , we need to
control the variance of the energy which requires that |hθ , v⊥ (ti)| = O(1). Let us refer to the times when
this does not happen as bad times. To get around this issue we break the discrepancy contribution into
two parts: a deterministic part due to the bad times, and a random contribution due to the other time
steps. Again by considering the dynamics of how v⊥ (t) evolves during a phase, one can show that the
cumulative discrepancy due to all the bad times can be at most O(kθ k22 ).

3.4   Phases and dynamics of pivots
To analyze the process, we need to set up some more notation. Let T be the first time at the end of which
all the elements are colored. Recall that at the beginning of each time step t ∈ [T ], there is some pivot
element n(t) and some set of alive elements At . Recall that Vt = span {vi : i ∈ At , i 6= n(t)}. After the
update at time t, some element is frozen. By convention, we define V0 := span {v1 , . . . , vn }. Note that for
any time t ∈ [T ], we have Vt ⊆ Vt−1 .
     We divide the time steps t ∈ [T ] into κ ∈ [T ] disjoint phases, where a phase is a maximal sequence of
time steps with the same pivot element (note that κ is a random variable). Let tkb be the time when phase
k ∈ [κ] begins and tke be the time when this phase ends. Note that n(t) = n(tkb ) for all t ∈ [tkb ,tke ] and the
pivot element n(tkb ) gets frozen after the update at time step tke . Thus, the first phase, with n(1) as pivot,

                          T HEORY OF C OMPUTING, Volume 15 (21), 2019, pp. 1–27                                 11
              N IKHIL BANSAL , DANIEL DADUSH , S HASHWAT G ARG AND S HACHAR L OVETT

begins at the beginning of time step t1b = 1 and ends at the end of time step t1e = t2b − 1. Given a time
t ∈ [T ], we define ft = min {t 0 ∈ [t] : n(t 0 ) = n(t)} to be the beginning time of the phase that t belongs to.
Note that if t ∈ [tkb ,tke ] then ft = tkb .
    We will now give a useful characterization of the discrepancy vector v⊥ (t) at any time step t ∈ [T ].
For any subspace W ⊆ Rm , let W ⊥ denote the subspace orthogonal to W and let ΠW (·) denote the
projection operator on W . For two linear subspaces W1 ,W2 satisfying W1 ⊇ W2 , we use the notation
W1 /W2 := W1 ∩W2⊥ .
Lemma 3.5. At each time step t ∈ [T ] of the algorithm the following holds:
                                                      t
                                          v⊥ (t) = ∑ ΠVi−1 /Vi (vn( ft ) ),                                 (3.3)
                                                    i= ft

where the subspaces Vi−1 /Vi are mutually orthogonal.
    Lemma 3.5 follows directly from the following useful fact.
Lemma 3.6. Let Rm ⊇ W0 ⊇ W1 ⊇ · · · ⊇ Wt , t ≥ 1, denote a non-increasing sequence of linear subspaces
and let y ∈ W0 . Then the subspaces Wi−1 /Wi for i ∈ [t] are mutually orthogonal and are also orthogonal
to Wt , and
                                                            t
                                          ΠWt⊥ (y) = ∑ ΠWi−1 /Wi (y).                                       (3.4)
                                                          i=1

Proof. We first prove orthogonality. Let Wt+1 = 0/ denote the empty subspace. Then Wt = Wt /Wt+1 . Now
for 1 ≤ i < j ≤ t + 1, we need to show that Wi−1 /Wi and W j−1 /W j are orthogonal. This follows directly
since Wi−1 /Wi is orthogonal to the subspace Wi and W j−1 /W j ⊆ W j−1 ⊆ Wi since j − 1 ≥ i.
    To prove (3.4), notice that span{Wi ∪ (Wi−1 /Wi )} = Wi−1 and thus by induction,
                                                 (               )
                                                            t+1
                                                            [
                                         W0 = span                (Wi−1 /Wi )
                                                            i=1

where each of the subspaces Wi−1 /Wi are mutually orthogonal. This implies that for y ∈ W0 ,
                                         t+1                         t
                         y = ΠW0 (y) = ∑ ΠWi−1 /Wi (y) = ∑ ΠWi−1 /Wi (y) + ΠWt (y).
                                         i=1                        i=1

The lemma follows now by observing that ΠWt⊥ (ΠWt ) = 0 and ΠWt⊥ (ΠWi−1 /Wi ) = ΠWi−1 /Wi as Wi−1 /Wi ⊆
Wt⊥ .

Proof of Lemma 3.5. First, by definition of ft , note that n(t) = n( ft ) and hence
                                      v⊥ (t) = ΠVt⊥ (vn(t) ) = ΠVt⊥ (vn( ft ) ).
     We now check that vn( ft ) ∈ V ft −1 . If ft = 1, this is immediate since V0 = span {v1 , . . . , vn } by
convention. Otherwise, since n( ft ) is both alive and not the pivot at time ft − 1 ≥ 1, we have that
n( ft ) ∈ A ft −1 ⇒ vn( ft ) ∈ V ft −1 . The main statement now follows directly by applying Lemma 3.6 on the
subspaces V ft −1 ⊇ V ft ⊇ · · · ⊇ Vt and the vector vn( ft ) .


                        T HEORY OF C OMPUTING, Volume 15 (21), 2019, pp. 1–27                                  12
                    T HE G RAM –S CHMIDT WALK : A C URE FOR THE BANASZCZYK B LUES

3.5   Discrepancy in a phase
We now bound the discrepancy incurred during any subinterval of a phase. We first need the following
simple but very useful fact.
                                                                                        q
Lemma 3.7. Let p ≤ q be any two time steps in [tkb ,tke ] during a phase k ∈ [κ]. Then ∑t=p δt ≤ 2.

Proof. At any time t ∈ [p, q], note that n(t) = n(p), i. e., the pivot remains unchanged. The color of the
pivot element n(p) at time t ∈ [p, q] is updated by δt ut (n(p)) = δt and hence
                                                                                                 q
                                                xq (n(p)) − x p−1 (n(p)) = ∑ δt .
                                                                                              t=p

As |xt (n(p))| ≤ 1 for all t ∈ [T ], we have that |xq (n(p)) − x p−1 (n(p))| ≤ 2 as needed.

Lemma 3.8. Let p ≤ q be any two time steps in [tkb ,tke ] during a phase k ∈ [κ]. The discrepancy
| discq − disc p−1 | incurred during the time interval [p, q] is at most 2kθ (k) k2 , where

                                                             θ (k) := ΠV b     /Vt e (θ ).
                                                                        t −1          k
                                                                         k


Furthermore, the subspaces Vt b −1 /Vtke for k ∈ [κ] are mutually orthogonal.
                                  k

Proof. We first prove orthogonality. Take k1 , k2 ∈ [κ], where k1 < k2 . We must show that the subspaces

                                              Vt b −1 /Vtke           and                 Vt b −1 /Vtke
                                                k1              1                           k2        2


are orthogonal. This follows since the first subspace is orthogonal to Vtke and the second subspace is
                                                                                                          1
contained in Vt b −1 ⊆ Vtke since tkb2 − 1 ≥ tke1 . We now prove the main statement. First, note that
               k2         1

                                                                              q
                                              discq − disc p−1 = ∑ δt hθ , v⊥ (t)i.                           (3.5)
                                                                             t=p

By Lemma 3.5, for t ∈ [p, q] ⊆ [tkb ,tke ], letting h := n(tkb ) = n(t) denote the pivot index, we have that
                                                                                  t
                                           v⊥ (t) := ΠVt⊥ (vh ) = ∑ ΠVi−1 /Vi (vh ).                          (3.6)
                                                                              i=tkb

Combining (3.5), (3.6) we get that

                                       q               t
            discq − disc p−1 = ∑ δt · ∑ hΠVi−1 /Vi (vh ), θ i
                                      t=p            i=tkb

                                       q                                          q
                              =       ∑ hΠVi−1 /Vi (vh ), θ i ·                   ∑          δt
                                      i=tkb                             t=max{i,p}


                       T HEORY OF C OMPUTING, Volume 15 (21), 2019, pp. 1–27                                    13
                 N IKHIL BANSAL , DANIEL DADUSH , S HASHWAT G ARG AND S HACHAR L OVETT

                                                    q                              q
                                            ≤ ∑ hΠVi−1 /Vi (vh ), θ i ·           ∑         δt
                                               i=tkb                           t=max{i,p}
                                                        q
                                            ≤ 2 ∑ hΠVi−1 /Vi (vh ), θ i                              (by Lemma 3.7)
                                                    i=tkb
                                                        q
                                            = 2 ∑ hΠVi−1 /Vi (vh ), ΠVi−1 /Vi (θ )i .                                     (3.7)
                                                    i=tkb

Now applying the Cauchy-Schwarz inequality twice, we get
          q
         ∑ hΠV /V (vh ), ΠV /V (θ )i
                     i−1         i            i−1       i
                                                                                                                          (3.8)
         i=tkb
                     q
                 ≤ ∑ kΠVi−1 /Vi (vh )k2 kΠVi−1 /Vi (θ )k2
                   i=tkb
                                                           1/2                           1/2
                           q                                          q
                 ≤  ∑ kΠVi−1 /Vi (vh )k22                     ·  ∑ kΠVi−1 /Vi (θ )k22 
                         i=tkb                                       i=tkb

                 ≤ kΠV b             /Vq (vh )k2 · kΠVt b −1 /Vq (θ )k2
                           t −1
                            k                               k

                 ≤ kΠV b             /Vq (θ )k2                                                      (since kvh k2 ≤ 1)
                           t −1
                            k
                                                                                                                      
                 ≤ kΠV b             /Vt e (θ )k2                                                     since Vq ⊇ Vtke     (3.9)
                           t −1        k
                            k


where the third inequality follows by orthogonality of the subspaces Vi−1 /Vi for all i ∈ [tkb ,tke ], and their
containment inside Vt b −1 /Vq . The lemma now follows by combining (3.7), (3.9).
                                 k


   Notice that discrepancy {disct : t ≥ 0} is a martingale. For the rest of the analysis, we will define
another closely related martingale {Yt } and show that the sub-Gaussianity of {disct } follows from the
sub-Gaussianity of {Yt }. Then, we will show the sub-Gaussianity of {Yt }.
   Define the random process {Yt : t ≥ 0} as
                                                
                                                0
                                                               if t = 0,
                                           Yt := Yt−1           if kΠV ft −1 /Vt (θ )k2 > 1/8,
                                                
                                                 Yt−1 + ∆t disc otherwise.
                                                

Note that E[Yt | Y1 , . . . ,Yt−1 ] = Yt−1 and thus {Yt } is a martingale. To prove Theorem 1.4, we will bound
the exponential moment of discn .
                                                    2
Theorem 3.9. E[ediscn ] ≤ e20kθ k2 .
                                                                          √
    Note that this directly gives that the walk is                         40 ≈ 6.32-sub-Gaussian.

                                     T HEORY OF C OMPUTING, Volume 15 (21), 2019, pp. 1–27                                  14
                   T HE G RAM –S CHMIDT WALK : A C URE FOR THE BANASZCZYK B LUES

      To control this exponential moment, we will split

                                        discn = discT = (discT −YT ) +YT ,

where (discT −YT ) will correspond to the discrepancy incurred during “bad” times and YT corresponds to
the “good” times. More precisely, call a time step t good if kΠV ft −1 /Vt (θ )k2 ≤ 1/8 and bad otherwise. Let
                                      n                                      o
                                B := t ∈ [T ] : kΠV ft −1 /Vt (θ )k2 > 1/8

denote the set of bad times. Note that

                                                 discT = YT + ∑ ∆t disc .
                                                                    t∈B

The following lemma now shows that the discrepancy incurred during the bad times can be upper bounded
deterministically.
Lemma 3.10. |∑t∈B ∆t disc| < 16kθ k22 .
Proof. For each k ∈ [κ], let Bk denote the set of bad time steps in phase k. Notice that in a phase, once a
time step becomes bad, all subsequent time steps in that phase are bad as the length of ΠV ft −1 /Vt (θ ) is a
non-decreasing function of t during a phase (as Vt+1 ⊆ Vt for every t). Thus, the set Bk , if non-empty,
forms a consecutive interval (ending at the end of phase k), and hence Lemma 3.8 can be applied to it.
   Recall that we denoted
                                  θ (k) := ΠV b /Vt e (θ )   for k ∈ [κ].
                                                         t −1   k
                                                          k

Note that the vectors θ (1) , . . . , θ (κ) are mutually orthogonal. Furthermore, by the argument above, for any
phase k ∈ [κ] with Bk 6= 0/ we have that kθ (k) k ≥ 1/8.
   The discrepancy incurred during bad times can now be bounded as follows


             ∑ ∆t disc ≤ ∑ ∑ ∆t disc
             t∈B            k∈[κ] t∈Bk

                        ≤       ∑           2kθ (k) k2                (by Lemma 3.8)
                            k∈[κ],Bk 6=0/
                                                                                                
                        ≤       ∑           16kθ (k) k22               kθ (k) k ≥ 1/8 if Bk 6= 0/
                            k∈[κ],Bk 6=0/
                                                                                                               
                        ≤ 16kθ k22                                     by orthogonality of θ (1) , . . . , θ (κ) .

3.6     Bounding the discrepancy
We now prove Theorem 3.9. To this end, we will define potential functions that capture how the variance
of Yt increases over time. For t ∈ [T ], define the subspace Rt := span{vi : i ∈ At }, i. e., the span of the
active vectors. Note that at any timestep t ∈ [T ], the discrepancy direction v⊥ (t) ∈ Rt /Vt . Consider the
potentials

                       T HEORY OF C OMPUTING, Volume 15 (21), 2019, pp. 1–27                                         15
               N IKHIL BANSAL , DANIEL DADUSH , S HASHWAT G ARG AND S HACHAR L OVETT



                             (
                              kΠVt (θ )k2 + (1 − xt−1 (n(t))2 )kΠRt /Vt (θ )k22    if t is good,
                    Φtb :=
                              kΠVt (θ )k2                                          if t is bad,

and
                             (
                              kΠVt (θ )k2 + (1 − xt (n(t))2 )kΠRt /Vt (θ )k22     if t is good,
                      Φte :=
                              kΠVt (θ )k2                                         if t is bad.

Here we think of Φtb as the potential at the beginning of time step t and Φte as the potential at the end of
time step t. By convention, we define Φe0 = Φb0 = kθ k22 .
     As by Lemma 3.10 we have deterministic control over the discrepancy incurred during the bad times,
our goal is now to control the discrepancy during the good times. Namely, we need to bound the process
Y0 ,Y1 , . . . ,YT . Here, the main idea is to charge the drop in potential to the increase in discrepancy. To this
end, we define the potential weighted discrepancy process {Zt } for t ≥ 0 as

                                                 Zt := Yt + 4Φte .

Note that Y0 = 0, Z0 = 4Φe0 = 4kθ k22 and Zt ≥ Yt for all t ≥ 0. For t ∈ [T + 1, n] we define Zt := ZT . Our
goal is to now show that the process Z0 , . . . , ZT has a strong “negative” drift.
   The increments of Zt − Zt−1 for t ∈ [T ] can be expressed as follows:

        Zt − Zt−1 = Yt −Yt−1 + 4(Φte − Φt−1
                                        e
                                            ) = (Yt −Yt−1 + 4(Φte − Φtb )) + 4(Φtb − Φt−1
                                                                                      e
                                                                                          ).                 (3.10)

We decompose this increment into a “predictable” part
                                  (
                                   δt hv⊥ (t), θ i − 4(xt (n(t))2 − xt−1 (n(t))2 )kΠRt /Vt (θ )k22 if t is good,
 Xt := Yt −Yt−1 + 4(Φte − Φtb ) =
                                   0                                                                 if t is bad,
                                  (
                                   δt hv⊥ (t), θ i − 4δt (δt + 2xt−1 (n(t)))kΠRt /Vt (θ )k22 if t is good,
                                =
                                   0                                                           if t is bad,
                                                                                                              (3.11)

over which we will be able to get stochastic control, and a “free” part

                                                  4(Φtb − Φt−1
                                                           e
                                                               )

which we show is always non-positive. The following crucial lemma shows that 4(Φtb − Φt−1
                                                                                      e ) indeed

gives us a “free drop” in potential.

Lemma 3.11. For all t ∈ [T ], Φtb ≤ Φt−1
                                     e . Hence, the increments satisfy Z − Z
                                                                        t   t−1 ≤ Xt for all t ∈ [T ].

Proof. For t = 1, the statement is trivial since Φe0 = kθ k22 and clearly Φtb ≤ kθ k22 for all t. Thus, we may
assume t ≥ 2. If t is bad, then using Vt ⊆ Vt−1 we get Φtb = kΠVt (θ )k22 ≤ kΠVt−1 (θ )k22 ≤ Φt−1   e . So, we

may assume from now on that t is good.

                        T HEORY OF C OMPUTING, Volume 15 (21), 2019, pp. 1–27                                    16
                    T HE G RAM –S CHMIDT WALK : A C URE FOR THE BANASZCZYK B LUES

    If the time step t is the beginning of a new phase, then
                                 Φtb = kΠVt (θ )k22 + (1 − xt−1 (n(t))2 )kΠRt /Vt (θ )k22
                                     ≤ kΠVt (θ )k22 + kΠRt /Vt (θ )k22
                                     = kΠRt (θ )k22 ≤ kΠVt−1 (θ )k22 ≤ Φt−1
                                                                        e
                                                                            .
Here the inequality kΠRt (θ )k2 ≤ kΠVt−1 (θ )k2 follows by our assumption that t is the beginning of a new
phase and hence At ⊆ At−1 \ {n(t − 1)} i. e., Rt ⊆ Vt−1 .
    Lastly, if t is good and not the beginning of a new phase, then note that t − 1 is also good and that
n(t) = n(t − 1). Thus
    Φtb = kΠVt (θ )k22 + (1 − xt−1 (n(t))2 )kΠRt /Vt (θ )k22
        ≤ kΠVt (θ )k22 + (1 − xt−1 (n(t))2 )kΠRt−1 /Vt (θ )k22                                     (using Rt ⊆ Rt−1 )
        = kΠVt (θ )k22 + (1 − xt−1 (n(t))2 ) kΠVt−1 /Vt (θ )k22 + kΠRt−1 /Vt−1 (θ )k2        2
                                                                                               

        ≤ kΠVt (θ )k22 + kΠVt−1 /Vt (θ )k22 + (1 − xt−1 (n(t))2 )kΠRt−1 /Vt−1 (θ )k22
                                           

        = kΠVt−1 (θ )k22 + (1 − xt−1 (n(t))2 )kΠRt−1 /Vt−1 (θ )k22 = Φt−1
                                                                      e


as needed.

    We now show the increment bounds Xt satisfy the negative drift requirements of Lemma 3.2.
Lemma 3.12. For t ∈ [T ], |Xt | ≤ 1 and E[Xt + Xt2 | Z1 , . . . , Zt−1 ] ≤ 0.
Proof. Clearly, we may assume that t is good, since otherwise Xt = 0 and the statement is trivial. For
simplicity of notation, let Ωt−1 denote all the random choices made by the algorithm in the first t − 1 time
steps. Note that Ωt−1 determines in particular Z1 , . . . , Zt−1 . We adopt the shorthand Et−1 [·] := E[· | Ωt−1 ].
Let us further denote θt := hv⊥ (t), θ i, θ̄t = kΠRt /Vt (θ )k2 , and x := xt−1 (n(t)). With this notation, we
have that
                                        Xt = δt θt − 4δt (δt + 2x)θ̄t2 .
Since v⊥ (t) ∈ Rt /Vt , note that |θt | ≤ kv⊥ (t)k2 θ̄t ≤ θ̄t . By definition of t being good, we have that θ̄t ≤ 1/8.
Since x ∈ [−1, 1] and x + δt ∈ [−1, 1], we deduce the following simple bounds
              |δt | ≤ 2,     |δt (δt + 2x)| = |(δt + x)2 − x2 | ≤ 1,      |δt + 2x| = |(δt + x) + x| ≤ 2.          (3.12)
    Using (3.12), we see that
                           |Xt | ≤ |δt ||θt | + 4|δt (δt + 2x)|θ̄t2 ≤ 2(1/8) + 4(1)(1/82 ) ≤ 1.
Next, we have that Et−1 [Xt ] = −4Et−1 [δt2 ]θ̄t2 since Et−1 [δt ] = 0. Lastly, using the inequality (a + b)2 ≤
2a2 + 2b2 , we get that
                                   Xt2 ≤ 2δt2 θt2 + 2(16)δt2 (δt + 2x)2 θ̄t4
                                       ≤ 2δt2 θ̄t2 + 2(16)δt2 (4)(1/8)2 θ̄t2 = 4δt2 θ̄t2 .
Thus Et−1 [Xt2 ] ≤ 4Et−1 [δt2 ]θ̄t2 and hence Et−1 [Xt + Xt2 ] ≤ 0. As Z1 , . . . , Zt−1 are determined by Ωt−1 , this
implies that E[Xt + Xt2 | Z1 , . . . , Zt−1 ] ≤ 0, as needed.

                           T HEORY OF C OMPUTING, Volume 15 (21), 2019, pp. 1–27                                        17
              N IKHIL BANSAL , DANIEL DADUSH , S HASHWAT G ARG AND S HACHAR L OVETT

      We now prove the main moment bound.

Proof of Theorem 3.9. To begin we see that
                              h                    i
                E ediscn ≤ E eYT +| ∑t∈B ∆t disc |
                       
                              h          2
                                           i
                          ≤ E eYT +16kθ k2                                (by Lemma 3.10)
                                           2
                          ≤ E eZn e16kθ k2
                               
                                                                          (as YT ≤ ZT = Zn ) .
                       2
                4kθ k22. Furthermore, by Lemma 3.11 we have Zt − Zt−1 ≤ Xt , by Lemma 3.12 we have
Recall that Z0 =
|Xt | ≤ 1 and E Xt + Xt | Z1 , . . . , Zt−1 ≤ 0 for all t ∈ [T ], and for t ∈ [T + 1, n] we have Zt − Zt−1 = 0 by
definition. Therefore, applying Lemma 3.2 gives that
                                                               2
                                           E eZn ≤ eZ0 = e4kθ k2 .
                                             

                                                                    2          2
Thus, combining everything together, we get E ediscn ≤ E eZn e16kθ k2 ≤ e20kθ k2 as needed.
                                                       



4     Applications
In this section, we list some applications of our main result.

4.1    The ` p variant of the Komlós problem
Corollary 1.5 (restated). There is an efficient randomized algorithm which given an m × n matrix A
having all columns of `2 -norm at most one and p ∈ [1, ∞), finds a coloring x ∈ {−1, 1}n with expected ` p
               √
discrepancy O( p).

Proof. Let Y = Ax. By Theorem 1.4, we get that Y is a σ -sub-Gaussian random vector and hence each
component of Y is a σ -sub-Gaussian random variable. Letting Yi denote the i-th component of Y , we get
                                                   m
                                     E[kY k pp ] = ∑ E[|Yi | p ] ≤ mC p p p/2
                                                  i=1

for a constant C = C(σ ) depending only on σ . The inequality above follows from the standard fact that
the p-th moment of an O(1)-sub-Gaussian random variable is at most C p p p/2 .

4.2    Discrepancy relative to the γ2 -norm
Corollary 1.7 (restated). There exists an efficient randomized algorithm that, given any m × n matrix A
and J ⊆ [n], returns a coloring x : J → {−1, 1} such that with constant probability,
                                                          p
                                       kA|J xk∞ = O(γ2 (A) log m).



                       T HEORY OF C OMPUTING, Volume 15 (21), 2019, pp. 1–27                                  18
                   T HE G RAM –S CHMIDT WALK : A C URE FOR THE BANASZCZYK B LUES

Proof. It was shown in [28] that for any matrix A, there exists an efficiently computable quantity γ2 (A)
such that                                                            p
                         Ω(γ2 (A)/ log m) ≤ herdisc(A) ≤ O(γ2 (A) log m).
The upper bound above was proved using Theorem 1.1 (roughly) as follows: given any set J ⊆ [n],
factorize the matrix A|J as A|J = BC where every row of the matrix B has `2 -norm at most γ2 (A) and every
column of the matrix C has `2 -norm at most one. Such a factorization exists and can be computed by
solving an appropriate semidefinite program. We refer the reader to [28] for more details.
    Then using the matrix B, a convex body K is defined as follows:
                                                               p
                                 K = {y ∈ Rm : kByk∞ ≤ cγ2 (A) log m}.

For c a large enough constant, γm (K) ≥ 1/2. This follows by standard Gaussian tail bounds and union
bound. Now finding a coloring x ∈ {−1, 1}n of the columns of A|J such that the `∞ -norm of A|J x = BCx
           √
is O(γ2 (A) log m) is equivalent to finding a coloring x of the column vectors of C such that Cx lies in
K. As C has all columns of length at most one, by Theorem 1.1 there exists a coloring x such that the
                              √
discrepancy of A|J is O(γ2 (A) log m).
    Now Theorem 1.4 (with Theorem 1.2) directly gives an efficient algorithm to find such a coloring.

4.3   A generalization of Banaszczyk’s result
In this subsection we prove the following generalization of Banaszczyk’s result. The proof follows
along similar lines as the proof of [24] showing that linear discrepancy is at most twice the hereditary
discrepancy.

Theorem 1.8 (restated). Let S1 , S2 , . . . , Sn be sets such that for each i ∈ [n], Si ⊆ Bm
                                                                                           2 and 0 lies in the
convex hull of Si . Then for any convex body K with γm (K) ≥ 1/2, there exist vectors vi ∈ Si such that
∑ni=1 vi ∈ cK, where c > 0 is an absolute constant. Moreover, there is an efficient algorithm to find these
vectors.

Proof. For technical reasons, we give the proof for the case when γm (K) ≥ 1/2 + ε for some ε > 0. The
running time of the algorithm will be proportional to log(1/ε). The general case, for instance when
γm (K) = 1/2, is slightly more complicated but can be handled by combining our proof with the techniques
in [15]. We provide a sketch of how to do this towards the end of the proof.
    For each i ∈ [n], as 0 lies in the convex hull of Si , there exist at most m + 1 vectors in Si and a convex
combination of them that equals 0. That is, there exist m + 1 vectors vi, j ∈ Si and real numbers xi, j ≥ 0
such that
                                          m+1                    m+1
                             ∀i ∈ [n] :   ∑ xi, j = 1    and     ∑ xi, j vi, j = 0.                      (4.1)
                                          j=1                    j=1

Our goal will be to round each collection {xi, j : j ∈ [m + 1]} such that exactly one of them is 1 and the
rest are 0, without incurring too much discrepancy.
     Assume for now that each of xi, j can be expressed using at most k digits in binary, for some finite
k; that is, xi, j ∈ 2−k Z for all i, j. We will see later how to get rid of this assumption. The main step will

                       T HEORY OF C OMPUTING, Volume 15 (21), 2019, pp. 1–27                                19
                 N IKHIL BANSAL , DANIEL DADUSH , S HASHWAT G ARG AND S HACHAR L OVETT

to be to reduce k to k − 1 by rounding the k-th bits in each xi, j either up or down, in such a way that
discrepancy stays bounded. Then, we will repeat this operation until we obtain k = 0, which means that
xi, j ∈ {0, 1}. We will maintain the property that xi, j ≥ 0 and that ∑ j xi, j = 1 for all i, which then implies
that there exists ji ∈ [m + 1] for each i such that xi, ji = 1 and xi, j = 0 if j 6= ji . The claim will follow by
choosing vi, ji ∈ Si .
                             (k)
      Setting notation, let xi, j = xi, j . Assume that we already computed for ` ∈ [k] a choice of

                                                                            (`)
                                                                        xi, j ∈ 2−` Z

                     (`)                                       (`)
that satisfies ∑ j xi, j = 1 for all i ∈ [n]. If xi, j ∈ {0, 1} for all j then there is nothing more to do. Otherwise,
let
                                                 (`)       (`) (`)               (`)
                                                xi, j = 0.bi, j,1 bi, j,2 . . . bi, j,`
                                         (`)
be the binary expansion of xi, j < 1. Let

                                                                                              (`)
                                               A(`) = {(i, j) ∈ [n] × [m + 1] : bi, j,` = 1}

be the indices where the `-th bit of xi, j is 1. We will construct a coloring χ (`) : A(`) → {−1, 1} which
satisfies
                    ∑ χ (`) (i, j)vi, j ∈ cK and ∀i ∈ [n], ∑ χ (`) (i, j) = 0                         (4.2)
                           (i, j)∈A(`)                                                         j:(i, j)∈A(`)

                                                                                                                           (`−1)
for some absolute constant c > 0. Given such a coloring χ (`) , we compute the value of xi, j                                      as follows:
                                           (`)                                    (`)
                                          xi, j
                                                                           if bi, j,` = 0,
                                    (`−1)
                                   xi, j = xi,(`)j + 2−`                           (`)
                                                                            if bi, j,` = 1 and χ (`) (i, j) = 1,
                                           (`)
                                                                                  (`)
                                           xi, j − 2−`                      if bi, j,` = 1 and χ (`) (i, j) = −1.

                                                                                  (`)         (`−1)
Observe that this operation zeros the `-th bit of all xi, j , namely xi, j                            ∈ 2−(`−1) Z; it maintains the property
                                (`−1)                (`−1)
that for all i ∈ [n], ∑ j xi, j          = 1 and xi, j           ≥ 0; and it satisfies that
                                                                                  
                                                                (`−1)          (`)
                                                    ∑ xi, j                 − xi, j vi, j ∈ 2−` cK.
                                                    i, j

Thus, repeating this rounding operation for ` = k, k − 1, . . . , 1 will result in a choice of xi, j ∈ {0, 1} that
satisfy ∑ j xi, j = 1 and
                                            ∑ xi, j vi, j ∈ cK.
                                                                     i, j

That is, there is a choice of ji ∈ [m + 1] for all i ∈ [n] such that ∑i, j vi, ji ∈ cK, as claimed.
                                                                                                    (`)
    It remains to show how to find a coloring satisfying (4.2). Let Ai = { j : (i, j) ∈ A(`) } be the elements
                                                                               (`)
being colored in the set Si when we round the `-th bits. We claim that |Ai | must be even. This is since
                                           (`)             (`)                            (`)
for every i ∈ [n], the number of elements xi, j for which bi, j,` = 1 must be even, as ∑ xi, j = 1. We will pair

                               T HEORY OF C OMPUTING, Volume 15 (21), 2019, pp. 1–27                                                       20
                    T HE G RAM –S CHMIDT WALK : A C URE FOR THE BANASZCZYK B LUES

                       (`)
up the elements of Ai in an arbitrary way, and only consider colorings χ (`) which give opposite colors
to elements in each pair. In such a way, such a coloring automatically satisfies that ∑ j χ (`) (i, j) = 0 for
all i ∈ [n].
                                                                     (`)
     For simplicity of notation, denote the vectors {vi, j : j ∈ Ai } by {ui,1 , . . . , ui,2qi } for some integer
         (`)
qi = |Ai |/2. Define new vectors wi, j = (ui,2 j−1 − ui,2 j ) /2 for j ∈ [qi ]. Observe that kwi, j k2 ≤ 1. Apply
Theorem 1.1 to the vectors wi, j and the convex body K. This gives a coloring χ 0 (i, j) for each vector wi, j
such that
                                           ∑ χ 0 (i, j)wi, j ∈ cK.
                                                   i, j

We now define the coloring χ (`) to give the color χ 0 (i, j) to ui,2 j−1 and the color −χ 0 (i, j) to ui,2 j . Clearly
this satisfies both the conditions in (4.2) with constant 2c.
    We will now show that we can assume the binary expansion to be finite. Concretely, we will show
that a preliminary rounding procedure can allow us to assume that k ≤ log(2mn/ε). This is since by
truncating each xi, j after log(2mn/ε) bits, the sum ∑i, j xi, j vi, j changes by at most

                                                          1
                                           ∑ 2log(2mn/ε) vi, j ∈ εBm2 ⊆ K
                                           i, j

and thus increases the final value of c by at most 1. The last containment follows by our assumption
that γm (K) ≥ 1/2 + ε, and thus K must contain a Euclidean ball of radius r which satisfies γ1 ([0, r]) ≤ ε.
Clearly this is true for r = ε.
    We mention briefly now on how to tackle the case when γm (K) < 1/2 + ε. This proceeds along
similar lines as Theorem 40 from [15]. The main idea is that we can find a point p such that
                                                             !
                                                  p∈K∩        ∑ conv(Si )
                                                              i

where Si denotes the convex hull of Si and the summation operator used is Minkowski addition. We
then instead solve a new problem on the instance given by the convex body K 0 := α(K − p) and sets
Si0 such that ∑i conv(Si0 ) = α(∑i conv(Si ) − p) for some constant scaling factor α > 0. A solution of
our original problem is recoverable from a solution of this. p and α moreover satisfy the property that
γm (K 0 ) ≥ 1/2 + ε, and we already know how to solve this case.


5    Conclusion and open questions
We gave efficient algorithms for several problems to find colorings with discrepancy bounds similar to
those achievable using Banaszczyk’s result, Theorem 1.1. However there are still some problems that
use Banaszczyk’s techniques in a non-trivial iterative way, for which we are unable to obtain an efficient
algorithm.
    One such problem is Tusnády’s problem concerning the discrepancy of axis-parallel boxes in Rd .
Nikolov [31] used Banaszczyk’s technique to prove that the discrepancy is Od (logd−1/2 n), where Od (.)

                         T HEORY OF C OMPUTING, Volume 15 (21), 2019, pp. 1–27                                      21
              N IKHIL BANSAL , DANIEL DADUSH , S HASHWAT G ARG AND S HACHAR L OVETT

hides factors depending only on d. Our techniques do not seem to apply here and the best known
algorithmic bound is Od (logd n) [7].
     Another such problem is the Steinitz problem in the `2 -norm. Here we are given n vectors v1 , . . . , vn ∈
Bm2 such that ∑i vi = 0, and the goal is to find a rearrangement of these vectors such that the `2 -norm of
the sum of vectors in any prefix along the rearrangement is small. That is, we want to find a permutation
π : [n] → [n] to minimize
                                                       k
                                              max      ∑ vπ(i)       .
                                             k=1,...,ni=1        2

Banaszczyk showed in [2], using techniques from [1], that there exists a permutation for which the above
                        √     √                                                           √
expression is at most O( m + log n), whereas the best known algorithmic bound is O( m log n) [7].

Acknowledgments
The authors would like to thank Aleksandar Nikolov for several discussions related to the topic of this
paper. N. B. and S. L. would like to thank the Dagstuhl seminar “Computational Complexity of Discrete
Problems” where this work was initiated. We would also like to thank the referees for various comments
that improved the presentation of the paper.


References
 [1] W OJCIECH BANASZCZYK: Balancing vectors and Gaussian measures of n-dimensional con-
     vex bodies. Random Structures Algorithms, 12(4):351–360, 1998. [doi:10.1002/(SICI)1098-
     2418(199807)12:4<351::AID-RSA3>3.0.CO;2-S] 3, 5, 22

 [2] W OJCIECH BANASZCZYK: On series of signed vectors and their rearrangements. Random Struc-
     tures Algorithms, 40(3):301–316, 2012. [doi:10.1002/rsa.20373] 3, 22

 [3] N IKHIL BANSAL: Constructive algorithms for discrepancy minimization. In Proc. 51st FOCS, pp.
     3–10. IEEE Comp. Soc. Press, 2010. [doi:10.1109/FOCS.2010.7, arXiv:1002.2259] 2

 [4] N IKHIL BANSAL , M OSES C HARIKAR , R AVISHANKAR K RISHNASWAMY, AND S HI L I: Bet-
     ter algorithms and hardness for broadcast scheduling via a discrepancy approach. In Proc.
     25th Ann. ACM-SIAM Symp. on Discrete Algorithms (SODA’14), pp. 55–71. SIAM, 2014.
     [doi:10.1137/1.9781611973402.5] 2

 [5] N IKHIL BANSAL , DANIEL DADUSH , AND S HASHWAT G ARG: An algorithm for Komlós conjecture
     matching Banaszczyk’s bound. SIAM J. Comput., 48(2):534–553, 2019. Preliminary version in
     FOCS’16. [doi:https://doi.org/10.1137/17M1126795] 3, 5

 [6] N IKHIL BANSAL , DANIEL DADUSH , S HASHWAT G ARG , AND S HACHAR L OVETT: The Gram–
     Schmidt walk: A cure for the Banaszczyk blues. In Proc. 50th STOC, pp. 587–597. ACM Press,
     2018. [doi:10.1145/3188745.3188850]

                       T HEORY OF C OMPUTING, Volume 15 (21), 2019, pp. 1–27                                 22
                 T HE G RAM –S CHMIDT WALK : A C URE FOR THE BANASZCZYK B LUES

 [7] N IKHIL BANSAL AND S HASHWAT G ARG: Algorithmic discrepancy beyond partial color-
     ing. In Proc. 49th STOC, pp. 914–926. ACM Press, 2017. [doi:10.1145/3055399.3055490,
     arXiv:1611.01805] 3, 5, 22

 [8] N IKHIL BANSAL AND V ISWANATH NAGARAJAN: Approximation-friendly discrepancy rounding.
     In Proc. 18th Ann. Conf. on Integer Programming and Combinatorial Optimization (IPCO ’16), pp.
     375–386. Springer, 2016. [doi:10.1007/978-3-319-33461-5_31, arXiv:1512.02254] 2

 [9] I MRE B ÁRÁNY: On the power of linear dependencies. In Building Bridges (M. Grötschel and
     G. O. H. Katona, eds.), volume 19 of Bolyai Soc. Math. Studies, pp. 31–45. Springer, 2008.
     [doi:10.1007/978-3-540-85221-6_1] 2, 6

[10] F RANCK BARTHE , O LIVIER G UÉDON , S HAHAR M ENDELSON , AND A SSAF NAOR: A
     probabilistic approach to the geometry of the `np -ball. Ann. Probab., 33(2):480–513, 2005.
     [doi:10.1214/009117904000000874] 3

[11] J ÓZSEF B ECK: Roth’s estimate of the discrepancy of integer sequences is nearly sharp. Combina-
     torica, 1(4):319–325, 1981. [doi:10.1007/BF02579452] 2

[12] J ÓZSEF B ECK AND T IBOR F IALA: “Integer-making” theorems. Discr. Appl. Math., 3(1):1–8, 1981.
     [doi:10.1016/0166-218X(81)90022-6] 2, 3

[13] M OSES C HARIKAR , A LANTHA N EWMAN , AND A LEKSANDAR N IKOLOV: Tight hardness re-
     sults for minimizing discrepancy. In Proc. 22nd Ann. ACM-SIAM Symp. on Discrete Algorithms
     (SODA’11), pp. 1607–1614. SIAM, 2011. [doi:10.1137/1.9781611973082.124] 6

[14] B ERNARD C HAZELLE: The Discrepancy Method: Randomness and Complexity. Cambridge Univ.
     Press, 2000. [doi:10.1017/CBO9780511626371] 2

[15] DANIEL DADUSH , S HASHWAT G ARG , S HACHAR L OVETT, AND A LEKSANDAR N IKOLOV:
     Towards a constructive version of Banaszczyk’s vector balancing theorem. Theory of Computing,
     15(15):1–58, 2019. Preliminary version in RANDOM’16. [doi:10.4086/toc.2019.v015a015] 3, 4, 5,
     19, 21

[16] RONEN E LDAN AND M OHIT S INGH: Efficient algorithms for discrepancy minimization in convex
     sets. Random Structures Algorithms, 53(2):289–307, 2018. [doi:10.1002/rsa.20763] 2

[17] DAVID A. F REEDMAN: On tail probabilities for martingales. Ann. Probab., 3(1):100–118, 1975.
     [doi:10.1214/aop/1176996452] 8

[18] R AJIV G ANDHI , S AMIR K HULLER , S RINIVASAN PARTHASARATHY, AND A RAVIND S RINI -
     VASAN: Dependent rounding and its applications to approximation algorithms. J. ACM, 53(3):324–
     360, 2006. Preliminary version in FOCS’02. [doi:10.1145/1147954.1147956] 8

[19] E FIM DAVYDOVICH G LUSKIN: Extremal properties of orthogonal parallelepipeds and their
     applications to the geometry of Banach spaces. Sbornik: Mathematics, 64(1):85–96, 1989.
     [doi:10.1070/SM1989v064n01ABEH003295] 2

                     T HEORY OF C OMPUTING, Volume 15 (21), 2019, pp. 1–27                        23
             N IKHIL BANSAL , DANIEL DADUSH , S HASHWAT G ARG AND S HACHAR L OVETT

[20] N ICHOLAS J. A. H ARVEY, ROY S CHWARTZ , AND M OHIT S INGH: Discrepancy without partial
     colorings. In Proc. 17th Internat. Workshop on Approximation Algorithms for Combinatorial
     Optimization Problems (APPROX’14), volume 28 of LIPIcs, pp. 258–273. Schloss Dagstuhl–
     Leibniz-Zentrum fuer Informatik, 2014. [doi:10.4230/LIPIcs.APPROX-RANDOM.2014.258] 2

[21] K ASPER G REEN L ARSEN: On range searching in the group model and combinatorial discrepancy.
     SIAM J. Comput., 43(2):673–686, 2014. Preliminary version in FOCS’11. [doi:10.1137/120865240]
     3

[22] L AP -C HI L AU , R. R AVI , AND M OHIT S INGH: Iterative Methods in Combinatorial Optimization.
     Cambridge Univ. Press, 2011. [doi:10.1017/CBO9780511977152] 2

[23] AVI L EVY, H ARISHCHANDRA R AMADAS , AND T HOMAS ROTHVOSS: Deterministic discrep-
     ancy minimization via the multiplicative weight update method. In Proc. 19th Ann. Conf. on
     Integer Programming and Combinatorial Optimization (IPCO ’17), pp. 380–391. Springer, 2017.
     [doi:10.1007/978-3-319-59250-3_31, arXiv:1611.08752] 3, 5

[24] L ÁSZLÓ L OVÁSZ , J OEL H. S PENCER , AND K ATALIN V ESZTERGOMBI: Discrepancy of set-
     systems and matrices. European J. Combinatorics, 7(2):151–160, 1986. [doi:10.1016/S0195-
     6698(86)80041-5] 19

[25] S HACHAR L OVETT AND R AGHU M EKA: Constructive discrepancy minimization by walking
     on the edges. SIAM J. Comput., 44(5):1573–1582, 2015. Preliminary version in FOCS’12.
     [doi:10.1137/130929400, arXiv:1203.5747] 2

[26] J I ŘÍ M ATOUŠEK: An L p version of the Beck-Fiala conjecture. European J. Combinatorics,
     19(2):175–182, 1998. [doi:10.1006/eujc.1997.0162] 5

[27] J I ŘÍ M ATOUŠEK: Geometric Discrepancy: An Illustrated Guide. Springer, 2009. [doi:10.1007/978-
     3-642-03942-3] 2

[28] J I ŘÍ M ATOUŠEK , A LEKSANDAR N IKOLOV, AND K UNAL TALWAR: Factorization norms and hered-
     itary discrepancy. Internat. Mathematics Research Notices, rny033, 2018. [doi:10.1093/imrn/rny033,
     arXiv:1408.1376] 3, 6, 19

[29] A LEKSANDAR N IKOLOV:          The Komlós conjecture holds for vector colorings, 2013.
     [arXiv:1301.4039] 5

[30] A LEKSANDAR N IKOLOV: New Computational Aspects of Discrepancy Theory. Ph. D. thesis,
     Rutgers University, 2014. [doi:10.7282/T3RN3749] 2, 3

[31] A LEKSANDAR N IKOLOV: Tighter bounds for the discrepancy of boxes and polytopes. Mathematika,
     63(3):1091–1113, 2017. [doi:10.1112/S0025579317000250, arXiv:1701.05532] 3, 21

[32] A LEKSANDAR N IKOLOV, K UNAL TALWAR , AND L I Z HANG: The geometry of differential privacy:
     The small database and approximate cases. SIAM J. Comput., 45(2):575–616, 2016. Preliminary
     version in STOC’13. [doi:10.1137/130938943, arXiv:1212.0297] 2

                     T HEORY OF C OMPUTING, Volume 15 (21), 2019, pp. 1–27                          24
                 T HE G RAM –S CHMIDT WALK : A C URE FOR THE BANASZCZYK B LUES

[33] T HOMAS ROTHVOSS: Better bin packing approximations via discrepancy theory. SIAM J. Comput.,
     45(3):930–946, 2016. Preliminary version in FOCS’13. [doi:10.1137/140955367, arXiv:1301.4010]
     2

[34] T HOMAS ROTHVOSS: Constructive discrepancy minimization for convex sets. SIAM J. Comput.,
     46(1):224–234, 2017. Preliminary version in FOCS’14. [doi:10.1137/141000282, arXiv:1404.0339]
     2, 3

[35] J OEL H. S PENCER: Six standard deviations suffice. Trans. Amer. Math. Soc., 289(2):679–706,
     1985. [doi:10.1090/S0002-9947-1985-0784009-0] 2

[36] J OEL H. S PENCER:      Ten Lectures on the Probabilistic Method.                SIAM, 1987.
     [doi:10.1137/1.9781611970074] 5

[37] A RAVIND S RINIVASAN: Distributions on level-sets with applications to approximation algorithms.
     In Proc. 42nd FOCS, pp. 588–597. IEEE Comp. Soc. Press, 2001. [doi:10.1109/SFCS.2001.959935]
     8

[38] M ICHEL TALAGRAND: The Generic Chaining: Upper and Lower Bounds of Stochastic Processes.
     Springer, 2005. [doi:10.1007/3-540-27499-5] 4


AUTHORS

      Nikhil Bansal
      Researcher
      Centrum voor Wiskunde en Informatica
      Amsterdam, the Netherlands
      bansal gmail com
      http://www.win.tue.nl/~nikhil


      Daniel Dadush
      Researcher
      Centrum voor Wiskunde en Informatica
      Amsterdam, the Netherlands
      dndadush gmail com
      https://homepages.cwi.nl/~dadush/




                     T HEORY OF C OMPUTING, Volume 15 (21), 2019, pp. 1–27                        25
           N IKHIL BANSAL , DANIEL DADUSH , S HASHWAT G ARG AND S HACHAR L OVETT

    Shashwat Garg
    Quantitative researcher
    WorldQuant LLC
    Budapest, Hungary
    garg.shashwat gmail com
    https://www.win.tue.nl/~sgarg/


    Shachar Lovett
    Associate professor
    University of California, San Diego
    slovett cse ucsd edu
    http://cseweb.ucsd.edu/~slovett


ABOUT THE AUTHORS

    N IKHIL BANSAL is a researcher at the Centrum voor Wiskunde en Informatica, Amsterdam.
        He attended the Indian Institute of Technology, Mumbai for his B. Tech. degree, and
        received his Ph. D. from Carnegie Mellon University, where he was advised by Avrim
        Blum.
       He got fascinated by algorithms while taking an undergraduate course by Prof. Ajit
       A. Diwan. Since then he has enjoyed thinking about various kinds of algorithmic
       questions.


    DANIEL DADUSH is a tenured researcher at the Centrum voor Wiskunde en Informatica,
      Amsterdam. He earned his Ph. D. in algorithms, combinatorics and optimization (ACO)
      from Georgia Tech in 2012, where his advisor was Santosh Vempala. Before joining
      CWI, he spent two years as a Simons postdoctoral fellow in the Computer Science
      Department at New York University. His research has focused on algorithms for lattice
      problems, integer programming, and high-dimensional convex geometry. He lives and
      works in Amsterdam, and is glad that, thus far, the city remains above water. He enjoys
      reading the New York Times, listening to NPR, and taking long bike rides along the
      canals when it is not raining.


    S HASHWAT G ARG is a Quantitative Researcher at WorldQuant LLC. He got his Ph. D. in
       algorithms from Eindhoven University of Technology in 2019 where his advisor was
       Nikhil Bansal. His research focused on algorithms for combinatorial discrepancy and
       approximation algorithms for scheduling problems. He lives in Budapest, Hungary. He
       enjoys reading anything under the sun, exploring new cafés and painting.




                   T HEORY OF C OMPUTING, Volume 15 (21), 2019, pp. 1–27                        26
           T HE G RAM –S CHMIDT WALK : A C URE FOR THE BANASZCZYK B LUES

S HACHAR L OVETT graduated from the Weizmann Institute of Science in 2010; his advisors
   were Omer Reingold and Ran Raz. He was a member of the Institute for Advanced Study,
   School of Mathematics between 2010-2012. Since then, he has been a faculty member at
   the University of California, San Diego. He is interested in the role that structure and
   randomness play in computation and mathematics, and in particular in computational
   complexity, coding theory, pseudorandomness, and algebraic constructions.




               T HEORY OF C OMPUTING, Volume 15 (21), 2019, pp. 1–27                          27