                           THEORY OF COMPUTING, Volume 20 (6), 2024, pp. 1–23
                                        www.theoryofcomputing.org




        On a Generalization of Iterated and
             Randomized Rounding
                                                    Nikhil Bansal*
               Received April 24, 2019; Revised November 18, 2022; Published December 5, 2024




       Abstract. We give a general method for rounding linear programs that combines
       the commonly used iterated rounding and randomized rounding techniques. In
       particular, we show that whenever iterated rounding can be applied to a problem with
       some slack, there is a randomized procedure that returns an integral solution that
       satisfies the guarantees of iterated rounding and also has concentration properties.
       We use this to give new results for several classical problems such as rounding
       column-sparse LPs, makespan minimization on unrelated machines and degree-
       bounded spanning trees.

     An extended abstract of this paper appeared in the Proceedings of the 51st Annual ACM Symposium on Theory
of Computing (STOC'19) [7].
   * Supported by the ERC Consolidator Grant 617951 and the NWO VICI grant 639.023.812.

ACM Classification: F.2.2, G.1.6
AMS Classification: 68W25, 68W20
Key words and phrases: approximation, randomized rounding, scheduling, discrepancy,
semidefinite programming, random walks

© 2024 Nikhil Bansal
Licensed under a Creative Commons Attribution License (CC-BY)       DOI: 10.4086/toc.2024.v020a006


1     Introduction

A powerful approach in approximation algorithms is to formulate the problem at hand as a
0-1 integer program and consider some efficiently solvable relaxation for it. Then, given some
fractional solution $x \in [0,1]^n$ to this relaxation, apply a suitable rounding procedure to $x$ to
obtain an integral 0-1 solution. Arguably, the two most basic and extensively studied techniques
for rounding such relaxations are randomized rounding and iterated rounding.

Randomized rounding. Here, the fractional values $x_i \in [0,1]$ are interpreted as probabilities,
and used to round the variables independently to 0 or 1. A key property of this rounding is that
each linear constraint is preserved in expectation, and its value is tightly concentrated around
its mean, as given by Chernoff bounds or, more generally, Bernstein's inequality. Randomized
rounding is well-suited to problems where the constraints do not have much structure, or when
they are soft and some error can be tolerated. Sometimes these errors can be fixed by applying
problem-specific alteration steps. We refer to [36, 35] for various applications of randomized
rounding.
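
As a concrete illustration, here is a minimal Python sketch of this procedure (the function name and interface are our own, not from the paper):

```python
import numpy as np

def randomized_rounding(x, rng=np.random.default_rng()):
    """Round each coordinate to 1 independently with probability x_i.
    Every linear function <a, X> then has mean <a, x> and is
    concentrated around it as in Bernstein's inequality (1.1) below."""
    x = np.asarray(x, dtype=float)
    return (rng.random(x.shape) < x).astype(int)
```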

Iterated rounding. This technique is quite different from randomized rounding and is useful
for problems that may have some hard combinatorial constraints that must be maintained, e. g.,
if the final solution must be a spanning tree or a matching. It is also useful for problems where
the constraints may have some other interesting structural property such as column-sparsity
that we may wish to exploit.
     The rounding is based on linear algebra and it proceeds in several iterations $k = 1, 2, \ldots$,
until all variables are eventually rounded to 0 or 1. More specifically, we start with $x^{(0)} = x$
initially, and let $x^{(k-1)} \in \mathbb{R}^n$ be the solution at the beginning of iteration $k$ and $n_k$ denote the
number of fractional variables in $x^{(k-1)}$ (i. e., those strictly between 0 and 1). Then one cleverly
chooses some collection of linear constraints on these $n_k$ fractional variables, say specified by
rows of the matrix $W^{(k)}$ with $\mathrm{rank}(W^{(k)}) \le n_k - 1$, and updates the solution as $x^{(k)} = x^{(k-1)} + y^{(k)}$
for some non-zero vector $y^{(k)}$ satisfying $W^{(k)} y^{(k)} = 0$, so that some fractional variable reaches 0
or 1. Such a $y^{(k)}$ exists as the null space of $W^{(k)}$ has dimension $n_k - \mathrm{rank}(W^{(k)}) \ge 1$. Notice that
once a variable reaches 0 or 1 it stays fixed.
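
To make the linear-algebraic step concrete, the following Python sketch performs a single such iteration, assuming for simplicity that all coordinates of $x$ are still fractional (the helper is our illustration, not code from the paper):

```python
import numpy as np
from scipy.linalg import null_space

def iterated_rounding_step(x, W):
    """Move x along a non-zero null-space direction y of W until some
    coordinate reaches 0 or 1; assumes rank(W) <= len(x) - 1 so that
    such a y exists, and that every coordinate of x is fractional."""
    y = null_space(W)[:, 0]                     # non-zero y with W y = 0
    pos, neg = y > 0, y < 0
    # largest t >= 0 keeping x + t*y inside [0, 1]^n
    t = min(np.min((1 - x[pos]) / y[pos], initial=np.inf),
            np.min(x[neg] / -y[neg], initial=np.inf))
    return x + t * y                            # some coordinate is now 0 or 1
```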
     Despite its simplicity, this method is extremely powerful and most fundamental results
in combinatorial optimization such as the integrality of matroid, matroid-intersection and
non-bipartite matching polytopes follow very cleanly using this approach. Similarly, several
breakthrough results for problems such as degree-bounded spanning trees, survivable network
design and rounding for column-sparse LPs were obtained by this method. An excellent
reference is [23].

1.1   Need for combining the approaches
In many problem settings, however, one needs a rounding that combines the features of both
randomized and iterated rounding [15, 16, 19, 3]. We give several examples in Section 1.3, but a
typical scenario is where the problem involves finding an object with specific combinatorial
constraints that cannot be violated, e. g., a spanning tree to connect nodes in a network, or
a one-sided matching (assignment) of jobs to machines; and additionally a list of other soft
side-constraints, e. g., a bound on the maximum degree of the spanning tree to prevent any
particular node from being overloaded, or perhaps edges are of several types and we wish to


purchase a certain minimum number of each type due to fairness considerations, or there may
be multiple budget constraints for various subsets of edges.
   As the soft constraints are typically arbitrary and lack structure, essentially the best one can
hope for is to satisfy them fractionally and then apply randomized rounding. On the other
hand, randomized rounding can be quite bad at satisfying the hard combinatorial constraints,
and iterated rounding is the right approach to handle them. So given a problem with both hard
and soft constraints, either technique by itself does not suffice and one would like a rounding
that simultaneously does as well as iterated rounding on the hard constraints and as well as
randomized rounding on the soft constraints.


Dependent rounding. Motivated by such problems, there has been extensive work on devel-
oping dependent rounding techniques. Roughly speaking, these techniques round the fractional
solution in some random but correlated way to satisfy the hard constraints and also ensure some
concentration properties for the soft constraints. Some examples of such methods include swap
rounding [15, 16], randomized pipage rounding [2, 33, 19, 20], maximum-entropy sampling
[3, 32, 4], rounding via discrepancy [26, 29, 11] and Gaussian random walks [28].
    A key idea here is that the weaker property of negative dependence (instead of independence)
also suffices to get concentration. There is a rich and deep theory of negative dependence
and various notions such as negative correlation, negative cylinder dependence, negative
association, strongly Rayleigh property and determinantal measures, that imply interesting
concentration properties [27, 13, 17]. This insight has been extremely useful and for many
general problems such as those involving assignment or matroid polytopes, one can exploit
the underlying combinatorial structure to design rounding approaches that ensure negative
dependence between all or some suitable collection of random variables.


Limitations. Even though these dependent rounding methods are powerful and ingenious,
they are also limited by the fact that requiring negative dependence substantially restricts the
kinds of rounding steps that can be designed, and the type of problems that they can be applied
to. Moreover, even when such a rounding is possible, it typically requires a lot of creativity and
careful understanding of the problem structure to come up with the rounding for the problem
at hand.


1.2   Our results

Our main result is a new and general dependent rounding approach that we call sub-isotropic
rounding. In particular, it combines iterated and randomized rounding in a completely generic
way and significantly extends the scope of previous dependent rounding techniques. Before
describing our result, we need some definitions.
    Let ๐‘‹1 , . . . , ๐‘‹๐‘› be independent 0-1 random variables with means ๐‘ฅ ๐‘– = ๐”ผ[๐‘‹๐‘– ] and ๐‘Ž 1 , . . . , ๐‘Ž ๐‘›
be arbitrary reals (possibly negative). It is well known [14] that the sum ๐‘† = ๐‘– ๐‘Ž ๐‘– ๐‘‹๐‘– satisfies the
                                                                              ร



following tail bound for any $t \ge 0$:

    (Bernstein's inequality)    $\Pr[S - \mathbb{E}[S] \ge t] \le \exp\left(-\frac{t^2}{2\sum_i a_i^2(x_i - x_i^2) + 2Mt/3}\right)$    (1.1)

where $M = \max_i |a_i|$. The lower tail follows by applying the above to $-a_i$, and the standard
Chernoff bounds correspond to (1.1) when $a_i \in [0,1]$ for $i \in [n]$.
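
For reference, the right-hand side of (1.1) is simple to evaluate numerically; a small Python helper (our own) is:

```python
import numpy as np

def bernstein_tail_bound(a, x, t):
    """Right-hand side of (1.1) for S = sum_i a_i X_i with independent
    0-1 variables X_i having E[X_i] = x_i."""
    a, x = np.asarray(a, float), np.asarray(x, float)
    M = np.max(np.abs(a))
    variance = np.sum(a**2 * (x - x**2))
    return np.exp(-t**2 / (2 * variance + 2 * M * t / 3))
```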
    The following relaxation of Bernstein's inequality will be highly relevant for us.

Definition 1.1 ($\beta$-concentration). Let $\beta \ge 1$. For a vector-valued random variable $X =
(X_1, \ldots, X_n)$, where the $X_i$ are possibly dependent 0-1 random variables, we say that $X$ is $\beta$-
concentrated around its mean $x = (x_1, \ldots, x_n)$, where $x_i = \mathbb{E}[X_i]$, if for every $a \in \mathbb{R}^n$ the
real-valued random variable $\langle a, X \rangle$ satisfies Bernstein's inequality up to a factor of $\beta$ in the
exponent, i. e.,

    $\Pr[\langle a, X \rangle - \mathbb{E}[\langle a, X \rangle] \ge t] \le \exp\left(-\frac{t^2/\beta}{2\sum_i a_i^2(x_i - x_i^2) + 2Mt/3}\right)$    (1.2)

where $M = \max_i |a_i|$.

Main result. We show that whenever iterated rounding can be applied to a problem such
that in iteration $k$ there is some slack, in the sense that $\mathrm{rank}(W^{(k)}) \le (1-\delta)n_k$ for some fixed
$\delta > 0$, then $O(1/\delta)$-concentration can be achieved for free. More precisely, we have the following
result.

Theorem 1.2. For any fixed $\delta \in (0,1)$, let us formalize an iterated rounding algorithm as follows. Given
a starting solution $x$, initialize $x^{(0)} = x$. In each step $k$, for $k \ge 1$, the algorithm selects a matrix $W^{(k)}$ with
$\mathrm{rank}(W^{(k)}) \le (1-\delta)n_k$. Now it can pick any $y^{(k)}$ satisfying $W^{(k)} y^{(k)} = 0$ and set $x^{(k)} = x^{(k-1)} + y^{(k)}$,
and iterate until $x^{(\mathrm{final})}$ is in $\{0,1\}^n$. Let $V \subset \{0,1\}^n$ be the set of outcomes that can be reached by the
iterated rounding algorithm.
     Then the sub-isotropic rounding algorithm outputs a random vector $X$ satisfying

   1. $X \in V$ with probability 1, and

   2. $\mathbb{E}[X] = x$ and $X$ is $\beta$-concentrated around $x$ with $\beta = 20/\delta$.

Remark 1.3. A simple example shows that the dependence $\beta = \Omega(1/\delta)$ in Theorem 1.2 cannot
be improved. Let $\delta = 1/t$ for some integer $t$, and consider $n$ variables $x_1, \ldots, x_n$, partitioned
into $n/t$ blocks $B_1, \ldots, B_{n/t}$ where block $B_i = \{x_{(i-1)t+1}, \ldots, x_{it}\}$. For each $B_i$ there are $t-1$
constraints $x_{(i-1)t+1} = x_{(i-1)t+2} = \cdots = x_{it}$, and hence there are $(t-1)(n/t) = n(1-\delta)$ constraints
in total. Consider the starting solution with all $x_j = 1/2$. Now, no matter what random choices
the algorithm makes, the variables within a block evolve identically and all reach the same value
0 or 1. So the linear function $S = x_1 + \cdots + x_n$ will only be $1/\delta$-concentrated.


    The generality of Theorem 1.2 directly gives new results for several problems where iterated
rounding gives useful guarantees. All one needs to show is that the original iterated rounding
argument for the problem can be applied with some slack, which is often straightforward and
only worsens the approximation guarantee slightly. In particular, note that Theorem 1.2 makes
no assumption about the combinatorial structure of the problem and by working with the more
relaxed notion of $\beta$-concentration, we can completely avoid the need for negative dependence.

1.3     Motivating problems and applications
We now describe several applications of our result and also briefly discuss why they seem
beyond the reach of current dependent rounding methods.

1.3.1    Rounding for column-sparse LPs
Let ๐‘ฅ โˆˆ [0, 1]๐‘› be some fractional solution satisfying ๐ด๐‘ฅ = ๐‘, where ๐ด โˆˆ โ„ ๐‘šร—๐‘› is an ๐‘š ร— ๐‘›
matrix. The celebrated Beckโ€“Fiala algorithm [12] (see also [21] for a related result) uses iterated
rounding to produce an integral solution ๐‘‹ satisfying k๐ด(๐‘‹ โˆ’ ๐‘ฅ)k โˆž < ๐‘ก, where ๐‘ก is the maximum
โ„“ 1 norm of the columns of ๐ด. This is substantially better than randomized rounding for small ๐‘ก,
where the error for any row is typically its โ„“ 2 norm which can be substantially larger than ๐‘ก.
     Many problems, however, involve both some column-sparse constraints that come from the
underlying combinatorial problem, and some general arbitrary constraints which might not
have much structure. This motivates the following natural question.
Question 1.4. Let $M$ be a linear system with two sets of constraints given by matrices $A$ and $B$,
where $A$ is column-sparse, while $B$ is arbitrary. Given some fractional solution $x$, can we round
it to get error $O(t)$ for the rows of $A$, while doing no worse than randomized rounding for the
constraints in $B$?
Remark 1.5. Note that simply applying iterated rounding on the rows of $A$ gives no control on
the error for $B$. Similarly, just doing randomized rounding will not give $O(t)$ error for $A$. Also,
as $A$ and $B$ are arbitrary, negative dependence based techniques do not seem to apply.
     We show that a direct modification of the Beck–Fiala argument gives slack $\delta$, for any
$\delta \in [0,1)$, while worsening the error bound slightly to $t/(1-\delta)$. Setting, say, $\delta = 1/2$ and
applying Theorem 1.2 gives $X \in \{0,1\}^n$ that (i) has error at most $2t$ for the rows of $A$, and (ii) satisfies
$\mathbb{E}[X_i] = x_i$ and is $O(1)$-concentrated, thus giving guarantees similar to randomized rounding
for the rows of $B$. In fact, the solution produced by the algorithm will satisfy concentration for
all linear constraints and not just for the rows of $B$.
     We also consider an extension to the Komlós setting, where the error depends on the
maximum $\ell_2$ norm of the columns of $A$. These results are described in Section 4.1.

1.3.2    Makespan minimization on unrelated machines
The classical problem of makespan minimization on unrelated machines is the following. Given
$n$ jobs and $m$ machines, where each job $j \in [n]$ has arbitrary size $p_{ij}$ on machine $i \in [m]$, assign

the jobs to machines to minimize the maximum machine load. In a celebrated result, [24] gave
a rounding method with additive error at most $p_{\max} := \max_{ij} p_{ij}$, i. e., it gives an assignment
with makespan at most $\mathrm{Opt} + p_{\max}$, where Opt is the value of an optimum LP solution. In
many practical problems, however, there are other soft resource constraints and side constraints
that are added to the fractional formulation. So it is useful to find a rounding that satisfies
these approximately but increases the makespan by only $O(p_{\max})$. This motivates the following
natural problem.

Question 1.6. Given a fractional assignment $x$, find an integral assignment $X$ with additive
error $O(p_{\max})$ that also satisfies $\mathbb{E}[X_{ij}] = x_{ij}$ and concentration for all linear functions of the $x_{ij}$,
i. e., for all $\{a_{ij}\}_{ij}$, with high probability it holds that $\sum_{ij} a_{ij} X_{ij} \approx \sum_{ij} a_{ij} x_{ij}$.

    Questions related to finding a good assignment with some concentration properties have
been studied before [19, 2, 16], and several methods such as randomized pipage rounding and
swap rounding have been developed for this. However, these methods crucially rely on the
underlying matching structure and round the variables alternately along cycles, which limits
them in various ways: either they give partial assignments, or they only get concentration for
edges incident to a vertex.
    We show that the iterated rounding proof of the result of [24] can be easily modified
to work for any slack $\delta \in [0, 1/2)$ while giving additive error $p_{\max}/(1-2\delta)$. Theorem 1.2
(say, with $\delta = 1/4$) thus gives a solution that has additive error at most $2p_{\max}$ and satisfies
$O(1)$-concentration. The result also extends naturally to the $k$-resource setting, where $p_{ij}$ is a
$k$-dimensional vector. These results are described in Section 4.2.

1.3.3   Degree-bounded spanning trees and thin trees
In the minimum cost degree-bounded spanning tree problem, we are given an undirected graph
$G = (V, E)$ with edge costs $c_e \ge 0$ for $e \in E$ and integer degree bounds $b_v$ for $v \in V$, and the goal
is to find a minimum cost spanning tree satisfying the degree bounds. In a breakthrough result,
Singh and Lau [31] gave an iterated rounding algorithm that, given any fractional spanning tree
$x$, finds a spanning tree $T$ with cost at most $\langle c, x \rangle$ and a degree violation of plus one.
     The celebrated thin-tree conjecture asks¹ if, given a fractional spanning tree $x$, there is a
spanning tree $T$ satisfying $\Delta_T(S) \le \beta\, \Delta_x(S)$ for every $S \subset V$, where $\beta = O(1)$. Here $\Delta_T(S)$ is
the number of edges of $T$ crossing $S$, and $\Delta_x(S)$ is the $x$-value crossing $S$. This conjecture has
received a lot of attention recently, due to its connection to the asymmetric travelling salesman
problem (ATSP) [3, 1]. Despite the recent breakthrough on ATSP [34], the thin-tree conjecture
remains open.
     If one only considers single vertex sets $S = \{v\}$, the result of [31] implies that $\Delta_T(v) \le 2\Delta_x(v)$
for each vertex $v$ (as $\Delta_x(v) \ge 1$ in any fractional spanning tree $x$). On the other hand, for general
sets $S$, the best known algorithmic results give $\beta = O(\log n/\log\log n)$ [3, 15, 32, 20]. These
algorithms crucially rely on the negative dependence properties of spanning trees, which do
not give anything better for single vertex cuts (e. g., even if $b_v = 2$ for all $v$, by a balls-and-bins
argument a random tree will have maximum degree $\Theta(\log n/\log\log n)$).
     This motivates the following natural question as a first step toward the thin-tree conjecture.

Question 1.7. Can we find a spanning tree that achieves $\beta = O(1)$ for single vertex cuts and
$\beta = O(\log n/\log\log n)$ for general cuts?

   ¹ Equivalently, any $k$-edge-connected graph $G$ has a spanning tree satisfying $\Delta_T(S) = O(1/k)\,\Delta_G(S)$ for every
$S \subset V$.

    We show that the iterated rounding algorithm of [31] can be easily modified to create slack
$\delta \in (0, 1/2)$ while violating the degree bounds additively by less than $2/(1-2\delta)$. Applying
Theorem 1.2 with $\delta = 1/6$, the degree violation is strictly below 3, and this gives a distribution
supported on trees with a degree violation of plus 2 that satisfies $O(1)$-concentration. By a
standard cut counting argument [3], the concentration property implies $O(\log n/\log\log n)$-
thinness for every cut. We describe these results in Section 4.3. In fact, we consider the more
general setting of the minimum cost degree-bounded matroid-basis problem.

1.4   Overview of techniques
We now give a high-level overview of our algorithm and analysis. The starting observation is
that randomized rounding can be viewed as an iterative algorithm by doing a discrete version
of the standard Brownian motion on the cube, as follows. Given $x^{(0)}$ as the starting fractional
solution, consider a random walk in the $[0,1]^n$ cube starting at $x^{(0)}$, with tiny step size $\pm\gamma$
chosen independently for each coordinate, where upon reaching a face of the cube (i. e., some
$x_i$ reaches 0 or 1) the walk stays on that face. The process stops upon reaching some vertex
$X = (X_1, \ldots, X_n)$ of the cube. By the martingale property of random walks, the probability
that $X_i = 1$ is exactly $x_i^{(0)}$, and as the walk in each coordinate is independent, $X$ has the same
distribution on $\{0,1\}^n$ as under randomized rounding.
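
A short Python sketch of this walk (our illustration; clipping at the boundary introduces an $O(\gamma)$ discretization bias, and the marginals are exactly $x_i^{(0)}$ when $\gamma$ divides each coordinate):

```python
import numpy as np

def random_walk_rounding(x, gamma=1e-3, rng=np.random.default_rng()):
    """Discrete Brownian-motion view of randomized rounding: each
    still-fractional coordinate takes an independent +/-gamma step
    and freezes once it reaches a face (0 or 1) of the cube."""
    x = np.asarray(x, dtype=float).copy()
    while ((x > 0) & (x < 1)).any():
        alive = (x > 0) & (x < 1)
        steps = gamma * rng.choice([-1.0, 1.0], size=alive.sum())
        x[alive] = np.clip(x[alive] + steps, 0.0, 1.0)
    return x.astype(int)
```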

A first attempt. Now consider iterated rounding, and recall that here the update $y^{(k)}$ at iteration
$k$ must lie in the nullspace of $W^{(k)}$. So a natural first idea to combine this with randomized
rounding is to do a random walk in the null space of $W^{(k)}$ until some variable reaches 0 or 1. The
slack condition $\mathrm{rank}(W^{(k)}) \le (1-\delta)n_k$ implies that the nullspace has at least $\delta n_k$ dimensions,
which could potentially give "enough randomness" to the random walk.
    It turns out, however, that doing a standard random walk in the null space of $W^{(k)}$ does
not work. The problem is that, as the constraints defining $W^{(k)}$ can be completely arbitrary in
our setting, the random walk can lead to very high correlations between certain subsets of
coordinates, causing the $\beta$-concentration property to fail. For example, suppose $\delta = 1/2$ and $W^{(0)}$
consists of the constraints $x_i = x_{i+1}$ for $i = 1, \ldots, n/2 - 1$. Then the random walk will update
$x_{n/2}, \ldots, x_n$ independently, but for $x_1, \ldots, x_{n/2}$ the updates must satisfy $\Delta x_1 = \cdots = \Delta x_{n/2}$, and
hence will be completely correlated. So the linear function $x_1 + \cdots + x_{n/2}$ will not have any
concentration (as all the variables will simultaneously move by $-\delta$ or by $+\delta$).

A different random walk. To get around this problem, we design a different random walk
in the null space of $W^{(k)}$, which looks similar to an independent walk in every direction even
though the coordinates are correlated. More formally, consider a random vector $Y = (Y_1, \ldots, Y_n)$,
where the $Y_i$ are mean-zero random variables. For a parameter $\eta \ge 1$, we say that $Y$ is $\eta$-weakly
pairwise independent if for every $a = (a_1, \ldots, a_n) \in \mathbb{R}^n$,

    $\mathbb{E}[\langle a, Y \rangle^2] = \mathbb{E}\left[\left(\sum_i a_i Y_i\right)^2\right] \le \eta \sum_i a_i^2\, \mathbb{E}[Y_i^2].$

If $Y_1, \ldots, Y_n$ are pairwise independent, note that the above holds as equality with $\eta = 1$, and
hence this can be viewed as a relaxation of pairwise independence. We show that whenever
$\mathrm{rank}(W^{(k)}) \le (1-\delta)n_k$, there exist $\eta$-weakly pairwise independent random updates $y^{(k)}$ that
lie in the null space of $W^{(k)}$ (which has dimension at least $\delta n_k$) with $\eta \approx 1/\delta$. Moreover, these
updates can be found by solving a semidefinite program (SDP).
     Next, using a variant of Freedman's martingale analysis [18], we show that applying these
$\eta$-weakly pairwise independent random updates (with small increments) until all the variables
reach 0-1 gives an integral solution that satisfies $O(\eta)$-concentration.
     These techniques are motivated by our recent works on algorithmic discrepancy [8, 9]. While
discrepancy is closely related to rounding [25, 29], a key difference in discrepancy is that the
error for rounding a linear system $Ax = b$ depends on the $\ell_2$ norms of the coefficients of the
constraints and not on $b$. E. g., suppose $x \in [0,1]^n$ satisfies $x_1 + \cdots + x_n = \log n$; then the
sum stays $O(\log n)$ upon randomized rounding with high probability, while using discrepancy
methods directly gives $\Omega(\sqrt{n})$ error, which would be unacceptably large in this setting. So our
results can be viewed as using techniques from discrepancy theory to obtain bounds that are
sensitive to $x$. Recently, this direction was explored in [11], but their method gave much weaker
results and applied only to very limited settings.


2     Technical preliminaries

2.1     Tail bounds for supermartingales

We will need the following tail bound for supermartingales with a strong negative drift.

Theorem 2.1. Let $\alpha \in (0,1)$. Let $\{Z_k : k = 0, 1, \ldots\}$ be a sequence of random variables with
$Y_k := Z_k - Z_{k-1}$, such that $Z_0$ is constant and $Y_k \le 1$ for all $k \ge 1$. If

    $\mathbb{E}_{k-1}[Y_k] \le -\alpha\, \mathbb{E}_{k-1}[Y_k^2]$

for all $k \ge 1$, where $\mathbb{E}_{k-1}[\cdot]$ denotes $\mathbb{E}[\cdot \mid Z_1, \ldots, Z_{k-1}]$, then for $t \ge 0$,

    $\Pr[Z_k - Z_0 \ge t] \le \exp(-\alpha t).$

      Before proving Theorem 2.1, we first need a simple lemma.

Lemma 2.2. Let $X$ be a random variable satisfying $X \le 1$. Then for any $\lambda > 0$,

    $\mathbb{E}[e^{\lambda X}] \le \exp\left(\lambda\, \mathbb{E}[X] + (e^{\lambda} - \lambda - 1)\, \mathbb{E}[X^2]\right).$


Proof. Let $f(\lambda, x) = (e^{\lambda x} - \lambda x - 1)/x^2$, where we set $f(\lambda, 0) = \lambda^2/2$. By standard integration, it
is easily verified that $f(\lambda, x) = \int_0^{\lambda}\!\int_0^{s} e^{tx}\, dt\, ds$. As $e^{tx}$ is non-decreasing in $x$ for all $t \ge 0$, this
implies that $f(\lambda, x)$ is non-decreasing in $x$. In particular, $f(\lambda, x) \le f(\lambda, 1)$ for any $x \le 1$, and
hence

    $e^{\lambda x} = 1 + \lambda x + f(\lambda, x)\, x^2 \le 1 + \lambda x + f(\lambda, 1)\, x^2 = 1 + \lambda x + (e^{\lambda} - \lambda - 1)\, x^2.$

Taking expectations and using that $1 + x \le e^x$ for all $x \in \mathbb{R}$ gives

    $\mathbb{E}[e^{\lambda X}] \le 1 + \lambda\, \mathbb{E}[X] + (e^{\lambda} - \lambda - 1)\, \mathbb{E}[X^2] \le \exp\left(\lambda\, \mathbb{E}[X] + (e^{\lambda} - \lambda - 1)\, \mathbb{E}[X^2]\right). \qquad \square$

Proof of Theorem 2.1. By Markov's inequality,

    $\Pr[Z_k - Z_0 \ge t] = \Pr[\exp(\alpha(Z_k - Z_0)) \ge \exp(\alpha t)] \le \frac{\mathbb{E}[\exp(\alpha(Z_k - Z_0))]}{\exp(\alpha t)},$

so it suffices to show that $\mathbb{E}[\exp(\alpha(Z_k - Z_0))] \le 1$. As $Z_0$ is deterministic, this is the same as
$\mathbb{E}[\exp(\alpha Z_k)] \le \exp(\alpha Z_0)$. Now,

    $\mathbb{E}_{k-1}\left[e^{\alpha Z_k}\right] = e^{\alpha Z_{k-1}}\, \mathbb{E}_{k-1}\left[e^{\alpha(Z_k - Z_{k-1})}\right] = e^{\alpha Z_{k-1}}\, \mathbb{E}_{k-1}\left[e^{\alpha Y_k}\right]$
        $\le e^{\alpha Z_{k-1}} \exp\left(\alpha\, \mathbb{E}_{k-1}[Y_k] + (e^{\alpha} - \alpha - 1)\, \mathbb{E}_{k-1}\left[Y_k^2\right]\right)$    (Lemma 2.2)
        $\le e^{\alpha Z_{k-1}} \exp\left((e^{\alpha} - \alpha^2 - \alpha - 1)\, \mathbb{E}_{k-1}\left[Y_k^2\right]\right)$
        $\le e^{\alpha Z_{k-1}}$    (as $e^{\alpha} \le 1 + \alpha + \alpha^2$ for $0 \le \alpha \le 1$).

As this holds for all $k$, the result follows by the law of iterated expectations. $\square$

2.2   Semidefinite matrices

Let $M_n$ denote the class of all symmetric $n \times n$ matrices with real entries. For two matrices
$A, B \in \mathbb{R}^{n \times n}$, the trace inner product of $A$ and $B$ is defined as $\langle A, B \rangle = \mathrm{tr}(A^T B) = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij} b_{ij}$.
A matrix $U \in M_n$ is positive semidefinite (PSD) if all its eigenvalues are non-negative, and we
denote this by $U \succeq 0$. Equivalently, $U \succeq 0$ iff $a^T U a = \langle a a^T, U \rangle \ge 0$ for all $a \in \mathbb{R}^n$.
    For $U \succeq 0$, let $U^{1/2} = \sum_i \lambda_i^{1/2} u_i u_i^T$, where $U = \sum_i \lambda_i u_i u_i^T$ is the spectral decomposition of $U$
with orthonormal eigenvectors $u_i$. Then $U^{1/2}$ is PSD and $U = V^T V$ for $V = U^{1/2}$. For $Y, Z \in M_n$,
we say that $Y \preceq Z$ if $Z - Y \succeq 0$.

2.3   Approximate independence and sub-isotropic random variables

Let $Y = (Y_1, \ldots, Y_n)$ be a random vector with $Y_1, \ldots, Y_n$ possibly dependent.

Definition 2.3 ($(\tau, \eta)$ sub-isotropic random vector). For $\tau \in (0,1]$ and $\eta \ge 1$, we say that $Y$ is
$(\tau, \eta)$ sub-isotropic if it satisfies the following conditions.

   1. $\mathbb{E}[Y_i] = 0$ and $\mathbb{E}[Y_i^2] \le 1$ for all $i \in [n]$, and $\sum_{i=1}^{n} \mathbb{E}[Y_i^2] \ge \tau n$.

   2. For all $b = (b_1, \ldots, b_n) \in \mathbb{R}^n$ it holds that

        $\mathbb{E}\left[\left(\sum_{i=1}^{n} b_i Y_i\right)^2\right] \le \eta \sum_{i=1}^{n} b_i^2\, \mathbb{E}[Y_i^2].$    (2.1)

    Note that if $Y_1, \ldots, Y_n$ are pairwise independent, then (2.1) holds with equality for $\eta = 1$.
    Let $U \in M_n$ be the $n \times n$ covariance matrix of $Y_1, \ldots, Y_n$, i. e., $U_{ij} = \mathbb{E}[Y_i Y_j]$. Every covariance
matrix is PSD, as $b^T U b = \mathbb{E}[(\sum_i b_i Y_i)^2] \ge 0$ for all $b \in \mathbb{R}^n$. Let $\mathrm{diag}(U)$ denote the diagonal $n \times n$
matrix with entries $U_{ii}$; then (2.1) can be written as $b^T(\eta\,\mathrm{diag}(U) - U)\, b \ge 0$ for every $b \in \mathbb{R}^n$,
and hence equivalently expressed as

    $U \preceq \eta\, \mathrm{diag}(U).$
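
The three matrix conditions are easy to verify numerically for a candidate covariance matrix $U$; the following Python helper (our own sanity check, not part of the paper's algorithm) tests them:

```python
import numpy as np

def is_sub_isotropic_cov(U, tau, eta, tol=1e-9):
    """Check U_ii <= 1, tr(U) >= tau*n, and U <= eta*diag(U) in the PSD
    order, i.e., eta*diag(U) - U has no negative eigenvalue."""
    n = U.shape[0]
    gap = eta * np.diag(np.diag(U)) - U
    return (np.all(np.diag(U) <= 1 + tol)
            and np.trace(U) >= tau * n - tol
            and np.linalg.eigvalsh(gap).min() >= -tol)
```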


Generic construction. We will use the following generic way to produce $(\tau, \eta)$ sub-isotropic
vectors. Let $U$ be an $n \times n$ PSD matrix satisfying $U_{ii} \le 1$, $\mathrm{tr}(U) \ge \tau n$ and $U \preceq \eta\,\mathrm{diag}(U)$. Let
$r \in \mathbb{R}^n$ be a random vector where each coordinate is independently and uniformly $\pm 1$. Then the
random vector $Y = U^{1/2} r$ has covariance $\mathbb{E}[YY^T] = U^{1/2}\,\mathbb{E}[rr^T]\,(U^{1/2})^T = U$, and the properties
of $U$ imply that $Y$ is $(\tau, \eta)$ sub-isotropic.

Remark 2.4. In other similar constructions, $r$ is often chosen as a random Gaussian, but we
prefer to choose it as a random $\pm 1$ vector, as it is bounded, and this makes some technical
arguments easier later on.
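
In code, the construction is a single matrix-vector product once $U^{1/2}$ is available; a minimal NumPy/SciPy sketch (ours):

```python
import numpy as np
from scipy.linalg import sqrtm

def sub_isotropic_sample(U, rng=np.random.default_rng()):
    """Return Y = U^{1/2} r with r uniform in {-1,+1}^n, so that
    E[Y Y^T] = U; Y is then (tau, eta) sub-isotropic whenever U
    satisfies the three conditions above."""
    r = rng.choice([-1.0, 1.0], size=U.shape[0])
    return np.real(sqrtm(U)) @ r
```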

   We will need the following result from [9] about finding sub-isotropic random vectors
orthogonal to a subspace.

Theorem 2.5 ([9], Theorem 6). Let $G \subset \mathbb{R}^n$ be an arbitrary subspace with dimension $\dim(G) = \ell = \delta n$.
Then for any $\tau > 0$ and $\eta > 1$ satisfying $1/\eta + \tau \le \delta$, there is an $n \times n$ PSD matrix $U$, computable
in polynomial time, that satisfies the following properties:
    (i) $\langle h h^T, U \rangle = 0$ for all $h \in G^{\perp}$, where $G^{\perp}$ is the subspace orthogonal to $G$.
    (ii) $U_{ii} \le 1$ for all $i \in [n]$.
    (iii) $\mathrm{tr}(U) \ge \tau n$.
    (iv) $U \preceq \eta\, \mathrm{diag}(U)$.

    The first condition gives that the range space of $U$ is contained in the subspace $G$, as
$\langle h h^T, U \rangle = 0$ implies that $\|U^{1/2} h\|^2 = 0$ and hence $h^T U^{1/2} = 0$. So, for any vector $r \in \mathbb{R}^n$ the
vector $Y = U^{1/2} r$ lies in $G$ (as for every $h \in G^{\perp}$, $h^T Y = h^T U^{1/2} r = \langle 0, r \rangle = 0$). So this gives a
polynomial-time algorithm to generate a $(\tau, \eta)$ sub-isotropic random vector $Y \in G$.


3     The algorithm and analysis

Recall that by iterated rounding we refer to any procedure that, given some starting fractional
solution $x$, sets $x^{(0)} = x$ and applies a sequence of updates as follows. Given the vector $x^{(k-1)}$ at
the beginning of iteration $k$, call a variable $i \in [n]$ frozen if $x_i^{(k-1)}$ is 0 or 1, and alive otherwise.
Let $n_k$ denote the number of alive variables. Based on $x^{(k-1)}$, the algorithm picks a set of
constraints of rank at most $n_k - 1$, given by the rows of some matrix $W^{(k)}$, and finds some
non-zero vector $y^{(k)}$ satisfying $W^{(k)} y^{(k)} = 0$ and $y_i^{(k)} = 0$ for frozen $i$. The solution is updated as
$x^{(k)} = x^{(k-1)} + y^{(k)}$.
     Let $\delta > 0$ be the slack parameter. We assume that the problem to be solved has an
iterated rounding procedure where in each iteration $k$ one can choose some matrix $W^{(k)}$ with
$\mathrm{rank}(W^{(k)}) \le (1-\delta)n_k$. We now describe the rounding algorithm.


3.1   The algorithm
Initialize $x^{(0)} = x$, where $x$ is the starting fractional solution given as input. For each iteration
$k = 1, \ldots$, repeat the following until all the variables reach 0 or 1.

Iteration $k$. Let $x^{(k-1)}$ be the solution at the beginning of iteration $k$, and let $A_k \subset [n]$ denote the
set of alive coordinates $i$ with $x_i^{(k-1)} \in (0,1)$. Only the variables in $A_k$ will be updated henceforth.
So for ease of notation we assume that $A_k = [n_k]$.

    1. Apply Theorem 2.5, with $n = n_k$, $G$ the nullspace of $W^{(k)}$, $\tau = \delta/10$ and $\eta = 10/(9\delta)$, to find
       the covariance matrix $U$.

    2. Let $\gamma = 1/(2n^{3/2})$. Let $r_k \in \mathbb{R}^{n_k}$ be a random vector with independent $\pm 1$ entries. Set

           $x^{(k)} = x^{(k-1)} + y^{(k)}$   with   $y^{(k)} = \gamma_k U^{1/2} r_k,$

       where $\gamma_k$ is the largest value such that (i) $\gamma_k \le \gamma$ and (ii) both $x^{(k-1)} + y^{(k)}$ and $x^{(k-1)} - y^{(k)}$
       lie in $[0,1]^{n_k}$.
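
The following Python sketch assembles the loop; here `choose_W` and `solve_U` are hypothetical stand-ins for the problem-specific constraint selection and for the SDP of Theorem 2.5, respectively:

```python
import numpy as np
from scipy.linalg import sqrtm

def sub_isotropic_rounding(x, delta, choose_W, solve_U, eps=1e-12,
                           rng=np.random.default_rng()):
    """Sketch of the rounding loop above.  choose_W(x, alive) should
    return W^(k) on the alive coordinates (rank <= (1-delta)*n_k), and
    solve_U(W, tau, eta) the covariance matrix U of Theorem 2.5, e.g.,
    via an off-the-shelf SDP solver; both are illustrative stand-ins."""
    x = np.asarray(x, dtype=float).copy()
    gamma = 1.0 / (2 * len(x) ** 1.5)
    while True:
        alive = np.where((x > eps) & (x < 1 - eps))[0]
        if alive.size == 0:
            return np.round(x).astype(int)
        W = choose_W(x, alive)
        U = solve_U(W, tau=delta / 10, eta=10 / (9 * delta))
        z = np.real(sqrtm(U)) @ rng.choice([-1.0, 1.0], size=alive.size)
        # gamma_k: largest step <= gamma keeping both x + gamma_k*z and
        # x - gamma_k*z inside [0, 1] on the alive coordinates
        nz = np.abs(z) > eps
        caps = np.minimum(x[alive][nz], 1 - x[alive][nz]) / np.abs(z[nz])
        x[alive] += min(gamma, caps.min(initial=np.inf)) * z
```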


3.2   Analysis
Let $X = (X_1, \ldots, X_n)$ denote the final 0-1 solution returned by the algorithm. The property that
$\mathbb{E}[X_i] = x_i$ follows directly as the update $y^{(k)}$ at each time step has mean zero in each coordinate.
As the algorithm always moves in the nullspace of $W^{(k)}$, clearly it will also satisfy the iterated
rounding guarantee with probability 1.
    To analyze the running time, we note that whenever $\gamma_k < \gamma$, there is probability at least $1/2$
that some new variable will reach 0 or 1 after the update (as the new solution is either $x^{(k-1)} + y^{(k)}$
or $x^{(k-1)} - y^{(k)}$ with probability half each). So, in expectation there are at most $2n$ such steps. So
let us focus on the iterations where $\gamma_k = \gamma$.


                                                                               (๐‘˜)
    Let us define the energy of ๐‘ฅ (๐‘˜) as ๐ธ (๐‘˜) := ๐‘– (๐‘ฅ ๐‘– )2 . During the update, conditioned on all
                                                                   ร

the randomness up to time ๐‘˜ โˆ’ 1, ๐ธ (๐‘˜) rises in expectation by at least ๐›พ 2 ๐‘› ๐‘˜ ๐œ as,
                                                 hร•                         i
                                                       (๐‘˜โˆ’1) (๐‘˜)     (๐‘˜)
                  ๐”ผ ๐‘˜โˆ’1 ๐ธ(๐‘˜) โˆ’ ๐ธ(๐‘˜โˆ’1) = ๐›พ2 ๐”ผ ๐‘˜โˆ’1            ๐‘ฆ ๐‘– + (๐‘ฆ ๐‘– )2 )
                                    
                                                   (2๐‘ฅ ๐‘–
                                                                           ๐‘–
                                                              ร•
                                                  = ๐›พ               ๐”ผ ๐‘˜โˆ’1 (๐‘ฆ ๐‘–(๐‘˜) )2 = ๐›พ2 tr(๐‘ˆ) โ‰ฅ ๐›พ2 ๐œ๐‘› ๐‘˜ .
                                                         2
                                                                                   
                                                               ๐‘–
                                                          (๐‘˜) 
where the second equality uses that ๐”ผ ๐‘˜โˆ’1 ๐‘ฆ ๐‘–   = 0 for each ๐‘–. As the final energy can be at most
๐‘›, standard arguments [6, 26] imply that, with constant probability, the algorithm terminates in
๐‘‚(๐‘› log ๐‘› + log ๐‘›/๐›พ2 ) time steps.
    It remains to show that the rounding satisfies the concentration property, which we do next.

3.2.1   Isotropic updates imply concentration

Let $X = (X_1, \ldots, X_n)$ denote the final 0-1 solution returned by the algorithm. Fix any $a =
(a_1, a_2, \ldots, a_n) \in \mathbb{R}^n$. We will show that

    $\Pr[\langle a, X \rangle - \mathbb{E}[\langle a, X \rangle] \ge t] \le \exp\left(-\frac{t^2/\beta}{2\left(\sum_i a_i^2(x_i - x_i^2) + Mt/3\right)}\right)$

with $\beta = 18\eta$, which equals $20/\delta$ by our choice of the parameters, and $M = \max_i |a_i|$.

Proof. By scaling the $a_i$'s and $t$, we can assume that $M = 1$. Let us define the random variable

    $Z_k = \sum_i a_i x_i^{(k)} + \lambda \sum_i a_i^2\, x_i^{(k)}(1 - x_i^{(k)}),$

where the parameter $\lambda \le 1$ will be optimized later.
     It is useful to think of $Z_k$ as $\mu_k + \lambda v_k$, where $\mu_k = \langle a, x^{(k)} \rangle$ and $v_k = \sum_i a_i^2\, x_i^{(k)}(1 - x_i^{(k)})$ are
the mean and variance if we were to randomly round the coordinates of $x^{(k)}$ to 0-1. Initially,
$Z_0 = \mu + \lambda v$, where $\mu = \sum_i a_i x_i$ and $v = \sum_i a_i^2(x_i - x_i^2)$.
     We will show that $Z_k$ satisfies the conditions of Theorem 2.1 for an appropriate $\alpha$, and use
it to show the required tail bound. Roughly speaking, as the algorithm proceeds, $Z_k$ has a
strong negative drift as the energy term $v_k$ decreases in expectation and $\mu_k$ does not change in
expectation. In the proof of Theorem 2.1, this negative drift offsets the positive terms that arise
while bounding the exponential moment of $Z_k$.
     We now give the details. We first compute $Y_k := Z_k - Z_{k-1}$ and show that it is bounded.
Using $x^{(k)} = x^{(k-1)} + y^{(k)}$, we have

    $Y_k = Z_k - Z_{k-1} = \sum_i a_i(x_i^{(k)} - x_i^{(k-1)}) + \sum_i \lambda a_i^2\left(x_i^{(k)}(1 - x_i^{(k)}) - x_i^{(k-1)}(1 - x_i^{(k-1)})\right)$
        $= \sum_i a_i y_i^{(k)} + \sum_i \lambda a_i^2\, y_i^{(k)}\left(1 - 2x_i^{(k-1)} - y_i^{(k)}\right).$    (3.1)



Claim 3.1. For all $k$, the update $y^{(k)}$ satisfies $\|y^{(k)}\|_2 \le \gamma n = \frac{1}{2\sqrt{n}}$.

Proof. Recall that $y^{(k)} = \gamma U^{1/2} r_k$. Let $U^{1/2}(i)$ denote the $i$-th column of $U^{1/2}$. As

    $\langle U^{1/2}(i), U^{1/2}(i) \rangle = U_{ii} \le 1,$

the columns of $U^{1/2}$ have length at most 1. Let $r_k(i)$ denote the $i$-th entry of $r_k$. Applying the
triangle inequality to the columns of $U^{1/2}$,

    $\|U^{1/2} r_k\|_2 \le \sum_i |r_k(i)|\, \|U^{1/2}(i)\|_2 \le \|r_k\|_1 \le n.$

This gives that $\|y^{(k)}\|_2 \le \gamma n$. $\square$

Claim 3.2. For all $k$, $|Y_k| \le 1$.

Proof. First we note that the second term in (3.1) is at most $\sum_i |a_i y_i^{(k)}|$. This follows as $a_i^2 \le |a_i|$
(as $M = 1$), $\lambda \le 1$ by our assumption, and $1 - 2x_i^{(k-1)} - y_i^{(k)} \in [-1, 1]$ (as $1 - x_i^{(k-1)} \in [0,1]$ and
$x_i^{(k-1)} + y_i^{(k)} = x_i^{(k)} \in [0,1]$). As $\|a\|_\infty \le 1$ and using the bound on $\|y^{(k)}\|_2$ in Claim 3.1, we have

    $|Y_k| \le \left|\sum_i a_i y_i^{(k)}\right| + \sum_i |a_i y_i^{(k)}| \le 2\sum_i |y_i^{(k)}| = 2\|y^{(k)}\|_1 \le 2n^{1/2}\|y^{(k)}\|_2 \le 2\gamma n^{3/2} = 1. \qquad \square$

    We now upper bound the negative drift of $Z_k$.

Claim 3.3. $\mathbb{E}_{k-1}[Y_k] \le -(\lambda/8\eta)\, \mathbb{E}_{k-1}[Y_k^2]$.

Proof. As $\mathbb{E}_{k-1}[y_i^{(k)}] = 0$ for all $i$, and as $x_i^{(k-1)}$ is deterministic conditioned on the randomness
until $k-1$, taking expectations $\mathbb{E}_{k-1}[\cdot]$ in (3.1) gives

    $\mathbb{E}_{k-1}[Y_k] = -\lambda \sum_i a_i^2\, \mathbb{E}_{k-1}\left[(y_i^{(k)})^2\right].$    (3.2)


We now upper bound ๐”ผ ๐‘˜โˆ’1 [๐‘Œ๐‘˜2 ]. Using (๐‘Ž + ๐‘)2 โ‰ค 2๐‘Ž 2 + 2๐‘ 2 twice for the expression in (3.1),
                                                                                                             !2
                            ร•                             ร•
                                        (๐‘˜)                             (๐‘˜)       (๐‘˜โˆ’1)    (๐‘˜)
             ๐‘Œ๐‘˜2 โ‰ค 2(             ๐‘Ž ๐‘– ๐‘ฆ ๐‘– )2 + 2๐œ†2               ๐‘Ž 2๐‘– ๐‘ฆ ๐‘– (1 โˆ’ 2๐‘ฅ ๐‘–     โˆ’ ๐‘ฆ๐‘– )
                             ๐‘–                               ๐‘–
                                                                                                   !2                            !2
                            ร•                      ยฉ ร• 2 (๐‘˜)                                                 ร•
                                        (๐‘˜)                          (๐‘˜โˆ’1)                                                 (๐‘˜)
                   โ‰ค 2(           ๐‘Ž ๐‘– ๐‘ฆ ๐‘– )2 + 4๐œ†2 ยญ   ๐‘Ž ๐‘– ๐‘ฆ (1 โˆ’ 2๐‘ฅ ๐‘–     )                            +             ๐‘Ž 2๐‘– (๐‘ฆ ๐‘– )2 ยฎ
                                                                                                                                      ยช
                                                                                                                                          (3.3)
                             ๐‘–                     ยซ ๐‘–                                                            ๐‘–                   ยฌ


Taking expectations $\mathbb{E}_{k-1}[\cdot]$ in (3.3), we now upper bound the terms on the right. As $y^{(k)}$ is $(\tau, \eta)$
sub-isotropic, by (2.1) the first term satisfies

    $\mathbb{E}_{k-1}\left[\left(\sum_i a_i y_i^{(k)}\right)^2\right] \le \eta \sum_i a_i^2\, \mathbb{E}_{k-1}\left[(y_i^{(k)})^2\right].$

Similarly, by the sub-isotropic property, the second term satisfies

    $\mathbb{E}_{k-1}\left[\left(\sum_i a_i^2\, y_i^{(k)}(1 - 2x_i^{(k-1)})\right)^2\right] \le \eta \sum_i a_i^4 (1 - 2x_i^{(k-1)})^2\, \mathbb{E}_{k-1}\left[(y_i^{(k)})^2\right] \le \eta \sum_i a_i^2\, \mathbb{E}_{k-1}\left[(y_i^{(k)})^2\right],$

where the last step uses that $|a_i| \le 1$ and $|1 - 2x_i^{(k-1)}| \le 1$.
    Finally, as $|a_i| \le 1$ and $\sum_i (y_i^{(k)})^2 \le \gamma n \le 1/2$ by Claim 3.1, the third term can be bounded as

    $\left(\sum_i a_i^2 (y_i^{(k)})^2\right)^2 \le (1/2) \sum_i a_i^2 (y_i^{(k)})^2.$    (3.4)

    Plugging in these bounds and using that $\lambda \le 1$, (3.3) gives

    $\mathbb{E}_{k-1}\left[Y_k^2\right] \le 8\eta \sum_i a_i^2\, \mathbb{E}_{k-1}\left[(y_i^{(k)})^2\right] = -(8\eta/\lambda)\, \mathbb{E}_{k-1}[Y_k],$

where the last equality uses (3.2). $\square$
   By Claim 3.3, we can apply Theorem 2.1 with $\alpha = \lambda/8\eta$, provided that the conditions of
Theorem 2.1 hold. Indeed, $\alpha \le 1$ holds as $\lambda \le 1$ and $\eta \ge 1$, and $|Y_k| \le 1$ holds by Claim 3.2.
   Applying Theorem 2.1 now gives that $\Pr[Z_T - Z_0 \ge t'] \le \exp(-\lambda t'/8\eta)$. As $Z_0 = \mu + \lambda v$, this
gives

    $\Pr[Z_T - \mu - \lambda v \ge t'] \le \exp(-t'\lambda/8\eta).$    (3.5)

Setting $\lambda = t'/(t' + 2v)$ (note that this satisfies our assumption $\lambda \le 1$), so that $\lambda v = t'v/(t' + 2v) \le t'/2$,
(3.5) implies that

    $\Pr[Z_T - \mu \ge 3t'/2] \le \exp(-t'\lambda/8\eta).$

Setting $t = 3t'/2$ and plugging in the value of $\lambda$ gives the desired result that

    $\Pr[Z_T - \mu \ge t] \le \exp\left(-\frac{t^2/(18\eta)}{2(v + t/3)}\right). \qquad \square$

4     Applications

4.1   Rounding column-sparse LPs

Let $x \in [0,1]^n$ be a fractional solution to $Ax = b$, where $A \in \mathbb{R}^{m \times n}$ is an arbitrary $m \times n$ matrix.
Let $t = \max_{j \in [n]} \sum_{i \in [m]} |a_{ij}|$ be the maximum $\ell_1$-norm of the columns of $A$. Beck and Fiala [12]
gave a rounding method to find $X \in \{0,1\}^n$ so that the maximum rounding error for any row
satisfies $\|AX - b\|_\infty = \|A(X - x)\|_\infty < t$.


Beck–Fiala rounding. We first recall the iterated rounding algorithm in [12]. The algorithm
starts with $x^{(0)} = x$ and proceeds in iterations. Consider some iteration $k$, and let $A_k$ denote the
matrix $A$ restricted to the alive coordinates. Call a row big if its $\ell_1$-norm in $A_k$ is strictly more
than $t$. The key point is that, by an averaging argument, the number of big rows is strictly less
than $n_k$: each column has $\ell_1$-norm at most $t$, and thus the total $\ell_1$-norm of all entries of $A_k$ is at
most $t n_k$. The algorithm chooses $W^{(k)}$ to be the matrix consisting of the big rows of $A_k$ and applies
the iterated rounding update.
    Let us now analyze the error. Fix some row $i$. As long as row $i$ is big, its rounding error
is 0 during the update steps. Consider the first iteration when this row is no longer big. Then,
no matter how the remaining alive variables are rounded in subsequent iterations, the error
incurred will be (strictly) less than its $\ell_1$-norm, which is at most $t$.
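
A minimal Python sketch of this rounding (ours; it uses SciPy's null-space routine and a small tolerance in place of exact arithmetic):

```python
import numpy as np
from scipy.linalg import null_space

def beck_fiala(A, x, t, eps=1e-9):
    """Beck-Fiala iterated rounding sketch: repeatedly protect the big
    rows (alive l1-norm > t) and move in their null space until all
    coordinates are integral; then ||A(X - x)||_inf < t."""
    x = np.asarray(x, dtype=float).copy()
    while True:
        alive = np.where((x > eps) & (x < 1 - eps))[0]
        if alive.size == 0:
            return np.round(x).astype(int)
        Ak = A[:, alive]
        big = np.abs(Ak).sum(axis=1) > t        # fewer than n_k big rows
        B = null_space(Ak[big]) if big.any() else np.eye(alive.size)
        y = B[:, 0]
        pos, neg = y > eps, y < -eps
        s = min(np.min((1 - x[alive][pos]) / y[pos], initial=np.inf),
                np.min(x[alive][neg] / -y[neg], initial=np.inf))
        x[alive] += s * y
```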

Introducing slack. To apply Theorem 1.2, we can easily introduce $\delta$-slack for any $0 \le \delta < 1$,
as follows. In iteration $k$, call a row big if its $\ell_1$ norm exceeds $t/(1-\delta)$; by the argument
above, the number of big rows is strictly less than $n_k(1-\delta)$. Theorem 1.2 now directly gives the
following result.

Theorem 4.1. Given a matrix $A$ with maximum $\ell_1$-norm of any column at most $t$, and any $x \in [0,1]^n$,
for any $0 \le \delta < 1$ the algorithm returns $X \in \{0,1\}^n$ such that $\|A(X - x)\|_\infty \le t/(1-\delta)$,
$\mathbb{E}[X_i] = x_i$ and $X$ satisfies $O(1/\delta)$-concentration.

   This implies the following useful corollary.

Corollary 4.2. Let 𝑀 be a matrix, and let 𝐴 be a subset of the rows of 𝑀 such that the columns of 𝑀
restricted to 𝐴 have ℓ1-norm at most 𝑡. Setting 𝛿 = 1/2, the rounding error is at most 2𝑡 for the rows
of 𝐴, while the other rows of 𝑀 incur error similar to that under randomized rounding.
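
In terms of the sketch above, Corollary 4.2 corresponds to defining bigness using only the
rows of 𝐴 (the index set rows_A below is our hypothetical) and letting concentration handle
the remaining rows of 𝑀:

    X = beck_fiala_round(M[rows_A, :], x, t, delta=0.5)  # protect only rows of A
    # every other row m_i of M then obeys a Bernstein-type tail bound on the
    # error <m_i, X - x>, by the O(1/delta)-concentration of X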

Komlós setting. For an 𝑚 × 𝑛 matrix 𝐴, let 𝑡 = max_{𝑗∈[𝑛]} (∑_{𝑖∈[𝑚]} 𝑎_𝑖𝑗²)^(1/2) denote the maximum
ℓ2-norm of the columns of 𝐴. The long-standing Komlós conjecture (together with a connection
between hereditary discrepancy and rounding due to [25]) states that any 𝑥 ∈ [0, 1]^𝑛 can be
rounded to 𝑋 ∈ {0, 1}^𝑛 so that ‖𝐴(𝑋 − 𝑥)‖_∞ = 𝑂(𝑡). Currently, the best known bound on this
rounding error is 𝑂(𝑡 √log 𝑚) [5, 8].
   An argument similar to that for Theorem 4.1 gives the following result in this setting.

Theorem 4.3. If 𝐴 has maximum column ℓ2-norm 𝑡, then given any 𝑥 ∈ [0, 1]^𝑛, the algorithm returns
an 𝑋 ∈ {0, 1}^𝑛 satisfying ‖𝐴(𝑋 − 𝑥)‖_∞ = 𝑂(𝑡 √log 𝑚) and the 𝑂(1)-concentration property.

Proof. We will apply Theorem 1.2 with 𝛿 = 1/2. During any iteration 𝑘, call row 𝑖 big if its
squared ℓ2-norm in 𝐴_𝑘 exceeds 2𝑡². As the sum of the squared entries of 𝐴_𝑘 is at most 𝑡²𝑛_𝑘, the
number of big rows is at most 𝑛_𝑘/2, and we set 𝑊^(𝑘) to be 𝐴_𝑘 restricted to the big rows.
    The 𝑂(1)-concentration follows directly from Theorem 1.2. To bound the error for the rows
of 𝐴, we argue as follows. Fix a row 𝑖. Clearly, row 𝑖 incurs zero error while it is big. Let 𝑘 be the
first iteration when row 𝑖 is not big, and condition on the randomness up to this point. Call an
(alive) coordinate 𝑗 large if |𝑎_𝑖𝑗| ≥ 𝑡/√log 𝑚, and let 𝐿_𝑖 denote the set of large coordinates in
row 𝑖. Let 𝑎̃_𝑖 denote the row 𝑎_𝑖 with the coordinates in 𝐿_𝑖 removed. As ∑_𝑗 𝑎_𝑖𝑗² ≤ 2𝑡², we have
|𝐿_𝑖| ≤ 2 log 𝑚, and so the rounding error due to the coordinates in 𝐿_𝑖 can be at most

        ∑_{𝑗∈𝐿_𝑖} |𝑎_𝑖𝑗| ≤ |𝐿_𝑖|^(1/2) (∑_{𝑗∈𝐿_𝑖} |𝑎_𝑖𝑗|²)^(1/2) = 𝑂(𝑡 √log 𝑚).


Applying the 𝑂(1)-concentration property of the rounded solution 𝑋, the error due to the entries
of 𝑎̃_𝑖 satisfies

        Pr[ ∑_{𝑗∉𝐿_𝑖} 𝑎_𝑖𝑗 (𝑋_𝑗 − 𝑥_𝑗) ≥ 𝑐𝑡 √log 𝑚 ] ≤ exp( −𝑐_0 · 𝑐²𝑡² log 𝑚 / (∑_{𝑗∉𝐿_𝑖} 𝑎_𝑖𝑗² + 𝑀𝑐𝑡 √log 𝑚) )

for some fixed constant 𝑐_0, where 𝑀 is the largest entry of 𝑎̃_𝑖 in absolute value.
    As ∑_{𝑗∉𝐿_𝑖} 𝑎_𝑖𝑗² ≤ 2𝑡² and 𝑀 ≤ 𝑡/√log 𝑚, the right hand side is exp(−Ω(𝑐𝑐_0 log 𝑚)). Choosing 𝑐
large enough so that this is ≪ 1/𝑚, the result follows by a union bound over the rows.        □
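
Relative to the Beck–Fiala sketch earlier, only the row-selection rule changes in this setting;
for instance (again assuming numpy, with komlos_big_rows being our own name):

    import numpy as np

    def komlos_big_rows(Ak, t):
        # Row i of the alive submatrix Ak is big if its squared l2-norm
        # exceeds 2*t**2; the squared entries of Ak sum to at most t**2 * n_k,
        # so at most n_k/2 rows are big, i.e., delta = 1/2 slack.
        return np.flatnonzero((Ak ** 2).sum(axis=1) > 2 * t ** 2)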

4.2   Makespan minimization on unrelated machines
In the unrelated machines setting, there are 𝑟 jobs and 𝑚 machines, and each job 𝑗 ∈ [𝑟] has
size 𝑝_𝑖𝑗 on machine 𝑖 ∈ [𝑚]. The goal is to assign all the jobs to machines so as to minimize the
maximum machine load.


LP formulation. The standard LP relaxation has fractional assignment variables 𝑥_𝑖𝑗 ∈ [0, 1]
for 𝑗 ∈ [𝑟] and 𝑖 ∈ [𝑚]. Consider the smallest target makespan 𝑇 for which the following LP is
feasible.

        ∑_{𝑗∈[𝑟]} 𝑝_𝑖𝑗 𝑥_𝑖𝑗 ≤ 𝑇       ∀𝑖 ∈ [𝑚]                (load constraints)
        ∑_{𝑖∈[𝑚]} 𝑥_𝑖𝑗 = 1            ∀𝑗 ∈ [𝑟]                (assignment constraints)
        𝑥_𝑖𝑗 = 0                      ∀𝑖, 𝑗 such that 𝑝_𝑖𝑗 > 𝑇

The last constraint is valid for any integral solution, and so we can assume that 𝑝_max := max_𝑖𝑗 𝑝_𝑖𝑗
is at most 𝑇. In a celebrated result, [24] gave a rounding procedure that produces an integral
solution with makespan at most 𝑇 + 𝑝_max. We now sketch the iterated rounding based proof of
this result from [23].
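
For concreteness, this feasibility check is easy to set up directly; the following is a hedged
sketch assuming scipy and numpy, where lp_feasible and the row-major variable layout are our
own choices and not anything prescribed by [24]:

    import numpy as np
    from scipy.optimize import linprog

    def lp_feasible(p, T):
        # p: m x r matrix of job sizes; variables x[i, j] flattened row-major.
        m, r = p.shape
        n = m * r
        A_ub = np.zeros((m, n))                 # load: sum_j p_ij x_ij <= T
        for i in range(m):
            A_ub[i, i * r:(i + 1) * r] = p[i]
        A_eq = np.zeros((r, n))                 # assignment: sum_i x_ij = 1
        for j in range(r):
            A_eq[j, j::r] = 1.0
        # Forbid x_ij whenever p_ij > T; otherwise 0 <= x_ij <= 1.
        bounds = [(0.0, 0.0) if p[i, j] > T else (0.0, 1.0)
                  for i in range(m) for j in range(r)]
        res = linprog(np.zeros(n), A_ub=A_ub, b_ub=np.full(m, float(T)),
                      A_eq=A_eq, b_eq=np.ones(r), bounds=bounds, method="highs")
        return res.status == 0, (res.x.reshape(m, r) if res.status == 0 else None)

One can then binary search over 𝑇 to find (approximately) the smallest feasible target, to which
the rounding below is applied.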


Iterated rounding proof. As always, we start with 𝑥^(0) = 𝑥 and fix the variables that get
rounded to 0 or 1. Consider some iteration 𝑘. Let 𝑛_𝑘 denote the number of fractional variables,
and let 𝑅_𝑘 denote the set of jobs that are still not integrally assigned to some machine. For a
machine 𝑖, define the excess as

        𝑒_𝑖 := ∑_{𝑗∈𝑅_𝑘 : 𝑥^(𝑘)_𝑖𝑗 > 0} (1 − 𝑥^(𝑘)_𝑖𝑗),                                        (4.1)

and note that 𝑒_𝑖 is simply the maximum (fractional) number of extra jobs that can possibly be
assigned to 𝑖 if all the non-zero variables are rounded to 1. An elegant counting argument in
[23] shows that if 𝑊^(𝑘) consists of the load constraints for machines with 𝑒_𝑖 > 1, and the
assignment constraints for jobs in 𝑅_𝑘, then rank(𝑊^(𝑘)) < 𝑛_𝑘.
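
In code, the per-iteration selection of these constraints is straightforward; a minimal sketch
assuming numpy, with choose_protected being our own name (delta = 0 gives exactly the rule of
[23] just described, while delta > 0 anticipates the slack introduced next):

    import numpy as np

    def choose_protected(x, delta=0.0, eps=1e-9):
        # x: m x r fractional assignment matrix for the current iteration.
        frac = (x > eps) & (x < 1 - eps)
        R_k = np.flatnonzero(frac.any(axis=0))   # jobs not yet integrally assigned
        # Excess e_i as in (4.1): sum of (1 - x_ij) over positive x_ij, j in R_k.
        pos = x[:, R_k] > eps
        e = ((1 - x[:, R_k]) * pos).sum(axis=1)
        M_k = np.flatnonzero(e > 1.0 / (1 - 2 * delta))  # protected machines
        return M_k, R_k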

Introducing slack. We now extend the argument of [23] to introduce some slack so that we
can apply Theorem 1.2. This will give the following result.

Theorem 4.4. Given any 𝛿 ∈ [0, 1/2) and a fractional solution 𝑥 to the problem, there is a rounding
where the integral solution 𝑋 increases the load on any machine by at most 𝑝_max/(1 − 2𝛿), satisfies
𝔼[𝑋_𝑖𝑗] = 𝑥_𝑖𝑗 for all 𝑖, 𝑗, and has 𝑂(1/𝛿)-concentration.

Proof. Consider some iteration 𝑘, and let 𝑛_𝑘 denote the number of fractional variables 𝑥^(𝑘)_𝑖𝑗 ∈ (0, 1),
and let 𝑅_𝑘 denote the jobs that are still not integrally assigned. Let 𝑟_𝑘 = |𝑅_𝑘|. For a machine 𝑖,
we define the excess 𝑒_𝑖 as in (4.1). Let 𝑀_𝑘 denote the set of machines with 𝑒_𝑖 > 1/(1 − 2𝛿).
    𝑊^(𝑘) will consist of the load constraints for machines in 𝑀_𝑘 and the assignment constraints
for jobs in 𝑅_𝑘. More precisely, the update 𝑦^(𝑘) will satisfy the following two conditions:
(i) ∑_𝑗 𝑝_𝑖𝑗 𝑦^(𝑘)_𝑖𝑗 = 0 for all 𝑖 ∈ 𝑀_𝑘, and (ii) ∑_𝑖 𝑦^(𝑘)_𝑖𝑗 = 0 for all 𝑗 ∈ 𝑅_𝑘. We say that machine 𝑖
is protected in iteration 𝑘 if 𝑖 ∈ 𝑀_𝑘. For a protected machine, the fractional load does not change
after an update. When a machine ceases to be protected for the first time, the definition of excess
ensures that its extra load in subsequent iterations can be at most 𝑝_max/(1 − 2𝛿).
    It remains to show that rank(𝑊^(𝑘)) ≤ (1 − 𝛿)𝑛_𝑘. As each job in 𝑅_𝑘 contributes at least two
fractional variables to 𝑛_𝑘, we first note that

        2𝑟_𝑘 ≤ 𝑛_𝑘.                                                                            (4.2)

Let 𝑚_𝑘 = |𝑀_𝑘|. Then we also have the following.
Claim 4.5. 𝑚_𝑘 ≤ (1 − 2𝛿)(𝑛_𝑘 − 𝑟_𝑘).

Proof. Clearly 𝑚_𝑘/(1 − 2𝛿) ≤ ∑_{𝑖∈𝑀_𝑘} 𝑒_𝑖, as each 𝑖 ∈ 𝑀_𝑘 has excess more than 1/(1 − 2𝛿). Next,

        ∑_{𝑖∈𝑀_𝑘} 𝑒_𝑖 = ∑_{𝑖∈𝑀_𝑘} ∑_{𝑗∈𝑅_𝑘 : 𝑥^(𝑘)_𝑖𝑗 > 0} (1 − 𝑥^(𝑘)_𝑖𝑗) ≤ ∑_{𝑖∈[𝑚]} ∑_{𝑗∈𝑅_𝑘 : 𝑥^(𝑘)_𝑖𝑗 > 0} (1 − 𝑥^(𝑘)_𝑖𝑗) = 𝑛_𝑘 − 𝑟_𝑘,

where the first equality uses the definition of 𝑒_𝑖, and the last equality uses the definition of 𝑛_𝑘
and that ∑_{𝑖∈[𝑚]} 𝑥^(𝑘)_𝑖𝑗 = 1 for each job 𝑗 ∈ 𝑅_𝑘. Together this gives 𝑚_𝑘 ≤ (1 − 2𝛿)(𝑛_𝑘 − 𝑟_𝑘).   □


    Multiplying (4.2) by 𝛿 and adding to the inequality in Claim 4.5 gives 𝑚_𝑘 + 𝑟_𝑘 ≤ (1 − 𝛿)𝑛_𝑘,
which implies the result as rank(𝑊^(𝑘)) ≤ 𝑟_𝑘 + 𝑚_𝑘.                                           □

Remark 4.6. Setting 𝛿 = 0 recovers the additive 𝑝_max result of [24]. Theorem 4.4 also generalizes
directly to 𝑞 resources, where job 𝑗 has load vector 𝑝_𝑖𝑗 = (𝑝_𝑖𝑗(1), . . . , 𝑝_𝑖𝑗(𝑞)) on machine 𝑖, and
the goal is to find an assignment 𝐴 of jobs to machines to minimize max_{ℎ,𝑖} ∑_{𝑗:𝐴(𝑗)=𝑖} 𝑝_𝑖𝑗(ℎ). A
direct modification of the proof above gives an additive 𝑞𝑝_max/(1 − 2𝛿) error and the 𝑂(1/𝛿)-
concentration property.

4.3   Minimum cost degree-bounded matroid basis
Instead of just the degree-bounded spanning tree problem, we consider the more general
matroid setting as all the arguments apply directly without additional work.

Minimum cost degree-bounded matroid-basis problem (DegMat). The input is a matroid 𝑀
defined on elements 𝑉 with costs 𝑐 : 𝑉 → ℝ+ and 𝑚 degree constraints specified by (𝑆_𝑗, 𝑏_𝑗) for
𝑗 ∈ [𝑚], where 𝑆_𝑗 ⊂ 𝑉 and 𝑏_𝑗 ∈ ℤ+. The goal is to find a minimum-cost base 𝐼 in 𝑀 satisfying
the degree bounds, i. e., |𝐼 ∩ 𝑆_𝑗| ≤ 𝑏_𝑗 for all 𝑗 ∈ [𝑚]. The matroid 𝑀 is given implicitly, by an
independence oracle (which, given a query 𝐼, returns whether 𝐼 is an independent set or not).

Iterated rounding algorithm. The natural LP formulation for the problem has a variable
𝑥_𝑖 ∈ [0, 1] for each element 𝑖 ∈ 𝑉, and the goal is to minimize the cost ∑_𝑖 𝑐_𝑖 𝑥_𝑖 subject to the
following constraints.

        ∑_{𝑖∈𝑇} 𝑥_𝑖 ≤ 𝑟(𝑇)        ∀𝑇 ⊂ 𝑉                (rank constraints)
        ∑_{𝑖∈𝑉} 𝑥_𝑖 = 𝑟(𝑉)                              (matroid base constraint)
        ∑_{𝑖∈𝑆_𝑗} 𝑥_𝑖 ≤ 𝑏_𝑗       ∀𝑗 ∈ [𝑚]              (degree constraints)

Here 𝑟(·) is the rank function of 𝑀.
    Given a feasible LP solution with cost 𝑐∗, [22, 10] gave an iterated rounding algorithm that
finds a solution with cost at most 𝑐∗ and an additive degree violation of at most 𝑞 − 1. Here
𝑞 = max_𝑖 |{𝑗 : 𝑖 ∈ 𝑆_𝑗}| is the maximum number of sets that contain any element 𝑖. Note that
𝑞 = 2 for the degree-bounded spanning tree problem, as the elements here are edges and the
sets 𝑆_𝑗 consist of the edges incident to a vertex, so each edge can lie in at most two such sets.
    We briefly sketch the argument in [22, 10]. The algorithm starts with 𝑥^(0) = 𝑥 and applies
iterated rounding as follows. Consider some iteration 𝑘. Let 𝐴_𝑘 denote the set of fractional
variables and let 𝑛_𝑘 = |𝐴_𝑘|. For a set 𝑆_𝑗, define the excess as

        𝑒_𝑗 := ∑_{𝑖∈𝐴_𝑘∩𝑆_𝑗} (1 − 𝑥^(𝑘)_𝑖),                                                    (4.3)

the maximum degree violation for 𝑆_𝑗 even if all the current fractional variables are rounded to 1.
    Let 𝐷_𝑘 be the set of indices 𝑗 of degree constraints with excess 𝑒_𝑗 ≥ 𝑞. The algorithm chooses
𝑊^(𝑘) to consist of the degree constraints in 𝐷_𝑘 (call these protected constraints) and some
basis for the tight matroid rank constraints. An elegant counting argument then shows that
rank(𝑊^(𝑘)) ≤ 𝑛_𝑘 − 1. The correctness follows since once a degree constraint is no longer
protected, its excess is strictly below 𝑞, and by the integrality of 𝑏_𝑗 and of the final rounded
solution, the degree violation can be at most 𝑞 − 1.
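
The degree part of this selection is easy to express in code; a sketch assuming numpy, with
protected_degree_constraints being our own name, and with the matroid part (the chain family
of tight rank constraints, discussed below) omitted since it depends on the independence oracle:

    import numpy as np

    def protected_degree_constraints(x, sets, q, delta=0.0, eps=1e-9):
        # x: fractional values over the ground set V; sets: list of index
        # arrays S_j. Excess e_j as in (4.3), summed over alive elements only.
        alive = (x > eps) & (x < 1 - eps)
        threshold = q / (1 - 2 * delta)   # equals q when delta = 0, as in [22, 10]
        return [j for j, S in enumerate(sets)
                if ((1 - x[S]) * alive[S]).sum() >= threshold]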

Introducing slack. We will extend the argument above in a straightforward way to introduce
some slack, and then apply Theorem 1.2 to obtain the following result.

Theorem 4.7. For any 0 < 𝛿 < 1/2, there is an algorithm for the DegMat problem that produces a basis
with additive degree violation strictly less than 𝑞/(1 − 2𝛿) and satisfies 𝑂(1/𝛿)-concentration.

    Setting 𝛿 = 1/6, so that 2/(1 − 2𝛿) = 3, and noting that the degree violation is strictly less
than this bound, gives the following.

Corollary 4.8. For the minimum cost degree-bounded spanning tree problem, given a fractional
solution 𝑥, there is an algorithm to find a spanning tree with additive degree violation at most two that
satisfies 𝑂(1)-concentration.

    We now describe the argument. Consider iteration 𝑘. Let 𝐴_𝑘 be the set of fractional variables
and 𝑛_𝑘 = |𝐴_𝑘|. We need to specify how to choose 𝑊^(𝑘) and show that rank(𝑊^(𝑘)) ≤ (1 − 𝛿)𝑛_𝑘.
Let 𝐷_𝑘 denote the set of indices 𝑗 of degree constraints with excess 𝑒_𝑗 ≥ 𝑞/(1 − 2𝛿). Let ℱ denote
the family of tight matroid rank constraints, i. e., the sets 𝑆 with ∑_{𝑖∈𝑆∩𝐴_𝑘} 𝑥_𝑖 = 𝑟_𝑘(𝑆), where
𝑟_𝑘 is the rank function of the matroid 𝑀_𝑘 obtained from 𝑀 by deleting the elements with 𝑥_𝑖 = 0
and contracting those with 𝑥_𝑖 = 1. It is well known [30] that there is a chain family of tight sets
𝒞 = {𝐶_1, . . . , 𝐶_ℓ}, with 𝐶_1 ⊂ 𝐶_2 ⊂ · · · ⊂ 𝐶_ℓ, such that the rank constraint of every 𝑆 ∈ ℱ lies in
the linear span of the constraints for the sets in 𝒞. Let 𝑐_𝑘 = |𝒞| and 𝑑_𝑘 = |𝐷_𝑘|. We set 𝑊^(𝑘) to be
the degree constraints in 𝐷_𝑘 together with the rank constraints for 𝒞.

Claim 4.9. rank(𝑊^(𝑘)) ≤ (1 − 𝛿)𝑛_𝑘.

Proof. It suffices to show that 𝑐_𝑘 + 𝑑_𝑘 ≤ (1 − 𝛿)𝑛_𝑘. As each 𝑥_𝑖 is fractional and the ranks
𝑟_𝑘(𝐶) are integral, any two sets in the chain family differ by at least two elements, i. e.,
|𝐶_{𝑖+1} \ 𝐶_𝑖| ≥ 2. This implies that 𝑐_𝑘 ≤ 𝑛_𝑘/2. We also note that 𝑟_𝑘(𝐶_1) < 𝑟_𝑘(𝐶_2) < · · · , and in
particular the rank 𝑟_𝑘(𝐶_{𝑐_𝑘}) of the largest set in 𝒞 is at least 𝑐_𝑘. As the constraint for 𝐶_{𝑐_𝑘}
is tight, this gives that ∑_{𝑖∈𝐴_𝑘} 𝑥_𝑖 ≥ 𝑐_𝑘.
    Next, as 𝑒_𝑗 ≥ 𝑞/(1 − 2𝛿) for each 𝑗 ∈ 𝐷_𝑘, we have that 𝑞𝑑_𝑘 ≤ (1 − 2𝛿) ∑_{𝑗∈𝐷_𝑘} 𝑒_𝑗. Moreover,
by the definition of 𝑒_𝑗,

        ∑_{𝑗∈𝐷_𝑘} 𝑒_𝑗 = ∑_{𝑗∈𝐷_𝑘} ∑_{𝑖∈𝐴_𝑘∩𝑆_𝑗} (1 − 𝑥_𝑖) = ∑_{𝑖∈𝐴_𝑘} 𝑞_𝑖 (1 − 𝑥_𝑖),

where 𝑞_𝑖 = |{𝑗 ∈ 𝐷_𝑘 : 𝑖 ∈ 𝑆_𝑗}| is the number of protected degree constraints that contain
element 𝑖. As 𝑞_𝑖 ≤ 𝑞, the above is at most 𝑞 ∑_{𝑖∈𝐴_𝑘} (1 − 𝑥_𝑖) = 𝑞𝑛_𝑘 − 𝑞 ∑_{𝑖∈𝐴_𝑘} 𝑥_𝑖 ≤ 𝑞𝑛_𝑘 − 𝑞𝑐_𝑘,
using ∑_{𝑖∈𝐴_𝑘} 𝑥_𝑖 ≥ 𝑐_𝑘 and ∑_{𝑖∈𝐴_𝑘} 1 = |𝐴_𝑘| = 𝑛_𝑘.
    Together this gives that 𝑑_𝑘 ≤ (1 − 2𝛿)(𝑛_𝑘 − 𝑐_𝑘), and adding 2𝛿 times the inequality 𝑐_𝑘 ≤ 𝑛_𝑘/2
to this gives that 𝑑_𝑘 + 𝑐_𝑘 ≤ (1 − 𝛿)𝑛_𝑘, which proves the desired claim.                      □

    The degree violation property follows as before, since if a degree constraint is no longer
protected, then its excess is strictly below 𝑞/(1 − 2𝛿).
   Finally, we remark that as the underlying LP has exponential size, some care is needed in
implementing the rounding algorithm, in particular in maintaining the chain family and in
computing the step size of the walk. These issues are discussed in [11].


Acknowledgements
We thank the referees for their extremely thorough reading of the manuscript, catching various
errors, and for several useful comments and suggestions that significantly improved the
presentation.


References
 [1] Nima Anari and Shayan Oveis Gharan: Effective-resistance-reducing flows, spectrally
     thin trees, and asymmetric TSP. In Proc. 56th FOCS, pp. 20–39. IEEE Comp. Soc., 2015.
     [doi:10.1109/FOCS.2015.11]

 [2] Sanjeev Arora, Alan M. Frieze, and Haim Kaplan: A new rounding procedure for the
     assignment problem with applications to dense graph arrangement problems. Math.
     Programming, 92(1):1–36, 2002. [doi:10.1007/s101070100271]

 [3] Arash Asadpour, Michel X. Goemans, Aleksander Madry, Shayan Oveis Gharan, and
     Amin Saberi: An 𝑂(log 𝑛/log log 𝑛)-approximation algorithm for the asymmetric traveling
     salesman problem. Oper. Res., 65(4):1043–1061, 2017. Preliminary version in SODA'10.
     [doi:10.1287/opre.2017.1603]

 [4] Arash Asadpour and Amin Saberi: An approximation algorithm for max-min fair allocation
     of indivisible goods. SIAM J. Comput., 39(7):2970–2989, 2010. [doi:10.1137/080723491]

 [5] Wojciech Banaszczyk: Balancing vectors and Gaussian measures of 𝑛-dimensional
     convex bodies. Random Struct. Algor., 12(4):351–360, 1998. [doi:10.1002/(SICI)1098-
     2418(199807)12:4<351::AID-RSA3>3.0.CO;2-S]

 [6] Nikhil Bansal: Constructive algorithms for discrepancy minimization. In Proc. 51st FOCS,
     pp. 3–10. IEEE Comp. Soc., 2010. [doi:10.1109/FOCS.2010.7]

 [7] Nikhil Bansal: On a generalization of iterated and randomized rounding. In Proc. 51st
     STOC, pp. 1125–1135. ACM Press, 2019. [doi:10.1145/3313276.3316313]


 [8] Nikhil Bansal, Daniel Dadush, and Shashwat Garg: An algorithm for Komlós
     Conjecture matching Banaszczyk's bound. SIAM J. Comput., 48(2):534–553, 2019.
     [doi:10.1137/17M1126795]

 [9] Nikhil Bansal and Shashwat Garg: Algorithmic discrepancy beyond partial coloring. In
     Proc. 49th STOC, pp. 914–926. ACM Press, 2017. [doi:10.1145/3055399.3055490]

[10] Nikhil Bansal, Rohit Khandekar, and Viswanath Nagarajan: Additive guarantees
     for degree-bounded directed network design. SIAM J. Comput., 39(4):1413–1431, 2010.
     [doi:10.1137/080734340]

[11] Nikhil Bansal and Viswanath Nagarajan: Approximation-friendly discrepancy rounding.
     In Proc. 18th Integer Prog. Combinat. Optim. (IPCO'16), volume 9682 of LNCS, pp. 375–386.
     Springer, 2016. [doi:10.1007/978-3-319-33461-5_31]

[12] József Beck and Tibor Fiala: "Integer-making" theorems. Discr. Appl. Math., 3(1):1–8, 1981.
     [doi:10.1016/0166-218X(81)90022-6]

[13] Julius Borcea, Petter Brändén, and Thomas M. Liggett: Negative dependence and the
     geometry of polynomials. J. AMS, 22(2):521–567, 2009. [doi:10.1090/S0894-0347-08-00618-8]

[14] Stéphane Boucheron, Gábor Lugosi, and Pascal Massart: Concentration In-
     equalities: A Nonasymptotic Theory of Independence. Oxford Univ. Press, 2013.
     [doi:10.1093/acprof:oso/9780199535255.001.0001]

[15] Chandra Chekuri, Jan Vondrák, and Rico Zenklusen: Dependent randomized rounding
     via exchange properties of combinatorial structures. In Proc. 51st FOCS, pp. 575–584. IEEE
     Comp. Soc., 2010. [doi:10.1109/FOCS.2010.60]

[16] Chandra Chekuri, Jan Vondrák, and Rico Zenklusen: Multi-budgeted matchings and
     matroid intersection via dependent rounding. In Proc. 22nd Ann. ACM–SIAM Symp. on
     Discrete Algorithms (SODA'11), pp. 1080–1097. SIAM, 2011. [doi:10.1137/1.9781611973082.82]

[17] Devdatt P. Dubhashi and Alessandro Panconesi: Concentration of Measure for the Analysis of
     Randomized Algorithms. Cambridge Univ. Press, 2009. [doi:10.1017/CBO9780511581274]

[18] David A. Freedman: On tail probabilities for martingales. Ann. Probab., 3(1):100–118, 1975.
     [doi:10.1214/aop/1176996452]

[19] Rajiv Gandhi, Samir Khuller, Srinivasan Parthasarathy, and Aravind Srinivasan: Depen-
     dent rounding and its applications to approximation algorithms. J. ACM, 53(3):324–360,
     2006. [doi:10.1145/1147954.1147956]

[20] Nicholas J. A. Harvey and Neil Olver: Pipage rounding, pessimistic estimators and matrix
     concentration. In Proc. 25th Ann. ACM–SIAM Symp. on Discrete Algorithms (SODA'14), pp.
     926–945. SIAM, 2014. [doi:10.1137/1.9781611973402.69]


[21] Richard M. Karp, Frank Thomson Leighton, Ronald L. Rivest, Clark D. Thompson,
     Umesh V. Vazirani, and Vijay V. Vazirani: Global wire routing in two-dimensional arrays.
     Algorithmica, 2:113–129, 1987. [doi:10.1007/BF01840353]

[22] Tamás Király, Lap Chi Lau, and Mohit Singh: Degree bounded matroids and submodular
     flows. Combinatorica, 32(6):703–720, 2012. [doi:10.1007/s00493-012-2760-6]

[23] Lap Chi Lau, R. Ravi, and Mohit Singh: Iterative Methods in Combinatorial Optimization.
     Cambridge Univ. Press, 2011. [doi:10.1017/CBO9780511977152]

[24] Jan Karel Lenstra, David B. Shmoys, and Éva Tardos: Approximation algorithms
     for scheduling unrelated parallel machines. Math. Programming, 46:259–271, 1990.
     [doi:10.1007/BF01585745]

[25] László Lovász, Joel H. Spencer, and Katalin Vesztergombi: Discrepancy of set-systems
     and matrices. Europ. J. Combinat., 7(2):151–160, 1986. [doi:10.1016/S0195-6698(86)80041-5]

[26] Shachar Lovett and Raghu Meka: Constructive discrepancy minimization by walking on
     the edges. SIAM J. Comput., 44(5):1573–1582, 2015. [doi:10.1137/130929400]

[27] Robin Pemantle: Towards a theory of negative dependence. J. Math. Phys., 41(3):1371–1390,
     2000. [doi:10.1063/1.533200]

[28] Yuval Peres, Mohit Singh, and Nisheeth K. Vishnoi: Random walks in poly-
     topes and negative dependence. In Proc. 8th Innovations in Theoret. Comp. Sci.
     Conf. (ITCS'17), pp. 50:1–10. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2017.
     [doi:10.4230/LIPIcs.ITCS.2017.50]

[29] Thomas Rothvoß: The entropy rounding method in approximation algorithms. In Proc.
     23rd Ann. ACM–SIAM Symp. on Discrete Algorithms (SODA'12), pp. 356–372. SIAM, 2012.
     [doi:10.1137/1.9781611973099.32]

[30] Alexander Schrijver: Combinatorial Optimization: Polyhedra and Efficiency. Volume B.
     Springer, 2003.

[31] Mohit Singh and Lap Chi Lau: Approximating minimum bounded degree spanning trees
     to within one of optimal. J. ACM, 62(1):1:1–19, 2015. [doi:10.1145/2629366]

[32] Mohit Singh and Nisheeth K. Vishnoi: Entropy, optimization and counting. In Proc. 46th
     STOC, pp. 50–59. ACM Press, 2014. [doi:10.1145/2591796.2591803]

[33] Aravind Srinivasan: Distributions on level-sets with applications to approxima-
     tion algorithms. In Proc. 42nd FOCS, pp. 588–597. IEEE Comp. Soc., 2001.
     [doi:10.1109/SFCS.2001.959935]


[34] Ola Svensson, Jakub Tarnawski, and László A. Végh: A constant-factor approximation
     algorithm for the asymmetric traveling salesman problem. J. ACM, 67(6):37:1–53, 2020.
     [doi:10.1145/3424306]

[35] Vijay V. Vazirani: Approximation Algorithms. Springer, 2001. ACM DL.

[36] David P. Williamson and David B. Shmoys: The Design of Approximation Algorithms.
     Cambridge Univ. Press, 2011. CUP.


AUTHOR

     Nikhil Bansal
     Department of Computer Science
     University of Michigan, Ann Arbor
     bansal gmail com
     https://bansal.engin.umich.edu/


ABOUT THE AUTHOR

     Nikhil Bansal is a professor in the Department of Computer Science at University of
        Michigan, Ann Arbor. He attended the Indian Institute of Technology, Mumbai
        for his B. Tech. degree, and received his Ph. D. from Carnegie Mellon University,
        Pittsburgh. He got fascinated by algorithms after taking an undergraduate class
        by Ajit A. Diwan, and has worked on various algorithmic questions since then.
        During his free time, he enjoys reading, hiking and doing Yoga.



