Lowest-Degree $k$-Spanner: Approximation and Hardness

Authors Eden Chlamt{\'a}{\v{c}}, Michael Dinitz,
Plaintext
                          T HEORY OF C OMPUTING, Volume 12 (15), 2016, pp. 1–29
                                       www.theoryofcomputing.org

                          S PECIAL ISSUE : APPROX-RANDOM 2014



                       Lowest-Degree k-Spanner:
                      Approximation and Hardness
                                Eden Chlamtáč∗                     Michael Dinitz†
                  Received April 7, 2015; Revised July 20, 2016; Published September 24, 2016




       Abstract: A k-spanner is a subgraph in which distances are approximately preserved, up to
       some given stretch factor k. We focus on the following problem: Given a graph and a value
       k, can we find a k-spanner that minimizes the maximum degree? While reasonably strong
       bounds are known for some spanner problems, they almost all involve minimizing the total
       number of edges. Switching the objective to the degree introduces significant
                                                                             √       new challenges,
       and currently the only known approximation bound is an O(∆    e 3−2 2 )-approximation for
       the special case when k = 2 [Chlamtáč, Dinitz, Krauthgamer FOCS 2012] (where ∆ is the
       maximum degree in the input graph). In this paper we give the first non-trivial algorithm and
       polynomial-factor hardness of approximation for the case of arbitrary constant k. Specifically,
                             e (1−1/k)2 )-approximation and prove that it is hard to approximate
       we give an LP-based O(∆
       the optimum to within ∆Ω(1/k) when the graph is undirected, and to within ∆Ω(1) when it is
       directed.

ACM Classification: G.2.2, F.2.2, G.1.6
AMS Classification: 68W25, 68Q17, 05C85
Key words and phrases: approximation algorithms, hardness of approximation, graph spanners

     An extended abstract of this paper appeared in the Proceedings of the 17th International Workshop on Approximation
Algorithms for Combinatorial Optimization Problems (APPROX 2014) [12].
   ∗ Partially supported by ISF grant 1002/14.
   † Partially supported by NSF grants 1464239 and 1535887.



© 2016 Eden Chlamtáč and Michael Dinitz
c b Licensed under a Creative Commons Attribution License (CC-BY)                         DOI: 10.4086/toc.2016.v012a015
                                     E DEN C HLAMT Á Č AND M ICHAEL D INITZ

1     Introduction
A spanner of a graph is a sparse subgraph that approximately preserves distances. Formally, a k-spanner
of a graph G = (V, E) is a subgraph H of G in which dH (u, v) ≤ k · dG (u, v) for all u, v ∈ V , where dH and
dG denote shortest-path distances1 in H and G, respectively.2 Graph spanners were originally introduced
in the context of distributed computing [25, 26], and since then have been extensively studied from both a
distributed and a centralized perspective. Much of this work has focused on the fundamental tradeoffs
between stretch, size, and total weight, such as the seminal result of Althöfer et al. that every graph admits
a (2k − 1)-spanner with at most n1+1/k edges [1] and its many extensions (e. g., to dealing with total
weight [9]). Spanners have also appeared as fundamental building blocks in a wide range of applications,
from routing in computer networks [29] to property testing of functions [6].
     In parallel with this work on the fundamental tradeoffs there has been a line of work on approximating
spanners. In this setting we are usually given an input graph G and a stretch value k, and our goal is to
construct the best possible k-spanner. If “best” is measured in terms of the total number of edges, then
clearly the construction of [1] gives an O(n2/(k+1) )-approximation (for odd k), simply because Ω(n) is
a trivial lower bound on the size of any spanner of a connected graph. However, when the objective
function is to minimize the maximum degree, there are no non-trivial fundamental bounds like there
are for the number of edges, so it is natural to consider the optimization problem. Moreover, degree
objectives are notoriously difficult (consider degree-bounded minimum spanning trees [28] as opposed to
general minimum spanning trees), and so almost all work on approximation algorithms for spanners has
focused on minimizing the number of edges, as opposed to maximum degree.
     We call the problem of minimizing the degree of a k-spanner the L OWEST-D EGREE k-S PANNER
problem (which we will abbreviate LDkS). For directed graphs, we define LDkS to be the problem of
minimizing the maximum total degree of a k-spanner, that is, the sum of the in- and out-degrees. Kortsarz
and Peleg initiated the study of the maximum degree of a spanner, giving an O(∆1/4 )-approximation
for LD2S [24] (where√      ∆ is the maximum degree of the input graph). This was only recently im-
proved to O(∆3−2 2+ε ) = O(∆0.17...+ε ) for arbitrarily small constant ε > 0 by Chlamtáč, Dinitz, and
Krauthgamer [13]. The only known hardness for LD2S was Ω(log n) [24]. Despite the length of time
since minimizing the degree was first considered (over 15 years) and the significant amount of work on
other spanner problems, no nontrivial upper or lower bounds were known previous to this work for LDkS
when k ≥ 3.


1.1    Our results and techniques
We give the first nontrivial upper and lower bounds for the approximability of L OWEST-D EGREE k-
S PANNER for constant k ≥ 3. We assume throughout that all edges have length 1; while much previous
work has dealt with spanners with arbitrary edge lengths, our results (and all previous results on optimizing
the degree) are specific to uniform edge lengths. Handling general edge lengths is an intriguing open
problem.
   1 We consider both undirected and directed graphs. In directed graphs, “path” refers to directed path (hence the distance

function dG (u, v) is not symmetric).
   2 Equivalently, a subgraph H is a k-spanner if d (u, v) ≤ k for every edge (u, v) in G.
                                                   H



                          T HEORY OF C OMPUTING, Volume 12 (15), 2016, pp. 1–29                                           2
                         L OWEST-D EGREE k-S PANNER : A PPROXIMATION AND H ARDNESS

    As we also note later, it is easy to see that any k-spanner must have maximum degree at least ∆1/k
(simply to span the edges incident to the node of maximum degree). Thus, simply outputting the original
graph is a ∆1−1/k approximation. We beat the trivial algorithm, and give the following algorithmic result.3

Theorem 1.1. For every fixed integer k ≥ 1, there is a randomized polynomial-time O(∆e (1−1/k)2 )-
approximation for L OWEST-D EGREE k-S PANNER in both undirected and directed graphs.

     While this may seem like a rather small improvement over the trivial ∆1−1/k -approximation, it still
requires significant technical work (possibly explaining why no nontrivial bounds were known previously).
Note that in the special case of k = 2 our bound recovers the bound of [24], although not the improved
one of [13]. This is not a coincidence: our algorithm is a modification of [24], albeit with a very different
and significantly more involved analysis. We use a natural flow-based linear program in which the
decision variable for each edge is interpreted as a capacity, while the spanning requirement is interpreted
as requiring that for every original edge {u, v} there is enough capacity to send 1 unit of flow along paths
of length at most k (this is essentially the same LP used for directed spanners by [15, 6] but with a degree
objective, and reduces to the LP used by [24] when k = 2).
     The LP rounding in [24] was a simple independent randomized rounding which ensured that every
path of length 2 is contained in the spanner with probability that is at least the LP flow along that path.
Since paths of length 2 with common endpoints are naturally edge-disjoint, these events are independent
(for a fixed edge (u, v)), and a simple calculation shows that at least one u v path survives the rounding
with probability at least 1 − 1/e.
     When k ≥ 3 the structure of these paths becomes significantly more complicated. While we still
guarantee that each flow path will be contained in the spanner with probability proportional to the amount
of flow in the path, we can no longer guarantee independence, as the flow-paths are not disjoint, and may
intersect and overlap in highly non-trivial ways. Our main technical contribution (in the upper bound)
shows that the rounding exhibits a certain dichotomy: either we can carefully prune the paths (while
retaining 1/ polylog(∆) flow) until they are disjoint, or the number of flow-paths that survive the rounding
is concentrated around an expectation which is ω(1). This ensures that (after boosting by repeating the
rounding a polylogarithmic number of rounds), every edge is spanned with high probability.
     On the lower bound side, our main result is the following.

Theorem 1.2. For any integer k ≥ 3, there is no polynomial-time algorithm that can approximate
L OWEST-D EGREE k-S PANNER better than ∆Ω(1/k) unless NP ⊆ BPTIME(2polylog(n) ).

    We can actually get a stronger hardness result if we assume that the input graph is directed.

Theorem 1.3. There is some constant γ > 0 such that for any integer k ≥ 3 there is no polynomial-time
algorithm that can approximate L OWEST-D EGREE k-S PANNER on directed graphs better than ∆γ , unless
NP ⊆ BPTIME(2polylog(n) ).
   3 Our algorithm and analysis work for both the undirected and directed cases with no change. The parameter k is taken to
                       e notation in the approximation guarantee hides polylogarithmic factors of the form O(log n(log ∆) f (k) )
be a constant, and the O
for some function f (k) that depends exponentially on k. The running time is bounded by a fixed polynomial in n that does not
depend on k.


                            T HEORY OF C OMPUTING, Volume 12 (15), 2016, pp. 1–29                                              3
                                 E DEN C HLAMT Á Č AND M ICHAEL D INITZ

    It is important to note that these hardness results do not hold if we replace ∆ by n, as the algorithmic
results do. The instances generated by the hardness reduction have a maximum degree that is subpolyno-
mial in n, so the best hardness that we would be able to prove (in terms of n) would be subpolynomial
(although still superpolylogarithmic). On the other hand, by phrasing the hardness in terms of ∆ we
not only allow direct comparisons to the upper bounds, but also allow us to use techniques (namely
reductions from L ABEL C OVER and M IN -R EP) that typically give only subpolynomial hardness results.
Our hardness results require a mix of previous techniques and ideas, but with some interesting twists.
    There is a well-developed framework (mostly put forward by Kortsarz [22] and Elkin and Peleg [20])
for proving hardness for spanner problems by reducing from M IN -R EP, a minimization problem related
to L ABEL C OVER that has proven useful for proving hardness (see Section 3 for the formal definition).
Our reductions have two key modifications. First, we boost the degree by including many copies of both
the starting M IN -R EP instance and the added gadget nodes. This was unnecessary for previous spanner
problems because boosting the degree was not necessary—it was sufficient to boost the number of edges
by including many copies of just the gadget nodes.
    The second modification is particular to the undirected case. Undirected spanner problems are difficult
to prove hard because if we try to simply apply the generic framework for reducing from M IN -R EP, there
can be extra “fake” paths that allow the spanner to bypass the M IN -R EP instance altogether. Elkin and
Peleg [18] showed that for basic (min-cardinality rather than min-degree) undirected k-spanner it was
sufficient to use M IN -R EP instances with large girth: applying the framework to those instances would
yield hardness for basic k-spanner. But they left open the problem of actually proving that M IN -R EP with
large girth was hard. This was proved recently [14] by subsampling the M IN -R EP instance to get rid of
short cycles while still preserving hardness, finally proving hardness for basic k-spanner.
    We might hope LDkS is similar enough to basic k-spanner that we could just apply the generic
reduction to M IN -R EP with large girth. Unfortunately this does not work, since the steps we take to boost
the degree end up introducing short cycles even if the starting M IN -R EP instance has large girth (unlike
the reduction used for basic k-spanner [18]). So we might instead hope that we could simply use the
ideas of [14], and subsample after doing the reduction rather than before. Unfortunately this does not
work either. Instead, we must do both: apply the normal reduction to the special (already subsampled)
M IN -R EP instances from [14], and then do an extra, separate round of subsampling on the reduction. In
other words, we must sample both the M IN -R EP instance itself and the graph obtained by applying the
generic reduction to these already sampled instances.


1.2   Related work
There has been a huge amount of work on graph spanners, from their original introduction in the late
80’s [25, 26] to today. The best bounds on the tradeoff between stretch and space were reached by
Althöfer et al. [1]. Most of the work since then has been on extending these tradeoffs (e. g., including
additive stretch [4, 10], fault-tolerance [11, 16], or average stretch [8]) or considering algorithmic aspects
such as allowing fast distance queries [30] or extremely fast constructions [21].
     In parallel with this, there has been a line of work on approximating graph spanners. This was
initiated by Kortsarz and Peleg, who gave an O(log(|E|/|V |))-approximation for the sparsest 2-spanner
problem [23] and then an O(∆1/4 )-approximation for L OWEST-D EGREE 2-S PANNER [24]. This was

                        T HEORY OF C OMPUTING, Volume 12 (15), 2016, pp. 1–29                               4
                     L OWEST-D EGREE k-S PANNER : A PPROXIMATION AND H ARDNESS

followed by upper bounds by Elkin and Peleg [19] for a variety of related spanner problems including
LD2S (although not LDkS).
     With the exception of [24], one feature that the approximation algorithms for spanners have shared
with the global bounds on spanners has been the use of purely combinatorial techniques. Kortsarz and
Peleg introduced the use of linear programming for spanners [24], but this was a somewhat isolated
example. More recently, linear programming relaxations have become a dominant technique, and have
been used for transitive closure spanners [7], directed spanners [15, 6], fault-tolerant spanners [15, 16],
and LD2S [13]. In this paper we use a rounding scheme similar to [24] (with a much more complicated
analysis) and an LP that is a degree-based variant of the flow-based LP introduced by [15] (an earlier use
of flow-based LPs for approximating spanners is [17]).
     On the hardness side, the first results were due to Kortsarz [22] who proved Ω(log n)-hardness for
                                                       1−ε
the basic k-spanner problem (for constant k) and 2log n -hardness for a weighted version. These results
                                                                            1−ε
were pushed further by Elkin and Peleg [20], who proved the same 2log n -hardness for a collection
of spanner problems including directed k-spanner. Separately, Kortsarz and Peleg proved logarithmic
hardness for LD2S [24]. Proving strong hardness for basic k-spanner remained open until recently, when
Dinitz, Kortsarz, and Raz proved it by showing that M IN -R EP is hard even when the instances have large
girth [14]. They accomplished this through careful subsampling, which we push further by subsampling
both before and after the reduction.

1.3   Preliminaries
We now give some basic formal definitions which will be useful throughout this paper. Given an
unweighted graph G = (V, E), we let dG (u, v) denote the shortest-path distance from u to v in G, i. e., the
minimum number of edges in any path from u to v (note that if G is directed this may be asymmetric).
The girth of a graph is the minimum number of edges in any cycle in the graph. We use the notation e ∼ v
to indicate that e is incident on v, and the notation p : u v to indicate that p is a path from u to v (in the
case of directed graphs, a directed path). We think of paths as tuples of edges, and denote by (p)i the i-th
edge in a path p. For integer k, we will use [k] to denote the set {1, 2, . . . , k}. We will use ln x to denote
loge x, and log x to denote log2 x.
     A k-spanner of G is a subgraph H of G in which dH (u, v) ≤ k · dG (u, v) for all u, v ∈ V . The value k
is referred to as the stretch of the spanner. The fundamental problem that we are concerned with is the
following.

Definition 1.4. Suppose we are given an unweighted graph G and a stretch parameter k. The problem of
computing the k-spanner that minimizes the maximum degree is L OWEST-D EGREE k-S PANNER.


2     The algorithm
We now present our approximation algorithm for LDkS, proving Theorem 1.1. It is not hard to see that
a subgraph with maximum degree D can only be a k-spanner if the original graph has degree at most
∑ki=1 Di = O(Dk ) (the maximum number of possible paths of length ≤ k starting from a given node in the
spanner). Therefore, we have

                        T HEORY OF C OMPUTING, Volume 12 (15), 2016, pp. 1–29                                 5
                                   E DEN C HLAMT Á Č AND M ICHAEL D INITZ

Observation 2.1. In a graph with maximum degree ∆, any k-spanner must have maximum degree at least
Ω(∆1/k ).

2.1     LP relaxation, rounding, and approximation guarantee
Our algorithm uses the following natural LP relaxation.
                       min d
                       s. t.   ∑e∼v xe ≤ d                                    ∀v ∈ V                   (2.1)
                               ∑ p:u    v,|p|≤k y p = 1                   ∀(u, v) ∈ E                  (2.2)
                               xe ≥       ∑          yp                ∀(u, v), e ∈ E                  (2.3)
                                       p:u v,|p|≤k
                                           p3e

                               xe , y p ≥ 0                                     ∀e, p                  (2.4)

    Note that this LP has size nk+O(1) , and thus can be solved in polynomial time when k is constant.
However, there is also an equivalent formulation whose size is bounded by a fixed polynomial in n, that
does not depend on k. As shown in [6], when the edges have unit length (as in our case), constraints (2.2)
and (2.3) can be captured by general flow constraints on a layered graph with (k + 1)n vertices. Since we
may assume that k ≤ n − 1 (in the unit-length case, if k > n − 1, then a subgraph is a k-spanner iff it is an
(n − 1)-spanner), this graph has at most n2 vertices, which allows us to solve the LP in time which is a
fixed polynomial in n.
    Recall that in a k-spanner, it is sufficient to span every edge by a path of length at most k. Note that
for any edge (u, v) ∈ E there may be multiple paths in the spanner spanning this edge. However, we
can always pick one such path per edge. In the intended (integral) solution to the above formulation, xe
is an indicator for whether e appears in the spanner, and y p is an indicator for the unique spanner-path
we assign to (u, v) (it could even be just the edge itself, if p = (u, v)). Thus, we get the following easy
observation.
Observation 2.2. In a graph with maximum degree ∆, in which the optimal solution to the above LP is
dLP , any k-spanner (including the optimum spanner) must have maximum degree at least dLP .
    We apply a rather naïve rounding algorithm to the LP solution, which can be thought of as a natural
extension of the rounding in [24] for LD2S:
                                                                              1/k
      • Independently add each edge e ∈ E to the spanner with probability xe .
The heart of our analysis is showing that in the subgraph this produces, every original edge is spanned
with probability at least Ω(1).
                          e     It is then only a matter of repeating the above algorithm a polylogarithmic
number of times to ensure that every edge is spanned with high probability. This will only incur a
polylogarithmic factor in the degree guarantee. The following lemma gives an easy bound on the expected
degree of any vertex in the above rounding.
Lemma 2.3. Let H be the subgraph obtained from the above rounding, and let dOPT be the smallest
possible degree of a k-spanner of G = (V, E). Then every vertex in H has expected degree at most
               2
O(dOPT ∆(1−1/k) ).

                        T HEORY OF C OMPUTING, Volume 12 (15), 2016, pp. 1–29                              6
                         L OWEST-D EGREE k-S PANNER : A PPROXIMATION AND H ARDNESS

Proof. By linearity of expectation, the expected degree of any v ∈ V is
                        1/k
                ∑e∼v xe       ≤ (degG (v))1−1/k (∑e∼v xe )1/k                by Jensen’s inequality4
                                          1/k
                              ≤ ∆1−1/k dLP
                                                     1−1/k
                                          1/k      dOPT
                              ≤ ∆1−1/k dLP ·                                 by Observation 2.1
                                                ∆(1/k)(1−1/k)
                                          2
                              ≤ ∆(1−1/k) · dOPT                              by Observation 2.2.



Remark 2.4. Note that a simple Chernoff bound says that all degrees will be concentrated around their
respective expectations, as long as the expectations are sufficiently large. Specifically, for any vertex for
which the expected degree is µ ≥ 2 ln n, since the degree is a sum of independent Bernoulli variables,
the probability that its degree will be greater than 3µ is (by Chernoff) at most (e2 /(33 ))µ < e−µ = 1/n2 .
On the other hand, if the expected degree is less than 2 ln n, then the probability that the degree will be
greater than 6 ln n is even smaller. Taking a union bound, the probability that any vertex v with expected
degree µv will have degree greater than max{3µv , 6 ln n} is at most 1/n. This concentration argument can
also be applied to the total number of edges incident to a vertex added throughout the polylogarithmically
many iterations of the basic algorithm, with multiplicities.

   Thus, the crux of the analysis is to show that, indeed, every edge will be spanned with some reasonable
probability.

2.2    Proof of correctness
Having shown that t iterations of the algorithm (for t ≥ 3 ln n) produce a subgraph with maximum
                           2
degree at most O(t∆(1−1/k) dOPT ), it suffices to show that, for every edge, one iteration of the algorithm
spans that edge with probability at least Ω(1/ polylog(∆)). Theorem 1.1 then follows by choosing some
t = O(log n · polylog(∆)). Thus, our goal for the remainder of the section is to prove the following lemma.

Theorem 2.5. Given a solution to the LP relaxation, our rounding algorithm spans every edge (by a path
of length at most k) with probability at least 1/ polylog(∆).

     Before proving this theorem, let us give some intuition. Suppose, for simplicity, that for an edge
(u, v) ∈ E, all the contribution in (2.2) (the spanning constraint) comes from paths of length exactly k.
First, consider the case where all the paths with non-zero weight y p in (2.2) are edge-disjoint. For every
edge e in such a path p we have, from (2.3), that xe ≥ y p . Therefore, the probability that such a path
survives (i. e., all the edges in it are retained in the rounding) is
                                                              1/k
                                                        ∏ xe        ≥ yp .
                                                        e∈p

   4 Alternatively, this follows from Hölder’s inequality.



                              T HEORY OF C OMPUTING, Volume 12 (15), 2016, pp. 1–29                        7
                                   E DEN C HLAMT Á Č AND M ICHAEL D INITZ

Denoting by P the set of such paths, by disjointness these events are independent, and therefore we have

            Prob[(u, v) is spanned] ≥ 1 − ∏ (1 − y p ) ≥ 1 − ∏ e−y p = 1 − e∑ p∈P y p = 1 − 1/e .
                                                p∈P              p∈P


Thus, repeating this process O(log n) times, all such edges will be spanned with high probability.
     However, u v paths of length ≥ 3 need not be disjoint in general. We may assume that all paths
p ∈ P have some fixed length k0 ∈ [k] and are tuples of the form (ei )ki=1 ∈ ∏ki=1 Ei for some disjoint edge
sets E1 , . . . , Ek ⊂ E (see Lemma 2.6). Consider the other extreme (in terms of path disjointness) where
k0 = k and the flow is distributed evenly over all possible paths of the form u − v1 − v2 − · · · − vk−1 − v
for vi ∈ Vi , where {Vi | i ∈ [k − 1]} is an equipartition of V \ {u, v} (that is, uniform flow through a
series of bicliques). Here, the amount of flow through each edge in the first and last layers is roughly
(k − 1)/n, and the amount of flow through any edge in the other layers is roughly ((k − 1)/n)2 . Thus,
in the worst case, edges in the first and last layers will have values xe = (k − 1)/n and in the other
layers xe = ((k − 1)/n)2 . It is easy to see that the number of edges from u to V1 that are still present
(after the rounding) is concentrated around (n/(k − 1))1−1/k (since each outgoing edge from u is retained
independently with probability ((k − 1)/n)1/k ). Similarly, every vertex in layers i = 2, . . . , k − 2 will retain
≈ (n/(k − 1))1−2/k edges to the next layer, creating a total of (n/(k − 1))1−1/k+(1−2/k)(k−2) paths from u
to Vk−1 , an (n/(k − 1))−1/k fraction of which will continue to v. Thus, not only is (u, v) spanned after the
                                                2
rounding, it is spanned by ≈ (n/(k − 1))(k −3k+2)/k different paths (unlike the disjoint case, where only a
constant number of paths survive).
     Thus intuitively we have two scenarios: either the paths are disjoint, or they overlap, and a large
number of them survive (both in expectation and with high probability due to concentration). However,
this is not easy to formalize (moreover, we note that obvious analysis techniques do not work—for
example, on an edge-by-edge basis, gradually merging two paths does not monotonically increase the
probability that at least one path survives). To greatly simplify the formalization of this dichotomy, we
prune the paths to achieve near-regularity in the LP values and combinatorial structure of the flow. To
describe the outcome of the pruning, we need to introduce one more notation: Given a set P0 of paths
and a (small) set S of edges, we denote by mP0 (S) the number of paths p ∈ P0 such that p contains S. For
example, mP0 (0)    / = |P0 | and for any path p ∈ P0 (considering p as a set of edges), mP0 (p) = 1.
     The pruning procedure, which is only needed for the analysis, is an extension of standard pruning
techniques (e. g., pruning to make a bipartite graph nearly regular), and is summarized in the following
lemma, which we prove in Section 2.3.

Lemma 2.6. There exists a function f such that for any vertices u, v ∈ V and set P of paths from u to v of
length at most k such that ∑ p∈P y p ≥ 1/ polylog(∆), there exists a subset P0 ⊆ P satisfying the following
conditions.

    • For some k0 ∈ [k], all paths in P0 have length k0 .
                                            0
    • All the paths in P0 are tuples in ∏ki=1 Ei for some pairwise disjoint collection of sets E1 , . . . , Ek0 ⊂ E.

    • There exists some y0 > 0 such that every path has weight y p ∈ [y0 , 2y0 ]. Furthermore, y0 |P0 | ≥
      1/(log ∆) f (k) .

                         T HEORY OF C OMPUTING, Volume 12 (15), 2016, pp. 1–29                                    8
                       L OWEST-D EGREE k-S PANNER : A PPROXIMATION AND H ARDNESS

    • There exists a positive integer vector (mI )I⊆[k0 ] such that mP0 ((ei )i∈I ) ∈ [mI , mI (log ∆) f (k) ] for every
      index set 0/ 6= I ⊆ [k0 ] and every I-tuple (ei )i∈I ∈ ∏i∈I Ei which is contained in some path in P0 .
      (Note that if e j ∈ (ei )i∈I then mP0 ((ei )i∈I ) ≤ m(e j ) and therefore mI ≤ m{ j} (log ∆) f (k) for j ∈ I.)
                         0
   We note that if ∏ki=1 m{i} ≤ polylog(∆), this is quite close to the disjoint paths case (where m{i} = 1),
and can be analyzed accordingly. The following lemma gives the relevant result for this case.
                                                                                                    0
Lemma 2.7. Let P0 be the set of paths given by Lemma 2.6, and suppose ∏ki=1 m{i} < (log ∆)g for some
constant g = g(k). Then with probability at least 1/(log ∆)h (for some constant h = h(k)), at least one
path in P0 survives the rounding.

Proof. For the sake of the analysis, let us prune the paths even further. Go through every level Ei for
i = 1, . . . , k0 sequentially, and for every e ∈ Ei , choose exactly one (undeleted) path that contains e and
delete all other paths containing e. Since for all e ∈ Ei we have mP0 (e) ≤ m{i} (log ∆) f (k) , in each level
we retain at least a 1/(m{i} (log ∆) f (k) )-fraction of paths. Therefore, we end up with a new collection of
                                                        0
paths P∗ ⊆ P0 such that |P∗ | ≥ |P0 |/(log ∆)g(k)+k f (k) , and the paths in P∗ are edge-disjoint.
    The analysis is now straightforward. Every path p ∈ P∗ is retained with probability
                                       1/k           1/k   k0 /k                            0
                                 ∏ xe        ≥ ∏ y p ≥ y0          ≥ (|P0 |(log ∆) f (k) )−k /k .
                                 e∈p           e∈p

There are |P∗ | such paths, and each survives independently of the rest, therefore, at least one path in P∗
survives with probability
                     
                             1/k
                                       
                                               0
                                                               0 |P∗ |
                                                         f (k) −k /k
                1 − ∏ 1 − ∏ xe     ≥ 1 − 1 − |P |(log ∆)
                    p∈P∗         e∈p
                                                                 0                    0         0
                                                                                                         
                                              ≥ 1 − exp −|P0 |(k−k )/k (log ∆)−( f (k)k /k+g(k)+k f (k))
                                                                                        
                                              ≥ 1 − exp −(log ∆)−(1+1/k)( f (k)+g(k))
                                              = (1 − o(1))(log ∆)−(1+1/k)( f (k)+g(k)) .

    We can also easily deal with the case m{i} ≥ |P0 |/ polylog(∆), which indicates that in some layer i,
the paths are concentrated in a small number of edges, by choosing just one edge e ∈ Ei , contracting this
edge, and deleting all paths that do not use e (see the proof of Theorem 2.9). Thus, the main case we have
to deal with is the intermediate case, where there is non-negligible overlap (∏ m{i} is not too small), but
also no edges have too large a load (no m{i} is too close to |P0 |). It is not hard to show that in this case the
expected number of paths will be large, but showing concentration is more challenging. This constitutes
the bulk of the technical analysis.
    Before we describe this part of the analysis, let us describe a more uniform rounding scheme, which is
dominated by our algorithm. Consider a single edge e ∈ Ei . We know this edge is contained in mP0 ({e})
paths in P0 , and each of these paths has weight y p ∈ [y0 , 2y0 ]. Therefore, by (2.3) and Lemma 2.6, we
have
                             xe ≥ m({e})y0 ≥ m{i} y0 ≥ m{i} /(|P0 |(log ∆) f (k) ) .                        (2.5)

                             T HEORY OF C OMPUTING, Volume 12 (15), 2016, pp. 1–29                                    9
                                 E DEN C HLAMT Á Č AND M ICHAEL D INITZ

                                                                                                 1/k
    Suppose instead of sampling each edge independently with probability xe , we retained every edge
                         1/k
e ∈ Ei with probability xi for
                                    xi := m{i} /(|P0 |(log ∆) f (k) ) ,
and let Y be the number of paths in P0 that survive this rounding. This is clearly a lower bound for the
number of paths retained in our original rounding algorithm (we can think of the modified rounding as
first applying the original rounding, and then subsampling the edges even further). Note that
                                                                   !1/k
                                                       k0
                                         0 1−k0 /k                                       0
                              E[Y ] = |P |            ∏ m{i}              /(log ∆) f (k)k /k ,             (2.6)
                                                      i=1

so, as we’ve mentioned, if ∏i m{i} is large, then E[Y ] will also be large. By Chebyshev’s inequality, we
can bound the probability that Y = 0 by
                                                                               
                                      1                             2    1      2    4 Var[Y ]
           Prob[Y = 0] ≤ Prob Y < E[Y ] ≤ Prob (Y − E[Y ]) > (E[Y ]) <                         .
                                      2                                  4           (E[Y ])2

Thus, to prove, say, that Prob[Y = 0] < 1/2, it suffices to show that

                                                                  1
                                    Var[Y ] = E[Y 2 ] − (E[Y ])2 < (E[Y ])2 .                              (2.7)
                                                                  8
While the proof of this bound is somewhat technical, it is greatly simplified by the pruning phase, which
allows us to bound the variance directly as a function of the mI values without having to analyze the
combinatorial structure of the flow. The result for the main case is given by the following lemma.

Lemma 2.8. Let P0 be the set of paths given by Lemma 2.6. Then if

                                             k0
                                             ∏ m{i} ≥ (log ∆)g(k) ,
                                             i=1

and for every i ∈ [k0 ] we have m{i} ≤ |P0 |(log ∆)−g(k) , where g(k) ≥ (4k + 2) f (k) then (2.7) holds.

Proof. Recall that Y is defined w. r. t. the modified rounding (in which edges in Ei are all chosen with
             1/k
probability xi ). For any tuple of edges (ei ), we introduce the following random variable:
                               (
                                1 all edges in (ei ) survive the modified rounding,
                      X(ei ) =
                                0 otherwise.

Note that, in particular, we have
                                                     Y = ∑ Xp .
                                                            p∈P0

For every I ⊆ [k0 ], we also denote by EI the set of all tuples (ei ) ∈ ∏i∈I Ei that are contained in at least
one path in P0 (note that E0/ = {()} and E[k0 ] = P0 ). Note that |EI |mI ≤ |P0 |. We can now bound the second

                       T HEORY OF C OMPUTING, Volume 12 (15), 2016, pp. 1–29                                 10
                       L OWEST-D EGREE k-S PANNER : A PPROXIMATION AND H ARDNESS

moment as follows.



                            h                  2 i
                 E[Y 2 ] = E        ∑ Xp              = ∑          ∑ E[Xp Xp ]   1         2
                                    p∈P0                  p1 ∈P0 p2 ∈P0

                                                      = ∑          ∑            ∑              E[Xp1 Xp2 ]
                                                          I⊆[k0 ] (ei )∈EI p1 ,p2 ∈P0
                                                                          p1 ∩p2 =(ei )

                                                      = ∑          ∑ E[X(e ) ] ∑      i
                                                                                                                E[Xp1 \(ei ) ] E[Xp2 \(ei ) ]
                                                          I⊆[k0 ] (ei )∈EI                   p1 ,p2 ∈P0
                                                                                            p1 ∩p2 =(ei )
                                                                                                                                       !2
                                                      < ∑          ∑ E[X(e ) ]        i                     ∑          E[Xp\(ei ) ]         .
                                                          I⊆[k0 ] (ei )∈EI                       p:((p)i )i∈I =(ei )




Noting that in the final sum, for I = 0/ the summand equals (∑ p∈P0 E[Xp ])2 , we get



                                                                                                                                                !2
                                                    2
                                                 
                     2             
      E       ∑ Xp            − E       ∑ Xp              <      ∑           ∑ E[X(e ) ]          i                ∑              E[Xp\(ei ) ]       .      (2.8)
              p∈P0                      p∈P0                  06/ =I⊆[k0 ] (ei )∈EI                         p:((p)i )i∈I =(ei )




   Thus, we only have to bound the sum on the right hand side above. We treat the summand for I = [k0 ]
separately. Using our lower bound on ∏i mi , for this value of I we get



                   2
       ∑ p ∑ 0/ = ∑ E[Xp ]
        E[X ]  E[X ]
      p∈P0           p0 =p                     p∈P0
                                                          2                  k0
                                                                    f (k)k0 /k    −1/k        0
                                           =       ∑ E[Xp ] (log ∆)            ∏ m{i} |P|−(1−k /k)                                               by (2.6)
                                                   p∈P0                                               i=1
                                                                                                             !2
                                                          −(g(k)− f (k)k0 )/k
                                           ≤ (log ∆)                                      ∑ E[Xp ]                 .                                        (2.9)
                                                                                          p∈P0



                              T HEORY OF C OMPUTING, Volume 12 (15), 2016, pp. 1–29                                                                           11
                                              E DEN C HLAMT Á Č AND M ICHAEL D INITZ

   Finally, for all non-empty strict subsets 0/ 6= I ( [k0 ] we have the following.
                                          !2                                        !2
                                                                                        1/k                                  1/k
      ∑ E[X(e ) ]  i              ∑              E[Xp\(ei ) ]       =     ∑ ∏ xi                  mP0 (ei )       ∏ xj
    (ei )∈EI               p:((p)i )i∈I =(ei )                          (ei )∈EI i∈I                             j∈[k0 ]\I
                                                                                                                1/k                2/k
                                                                    ≤ (log ∆)2 f (k) |EI |m2I ∏ xi                      ∏ xj
                                                                                                      i∈I             j∈[k0 ]\I
                                                                                                                1/k               2/k
                                                                    ≤ (log ∆)2 f (k) |P0 |mI ∏ xi                      ∏ xj
                                                                                                      i∈I             j∈[k0 ]\I

                                                                                                                  2
                                                                                                                 
                                                                                                                                                −1/k
                                                                    = (log ∆)2 f (k) E                                   |P0 |−1 mI ∏ xi
                                                                                               
                                                                                                    ∑ Xp
                                                                                                   p∈P0                                   i∈I
                                                                                                                !2
                                                                                                                                    1/|I|
                                                                                                                                  mI
                                                                    = (log ∆)2 f (k)          ∑ E[Xp ]                ∏ 1/k 0 1/|I|
                                                                                              p                       i∈I xi       |P |
                                                                                                                          !2
                                                                                                                                                 1/|I|
                                                                                                                                                mI
                                                                    = (log ∆)(2+|I|/k) f (k)            ∑ E[Xp ]                  ∏ 1/k 0 1/|I|−1/k
                                                                                                            p                     i∈I m{i} |P |
                                                                                                                          !2                1/|I|−1/k
                                                                                                                                         m{i}
                                                                                    (3+|I|/k) f (k)
                                                                    ≤ (log ∆)                           ∑ E[Xp ] ∏ |P0 |1/|I|−1/k .
                                                                                                            p                     i∈I

   By our upper bound on m{i} , this gives
                                                                !2                            !2
          ∑ E[X(e ) ]  i              ∑              E[Xp\(ei ) ]       ≤     ∑ E[Xp ]             (log ∆)−(1−|I|/k)g(k)+(3+|I|/k) f (k) ,
        (ei )∈EI               p:((p)i )i∈I =(ei )                              p

and combining this with (2.9) and plugging into (2.8), we get (by our choice of g(k))
                                   2              2
                                                  
               2                                                             0
                                                         (log ∆)−(g(k)− f (k)k )/k
                          
     E ∑ Xp        − E ∑ Xp           < ∑ E Xp
           p∈P0                        p∈P0                     p∈P0
                                                                      0  
                                                                    2 k −1 k0
                                                                
                                                          + ∑ E Xp        ·∑           (log ∆)−(1−i/k)g(k)+(3+i/k) f (k)
                                                            p∈P0            i=1  i
                                                                   2
                                                                       2k (log ∆)− f (k)/k ,
                                                                  
                                                         < ∑ E Xp
                                                                p∈P0

and the lemma follows for all sufficiently large ∆.

    Finally, we combine these three components to give our correctness guarantee. Theorem 2.5 will
follow by applying Lemma 2.6 applied to the set P of all u      v paths of length at most k, and then
                                           0
applying the following theorem to the set P of paths guaranteed by the lemma.
Theorem 2.9. Let P0 be a collection of u         v paths as in Lemma 2.6, then with probability at least
1/(log ∆)`(k) (for some function `), at least one path in P0 survives the rounding.

                             T HEORY OF C OMPUTING, Volume 12 (15), 2016, pp. 1–29                                                                       12
                      L OWEST-D EGREE k-S PANNER : A PPROXIMATION AND H ARDNESS

Proof. First, consider the case where m{i} ≤ |P0 |(log ∆)−(4k+2) f (k) for every i ∈ [k0 ]. In this case, if

                                           k0
                                          ∏ m{i} ≥ (log ∆)(4k+2) f (k) ,
                                          i=1

then the theorem follows directly from our second moment argument and Lemma 2.8. If
                                           k0
                                          ∏ m{i} < (log ∆)(4k+2) f (k) ,
                                          i=1

on the other hand, then the theorem follows from Lemma 2.7.
    On the other hand, if there does exist some i ∈ [k0 ] such that m{i} ≥ |P0 |(log ∆)−(4k+2) f (k) , then the
above analysis breaks down. In this case, choose any edge e ∈ Ei , and note that (by (2.5))

                              xe ≥ m{i} /(|P0 |(log ∆) f (k) ) ≥ (log ∆)−(4k+3) f (k) .

Suppose e = (s,t). In the undirected case, we have a minor technical detail: we choose the direction
(s,t) or (t, s) which contains at least xe0 /2 flow in P, say (s,t). Let Pe0 be the set of paths in P0 that use
(s,t) (in this direction) (in the directed case, Pe0 is just the set of paths in P0 that use e). Then every
path in Pe0 consists of three parts: a u      s prefix of length i − 1, the edge (s,t), and a t     v suffix of
length k0 − i. Let Pe00 be the set of contracted paths {p/{e} | p ∈ Pe0 } in the contracted graph G/{e}.
The paths in Pe00 are clearly in a one-to-one correspondence with the paths in Pe0 . Note that the paths
Pe00 satisfy all the properties given by Lemma 2.6 with (4k + 4) f (k) in place of f (k) (where we define
mPe00 (S) := mP0 (S ∪ {e})).
      The original rounding will retain edge e with probability at least (log ∆)−(4+3/k) f (k) . However, by
induction on k, there is also a (log ∆)−`(k−1) probability that some (contracted) u v path in Pe00 will
survive. Since this event is independent of the event where e is retained, we have that at least one path in
Pe0 will survive with probability at least (log ∆)−(4+3/k) f (k)−`(k−1) .

2.3   Bucketing and pruning
Here we prove Lemma 2.6. Consider an edge (u, v) ∈ E. It is immediate that for some k0 ∈ [k] at least 1/k
fraction of the flow in P must come from paths of length exactly k0 , which guarantees the first condition
of the lemma. The third condition follows by noting that we can easily bucket by path weight (flow) and
lose at most a logarithmic factor.
Claim 2.10. For any set P of paths of length k0 , there is some positive integer t < k0 log ∆ such that the
set Pt = {p ∈ P | 2−(t+1) < y p ≤ 2−t } of paths satisfies
                                                       1           1
                                        ∑ yp ≥                    −
                                                            ∑ p ∆ .
                                                   k0 log ∆ p∈P
                                                                y
                                       p∈Pt

     The second condition requires the paths to form a layer graph, that is, for every edge e ∈ E there is
a fixed i ∈ [k] such that edge e can only appear as the i-th edge in a u v flow-path. This can be done
while losing at most a constant factor (that depends on k), as shown in the following lemma.

                        T HEORY OF C OMPUTING, Volume 12 (15), 2016, pp. 1–29                                  13
                                     E DEN C HLAMT Á Č AND M ICHAEL D INITZ

Lemma 2.11. Given P, a collection of u v paths of length k, with non-negative weights y p , there is
a subset P0 ⊆ P and a mapping ϕ : E → [k] such that every edge e ∈ E can only appear in the ϕ(e)-th
position in any path in P0 , and furthermore

                                           ∑ y p ≥ (k − 2)−(k−2) ∑ y p .
                                          p∈P0                       p∈P

Proof. We may assume that all paths in P are simple paths. Note that any edge incident to u or
to v will always be the first or k-th edge, respectively. Thus, for k ≤ 2, the lemma is trivially true
for P0 = P. Now consider k ≥ 3, and for all other edges, choose a uniformly random assignment
ϕ : E \ (E(u) ∪ E(v)) → {2, . . . , k − 1}. Since the paths are simple, the probability that any path p ∈ P is
consistent with ϕ (that is, ϕ maps the edge (p)i to i, for every i ∈ [k]) is exactly (k − 2)−(k−2) . Let P0 be
the set of paths in P that are consistent with ϕ, then by linearity of expectation we have
                                        h       i
                                       E ∑ y p = (k − 2)−(k−2) ∑ y p ,
                                           p∈P0                        p∈P

and so there exists a set P0 ⊆ P with the required properties.

      We now have a layered collection P of paths of length exactly k0 with nearly uniform (up to a constant
factor) path weights which support Ω(1)     e   flow from u to v (where the constant depends on k). Let
E1 , . . . , Ek0 ⊆ E be the layer sets corresponding to paths P. We would like to further prune the set P so
that the combinatorial structure of the flow network is almost regular in the sense required by the final
condition in Lemma 2.6.
      Consider any subset I ⊆ [k0 ], and for every tuple (ei )i∈I ∈ ∏i∈I Ei , denote by mP ((ei )i∈I ) the number
of paths in P which use these edges. That is,

                           mP ((ei )i∈I ) = |{p ∈ P | (|p| = k0 ) ∧ (∀i ∈ I : (p)i = ei )}| .

The proof of Lemma 2.6 is complete by showing that the paths in P can now be pruned so that the values
m((ei )i∈I ) are roughly uniform (for all tuples corresponding to a given I ⊂ [k0 ]), and a polylogarithmic
fraction of paths is retained.

Lemma 2.12. There exists some function f : N → N such that, given a layered set P of paths of length k0 ,
we can find a subset P0 ⊆ P satisfying

    • There exists a vector (mI )I⊆[k0 ] so that for every I ⊆ [k0 ] and every edge tuple (ei )i∈I ∈ ∏i∈I we
                                                                        0
      have either mP0 ((ei )i ) = 0 or mP0 ((ei )i ) ∈ [mI /(log ∆) f (k ) , mI ].
                               0
    • |P0 | ≥ |P|/(log ∆) f (k ) .

Proof. Start by letting P0 = P, the set of all paths. Sequentially, for each 0/ 6= I ⊆ [k; ], do the following:
bucket all possible tuples (ei )i∈I ∈ ∏i∈I Ei according to their current mP0 ((ei )i ) values. Noting that this
                   0
value is at most ∆k −1 (the maximum possible number of u v paths of length k0 ), there is some mI such
that tuples with m((ei )i ∈ [mI /2, mI ] constitute at least a 1/((k0 − 1) log ∆) fraction of paths in P0 . Retain
paths corresponding to these tuples and delete all other paths from P0 .

                          T HEORY OF C OMPUTING, Volume 12 (15), 2016, pp. 1–29                                14
                         L OWEST-D EGREE k-S PANNER : A PPROXIMATION AND H ARDNESS

                                                                               0
     Note that since this procedure is repeated a constant (2k − 1) number of times, we still have
                                                                                        k0
                                                  |P0 | ≥ |P|/((k0 − 1) log ∆)2 −1 .

Now consider some set I ⊆ [k]. Immediately after the pruning step corresponding to I, all remaining
I-tuples had m((ei )i ) ∈ [mI /2, mI ]. Denote the set of these tuples by TI , and the set of paths that survive
this step by PI0 . We have that the average m(·) value for I-tuples at this stage is

                                       E(ei )∈TI [mPI0 ((ei )i )] = |PI0 |/|TI | ∈ [mI /2, mI ] .
                                              0
After further pruning (at most (2k − 2) steps for other subsets I 0 ⊆ [k]), this regularity no longer holds.
However, since a polylogarithmic fraction of paths are retained, the average m((ei )i ) value for (ei )i ∈ TI
cannot decrease too much. Specifically, denoting the current set of paths by P00 = P0 , the new average
m(·) values satisfy
                                                                         |PI0 |/|TI |                                 mI
            m0I := E(ei )∈TI [mP00 ((ei )i )] = |P00 |/|TI | ≥                      k        0       ≥                         k0
                                                                                                                                   .
                                                                   ((k0 − 1) log ∆)2 −2                           0
                                                                                                              2((k − 1) log ∆)2 −2

     Finally, while there exists any tuple (ei ) ∈ ∏i∈I Ei for any I ⊆ [k0 ] such that mP0 ((ei )) ≤ 2−k m0I , remove
all paths corresponding to this tuple from P0 . The total number of paths which may be pruned at this stage
is at most
                              ∑ |TI | · 2−k m0I = ∑ 2−k |P00 | = (2k − 1)2−k |P00 | .
                               06/ =I⊆[k]                   06/ =I⊆[k]

Thus, the final set P0 satisfies
                                                                              |P|
                                            |P0 | ≥ 2−k |P00 | ≥                       k         0        .
                                                                   2k ((k0 − 1) log ∆)2 −1
Furthermore, for every I-tuple (ei )i∈I which is involved in any path in P0 , we have m((ei )) ≤ mI and
                                                                                   mI
                                     m((ei )) ≥ 2−k m0I ≥                                            k0
                                                                                                               .
                                                                   2k+1 ((k0 − 1) log ∆)2 −2

3     Hardness of approximation
Our reductions are based on the framework developed by [22, 20]. Our hardness bounds rely on the
M IN -R EP problem. In M IN -R EP we are given a bipartite graph G = (A, B, E) where A is partitioned
into groups A1 , A2 , . . . , Ar and B is partitioned into groups B1 , B2 , . . . , Br , with the additional property
that every set Ai and every set B j has the same size (which we will call |Σ| due to its connection to the
alphabet of a 1-round 2-prover proof system). This graph and partition induces a new bipartite graph G0
called the supergraph in which there is a vertex ai for each group Ai and similarly a vertex b j for each
group B j . There is an edge between ai and b j in G0 if there is an edge in G between some node in Ai and
some node in B j . A node in G0 is called a supernode, and similarly an edge in G0 is called a superedge.5
    5 Rather than G being the graph and G0 being the supergraph, sometimes G0 is referred to as the graph and G is called the

label-extended graph.


                            T HEORY OF C OMPUTING, Volume 12 (15), 2016, pp. 1–29                                                      15
                                 E DEN C HLAMT Á Č AND M ICHAEL D INITZ

     A REP-cover is a set C ⊆ A ∪ B with the property that for all superedges {ai , b j } there are nodes
a ∈ Ai ∩C and b ∈ B j ∩C where {a, b} ∈ E. We say that {a, b} covers the superedge {ai , b j }. The goal
is to construct a REP-cover of minimum size.
     For any fixed constant ε > 0, we say that an instance of M IN -R EP is a YES instance if OPT = 2r (i. e.,
                                                                               1−ε
a single node is chosen from each group) and is a NO instance if OPT ≥ 2log n r. We will sometimes refer
                                      1−ε
to the hardness gap (in this case 2log n ) as the soundness s, due to the connection between M IN -R EP
and proof systems. The following theorem is due to Kortsarz [22] (the polynomial relations between
the parameters are implicit rather than explicit in his proof, but are straightforward to verify since the
instances used in [22] are obtained by parallel repetition [27] applied to instances of 3SAT-5 which have
a constant gap [3, 2]).

Theorem 3.1 ([22]). Unless NP ⊆ DTIME(2polylog(n) ), for any constant ε > 0 there is no polynomial-
time algorithm that can distinguish between YES and NO instances of M IN -R EP. This is true even when
the graph and the supergraph are regular, and both the supergraph degree and |Σ| are polynomial in the
soundness.

     In the basic reduction framework we start with a M IN -R EP instance, and then for every group we
add a vertex (corresponding to the supernode) which is connected to vertices in the group using paths
of length approximately k/2. We then add an edge between any two supernodes that have a superedge
in the supergraph. So there is an “outer” graph corresponding to the supergraph, as well as an “inner”
graph which is just the M IN -R EP graph itself. The basic idea is that the only way to span a superedge is
to use a path of length k that goes through the M IN -R EP instance, in which case the M IN -R EP edge that
is in this path corresponds to nodes in a valid REP-cover. So if we are in a YES instance there is a small
REP-cover and thus a small spanner, while if we are in a NO instance every REP-cover is large and thus
the spanner must have many edges in order to span the superedges.
     In [20] and [22] this framework is used to prove hardness of approximation when the objective is
the number of edges by creating many copies of the outside nodes (i. e., the supergraph), all of which
are connected to the same inner nodes (M IN -R EP graph). This forces the number of edges used in the
spanner to essentially equal the size of a valid REP-cover, as all other edges used by the spanner become
lower order terms. We reverse this, by creating many copies of the inner M IN -R EP graph. If we simply
connect a single copy of the outer graph we run into a problem, though: each superedge can be spanned
by paths through any of the copies. There is nothing that forces it to be spanned through all of them, and
thus nothing that forces degrees to be large. We show how to get around this by creating many copies of
both the inner and the outer graph, but using asymptotically more copies of the inner graph than the outer.


3.1   Directed LDkS
We first consider the directed setting (note that here the “degree” we are trying to minimize is the sum of
the in-degree and the out-degree). Suppose we are given a bipartite M IN -R EP instance G e = (A, B, E)
                                                                                                      e with
                          0           0
associated supergraph G = (U,V, E ) from Theorem 3.1. For any vertex w ∈ U ∪V we let Γ(w) denote
its group. So Γ(u) ⊆ A for u ∈ U, and Γ(v) ⊆ B for v ∈ V . We will assume without loss of generality
that G0 is regular with degree dG0 and G e is regular with degree d e . Our reduction will also use a special
                                                                   G
bipartite regular graph H = (X,Y, EH ), which will simply be the directed complete bipartite graph with

                       T HEORY OF C OMPUTING, Volume 12 (15), 2016, pp. 1–29                               16
                     L OWEST-D EGREE k-S PANNER : A PPROXIMATION AND H ARDNESS

|X| = |Y |. Let dH denote the degree of a node in H, so dH = |X| = |Y |. We will set all of these values to
dG0 + 2|Σ| + 1.
    Our LDkS instance G = (VG , EG ) will be a combination of these three graphs. Let kL = b(k − 1)/2c,
and let kR = d(k − 1)/2e. The four sets of vertices are
                       L                                                 R
                      Vout = U × X × [kL ] ,                            Vout = V ×Y × [kR ] ,
                       VinL = A × EH ,                                  VinR = B × EH .
The actual vertex set VG of our LDkS instance G will be VoutL ∪V R ∪V L ∪V R . We say that an outer node
                                                                   out   in in
                                                                L or k for nodes in V R ), and we say that
is maximal if its final coordinate is maximal (kL for nodes in Vout    R             out
an outer node is minimal if its final coordinate is 1.
    Defining the edge set is a little more complex, as there are a few different types of edges. We first
create outer edges, which are incident on maximal outer nodes.
       Eout = {((u, x, kL ), (v, y, kR )) : u ∈ U ∧ v ∈ V ∧ x ∈ X ∧ y ∈ Y ∧ {u, v} ∈ E 0 ∧ (x, y) ∈ EH } .
Note that if we fix x and y the corresponding outer edges form a copy of the supergraph G0 . Thus these
edges essentially form |EH | copies of the supergraph.
    We also have inner edges, which correspond to |EH | copies of the M IN -R EP instance. (Note that
unlike the supergraph copies, these copies are vertex disjoint.)
                       Ein = {((a, e), (b, e)) : a ∈ A ∧ b ∈ B ∧ e ∈ EH ∧ {a, b} ∈ E}.
                                                                                   e

    We will now add connection edges, i. e., edges that connect some of the outer nodes to some of the
inner nodes. Let
              L
             Econ = {((u, x, 1), (a, (x, y))) : u ∈ U ∧ a ∈ Γ(u) ∧ x ∈ X ∧ (x, y) ∈ EH } ,           and
              R
             Econ = {((b, (x, y)), (v, y, 1)) : v ∈ V ∧ b ∈ Γ(v) ∧ y ∈ Y ∧ (x, y) ∈ EH } .
In other words, the minimal outer node for each (u, x) (resp. (v, y)) is connected to the inner nodes in its
group in each copy of Ge that corresponds to an EH edge that involves x (resp. y).
    We now need to connect the minimal outer nodes and the maximal outer nodes. We do this by creating
paths: let
                    L
                   Epath = {((u, x, i), (u, x, i − 1)) : u ∈ U, x ∈ X, i ∈ {2, . . . kL }} ,   and
                    R
                   Epath = {((v, y, i), (v, y, i + 1)) : v ∈ V, y ∈ Y, i ∈ [kR − 1]} ,
                       L and E R path-edges. Finally, for technical reasons we need to add group edges
and call the edges in Epath     path
internally in each group in each copy of G:
                                         e let
                 L
                Egroup = {((a, e), (a0 , e)) : e ∈ EH ∧ a, a0 ∈ Γ(u) for some u ∈ U} ,          and
                 R
                Egroup = {((b, e), (b0 , e)) : e ∈ EH ∧ b, b0 ∈ Γ(v) for some v ∈ V } .
    Our final edge set is the union of all of these, namely
                                        L      R      L       R       L        R
                          Eout ∪ Ein ∪ Econ ∪ Econ ∪ Epath ∪ Epath ∪ Egroup ∪ Egroup .
A small example of this construction appears as Figure 1.

                       T HEORY OF C OMPUTING, Volume 12 (15), 2016, pp. 1–29                                 17
                                  E DEN C HLAMT Á Č AND M ICHAEL D INITZ




                                                                                   (b) Supergraph G0 .




                                                                                 (c) M IN -R EP graph G.
                                                                                                      e




                             (a) Final graph G.                                       (d) H = K2,2 .

Figure 1: Graph G created by the reduction for k = 7 when using supergraph G0 from Figure 1b and using
H = K2,2 (Figure 1d). Blue vertices are outer nodes, and the inner nodes are contained in the clear central
ovals (individual nodes not pictured), each of which is a group from the M IN -R EP graph G e (Figure 1c).
                                          0
Each dotted edge represents a copy of G (so six actual edges), the union of which are the outer edges.
Each of the four central rectangles represents a copy of the M IN -R EP graph G,
                                                                              e and together these are the
inner edges. Each small oval is a clique (the group edges), and the edges forming the outer 3-paths are
the path-edges. Each red edge from the outer nodes to the inner nodes actually represents a star, from the
outer node to the associated inner group. These stars together form the connection edges.


3.1.1   Analysis
We begin by simply figuring out the degrees of each kind of node in our construction. Each inner node is
incident on 2|Σ| group edges, a single connection edge, and at most dGe ≤ dG0 |Σ| inner edges, for a total
degree of at most dG0 |Σ| + 2|Σ| + 1. Minimal outer nodes are each incident on 1 path-edge and |Σ|dH
connection edges, for a total degree of |Σ|dH + 1. Maximal outer nodes are incident on 1 path-edge and
dG0 dH outer edges, for a total degree of dG0 dH + 1. If k = 3 of 4 then there are outer nodes which are both
maximal and minimal and thus have degree |Σ|dH + dG0 dH , but this will not matter. Together with our
choice of dH , this gives the following lemma.

Lemma 3.2. The maximum degree in G is achieved at either the maximal outer or minimal outer nodes,
and is at most O(dG2 0 + |Σ|2 ).

    We now show that if there is a small REP-cover for the original M IN -R EP instance, then there is a
k-spanner with low maximum degree. To do this we will use the notion of a canonical path for an outer
edge. Consider an outer edge ((u, x, kL ), (v, y, kR )). A path from (u, x, kL ) to (v, y, kR ) is canonical if it

                        T HEORY OF C OMPUTING, Volume 12 (15), 2016, pp. 1–29                                  18
                      L OWEST-D EGREE k-S PANNER : A PPROXIMATION AND H ARDNESS

includes kL − 1 path-edges, followed by a connection edge, an inner edge, another connection edge, and
then kR − 1 path-edges. Note that any such path has length kL + kR + 1 = k, so can be used to span the
outer edge. Furthermore, note that any such path corresponds to selecting two nodes (the inner nodes hit
by the path) that cover the {u, v} superedge in the original M IN -R EP instance.
     It is not hard to see that the only way to span an outer edge is either through a canonical path (which
corresponds to a way of covering the associated superedge in the M IN -R EP instance) or including the
edge itself. This means that we can span all outer edges by using canonical paths corresponding to
a REP-cover, and that this is the only way to span outer edges (other than using those edges to span
themselves). Since in a YES instance there is a REP-cover in which only a single node is selected per
group, we can use those canonical paths to construct a k-spanner with maximum degree at most dH .
Lemma 3.3. If G       e is a YES instance of M IN -R EP, then there is a k-spanner of G which has maximum
degree at most dH + 1.
Proof. Since G     e is a YES instance, for each u ∈ U there is some f (u) ∈ Γ(u) and for each v ∈ V
there is some f (v) ∈ Γ(v) so that { f (u), f (v)} ∈ Ee for all {u, v} ∈ E 0 . Our spanner contains all edges
      L
in Egroup            R
             and Egroup    as well as all edges in EpathL         R . It also contains the connection edges
                                                             and Epath
suggested by the REP-cover: for every u ∈ U and x ∈ X and (x, y) ∈ EH , it contains the connection edge
((u, x, 1), ( f (u), (x, y))). Similarly, for every v ∈ V and y ∈ Y and (x, y) ∈ EH , it contains the connection
edge (( f (v), (x, y)), (v, y, 1)). Finally, it contains the appropriate inner edges: for every {u, v} ∈ E 0 with
u ∈ U and v ∈ V and every e ∈ EH , we add the inner edge (( f (u), e), ( f (v), e)).
     In this spanner, the degree of outer nodes which are not minimal is at most 2 (the 2 incident path-edges),
and the degree of inner nodes is at most dG0 + 2|Σ| + 1 (since they are incident on one connection edge,
2|Σ| group edges, and dG0 inner edges). The degree of a minimal outer node is at most dH + 1, since it is
incident on 1 path-edge and for each edge incident on the second coordinate in EH it is incident to a single
inner node. Thus the maximum degree of the spanner is at most max{dG0 + 2|Σ| + 1, dH + 1} = dH + 1 as
claimed.
     It remains to show that this is indeed a valid spanner. The only edges not included are the outer edges
and some of the connection edges and inner edges, so we simply need to prove that they are spanned by
paths of length at most k. For connection edges this is trivial. Consider some edge ((u, x, 1), (a, (x, y))) ∈
   L . Clearly there is a path of length two that spans it: an included connection edge from (u, x, 1) to
Econ
( f (u), (x, y)), followed by a group edge from ( f (u), (x, y)) to (a, (x, y)). A similar path exists (in the
opposite direction) for connection edges in Econ     R .

     Similarly, consider an inner edge ((a, e), (b, e)) which is not in the spanner. Let u ∈ U and v ∈ V so
that a ∈ Γ(u) and b ∈ Γ(v). Then {u, v} ∈ E 0 , so our spanner contains an inner edge (( f (u), e), ( f (v), e)).
So there is a path of length three in our spanner from (a, e) to (b, e), namely (a, e) − ( f (u), e) − ( f (v), e) −
(b, e), where the first and last edges are group edges and the middle edge is an inner edge.
     Now consider an outer edge ((u, x, kL ), (v, y, kR )). We can span it by using a canonical path, where
the first connection edge will be from (u, x, 1) to ( f (u), (x, y)), the inner edge will be from ( f (u), (x, y))
to ( f (v), (x, y)), and the second connection edge will be from ( f (v), (x, y)) to (v, y, 1) (this fixes the path
edges used as well). Note that all of these edges exist in the spanner, since the connection edges are
included by construction and the inner edge must exist because this is a YES instance, i. e., because
{ f (u), f (v)} ∈ Ee for all {u, v} ∈ E 0 . Thus this is indeed a path in the spanner, and it clearly has length
k.

                        T HEORY OF C OMPUTING, Volume 12 (15), 2016, pp. 1–29                                   19
                                   E DEN C HLAMT Á Č AND M ICHAEL D INITZ

    On the other hand, since in a NO-instance there are no small REP-covers, any spanner must include
either many canonical paths or many outer edges. This lets us prove that in this case every k-spanner has
some node with large degree.

Lemma 3.4. If G
              e is a NO instance of M IN -R EP, then every k-spanner of G has maximum degree at least
(s/3)dH .

Proof. We will prove the contrapositive, that if there is a k-spanner of G with maximum degree less than
(s/3)dH then there is a REP-cover of G       e of size less than s(|U| + |V |) (and thus we did not start with a NO
instance). Let Ĝ be such a spanner. We create a bucket B(x,y) for each edge (x, y) ∈ EH , which will contain
a collection of outer edges and connection edges that are in Ĝ. For each outer edge ((u, x, kL ), (v, y, kR ))
that is in E(Ĝ), we add it to the bucket B(x,y) . Similarly, for each connection edge ((u, x, 1), (a, (x, y))) that
        L ∩ E(Ĝ) we add it to B                                                                         R
is in Econ                          (x,y) , as well as each connection edge ((b, (x, y)), (v, y, 1)) ∈ Econ ∩ E(Ĝ).
Since Ĝ has maximum degree less than (s/3)dH , the total number of edges in buckets (i. e., the total
number of outer and connection edges in Ĝ) is less than |U||X|(s/3)dH (the number of outer edges) plus
|U||X|(s/3)dH + |V ||Y |(s/3)dH (the number of connection edges), for a total of |U||X|sdH edges (since
both G0 and H are balanced and regular).
     Since H is regular we know that |X|dH = |EH |. Thus there must exist some bucket with less than
s|U| = s|V | edges. Let B(x,y) be this bucket. We will create a REP-cover as follows. For each edge
                            L ∩B
((u, x, 1), (a, (x, y))) ∈ Econ                                                                           R
                                  (x,y) we will include a and for each edge ((b, (x, y)), (v, y, 1)) ∈ Econ ∩ B(x,y)
we will include b. For each outer edge ((u, x, kL ), (v, y, kR )) we will include an arbitrary vertex in Γ(u)
and an arbitrary vertex in Γ(v) that are adjacent in G      e (such vertices must exist in order for the M IN -R EP
instance to be satisfiable at all). Clearly this set has size less than 2|B(x,y) | ≤ 2s|U| = s(|U| + |V |).
     It only remains to show that this is a valid cover. To see this, consider an arbitrary superedge, say
{u, v}, and the associated outer edge from (u, x, kL ) to (v, y, kR ) (where here x and y are the same as in
our special bucket). It is clear that by construction the only paths of length at most k which can span
an outer edge are either the outer edge itself or the canonical paths. In the former case we explicitly
added an arbitrary pair of nodes that cover {u, v}. In the second case, the existence of a canonical path in
the spanner means that the connection edges it uses are in the bucket. This in turn means that the inner
nodes they are incident on were added to the REP-cover, and since the canonical path uses the inner edge
between them they must in fact cover the {u, v} superedge. Thus we have a valid REP-cover of size at
most s(|U| + |V |).


    We can now use Lemma 3.3 and Lemma 3.4 to prove the desired hardness for Directed LDkS.

Theorem 3.5. Unless NP ⊆ DTIME(2polylog(n) ), there is a constant γ > 0 so that no polynomial-time
algorithm can approximate Directed LDkS to a factor better than ∆γ (for any integer k ≥ 3).

Proof. Lemma 3.3 and Lemma 3.4, when combined with Theorem 3.1, imply hardness of Ω(s). We know
from Lemma 3.2 that ∆ = O(dG2 0 + |Σ|2 ). Since we specifically chose to use hard M IN -R EP instances
where dG0 and |Σ| are polynomial in s (see Theorem 3.1), this proves the theorem.

                        T HEORY OF C OMPUTING, Volume 12 (15), 2016, pp. 1–29                                   20
                      L OWEST-D EGREE k-S PANNER : A PPROXIMATION AND H ARDNESS

3.2     Undirected LDkS
We now want to handle the undirected case. This is complicated primarily because switching edges to
being undirected creates new paths that the spanner might use. In the directed setting, if an outer edge was
not in the spanner then the only way for it to be spanned was to use a canonical path, which essentially
determined the “suggested” REP-cover for the M IN -R EP instance. Once we move to the undirected
setting there are extra options. Most of them can easily be ruled out by the fact that a spanning path must
have length at most k. For example, this implies that an outer edge cannot be spanned by a path which
uses more then one inner edge (in the directed setting this was clear from the directions). But there is one
particularly problematic possibility: an outer edge could be spanned by a path consisting entirely of outer
edges. This was not possible with directed edges because all outer edges were directed into Vout   R . These

new paths are problematic, since if an outer edge is spanned in this way there is no suggested REP-cover.
Thus we will try to make sure that no such paths actually exist.
     We will need to start with hard M IN -R EP instances with some extra properties, namely, we want large
supergirth and dG0 ≥ |Σ|. This can be achieved using a simple modification of [14], giving the following
lemma. As it is essentially straightforward from [14], we give only the outline of the proof. First, we will
need the following definitions. The M AX -R EP problem is the maximization version of M IN -R EP: instead
of finding the smallest possible REP-cover, we allow ourselves to pick at most one node from each group
in the M IN -R EP graph and attempt to maximize the number of covered superedges. The M IN -R EPk
problem is M IN -R EP restricted to instances where the supergirth is larger than k. The following theorem
appears as Theorem 5 in [14] (the extra condition on superdegrees is Lemma 3 from [14]).

Theorem 3.6 ([14]). Suppose that there is no (randomized) polynomial-time algorithm that, given an
instance of M AX -R EP in which |U| = |V | = n/2 and all supervertices have superdegree d, can distinguish
between the following cases.

   1. All superedges are satisfiable, and

   2. At most an s ≤ 1/(16 log n) fraction of the superedges are satisfiable.

Then there is some constant c so that for all α with 16 log n ≤ α ≤ min{1/s, d 1/k / log |Σ|}, there is no
(randomized) polynomial-time algorithm that, given an instances of M IN -R EPk where all supernodes
have superdegree within a factor of two of α log |Σ|, can distinguish between the following cases.

   1. The smallest REP-cover has size at most n, and
                                                √
   2. The smallest REP-cover has size at least n α/c,

where n is the size of the supergraph.

      The following lemma gives the M IN -R EP instances which we will use through the rest of the section.

Lemma 3.7. Unless NP ⊆ BPTIME(npolylog(n) ), there is no polynomial-time algorithm that can dis-
tinguish between instances of M IN -R EP in which there is a REP-cover of size |U| + |V | (i. e., a YES
instance) and instances in which every REP-cover has size at least s(|U| + |V |), even when all instances
are guaranteed to have the following properties:

                        T HEORY OF C OMPUTING, Volume 12 (15), 2016, pp. 1–29                            21
                                      E DEN C HLAMT Á Č AND M ICHAEL D INITZ

   1. The girth of the supergraph is larger than k + 1,

   2. There is some value dG0 so that all degrees in the supergraph are within a factor of 2 of dG0 ,

   3. s, dG0 , and |Σ| are all polynomials of each other, and

   4. dG0 ≥ |Σ|.

Proof. We will start out with the M AX -R EP instances derived from applying parallel repetition [27] to
3SAT-5, but where the graph is balanced and regular by including 3 copies of the variable nodes and 5
copies of the clause nodes (see Section 3 of [14] for a slightly more formal description). These instances
have degree d = 15` and |Σ| = 7` and soundness s = 2`/c for some value ` (the number of times parallel
repetition is applied) and constant c (the loss in parallel repetition). (Note that in the context of M AX -R EP
the soundness is less than 1, since it represents the fraction of superedges that can be covered in a NO
instance.) Now if we try to use [14] to get the supergirth larger than k + 1 we get into trouble, since the
degrees (and the soundness s) are reduced to 15`/k , while the alphabet size |Σ| = 7` is much larger.
     We get around this by boosting the degree, in much the same way as in the directed case. We
create a new M AX -R EP instance by taking the cross product of the original instance and the complete
bipartite graph with d k nodes on each side. In other words, we create a new instance in which there
are d k copies of each side of the original M AX -R EP instance, and each left copy and right copy (with
the edges between them) form a copy of the original instance. Clearly every node now has degree
d 0 = 15` · 15`k = 15`(k+1) = d k+1 , while |Σ| is unchanged. It is not hard to see that the soundness s is also
unchanged, so is still 2`/c . (This argument appears in detail in [14].)
     We can now apply Theorem 3.6 with α = (d 0 )1/(k+1) / log |Σ| to these instances, which gives the
claimed bounds.

     We will also use a balanced regular bipartite graph H as before, but instead of being the (directed)
complete bipartite graph, H will be a balanced regular bipartite graph of girth at least k + 2 and degree
dH (note that such graphs exist as long as the number of nodes nH = |X| + |Y | = 2|X| is sufficiently large,
                            1+(1/(3k2 ))
e. g., as long as nH dH ≤ nH             [5]). We will set dH = dG0 , so the number of outer edges incident on
each maximal outer node of G is d = dH dG0 = dG2 0 .
     We start with the same graph G as in the directed setting (although with undirected edges, and
using M IN -R EP instances from Lemma 3.7). We will then subsample in essentially the same way
as [14]: for every outer edge {(u, x, kL ), (v, y, kR )} we will flip an independent coin, keeping the edge
with probability p = α/d and removing it with probability 1 − p (we will set α = d (k+2)/(2(k+1)) /4, so
p = 1/(4d k/(2(k+1)) )).6 If we remove it we will also remove the associated inner edges, i. e., we will
remove all inner edges of the form {(a, {x, y}), (b, {x, y})} where a ∈ Γ(u) and b ∈ Γ(v). This gives us a
new graph Gα .
     Call an outer edge of Gα bad if it is part of a cycle in Gα consisting only of outer edges of length
at most k + 1. We will see that there are not too many bad edges, so we then create our final instance
of LDkS by removing all bad edges (and associated inner edges) from Gα , giving us a new graph G           cα .
Intuitively Gcα is essentially the same as Gα , since there are relatively few bad edges in Gα .
   6 While the combination of defining both p and α may seem redundant, we include both in order to be consistent with [14],

as well as for the convenience of having α be closely related to the average degree.


                           T HEORY OF C OMPUTING, Volume 12 (15), 2016, pp. 1–29                                        22
                         L OWEST-D EGREE k-S PANNER : A PPROXIMATION AND H ARDNESS

3.2.1    Analysis
We can still build a spanner using canonical paths corresponding to a REP-cover of each subsampled
instance, so if we start with a YES instance we can still build a spanner of G
                                                                             cα with small maximum
degree. This is essentially the same as Lemma 3.3.

Lemma 3.8. If G
              e is a YES instance of M IN -R EP from Lemma 3.7, then there is a k-spanner of G
                                                                                             cα with
maximum degree at most 3dH + 1.

Proof. We use the same spanner construction as in Lemma 3.3, which proved that the maximum degree
is at most max{dG0 + 2|Σ| + 1, dH + 1}. The only difference is that now we need to replace the use of dG0
in Lemma 3.3 with the maximum superdegree of any of the |EH | instances of M IN -R EP that we are left
with after our sampling to get Gbα . However, since this sampling only decreases degrees, the maximum
degree is still at most max{dG0 + 2|Σ| + 1, dH + 1} ≤ max{3dG0 + 1, dH + 1} = 3dH + 1.

     For each outer edge ((u, x, kL ), (v, y, kR )) in G, call a path from (u, x, kL ) to (v, y, kR ) bad if it contains
only outer edges and has length at most k (and larger than 1). So an outer edge is bad if and only if there
is a bad path between its endpoints. We begin by analyzing the number of bad paths for any fixed outer
edge in the original construction G (before subsampling). The trivial bound would be d k−1 , but because
G0 and H both have large girth we can do better. This is the reason we needed to start out with already
sampled instances of M IN -R EP (i. e., why we had to start with instances based on Lemma 3.7 rather than
generic hard M IN -R EP instances, like those from Theorem 3.1).
     Throughout the rest of this section we will analyze paths of length exactly k, in order to simplify
notation. The analysis of paths of length less than k is identical, just with k replaced by some k0 < k.

Lemma 3.9. For any outer edge, the number of bad paths is at most O(4k d (k−1)/2 ).

Proof. Let {(u, x), (v, y)} be an outer edge. For now we will only analyze paths of length exactly k. Then
any bad path has the form

        (u, x) = (u0 , x0 ) − (v0 , y0 ) − (u1 , x1 ) − · · · − (u(k−1)/2 , x(k−1)/2 ) − (v(k−1)/2 , y(k−1)/2 ) = (v, y) .

Consider the projection of this path on the first coordinate, i. e., the path

                                     u0 − v0 − u1 − v1 − · · · − u(k−1)/2 − v(k−1)/2 .

Clearly this is a (not necessarily simple) path in G0 . Call this path p. In fact it cannot be a simple path,
since together with the edge {u, v} it would form a cycle of length k + 1 in G0 , which cannot happen
since we know that G0 has girth at least k + 2. Moreover, p must include {u, v} at some point, or else by
shortcutting we would be able to get a cycle of length at most k + 1. But we have even more: every edge
in this path other than {u, v} must appear an even number of times (in an undirected sense, so it might be
traversed once in one direction and a second time in the other direction). To see this, suppose that at some
point this path went from ui to vi but then never went across this edge in the opposite direction. Then by
shortcutting to get simple paths, we know that there is a path from u0 to ui and a path from vi to v(k−1)/2
with combined length less than k. Together with the existence of the {u, v} edge, this clearly implies the

                           T HEORY OF C OMPUTING, Volume 12 (15), 2016, pp. 1–29                                             23
                                  E DEN C HLAMT Á Č AND M ICHAEL D INITZ

existence of a cycle of length at most k + 1. This contradicts our starting assumption that G0 has girth
larger than k + 1.
    Since every edge other than {u, v} must be used in both directions, we know that there are at most
(k − 1)/2 hops of p where an edge is used for an odd-numbered time (e. g., for the first time, or third
time, etc). In order to bound the number of possible paths p, consider a hop γ ∈ [k] of p in which an
edge is traversed an odd-numbered time, and suppose that p is currently at some node x (e. g., if γ = 5
then after four hops the path is at x and the fifth hop involves traversing an edge an odd-numbered time).
There are clearly at most dG0 choices for which edge appears at hop α, since that is the degree of x. On
the other hand, suppose that at hop α we traverse an edge for an even-numbered time. Then it is easy to
see by induction that we only have one possible choice, i. e., there is only one edge incident on x which
has already been crossed an odd number of times. (Clearly this is true initially, and if we ever traverse
an edge to a node y so that y has at least two incident edges that have been traversed an even number of
times then we would have found a cycle of length at most k.)
    Thus each path p is uniquely identified by two pieces of information: which hops traverse an edge for
an odd-numbered time, and which of the dG0 possible edges are traversed in those hops. There are at most
                                                         
                                                     k
                                                            ≤ 2k
                                                (k − 1)/2
                                                                    (k−1)/2
possibilities for the first piece of information, and at most dG0             possibilities for the second. Thus there
                           (k−1)/2
are a total of at most 2k dG0      possible paths p.
    Since H also has girth larger than k + 1, the same analysis holds for our bad path projected on
the second coordinate but with dG0 replaced by dH . Thus the total number of bad paths is at most
    (k−1)/2       (k−1)/2
2k dG0      · 2k dH       = 4k d (k−1)/2 .

   Lemma 3.9 now allows us to upper bound the number of bad edges in Gα , since we set α to be low
enough that we expect all of the bad paths in G to be missing at least one edge in Gα .
Lemma 3.10. Let e = {(u, x, kL ), (v, y, kR )} be an outer edge in G. The probability that it is bad in Gα
and exists in Gα is at most 4k α k+1 /d (k+3)/2 .
Proof. Fix some bad path in G between the endpoints of e. The probability that it survived the sampling
is at most pk = (α/d)k . We can now use Lemma 3.9 and a union bound to get that the total probability
that e is bad is at most 4k d (k−1)/2 · (α/d)k = (4α)k /(d (k+1)/2 ). The probability that e itself survived the
sampling is α/d, and this is independent of whether or not any of the bad paths survive. Thus the total
probability that e is bad and is in Gα is at most 4k α k+1 /d (k+3)/2 as claimed.

Lemma 3.11. With probability at least 3/4 the number of outer edges in Gα that are bad is at most
|U| · |X| · dH .
Proof. There are at most |U| · |X| · d outer edges in G, and from Lemma 3.10 we know that each is bad
with probability at most 4k α k+1 /d (k+3)/2 (not necessarily independently). Thus the expected number of
bad edges in Gα is at most
                                                   4k α k+1    |U| · |X| · 4k α k+1
                                 |U| · |X| · d ·             =                      .
                                                   d (k+3)/2        d (k+1)/2

                       T HEORY OF C OMPUTING, Volume 12 (15), 2016, pp. 1–29                                       24
                     L OWEST-D EGREE k-S PANNER : A PPROXIMATION AND H ARDNESS

Now applying Markov’s inequality implies that with probability at least 3/4 the number of bad edges in
Gα is at most
                                       |U| · |X| · 4k+1 α k+1
                                                              .
                                              d (k+1)/2
   When we plug in our choice of α = d (k+2)/(2(k+1)) /4, this gives us at most |U|·|X|·d 1/2 = |U|·|X|·dH
bad edges in Gα , as claimed.

     Recall that our construction started with |EH | = |X|dH copies of the original M IN -R EP instance, and
each outer edge is associated with a single such instance. So in Gα the average instance has at most
|U| bad edges, and thus by Markov at least |EH |/2 of the instances have at most 2|U| bad edges. Let
A ⊆ EH be the edges in H which correspond to these instances. It is well-known that removing only
O(|U|) superedges of a M IN -R EP instance affects the size of the optimal REP-cover in a NO instance
by at most a constant factor (see, e. g., the proof of [14, Theorem 5]—intuitively, each superedge can
only decrease the size of the optimal REP-cover by at most 2). Therefore, G  cα has the same soundness as
Gα , up to a constant factor. Now that there are no bad edges, the only way to span an outer edge in G    bα
is either the edge itself or a canonical path. This means that we can now analyze this case just like the
directed case (with the main difference being that we can only use |A| ≥ |EH |/2 of the |EH | M IN -R EP
instances to prove our bound, though this is sufficient). This implies that in a NO instance all spanners
must have large maximum degree, through an analysis similar to Lemma 3.4.

Lemma 3.12. If G e is a NO instance of M IN -R EP, then every k-spanner of G has maximum degree at
            1/(2(k+1))
      e H ·d 0
least Ω(d   G          ).

Proof. Since G   cα has no bad edges, every outer edge must be spanned by either itself or by a canonical
path. This is the same as in the directed case, and the same analysis from Lemma 3.4 continues to hold.
Let dH0 denote the average degree of a node in H restricted to A. This is clearly at least dH /2. Now using
the analysis of Lemma 3.4, we know that if all of the M IN -R EP instances of A require REP-covers of size
at least s0 (|U| + |V |) then the maximum degree must be at least Ω(s0 dH0 ) = Ω(s0 dH ).
     So it remains simply to bound s0 . But this is straightforward, thanks to [14], which proved that if
we start with a M IN -R EP instance with soundness s, and then independently subsample to make the
average degree α 0 ≥ Ω(log n) (with α 0 ≤ 1/s) and then remove √ a linear number of superedges, we are
                                                         0
left with a M IN -R EP instance where the soundness is s = Ω( α 0 ) (see [14, Lemma 5] and the proof of
                                                             e
Theorem 3.6 in particular). In our case,

                        α       d (k+2)/(2(k+1))          1              dG0         1/(k+1)
         α 0 = pdG0 =     dG0 =                  dG0 = k/(2(k+1)) dG0 = k/(k+1) = Ω(dG0      ).
                        d              4d             4d               4d 0G
                                   
                     e d 1/(2(k+1))
Hence s0 is at least Ω    0          . This proves the lemma.
                         G

    The main hardness theorem is now implied by the chosen parameters.

Theorem 3.13. Unless NP ⊆ BPTIME(npolylog(n) ), there is no algorithm that can approximate L OWEST-
D EGREE k-S PANNER on undirected graphs to a factor better than ∆Ω(1/k) (for any integer k ≥ 3).

                        T HEORY OF C OMPUTING, Volume 12 (15), 2016, pp. 1–29                            25
                                 E DEN C HLAMT Á Č AND M ICHAEL D INITZ

                                                                              1/(2(k+1))
Proof. We know from Lemma 3.8 and Lemma 3.12 that we have a gap of dG0                   . So we simply need
                1/2(k+1)                Ω(1/k)
to argue that dG0          is at least ∆       . The maximum degree in Gα is at most the maximum degree in
                                                                       c
Gα which is at most the maximum degree in G. By Lemma 3.2 and the fact that dG0 ≥ |Σ| by Lemma 3.7,
this is at most O(dG2 0 ). This proves the theorem.


4    Open problems
The obvious open problem is to close the gap between the upper bound of O(∆     e (1−1/k)2 ) and the lower
bound of ∆Ω(1/k) for LDkS. It is unclear what the true behavior of the approximability of LDkS is as a
function of k, and even whether the problem gets easier or harder as k gets larger (or if the approximability
stays the same). Our current upper bound gets larger with k, while our current lower bound gets smaller.
What is the true behavior?
    Another interesting set of questions involves relaxing the notion of approximation to allow bicriteria
approximations, where we return a spanner with stretch slightly larger than k but compare ourselves to the
lowest-degree k-spanner. Our current hardness results break down when we allow a constant violation of
the stretch, so (for example) we cannot rule out the possibility of an algorithm which returns a 9-spanner
with maximum degree at most a constant times the maximum degree of the best 3-spanner. Obtaining
good bicriteria approximations, or proving that they cannot exist, is an extremely interesting area for
future research (for both LDkS and other spanner approximation problems).


References
 [1] I NGO A LTHÖFER , G AUTAM DAS , DAVID D OBKIN , D EBORAH J OSEPH , AND J OSÉ S OARES:
     On sparse spanners of weighted graphs. Discrete Comput. Geom., 9(1):81–100, 1993.
     [doi:10.1007/BF02189308] 2, 4

 [2] S ANJEEV A RORA , C ARSTEN L UND , R AJEEV M OTWANI , M ADHU S UDAN , AND M ARIO
     S ZEGEDY: Proof verification and the hardness of approximation problems. J. ACM, 45(3):501–555,
     1998. Preliminary version in FOCS’92. [doi:10.1145/278298.278306] 16

 [3] S ANJEEV A RORA AND S HMUEL S AFRA: Probabilistic checking of proofs: A new characterization
     of NP. J. ACM, 45(1):70–122, 1998. Preliminary version in FOCS’92. [doi:10.1145/273865.273901]
     16

 [4] S URENDER BASWANA , T ELIKEPALLI K AVITHA , K URT M EHLHORN , AND S ETH P ETTIE: Ad-
     ditive spanners and (α, β )-spanners. ACM Trans. Algorithms, 7(1):5:1–5:26, 2010. Preliminary
     version in SODA’05. [doi:10.1145/1868237.1868242] 4

 [5] M OHSEN BAYATI , A NDREA M ONTANARI , AND A MIN S ABERI: Generating random graphs with
     large girth. In Proc. 20th Ann. ACM-SIAM Symp. on Discrete Algorithms (SODA’09), pp. 566–575.
     ACM Press, 2009. ACM DL. [arXiv:0811.2853] 22

                       T HEORY OF C OMPUTING, Volume 12 (15), 2016, pp. 1–29                              26
                   L OWEST-D EGREE k-S PANNER : A PPROXIMATION AND H ARDNESS

 [6] P IOTR B ERMAN , A RNAB B HATTACHARYYA , KONSTANTIN M AKARYCHEV, S OFYA R ASKHOD -
     NIKOVA , AND G RIGORY YAROSLAVTSEV: Improved approximation for the directed spanner prob-
     lem. In Proc. 38th Internat. Colloq. on Automata, Languages and Programming (ICALP’11), volume
     6755 of LNCS, pp. 1–12. Springer, 2011. [doi:10.1007/978-3-642-22006-7_1, arXiv:1012.4062] 2,
     3, 5, 6

 [7] A RNAB B HATTACHARYYA , E LENA G RIGORESCU , K YOMIN J UNG , S OFYA R ASKHODNIKOVA ,
     AND DAVID P. W OODRUFF: Transitive-closure spanners. SIAM J. Comput., 41(6):1380–1425,
     2012. Preliminary version in SODA’09. [doi:10.1137/110826655, arXiv:0808.1787] 5

 [8] T.-H. H UBERT C HAN , M ICHAEL D INITZ , AND A NUPAM G UPTA: Spanners with slack.
     In Proc. 14th Ann. European Symp. on Algorithms (ESA’06), pp. 196–207. Springer, 2006.
     [doi:10.1007/11841036_20] 4

 [9] BARUN C HANDRA , G AUTAM DAS , G IRI NARASIMHAN , AND J OSÉ S OARES: New sparseness
     results on graph spanners. Internat. J. Comput. Geom. Appl., 5(1):125–144, 1995. Preliminary
     version in SoCG’92. [doi:10.1142/S0218195995000088] 2

[10] S HIRI C HECHIK: New additive spanners. In Proc. 24th Ann. ACM-SIAM Symp. on Discrete
     Algorithms (SODA’13), pp. 498–512. ACM Press, 2013. [doi:10.1137/1.9781611973105.36] 4

[11] S HIRI C HECHIK , M ICHAEL L ANGBERG , DAVID P ELEG , AND L IAM RODITTY: Fault tolerant
     spanners for general graphs. SIAM J. Comput., 39(7):3403–3423, 2010. Preliminary version in
     STOC’09. [doi:10.1137/090758039] 4

[12] E DEN C HLAMTÁ Č AND M ICHAEL D INITZ: Lowest degree k-spanner: Approximation and
     hardness. In Proc. 17th Internat. Workshop on Approximation Algorithms for Combinato-
     rial Optimization Problems (APPROX’14), volume 28 of LIPIcs, pp. 80–95. Springer, 2014.
     [doi:10.4230/LIPIcs.APPROX-RANDOM.2014.80] 1

[13] E DEN C HLAMTÁČ , M ICHAEL D INITZ , AND ROBERT K RAUTHGAMER: Everywhere-sparse
     spanners via dense subgraphs. In Proc. 53rd FOCS, pp. 758–767. IEEE Comp. Soc. Press, 2012.
     [doi:10.1109/FOCS.2012.61, arXiv:1205.0144] 2, 3, 5

[14] M ICHAEL D INITZ , G UY KORTSARZ , AND R AN R AZ: Label cover instances with large girth and
     the hardness of approximating basic k-spanner. ACM Trans. Algorithms, 12(2):25:1–25:16, 2016.
     Preliminary version in ICALP’12. [doi:10.1145/2818375, arXiv:1203.0224] 4, 5, 21, 22, 25

[15] M ICHAEL D INITZ AND ROBERT K RAUTHGAMER: Directed spanners via flow-based linear pro-
     grams. In Proc. 43rd STOC, pp. 323–332. ACM Press, 2011. [doi:10.1145/1993636.1993680,
     arXiv:1011.3701] 3, 5

[16] M ICHAEL D INITZ AND ROBERT K RAUTHGAMER: Fault-tolerant spanners: Better and simpler. In
     Proc. 30th Symp. on Principles of Distributed Comput. (PODC’11), pp. 169–178. ACM Press, 2011.
     [doi:10.1145/1993806.1993830, arXiv:1101.5753] 4, 5

                    T HEORY OF C OMPUTING, Volume 12 (15), 2016, pp. 1–29                       27
                               E DEN C HLAMT Á Č AND M ICHAEL D INITZ

[17] Y EVGENIY D ODIS AND S ANJEEV K HANNA: Designing networks with bounded pairwise distance.
     In Proc. 31st STOC, pp. 750–759. ACM Press, 1999. [doi:10.1145/301250.301447] 5
[18] M ICHAEL E LKIN AND DAVID P ELEG: Strong inapproximability of the basic k-spanner problem. In
     Proc. 27th Internat. Colloq. on Automata, Languages and Programming (ICALP’00), pp. 636–647.
     Springer, 2000. [doi:10.1007/3-540-45022-X_54] 4
[19] M ICHAEL E LKIN AND DAVID P ELEG: Approximating k-spanner problems for k > 2. Theoret. Com-
     put. Sci., 337(1-3):249–277, 2005. Preliminary version in IPCO’01. [doi:10.1016/j.tcs.2004.11.022]
     5
[20] M ICHAEL E LKIN AND DAVID P ELEG: The hardness of approximating spanner problems. Theory
     Comput. Sys., 41(4):691–729, 2007. Preliminary version in STACS’00. [doi:10.1007/s00224-006-
     1266-2] 4, 5, 15, 16
[21] M ICHAEL E LKIN AND S HAY S OLOMON: Fast constructions of lightweight spanners for general
     graphs. ACM Trans. Algorithms, 12(3):29:1–29:21, 2016. Preliminary version in SODA’13.
     [doi:10.1145/2836167, arXiv:1207.1668] 4
[22] G UY KORTSARZ: On the hardness of approximating spanners. Algorithmica, 30(3):432–450, 2001.
     Preliminary version in APPROX’98. [doi:10.1007/s00453-001-0021-y] 4, 5, 15, 16
[23] G UY KORTSARZ AND DAVID P ELEG: Generating sparse 2-spanners. J. Algorithms, 17(2):222–236,
     1994. Preliminary version in SWAT’92. [doi:10.1006/jagm.1994.1032] 4
[24] G UY KORTSARZ AND DAVID P ELEG: Generating low-degree 2-spanners. SIAM J. Comput.,
     27(5):1438–1456, 1998. Preliminary version in SODA’94. [doi:10.1137/S0097539794268753] 2, 3,
     4, 5, 6
[25] DAVID P ELEG AND A LEJANDRO A. S CHÄFFER: Graph spanners. J. Graph Theory, 13(1):99–116,
     1989. [doi:10.1002/jgt.3190130114] 2, 4
[26] DAVID P ELEG AND J EFFREY D. U LLMAN: An optimal synchronizer for the hypercube. SIAM J.
     Comput., 18(4):740–747, 1989. Preliminary version in PODC’87. [doi:10.1137/0218050] 2, 4
[27] R AN R AZ: A parallel repetition theorem. SIAM J. Comput., 27(3):763–803, 1998. Preliminary
     version in STOC’95. [doi:10.1137/S0097539795280895] 16, 22
[28] M OHIT S INGH AND L AP C HI L AU: Approximating minimum bounded degree spanning trees
     to within one of optimal. J. ACM, 62(1):1:1–1:19, 2015. Preliminary version in STOC’07.
     [doi:10.1145/2629366] 2
[29] M IKKEL T HORUP AND U RI Z WICK: Compact routing schemes. In Proc. 13th Ann. ACM
     Symp. on Parallel Algorithms and Architectures (SPAA’01), pp. 1–10. ACM Press, 2001.
     [doi:10.1145/378580.378581] 2
[30] M IKKEL T HORUP AND U RI Z WICK: Approximate distance oracles. J. ACM, 52(1):1–24, 2005.
     Preliminary version in STOC’01. [doi:10.1145/1044731.1044732] 4


                     T HEORY OF C OMPUTING, Volume 12 (15), 2016, pp. 1–29                          28
                  L OWEST-D EGREE k-S PANNER : A PPROXIMATION AND H ARDNESS

AUTHORS

    Eden Chlamtáč
    Assistant Professor
    Ben Gurion University, Beersheva, Israel
    chlamtac cs bgu ac il
    https://www.cs.bgu.ac.il/~chlamtac/


    Michael Dinitz
    Assistant Professor
    Johns Hopkins University, Baltimore, MD
    mdinitz cs jhu edu
    http://www.cs.jhu.edu/~mdinitz/


ABOUT THE AUTHORS

    E DEN C HLAMTÁČ graduated from Princeton University in Princeton, NJ, in 2008; his
       advisor was Sanjeev Arora. He then spent two years as a postdoc at the Weizmann
       Institute of Science, and one and three semesters as a postdoc at the University of Toronto
       and Tel-Aviv University, respectively, before joining the faculty at Ben Gurion University.
       If memory serves, when he used to have spare time he would enjoy playing piano and
       studying languages which rarely did him much good.


    M ICHAEL D INITZ graduated from the Computer Science Department at Carnegie Mellon
       University in 2010, where he was advised by Anupam Gupta. He then spent three years
       as a postdoc at the Weizmann Institute of Science, hosted by Robert Krauthgamer and
       David Peleg, before joining the faculty at Johns Hopkins University. He is interested in
       approximation algorithms and applications to computer networking, and particularly in
       network design problems and problems involving metric spaces.




                    T HEORY OF C OMPUTING, Volume 12 (15), 2016, pp. 1–29                            29