                         Theory of Computing, Volume 11 (9), 2015, pp. 241–256
                                       www.theoryofcomputing.org

                          Special Issue: APPROX-RANDOM 2013



              A New Regularity Lemma and Faster Approximation Algorithms
                         for Low Threshold-Rank Graphs

                            Shayan Oveis Gharan∗                       Luca Trevisan†

                  Received February 4, 2014; Revised March 27, 2015; Published June 10, 2015




       Abstract: Kolla and Tulsiani (2007, 2011) and Arora, Barak and Steurer (2010) introduced
       the technique of subspace enumeration, which gives approximation algorithms for graph
       problems such as unique games and small set expansion; the running time of such algorithms
       is exponential in the threshold rank of the graph.
           Guruswami and Sinop (2011, 2012) and Barak, Raghavendra, and Steurer (2011) de-
       veloped an alternative approach to the design of approximation algorithms for graphs of
       bounded threshold rank, based on semidefinite programming relaxations obtained from the
       sum-of-squares hierarchy (2000, 2001) and on novel rounding techniques. These algorithms
       are faster than the ones based on subspace enumeration and work on a broader class of
       problems.
     An extended abstract of this paper appeared in the proceedings of the 16th International Workshop on Approximation
Algorithms for Combinatorial Optimization Problems (APPROX 2013) [15].
   ∗ Department of Computer Science and Engineering, University of Washington.
   † Department of Electrical Engineering and Computer Sciences, U.C. Berkeley. This material is based upon work
supported by the National Science Foundation under grant No. CCF 1017403.


ACM Classification: F.2.2, G.2.2, G.1.6
AMS Classification: 68W25, 68Q25, 68R10
Key words and phrases: regularity lemma, approximation algorithms, expander graphs, low threshold-
rank graphs, maximum cut


© 2015 Shayan Oveis Gharan and Luca Trevisan
Licensed under a Creative Commons Attribution License (CC-BY)                           DOI: 10.4086/toc.2015.v011a009

          In this paper we develop a third approach to the design of such algorithms. We show,
      constructively, that graphs of bounded threshold rank satisfy a weak Szemerédi regularity
      lemma analogous to the one proved by Frieze and Kannan (1999) for dense graphs. The
      existence of efficient approximation algorithms is then a consequence of the regularity
      lemma, as shown by Frieze and Kannan.
          Applying our method to the Max Cut problem, we devise an algorithm that is slightly
      faster than all previous algorithms, and is easier to describe and analyze.



1    Introduction
Kolla and Tulsiani [13, 12] and Arora, Barak and Steurer [2] proved that the Unique Games problem
can be approximated efficiently if the adjacency matrix of a graph associated with the problem has few
large eigenvalues; they show that, for every optimal solution, its indicator vector is close to the subspace
spanned by the eigenvectors of the large eigenvalues, and one can find a solution close to an optimal one
by enumerating an ε-net for such a subspace. This algorithm, also known as the subspace enumeration
algorithm, runs in time exponential in the dimension of the subspace, which is the number of large
eigenvalues; the number of large eigenvalues is called the threshold rank of the graph. Arora, Barak
and Steurer show that the subspace enumeration algorithm can approximate other graph problems, in
regular graphs, in time exponential in the threshold rank, including the Uniform Sparsest Cut problem,
the Small-Set Expansion problem and the Max Cut problem. We remark that the subspace enumeration
algorithm does not improve on the 0.878 approximation guarantee of Goemans and Williamson [8], but it
finds a solution of approximation factor 1 − O(ε) if the optimum cuts at least a 1 − ε fraction of the edges.
    Barak, Raghavendra and Steurer [3] and Guruswami and Sinop [9, 10, 11] developed an alternative
approach to the design of approximation algorithms running in time exponential in the threshold rank.
Their algorithms are based on solving semidefinite programming relaxations obtained by using the sum-of-
squares hierarchy [16, 14] and then applying sophisticated rounding schemes. This approach has several
advantages: it is applicable to a more general class of graph problems and constraint satisfaction problems;
the approximation guarantee has a tighter dependency on the threshold used in the definition of threshold
rank; and, in some cases, the algorithms have a running time of f (k, ε) · n^{O(1)}, where k is the threshold
rank and 1 ± ε is the approximation guarantee, instead of the n^{O(k)} running time which follows from
an application of the subspace enumeration algorithm for constant ε.
    In this paper we introduce a third approach to designing algorithms for graphs of bounded threshold
rank, which is based on proving a weak Szemerédi regularity lemma for such graphs.
    The regularity lemma of Szemerédi [17] states that every dense graph can be well approximated by
the union of a constant number of complete bipartite subgraphs; the constant, however, has a tower-of-
exponentials dependency on the quality of the approximation. Frieze and Kannan [6, 7] prove what they
call a weak regularity lemma, showing that every dense graph can be approximated up to an error εn²
in the cut norm by a linear combination of O(1/ε²) cut matrices (a cut matrix is a complete bipartite
subgraph) with bounded coefficients. Frieze and Kannan also show that such an approximation can
be constructed “implicitly” in time polynomial in 1/ε and that, for a weighted graph which is a linear
combination of σ cut matrices, several graph problems can be approximated in time exp(Õ(σ)) + poly(n)


time.1 Combining the two facts one has an exp(poly(1/ε)) + poly(n) time approximation algorithm for
many graph problems on dense graphs.
     We prove that a weak regularity lemma holds for all graphs of bounded threshold rank. Our result
is a proper generalization of the weak regularity lemma of Frieze and Kannan, because dense graphs
are known to have bounded threshold rank,2 but there are many graphs of bounded threshold rank
that are not dense. For a (weighted) graph G = (V, E) with adjacency matrix A and diagonal matrix of
vertex degrees D, the matrix D^{−1/2} A D^{−1/2} is called the normalized adjacency matrix of G. If the
sum of squares of the eigenvalues of the normalized adjacency matrix outside the range [−ε/2, ε/2] is
equal to k (in particular, if there are at most k such eigenvalues), then we show that there is a linear
combination of O(k/ε²) cut matrices that approximates A up to 2ε|E| in cut norm; furthermore, such a
decomposition can be found in poly(n, k, 1/ε) time. (See Theorem 2.3 below.) Our regularity lemma,
combined with an improvement of the Frieze–Kannan approximation algorithm for graphs that are linear
combinations of cut matrices, gives us algorithms of running time 2^{Õ(k^{1.5}/ε^3)} + poly(n) for several
graph problems on graphs of threshold rank k, providing an additive approximation of 2ε|E|. In problems
such as Max Cut in which the optimum is Ω(|E|), this additive approximation is equivalent to a
multiplicative approximation.
     We remark that there are several generalizations of the weak regularity lemma to matrices that are
not necessarily dense, e. g., [5, 4], but to the best of our knowledge, none of these generalizations covers
matrices of low threshold rank. Let us give a detailed example. Coja-Oghlan, Cooper and Frieze [4]
consider sparse matrices that have a suitable boundedness property. For S, T ⊆ V, let the density of a
sub-matrix A_{S,T} be defined as follows:

                             density(A_{S,T}) := ( ∑_{u∈S, v∈T} A_{u,v} ) / ( |S| · |T| ) .

Coja-Oghlan et al. [4] generalized the weak regularity lemma to matrices where the density of each
sub-matrix A_{S,T} is within a constant factor, C, of the density of A, for any S, T such that
|S|, |T| ≥ Ω(n/2^{C²}). It turns out that if A is the adjacency matrix of a graph that is a union of a
constant number of constant-degree expanders, then A has bounded threshold rank, but it does not satisfy
the boundedness property.

       Reference       Running time                       Parameter k
       BaRaSt [3]      2^{O(k/ε^4)} · poly(n)             # of eigenvalues not in range [−c·ε², c·ε²], c > 0
       GuSi [9]        n^{O(k/ε^2)}                       # of eigenvalues ≤ −ε/2
       GuSi [10]       2^{k/ε^3} · n^{O(1/ε)}             # of eigenvalues ≤ −ε/2
       this paper      2^{Õ(k^{1.5}/ε^3)} + poly(n)       sum of squares of eigenvalues not in range [−ε/8, ε/8]
      Table 1: A comparison between previous algorithms applied to Max Cut and our algorithm.

    Table 1 gives a comparison between previous algorithms applied to Max Cut and our algorithm.
Unlike the previous algorithms, our algorithm rounds the solution of a fixed-size LP, as opposed to an
SDP hierarchy. The advantages over previous algorithms, besides the simplicity of the algorithm, are a faster
   1 Throughout the paper we use Õ(·) to denote asymptotic bounds in which logarithmic factors are ignored.
   2 The normalization one needs for dense graphs is different from what we use in this paper. If G is a graph with average
degree c · n, and A is the adjacency matrix of G, then A/(c · n) has low threshold rank.



running time and the dependency on a potentially smaller threshold-rank parameter, because the running
time of our algorithm depends on the sum of squares of eigenvalues outside of a certain range, rather than
the number of such eigenvalues. Recall that the eigenvalues of D−1/2 AD−1/2 are in the range [−1, 1].
    We now give a precise statement of our results, after introducing some notation.


2     Statement of results
2.1   Notation
Let G = (V, E) be a (weighted) undirected graph with n := |V | vertices. Let A be the adjacency matrix of
G. For any vertex u ∈ V, let

                                            d(u) := ∑_v A_{u,v}
be the degree of u. For a set S ⊆ V, let the volume of S be the sum of the vertex degrees in S,
d(S) := ∑_{v∈S} d(v), and let

                                       m := d(V) = ∑_{v∈V} d(v) .
Let D be the diagonal matrix of degrees. For any matrix M ∈ R^{V×V}, we use

                                           M_D := D^{−1/2} M D^{−1/2} .

Observe that if G is a d-regular graph, then M_D = M/d. We call A_D the normalized adjacency matrix of
G. It is straightforward to see that all eigenvalues of AD are contained in the interval [−1, 1].
     For two functions f, g : V → R, let ⟨f, g⟩ := ∑_{v∈V} f(v)g(v). Also, let f ⊗ g be the tensor product of
f and g, i. e., the matrix in R^{V×V} whose (u, v) entry is f(u) · g(v). For a function f ∈ R^V and S ⊆ V, let
f(S) := ∑_{v∈S} f(v).
     For a set S ⊆ V, let 1_S be the indicator function of S, and let

                                  d_S(v) := d(v) if v ∈ S,   and   d_S(v) := 0 otherwise.
For any two sets S, T ⊆ V and α ∈ R, we use the notation

                                       CUT(S, T, α) := α · (d_S ⊗ d_T)

to denote the matrix corresponding to the cut (S, T), whose (u, v) entry is α · d(u) · d(v) if u ∈ S, v ∈ T
and zero otherwise. We remark that CUT(S, T, α) is not necessarily a symmetric matrix.
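To make the notation concrete, here is a minimal NumPy sketch (the 4-vertex graph and the sets S = {0, 1}, T = {2, 3} are arbitrary illustrative choices, not taken from the paper) that builds d_S and the matrix CUT(S, T, α) = α · (d_S ⊗ d_T):

```python
import numpy as np

# Hypothetical 4-vertex weighted graph (illustrative only).
A = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 1.],
              [1., 1., 0., 1.],
              [0., 1., 1., 0.]])
d = A.sum(axis=1)        # degrees: d(u) = sum_v A[u, v]
m = d.sum()              # m := d(V), the total volume

def d_S(S):
    """The function d_S: equals d(v) for v in S, zero elsewhere."""
    out = np.zeros(len(d))
    out[list(S)] = d[list(S)]
    return out

def cut_matrix(S, T, alpha):
    """CUT(S, T, alpha) := alpha * (d_S tensor d_T)."""
    return alpha * np.outer(d_S(S), d_S(T))

W = cut_matrix({0, 1}, {2, 3}, 0.5)
# Entry (u, v) equals alpha * d(u) * d(v) when u in S and v in T, else 0;
# note that W need not be symmetric.
assert W[0, 2] == 0.5 * d[0] * d[2]
assert W[2, 0] == 0.0
```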
Definition 2.1 (Matrix norms). For a matrix M ∈ R^{V×V} and S, T ⊆ V, let

                                          M(S, T) := ∑_{u∈S, v∈T} M_{u,v} .

The Frobenius norm and the cut norm are defined as follows:

                                   ‖M‖_F := √( ∑_{u,v} M_{u,v}² ) ,

                                   ‖M‖_C := max_{S,T⊆V} |M(S, T)| .
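For intuition, both norms can be evaluated directly on a tiny example. The sketch below is illustrative only: the brute-force maximization over all pairs (S, T) is exponential in n and is shown only to match the definition, not as a practical algorithm.

```python
import numpy as np
from itertools import chain, combinations

def frobenius_norm(M):
    # ||M||_F = square root of the sum of squared entries
    return np.sqrt((M ** 2).sum())

def cut_norm(M):
    # ||M||_C = max over nonempty S, T of |M(S, T)|, where
    # M(S, T) = sum_{u in S, v in T} M[u, v]. (The empty set gives 0.)
    n = M.shape[0]
    subs = list(chain.from_iterable(combinations(range(n), r)
                                    for r in range(1, n + 1)))
    return max(abs(M[np.ix_(S, T)].sum()) for S in subs for T in subs)

M = np.array([[1., -1.],
              [-1., 1.]])
assert frobenius_norm(M) == 2.0   # sqrt(1 + 1 + 1 + 1)
assert cut_norm(M) == 1.0         # attained e.g. at S = T = {0}
```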



                     T HEORY OF C OMPUTING, Volume 11 (9), 2015, pp. 241–256                               244
      A N EW R EGULARITY L EMMA AND FASTER A LGORITHMS FOR L OW T HRESHOLD -R ANK G RAPHS

Definition 2.2 (Sum-of-squares threshold rank). For any unweighted graph G with normalized adjacency
matrix A_D, let λ_1, . . . , λ_n be the eigenvalues of A_D with the corresponding eigenfunctions f_1, . . . , f_n. For
δ > 0, the δ-sum-of-squares threshold rank of A_D is defined as

                                          t_δ(A_D) := ∑_{i : |λ_i| > δ} λ_i² .

Also, the δ-threshold approximation of A_D is defined as

                                          T_δ(A_D) := ∑_{i : |λ_i| > δ} λ_i · f_i ⊗ f_i .

    In words, T_δ(A_D) is an approximation of A_D obtained by removing the projection onto the eigenvectors
corresponding to the small eigenvalues. It is well known that, among matrices of its rank, T_δ(A_D) is the
best approximation of A_D in both the spectral (ℓ₂) and the Frobenius norm. In Lemma 3.1, below, we show
that D^{1/2} T_δ(A_D) D^{1/2} is always a good approximation of A in cut norm.
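As a quick numerical illustration (on a hypothetical 4-cycle; any small graph would do), t_δ(A_D) and T_δ(A_D) can be computed from an eigendecomposition of the normalized adjacency matrix:

```python
import numpy as np

# Hypothetical example: the 4-cycle, whose A_D has eigenvalues 1, 0, 0, -1.
A = np.array([[0., 1., 0., 1.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 0., 1., 0.]])
d = A.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
A_D = D_inv_sqrt @ A @ D_inv_sqrt                # normalized adjacency matrix

lam, F = np.linalg.eigh(A_D)                     # eigenvalues, orthonormal eigenvectors
delta = 0.5
big = np.abs(lam) > delta                        # eigenvalues outside [-delta, delta]

t_delta = float((lam[big] ** 2).sum())           # delta-sum-of-squares threshold rank
T_delta = (F[:, big] * lam[big]) @ F[:, big].T   # sum_i lam_i * (f_i tensor f_i)

assert np.isclose(t_delta, 2.0)                  # only the eigenvalues 1 and -1 survive
# The identity used in Lemma 3.2: ||T_delta(A_D)||_F^2 = t_delta(A_D).
assert np.isclose((T_delta ** 2).sum(), t_delta)
```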

2.2    Matrix decomposition theorem
The following matrix decomposition theorem is the main technical result of this paper.
Theorem 2.3. For any graph G and ε > 0, let k := t_{ε/2}(A_D). There is an algorithm that writes A as a
linear combination of cut matrices W^{(1)}, W^{(2)}, . . . , W^{(σ)}, such that σ ≤ 16k/ε², and

                                 ‖A − W^{(1)} − · · · − W^{(σ)}‖_C ≤ εm ,

where each W^{(i)} is a cut matrix CUT(S, T, α), for some S, T ⊆ V, such that |α| ≤ √k/m, and m is the
sum of the degrees of all vertices of G. The running time of the algorithm is polynomial in n, k, and 1/ε.

2.3    Algorithmic applications
Our main algorithmic application of Theorem 2.3 is the following theorem, which approximates any cut on
low threshold-rank graphs with a running time of 2^{Õ(k^{1.5}/ε^3)} + poly(n).

Theorem 2.4. Let G = (V, E), and for a given ε > 0, let k := t_{ε/8}(A_D). There is a randomized algorithm
that, for either the maximum cut or the minimum cut problem over sets of volume

                                 Γ − εm/2 ≤ d(S) ≤ Γ + εm/2 ,

runs in time 2^{Õ(k^{1.5}/ε^3)} + poly(n, k, 1/ε) and, with constant probability, finds a set S such that
|d(S) − Γ| ≤ εm and

                         A(S, S̄) ≥ max_{Γ−εm/2 ≤ d(S∗) ≤ Γ+εm/2} A(S∗, S̄∗) − εm

if it is a maximization problem, and

                         A(S, S̄) ≤ min_{Γ−εm/2 ≤ d(S∗) ≤ Γ+εm/2} A(S∗, S̄∗) + εm

otherwise.


    In the minimum bisection problem we want to find the smallest cut with equal volume on both sides
of the cut,

                                           min_{S : d(S) = m/2} A(S, S̄) .

Similarly, in the maximum bisection problem we want to find the maximum cut with equal volume on
both sides. Although in the literature a bisection is typically defined as a cut with an equal number of
vertices on each side, here we study cuts with (approximately) equal volume on each side. This is
because of a limitation of spectral algorithms (cf. Cheeger’s inequality for finding the minimum bisection).
Nonetheless, the applications are very similar (e. g., we can use the corollaries below in divide-and-conquer
algorithms to partition a given graph into small pieces with few edges in between).
    We use the above theorem to provide a PTAS for the maximum cut, maximum bisection, and minimum
bisection problems.

Corollary 2.5. Let G = (V, E), and for a given ε > 0, let k := t_{ε/8}(A_D). There is a randomized algorithm
that in time 2^{Õ(k^{1.5}/ε^3)} + poly(n, k, 1/ε) finds an εm additive approximation of the maximum cut.

Proof. We can simply guess the volume of the optimum within an εm/2 additive error and then use
Theorem 2.4.

Corollary 2.6. Let G = (V, E), and for a given ε > 0, let k := t_{ε/8}(A_D). For each of the maximum bisection
and minimum bisection problems, there is a randomized algorithm that in time 2^{Õ(k^{1.5}/ε^3)} + poly(n, k, 1/ε)
finds a cut (S, S̄) such that |d(S) − m/2| ≤ εm and such that A(S, S̄) provides an εm additive approximation
of the optimum.

Proof. For the maximum/minimum bisection the optimum must have volume m/2, so we can simply use
Theorem 2.4 with Γ = m/2.


3    Regularity lemma for low threshold-rank graphs
In this section we prove Theorem 2.3. The first step is to approximate A by a low-rank matrix B. In the
next lemma we construct B such that the value of any cut in A is approximated within a small additive
error in B.

Lemma 3.1. Let A be the adjacency matrix of G. For 0 ≤ δ < 1, let

                                         B := D^{1/2} T_δ(A_D) D^{1/2} .

Then ‖A − B‖_C ≤ δm.

Proof. Let λ_1, . . . , λ_n be the eigenvalues of A_D, with the corresponding orthonormal eigenfunctions
f_1, . . . , f_n. For any S, T ⊆ V, we have

              ⟨1_S, (A − B) 1_T⟩ = ⟨D^{1/2} 1_S, (A − B)_D D^{1/2} 1_T⟩
                                 = ⟨√d_S, (A_D − T_δ(A_D)) √d_T⟩
                                 ≤ δ · ∑_{i : |λ_i| ≤ δ} |⟨√d_S, f_i⟩| · |⟨√d_T, f_i⟩|
                                 ≤ δ · √( ∑_{i : |λ_i| ≤ δ} ⟨√d_S, f_i⟩² ) · √( ∑_{i : |λ_i| ≤ δ} ⟨√d_T, f_i⟩² )
                                 ≤ δ · ‖√d_S‖ · ‖√d_T‖ ≤ δ · ‖√d_V‖² = δm ,

where the second inequality follows from the Cauchy–Schwarz inequality. The lemma follows by noting
that ‖A − B‖_C is the maximum of the above expression over all S, T ⊆ V.
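The bound of Lemma 3.1 can be checked numerically on a toy instance. The sketch below (a hypothetical 5-cycle, with the cut norm computed by brute force, so it only scales to tiny graphs) verifies ‖A − B‖_C ≤ δm:

```python
import numpy as np
from itertools import chain, combinations

# Hypothetical example: the 5-cycle, with delta = 0.5 and m = 10.
n = 5
A = np.zeros((n, n))
for u in range(n):
    A[u, (u + 1) % n] = A[(u + 1) % n, u] = 1.0
d = A.sum(axis=1)
m = d.sum()
Dh, Dih = np.diag(np.sqrt(d)), np.diag(1.0 / np.sqrt(d))

delta = 0.5
lam, F = np.linalg.eigh(Dih @ A @ Dih)           # spectrum of A_D
big = np.abs(lam) > delta
# B = D^{1/2} T_delta(A_D) D^{1/2}
B = Dh @ ((F[:, big] * lam[big]) @ F[:, big].T) @ Dh

def cut_norm(M):
    # Brute-force cut norm: max over nonempty S, T of |M(S, T)|.
    subs = list(chain.from_iterable(combinations(range(n), r)
                                    for r in range(1, n + 1)))
    return max(abs(M[np.ix_(S, T)].sum()) for S in subs for T in subs)

# Lemma 3.1: ||A - B||_C <= delta * m.
assert cut_norm(A - B) <= delta * m + 1e-9
```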

    By the above lemma, if we approximate B by a linear combination of cut matrices, then the same
combination is also a good approximation of A. Moreover, since t_δ(A_D) = t_δ(B_D), B has a small
sum-of-squares threshold rank if and only if A does.
Lemma 3.2. For any graph G with adjacency matrix A, δ > 0, and B = D^{1/2} T_δ(A_D) D^{1/2},

                                              ‖B_D‖²_F = t_δ(A_D) .



Proof. The lemma follows from the fact that the square of the Frobenius norm of any symmetric matrix is
equal to the sum of the squares of its eigenvalues. If λ_1, . . . , λ_n are the eigenvalues of A_D, then

                          ‖B_D‖²_F = trace(B_D²) = ∑_{i : |λ_i| > δ} λ_i² = t_δ(A_D) .

Note that in the first equality we are using the fact that B_D is a symmetric matrix.

    The next proposition is the main technical part of the proof of Theorem 2.3. We show that we can
write any (not necessarily symmetric) matrix B as a linear combination of O(‖B_D‖²_F / ε²) cut matrices
such that the cut norm of B is preserved up to a small additive error. The proof builds on the existential
theorem of Frieze and Kannan [7, Theorem 7].
Proposition 3.3. For any matrix B ∈ R^{V×V} with k = ‖B_D‖²_F, and any ε > 0, there exist cut matrices

                                         W^{(1)}, W^{(2)}, . . . , W^{(σ)} ,

such that σ ≤ 1/ε², and for all S, T ⊆ V,

                  |(B − W^{(1)} − W^{(2)} − · · · − W^{(σ)})(S, T)| ≤ ε √( k · d(S) · d(T) ) ,

where each W^{(i)} is a cut matrix CUT(S, T, α), for some S, T ⊆ V and α ∈ R.

                       T HEORY OF C OMPUTING, Volume 11 (9), 2015, pp. 241–256                          247
                              S HAYAN OVEIS G HARAN AND L UCA T REVISAN

Proof. Let R^{(0)} = B. We use the potential function h(R) := ‖R_D‖²_F. We show that as long as there are
S, T ⊆ V such that

                                    |R(S, T)| > ε √( k · d(S) · d(T) ) ,

we can add new cut matrices iteratively while maintaining the invariant that each iteration decreases the
potential function by at least ε² h(B). Since h(R^{(0)}) = h(B), after at most 1/ε² iterations we obtain a
good approximation of B.
    Assume that after i < 1/ε² iterations, R^{(i)} = B − W^{(1)} − · · · − W^{(i)}. Suppose for some S, T ⊆ V,

                   |R^{(i)}(S, T)| > ε · √( h(B) · d(S) · d(T) ) = ε · √( k · d(S) · d(T) ) .            (3.1)

Choose W^{(i+1)} = CUT(S, T, α), for

                                         α = R^{(i)}(S, T) / ( d(S) · d(T) ) ,

and let R^{(i+1)} = R^{(i)} − W^{(i+1)}. Note that for any pair of vertices u, v ∈ V, if u ∉ S or v ∉ T, then
R^{(i+1)}_{u,v} = R^{(i)}_{u,v}. So,

          h(R^{(i+1)}) − h(R^{(i)}) = ∑_{u∈S, v∈T} ( (R^{(i)}_{u,v} − α d(u)d(v))² − (R^{(i)}_{u,v})² ) / ( d(u)d(v) )
                                    = −2α R^{(i)}(S, T) + α² d(S)d(T)
                                    = − R^{(i)}(S, T)² / ( d(S)d(T) ) ≤ −ε² · h(B) .

The last equality follows from the definition of α, and the last inequality follows from Equation (3.1).
Therefore, after at most σ ≤ 1/ε² iterations, (3.1) cannot hold for any S, T ⊆ V.
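The iterative argument above is effectively an algorithm once a violated pair (S, T) can be found. The following sketch runs the potential-reduction loop on a hypothetical triangle graph, with an exhaustive search over pairs (S, T) standing in for a cut-norm oracle (so it only scales to tiny graphs):

```python
import numpy as np
from itertools import chain, combinations

# Hypothetical example: the triangle graph; here B = A, so h(B) = ||A_D||_F^2.
A = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])
n = 3
d = A.sum(axis=1)
Dih = np.diag(1.0 / np.sqrt(d))
k = ((Dih @ A @ Dih) ** 2).sum()       # h(B) = ||B_D||_F^2
eps = 0.5

def subsets():
    return chain.from_iterable(combinations(range(n), r) for r in range(1, n + 1))

def violation(R):
    # Find (S, T) with |R(S, T)| > eps * sqrt(k * d(S) * d(T)), if any.
    for S in subsets():
        for T in subsets():
            dS, dT = d[list(S)].sum(), d[list(T)].sum()
            val = R[np.ix_(S, T)].sum()
            if abs(val) > eps * np.sqrt(k * dS * dT):
                return S, T, val / (dS * dT)   # alpha = R(S,T) / (d(S) d(T))
    return None

R = A.copy()
sigma = 0
while (v := violation(R)) is not None:
    S, T, alpha = v
    dS_vec, dT_vec = np.zeros(n), np.zeros(n)
    dS_vec[list(S)] = d[list(S)]
    dT_vec[list(T)] = d[list(T)]
    R -= alpha * np.outer(dS_vec, dT_vec)      # subtract CUT(S, T, alpha)
    sigma += 1

assert sigma <= 1 / eps ** 2   # Proposition 3.3: at most 1/eps^2 cut matrices
```

In the paper the exhaustive search is replaced by the Alon–Naor algorithm of Theorem 3.4 below, which is what makes the decomposition constructive in polynomial time.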

    Although the previous proposition only proves the existence of a decomposition into cut matrices, we
can construct such a decomposition efficiently using the following result of Alon and Naor [1], which
gives a constant-factor approximation algorithm for the cut norm of any matrix.

Theorem 3.4 (Alon and Naor [1]). There is a polynomial-time randomized algorithm that, for any
given A ∈ R^{V×V}, with high probability finds sets S, T ⊆ V such that

                                           |A(S, T)| ≥ 0.56 · ‖A‖_C .

Now we are ready to prove Theorem 2.3.

Proof of Theorem 2.3. Let δ := ε/2 and B := D^{1/2} T_δ(A_D) D^{1/2}. By Lemma 3.1, we have that

                                           ‖A − B‖_C ≤ δm = εm/2 .                                    (3.2)

So we just need to approximate B by a set of cut matrices within an additive error of εm/2. For a matrix
R, let h(R) := ‖R_D‖²_F. By Lemma 3.2 we have h(B) = k.

    Let ε′ := ε/√(4k). We use the proof strategy of Proposition 3.3. Let R^{(i)} = B − W^{(1)} − · · · − W^{(i)}.
If ‖R^{(i)}‖_C ≥ ε′ √k · m, then by Theorem 3.4 in polynomial time we can find S, T ⊆ V such that

                           |R^{(i)}(S, T)| ≥ ε′ · √k · m/2 = ε′ · √h(B) · m/2 .                        (3.3)

Choose W^{(i+1)} = CUT(S, T, α), for α = R^{(i)}(S, T)/m², and let R^{(i+1)} = R^{(i)} − W^{(i+1)}. We get

       h(R^{(i+1)}) − h(R^{(i)}) = −2α R^{(i)}(S, T) + α² d(S)d(T) ≤ − R^{(i)}(S, T)²/m² ≤ − ε′² · h(B)/4 .

Since h(R^{(0)}) = h(B), after σ ≤ 4/ε′² = 16k/ε² iterations we have ‖R^{(σ)}‖_C ≤ ε′ √k · m. Using the
triangle inequality on the cut norm we obtain

                   ‖A − W^{(1)} − · · · − W^{(σ)}‖_C ≤ ‖A − B‖_C + ‖B − W^{(1)} − · · · − W^{(σ)}‖_C
                                                    ≤ εm/2 + ε′ √k · m = εm ,

where the second inequality uses (3.2) and the bound on ‖R^{(σ)}‖_C.
     This proves the correctness of the algorithm. It remains to upper bound α. For each cut matrix
W^{(i)} = CUT(S, T, α) constructed throughout the algorithm we have

          |α| = |R^{(i)}(S, T)| / m² = (1/m²) · | ∑_{u∈S, v∈T} R^{(i)}_{u,v} · √(d(u)d(v)) / √(d(u)d(v)) |
              ≤ (1/m²) · √( ∑_{u∈S, v∈T} (R^{(i)}_{u,v})² / (d(u)d(v)) ) · √( d(S)d(T) )
              ≤ (1/m) · √( ∑_{u∈S, v∈T} (R^{(i)}_{u,v})² / (d(u)d(v)) )
              ≤ √h(R^{(i)}) / m ≤ √h(B) / m = √k / m ,

where the first inequality follows from the Cauchy–Schwarz inequality, the second inequality uses
d(S), d(T) ≤ m, and the last inequality follows from the fact that the potential function is decreasing
throughout the algorithm. This completes the proof of the theorem.


4    Fast approximation algorithm for low threshold-rank graphs
In this section we prove Theorem 2.4. First, by Theorem 2.3, in time poly(n, k, 1/ε) we can find cut
matrices W^{(1)}, . . . , W^{(σ)} with σ = O(k/ε²), such that for all 1 ≤ i ≤ σ, W^{(i)} = CUT(S_i, T_i, α_i)
with |α_i| ≤ √k/m, and

                                                 ‖A − W‖_C ≤ εm/4 ,


where W := W^{(1)} + · · · + W^{(σ)}. It follows from the above equation that for any set S ⊆ V,

              |A(S, S̄) − W(S, S̄)| = |A(S, S̄) − ∑_{i=1}^{σ} α_i · d(S ∩ S_i) · d(S̄ ∩ T_i)| ≤ εm/4 .          (4.1)

Fix S∗ ⊆ V of volume Γ − εm/2 ≤ d(S∗) ≤ Γ + εm/2 (think of (S∗, S̄∗) as the optimum cut), and let
s∗_i := d(S_i ∩ S∗) and t∗_i := d(T_i ∩ S̄∗). Observe that by Equation (4.1),

                                 |A(S∗, S̄∗) − ∑_{i=1}^{σ} α_i s∗_i t∗_i| ≤ εm/4 .                             (4.2)

Let α_max := max_{1≤i≤σ} |α_i|. Choose ∆ = Θ(ε³m/k^{1.5}) such that

                                 ∆ ≤ min{ ε³ · m / 48 ,  ε / (48 · α_max · σ) } .                      (4.3)

Note that this is achievable since k ≥ 1, α_max ≤ √k/m and σ = O(k/ε²).
   We define approximations of s∗_i, t∗_i by rounding them down to the nearest multiple of ∆, i. e.,

                                                  s̃∗_i := ∆ · ⌊s∗_i/∆⌋ ,
                                                  t̃∗_i := ∆ · ⌊t∗_i/∆⌋ .

We use s̃∗, t̃∗ to denote the vectors of the approximate values. It follows that we can obtain a good
approximation of the size of the cut (S∗, S̄∗) just by guessing the vectors s̃∗ and t̃∗. Since |s∗_i − s̃∗_i| ≤ ∆
and |t∗_i − t̃∗_i| ≤ ∆, we get

        ∑_{i=1}^{σ} |s∗_i t∗_i α_i − s̃∗_i t̃∗_i α_i| ≤ σ · α_max · (2 · ∆ · m + ∆²) ≤ 3 · α_max · σ · ∆ · m ≤ ε · m/16 ,   (4.4)

where we used (4.3).
    Observe that by Equations (4.1), (4.2), and (4.4), if we know the vectors s̃∗, t̃∗, then we can find
A(S∗, S̄∗) within an additive error of εm/2. Since s̃∗i, t̃∗i ≤ m, there are only O(m/∆) possibilities for each
s̃∗i and t̃∗i. Therefore, we can afford to enumerate all possible values of them in time (m/∆)^{2σ}, and choose
the one that gives the largest cut. Unfortunately, for a given assignment of s̃∗, t̃∗ the corresponding cut
(S∗, S̄∗) may not exist. Next we give an algorithm that, for a given assignment of s̃∗, t̃∗, finds a cut (S, S̄)
such that

                 A(S, S̄) = ∑_i s̃∗i t̃∗i αi ± εm ,

if one exists.
    First we distinguish the large-degree vertices of G and simply guess which side they are mapped to in
the optimum cut. For the rest of the vertices we use the solution of LP(1). Let

                                                U := {v : d(v) ≥ ∆}

                      T HEORY OF C OMPUTING, Volume 11 (9), 2015, pp. 241–256                                  250

be the set of large-degree vertices. Observe that |U| ≤ m/∆. Let P be the coarsest partition of the set
V \ U such that for any 1 ≤ i ≤ σ, both Si \ U and Ti \ U can be written as a union of sets in P, and for
each P ∈ P, d(P) ≤ ∆. Observe that |P| ≤ 2^{2σ} + m/∆. For a given assignment of s̃∗, t̃∗, first we guess the
set of vertices of U that are contained in S∗, US∗ := S∗ ∩ U, and let US̄∗ := U \ US∗. For the rest of the
vertices we use the linear program LP(1) to find the unknown volumes d(S∗ ∩ P).

             LP(1)
             0            ≤  yP                                            ≤ 1             ∀P ∈ P
             Γ − εm/2     ≤  ∑_P yP d(P) + d(US∗)                          ≤ Γ + εm/2                      (4.5)
             s̃∗i           ≤  ∑_{P⊆Si} yP d(P) + d(US∗ ∩ Si)                ≤ s̃∗i + ∆       ∀1 ≤ i ≤ σ      (4.6)
             t̃∗i           ≤  ∑_{P⊆Ti} (1 − yP) d(P) + d(US̄∗ ∩ Ti)          ≤ t̃∗i + ∆       ∀1 ≤ i ≤ σ .    (4.7)

Observe that yP = d(S∗ ∩ P)/d(P) is a feasible solution to the linear program. In the next lemma, which is
the main technical part of the analysis, we show how to construct a set from a given solution of the LP.
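As a small sanity check of this feasibility claim, the following sketch builds constraints of the shape (4.5) and (4.6) on toy data (the parts, volumes, and windows below are placeholders, not the decomposition from the paper) and verifies that the assignment yP = d(S∗ ∩ P)/d(P) satisfies them:

```python
# Parts of the partition P: for each part, (d(P), d(S* ∩ P)). Toy values.
parts = {"P1": (1.0, 1.0), "P2": (2.0, 1.0), "P3": (1.5, 0.0), "P4": (0.5, 0.5)}
in_S1 = {"P1", "P2"}   # parts contained in S_1 (toy membership)

# The claimed feasible point: y_P = d(S* ∩ P) / d(P).
y = {P: dS / dP for P, (dP, dS) in parts.items()}

# 0 <= y_P <= 1 constraints hold since 0 <= d(S* ∩ P) <= d(P).
assert all(0.0 <= yP <= 1.0 for yP in y.values())

# Constraint of shape (4.5): total fractional volume close to Gamma.
vol = sum(y[P] * dP for P, (dP, _) in parts.items())   # equals d(S* \ U) here
Gamma, eps_m_half = 2.5, 0.5
assert Gamma - eps_m_half <= vol <= Gamma + eps_m_half

# Constraint of shape (4.6): volume inside S_1 lands in [s~_1, s~_1 + Delta].
s1 = sum(y[P] * parts[P][0] for P in in_S1)
Delta, s1_tilde = 0.25, 2.0      # s~_1 = Delta * floor(s_1 / Delta) (toy)
assert s1_tilde <= s1 <= s1_tilde + Delta
```

The constraint of shape (4.7) is checked symmetrically with 1 − yP in place of yP.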

Lemma 4.1. There is a randomized algorithm that, for any S∗ ⊂ V, given s̃∗i, t̃∗i and US∗, returns a
random set S such that

                 P[ W(S, S̄) ≥ A(S∗, S̄∗) − 3εm/4  ∧  |d(S) − Γ| ≤ εm ] ≥ ε/4 ,          (4.8)
                 P[ W(S, S̄) ≤ A(S∗, S̄∗) + 3εm/4  ∧  |d(S) − Γ| ≤ εm ] ≥ ε/4 .          (4.9)

Proof. Let y be a feasible solution of LP(1). We use a simple independent rounding scheme to compute
a random set S. We always include US∗ in S. For each P ∈ P, we include P in S, independently, with
probability yP. We prove that S satisfies the lemma's statements.
    First of all, by linearity of expectation,

                 E[d(S ∩ Si)] = d(US∗ ∩ Si) + ∑_{P⊆Si} yP d(P) ,     and
                 E[d(S̄ ∩ Ti)] = d(US̄∗ ∩ Ti) + ∑_{P⊆Ti} (1 − yP) d(P) .

Therefore, by (4.5),

                 |E[d(S)] − Γ| ≤ εm/2 .          (4.10)

Furthermore, by (4.6) and (4.7), for any 1 ≤ i ≤ σ,

                 |E[d(S ∩ Si)] − s̃∗i| ≤ ∆     and     |E[d(S̄ ∩ Ti)] − t̃∗i| ≤ ∆ .          (4.11)

In the following claim we show that d(S) is highly concentrated around its expectation. This shows that
d(S) is close to Γ.

                       T HEORY OF C OMPUTING, Volume 11 (9), 2015, pp. 241–256                                  251
                              S HAYAN OVEIS G HARAN AND L UCA T REVISAN

Claim 4.2. P[|d(S) − Γ| ≥ εm] ≤ ε/12.
Proof. We use the theorem of Hoeffding to prove the claim:

Theorem 4.3 (Hoeffding Inequality). Let X1, . . . , Xn be independent random variables such that for each
1 ≤ i ≤ n, Xi ∈ [0, ai]. Let X := ∑_{i=1}^n Xi. Then, for any ε > 0,

                 P[ |X − E[X]| ≥ ε ] ≤ 2 exp( −2ε² / ∑_{i=1}^n ai² ) .
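The bound is easy to check empirically; the following sketch compares the tail frequency of a sum of independent bounded uniform variables against the Hoeffding bound (all parameters are toy values chosen for the illustration):

```python
import math
import random

random.seed(0)
a = [0.3, 0.7, 0.5, 0.2, 0.9, 0.4] * 10   # X_i uniform in [0, a_i] (toy bounds)
t = 3.0                                    # deviation threshold

def sample_sum():
    return sum(random.uniform(0, ai) for ai in a)

mean = sum(ai / 2 for ai in a)             # E[X_i] = a_i / 2 for uniform X_i
trials = 5000
hits = sum(abs(sample_sum() - mean) >= t for _ in range(trials))
empirical = hits / trials
bound = 2 * math.exp(-2 * t**2 / sum(ai**2 for ai in a))
assert empirical <= bound                  # the observed tail never exceeds the bound
```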

    Now, by the independent rounding procedure, we obtain

   P[ |d(S) − E[d(S)]| ≥ εm/2 ] ≤ 2 exp( −ε²m² / (2 ∑_P d(P)²) ) ≤ 2 exp( −ε²m² / (2m∆) ) ≤ 2 exp(−24/ε) ≤ ε/12 ,

where the second inequality follows by the fact that d(P) ≤ ∆ and ∑_P d(P) ≤ m, and the third inequality
follows by (4.3). The claim follows by (4.10).

    In the next claim we bound the difference between E[W(S, S̄)] and A(S∗, S̄∗).

Claim 4.4. |E[W(S, S̄)] − A(S∗, S̄∗)| ≤ εm/2.
                                                                           
Proof. First, we calculate E[W(S, S̄)] in terms of E[d(S ∩ Si)] and E[d(S̄ ∩ Ti)]:

   E[W(S, S̄)] = E[ ∑_{i=1}^σ d(S ∩ Si) d(S̄ ∩ Ti) αi ]
              = ∑_{i=1}^σ αi E[ ( ∑_{P∈P: P⊆Si} d(P) I[P ⊆ S] ) ( ∑_{Q∈P: Q⊆Ti} d(Q) I[Q ⊆ S̄] ) ]          (4.12)
                + ∑_{i=1}^σ αi ( d(US∗ ∩ Si) E[d(S̄ ∩ Ti)] + d(US̄∗ ∩ Ti) E[d(S ∩ Si)] ) .

Since the event P ⊆ S is independent of the event Q ⊆ S̄ whenever P ≠ Q, and the two events cannot
occur simultaneously when P = Q, we get

                 E[ I[P ⊆ S] I[Q ⊆ S̄] ] = yP(1 − yQ)   if P ≠ Q,   and   0   otherwise.
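This expectation identity is easy to confirm by simulation; the sketch below estimates E[I[P ⊆ S] · I[Q ⊆ S̄]] for two distinct parts rounded independently (the probabilities are toy values):

```python
import random

random.seed(1)
yP, yQ = 0.6, 0.3     # toy rounding probabilities for two distinct parts
trials = 200000

def round_once():
    in_S_P = random.random() < yP   # P included in S with probability y_P
    in_S_Q = random.random() < yQ   # Q included in S, independently
    return in_S_P, in_S_Q

# I[P ⊆ S] * I[Q ⊆ S̄] is 1 exactly when P is included and Q is not.
cross = sum(p and not q for p, q in (round_once() for _ in range(trials)))
est = cross / trials
assert abs(est - yP * (1 - yQ)) < 0.01   # estimate is close to 0.6 * 0.7 = 0.42

# For P = Q, the same part cannot lie in both S and its complement,
# so the product of indicators is identically 0.
```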
                                             
Let si := E[d(S ∩ Si)] and ti := E[d(S̄ ∩ Ti)]. By (4.12) and the above equation,

                 E[W(S, S̄)] = ∑_{i=1}^σ αi si ti − ∑_{i=1}^σ αi ∑_{P∈P} yP(1 − yP) d(P)² .          (4.13)



Using Equation (4.2) we write

   |E[W(S, S̄)] − A(S∗, S̄∗)| ≤ |E[W(S, S̄)] − ∑_{i=1}^σ αi s∗i t∗i| + εm/4
                             = |∑_{i=1}^σ (αi si ti − αi s∗i t∗i) − ∑_{i=1}^σ ∑_{P∈P} αi yP(1 − yP) d(P)²| + εm/4
                             ≤ ∑_{i=1}^σ |αi si ti − αi s̃∗i t̃∗i| + ∑_{i=1}^σ |αi s̃∗i t̃∗i − αi s∗i t∗i| + σ αmax m∆ + εm/4
                             ≤ ∑_{i=1}^σ |αi si ti − αi s̃∗i t̃∗i| + εm/16 + εm/48 + εm/4 ≤ εm/2 ,

where the equality follows by (4.13), the second inequality follows by the fact that d(P) ≤ ∆ for all P ∈ P
and ∑_P d(P) ≤ m, the third inequality follows by (4.4) and (4.3), and the last inequality can be proved
similarly to (4.4) using (4.11). This completes the proof of Claim 4.4.

    Now we are ready to finish the proof of Lemma 4.1. Here we prove (4.8); Equation (4.9) can be
proved similarly. By Claim 4.4,

                 E[W(S, S̄)] ≥ A(S∗, S̄∗) − εm/2 .

Since the size of any cut in G is at most m/2, by (4.1), for any S ⊆ V, W(S, S̄) ≤ εm/4 + m/2 < 3m/4.
Therefore, by Markov's inequality (applied to the nonnegative random variable 3m/4 − W(S, S̄)),

                 P[ W(S, S̄) ≥ A(S∗, S̄∗) − 3εm/4 ] ≥ ε/3 .

By the union bound, Lemma 4.1 follows from the discussion above and Claim 4.2 (indeed, ε/3 − ε/12 = ε/4).
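For completeness, the Markov step above can be spelled out as a standard "reverse Markov" argument. Writing A∗ := A(S∗, S̄∗) and assuming A∗ ≥ 3εm/4 (an assumption we add for this sketch; the guarantee is vacuous otherwise):

```latex
\begin{align*}
\Pr\Big[W(S,\bar S) < A^* - \tfrac{3\varepsilon m}{4}\Big]
  &= \Pr\Big[\tfrac{3m}{4} - W(S,\bar S) > \tfrac{3m}{4} - A^* + \tfrac{3\varepsilon m}{4}\Big] \\
  &\le \frac{\mathbb{E}\big[\tfrac{3m}{4} - W(S,\bar S)\big]}{\tfrac{3m}{4} - A^* + \tfrac{3\varepsilon m}{4}}
   \le \frac{\tfrac{3m}{4} - A^* + \tfrac{\varepsilon m}{2}}{\tfrac{3m}{4} - A^* + \tfrac{3\varepsilon m}{4}}
   = 1 - \frac{\varepsilon m/4}{\tfrac{3m}{4} - A^* + \tfrac{3\varepsilon m}{4}}
   \le 1 - \frac{\varepsilon}{3}\,,
\end{align*}
```

where the first inequality is Markov's inequality for the nonnegative variable 3m/4 − W(S, S̄), the second uses E[W(S, S̄)] ≥ A∗ − εm/2, and the last step uses 3m/4 − A∗ + 3εm/4 ≤ 3m/4.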

    Our rounding algorithm is described in Algorithm 1. First we prove its correctness; then we bound its
running time. Let S be the output set of the algorithm. First, observe that the output always satisfies
|d(S) − Γ| ≤ εm. Now let A(S∗, S̄∗) be the maximum cut among all sets of volume Γ (the minimization
case can be proved similarly). In the iteration where the algorithm correctly guesses s̃∗, t̃∗, and US∗,
there exists a feasible solution y of LP(1). By Lemma 4.1, for all 1 ≤ i ≤ 4/ε,

                 P[ W(Sy(i), S̄y(i)) ≥ A(S∗, S̄∗) − 3εm/4  ∧  |d(Sy(i)) − Γ| ≤ εm ] ≥ ε/4 .

Since we take the best of 4/ε samples, with probability at least 1 − 1/e the output set S satisfies

                 W(S, S̄) ≥ A(S∗, S̄∗) − 3εm/4 .

But then by (4.1), we must have A(S, S̄) ≥ A(S∗, S̄∗) − εm. This proves the correctness of the algorithm.
    It remains to upper-bound the running time of the algorithm. First observe that if |U| = O(k/ε²),
the running time of the algorithm is dominated by the time it takes to compute a feasible solution of



Algorithm 1 Approximate Maximum Cut (S, S̄) such that d(S) = Γ ± εm
  for all possible values of s̃∗i, t̃∗i, and US∗ ⊆ U do
    if there is a feasible solution y of LP(1) then
      for i = 1 → 4/ε do
        Sy(i) ← US∗.
        For each P ∈ P include P in Sy(i), independently, with probability yP.
      end for
    end if
  end for
  return, among all sets Sy(i) sampled in the loop that satisfy |d(Sy(i)) − Γ| ≤ εm, the one for which
  W(Sy(i), S̄y(i)) is maximum.
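The inner sampling loop of Algorithm 1 can be sketched as follows; all inputs (parts, volumes, probabilities, the guessed large-degree set) are toy placeholders, and the final selection by W is only indicated in a comment:

```python
import random

random.seed(2)
eps, m, Gamma = 0.5, 10.0, 4.0
# Each part P of the partition: (name, d(P), y_P from a feasible LP solution).
parts = [("P1", 1.0, 0.9), ("P2", 2.0, 0.5), ("P3", 1.5, 0.2), ("P4", 0.5, 0.7)]
U_S = {"u1"}                       # guessed large-degree vertices on the S side

def sample_set():
    S = set(U_S)                   # U_{S*} is always included
    for name, _, y in parts:
        if random.random() < y:    # independent rounding with probability y_P
            S.add(name)
    return S

def volume(S):
    return 1.0 + sum(dP for name, dP, _ in parts if name in S)   # d(u1) = 1 (toy)

candidates = [sample_set() for _ in range(int(4 / eps))]
feasible = [S for S in candidates if abs(volume(S) - Gamma) <= eps * m]
# Among the feasible samples one would return the set maximizing W(S, S̄);
# here we just report how many samples passed the volume test.
print(len(feasible))   # -> 8: with these loose toy numbers every sample passes
```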

LP(1). Since the size of the LP is 2^{Õ(k/ε²)}, in this case Algorithm 1 terminates in time 2^{Õ(k/ε²)}. Note
that for any sample set Sy(i), both d(Sy(i)) and W(Sy(i), S̄y(i)) can be computed in time 2^{Õ(k/ε²)}, once
we know |Sy(i) ∩ P| for every P ∈ P.
    Otherwise, if |U| ≫ k/ε², the dependence of the running time of the algorithm on ε and k is dominated
by the step where we guess the subset US∗ = U ∩ S∗. By (4.3),

                 |U| ≤ m/∆ = O(k^{1.5}/ε³) .

Therefore, Algorithm 1 runs in time 2^{Õ(k^{1.5}/ε³)}. Since it takes poly(n, k, 1/ε) time to compute the
decomposition into W^(1), . . . , W^(σ), the total running time is 2^{Õ(k^{1.5}/ε³)} + poly(n, k, 1/ε). This
completes the proof of Theorem 2.4.

Acknowledgement.
We would like to thank an anonymous reviewer for helpful comments.


References
 [1] Noga Alon and Assaf Naor: Approximating the cut-norm via Grothendieck's inequality. SIAM J.
     Comput., 35(4):787–803, 2006. Preliminary version in STOC'04. [doi:10.1137/S0097539704441629]

 [2] Sanjeev Arora, Boaz Barak, and David Steurer: Subexponential algorithms for Unique Games and
     related problems. In Proc. 51st FOCS, pp. 563–572. IEEE Comp. Soc. Press, 2010.
     [doi:10.1109/FOCS.2010.59]

 [3] Boaz Barak, Prasad Raghavendra, and David Steurer: Rounding semidefinite programming hierarchies
     via global correlation. In Proc. 52nd FOCS, pp. 472–481. IEEE Comp. Soc. Press, 2011.
     [doi:10.1109/FOCS.2011.95]


 [4] Amin Coja-Oghlan, Colin Cooper, and Alan M. Frieze: An efficient sparse regularity concept. SIAM
     J. Discrete Math., 23(4):2000–2034, 2010. Preliminary version in SODA'09. [doi:10.1137/080730160]

 [5] Wenceslas Fernandez de la Vega, Marek Karpinski, Ravi Kannan, and Santosh Vempala: Tensor
     decomposition and approximation schemes for constraint satisfaction problems. In Proc. 37th STOC,
     pp. 747–754. ACM Press, 2005. [doi:10.1145/1060590.1060701]

 [6] Alan M. Frieze and Ravi Kannan: The regularity lemma and approximation schemes for dense
     problems. In Proc. 37th FOCS, pp. 12–20. IEEE Comp. Soc. Press, 1996.
     [doi:10.1109/SFCS.1996.548459]

 [7] Alan M. Frieze and Ravi Kannan: Quick approximation to matrices and applications. Combinatorica,
     19(2):175–220, 1999. [doi:10.1007/s004930050052]

 [8] Michel X. Goemans and David P. Williamson: Improved approximation algorithms for maximum
     cut and satisfiability problems using semidefinite programming. J. ACM, 42(6):1115–1145, 1995.
     Preliminary version in STOC'94. [doi:10.1145/227683.227684]

 [9] Venkatesan Guruswami and Ali Kemal Sinop: Lasserre hierarchy, higher eigenvalues, and approxi-
     mation schemes for graph partitioning and quadratic integer programming with PSD objectives. In
     Proc. 52nd FOCS, pp. 482–491. IEEE Comp. Soc. Press, 2011. Expanded version in CoRR (2011).
     Merged with [11] in CoRR (2013). [doi:10.1109/FOCS.2011.36]

[10] Venkatesan Guruswami and Ali Kemal Sinop: Faster SDP hierarchy solvers for local rounding
     algorithms. In Proc. 53rd FOCS, pp. 197–206. IEEE Comp. Soc. Press, 2012. Expanded version in
     CoRR. [doi:10.1109/FOCS.2012.58]

[11] Venkatesan Guruswami and Ali Kemal Sinop: Approximating non-uniform sparsest cut via generalized
     spectra. In Proc. 24th Ann. ACM-SIAM Symp. on Discrete Algorithms (SODA'13). SIAM, 2013.
     Expanded version, merged with [9], in CoRR (2013). [doi:10.1137/1.9781611973105.22]

[12] Alexandra Kolla: Spectral algorithms for Unique Games. Comput. Complexity, 20(2):177–206, 2011.
     Preliminary versions in CCC'10 and ECCC (2010). [doi:10.1007/s00037-011-0011-7]

[13] Alexandra Kolla and Madhur Tulsiani: Playing Unique Games using graph spectra. Manuscript,
     2007.

[14] Jean B. Lasserre: Global optimization with polynomials and the problem of moments. SIAM J.
     Optim., 11(3):796–817, 2001. [doi:10.1137/S1052623400366802]

[15] Shayan Oveis Gharan and Luca Trevisan: A new regularity lemma and faster approximation
     algorithms for low threshold rank graphs. In Proc. 16th Internat. Workshop on Approximation
     Algorithms for Combinatorial Optimization Problems (APPROX'13), pp. 303–316, 2013.
     [doi:10.1007/978-3-642-40328-6_22]


[16] Pablo A. Parrilo: Structured semidefinite programs and semialgebraic geometry methods in robustness
     and optimization. Ph. D. thesis, California Institute of Technology, 2000.

[17] Endre Szemerédi: Regular partitions of graphs. Colloq. Internat. CNRS: Problèmes Combinatoires
     et Théorie des Graphes, 260:399–401, 1978.

AUTHORS
     Shayan Oveis Gharan
     Assistant Professor
     University of Washington, Seattle, WA
     shayan cs washington edu
     http://homes.cs.washington.edu/~shayan

     Luca Trevisan
     Professor
     U.C. Berkeley, Berkeley, CA
     luca berkeley edu
     http://www.cs.berkeley.edu/~luca/

ABOUT THE AUTHORS
     S HAYAN OVEIS G HARAN (listed alphabetically under “O”) is an assistant professor in the
        Computer Science and Engineering department at the University of Washington. He
        received his Ph. D. in 2013 from the MS&E department at Stanford University under
        the supervision of Amin Saberi and Luca Trevisan. Before joining UW he spent a year
        and a half as a postdoctoral Miller Fellow at UC Berkeley where his host was Umesh
        Vazirani. He did his undergraduate studies at the Computer Engineering department at
        Sharif University. Shayan’s research areas include the design and analysis of algorithms,
        spectral graph theory and applied probability. Shayan received an ACM doctoral dis-
        sertation award honorable mention for his Ph. D. thesis “New Rounding Techniques for
        the Design and Analysis of Approximation Algorithms” in 2014. He and his coauthors
        received the Best Paper awards at SODA 2010 and FOCS 2011 for their work on the
        Traveling Salesman Problem.

     L UCA T REVISAN is a professor of electrical engineering and computer sciences at U.C.
        Berkeley and a senior scientist at the Simons Institute for the Theory of Computing. Luca
        received his Dottorato (Ph. D.) in 1997, from the Sapienza University of Rome, working
        with Pierluigi Crescenzi. After graduating, Luca was a post-doc at MIT and at DIMACS,
        and he was on the faculty of Columbia University, U.C. Berkeley, and Stanford, before
        returning to Berkeley in 2014. Luca’s research is in theoretical computer science, and in
        the past six years he has been interested in spectral graph theory and its applications to
        graph algorithms. Luca lives, beyond his means, in San Francisco. When out of town, he
        can often be found in Rome or in Hong Kong.

