                            Theory of Computing, Volume 16 (12), 2020, pp. 1–18
                                         www.theoryofcomputing.org




Optimality of Correlated Sampling Strategies
              Mohammad Bavarian∗          Badih Ghazi†          Elad Haramaty‡
                 Pritish Kamath§          Ronald L. Rivest¶          Madhu Sudan‖
                 Received January 15, 2018; Revised May 30, 2020; Published November 9, 2020




       Abstract. In the correlated sampling problem, two players are given probability distri-
       butions P and Q, respectively, over the same finite set, with access to shared randomness.
       Without any communication, the two players are each required to output an element sampled
       according to their respective distributions, while trying to minimize the probability that their
       outputs disagree. A well-known strategy due to Kleinberg–Tardos and Holenstein, with
       a close variant (for a similar problem) due to Broder, solves this task with disagreement
       probability at most 2δ/(1 + δ), where δ is the total variation distance between P and Q.
       This strategy has been used in several different contexts, including sketching algorithms,
       approximation algorithms based on rounding linear programming relaxations, the study of
       parallel repetition and cryptography.
           In this paper, we give a surprisingly simple proof that this strategy is essentially optimal.
       Specifically, for every δ ∈ (0, 1), we show that any correlated sampling strategy incurs
       a disagreement probability of essentially 2δ/(1 + δ) on some inputs P and Q with total
       variation distance at most δ. This partially answers a recent question of Rivest.
           Our proof is based on studying a new problem that we call constrained agreement. Here,
       the two players are given subsets A ⊆ [n] and B ⊆ [n], respectively, and their goal is to output
       an element i ∈ A and j ∈ B, respectively, while minimizing the probability that i ≠ j. We
       prove tight bounds for this question, which in turn imply tight bounds for correlated sampling.
       Though we settle basic questions about the two problems, our formulation leads to more
       fine-grained questions that remain open.

   ∗ Work done while at MIT. Supported in part by NSF Award CCF-1420692.
   † Work done while at MIT. Supported in part by NSF CCF-1420956, NSF CCF-1420692 and CCF-1217423.
   ‡ Work done while at Harvard. Supported by NSF Award CCF-1565641.
   § Work done while at MIT. Supported in part by NSF CCF-1420956 and NSF CCF-1420692.
   ¶ This work was supported by the Center for Science of Information (CSoI), an NSF Science and Technology Center, under grant agreement CCF-0939370.
   ‖ This work was supported by NSF Awards CCF-1565641 and CCF-1715187, and a Simons Investigator Award.

ACM Classification: F.0, G.3
AMS Classification: 68Q99, 94A20, 68W15
Key words and phrases: distributions, sampling, correlated sampling, coupling, MinHash, communication complexity

© 2020 M. Bavarian, B. Ghazi, E. Haramaty, P. Kamath, R. L. Rivest and M. Sudan
Licensed under a Creative Commons Attribution License (CC-BY)          DOI: 10.4086/toc.2020.v016a012


1     Introduction
In this paper, we study correlated sampling, a basic task, variants of which have been considered in the
context of sketching algorithms [2], approximation algorithms based on rounding linear programming
relaxations [7, 3], the study of parallel repetition [6, 12, 1] and cryptography [13].
    The correlated sampling problem is defined as follows. Alice and Bob are given probability distributions P and Q, respectively, over a finite set Ω. Without any communication, using only shared randomness as the means to coordinate, Alice is required to output an element a distributed according to P and Bob is required to output an element b distributed according to Q. Their goal is to minimize the disagreement probability Pr[a ≠ b], which we will compare with the total variation distance between P and Q, defined as

    dTV(P, Q) := sup_{A⊆Ω} (P(A) − Q(A)) = (1/2) · ∑_{ω∈Ω} |P(ω) − Q(ω)|.    (1.1)
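
For concreteness, here is a minimal Python sketch of this formula, assuming P and Q are represented as dictionaries over the same finite set (the representation is our assumption, not the paper's):

    def total_variation(P, Q):
        # P and Q: dicts mapping each outcome of the same finite set to its
        # probability; returns dTV(P, Q) as in Equation (1.1).
        return 0.5 * sum(abs(P[w] - Q[w]) for w in P)

    # Example: 0.5 * (|0.5 - 0.25| + |0.5 - 0.75|) = 0.25.
    print(total_variation({'x': 0.5, 'y': 0.5}, {'x': 0.25, 'y': 0.75}))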
A correlated sampling strategy is formally defined below, where ∆Ω denotes the set of all probability distributions over Ω and (R, F, µ) denotes the probability space corresponding to the randomness shared by Alice and Bob. R is the sample space, F is a σ-algebra over R and µ is a probability measure over (R, F). Even though Ω is finite, we allow R to be infinite. For simplicity, we abuse notation and use R to denote both the sample space and the probability space.

Definition 1.1. A correlated sampling strategy for a finite set Ω with error ε : [0, 1] → [0, 1] is specified by a probability space R and a pair of functions¹ f, g : ∆Ω × R → Ω, such that for all δ ∈ [0, 1] and all P, Q ∈ ∆Ω with dTV(P, Q) ≤ δ, the following hold.

    [Correctness]           {f(P, r)}_{r∼R} = P and {g(Q, r)}_{r∼R} = Q,
    [Error Guarantee]       Pr_{r∼R}[f(P, r) ≠ g(Q, r)] ≤ ε(δ).

In the above, {f(P, r)}_{r∼R} denotes the push-forward measure, that is, the distribution of the random variable f(P, r) over Ω, where r ∼ R is the source of shared randomness. For simplicity, we will often not mention R explicitly when talking about correlated sampling strategies. While we defined correlated sampling strategies for finite sets only, it is possible to define them for infinite sets Ω; see Section 5 for a discussion. In this paper we consider finite sets Ω only, except where otherwise stated.
    ¹ We require both functions to be measurable in their second argument.




     A correlated sampling strategy is notably different from the fundamental notion of a coupling (see, e. g., [14] for an introduction), where we require a single coupling function h : ∆Ω × ∆Ω → ∆_{Ω×Ω} such that for any distributions P and Q it holds that the marginals of h(P, Q) are P and Q respectively. In other words, a coupling function has the knowledge of both P and Q, whereas a correlated sampling strategy operates locally on the knowledge of P and on the knowledge of Q. It is well known that for any coupling function h, it holds that Pr_{(a,b)∼h(P,Q)}[a ≠ b] ≥ dTV(P, Q) and that this bound is achievable. Observe that a correlated sampling strategy induces a coupling given as {(f(P, r), g(Q, r))}_{r∼R}. Thus, it follows that ε(δ) ≥ δ. And yet a priori, it is unclear whether any non-trivial correlated sampling strategy can even exist, since the error ε is not allowed to depend on the size of Ω.
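
To make the contrast concrete, here is a Python sketch of the classical maximal coupling achieving Pr[a ≠ b] = dTV(P, Q); it is a folklore construction, not one of this paper's strategies, and it assumes dictionary inputs over a common outcome set. Note that it uses knowledge of both P and Q, which is exactly what a correlated sampling strategy is denied:

    import random

    def sample(dist):
        # Draw one outcome from a distribution given as a dict of probabilities.
        r, acc = random.random(), 0.0
        for w, p in dist.items():
            acc += p
            if r < acc:
                return w
        return w  # guard against floating-point slack

    def maximal_coupling(P, Q):
        # With probability 1 - dTV(P, Q), emit a common sample from the overlap
        # min(P, Q); otherwise sample the two excess parts, whose supports are
        # disjoint, so the outputs then always differ.
        overlap = {w: min(P[w], Q[w]) for w in P}
        mass = sum(overlap.values())                 # equals 1 - dTV(P, Q)
        if random.random() < mass:
            w = sample({w: v / mass for w, v in overlap.items()})
            return w, w
        a = sample({w: (P[w] - overlap[w]) / (1 - mass) for w in P})
        b = sample({w: (Q[w] - overlap[w]) / (1 - mass) for w in Q})
        return a, b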
     Somewhat surprisingly, there exists a simple strategy whose error can be bounded by roughly twice
the total variation distance (and in particular does not degrade with the size of Ω). Variants of this strategy
have been rediscovered multiple times in the literature, yielding the following theorem.
Theorem 1.2 ([2, 7, 6]). For all n ∈ Z≥0, there exists a correlated sampling strategy for sets of size n, with error ε : [0, 1] → [0, 1], such that for all δ ∈ [0, 1], it holds that

    ε(δ) ≤ 2δ/(1 + δ).    (1.2)
     Strictly speaking, Broder's paper [2] did not consider the general correlated sampling problem. Rather, it introduced the MinHash strategy, which can be interpreted as a correlated sampling strategy for the special case where P and Q are flat distributions, i. e., they are uniform over some subsets of Ω. In particular, if P = U(A) and Q = U(B) are distributions that are uniform over sets A, B ⊆ Ω, respectively, then the MinHash strategy gives an error probability of 1 − |A ∩ B|/|A ∪ B|, also known as the Jaccard distance between A and B. In the special case when |A| = |B| = a and |A ∩ B| = ℓ, this is equivalent to the bound above: here δ = 1 − ℓ/a, so 2δ/(1 + δ) = (2a − 2ℓ)/(2a − ℓ) = 1 − ℓ/(2a − ℓ), which is exactly the Jaccard distance.
     The technique can be generalized to other (non-flat) distributions to get the bound in Theorem 1.2, thereby yielding a strategy due to Kleinberg–Tardos and Holenstein.² Several variants of this (sometimes referred to as “consistent sampling” protocols) have been used in applied work, including [10, 4, 9, 5].
    ² Strictly speaking, if P and Q are flat over subsets of different sizes, the above bound is weaker than that obtained from a direct application of the MinHash strategy. See Section 5 for a discussion.
     Given Theorem 1.2, a natural and basic question is whether the bound on the error can be improved;
the only lower bound we are aware of is the one that arises from coupling, namely ε(δ ) ≥ δ . This
question was recently raised by Rivest [13] in the context of a new encryption scheme and was one of
the motivations for this work. We give a surprisingly simple proof that the bound in Theorem 1.2 is
essentially tight.
Theorem 1.3 (Main Result). For all δ, γ ∈ (0, 1), for all sufficiently large n, any correlated sampling strategy for a set of size n with error ε : [0, 1] → [0, 1] satisfies

    ε(δ) ≥ 2δ/(1 + δ) − γ.    (1.3)
Organization of the paper. In Section 2, we prove Theorem 1.3. In Section 3, we consider the setting where Ω is of a fixed finite size, which was the question originally posed by Rivest [13]. In this regime, there turns out to be a surprising strategy that gets better error than Theorem 1.2 in a very special case. However, it was conjectured in [13] that in fact a statement like Theorem 1.3 holds in every other case and we make progress on this conjecture by proving it in one such case. For completeness, in Section 4 we describe the correlated sampling strategies of Broder, Kleinberg–Tardos, and Holenstein, underlying Theorem 1.2. We conclude with some more observations and open questions in Section 5.


2    Lower bound on correlated sampling
In order to prove Theorem 1.3, we first introduce the constrained agreement problem, a relaxation of
the correlated sampling problem. In this problem, Alice and Bob are given sets A ⊆ Ω and B ⊆ Ω,
respectively, where the pair (A, B) is sampled from some (known) distribution D. Alice and Bob
are required to output elements a ∈ A and b ∈ B, respectively, such that the disagreement probability Pr_{(A,B)∼D}[a ≠ b] is minimized.
     This can be viewed as a relaxation of the correlated sampling problem by first considering the case of flat distributions in Definition 1.1 and relaxing the restrictions {f(P, r)}_{r∼R} = P and {g(Q, r)}_{r∼R} = Q to only requiring that f(P, r) ∈ supp(P) and g(Q, r) ∈ supp(Q) for all r ∈ R. This makes it a constraint satisfaction problem, and we consider a distributional version of it.
     In the following definition, we use 2^Ω to denote the powerset of Ω.

Definition 2.1. A constrained agreement strategy for a finite set Ω and a probability distribution D over 2^Ω × 2^Ω is specified by a pair of functions f, g : 2^Ω → Ω and has error err_D(f, g) if the following hold.

    [Correctness]           ∀A, B ⊆ Ω : f(A) ∈ A and g(B) ∈ B,
    [Error guarantee]       Pr_{(A,B)∼D}[f(A) ≠ g(B)] =: err_D(f, g).

    Note that since the constrained agreement problem is defined with respect to a (known) probability
distribution D on pairs of sets, we can require, without loss of generality, that the strategies ( f , g) be
deterministic (since any randomized strategy can be derandomized with no degradation in the error).
    In order to prove Theorem 1.3, we characterize the optimal constrained agreement strategy in the special case when D = D_p, where every element ω ∈ Ω is independently included in each of A and B with probability p.

Lemma 2.2. For all p ∈ [0, 1], any constrained agreement strategy (f, g) for a finite set Ω and distribution D = D_p over 2^Ω × 2^Ω has error

    err_{D_p}(f, g) ≥ 2(1 − p)/(2 − p).
Proof of Lemma 2.2. For ease of notation, let Ω = [n]. Let (f, g) be a constrained agreement strategy. We will construct functions f* and g* such that

    err_{D_p}(f, g) ≥ err_{D_p}(f*, g*) ≥ 2(1 − p)/(2 − p).

    For every i ∈ [n], let β_i := Pr_B[g(B) = i]. Without loss of generality (by suitably permuting [n]), we can assume that β_1 ≥ β_2 ≥ · · · ≥ β_n. Since A and B are independently sampled in D_p, it follows that
when Bob's strategy is fixed to g, the strategy of Alice that results in the largest agreement probability is simply f*(A) := argmax_{i∈A} β_i = min{i : i ∈ A} for all A ⊆ [n].
    So far we have err_{D_p}(f, g) ≥ err_{D_p}(f*, g). We can repeat the same process again. For every i ∈ [n], define α_i := Pr_A[f*(A) = i]. Due to the specific choice of f*, it holds that α_i = (1 − p)^{i−1}·p and hence α_1 ≥ α_2 ≥ · · · ≥ α_n. Thus, when Alice's strategy is fixed to f*, the strategy of Bob that results in the largest agreement probability is given by g*(B) := argmax_{i∈B} α_i = min{i : i ∈ B} for all B ⊆ [n]. Hence, we get err_{D_p}(f, g) ≥ err_{D_p}(f*, g) ≥ err_{D_p}(f*, g*) where

    err_{D_p}(f*, g*) := 1 − Pr_{(A,B)∼D_p}[f*(A) = g*(B)]
                       = 1 − ∑_{i=1}^n Pr_A[f*(A) = i] · Pr_B[g*(B) = i]
                       = 1 − ∑_{i=1}^n (1 − p)^{2(i−1)} · p²
                       ≥ 1 − p/(2 − p) = 2(1 − p)/(2 − p),

where the last inequality bounds the finite geometric sum by the infinite one: ∑_{i≥1} (1 − p)^{2(i−1)}·p² = p²/(1 − (1 − p)²) = p/(2 − p).

Thus, we conclude that

    err_{D_p}(f, g) ≥ 2(1 − p)/(2 − p).
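
As a quick numerical sanity check of this bound, the following simulation sketch (ours; the parameters n = 200 and p = 0.5 are assumed illustrative values) has both players play the min-index strategy from the proof:

    import random

    def min_index_error(n, p, trials, seed=0):
        # Estimate err_{D_p}(f*, g*) where both players output the smallest
        # index of their set, as in the proof of Lemma 2.2.
        rng = random.Random(seed)
        disagreements = 0
        for _ in range(trials):
            A = [i for i in range(n) if rng.random() < p]
            B = [i for i in range(n) if rng.random() < p]
            # Empty sets have probability (1 - p)^n each; count them as errors.
            disagreements += (not A) or (not B) or (min(A) != min(B))
        return disagreements / trials

    print(min_index_error(200, 0.5, 100_000))  # close to 2*(1 - 0.5)/(2 - 0.5) = 2/3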
Before turning to the proof of Theorem 1.3, we note a couple of basic facts.

Fact 2.3. For flat distributions P = U(A) and Q = U(B) with A, B ⊆ Ω, it holds that

    dTV(P, Q) = 1 − |A ∩ B| / max{|A|, |B|}.

The following concentration bound, due to Sergei Bernstein from the 1920s, is often referred to as
Chernoff’s or Hoeffding’s bound. The Bernoulli random variable X ∼ Ber(p) is a 0-1 random variable
with Pr[X = 1] = p.

Fact 2.4 (See, e. g., Cor. 4.6 in [11]). For X_1, . . . , X_n drawn i. i. d. from Ber(p), it holds for all τ > 0 that

    Pr[ |∑_i X_i − pn| ≥ τ·pn ] ≤ 2·e^{−pnτ²/3}.


Proof of Theorem 1.3. Fix δ, γ ∈ (0, 1). Assume, for the sake of contradiction, that for infinitely many values of n, there is a correlated sampling strategy (f*, g*) for a set of size n with error

    ε(δ) < 2δ/(1 + δ) − γ.

Let δ′ ∈ (0, δ) be such that

    2δ/(1 + δ) − γ < 2δ′/(1 + δ′) < 2δ/(1 + δ).    (2.1)


Consider the distribution D_p over pairs (A, B) of subsets A, B ⊆ [n] where each i ∈ [n] is independently included in each of A and B with probability p := 1 − δ′. Thus, we have E[|A|] = E[|B|] = p·n and E[|A ∩ B|] = p²·n. Moreover, by Fact 2.4 with τ = n^{−0.2}, we have that

    Pr_A[ ||A| − pn| > p·n^{0.8} ] ≤ 2·e^{−p·n^{0.6}/3},
    Pr_B[ ||B| − pn| > p·n^{0.8} ] ≤ 2·e^{−p·n^{0.6}/3},
    Pr_{A,B}[ ||A ∩ B| − p²n| > p²·n^{0.8} ] ≤ 2·e^{−p²·n^{0.6}/3}.

Hence, by the union bound and using p² ≤ p, we get that with probability at least 1 − 6·e^{−p²·n^{0.6}/3}, we have that ||A| − p·n| ≤ p·n^{0.8}, ||B| − p·n| ≤ p·n^{0.8} and ||A ∩ B| − p²·n| ≤ p²·n^{0.8}. Thus, for the distributions P = U(A) and Q = U(B), it holds with probability at least 1 − 6·e^{−p²·n^{0.6}/3} that

    dTV(P, Q) = 1 − |A ∩ B| / max{|A|, |B|}
              ≤ 1 − p + o_n(1)
              = δ′ + o_n(1)
              < δ    for sufficiently large n.

The assumed strategy (f*, g*) is such that

    Pr_{r∼R}[f*(P, r) ≠ g*(Q, r)] ≤ 2δ/(1 + δ) − γ

when dTV(P, Q) ≤ δ, and the disagreement probability is at most 1 otherwise. In our random choice of the pair of distributions (P, Q), the probability that dTV(P, Q) > δ is at most o_n(1). Thus, applied to the randomly sampled (P, Q), the strategy satisfies

    Pr_{(P,Q), r∼R}[f*(P, r) ≠ g*(Q, r)] ≤ 2δ/(1 + δ) − γ + o_n(1).

In particular, by averaging, there exists a deterministic constrained agreement strategy with no worse disagreement probability. That is,

    ∃ (f, g) :  err_{D_p}(f, g) ≤ 2δ/(1 + δ) − γ + o_n(1).    (2.2)

But from Lemma 2.2 we have that

    ∀ (f, g) :  err_{D_p}(f, g) ≥ 2(1 − p)/(2 − p) = 2δ′/(1 + δ′).    (2.3)

Putting Equations (2.2) and (2.3) together contradicts Equation (2.1) for sufficiently large n.
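
To see the concentration step in action, the following simulation sketch (ours; n and p below are assumed illustrative values) samples (A, B) ∼ D_p and evaluates the formula from Fact 2.3:

    import random

    def dtv_of_random_flat_pair(n, p, seed):
        # Sample (A, B) ~ D_p and apply Fact 2.3 to U(A) and U(B).
        rng = random.Random(seed)
        A = {i for i in range(n) if rng.random() < p}
        B = {i for i in range(n) if rng.random() < p}
        return 1 - len(A & B) / max(len(A), len(B))

    # With p = 0.9 (i.e., delta' = 0.1) and n large, values concentrate near 0.1.
    print([round(dtv_of_random_flat_pair(100_000, 0.9, s), 4) for s in range(5)])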


3    Correlated sampling over a fixed set of finite size
Theorem 1.3 establishes that the correlated sampling strategy underlying Theorem 1.2 is nearly optimal
for Ω that is sufficiently large in size. However, it does not say that the strategy underlying Theorem 1.2
is exactly optimal for a fixed set of finite size. The quest for understanding optimality in this setting was
motivated by a new encryption scheme proposed by Rivest [13]. But as we will see shortly, this quest is
not entirely straightforward!
    In order to elaborate on this, it will be useful to formally define restricted versions of the correlated
sampling strategy which are required to work only when the input pair (P, Q) is promised to lie in a given
relation G ⊆ ∆Ω × ∆Ω .

Definition 3.1. For a finite set Ω and a relation G ⊆ ∆Ω × ∆Ω, a G-restricted correlated sampling strategy with error ε is specified by a probability space R and a pair of functions f, g : ∆Ω × R → Ω such that the following hold for all pairs of distributions (P, Q) ∈ G.

    [Correctness]           {f(P, r)}_{r∼R} = P and {g(Q, r)}_{r∼R} = Q,
    [Error Guarantee]       Pr_{r∼R}[f(P, r) ≠ g(Q, r)] ≤ ε.

For example, letting G be the set of all pairs (P, Q) with dTV(P, Q) ≤ δ essentially recovers the original setting of correlated sampling, for a fixed bound δ on the total variation distance between the input distributions. For the rest of this section, we will consider a special kind of G-restriction corresponding to Alice and Bob having flat distributions over Ω = [n].

Definition 3.2. For all n, the relation G^n_{a,b,ℓ} ⊆ ∆[n] × ∆[n] is defined to consist of pairs (P, Q) of flat distributions corresponding to sets A, B ⊆ [n] such that P = U(A), Q = U(B) and |A| = a, |B| = b, |A ∩ B| = ℓ. (For the relation to be non-empty, it is required that ℓ ≤ min{a, b} and a + b − ℓ ≤ n.)

Recall from Fact 2.3 that for all (P, Q) ∈ G^n_{a,b,ℓ} with P = U(A) and Q = U(B), the total variation distance is given by

    dTV(P, Q) = 1 − |A ∩ B| / max{|A|, |B|} = 1 − ℓ / max{a, b}.

Moreover, the MinHash strategy applied on input pairs (P, Q) ∈ G^n_{a,b,ℓ} has a disagreement probability of

    1 − |A ∩ B| / |A ∪ B| = 1 − ℓ / (a + b − ℓ).

One might suspect that this is optimal for all values of n, a, b and `. But rather surprisingly, in the very
special case where |A ∩ B| = 1 and |A ∪ B| = n, Rivest [13] gave a strategy with smaller error probability
than the above! While we don’t know of any applications for this strategy itself, its purpose here is to
illustrate that there can be strategies which do better than the MinHash strategy in some special cases.

Theorem 3.3 ([13]). For all a, b ∈ Z≥1, there exists a G^{a+b−1}_{a,b,1}-restricted correlated sampling strategy with error at most 1 − 1/max{a, b}.


For completeness, we describe this strategy in Section 3.1. Note that for (P, Q) ∈ G^{a+b−1}_{a,b,1},

    dTV(P, Q) = 1 − 1/max{a, b} < 1 − 1/(a + b − 1).

This naturally leads to the question: Is there a better correlated sampling strategy for larger intersection
sizes? In fact, the MinHash strategy was conjectured to be optimal in every other case (in particular, for
all ` > 1) by Rivest [13] as this is sufficient for proving the security of his proposed encryption scheme.

Conjecture 3.4 (Rivest). For every collection of positive integers n ≥ a, b ≥ ℓ ≥ 2 such that n ≥ a + b − ℓ, any G^n_{a,b,ℓ}-restricted correlated sampling strategy makes error at least 1 − ℓ/(a + b − ℓ).

As partial progress towards this conjecture, we prove that in the other extreme (as compared to Theo-
rem 3.3), the above conjecture does hold. In particular, we show the following theorem in Section 3.2.

Theorem 3.5. For all a = b ≥ 1, ℓ = a − 1 and n ≥ a + 1, any G^n_{a,b,ℓ}-restricted correlated sampling strategy makes error at least 1 − ℓ/(a + b − ℓ).

3.1   Correlated sampling strategy of Rivest [13]
In order to prove Theorem 3.3, we recall Philip Hall’s “Marriage Theorem.”

Lemma 3.6 (P. Hall; see, e. g., [8]). Fix a bipartite graph G on vertex sets L and R (with |L| ≤ |R|). There
exists a matching that entirely covers L if and only if for every subset S ⊆ L, we have that |S| ≤ |NG (S)|,
where NG (S) denotes the set of neighbors in G of vertices in S.

Proof of Theorem 3.3. First, let us consider the case where a = b; here n = a + b − 1 = 2a − 1. Let ([n] choose a) denote the set of all subsets A ⊆ [n] with |A| = a. Consider the bipartite graph G on vertices ([n] choose a) × ([n] choose a), with an edge between vertices A and B if |A ∩ B| = 1. It is easy to see that G is a-regular: since n = 2a − 1, the neighbors of a set A are exactly the a sets ([n] \ A) ∪ {x} for x ∈ A. Iteratively using Lemma 3.6, we get that the edges of G can be written as a disjoint union of a matchings. Let us denote these as M_1, M_2, . . . , M_a.
    The G^{2a−1}_{a,a,1}-restricted correlated sampling strategy of Alice and Bob is as follows: Use the shared randomness to sample a random index r ∈ [a] and consider the matching M_r. If (A, B′) is the edge present in M_r, then Alice outputs the unique element in A ∩ B′. Similarly, if (A′, B) is the edge present in M_r, then Bob outputs the unique element in A′ ∩ B. This strategy is summarized in Algorithm 1.

      Algorithm 1: Rivest's strategy [13]
       Alice's input: A ⊆ [n]
       Bob's input: B ⊆ [n]
       G-restriction: |A| = |B| = a, |A ∩ B| = 1 and A ∪ B = [n], i. e., n = 2a − 1
       Preprocessing: Let G be the bipartite graph on vertices ([n] choose a) × ([n] choose a), with an
       edge between vertices A and B if |A ∩ B| = 1. Decompose the edges of G into a disjoint
       matchings M_1, . . . , M_a.
       Shared randomness: Index r ∼ U([a])
       Strategy:

           • Let (A, B′) and (A′, B) be the edges present in M_r containing A and B, respectively.
           • f(A, r) := unique element in A ∩ B′.
           • g(B, r) := unique element in A′ ∩ B.
    It is easy to see that both Alice's and Bob's outputs are uniformly distributed in A and B, respectively: as r ranges over [a], the matchings match A to each of its a neighbors exactly once, and the corresponding intersections A ∩ B′ range over the singletons {x} for x ∈ A. Moreover, the probability that they output the same element is exactly 1/a, which is the probability of choosing the unique matching M_r which contains the edge (A, B) (i. e., enforcing A′ = A and B′ = B).
    The strategy in the general case of a ≠ b is obtained by a simple reduction to the case above. Suppose w.l.o.g. that a > b. Alice and Bob get sets A ⊆ [n] and B ⊆ [n] such that |A| = a, |B| = b, |A ∩ B| = 1 and A ∪ B = [n]. We extend the universe by adding (a − b) dummy elements to get a universe of size (2a − 1) (note, n = a + b − 1). Moreover, whenever Bob gets the set B, he extends it to B′ by adding all the dummy elements to B; thus |B′| = a, while |A ∩ B′| = 1 and |A ∪ B′| = 2a − 1. Now, Alice and Bob can use the G^{2a−1}_{a,a,1}-restricted correlated sampling strategy from above on the input pair (A, B′). This achieves an error of 1 − 1/a = 1 − 1/max{a, b}. However, Bob's output is uniformly distributed over B′ and not B. To fix this, Bob can simply output a uniformly random element of B whenever the above strategy requires him to return an element of B′ \ B. It is easy to see that this doesn't change the error probability.
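
For small a, the preprocessing in Algorithm 1 can be carried out explicitly. The sketch below is an illustrative Python implementation of ours (not code from [13]); it builds the graph, peels off the a perfect matchings with a simple augmenting-path routine, and exhibits the 1/a agreement probability on one input pair:

    from itertools import combinations

    def matchings_for(a):
        # Vertices: all a-subsets of [n] with n = 2a - 1; A ~ B iff |A ∩ B| = 1.
        # Each A has exactly the a neighbors ([n] \ A) ∪ {x} for x in A, so the
        # graph is a-regular and decomposes into a perfect matchings (Hall).
        n = 2 * a - 1
        verts = [frozenset(c) for c in combinations(range(n), a)]
        idx = {v: i for i, v in enumerate(verts)}
        full = set(range(n))
        adj = [[idx[frozenset((full - v) | {x})] for x in sorted(v)] for v in verts]
        matchings = []
        for _ in range(a):
            match_r = {}                         # right vertex -> left vertex
            def augment(u, seen):
                for v in adj[u]:
                    if v not in seen:
                        seen.add(v)
                        if v not in match_r or augment(match_r[v], seen):
                            match_r[v] = u
                            return True
                return False
            for u in range(len(verts)):
                augment(u, set())
            M = {u: v for v, u in match_r.items()}   # left -> right, perfect
            matchings.append(M)
            for u, v in M.items():
                adj[u].remove(v)                 # drop used edges; stays regular
        return verts, idx, matchings

    def rivest_outputs(A, B, r, verts, idx, matchings):
        M = matchings[r]
        B_prime = verts[M[idx[frozenset(A)]]]              # Alice's edge (A, B')
        inv = {v: u for u, v in M.items()}
        A_prime = verts[inv[idx[frozenset(B)]]]            # Bob's edge (A', B)
        f = next(iter(frozenset(A) & B_prime))             # unique elt of A ∩ B'
        g = next(iter(A_prime & frozenset(B)))             # unique elt of A' ∩ B
        return f, g

    a = 3
    verts, idx, matchings = matchings_for(a)
    A, B = {0, 1, 2}, {2, 3, 4}                            # |A ∩ B| = 1, A ∪ B = [5]
    outs = [rivest_outputs(A, B, r, verts, idx, matchings) for r in range(a)]
    print(sum(f == g for f, g in outs), "agreement out of", a)   # exactly 1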

3.2    Proof of Theorem 3.5
Proof of Theorem 3.5. Let A, B ⊆ [n] be such that a = |A| = |B| = |A ∩ B| + 1 and let P = U(A) and Q = U(B). For simplicity, we can assume without loss of generality that A ∪ B = [n]. Thus, n = a + 1 and ℓ = a − 1. Assume for the sake of contradiction that there is a G^{a+1}_{a,a,a−1}-restricted correlated sampling strategy with disagreement probability < 1 − ℓ/(2a − ℓ) = 2/n. Let D be the uniform distribution over pairs (A, B) of subsets of [n] satisfying A ∪ B = [n] and |A| = |B| = |A ∩ B| + 1. Note that D is not a product distribution over (A, B), unlike in Lemma 2.2, which is what makes it challenging to analyze. By an averaging argument, there is a deterministic strategy pair (f, g) such that

    Pr_{(A,B)∼D}[f(A) ≠ g(B)] < 2/n.    (3.1)

Let

    i := argmax_{j∈[n]} |{A ∈ ([n] choose n−1) : f(A) = j}|    (3.2)

be the element that is most frequently output by Alice's strategy f, and denote its number of occurrences by

    k := |{A ∈ ([n] choose n−1) : f(A) = i}|.    (3.3)
We consider three different cases depending on the value of k:


    (i) If k ≤ n − 3, then consider a subset B ⊆ [n] with |B| = n − 1. Whatever the value of g(B) ∈ B, at most k ≤ n − 3 of the n − 1 sets A ≠ B satisfy f(A) = g(B), so the conditional error probability Pr[f(A) ≠ g(B) | B] is at least 2/(n − 1). Averaging over all such B, we get a contradiction to Equation (3.1).

    (ii) If k = n − 2, let A_1 ≠ A_2 be the two subsets of [n] with |A_1| = |A_2| = n − 1 such that f(A_1) ≠ i and f(A_2) ≠ i. For all B ⊆ [n] with |B| = n − 1 such that B ≠ A_1 and B ≠ A_2, the conditional error probability Pr[f(A) ≠ g(B) | B] is at least 2/(n − 1). Note that there are n − 2 such sets B, and that either A_1 or A_2 is the set [n] \ {i}. If B = [n] \ {i}, then the conditional disagreement probability Pr[f(A) ≠ g(B) | B] is at least (n − 2)/(n − 1). Averaging over all B, we get that

    Pr_{(A,B)∼D}[f(A) ≠ g(B)] ≥ (2/(n − 1)) · ((n − 2)/n) + ((n − 2)/(n − 1)) · (1/n) ≥ 2/n,

        where the last inequality holds for all n ≥ 4. This contradicts Equation (3.1).

 (iii) If k = n − 1, then the only subset A_1 of [n] with |A_1| = n − 1 and such that f(A_1) ≠ i is A_1 = [n] \ {i}. For all B ≠ A_1, the conditional error probability Pr[f(A) ≠ g(B) | B] is at least 1/(n − 1). On the other hand, if B = A_1, then the conditional error probability is equal to 1. Averaging over all B, we get that

    Pr_{(A,B)∼D}[f(A) ≠ g(B)] ≥ (1/(n − 1)) · ((n − 1)/n) + 1 · (1/n) = 2/n,

        which contradicts Equation (3.1).
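
The distributional statement (3.1) at the heart of this proof can also be checked exhaustively for the smallest interesting parameters. The following brute-force Python sketch (ours, for the assumed values n = 4, i. e., a = b = 3 and ℓ = 2) enumerates all deterministic strategy pairs:

    from itertools import combinations, product

    n = 4                                  # a = b = 3, l = 2; claimed bound is 2/n
    sets_ = [tuple(sorted(c)) for c in combinations(range(n), n - 1)]
    pairs = [(i, j) for i in range(len(sets_)) for j in range(len(sets_)) if i != j]

    best = 1.0
    for f in product(*sets_):              # f[i] is Alice's output on sets_[i]
        for g in product(*sets_):          # g[j] is Bob's output on sets_[j]
            err = sum(f[i] != g[j] for i, j in pairs) / len(pairs)
            best = min(best, err)
    print(best)                            # prints 0.5 = 2/n: no (f, g) does better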


Remark. In [7], the correlated sampling strategy is used to give a randomized rounding procedure for a
linear program. The factor 2 loss in the correlated sampling strategy translates into an integrality gap of at
most 2. In fact, they also prove that the integrality gap is roughly tight. As pointed out by an anonymous
reviewer, their proof essentially establishes Theorem 3.5 under the assumption that f = g.


4      Correlated sampling strategies of [2, 7, 6]
For the sake of completeness, we describe the correlated sampling strategies of Broder and of Kleinberg–Tardos and Holenstein, thereby proving Theorem 1.2.


Broder's MinHash Strategy. Consider the case of flat distributions, where the distributions P and
Q are promised to be of the following form: there exist A, B ⊆ [n] such that P = U(A) and Q = U(B).
In this case, it is easy to show that the strategy given in Algorithm 2 achieves an error probability of
1 − |A ∩ B|/|A ∪ B|. Since π is a random permutation, f (A, π) is uniformly distributed over A and g(B, π)
is uniformly distributed over B. Let i_0 be the smallest index such that π(i_0) ∈ A ∪ B. The probability that π(i_0) ∈ A ∩ B is exactly |A ∩ B|/|A ∪ B|, and this happens precisely when f(A, π) = g(B, π). Hence, we get the claimed error probability.
    The correlated sampling strategy of [7, 6] follows a similar approach.



      Algorithm 2: MinHash strategy [2]
       Alice's input: A ⊆ [n]
       Bob's input: B ⊆ [n]
       Shared randomness: a random permutation π : [n] → [n]
       Strategy:

           • f(A, π) = π(i_A), where i_A is the smallest index such that π(i_A) ∈ A.
           • g(B, π) = π(i_B), where i_B is the smallest index such that π(i_B) ∈ B.
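
The following Python sketch (ours, with an assumed encoding of sets as subsets of {0, . . . , n − 1}) implements Algorithm 2 and compares the empirical disagreement frequency with the Jaccard distance:

    import random

    def minhash_outputs(A, B, n, seed):
        # Shared randomness: a uniformly random permutation pi of {0, ..., n-1}.
        rng = random.Random(seed)
        pi = list(range(n))
        rng.shuffle(pi)
        f = next(x for x in pi if x in A)   # Alice outputs pi(i_A)
        g = next(x for x in pi if x in B)   # Bob outputs pi(i_B)
        return f, g

    A, B, n, trials = {0, 1, 2}, {1, 2, 3}, 6, 100_000
    dis = 0
    for s in range(trials):
        f, g = minhash_outputs(A, B, n, s)
        dis += (f != g)
    # Both numbers are close to the Jaccard distance 1 - |A∩B|/|A∪B| = 0.5.
    print(dis / trials, 1 - len(A & B) / len(A | B))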



Proof of Theorem 1.2. Given a finite set Ω and probability distributions P and Q over Ω, define

    A := {(ω, p) ∈ Ω × [0, 1] : p < P(ω)}    and    B := {(ω, q) ∈ Ω × [0, 1] : q < Q(ω)}.

Also, for all ω ∈ Ω, define A_ω := A ∩ ({ω} × [0, 1]) and B_ω := B ∩ ({ω} × [0, 1]).
     The strategy of [7, 6] can be intuitively understood as follows. Alice and Bob use the MinHash strategy on inputs A and B over Ω × [0, 1], to obtain elements (ω_A, p_A) and (ω_B, p_B), respectively, and simply output ω_A and ω_B, respectively. However, this by itself is not well defined since Ω × [0, 1] is not a finite set. Nevertheless, the MinHash strategy can be modified to instead use a (countably) infinite sequence of points sampled i. i. d. from the uniform distribution over Ω × [0, 1], instead of a permutation π. This strategy is summarized in Algorithm 3.

      Algorithm 3: The strategy of Kleinberg–Tardos and Holenstein [7, 6]
       Alice's input: P ∈ ∆Ω; let A := {(ω, p) ∈ Ω × [0, 1] : p < P(ω)}
       Bob's input: Q ∈ ∆Ω; let B := {(ω, q) ∈ Ω × [0, 1] : q < Q(ω)}
       Shared randomness: An infinite sequence π = ((ω_1, r_1), (ω_2, r_2), . . .) where each (ω_i, r_i)
       is sampled i. i. d. uniformly from Ω × [0, 1].
       Strategy:

           • f(P, π) := ω_{i_A}, where i_A is the smallest index such that (ω_{i_A}, r_{i_A}) ∈ A
           • g(Q, π) := ω_{i_B}, where i_B is the smallest index such that (ω_{i_B}, r_{i_B}) ∈ B

     Let µ be the uniform distribution over Ω × [0, 1]. Observe that µ(A) = µ(B) = 1/|Ω| and for all ω ∈ Ω, we have µ(A_ω) = P(ω)/|Ω| and µ(B_ω) = Q(ω)/|Ω|. Similarly to the analysis of the MinHash strategy, for Alice's chosen index i_A, the point (ω_{i_A}, r_{i_A}) is uniform over A. Thus, Pr[f(P, π) = ω] is precisely µ(A_ω)/µ(A) = P(ω), so f(P, π) is distributed according to P and similarly, g(Q, π) is distributed according to Q. Finally, Pr[f(P, π) = g(Q, π)] ≥ Pr[i_A = i_B]. To bound this probability, note that µ(A ∩ B) = (1 − δ)/|Ω| and µ(A ∪ B) = (1 + δ)/|Ω|, where δ = dTV(P, Q). Hence,

    Pr[f(P, π) = g(Q, π)] ≥ Pr[i_A = i_B] = µ(A ∩ B)/µ(A ∪ B) = (1 − δ)/(1 + δ) = 1 − 2δ/(1 + δ).

We can ignore the possibility that no index i_A exists satisfying (ω_{i_A}, r_{i_A}) ∈ A (similarly for B), since this happens with probability 0.
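
A minimal Python sketch of Algorithm 3 (ours; the example distributions below are assumptions, not from the paper). Each player consumes an identical copy of the shared i. i. d. stream, so the streams stay aligned even if one player accepts earlier:

    import random

    def kt_holenstein(P, rng):
        # P: dict over a finite Omega; both players must agree on the ordering.
        omega = sorted(P)
        while True:
            w = omega[rng.randrange(len(omega))]   # omega_i uniform over Omega
            r = rng.random()                       # r_i uniform over [0, 1]
            if r < P[w]:                           # first point inside the region
                return w

    P = {'x': 0.50, 'y': 0.50}
    Q = {'x': 0.25, 'y': 0.75}                     # dTV(P, Q) = 0.25
    agree = sum(kt_holenstein(P, random.Random(s)) == kt_holenstein(Q, random.Random(s))
                for s in range(100_000))
    print(agree / 100_000)    # at least (1 - 0.25)/(1 + 0.25) = 0.6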


5     Discussion and open questions
An immediate open question is to resolve Conjecture 3.4. We reflect on some further open questions that
are raised by the results discussed in this paper.

5.1    Case of negatively correlated sets
In the context of Conjecture 3.4, even in the setting where the set sizes are allowed to vary slightly,
our knowledge is somewhat incomplete. Lemma 2.2 shows optimality of the MinHash strategy when
(A, B) ∼ D_p. In this case, A and B are independent and each of them is p-biased, so |A| ≈ p·n, |B| ≈ p·n and |A ∩ B| ≈ p²·n. A simple reduction to Lemma 2.2 also implies the optimality of the MinHash strategy in the case where A and B are positively correlated. Specifically, for parameters α > p, consider the following distribution D_{p,α} on pairs (A, B) of subsets of [n], where we first sample S ⊆ [n] by independently including each element of [n] with probability p/α, and then independently including every i ∈ S in each of A and B with probability α. In this case, |A| ≈ p·n, |B| ≈ p·n and |A ∩ B| ≈ αp·n > p²·n. Even if we reveal S to both Alice and Bob, Lemma 2.2 implies a lower bound of 2δ/(1 + δ) on the error probability (for large enough n). It is unclear if the optimality holds even in the case where A and B are negatively correlated, i. e., when |A| ≈ p·n, |B| ≈ p·n and |A ∩ B| ≪ p²·n.

5.2    Fine-grained understanding of G-restricted correlated sampling
As alluded to in the Introduction, in the setting where P and Q are flat distributions on subsets of Ω of different sizes, there is a strategy with lower error than provided in Theorem 1.2. In particular, for P = U(A) and Q = U(B) where |A| ≠ |B|, the MinHash strategy gives an error probability of

    1 − |A ∩ B| / |A ∪ B|    (5.1)

(which is the Jaccard distance between A and B). However, naïvely using the strategy of Kleinberg–Tardos and Holenstein would give an error probability of

    1 − |A ∩ B| / (|A ∪ B| + ||A| − |B||),    (5.2)

which is higher than the Jaccard distance when |A| ≠ |B|. This implies that the strategy of Kleinberg–Tardos and Holenstein is not always optimal. Thus, it will be interesting to identify the right measure that captures the minimum error of a general G-restricted correlated sampling strategy.

5.3    Correlated sampling for infinite spaces

While this paper dealt with correlated sampling for finite sets Ω, it might also be interesting to study it for infinite sets. This needs to be defined carefully in a measure-theoretic sense, which could be done as follows. Consider a measure space (Ω, F, µ), where Ω is the sample space, F is a σ-algebra over Ω and µ is a finite measure on (Ω, F). Let ∆_{(Ω,F,µ)} be the set of all probability measures over (Ω, F) that are absolutely continuous with respect to µ. The respective inputs of Alice and Bob are probability measures P and Q in ∆_{(Ω,F,µ)}. A correlated sampling strategy for (Ω, F, µ) is given by a pair of functions f, g : ∆_{(Ω,F,µ)} × R → Ω, where f and g are required to be measurable in their second argument r ∈ R. In order to define the error guarantee in terms of Pr_{r∼R}[f(P, r) ≠ g(Q, r)], however, we require that the event {(ω, ω) : ω ∈ Ω} be measurable in (Ω × Ω, F ⊗ F). This is true, for example, when Ω is a separable metric space equipped with the standard Borel algebra (see Chapters 3, 4 in [14]). We will assume this to be the case in the discussion henceforth and it might be useful to keep in mind a concrete example such as Ω = [0, 1], equipped with the Lebesgue measure.
   To the best of our knowledge, it remains open whether there exists a correlated sampling strategy for
general measure spaces (Ω, F, µ) with any non-trivial error bound, that is, to get ε(δ ) < 1 for all δ < 1.
This is in sharp contrast to coupling, where any two probability measures P and Q with dTV (P, Q) = δ
over (Ω, F) can be coupled with a disagreement probability of at most δ .
     Suppose the inputs P and Q are promised to be such that the corresponding Radon–Nikodym
derivatives (a. k. a. densities) dP/dµ and dQ/dµ are bounded everywhere by a known constant c. Then
it is possible to generalize the strategy of Kleinberg–Tardos and Holenstein (Algorithm 3) and get the
same error guarantee as in Theorem 1.2; this can be done by using µ instead of the uniform measure on
Ω and replacing [0, 1] by [0, c].
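
Here is a Python sketch of this generalization for the concrete case Ω = [0, 1] with the Lebesgue measure (illustrative; the two densities below are our assumptions, chosen to satisfy the promised bound c = 2):

    import random

    def first_point_under(pdf, c, rng):
        # Scan shared i.i.d. points (w_i, r_i) uniform in [0, 1] x [0, c] and
        # output the first one lying strictly under the density pdf.
        while True:
            w, r = rng.random(), c * rng.random()
            if r < pdf(w):
                return w

    # Assumed example: P has density 2w, Q is uniform; dTV(P, Q) = 1/4.
    trials = 100_000
    agree = sum(first_point_under(lambda w: 2 * w, 2.0, random.Random(s)) ==
                first_point_under(lambda w: 1.0, 2.0, random.Random(s))
                for s in range(trials))
    print(agree / trials)     # at least (1 - 1/4)/(1 + 1/4) = 0.6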
    However, the problem gets challenging if there is no promised upper bound on the Radon–Nikodym derivatives. One explanation for why this challenge is not faced in obtaining a coupling is that, knowing both P and Q, we can always take µ′ = (P + Q)/2 as a measure with respect to which both P and Q are absolutely continuous and, more strongly, the Radon–Nikodym derivatives dP/dµ′ and dQ/dµ′ are never greater than 2. On the other hand, for correlated sampling, the players do not have access to such a common µ′.
     It might also be interesting to study a generalized notion of error in correlated sampling strategies where we wish to minimize E_{r∼R}[d(f(P, r), g(Q, r))] for some metric d : Ω × Ω → R≥0 over Ω. The error guarantee studied in this paper corresponds to the discrete metric d(x, y) = 1{x ≠ y}. For Ω ⊆ R, such as Ω = [0, 1], we might alternatively want to consider d(x, y) = |x − y|. Since a correlated sampling strategy induces a coupling, this notion of error can never be lower than the Wasserstein distance W_1(P, Q) (also known as Earth-Mover distance) between the distributions P and Q. To the best of our knowledge, it remains open in this setting whether correlated sampling strategies can get an error that is never larger than some function of W_1(P, Q).




Acknowledgements

We thank the anonymous ToC reviewers for their feedback, which significantly helped improve the presentation of this paper. We are also grateful to Laci Babai for detailed comments that helped reorganize the paper and brought more clarity to the discussion about correlated sampling for infinite spaces.


References
 [1] Boaz Barak, Moritz Hardt, Ishay Haviv, Anup Rao, Oded Regev, and David Steurer: Rounding
     parallel repetitions of unique games. In Proc. 49th FOCS, pp. 374–383. IEEE Comp. Soc., 2008.
     [doi:10.1109/FOCS.2008.55]

 [2] Andrei Z. Broder: On the resemblance and containment of documents. In Proc. Compression
     and Complexity of Sequences (SEQUENCES'97), pp. 21–29. IEEE Comp. Soc., 1997.
     [doi:10.1109/SEQUEN.1997.666900]

 [3] Moses S. Charikar: Similarity estimation techniques from rounding algorithms. In Proc. 34th
     STOC, pp. 380–388. ACM Press, 2002. [doi:10.1145/509907.509965]

 [4] Sreenivas Gollapudi and Rina Panigrahy: A dictionary for approximate string search and
     longest prefix search. In 15th Internat. Conf. on Information and Knowledge Management
     (CIKM'06), pp. 768–775. ACM Press, 2006. [doi:10.1145/1183614.1183723]

 [5] Bernhard Haeupler, Mark Manasse, and Kunal Talwar: Consistent weighted sampling made
     fast, small, and easy, 2014. [arXiv:1410.4266]

 [6] Thomas Holenstein: Parallel repetition: simplifications and the no-signaling case. Theory of
     Computing, 5(8):141–172, 2009. Preliminary version in STOC'07. [doi:10.4086/toc.2009.v005a008]

 [7] Jon Kleinberg and Éva Tardos: Approximation algorithms for classification problems with
     pairwise relationships: Metric labeling and Markov random fields. J. ACM, 49(5):616–639, 2002.
     Preliminary version in FOCS'99. [doi:10.1145/585265.585268]

 [8] Jacobus Hendricus van Lint and Richard Michael Wilson: A Course in Combinatorics.
     Cambridge Univ. Press, 2001. [doi:10.1017/CBO9780511987045]

 [9] Mark Manasse, Frank McSherry, and Kunal Talwar: Consistent weighted sampling. Technical
     Report MSR-TR 2010-73, 2010.

[10] Udi Manber: Finding similar files in a large file system. In USENIX Winter Tech. Conf. (WTEC'94),
     pp. 1–10. USENIX Assoc., 1994. Link at ACM DL. Available as U. Arizona CS TR93-33.

[11] Michael Mitzenmacher and Eli Upfal: Probability and Computing: Randomized Algorithms and
     Probabilistic Analysis. Cambridge Univ. Press, 2005.

[12] Anup Rao: Parallel repetition in projection games and a concentration bound. SIAM J. Comput.,
     40(6):1871–1891, 2011. Preliminary version in STOC'08. [doi:10.1137/080734042]

[13] Ronald L. Rivest: Symmetric Encryption via Keyrings and ECC, 2016. Slide presentation,
     available on author's home page.

[14] Hermann Thorisson: Coupling, Stationarity, and Regeneration. Springer, 2000.



AUTHORS

   Mohammad Bavarian [about the author]
   Software Engineer
   Rubrik Inc.
   Palo Alto, CA, USA
   mobavarian gmail com
   https://bavarian.dev/


   Badih Ghazi [about the author]
   Research Scientist
   Google Research
   Mountain View, CA, USA
   badihghazi gmail com
   https://sites.google.com/view/badihghazi/home


   Elad Haramaty [about the author]
   Research Scientist
   Amazon Inc.
   Tel Aviv, Israel
   seladh gmail com


   Pritish Kamath [about the author]
   Postdoctoral Scholar
   Toyota Technological Institute at Chicago
   Chicago, IL, USA
   pritish alum mit edu
   https://pritishkamath.github.io/


   Ronald L. Rivest [about the author]
   Institute Professor
   Massachusetts Institute of Technology
   Cambridge, MA, USA
   rivest mit edu
   https://people.csail.mit.edu/rivest/





    Madhu Sudan [about the author]
    Gordon McKay Professor of Computer Science
    Harvard John A. Paulson School of Engineering and Applied Sciences
    Cambridge, MA, USA
    madhu cs harvard edu
    http://madhu.seas.harvard.edu/


ABOUT THE AUTHORS

    M OHAMMAD BAVARIAN obtained his Ph. D. under Prof. Madhu Sudan from MIT in 2017
       working on quantum computing and complexity theory. He now works as a software
       developer at Rubrik Inc. and spends his free time thinking about startups, business, and
        engineering. While not busy with the above, he also dabbles in programming competitions
       and Machine Learning. He enjoys the beautiful palm trees and sunshine of California,
       but still sometimes misses the city life of the East Coast.


    BADIH G HAZI is a Research Scientist in the Algorithms team at Google. His current
      research interests include algorithmic aspects of machine learning, differential privacy,
      error-correcting codes and communication under uncertainty. He completed his Ph. D. in
      2018 at the Electrical Engineering and Computer Science (EECS) department at MIT
      where he was advised by Professors Madhu Sudan and Ronitt Rubinfeld. Previously, he
      got his M.S. in EECS also from MIT, and his B.Eng. in Computer and Communications
      Engineering from the American University of Beirut. During his graduate studies, he
      was awarded an IBM Ph. D. Fellowship and an MIT Irwin and Joan Jacobs Presidential
      Fellowship.


    E LAD H ARAMATY is a Research Scientist at Amazon Israel. Previously, he held postdoctoral
       researcher positions at Harvard University (hosted by Madhu Sudan) and at Northeastern
        University (hosted by Emanuele Viola). He earned his Ph. D. from the Technion – Israel
        Institute of Technology under the guidance of Amir Shpilka. His research interests
       lie in a broad range of areas of theoretical computer science, especially in algebraic
       complexity. In his work he has studied mostly the structure, testability and applications
       of polynomials and algebraic codes. He nowadays mostly works on various aspects of
       Search and Machine Learning.





P RITISH K AMATH is a postdoctoral scholar at the Toyota Technological Institute at Chicago.
    He completed his Ph. D. at MIT, co-advised by Professors Madhu Sudan and Ronitt
    Rubinfeld. Prior to that, he completed a B. Tech. in Computer Science at IIT Bombay
    in 2012 after which he was a Research Fellow at Microsoft Research India until 2013,
    where he was mentored by Neeraj Kayal. He has broad interests in complexity theory
    and has worked in areas touching upon communication complexity, information theory,
    Boolean and algebraic circuit complexity and proof complexity and most recently is also
    interested in foundational aspects of machine learning. He likes to juggle multiple things
    in life; sometimes on a bicycle.


RONALD L. R IVEST is an Institute Professor in the Department of Electrical Engineering
  and Computer Science at MIT. He is a member of MIT’s Computer Science and Artificial
  Intelligence Laboratory (CSAIL), a member of the lab’s Theory of Computation Group
  and is a leader of its Cryptography and Information Security Group.
   Ron has current research interests in cryptography, computer and network security,
   voting systems, and algorithms. In the past he has also worked extensively in the area of
   machine learning.
   Ron is a coauthor (with Thomas Cormen, Charles Leiserson, and Clifford Stein) of the
   well-known text Introduction to Algorithms, published by MIT Press. Over 500,000
   copies of this text have been sold. It has been translated into 12 languages.
   Ron is an inventor of the RSA public-key cryptosystem. He has extensive experience
   in cryptographic design and cryptanalysis, and has published numerous papers in these
    areas. He is a founder of RSA Data Security (RSA was bought by Security Dynamics;
    the combined company was renamed RSA Security, and later purchased by EMC),
    and is also a co-founder of Verisign and of Peppercoin.
   Ron is a member of the CalTech/MIT Voting Technology Project. He served 2004–2009
   on the Technical Guidelines Development Committee (TGDC), advisory to the Election
   Assistance Commission, developing recommendations for voting system certification
   standards; he was chair of the TGDC’s Computer Security and Transparency Subcommit-
   tee. He also serves on the Board of the Verified Voting Foundation. He is a member of a
   Scantegrity team developing and testing voting systems that are verifiable “end-to-end.”
   He has worked extensively on statistical post-election tabulation audits, of both the
   “risk-limiting audit” and “Bayesian” flavors.
   Ron is a member of the Center for Science of Information.





M ADHU S UDAN is a Gordon McKay Professor in the John A. Paulson School of Engineering
   and Applied Sciences at Harvard University, where he has been since 2015. Madhu got
   his Bachelors degree (apparently in Technology) from IIT Delhi in 1987 where he was
   introduced to theoretical computer science by Sachin Maheshwari. He got his Ph. D. from
   U. C. Berkeley in 1992 under the supervision of Umesh Vazirani. Madhu subsequently
   spent time at, and was even paid by, IBM, MIT and Microsoft.
   While Madhu’s research explores communication and computational complexity, he
   prefers simplicity, and is especially proud of his exposition (with Peter Gemmell) on
    the Berlekamp–Welch decoding algorithm and his exposition (with David Xiang) of the
   analysis of the Lempel-Ziv compression algorithm.



