Authors: Moses Charikar, Robert Krauthgamer
License: CC-BY-ND-2.0
Theory of Computing, Volume 2 (2006), pp. 207–224
http://theoryofcomputing.org

Embedding the Ulam metric into ℓ1

Moses Charikar∗   Robert Krauthgamer

Received: December 8, 2005; revised: September 18, 2006; published: September 29, 2006.
Abstract: Edit distance is a fundamental measure of distance between strings, the extensive study of which has recently focused on computational problems such as nearest neighbor search, sketching and fast approximation. A very powerful paradigm is to map the metric space induced by the edit distance into a normed space (e. g., ℓ1) with small distortion, and then use the rich algorithmic toolkit known for normed spaces. Although the minimum distortion required to embed edit distance into ℓ1 has received a lot of attention lately, there is a large gap between known upper and lower bounds. We make progress on this question by considering large, well-structured submetrics of the edit distance metric space.
Our main technical result is that the Ulam metric, namely, the edit distance on permutations of length at most n, embeds into ℓ1 with distortion O(log n). This immediately leads to sketching algorithms with constant size sketches, and to efficient approximate nearest neighbor search algorithms, with approximation factor O(log n). The embedding and its algorithmic consequences present a big improvement over those previously known for the Ulam metric, and they are significantly better than the state of the art for edit distance in general. Further, we extend these results for the Ulam metric to edit distance on strings that are (locally) non-repetitive, i. e., strings where (close by) substrings are distinct.
ACM Classification: F.2.2, G.2.1, G.3
AMS Classification: 68P05, 68W20, 68W25
Key words and phrases: edit distance, metric embedding, Ulam metric, low distortion, sketching,
permutation edit distance
∗ Supported by NSF ITR grant CCR-0205594, DOE Early Career Principal Investigator award DE-FG02-02ER25540, NSF
CAREER award CCR-0237113, MSPA-MCS award 0528414, and an Alfred P. Sloan Fellowship
Authors retain copyright to their work and grant Theory of Computing unlimited rights
to publish the work electronically and in hard copy. Use of the work is permitted as
long as the author(s) and the journal are properly acknowledged. For the detailed
copyright statement, see http://theoryofcomputing.org/copyright.html.
© 2006 Moses Charikar and Robert Krauthgamer    DOI: 10.4086/toc.2006.v002a011
1 Introduction
The edit distance (aka Levenshtein distance) between two strings is the least number of character inser-
tions, deletions or substitutions required to transform one string to the other. The edit distance arises
naturally in several application areas, often involving large amounts of data, ranging from a moderate
number of extremely long strings (as in computational biology) to a large number of moderately long
strings (as in text processing and web search). Therefore, efficient algorithms for edit distance, even
with modest approximation guarantees, are highly desirable.
The edit distance is a fundamental measure of (dis)similarity, because it is a very simple model that
exhibits nontrivial alignment (i. e., a single elementary modification may cause numerous characters to
change their position in the string). Popular metrics such as the Hamming distance do not adequately
capture this phenomenon, and thus for data analysis purposes, edit distance often offers a much better
model (up to minor variations such as weighting).
Let ED(x, y) denote the edit distance between strings x and y. Fixing an alphabet Σ, the edit distance function ED(·, ·) defines a metric space, called the edit distance metric, whose point set contains all the strings over Σ. Every collection X of strings over this alphabet (e. g., X = {0, 1}^n) can therefore be associated with the submetric (X, ED). A significant obstacle in dealing with the edit distance metric is that it lacks many of the useful properties of normed spaces. We restrict our attention to collections X of strings that have limited repetitions. While these collections of strings are rather large and give rise to submetrics (X, ED) of a rich structure, we will be able to obtain results that are significantly better than those known for the general case X = Σ^n (or even {0, 1}^n). Our main focus is the Ulam metric, which
we define next.
The Ulam Metric Following [7, 5], a string (over the alphabet Σ) is called a permutation if its charac-
ters are all distinct. (This notion, sometimes called a variation, extends the usual notion of permutation
to cases where |Σ| > n, but this slightly more general setting is more convenient for our purposes and
potentially more useful in applications.) Throughout this paper, the Ulam metric of dimension n (called
permutation edit distance in [7]) is the metric space (Pn , ED), where Pn contains all permutations of
length n over Σ.
Remark: The standard definition of the Ulam metric for permutations (see e. g. [1]) uses the distance
function UL(x, y), defined as the least number of character moves needed to transform x into y. This dis-
tance function is obviously limited to the case n = |Σ| (i. e. to the usual notion of permutations), while it
is easily seen to be nearly equivalent to the edit distance, namely UL(x, y) ≤ ED(x, y) ≤ 2 UL(x, y). Thus,
our non-standard definition of the Ulam metric to be (Pn , ED) is merely a slight abuse of terminology to
gain additional generality.
Embeddings An embedding of a metric space (X, dX ) into a target metric space (Y, dY ) is a mapping
f : X → Y . The distortion of the embedding f is defined as the smallest K ≥ 1 for which there is (a
scaling factor) s > 0 such that
dX (x1 , x2 ) ≤ s · dY ( f (x1 ), f (x2 )) ≤ K · dX (x1 , x2 ) for all x1 , x2 ∈ X .
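For finite metric spaces this definition can be evaluated by brute force. The following small helper is our own illustration, not part of the paper; it computes the distortion of a map between two finite metric spaces directly from the definition:

```python
def distortion(points, dX, dY, f):
    # Smallest K for which some scaling s > 0 gives
    #   dX(x1, x2) <= s * dY(f(x1), f(x2)) <= K * dX(x1, x2)  for all pairs.
    ratios = [dY(f(x1), f(x2)) / dX(x1, x2)
              for i, x1 in enumerate(points) for x2 in points[i + 1:]]
    # The optimal scaling s pushes the smallest ratio up to 1,
    # so K equals the ratio of the extreme values.
    return max(ratios) / min(ratios)

line = lambda a, b: abs(a - b)
# Scaling a line metric by 3 is an isometry up to scaling: distortion 1.
assert distortion([0.0, 1.0, 2.5], line, line, lambda x: 3 * x) == 1.0
```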
Low-distortion embeddings are a very powerful paradigm for reducing a host of problems (such as Nearest Neighbor Search, see Section 3) from a given metric space to a more structured or computationally easier metric space, at the cost of a small loss in the approximation guarantee. It is easy to see that the Ulam metric contains the ⌊n/2⌋-dimensional hypercube as a submetric (up to scaling),¹ and hence, by Enflo's Theorem [9], embedding the Ulam metric into ℓ2 requires distortion Ω(√n). But an ℓ1 embedding is usually sufficient for algorithmic applications such as sketching algorithms and Nearest Neighbor Search (see Section 3.3 for definitions), which raises the question of embedding the Ulam metric, and more generally the edit distance metric, into ℓ1.
There is a very large gap between the known upper and lower bounds on the distortion required to embed the edit distance metric (Σ^n, ED) into ℓ1. Ostrovsky and Rabani [20] showed an upper bound of 2^{O(√(log n log log n))} for embedding edit distance into ℓ1. Very recently, Khot and Naor [16] obtained a lower bound of Ω(√(log n / log log n)), and this was further improved to Ω(log n) by Krauthgamer and Rabani [17]. For the Ulam metric, the same upper bound 2^{O(√(log n log log n))} clearly holds and is still the state of the art. On the other hand, Cormode [5, page 60] shows a lower bound of 4/3, using a 5-point metric that is isomorphic to the K2,3 graph (up to scaling). Our embedding results make progress towards resolving these intriguing questions by proving an exponentially smaller upper bound for the Ulam metric, and extending it to several related edit distance submetrics.
1.1 Results
Our main result, appearing in Section 2, is that the n-dimensional Ulam metric embeds into ℓ1 with distortion O(log n). The previously known upper bound is 2^{O(√(log n log log n))},² using the embedding of [20] for edit distance in general. Our embedding is surprisingly simple and easy to describe. In fact, this is the complete description: we have a coordinate for every pair of symbols a, b ∈ Σ, and the value of this coordinate in the embedding of a string is simply the inverse of the distance between a and b in the string (or 0 if one of a, b does not occur).
Techniques Our methodology is inspired by the work of Cormode, Muthukrishnan and Sahinalp [7], who designed a mapping from the Ulam metric to Set-Intersection.³ However, their mapping is not an embedding into ℓ1 (in fact, Set-Intersection is not even a metric space) and it does not yield a sketching algorithm for the Ulam metric. The main difficulty in the analysis of our embedding is to prove that it
is not too contractive. To this end, we build on the framework of [7], in which a common subsequence
for two permutations is constructed recursively by partitioning each permutation into two subsequences.
The main novelty in our analysis is that we replace the deterministic recursive partitioning of [7] with a
carefully-crafted stochastic partitioning, resulting in a suitable averaging over all symbol pairs.
¹ Consider e. g. the permutations P for which {P(2i − 1), P(2i)} = {2i − 1, 2i} for all i = 1, . . . , ⌊n/2⌋.
² This metric space contains more than 2^n points, and thus our distortion bound beats by far the one that follows from Bourgain's embedding theorem [4] for general finite metrics. Note that in the nearest neighbor search setting, one needs to embed not only the data set S into ℓ1, but also the (yet unknown) query point.
³ Namely, every permutation is mapped to a subset of some fixed ground set U, such that the edit distance between two permutations is approximately the size of the intersection between the two respective subsets.
Applications The Ulam metric is interesting in its own right, e. g. when considering the Ulam metric as a measure of (dis)similarity between rankings for, say, aggregation purposes. But it is not clear a
priori that results on the Ulam metric would lend themselves to broader classes of strings; bit strings,
for instance, cannot have distinct characters unless n ≤ 2. Nevertheless, we show in Section 3 several
applications of the above embedding result to edit distance on more general strings.
Let us mention one generalization of the above embedding here, leaving further discussion to Sec-
tion 3. Following [2], we say that a string is t-non-repetitive, if all its t-substrings are distinct. For
example, a random bit string of length n is, with high probability, 2 log n-non-repetitive. We can easily
extend the above embedding to t-non-repetitive strings with a 2t factor loss in the distortion. For in-
stance, this embedding is applicable, with high probability, for two random, but correlated, bit strings
(e. g., x is chosen uniformly at random and y is derived from it by deleting certain positions). Such
scenarios often arise in computational biology contexts due to background distributions.
All our embeddings (e. g. the one specified above) are efficiently computable, and are thus readily
applicable to computational problems. For instance, they immediately yield sketching algorithms and
Nearest Neighbor Search schemes, as stated in Section 3. The algorithms that we obtain for restricted
families of strings all have significantly better approximation guarantees than the state of the art for
edit distance in general [14, 2, 20]. For one thing, restriction to strings with limited repetitions may be
reasonable in many specific scenarios, and may serve as a rigorous starting point for domain-specific
heuristics.
In addition, our results make partial progress on the general case; they identify algorithmic tools that
are provably useful (in certain cases), and they pinpoint some difficult aspects (of the general case). The
recent embedding result of Ostrovsky and Rabani [20] relies on a recursive construction. Our results on
the Ulam metric and its extensions suggest that it may be possible to achieve a polylogarithmic distortion
for embedding general edit distance. However, achieving this bound may require going beyond recursive
constructions and using a “magical” direct embedding along the lines of the one we construct.
1.2 Related work
Cormode, Muthukrishnan and Sahinalp [7] were the first to suggest embeddings of various distance
functions on permutations. For reversal distance and transposition distance they designed embeddings
into Hamming space with constant distortion. However, as mentioned earlier, for edit distance (on
permutations) they designed a mapping into Set-Intersection, which cannot be embedded into ℓ1 or yield a sketching algorithm. They also use this mapping into Set-Intersection to obtain a fast approximate string matching algorithm for permutations (under edit distance).
Cormode and Muthukrishnan [6] show that a variant of edit distance called the block edit distance, where a block of characters can be moved in a single edit operation, embeds into ℓ1 with distortion O(log n log∗ n). See also [8, 19] for embeddings of similar distance functions on strings.
Batu et al. [3] developed a sub-linear time algorithm that runs in O(n^{max(α/2, 2α−1)}) time and solves the O(n^α) vs. Ω(n) edit distance gap problem. Their algorithm can be cast as a sketching algorithm, although it would use a sketch whose size is far more than constant.
1.3 Notation
As usual, define [k] = {1, . . . , k}. A string is a sequence of characters (i. e., symbols) taken from an alphabet Σ. The jth character in a string s is denoted by s[j]. We write a ∈ s to denote that a character a ∈ Σ appears in the string s, and a ∉ s otherwise. A t-substring (aka t-gram) of a string s is a string consisting of t consecutive characters in s, e. g. s[i], s[i + 1], . . . , s[i + t − 1]. In contrast, a subsequence of s need not be contiguous in s.
A permutation is a string P whose characters are all distinct. For permutations, it will sometimes be more convenient to work with P^{−1}, i. e., P^{−1}(a) is the position at which a ∈ Σ appears in the string P (if at all).
Longest common subsequence and edit distance For two strings x, y, let LCS(x, y) be the length
of the longest common subsequence of x and y (i. e., the maximum length of a string z that is both a
subsequence of x and a subsequence of y). It is well-known and easy to verify that for every two strings
(and in particular permutations) x, y of length n, we have n − LCS(x, y) ≤ ED(x, y) ≤ 2(n − LCS(x, y)).
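Both quantities are classical dynamic programs, and the stated sandwich inequality can be checked directly. A sketch (function names are ours, not from the paper):

```python
def edit_distance(x, y):
    # Standard Levenshtein DP (insertions, deletions, substitutions), O(|x||y|).
    d = list(range(len(y) + 1))
    for i in range(1, len(x) + 1):
        prev, d[0] = d[0], i
        for j in range(1, len(y) + 1):
            cur = d[j]
            d[j] = min(d[j] + 1, d[j - 1] + 1, prev + (x[i - 1] != y[j - 1]))
            prev = cur
    return d[len(y)]

def lcs_length(x, y):
    # Longest-common-subsequence DP, kept to a single row.
    row = [0] * (len(y) + 1)
    for a in x:
        prev = 0
        for j in range(1, len(y) + 1):
            cur = row[j]
            row[j] = prev + 1 if a == y[j - 1] else max(row[j], row[j - 1])
            prev = cur
    return row[len(y)]

x, y = "badcfe", "abcdef"
slack = len(x) - lcs_length(x, y)
assert slack <= edit_distance(x, y) <= 2 * slack
```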
2 Embedding the Ulam metric
In this section we present a low-distortion embedding of the Ulam metric (i. e., the edit distance metric
on permutations) into ℓ1.
Theorem 2.1. For every n, the Ulam metric of dimension n can be embedded into ℓ1 of dimension O(|Σ|^2) with distortion O(log n).
Fix an integer n; we may assume that n is a power of 2, e. g., by padding all strings using additional
characters. Let m = |Σ|, and assume without loss of generality that Σ = {1, . . . , m}.
Define an embedding f : P_n → ℓ1 of dimension (m choose 2) as follows. First, associate every coordinate of the target space with a distinct pair {a, b} where a, b ∈ Σ and a ≠ b. Now for every permutation P ∈ P_n, the coordinates of f(P) are given by

    f(P)_{a,b} := 1/(P^{−1}(b) − P^{−1}(a))   if a, b ∈ P, a < b;
                  0                           otherwise (i. e., a ∉ P or b ∉ P).        (2.1)
The proof of Theorem 2.1 is completed below in Lemma 2.2 and Lemma 2.3, which analyze the expan-
sion and contraction of this embedding f , respectively.
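Definition (2.1) translates directly into code. The following sketch (helper names are ours) represents f(P) sparsely, as a map from unordered pairs to nonzero coordinates, and evaluates the ℓ1 distance between two embedded permutations:

```python
def embed(P):
    # f(P)_{a,b} = 1/(P^-1(b) - P^-1(a)) for every pair a < b occurring in P.
    pos = {c: i for i, c in enumerate(P)}        # pos plays the role of P^-1
    return {(a, b): 1.0 / (pos[b] - pos[a])
            for a in pos for b in pos if a < b}

def l1_distance(fP, fQ):
    keys = fP.keys() | fQ.keys()                 # a pair absent from one side counts as 0
    return sum(abs(fP.get(k, 0.0) - fQ.get(k, 0.0)) for k in keys)

P = (1, 2, 3, 4)                                 # the identity permutation
Q = (2, 1, 3, 4)                                 # one adjacent transposition, ED(P, Q) = 2
# The six coordinate differences are 2, 1/2, 1/2, 1/6, 1/6 and 0, summing to 10/3.
assert abs(l1_distance(embed(P), embed(Q)) - 10 / 3) < 1e-9
```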
Lemma 2.2 (Expansion). Let P and Q be permutations of length n. Then

    ‖f(P) − f(Q)‖1 ≤ O(log n) · ED(P, Q) .
Proof. Extend f to permutations of length at most n by using the same definition (2.1). Observe now that it suffices to prove the claimed inequality ‖f(P) − f(Q)‖1 ≤ O(log n) · ED(P, Q) for the case where ED(P, Q) = 1, the length of P is n and the length of Q is n − 1. Indeed, the general case then follows by
the triangle inequality in `1 . (In the process, we simulate a character substitution by a deletion followed
by an insertion, which increases the number of character operations at most by a factor of 2.)
Suppose then that Q is obtained from P by deleting some character P[s]. Thus, we have (i) Q[i] = P[i] for all i < s and (ii) Q[i] = P[i + 1] for all i ≥ s. Clearly,

    ‖f(P) − f(Q)‖1 = Σ_{a,b∈Σ} | f(P)_{a,b} − f(Q)_{a,b} | .
It suffices to consider only the terms in which a ∈ P and b ∈ P, as every other term is 0. So suppose a = P[i] and b = P[j] for i < j. In the case where i = s, clearly f(Q)_{a,b} = 0; thus, the total contribution of this case is Σ_{j=s+1}^{n} 1/(j − s) ≤ H(n), where H(k) = Σ_{z=1}^{k} 1/z is the kth harmonic number. The case where j = s is similar, with f(Q)_{a,b} = 0 and thus total contribution Σ_{i=1}^{s−1} 1/(s − i) ≤ H(n). The case where both i and j are smaller than s, as well as the case where both i and j are larger than s, contributes zero, since f(P)_{a,b} = f(Q)_{a,b} = 1/(j − i). The last case, where i < s < j, has total contribution at most

    Σ_{i=1}^{s−1} Σ_{j=s+1}^{∞} | 1/(j − i) − 1/(j − i − 1) | ;

for each i, the summation over j is a telescoping sum bounded above by 1/(s − i), implying that the total contribution of this case is at most H(n). Hence, ‖f(P) − f(Q)‖1 ≤ 3H(n) ≤ 3(1 + ln n) = O(log n).
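The 3H(n) bound above can be checked numerically for a single random deletion. A sketch with our own helper names, re-implementing (2.1):

```python
import math
import random

def f_embed(P):
    # Re-implements (2.1): coordinate {a, b} (a < b) holds 1/(P^-1(b) - P^-1(a)).
    pos = {c: i for i, c in enumerate(P)}
    return {(a, b): 1.0 / (pos[b] - pos[a]) for a in pos for b in pos if a < b}

def l1(u, v):
    return sum(abs(u.get(k, 0.0) - v.get(k, 0.0)) for k in u.keys() | v.keys())

random.seed(0)
n = 64
P = list(range(n))
random.shuffle(P)
s = random.randrange(n)
Q = P[:s] + P[s + 1:]                 # delete one character, so ED(P, Q) = 1
# Expansion bound from the proof: at most 3 H(n) <= 3 (1 + ln n).
assert l1(f_embed(P), f_embed(Q)) <= 3 * (1 + math.log(n))
```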
Our proof method of the next lemma, which bounds the contraction of f , is inspired by the work
of Cormode, Muthukrishnan and Sahinalp [7]. At a high level, we recursively construct a common
subsequence by first partitioning the alphabet, thereby partitioning each string into two subsequences,
and then merging the two common subsequences obtained by recursion. Our analysis is more involved
than that of [7]. In particular, we employ a carefully-crafted stochastic partitioning that “smooths” the
effect of any single pair of characters.
Lemma 2.3 (Contraction). Let P and Q be permutations of length n, and assume n is a power of 2. Then ‖f(P) − f(Q)‖1 ≥ (1/8) · ED(P, Q).
Before proving this lemma, we introduce some definitions and a technical proposition. Let P be a
permutation of length k. Denote by LIS(P) the length of a longest increasing subsequence of P. Define
a breakpoint in P to be a position i ∈ [k − 1] where P[i] > P[i + 1], and denote by b(P) the number of
breakpoints in P. Two subsequences of P are called a partition of P if each character of P appears in
exactly one of the two subsequences. A block is a pair of positions {2i − 1, 2i} where i ∈ [bk/2c]. A
partition of P into two subsequences P0 , P1 is called block-balanced if, at every block {2i−1, 2i}, exactly
one of the two characters belongs to P0 (hence also exactly one of them belongs to P1 ). Note that if the
length of P is even and a partition of P is block-balanced, then the two corresponding subsequences have
equal length.
Proposition 2.4. Let P be a permutation of length k, and suppose k is even. Then for every block-balanced partition of P into subsequences P0 and P1,

    LIS(P) ≥ LIS(P0) + LIS(P1) − 2 b(P) .
Proof. We will actually prove that
LIS(P) ≥ 2 LIS(P0 ) − 2 b(P) . (2.2)
A symmetric argument for P1 will imply similarly that LIS(P) ≥ 2 LIS(P1) − 2 b(P), and the proposition will follow by averaging the last two inequalities.
Let us then prove (2.2). Fix a longest increasing subsequence of P0 , and, with a slight abuse of no-
tation, let LIS(P0 ) denote both this subsequence and its length. We construct an increasing subsequence
of P, by augmenting LIS(P0) with certain characters from P1. For each position j ∈ [k] such that P[j] is in P0, let j′ ∈ {j − 1, j + 1} be the position such that {j, j′} forms a block. Notice that P[j′] is in P1, and call it a candidate if the corresponding P[j] is in LIS(P0). The number of candidates is thus exactly
LIS(P0 ). In particular, if augmenting LIS(P0 ) with all the candidates forms an increasing subsequence
of P, then the length of this increasing subsequence would be 2 LIS(P0 ), and it would prove (2.2). We
show next that LIS(P0 ) can be always be augmented with LIS(P0 ) − 2 b(P) candidates.
Consider two consecutive characters of LIS(P0), say P[j1] and P[j2] with j1 < j2. (Essentially the same argument works in the two extremal cases, where P[j2] is the first character of LIS(P0) and where P[j1] is the last character of LIS(P0).) Let t be the number of candidates among P[j1 + 1], . . . , P[j2 − 1]. We claim that t ≤ 2; indeed, only j1 + 1 and j2 − 1 can possibly be candidates, because if P[j′] is a candidate for j′ ∈ {j1 + 1, . . . , j2 − 1}, then for the corresponding P[j] in LIS(P0) we have j ∈ {j′ − 1, j′ + 1} ⊆ {j1, . . . , j2} and thus j ∈ {j1, j2}. If the t candidates are themselves in increasing order and they are all between
P[ j1 ] and P[ j2 ], then augment LIS(P0 ) with these t candidates; clearly, the result is still an increasing
subsequence of P. Otherwise, P must contain some breakpoint ĵ ∈ {j1, . . . , j2 − 1}, and this breakpoint can be blamed for not augmenting LIS(P0) with these t ≤ 2 candidates. Applying this augmentation for
every two consecutive characters P[ j1 ] and P[ j2 ] of LIS(P0 ), we see that every breakpoint is blamed only
for candidates that are in the same interval { j1 , . . . , j2 − 1} as the breakpoint, and thus every breakpoint
is blamed for a total of at most 2 candidates. It follows that we have augmented LIS(P0 ) with at least
LIS(P0 ) − 2 b(P) candidates. It is easy to see that this results in an increasing subsequence of P (because
augmenting at one interval does not prevent augmenting at another interval) and this proves (2.2).
Proof of Lemma 2.3. Start with the two length-n permutations P and Q, and recall that we used padding to make n a power of 2. We may assume that P and Q contain exactly the same characters, because every character a ∈ Σ that appears in exactly one of the two strings contributes 1 to ED(P, Q) and at least 2 (actually Ω(log n)) to its "own" coordinates (all {a, b} where b ∈ Σ \ {a}) in the difference vector f(P) − f(Q). We can further assume (by renaming characters in Σ) that Q is the identity permutation,
i. e., Q[i] = i. Hence,
ED(P, Q) ≤ 2(n − LCS(P, Q)) = 2(n − LIS(P)) .
Pick a random block-balanced partition of P into subsequences P0 and P1 , i. e., independently assign
each P[2i] to either P0 or P1 uniformly at random, and assign P[2i − 1] to the other subsequence. By
Proposition 2.4, we have
LIS(P) ≥ E[LIS(P0 ) + LIS(P1 )] − 2 b(P) .
Now apply a similar partitioning to each subsequence using independent coins, e. g., P0 is split into P00
and P01 . Continue recursively in a similar fashion until we get a singleton subsequence Pσ for every
σ ∈ {0, 1}log n . For convenience, let ε denote the empty string, and define Pε = P and {0, 1}0 = {ε}.
Applying Proposition 2.4 recursively, we have

    LIS(P) ≥ Σ_{σ ∈ {0,1}^{log n}} E[LIS(P_σ)] − 2 Σ_{k=1}^{log n} Σ_{σ ∈ {0,1}^{k−1}} E[b(P_σ)] .

We rearrange this equation using the fact that if P_σ is a singleton sequence then LIS(P_σ) = 1, and obtain

    n − LIS(P) ≤ 2 E[ Σ_{k=1}^{log n} Σ_{σ ∈ {0,1}^{k−1}} b(P_σ) ] .
Notice that the sum Σ_k Σ_σ b(P_σ) on the right-hand side gets a contribution of 1 every time two characters P[i], P[j] for which i < j and P[i] > P[j] become consecutive characters in a subsequence P_σ.
Formally, for positions i and j such that i < j and P[i] > P[j], let the random variable Z_{ij} be the number of subsequences P_σ which contain P[i], P[j] as consecutive characters. By linearity of expectation we get that

    (1/2) · ED(P, Q) ≤ n − LIS(P) ≤ 2 Σ_{i<j; P[i]>P[j]} E[Z_{ij}] .        (2.3)
We claim that E[Z_{ij}] ≤ 4/(j − i). To prove the claim, notice that E[Z_{ij}] = Σ_{z=1}^{log n} Pr[Z_{ij} ≥ z], because Z_{ij} takes only integral values between 0 and log n (as P[i] appears in log n subsequences). Therefore, it suffices to show for every integer z ≥ 1 the upper bound

    Pr[Z_{ij} ≥ z] ≤ 2^{2−z} / (j − i) .
Let us first show that P[i], P[j] can become consecutive characters only after ⌊log(j − i)⌋ iterations (partitions of P). Indeed, if j − i ≥ 2, then at the first iteration these two characters must belong to different blocks; thus, with probability 1/2 the random partition of P sends them to different subsequences, in which case they never form a consecutive pair in any P_σ, and with probability 1/2 they are sent to the same subsequence, in which case the difference between their positions in that common subsequence is at least ⌊(j − i)/2⌋. Continuing similarly, we see that if 2^l ≤ j − i < 2^{l+1} then even after l − 1 partitions, the difference between the positions of the two characters is at least 2, i. e., they can become consecutive only after l iterations. Since the different sequence partitions are independent, we get that

    Pr[Z_{ij} ≥ 1] ≤ (1/2)^{⌊log(j−i)⌋} ≤ 2/(j − i) .

Similarly, if P[i], P[j] are consecutive in a subsequence P_σ, they can be consecutive in a later iteration only if they are sent to the same subsequence of P_σ, which happens with probability 1/2 or less. Fixing an integer z ≥ 1, we can apply this argument z − 1 times to get that Pr[Z_{ij} ≥ z | Z_{ij} ≥ 1] ≤ (1/2)^{z−1}. We thus conclude that Pr[Z_{ij} ≥ z] ≤ 2^{2−z}/(j − i), which completes the proof of the claim.
Using (2.3) together with the above claim regarding E[Z_{ij}], we get that

    (1/2) · ED(P, Q) ≤ 4 Σ_{i<j; P[i]>P[j]} 1/(j − i) .
Notice that for every i < j with P[i] > P[j], the respective coordinates of f(P) and f(Q) are

    f(P)_{P[i],P[j]} = 1/(i − j) < 0    and    f(Q)_{P[i],P[j]} = 1/(P[i] − P[j]) > 0 ,

and hence | f(P)_{P[i],P[j]} − f(Q)_{P[i],P[j]} | > 1/(j − i). We conclude that

    (1/2) · ED(P, Q) ≤ 4 ‖f(P) − f(Q)‖1 ,

which completes the proof of Lemma 2.3.
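The recursive block-balanced partitioning in the proof above can be simulated directly. Since Proposition 2.4 holds for every block-balanced partition (not only in expectation), the inequality n − LIS(P) ≤ 2 Σ_σ b(P_σ) holds for each realization of the random coins. A sketch (function names are ours):

```python
import bisect
import random

def lis_length(P):
    # Patience-sorting computation of LIS(P) in O(n log n).
    tails = []
    for v in P:
        i = bisect.bisect_left(tails, v)
        if i == len(tails):
            tails.append(v)
        else:
            tails[i] = v
    return len(tails)

def breakpoints(P):
    # b(P): the number of positions i with P[i] > P[i+1].
    return sum(P[i] > P[i + 1] for i in range(len(P) - 1))

def total_breakpoints(P, rng):
    # Random block-balanced partition applied recursively down to
    # singletons; returns the sum of b(P_sigma) over all levels.
    if len(P) <= 1:
        return 0
    sub = ([], [])
    for i in range(0, len(P), 2):
        side = rng.randrange(2)            # one character of each block per side
        sub[side].append(P[i])
        sub[1 - side].append(P[i + 1])
    return (breakpoints(P)
            + total_breakpoints(sub[0], rng)
            + total_breakpoints(sub[1], rng))

rng = random.Random(1)
n = 128                                    # a power of 2
P = list(range(n))
rng.shuffle(P)
assert n - lis_length(P) <= 2 * total_breakpoints(P, rng)
```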
This completes the proof of Theorem 2.1. We note that the bounds in Lemma 2.2 and Lemma 2.3
are existentially tight, up to constant factors (for our embedding f ).
3 Applications
We present several applications of our ℓ1-embedding of the Ulam metric from Section 2. Following [2], we say that a string is t-non-repetitive, if all its t-substrings are distinct. We first extend the embedding to strings that are t-non-repetitive (Section 3.1). We also extend the above embedding to strings in which
every character appears a bounded number of times (Section 3.2). Both extensions follow by showing
that the corresponding strings can be embedded in the Ulam metric with low distortion. We then discuss
the immediate applications of these embeddings to sketching algorithms (Section 3.3) and to Nearest
Neighbor Search (Section 3.4).
A technically more involved application of the above embedding is an improvement to a result of
Bar-Yossef et al. [2]. Call a string (t, r)-non-repetitive if every r successive t-substrings of it are distinct.
We improve over the sketching algorithm of [2] for locally non-repetitive strings in two aspects: (i)
we achieve an embedding result, which is stronger than a sketching algorithm (namely, a sketching
algorithm follows quite easily); and (ii) we improve the approximation factor. See Section 3.5 for more
details.
3.1 Embedding non-repetitive strings
Recall that a string is t-non-repetitive, if all its t-substrings are distinct. Let Xn,t contain all the t-non-
repetitive strings of length n over Σ.
Theorem 3.1. The metric space (X_{n,t}, ED) embeds with distortion 2t into the Ulam metric of dimension n − t + 1 and alphabet size 2^t. Consequently, it embeds into ℓ1 with distortion O(t log n).
The proof of the first part of the theorem is based on a simple observation, and that of the second
part is an immediate consequence of Theorem 2.1.
Proof. We define an embedding f of (X_{n,t}, ED) into the aforementioned Ulam metric. First, we identify the Ulam metric alphabet [2^t] with {0, 1}^t, using an arbitrary bijection b. Now for a t-non-repetitive string x ∈ {0, 1}^n, define f(x) to be a length n − t + 1 string (over [2^t]), whose jth coordinate is given by
f(x)_j = b^{−1}(x[j] . . . x[j + t − 1]). To complete the proof, we will show that for every two strings x, y ∈ Σ^n,

    (1/2) · ED(x, y) ≤ ED(f(x), f(y)) ≤ t · ED(x, y) .        (3.1)
To prove the first inequality, consider a longest common subsequence between f (x) and f (y). Each
character in it corresponds to a t-substring in x and in y. Taking the first character from each of these
t-substrings, except for the last such t-substring, which is taken in its entirety, yields a subsequence that
is common to x and y, and hence LCS(x, y) ≥ LCS(f(x), f(y)) + t − 1. We obtain that

    (1/2) · ED(x, y) ≤ n − LCS(x, y) ≤ n − t + 1 − LCS(f(x), f(y)) ≤ ED(f(x), f(y)) .
To prove the second inequality, fix a longest common subsequence of x and y, and with a slight abuse
of notation let LCS(x, y) denote both this sequence and its length. Consider all the t-substrings of x and
of y that are entirely contained in LCS(x, y). Observe that the number of such t-substrings is at least
n −t + 1 −t(n − LCS(x, y)), because each of the n − LCS(x, y) characters that do not belong to LCS(x, y)
participates in t or fewer t-substrings. These t-substrings in x and in y give rise to a subsequence that is
common to f (x) and f (y). Therefore, LCS( f (x), f (y)) ≥ n − t + 1 − t(n − LCS(x, y)), implying that
ED( f (x), f (y)) ≤ 2(n − t + 1 − LCS( f (x), f (y))) ≤ t(n − LCS(x, y)) ≤ t ED(x, y) .
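The window-to-symbol map in the proof above is short in code. A sketch (our naming), using the t-grams themselves as the extended alphabet in place of the bijection b:

```python
def embed_nonrepetitive(x, t):
    # f(x)_j corresponds to the t-substring x[j..j+t-1]; here the t-grams
    # themselves serve as the symbols of the extended alphabet.
    windows = [x[j:j + t] for j in range(len(x) - t + 1)]
    assert len(set(windows)) == len(windows), "x must be t-non-repetitive"
    return windows

x = "00010111"                       # its 3-substrings are pairwise distinct
fx = embed_nonrepetitive(x, 3)
assert len(fx) == len(x) - 3 + 1     # a string of length n - t + 1...
assert len(set(fx)) == len(fx)       # ...with all-distinct characters, i.e., a permutation
```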
3.2 Embedding bounded-occurrence strings
We say that a string P is t-bounded-occurrence if every character a ∈ Σ appears at most t times in P. Let B_{n,t} ⊆ Σ^n contain all the t-bounded-occurrence strings of length n over Σ.
Theorem 3.2. The metric space (B_{n,t}, ED) embeds with distortion t into the Ulam metric of dimension n over an extended alphabet of size t|Σ|. Consequently, it embeds into ℓ1 with distortion O(t log n).
The proof of the first part of the theorem is based on a simple observation, and that of the second part follows immediately from Theorem 2.1.
Proof. Let Σ′ be an alphabet of size t|Σ|, and associate every character a ∈ Σ with t distinct characters a_1, . . . , a_t ∈ Σ′. Given a string x, let f(x) be the string obtained from x by replacing, for every j ∈ [t] and every character a ∈ Σ, the jth occurrence of a in x with the character a_j ∈ Σ′. To complete the proof of the first part, it would suffice to prove that for every two strings x, y ∈ Σ^n,

    (1/2) · ED(x, y) ≤ ED(f(x), f(y)) ≤ t · ED(x, y) ,        (3.2)
and it is indeed straightforward to verify this inequality.
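The occurrence-numbering map f of the proof above is equally direct. A sketch with our own names, using pairs (a, j) as the t|Σ| fresh characters:

```python
from collections import defaultdict

def relabel(x):
    # Replace the j-th occurrence of character a by the fresh symbol (a, j);
    # for a t-bounded-occurrence string the image has all-distinct characters.
    seen = defaultdict(int)
    out = []
    for a in x:
        out.append((a, seen[a]))
        seen[a] += 1
    return out

y = relabel("abracadabra")           # five a's, so this string is 5-bounded-occurrence
assert len(set(y)) == len(y)         # the image is a permutation
assert y[:4] == [("a", 0), ("b", 0), ("r", 0), ("a", 1)]
```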
3.3 Sketching algorithms
A sketching algorithm for edit distance consists of two procedures that work in concert as follows.
The first procedure produces a fingerprint, called a sketch, from each of the input strings, and the second procedure uses solely the sketches to approximate the edit distance between the two strings. In the k
vs. m gap version of the problem, there is a promise that the edit distance between the two strings is
either at most k or more than m, and we wish to decide which of the two cases holds. The key feature is
that the sketch of each string is constructed without knowledge of the other string. The procedures are
randomized and are allowed to share random coins, and the main measure of complexity is the size of
the sketches produced. For actual applications it is also desirable that both procedures are efficient (say
run in time that is polynomial in their input size).
In contrast to Hamming distance, whose sketching complexity is well-understood [15, 18, 10], relatively little is known about sketching of edit distance. The result of Ostrovsky and Rabani [20] gives a sketching algorithm that, for every k = k(n), distinguishes between pairs of strings at edit distance at most k and at least k · 2^{O(√(log n log log n))}, using sketches of size O(1).
For strings that are t-non-repetitive (including e. g. permutations), Bar-Yossef et al. [2] give a sketching
algorithm that solves, for every k = k(n), the k vs. Ω(tk^2) gap edit distance problem using sketches
of size O(1). We improve over their algorithm as follows.
Theorem 3.3 (Sketching t-non-repetitive strings). For every k = k(n) there exists a polynomial-time
sketching algorithm that solves the k vs. Ω(kt log n) gap edit distance problem on t-non-repetitive strings
of length n using sketches of size O(1).
The proof of the theorem is a simple consequence of the `1-embedding from Theorem 3.1. First,
convert the `1 metric into a scaled Hamming metric. Observe that a scaling factor of O(n^2)
suffices: rounding each coordinate in the `1-embedding to multiples of 1/(Cn^2), for a sufficiently large
constant C > 0, increases the distortion by a factor of 2, because f (P)_{a,b} − f (Q)_{a,b} , for f as in (2.1), is
either zero or has absolute value Ω(1/n^2). (A different argument is that f (P) has at most n^2 nonzero
coordinates, and thus the total error due to rounding is at most an additive 1/C.) Then, use the sketching
algorithm stated below for Hamming spaces, which is implicit in [18], and can also be derived from the
locality-sensitive hashing algorithms of [15, 13] (using the fact that for binary strings there is a direct
correspondence between Hamming distance and `2 distance).
Theorem 3.4 (Sketching Hamming metric [18]). For every ε > 0 and k = k(n), there is a polynomial-
time sketching algorithm that solves the k vs. (1 + ε)k gap Hamming distance problem in binary strings
of length n, with a sketch of size O(1/ε 2 ).
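To illustrate the primitive behind Theorem 3.4, here is a toy version of sketching by random inner products modulo 2 (this is not the exact algorithm of [18]; the parameter choices and names are ours). Each sketch bit is the parity of a random subset of coordinates, sampled with probability roughly 1/k, so two strings at Hamming distance d disagree on a given sketch bit with probability (1 − (1 − 2/k)^d)/2, which separates d ≤ k from d ≫ k:

```python
import random

def make_masks(n, k, m, rng):
    """Shared random coins: m random subsets of [n], each coordinate
    kept independently with probability about 1/k."""
    return [[1 if rng.random() < 1.0 / k else 0 for _ in range(n)]
            for _ in range(m)]

def sketch(x, masks):
    """m-bit sketch of a binary string: inner products with the masks, mod 2."""
    return [sum(xi & mi for xi, mi in zip(x, mask)) % 2 for mask in masks]

rng = random.Random(0)
n, k, m = 1000, 20, 8000
masks = make_masks(n, k, m, rng)
x = [0] * n
near = [1] * k + [0] * (n - k)              # Hamming distance k from x
far = [1] * (10 * k) + [0] * (n - 10 * k)   # Hamming distance 10k from x

def frac_diff(a, b):
    return sum(ai != bi for ai, bi in zip(a, b)) / len(a)

d_near = frac_diff(sketch(x, masks), sketch(near, masks))
d_far = frac_diff(sketch(x, masks), sketch(far, masks))
```

Here m repetitions are used for readability of the experiment; [18] shows that O(1/ε^2) bits suffice for the k vs. (1 + ε)k gap, which is why the sketches in Theorems 3.3 and 3.4 have size O(1).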
We note that besides being a very basic computational primitive for massive data sets (see e. g. [5,
Section 4.6]), sketching is also related to (i) Nearest Neighbor Search (see below), (ii) protocols that are
secure (i. e., leak no information), cf. [10], and (iii) the simultaneous messages communication model
with public coins [21].
3.4 Nearest Neighbor Search
One of the most extensively studied computational problems is Nearest Neighbor Search (NN): Given a
set S of points in a metric space X, preprocess S so as to efficiently answer queries for finding the point
in S that is closest to a query point q ∈ X. The last decade has seen the advent of data structures for
approximate NN in (high) n-dimensional normed spaces. In particular, algorithms with preprocessing
space (storage) polynomial in n and |S| and query time that is polynomial in n and log |S|, were designed
in [15, 18] for `n1 and `n2 (achieving approximation factor 1 + ε for every constant ε > 0) and in [12]
for `n∞ (achieving approximation factor O(log log n)). Other algorithms for `n1 and `n2 , due to [15, 11],
achieve 1 + ε approximation, require preprocessing space that is more moderate (subquadratic in |S|), and
have query time that is sublinear in |S| (roughly O(|S|^{1/(1+ε)})).
In contrast, the known algorithmic guarantees for approximate NN in edit distance metrics (X, ED)
are much weaker. For the general case (Σ^n , ED), Indyk [14] designed, for every fixed α > 0, a
constant-factor approximation (exponential in 1/α) with space requirement that is exponential in n^α . The
recent embedding result of Ostrovsky and Rabani [20] leads to a nearest neighbor data structure with
approximation factor 2^{O(√(log n log log n))} and polynomial space requirement.
Combining our `1-embedding from Theorem 3.2 with these NN algorithms for `1 (or Hamming
space or `2 , as discussed in Section 3.3) immediately yields a nearest neighbor algorithm for t-non-
repetitive strings (including e. g. permutations) achieving approximation factor O(t log n), which im-
proves over the bound obtained in [2] for this case. In particular, using the algorithms of [15, 18] results
in query time (n + log |S|)^{O(1)} and space requirement (n + |S|)^{O(1)}. Similarly, using the algorithms
of [15, 11] results in query time that is sublinear in |S| and space requirement that is subquadratic in |S|.
The sketching algorithm can alternatively be used to speed up the naive algorithm that computes
the edit distance between the query q and every data point x ∈ S. Simply replace each edit distance
computation with an estimate derived from a sketch of x (computed at the preprocessing stage) and a
sketch of q. The number of iterations would still be O(|S|), but each iteration will be much faster, about
O(log |S| + log log n) time.
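As a minimal sketch of this accelerated scan (all names here are ours, and `estimate` stands in for any sketch-based distance estimator), each edit distance computation is replaced by a comparison of precomputed sketches:

```python
def nearest_by_sketch(query_sketch, data_sketches, estimate):
    """Naive NN scan over sketches: still O(|S|) iterations, but each one
    touches only short sketches rather than length-n strings. `estimate`
    is any function turning two sketches into an approximate distance."""
    best, best_dist = None, float("inf")
    for idx, sk in enumerate(data_sketches):
        d = estimate(query_sketch, sk)
        if d < best_dist:
            best, best_dist = idx, d
    return best

# toy usage with bit-vector sketches and exact Hamming estimates
hamming = lambda a, b: sum(u != v for u, v in zip(a, b))
print(nearest_by_sketch([1, 1, 0], [[0, 0, 0], [1, 1, 1], [0, 1, 0]], hamming))  # → 1
```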
3.5 Embedding locally non-repetitive strings
We can further generalize Theorem 3.1 and obtain an embedding of strings that are locally non-repetitive
(see the definition below). The guarantee of this embedding is slightly weaker than a low distortion, since
it approximates well only distances that are sufficiently small. An interesting aspect of our embedding is
that it uses the sketching algorithm of [18] to obtain an embedding; this is opposite to the usual direction,
where an embedding is used to obtain a sketching algorithm.
Our results improve over [2] in several respects: (i) We give an embedding, which consequently
leads to a sketching algorithm, while [2] only give a sketching algorithm. An embedding result is
stronger (unless it is computationally inefficient) since it has to simultaneously handle exponentially
many strings, including pairs x, y with rather different distance ED(x, y), while a sketching algorithm is
only required to have a high success probability for every pair. (ii) Our approximation factor is smaller
since we rely on the Ulam metric embedding.
Definition 3.5 (Locally non-repetitive string). A string is called (t, r)-non-repetitive, or in short locally
non-repetitive, if any r successive t-substrings are distinct, i. e., for each interval {i, . . . , i + r − 1}, the
r substrings of length t that start in this interval are distinct.
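Definition 3.5 translates directly into a checker (a straightforward Python transcription, ours; in this sketch, windows that do not fully fit near the end of the string are skipped):

```python
def is_locally_non_repetitive(x, t, r):
    """Return True iff every r consecutive starting positions yield r
    pairwise distinct substrings of length t (Definition 3.5)."""
    subs = [x[i:i + t] for i in range(len(x) - t + 1)]
    return all(len(set(subs[i:i + r])) == r
               for i in range(len(subs) - r + 1))

print(is_locally_non_repetitive("abcabc", 3, 3))  # → True: "abc","bca","cab" repeat only with period 3
print(is_locally_non_repetitive("abcabc", 3, 4))  # → False: a window of 4 starts contains "abc" twice
```

In particular, a string with all characters distinct (a permutation) is (1, r)-non-repetitive for every r.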
Let Xn,t,r contain all the (t, r)-non-repetitive strings of length n over Σ. Notice that this family
contains Xn,t since every t-non-repetitive string is also (t, r)-non-repetitive.
Theorem 3.6 (Embedding). For every t = t(n) and every k = k(n), there exists an embedding f of the
(t, 180tk)-non-repetitive strings into `1 , such that for every two such strings x, y,
Ω(min{k, ED(x, y)/(t log(tk))}) ≤ ‖ f (x) − f (y)‖ ≤ ED(x, y) .
This embedding builds on the iterative anchors technique used in the sketching algorithm of Bar-
Yossef et al. [2, Section 2]. The basic idea is to pick a set of non-overlapping substrings of length t in
a coordinated fashion. These substrings are referred to as anchors and partition the string into disjoint
substrings. The substrings between anchors are embedded into `1 and the embedding for the original
string is obtained by combining these embeddings in a suitable way. The key idea from [2] is a method to
pick these anchors in such a way that if x and y are two strings with small edit distance, then the anchor
selection process picks the same anchors for both x and y. One technical difference is that, between
successive anchors, we employ the `1 -embedding from Section 2. Another difference is that as a final
step we apply the `1 sketching algorithm of [18], which effectively “thresholds” the `1 distance between
images. For clarity, we make no attempt to optimize constants.
Proof. We describe the embedding of a (t, 180tk)-non-repetitive string x using a randomized procedure
that generates a bit f ′(x). The embedding f into `1 will then be the concatenation of f ′(x), ranging over
all possible outcomes of the coin tosses, with suitable scaling.
Fix W := 56tk. Augment the alphabet Σ with 2W + t new characters a_1 , . . . , a_{2W+t} and append to x
the fixed string a_1 · · · a_{2W+t} . Select a sequence of disjoint substrings α_1 , . . . , α_{r_x} of x, called “anchors,”
iteratively as follows. Maintain a sliding window of length 2W + t over the string x. The left endpoint
of the sliding window is denoted by c; initially, c is set to 1. At each step, say step i, consider the W
substrings of length t whose starting position lies in the interval [c + W . . . c + 2W − 1], and denote the
jth such substring, for j ∈ [W ], by

s_{i,j} = x[c + W + j − 1 . . . c + W + j + t − 2] .

Select at random a permutation Π_i of Σ^t , and set the anchor α_i to be a substring s_{i,l} that is minimal
according to Π_i (breaking ties arbitrarily), i. e.,

Π_i(s_{i,l}) = min{Π_i(s_{i,1}), . . . , Π_i(s_{i,W})} .
Then slide the window by setting c to the position immediately following the anchor, i. e.,

c ← c + W + l + t − 1 .

If this new value of c is at most n, start a new iteration. Otherwise, stop, letting r_x ≤ O(n/tk) be the
number of anchors collected.
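The iterative selection above can be sketched as follows (0-indexed; the hash-based rank is our stand-in for the random permutation Π_i of Σ^t, and instead of padding x with 2W + t fresh characters as the paper does, this sketch simply stops once the window no longer fits):

```python
import hashlib

def _rank(i, sub, seed):
    """Pseudo-random rank of a t-substring at iteration i; it depends only
    on the substring's content and the shared seed, so two strings that
    agree on a window rank its candidates identically."""
    return int(hashlib.sha256(f"{seed}:{i}:{sub}".encode()).hexdigest(), 16)

def select_anchors(x, t, W, seed=0):
    """Iterative anchor selection: in each step, among the W candidate
    t-substrings starting in [c+W, c+2W-1], pick the one of minimal rank,
    then slide the window to just past the chosen anchor."""
    anchors, c, i = [], 0, 0
    while c + 2 * W + t - 1 <= len(x):
        start = min(range(c + W, c + 2 * W),
                    key=lambda s: _rank(i, x[s:s + t], seed))
        anchors.append((start, x[start:start + t]))
        c = start + t  # window moves to just past the new anchor
        i += 1
    return anchors
```

The selected anchors are disjoint by construction (each new candidate range begins W positions past the previous anchor's end), and the selection is coordinated: identical windows in two strings yield identical anchors under the shared coins.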
For i ∈ [r_x ], let φ_i = φ_i(x) be the substring of x starting at the position immediately after the last
character of anchor α_{i−1} and ending at the last character of α_i . By convention, φ_1 starts at position 1.
Now embed each φ_i into `1^{O(tk)} using Theorem 3.1; notice that φ_i is a substring of x of length at most
2W + t ≤ 180tk, and is thus t-non-repetitive. Next concatenate the resulting r_x = O(n/tk) images into a
vector ϕ(x) ∈ `1^{O(n)} (appending 0’s as necessary). Finally, choose a random bit string r ∈ {0, 1}^{O(n)} of
the same length, such that for all i, r_i = 1 independently with probability 1/(kt log(kt)), and let

f ′(x) = ∑_i r_i · ϕ_i(x) mod 2 .
(This step is similar to [18], though the purpose of using it here is very different.)
The embedding’s correctness follows immediately from the next two lemmas, which we shall prove
shortly.
Lemma 3.7. If x and y are (t, 180tk)-non-repetitive strings then

Pr[ f ′(x) ≠ f ′(y)] ≤ O(ED(x, y)/k) .

Lemma 3.8. If x and y are (t, 180tk)-non-repetitive strings then

Pr[ f ′(x) ≠ f ′(y)] ≥ Ω(min{ED(x, y)/(kt log(kt)), 1}) .
The proof of Theorem 3.6 is completed by observing that the concatenation of the bits f ′ with suitable
scaling yields an embedding f which satisfies

‖ f (x) − f (y)‖_1 = k · E | f ′(x) − f ′(y)| = k · Pr[ f ′(x) ≠ f ′(y)] .
Proof of Lemma 3.7. The preliminary step of appending the same string of length 2W + t to both x and y
clearly does not change ED(x, y). Now fix a sequence of edit operations τ that transforms this new x
into the new y and uses ED(x, y) edit operations. Let α_i be the ith anchor chosen for x and let β_i be
the ith anchor chosen for y. Let r = min{r_x , r_y }. As in the proof of Lemma 2.6 in [2], with probability
at least 1 − ED(x, y)/7k, the following event happens: for all i ∈ [r], the transformation τ (sequence of
edit operations) maps α_i to β_i with no edit operations inside α_i or β_i .⁴ If this happens, we say that the
anchors match.
Assume for the moment that the anchors match. Using the fact that {φ_i(x)}_{i∈[r]} are disjoint substrings
of x, and similarly {φ_i(y)}_{i∈[r]} for y, we get that

ED(x, y) = ∑_{i∈[r]} ED(φ_i(x), φ_i(y)) .
Furthermore, at least one of the anchors α_r and β_r is the last anchor in its string, and thus contains one of
the unique 2W + t characters that were appended to x and y. But since α_r = β_r , both must
be the last anchor in their respective strings, and thus r_x = r_y . Using the guarantees of Theorem 3.1, we get that

‖ϕ(x) − ϕ(y)‖_1 ≤ ∑_{i∈[r]} O(t log(2W + t)) · ED(φ_i(x), φ_i(y)) ≤ O(t log(tk)) · ED(x, y) .
⁴The argument in [2] goes roughly as follows: at a single iteration, the probability that the anchor selection goes wrong is at
most t times the number of edit operations inside the current window, divided by W (since there are W choices for the anchor).
It can be verified that an edit operation can only affect one iteration, and a union bound over all iterations gives an upper bound
of ED(x, y)/7k.
Finally, the analysis in [18] of a random inner product modulo 2 shows that

Pr[ f ′(x) ≠ f ′(y) | ϕ(x), ϕ(y)] = Θ(min{‖ϕ(x) − ϕ(y)‖/(kt log(kt)), 1}) .    (3.3)
We can then bound Pr[ f ′(x) ≠ f ′(y)] from above by conditioning on whether the anchors match. If they
do match, then by (3.3), f ′(x) ≠ f ′(y) with probability O(ED(x, y)/k). Otherwise (which happens with probability at
most ED(x, y)/7k), f ′(x) ≠ f ′(y) with probability at most 1. We thus conclude that

Pr[ f ′(x) ≠ f ′(y)] ≤ O(ED(x, y)/k) .
Proof of Lemma 3.8. The preliminary step of appending the same string of length 2W + t to both x and y
clearly does not change ED(x, y). Now, similarly to the proof of Lemma 2.8 in [2], let r = max{r_x , r_y }, and
for i = r_x + 1, . . . , r let φ_i(x) = ε be the empty string; similarly, for i = r_y + 1, . . . , r let φ_i(y) = ε. Let
g be the `1-embedding from Theorem 3.1. Then for all i ∈ [r] we have

‖g(φ_i(x)) − g(φ_i(y))‖ ≥ Ω(ED(φ_i(x), φ_i(y))) .
Since the substrings {φ_i(x)}_{i∈[r]} induce a partition of the string x, and similarly {φ_i(y)}_{i∈[r]} for y, we get
that

ED(x, y) ≤ ∑_{i∈[r]} ED(φ_i(x), φ_i(y)) ≤ O(‖ϕ(x) − ϕ(y)‖) .
The lemma follows by applying the analysis of a random inner product modulo 2, as stated in (3.3).
Improved sketching algorithm The embedding of Theorem 3.6 leads to the following sketching result.
Theorem 3.9 (Sketching). For every t = t(n) and every k = k(n), there exists a polynomial-time
sketching algorithm that solves the k vs. Ω(tk log k) gap edit distance problem for (t, 180tk)-non-
repetitive strings using sketches of size O(1).
The proof follows in a straightforward way by “concatenating” the embedding of Theorem 3.6 with a
sketching algorithm for the k′ vs. k′(1 + ε) gap Hamming distance (or `1 ) problem that is implicit in [15,
18]. We note that the embedding uses many dimensions (coordinates), but for the purpose of sketching
it suffices to generate only O(kt log(kt)) coordinates of f ′ at random, which can be done efficiently using
the shared random coins. It is also easy to verify that the random permutations Π_i can be replaced by an
almost min-wise hash family that is efficiently computable using shared randomness, similarly to [2].
Note that a permutation is (1, r)-non-repetitive for every r ≥ 1, and so this theorem offers a somewhat
unexpected small improvement for the Ulam metric (over Theorem 3.3), reducing the gap from O(log n)
to O(log k).
Acknowledgements We thank the anonymous reviewers for helpful comments.
References

[1] D. Aldous and P. Diaconis: Longest increasing subsequences: from patience sorting to the Baik–Deift–Johansson theorem. Bull. Amer. Math. Soc. (N.S.), 36(4):413–432, 1999. [BullAMS:1999-36-04/S0273-0979-99-00796-X].

[2] Z. Bar-Yossef, T. S. Jayram, R. Krauthgamer, and R. Kumar: Approximating edit distance efficiently. In Proc. 45th FOCS, pp. 550–559. IEEE Computer Society Press, 2004. [FOCS:10.1109/FOCS.2004.14].

[3] T. Batu, F. Ergün, J. Kilian, A. Magen, S. Raskhodnikova, R. Rubinfeld, and R. Sami: A sublinear algorithm for weakly approximating edit distance. In Proc. 35th STOC, pp. 316–324. ACM Press, 2003. [STOC:10.1145/780542.780590].

[4] J. Bourgain: On Lipschitz embedding of finite metric spaces in Hilbert space. Israel J. Math., 52(1–2):46–52, 1985.

[5] G. Cormode: Sequence Distance Embeddings. PhD thesis, University of Warwick, 2003.

[6] G. Cormode and S. Muthukrishnan: The string edit distance matching problem with moves. In Proc. 13th Ann. ACM–SIAM Symp. on Discrete Algorithms (SODA’02), pp. 667–676, 2002. [SODA:545381.545470].

[7] G. Cormode, S. Muthukrishnan, and S. C. Sahinalp: Permutation editing and matching via embeddings. In Proc. 28th Internat. Coll. on Automata, Languages, and Programming (ICALP’01), volume 2076 of Lecture Notes in Computer Science, pp. 481–492. Springer, 2001. [ICALP:hf0vwuh0rcyujug1].

[8] G. Cormode, M. Paterson, S. C. Sahinalp, and U. Vishkin: Communication complexity of document exchange. In Proc. 11th Ann. ACM–SIAM Symp. on Discrete Algorithms (SODA’00), pp. 197–206, 2000. [SODA:338219.338252].

[9] P. Enflo: On the nonexistence of uniform homeomorphisms between L_p-spaces. Ark. Mat., 8:103–105, 1969.

[10] J. Feigenbaum, Y. Ishai, T. Malkin, K. Nissim, M. J. Strauss, and R. N. Wright: Secure multiparty computation of approximations. In Proc. 28th Internat. Coll. on Automata, Languages, and Programming (ICALP’01), volume 2076 of Lecture Notes in Computer Science, pp. 927–938. Springer, 2001. [ICALP:cpq5t97vrymq7q3n].

[11] A. Gionis, P. Indyk, and R. Motwani: Similarity search in high dimensions via hashing. In Proc. 25th Internat. Conf. on Very Large Data Bases, pp. 518–529. Morgan Kaufmann Publishers Inc., 1999. [VLDB:645925.671516].

[12] P. Indyk: On approximate nearest neighbors in non-Euclidean spaces. In Proc. 39th FOCS, pp. 148–155. IEEE Computer Society Press, 1998. [FOCS:10.1109/SFCS.1998.743438].

[13] P. Indyk: Dimensionality reduction techniques for proximity problems. In Proc. 11th Ann. ACM–SIAM Symp. on Discrete Algorithms (SODA’00), pp. 371–378. SIAM, 2000. [SODA:338219.338582].

[14] P. Indyk: Approximate nearest neighbor under edit distance via product metrics. In Proc. 15th Ann. ACM–SIAM Symp. on Discrete Algorithms (SODA’04), pp. 646–650. SIAM, 2004. [SODA:982792.982889].

[15] P. Indyk and R. Motwani: Approximate nearest neighbors: towards removing the curse of dimensionality. In Proc. 30th STOC, pp. 604–613. ACM Press, 1998. [STOC:10.1145/276698.276876].

[16] S. Khot and A. Naor: Nonembeddability theorems via Fourier analysis. Mathematische Annalen, 334(4):821–852, 2006. [Springer:n4671147n1684344].

[17] R. Krauthgamer and Y. Rabani: Improved lower bounds for embeddings into L1. In Proc. 17th Ann. ACM–SIAM Symp. on Discrete Algorithms (SODA’06), pp. 1010–1017. SIAM, 2006. [SODA:1109557.1109669].

[18] E. Kushilevitz, R. Ostrovsky, and Y. Rabani: Efficient search for approximate nearest neighbor in high dimensional spaces. SIAM Journal on Computing, 30(2):457–474, 2000. [SICOMP:30/34717].

[19] S. Muthukrishnan and S. C. Sahinalp: Approximate nearest neighbors and sequence comparisons with block operations. In Proc. 32nd STOC, pp. 416–424. ACM Press, 2000. [STOC:10.1145/335305.335353].

[20] R. Ostrovsky and Y. Rabani: Low distortion embeddings for edit distance. In Proc. 37th STOC, pp. 218–224. ACM Press, 2005. [STOC:1060590.1060623].

[21] A. C.-C. Yao: Some complexity questions related to distributive computing. In Proc. 11th STOC, pp. 209–213. ACM Press, 1979. [STOC:10.1145/800135.804414].
AUTHORS
Moses Charikar
Dept. of Computer Science
Princeton University
35 Olden Street
Princeton, NJ 08540, USA
moses cs princeton edu
http://www.cs.princeton.edu/~moses/
Robert Krauthgamer
IBM Almaden Research Center
Department K53/B2
650 Harry Road
San Jose, CA 95120, USA
robi almaden ibm com
http://www.almaden.ibm.com/cs/people/robi/
ABOUT THE AUTHORS
M OSES C HARIKAR is an Assistant Professor in the Computer Science department at Prince-
ton University. He received his Ph. D. in 2000 from Stanford University under the su-
pervision of Rajeev Motwani. Before that, he obtained his undergraduate degree from
the Indian Institute of Technology, Bombay. His research interests are in approximation
algorithms, metric embeddings, and algorithmic techniques for large data sets. His work
on dimension reduction in `1 won the Best Paper award at FOCS 2003. A one-year stint
in the research group at Google gave him an opportunity to apply his theoretical ideas in
the real world. He still reaps the benefits of that experience: he has successfully man-
aged to retain the top spot for a Google search on his last name, but has wisely given up
trying to compete with his well-known namesake for searches on his first name.
ROBERT K RAUTHGAMER is a Research Staff Member in the theory group at the IBM Al-
maden Research Center in San Jose, CA. He received his Ph. D. in 2001 from the Weiz-
mann Institute of Science in Israel. A paper, coauthored (as part of his thesis) with his
advisor, Uri Feige, was awarded the 2005 SIAM Outstanding Paper Prize. Subsequently
he spent two years as a postdoc in the theory group at Berkeley. His research interests
include combinatorial algorithms, finite metric spaces, high-dimensional geometry, data
analysis, and related areas. His favorite sport since youth has been swimming; once he
swam across the Sea of Galilee in a 10km competitive race, and was the last one to
arrive at the finish line.