MEASURING THE EVOLVING INTERNET IN THE CLOUD COMPUTING
 ERA: INFRASTRUCTURE, CONNECTIVITY, AND PERFORMANCE




                                  by

                        BAHADOR YEGANEH




                          A DISSERTATION

    Presented to the Department of Computer and Information Science
           and the Graduate School of the University of Oregon
                 in partial fulfillment of the requirements
                              for the degree of
                           Doctor of Philosophy

                            December 2019
                       DISSERTATION APPROVAL PAGE

Student: Bahador Yeganeh

Title: Measuring the Evolving Internet in the Cloud Computing Era:
Infrastructure, Connectivity, and Performance

This dissertation has been accepted and approved in partial fulfillment of the
requirements for the Doctor of Philosophy degree in the Department of Computer
and Information Science by:

Prof. Reza Rejaie                  Chair
Prof. Ramakrishnan Durairajan      Co-Chair
Prof. Jun Li                       Core Member
Prof. Allen Malony                 Core Member
Dr. Walter Willinger               Core Member
Prof. David Levin                  Institutional Representative


and

Kate Mondloch                      Interim Vice Provost and Dean of the
                                   Graduate School

Original approval signatures are on file with the University of Oregon Graduate
School.

Degree awarded December 2019




            © 2019 Bahador Yeganeh
This work is licensed under a Creative Commons
          Attribution 4.0 License.




                            DISSERTATION ABSTRACT

Bahador Yeganeh

Doctor of Philosophy

Department of Computer and Information Science

December 2019

Title: Measuring the Evolving Internet in the Cloud Computing Era:
Infrastructure, Connectivity, and Performance

       The advent of cloud computing as a means of offering virtualized computing

and storage resources has radically transformed how modern enterprises run their

business and has also fundamentally changed how today’s large cloud providers

operate. For example, as these large cloud providers offer an increasing number

of ever-more bandwidth-hungry cloud services, they end up carrying a significant

fraction of today’s Internet traffic. In response, they have started to build out

and operate their private backbone networks and have expanded their service

infrastructure by establishing a presence in a growing number of colocation facilities

at the Internet’s edge. As a result, more and more enterprises across the globe

can directly connect (i.e. peer) with any of the large cloud providers so that much

of the resulting traffic will traverse these providers’ private backbones instead of

being exchanged over the public Internet. Furthermore, to reap the benefits of the

diversity of these cloud providers’ service offerings, enterprises are rapidly adopting

multi-cloud deployments in conjunction with multi-cloud strategies (i.e., end-to-end

connectivity paths between multiple cloud providers).

       While prior studies have focused mainly on various topological and

performance-related aspects of the Internet as a whole, little to no attention has


been given to how these emerging cloud-based developments impact connectivity

and performance in today’s cloud traffic-dominated Internet. This dissertation

presents the findings of an active measurement study of the cloud ecosystem

of today’s Internet. In particular, the study explores the connectivity options

available to modern enterprises and examines the performance of the cloud traffic

that utilizes the corresponding end-to-end paths. The study’s main contributions

include (i) studying the locality of traffic for major content providers (including

cloud providers) from the edge of the network, (ii) capturing and characterizing the

peering fabric of a major cloud provider, (iii) characterizing the performance of

different multi-cloud strategies and associated end-to-end paths, and (iv) designing

a cloud measurement platform and decision support framework for the construction

of optimal multi-cloud overlays.

       This dissertation contains previously published co-authored material.




                            CURRICULUM VITAE


NAME OF AUTHOR: Bahador Yeganeh


GRADUATE AND UNDERGRADUATE SCHOOLS ATTENDED:

     University of Oregon, Eugene, OR, USA
     Isfahan University of Technology, Isfahan, Iran


DEGREES AWARDED:

     Doctor of Philosophy, Computer and Information Science, 2019, University
        of Oregon
     Bachelor of Science, Computer Engineering, 2013, Isfahan University of
        Technology


AREAS OF SPECIAL INTEREST:

     Internet Measurement
     Internet Topology
     Cloud Computing
     Network Overlays


PROFESSIONAL EXPERIENCE:


     Graduate Research Assistant, University of Oregon, Eugene, OR, USA,
       2013-2019

     Software Engineer, PANA Co, Isfahan, Iran, 2010-2013

     Summer Intern, InfoProSys, Isfahan, Iran, 2008




GRANTS, AWARDS AND HONORS:



     Internet Measurement Conference (IMC) Travel Grant, 2018

     Gurdeep Pall Scholarship in Computer & Information Science, University of
       Oregon, 2018

     Phillip Seeley Scholarship in Computer & Information Science, University of
        Oregon, 2017

     J. Hubbard Scholarship in Computer & Information Science, University of
        Oregon, 2014




PUBLICATIONS:


     Bahador Yeganeh, Ramakrishnan Durairajan, Reza Rejaie, & Walter
       Willinger (2020). Tondbaz: A Measurement-Informed Multi-cloud
       Overlay Service. SIGCOMM - In Preparation

     Bahador Yeganeh, Ramakrishnan Durairajan, Reza Rejaie, & Walter
       Willinger (2020). A First Comparative Characterization of Multi-cloud
       Connectivity in Today’s Internet. Passive and Active Measurement
       Conference (PAM) - In Submission

     Bahador Yeganeh, Ramakrishnan Durairajan, Reza Rejaie, & Walter
       Willinger (2019). How Cloud Traffic Goes Hiding: A Study of Amazon’s
       Peering Fabric. Internet Measurement Conference (IMC)

     Reza Motamedi, Bahador Yeganeh, Reza Rejaie, Walter Willinger,
        Balakrishnan Chandrasekaran, & Bruce Maggs (2019). On Mapping
        the Interconnections in Today’s Internet. Transactions on Networking
        (TON)

     Bahador Yeganeh, Reza Rejaie, & Walter Willinger (2017). A View
       From the Edge: A Stub-AS Perspective of Traffic Localization and its
       Implications. Network Traffic Measurement and Analysis Conference
       (TMA)




                                ACKNOWLEDGEMENTS



          I would like to thank my parents and sister for their sacrifices, endless

support throughout all stages of my life and for encouraging me to always pursue

my goals and dreams even if it meant that I would be living thousands of miles

away from them.

          I thank my advisor Prof. Reza Rejaie for providing me with the opportunity

to pursue a doctoral degree at UO. I am grateful to him as well as my co-advisor

Prof. Ramakrishnan Durairajan and collaborator Dr. Walter Willinger for their

guidance and feedback in every step of my PhD, the numerous late nights they

stayed awake to help me meet deadlines, and for instilling perseverance in me.

Completing this dissertation wouldn’t have been possible without each one of you.

          I am thankful to my committee members Prof. Jun Li, Prof. Allen Malony,

and Prof. David Levin for their valuable input and guidance on shaping the

direction of my dissertation and for always being available even on the shortest

notice.

          Lastly, I am grateful for all the wonderful friendships that I have formed
over the past years. These friends have been akin to a second family, and their
support and help through the highs and lows of my life have been invaluable to me.




                                TABLE OF CONTENTS

Chapter                                                                                   Page



I.   INTRODUCTION           . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      1

     1.1. Challenges in Topology Discovery & Internet Measurement . . . . . .                3

     1.2. Dissertation Scope & Contributions . . . . . . . . . . . . . . . . . . .           4

          1.2.1. Locality of Traffic Footprint . . . . . . . . . . . . . . . . . . .         5

          1.2.2. Discovery of Cloud Peering Topology . . . . . . . . . . . . . .             5

          1.2.3. Cloud Connectivity Performance . . . . . . . . . . . . . . . .              6

          1.2.4. Optimal Cloud Overlays . . . . . . . . . . . . . . . . . . . . .            7

     1.3. Dissertation Outline . . . . . . . . . . . . . . . . . . . . . . . . . . .         8

          1.3.1. Navigating the Chapters . . . . . . . . . . . . . . . . . . . . .           8


II. RELATED WORK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .                10

     2.1. Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .        11

     2.2. Tools & Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . .        12

          2.2.1. Measurement Tools & Platforms . . . . . . . . . . . . . . . . .            14

                  2.2.1.1. Path Discovery . . . . . . . . . . . . . . . . . . . . .         14

                  2.2.1.2. Alias Resolution . . . . . . . . . . . . . . . . . . . .         17

                  2.2.1.3. Interface Name Decoding . . . . . . . . . . . . . . .            19

          2.2.2. Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       20

                  2.2.2.1. BGP Feeds & Route Policies . . . . . . . . . . . . .             20

                  2.2.2.2. Colocation Facility Information . . . . . . . . . . . .          21

                  2.2.2.3. IXP Information . . . . . . . . . . . . . . . . . . . .          22

                  2.2.2.4. IP Geolocation . . . . . . . . . . . . . . . . . . . . .         22



   2.3. Capturing Network Topology . . . . . . . . . . . . . . . . . . . . . .            23

          2.3.1. AS-Level Topology . . . . . . . . . . . . . . . . . . . . . . . .        25

                 2.3.1.1. Graph Generation & Modeling . . . . . . . . . . . .             27

                 2.3.1.2. Topology Incompleteness . . . . . . . . . . . . . . . .         28

                 2.3.1.3. IXP Peerings . . . . . . . . . . . . . . . . . . . . . .        32

          2.3.2. Router-Level Topology . . . . . . . . . . . . . . . . . . . . . .        37

                 2.3.2.1. Peering Inference . . . . . . . . . . . . . . . . . . . .       38

                 2.3.2.2. Geo Locating Routers & Remote Peering . . . . . . .             44

          2.3.3. PoP-Level Topology . . . . . . . . . . . . . . . . . . . . . . .         49

          2.3.4. Physical-Level Topology . . . . . . . . . . . . . . . . . . . . .        56

   2.4. Implications & Applications of Network Topology . . . . . . . . . . .             60

          2.4.1. Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . .      61

                 2.4.1.1. AS-Level Topology . . . . . . . . . . . . . . . . . . .         62

                 2.4.1.2. Router-Level Topology . . . . . . . . . . . . . . . . .         67

                 2.4.1.3. Physical-Level Topology . . . . . . . . . . . . . . . .         71

          2.4.2. Resiliency . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     73

                 2.4.2.1. AS-Level Topology . . . . . . . . . . . . . . . . . . .         74

                 2.4.2.2. Router-Level Topology . . . . . . . . . . . . . . . . .         76

                 2.4.2.3. Physical-Level Topology . . . . . . . . . . . . . . . .         78

          2.4.3. AS Relationship Inference . . . . . . . . . . . . . . . . . . . .        80

                 2.4.3.1. AS-Level . . . . . . . . . . . . . . . . . . . . . . . .        80

                 2.4.3.2. PoP-Level . . . . . . . . . . . . . . . . . . . . . . . .       82


III. LOCALITY OF TRAFFIC . . . . . . . . . . . . . . . . . . . . . . . . . .              84

   3.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      84

   3.2. Data Collection for a Stub-AS: UOnet . . . . . . . . . . . . . . . . .            87

  3.3. Identifying Major Content Providers . . . . . . . . . . . . . . . . . .         89

  3.4. Traffic Locality for Content Providers . . . . . . . . . . . . . . . . . .      95

  3.5. Traffic From Guest Servers . . . . . . . . . . . . . . . . . . . . . . . .      98

          3.5.1. Detecting Guest Servers . . . . . . . . . . . . . . . . . . . . .     99

          3.5.2. Relative Locality of Guest Servers . . . . . . . . . . . . . . . . 102

  3.6. Implications of Traffic Locality . . . . . . . . . . . . . . . . . . . . . . 103

  3.7. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108


IV. CLOUD PEERING ECOSYSTEM                 . . . . . . . . . . . . . . . . . . . . . 110

  4.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

  4.2. Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

  4.3. Data Collection & Processing . . . . . . . . . . . . . . . . . . . . . . 115

  4.4. Inferring Interconnections . . . . . . . . . . . . . . . . . . . . . . . . 117

          4.4.1. Basic Inference Strategy . . . . . . . . . . . . . . . . . . . . . 117

          4.4.2. Second Round of Probing to Expand Coverage . . . . . . . . . 119

  4.5. Verifying Interconnections . . . . . . . . . . . . . . . . . . . . . . . . 120

          4.5.1. Checking Against Heuristics . . . . . . . . . . . . . . . . . . . 120

          4.5.2. Verifying Against Alias Sets . . . . . . . . . . . . . . . . . . . 122

  4.6. Pinning Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123

          4.6.1. Methodology for Pinning . . . . . . . . . . . . . . . . . . . . . 123

          4.6.2. Evaluation of Pinning . . . . . . . . . . . . . . . . . . . . . . 130

  4.7. Amazon’s Peering Fabric . . . . . . . . . . . . . . . . . . . . . . . . . 131

          4.7.1. Detecting Virtual Interconnections . . . . . . . . . . . . . . . 131

          4.7.2. Grouping Amazon’s Peerings . . . . . . . . . . . . . . . . . . . 133

          4.7.3. Inferring the Purpose of Peerings . . . . . . . . . . . . . . . . 137

          4.7.4. Characterizing Amazon’s Connectivity Graph . . . . . . . . . 142

  4.8. Inferring Peering with bdrmap . . . . . . . . . . . . . . . . . . . . . . 144

  4.9. Limitations of Our Study . . . . . . . . . . . . . . . . . . . . . . . . . 146

  4.10. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148


V. CLOUD CONNECTIVITY PERFORMANCE . . . . . . . . . . . . . . . 150

  5.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150

  5.2. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150

  5.3. Measurement Methodology . . . . . . . . . . . . . . . . . . . . . . . . 155

          5.3.1. Deployment Strategy . . . . . . . . . . . . . . . . . . . . . . . 155

          5.3.2. Measurement Scenario & Cloud Providers . . . . . . . . . . . 157

          5.3.3. Data Collection & Performance Metrics . . . . . . . . . . . . . 159

          5.3.4. Representation of Results . . . . . . . . . . . . . . . . . . . . 161

          5.3.5. Ethical and Legal Considerations . . . . . . . . . . . . . . . . 161

  5.4. Characteristics of C2C routes . . . . . . . . . . . . . . . . . . . . . . 162

          5.4.1. Latency Characteristics . . . . . . . . . . . . . . . . . . . . . . 162

          5.4.2. Why do CPP routes have better latency than

                 TPP routes? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164

          5.4.3. Throughput Characteristics . . . . . . . . . . . . . . . . . . . 168

          5.4.4. Why do CPP routes have better throughput

                 than TPP routes? . . . . . . . . . . . . . . . . . . . . . . . . . 170

          5.4.5. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173

  5.5. Characteristics of E2C routes . . . . . . . . . . . . . . . . . . . . . . 173

          5.5.1. Latency Characteristics . . . . . . . . . . . . . . . . . . . . . . 173

          5.5.2. Why do TPP routes offer better latency than

                 BEP routes? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173

          5.5.3. Throughput Characteristics . . . . . . . . . . . . . . . . . . . 174

          5.5.4. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175

  5.6. Discussion and Future Work . . . . . . . . . . . . . . . . . . . . . . . 175

  5.7. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178


VI. OPTIMAL CLOUD OVERLAYS                . . . . . . . . . . . . . . . . . . . . . . 179

  6.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179

  6.2. Tondbaz Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182

          6.2.1. Measurement Platform . . . . . . . . . . . . . . . . . . . . . . 182

                 6.2.1.1. Measurement Agent . . . . . . . . . . . . . . . . . . 183

                 6.2.1.2. Centralized Controller . . . . . . . . . . . . . . . . . 184

          6.2.2. Data Collector . . . . . . . . . . . . . . . . . . . . . . . . . . 185

          6.2.3. Optimization Framework . . . . . . . . . . . . . . . . . . . . . 185

  6.3. A Case for Multi-cloud Overlays . . . . . . . . . . . . . . . . . . . . . 188

          6.3.1. Measurement Setting & Data Collection . . . . . . . . . . . . 188

          6.3.2. Are Cloud Backbones Optimal? . . . . . . . . . . . . . . . . . 189

                 6.3.2.1. Path Characteristics of CP Backbones . . . . . . . . 189

                 6.3.2.2. Performance Characteristics of CP Backbones . . . . 190

                 6.3.2.3. Latency Characteristics of CP Backbones . . . . . . 191

          6.3.3. Are Multi-Cloud Paths Better Than Single

                 Cloud Paths? . . . . . . . . . . . . . . . . . . . . . . . . . . . 192

                 6.3.3.1. Overall Latency Improvements . . . . . . . . . . . . 192

                 6.3.3.2. Intra-CP Latency Improvements . . . . . . . . . . . 194

                 6.3.3.3. Inter-CP Latency Improvements        . . . . . . . . . . . 195

          6.3.4. Are there Challenges in Creating Multi-Cloud Overlays? . . . 196

                 6.3.4.1. Traffic Costs of CP Backbones . . . . . . . . . . . . 196

          6.3.5. Cost Penalty for Multi-Cloud Overlays . . . . . . . . . . . . . 199

          6.3.6. Further Optimization Through IXPs . . . . . . . . . . . . . . 201

   6.4. Evaluation of Tondbaz . . . . . . . . . . . . . . . . . . . . . . . . . . 202

          6.4.1. Case Studies of Optimal Paths . . . . . . . . . . . . . . . . . . 202

          6.4.2. Deployment of Overlays . . . . . . . . . . . . . . . . . . . . . 205

                 6.4.2.1. Empirical vs Estimated Overlay Latencies . . . . . . 207

   6.5. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209


VII.CONCLUSIONS & FUTURE WORK . . . . . . . . . . . . . . . . . . . . 211

   7.1. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211

   7.2. Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213


REFERENCES CITED . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218




                                  LIST OF FIGURES


Figure                                                                                  Page



   1.    Abstract representation for topology of ASA, ASB,

          and ASC in red, blue, and green, respectively. ASA and

         ASB establish a private interconnection inside colo1

         at their LA PoP while peering with each other as well

         as ASC inside colo2 at their NY PoP facilitated by an

         IXP’s switching fabric. . . . . . . . . . . . . . . . . . . . . . . . . . .      13

   2.    Illustration of inferring an incorrect link (b − e) by

          traceroute due to load-balanced paths. Physical links

          and traversed paths are shown with black and red

          lines, respectively. The TTL = 2 probe traverses the

          top path and expires at node b while the TTL = 3

          probe traverses the bottom path and expires at node

          e. This succession of probes causes traceroute to infer

          a non-existent link (b − e). . . . . . . . . . . . . . . . . . . . . . . . .     17

   3.    Illustration of an IXP switch and route server along

         with 4 tenant networks ASa , ASb , ASc , and ASd . ASa

         establishes a bi-lateral peering with ASd (solid red

         line) as well as multi-lateral peerings with ASb and

         ASc (dashed red lines) facilitated by the route server

         within the IXP. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      33





   4.    Illustration of address sharing for establishing an

         inter-AS link between border routers. Although

         the traceroute paths (dashed lines) are identical

         the inferred ownership of router interfaces and the

         placement of the inter-AS link differs for these two possibilities. . . .        38

   5.    Fiber optic backbone map for CenturyLink’s network

         in continental US. Each node represents a PoP for

         CenturyLink while links between these PoPs are

         representative of the fiber optic conduits connecting

         these PoPs together. Image courtesy of CenturyLink. . . . . . . . . .            55

   6.    The volume of delivered traffic from individual top

         content providers to UOnet along with the CDF

         of aggregate fraction of traffic by top 21 content

         providers in the 10/04/16 snapshot. . . . . . . . . . . . . . . . . . . .        90

   7.    The prevalence and distribution of rank for any

         content provider that has appeared among the top

         content providers in at least one daily snapshot.      . . . . . . . . . . .     92

   8.    Distribution of the number of top IPs across different

         snapshots in addition to total number of unique top

         IP addresses (blue line) and the total number of

         unique IPs across all snapshots (red line) for each

         target content provider. . . . . . . . . . . . . . . . . . . . . . . . . . .     93





   9.    Radar plots showing the aggregate view of locality

         based on RTT of delivered traffic in terms of bytes

         (left plot) and flows (right plot) to UOnet in a daily

         snapshot (10/04/2016). . . . . . . . . . . . . . . . . . . . . . . . . . .     94

   10.   Two measures of traffic locality, from top to bottom,

         Summary distribution of NWL and the RTT of the

         closest servers per content provider (or minRTT). . . . . . . . . . . .        98

   11.   Locality (based on RTT in ms) of delivered traffic

         (bytes, left plot; flows, right plot) for Akamai-owned

         servers as well as Akamai guest servers residing within

         three target ASes for snapshot 2016-10-04. . . . . . . . . . . . . . . . 102

   12.   Summary distribution of average throughput for

         delivered flows from individual target content

         providers towards UOnet users across all of our snapshots. . . . . . . 104

   13.   Maximum Achievable Throughput (MAT) vs MinRTT

         for all content providers. The curves show the change

         in the estimated TCP throughput as a function of

         RTT for different loss rates. . . . . . . . . . . . . . . . . . . . . . . . 106

   14.   Average loss rate of closest servers per target content

         provider measured over 24 hours using ping probes

         with 1 second intervals. For each content provider we

         choose at most 10 of the closest IP addresses. . . . . . . . . . . . . . 108





   15.   Overview of Amazon’s peering fabric. Native routers

         of Amazon & Microsoft (orange & blue) establishing

         private interconnections (AS3 - yellow router), public

         peering through IXP switch (AS4 - red router),

         and virtual private interconnections through cloud

         exchange switch (AS1 , AS2 , and AS5 - green routers)

         with other networks. Remote peering (AS5 ) as well as

         connectivity to non-ASN businesses through layer-2

         tunnels (dashed lines) happens through connectivity partners. . . . . 113

   16.   Illustration of a hybrid interface (a) that has both

         Amazon and client-owned interfaces as next hop.        . . . . . . . . . . . 121

   17.   (a) Distribution of min-RTT for ABIs from the closest

         Amazon region, and (b) Distribution of min-RTT

         difference between ABI and CBI for individual

         peering links. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

   18.   Distribution of the ratio of two lowest min-RTT from

         different Amazon regions to individual unpinned

         border interfaces. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129





   19.   Key features of the six groups of Amazon’s peerings

         (presented in Table 7) showing (from top to bottom):

         the number of /24 prefixes within the customer cone

         of peering AS, the number of probed /24 prefixes

         that are reachable through the CBIs of associated

         peerings of an AS, the number of ABIs and CBIs of

          associated peerings of an AS, the difference in RTT of both

         ends of associated peerings of an AS, and the number

         of metro areas which the CBIs of each peering AS

         have been pinned to. . . . . . . . . . . . . . . . . . . . . . . . . . . . 139

   20.   Distribution of ABI (log scale) and CBI degrees in the

          left and right figures, respectively. . . . . . . . . . . . . . . . . . . . 143

   21.   Three different multi-cloud connectivity options. . . . . . . . . . . . . 152

   22.   Our measurement setup showing the locations of

         our VMs from AWS, GCP and Azure. A third-party

         provider’s CRs and line-of-sight links for TPP, BEP,

         and CPP are also shown. . . . . . . . . . . . . . . . . . . . . . . . . . 158





   23.   Rows from top to bottom represent the distribution

         of RTT (using letter-value plots) between AWS, GCP,

         and Azure’s network as the source CP and various

         CP regions for intra (inter) region paths in left (right)

         columns. CPP and TPP routes are depicted in blue

         and orange, respectively. The first two characters of

         the X axis labels encode the source CP region with

         the remaining characters depicting the destination CP

         and region.   . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163

   24.   Comparison of median RTT values (in ms) for CPP

         and TPP routes between different pairs. . . . . . . . . . . . . . . . . 164

   25.   (a) Distribution for number of ORG hops observed on

         intra-cloud, inter-cloud, and cloud to LG paths. (b)

         Distribution of IP (AS/ORG) hop lengths for all paths

         in left (right) plot. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165

   26.   Distribution of RTT between the source CP and the

         peering hop. From left to right plots represent AWS,

         GCP, and Azure as the source CP. Each distribution

         is split based on intra (inter) region values into the

         left/blue (right/orange) halves, respectively. . . . . . . . . . . . . . . 167





   27.   Rows from top to bottom in the letter-value plots

         represent the distribution of throughput between

         AWS’, GCP’s, and Azure’s network as the source

         CP and various CP regions for intra- (inter-) region

         paths in left (right) columns. CPP and TPP routes

         are depicted in blue and orange respectively. . . . . . . . . . . . . . . 169

   28.   Upper bound for TCP throughput using the formula

          of Mathis, Semke, Mahdavi, and Ott (1997) with an

          MSS of 1460 bytes and various latency

         (X axis) and loss-rates (log-scale Y axis) values. . . . . . . . . . . . . 170

   29.   Rows from top to bottom in the letter-value plots

         represent the distribution of loss-rate between AWS,

         GCP, and Azure as the source CP and various CP

         regions for intra- (inter-) region paths in left (right)

         columns. CPP and TPP routes are depicted using

         blue and orange respectively. . . . . . . . . . . . . . . . . . . . . . . . 172

   30.   (a) Distribution of latency for E2C paths between our

         server in AZ and CP instances in California through

         TPP and BEP routes. Outliers on the Y-axis have

         been deliberately cut-off to increase the readability of

         distributions. (b) Distribution of RTT on the inferred

         peering hop for E2C paths sourced from CP instances

         in California. (c) Distribution of throughput for E2C

         paths between our server in AZ and CP instances in

         California through TPP and BEP routes. . . . . . . . . . . . . . . . . 174


   31.   Global regions for AWS, Azure, and GCP. . . . . . . . . . . . . . . . 180

   32.   Overview of components for the measurement system

         including the centralized controller, measurement

         agents, and data-store. . . . . . . . . . . . . . . . . . . . . . . . . . . 183

   33.   Distribution of latency inflation between network

         latency and RTT approximation using speed of light

         constraints for all regions of each CP. . . . . . . . . . . . . . . . . . . 191

   34.   Distribution of median RTT and coefficient of

         variation for latency measurements between all VM

         pairs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192

   35.   Distribution for difference in latency between forward

         and reverse paths for unique paths. . . . . . . . . . . . . . . . . . . . 193

   36.   Distribution for RTT reduction ratio through all,

         intra-CP, and inter-CP optimal paths. . . . . . . . . . . . . . . . . . 193

   37.   Distribution for the number of relay hops along

         optimal paths (left) and the distribution of latency

         reduction percentage for optimal paths grouped based

         on the number of relay hops (right). . . . . . . . . . . . . . . . . . . . 194

   38.   Distribution of latency reduction percentage for intra-

         CP paths of each CP, divided based on the ownership

         of the relay node. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195

   39.   Distribution of latency reduction ratio for inter-CP

         paths of each CP, divided based on the ownership of

         the relay nodes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196




   40.   Cost of transmitting traffic sourced from different

         groupings of AWS regions. Dashed (solid) lines

         present inter-CP (intra-CP) traffic cost. . . . . . . . . . . . . . . . . . 197

   41.   Cost of transmitting traffic sourced from different

         groupings of Azure regions. . . . . . . . . . . . . . . . . . . . . . . . . 198

   42.   Cost of transmitting traffic sourced from different

         groupings of GCP regions. Solid, dashed, and dotted

         lines represent cost of traffic destined to China

         (excluding Hong Kong), Australia, and all other global

          regions, respectively. . . . . . . . . . . . . . . . . . . . . . . . . . . . 199

   43.   Distribution of cost penalty within different latency

         reduction ratio bins for intra-CP and inter-CP paths. . . . . . . . . . 200

   44.   Distribution for RTT reduction percentage through

         CP, IXP, and CP+IXP relay paths. . . . . . . . . . . . . . . . . . . . 203

   45.   Overlay network composed of 2 nodes (V M1 and

         V M3 ) and 1 relay node (V M2 ). Forwarding rules are

         depicted below each node. . . . . . . . . . . . . . . . . . . . . . . . . 206




                                 LIST OF TABLES


Table                                                                               Page



   1.   Topics covered in each chapter of the dissertation. . . . . . . . . . . .      9

   2.   Main features of the selected daily snapshots of our

        UOnet Netflow data. . . . . . . . . . . . . . . . . . . . . . . . . . . .     89

   3.   Number of unique ABIs and CBIs along with their

         fraction with various metadata, before (rows 2-3) and

        after (rows 4-5) /24 expansion probing. . . . . . . . . . . . . . . . . . 119

   4.   Number of candidate ABIs (and corresponding CBIs)

        that are confirmed by individual (first row) and

        cumulative (second row) heuristics. . . . . . . . . . . . . . . . . . . . 122

   5.   The exclusive and cumulative number of anchor

        interfaces by each type of evidence and pinned

        interfaces by our co-presence rules. . . . . . . . . . . . . . . . . . . . 128

   6.   Number (and percentage) of Amazon’s VPIs. These

        are CBIs that are also observed by probes originated

        from Microsoft, Google, IBM, and Oracle’s cloud

        networks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131

   7.   Breakdown of all Amazon peerings based on their key attributes. . . . 134

   8.   Hybrid peering groups along with the number of

        unique ASes for each group. . . . . . . . . . . . . . . . . . . . . . . . 136





   9.    List of selected overlay endpoints (first two columns)

          along with the number of relay nodes for each overlay

         presented in the third column. The default RTT,

         estimated overlay RTT, and empirical RTT are

         presented in the last three columns respectively. . . . . . . . . . . . . 208

   10.   List of selected overlay endpoints (first two columns)

         along with the optimal relay nodes (third column). . . . . . . . . . . 208




                                    CHAPTER I

                                 INTRODUCTION

        Since its inception as a network interconnecting a handful of academic and
military networks, the Internet has evolved constantly, becoming a large-scale
distributed network that spans the globe and is intertwined with every aspect of
our daily lives. Given its importance, we need to study its health, vulnerability,
and connectivity, and this is only made possible through constant network
measurements. Researchers have conducted measurements in order to gain a better
understanding of how traffic is routed through this network, of its connectivity
structure, and of its performance. Our interest in and ability to conduct network
measurements vary both in scope, i.e., the number or size of the networks under
study, and in resolution, i.e., whether we treat networks as single units or examine
finer network elements such as routers.

        The advent of cloud computing can be considered among the most recent
and notable changes in the Internet. Cloud providers (CPs) offer an abundance
of compute and storage resources in centralized regions on an on-demand basis,
and reachability to these remote resources is made possible via the Internet.
Consequently, this shift in the computing paradigm has turned cloud providers
into one of the main end-points of traffic within today’s Internet. These cloud

service offerings have fundamentally changed how business is conducted in all

segments of the private and public sectors. This, in turn, has transformed the way

these companies connect to major cloud service providers to utilize these services.

In particular, many companies prefer to bypass the public Internet and directly

connect to major cloud service providers at a close-by colocation (or colo) facility

to experience better performance when using these cloud services. In response to

these demands, some of the major colo facilities have started to deploy and operate

new switching infrastructure called cloud exchanges CoreSite (2018); Demchenko

et al. (2013). Importantly, in conjunction with this new infrastructure, these colo

providers have also introduced a new interconnection service offering called “virtual

private interconnection (VPI)” Amazon (2018a); Google (2018a); Microsoft (2018a).

By purchasing a single port on the cloud exchange switching fabric in a given

facility, VPIs enable enterprises that are either natively deployed in that facility

to establish direct peering to any number of cloud service providers that are present

on that exchange. Furthermore, there is the emergence of new Internet players in

the form of third-party private connectivity providers (e.g. DataPipe, HopOne,

among others Amazon (2018c); Google (2018b); Microsoft (2018c)). These entities

offer direct, secure, private, layer 3 connectivity between CPs (henceforth referred

to as third-party private (TPP)) and extend the reach of peering points towards

CPs to non-native colo facilities in a wider geographic footprint. TPP routes

bypass the public Internet at Cloud Exchanges CoreSite (2018); Demchenko et al.

(2013) and offer additional benefits to users (e.g. enterprise networks can connect

to CPs without owning an Autonomous System Number, or ASN, or physical

infrastructure).

       The implications of this transformation for the Internet’s interconnection

ecosystem have been profound. First, the on-demand nature of VPIs introduces

a degree of dynamism into the Internet interconnection fabric that has been

missing in the past, when setting up traditional interconnections of the public or

private peering types took days or weeks. Second, once the growing volume of an

enterprise’s traffic enters an existing VPI to a cloud provider, it is handled entirely

by that cloud provider’s private infrastructure (i.e. the cloud provider’s private

backbone that interconnects its own datacenters) and completely bypasses the

public Internet.

        The extensive means of connectivity towards cloud providers, coupled with
the competitive marketplace of multiple CPs, has led enterprises to adopt a multi-
cloud strategy: instead of considering and consuming compute resources as a
utility from a single CP, enterprise networks can, to better satisfy their specific
requirements, pick and choose services from multiple participating CPs (e.g. rent

storage from one CP, compute resources from another) and establish end-to-end

connectivity between them and their on-premises server(s) at the same or different

locations. In the process, they also avoid vendor lock-in, enhance the reliability

and performance of the selected services, and can reduce the operational cost of

deployments. Indeed, according to an industry report from late 2018 Krishna,

Cowley, Singh, and Kesterson-Townes (2018), 85% of the enterprises have already

adopted multi-cloud strategies, and that number is expected to rise to 98% by 2021.

These disparate resources from various CP regions are connected together either
via TPP networks, via the cloud providers’ private (CPP) backbones, or simply via
the best-effort public Internet (BEP).

       The aforementioned market trends collectively showcase the implications

of the cloud computing paradigm on the Internet’s structure and topology and

highlight the need to focus on these emergent technologies in order to have a
correct understanding of the Internet’s structure and operation.

1.1   Challenges in Topology Discovery & Internet Measurement

       The topology of the Internet has been a key enabler for studying routing

of traffic in addition to gaining a better understanding of Internet performance

and resiliency. Measuring the Internet in general, and capturing its topology in
particular, is challenging due to many factors, namely (i) scale: the vast scale of
the Internet as a network spanning the globe limits our ability to fully capture its
structure; (ii) visibility: our view of the Internet is constrained to the perspective
we can glean from the limited number of vantage points available to us; (iii)
dynamics: the Internet is an ever-evolving entity under constant structural change,
and the existence of redundant routes, backup links, and load-balanced paths
further limits our ability to fully capture the current state of its topology; (iv)
tools: researchers have relied on tools that were originally designed for
troubleshooting purposes, and the Internet’s protocol stack lacks any inherent
mechanism for exposing its topology; and (v) intellectual property: many of the
participating entities within the Internet lack incentives for sharing or disclosing
data pertaining to their internal structure, as these data are often key to their
competitive edge.

1.2   Dissertation Scope & Contributions

       In this dissertation, we study and assess the impact of the wide adoption

of CPs on today’s Internet traffic and topology. In a broad sense this dissertation

can be categorized into four main parts, namely (i) studying the locality of traffic

for major content providers (including CPs) from the edge of the network, (ii)

presenting methodologies for capturing the topology surrounding cloud providers

with a special focus on VPIs, which have been overlooked up to this point, (iii)

characterizing and evaluating the performance of various connectivity options

towards CPs, and (iv) designing and presenting a measurement platform to

support the measurement of cloud environments in addition to a decision support



framework for optimal utilization of cloud paths. The following presents an

overview of the main contributions of this dissertation.

          1.2.1   Locality of Traffic Footprint. Serving user requests from

near-by caches or servers has been a powerful technique for localizing Internet

traffic with the intent of providing lower delay and higher throughput to end users

while also lowering the cost for network operators. This basic concept has led to

the deployment of different types of infrastructures of varying degrees of complexity

that large CDNs, CPs, ISPs, and content providers operate to localize their user

traffic. This work assesses the nature and implications of traffic localization as

experienced by end-users at an actual stub-AS. We report on the localization

of traffic for the stub-AS UOnet (AS3582), a Research & Education network

operated by the University of Oregon. Based on a complete flow-level view of the

delivered traffic from the Internet to UOnet, we characterize the stub-AS’s traffic

footprint (i.e. a detailed assessment of the locality of the delivered traffic by all

major content providers), examine how effectively individual content providers utilize

their built-out infrastructures for localizing their delivered traffic to UOnet, and

investigate the impact of traffic localization on perceived throughput by end-users

served by UOnet. Our empirical findings offer valuable insights into important

practical aspects of content delivery to real-world stub-ASes such as UOnet.
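       To make the notion of a traffic footprint more concrete, the sketch below
(Python; the flow records, field names, and numbers are hypothetical and do not
come from the UOnet dataset) shows the basic kind of aggregation this chapter
performs: grouping flow-level records by content provider, ranking providers by
delivered bytes, and summarizing locality via the RTT to the serving hosts.

    # Minimal sketch: aggregate hypothetical flow records per content provider
    # and summarize locality using RTT to the serving host as the proxy.
    from collections import defaultdict
    from statistics import median

    flows = [  # (provider_org, bytes_delivered, rtt_ms_to_server) -- made-up values
        ("Akamai", 9_200_000, 4.1),
        ("Google", 7_500_000, 9.8),
        ("Amazon", 3_100_000, 28.5),
        ("Akamai", 5_400_000, 3.9),
    ]

    bytes_per_org = defaultdict(int)
    rtts_per_org = defaultdict(list)
    for org, nbytes, rtt in flows:
        bytes_per_org[org] += nbytes
        rtts_per_org[org].append(rtt)

    total = sum(bytes_per_org.values())
    for org in sorted(bytes_per_org, key=bytes_per_org.get, reverse=True):
        share = 100.0 * bytes_per_org[org] / total
        print(f"{org:8s} {share:5.1f}% of bytes, median RTT "
              f"{median(rtts_per_org[org]):.1f} ms")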

          1.2.2   Discovery of Cloud Peering Topology. This work’s main

contribution consists of presenting a third-party, cloud-centric measurement study

aimed at discovering and characterizing the unique peerings (along with their

types) of Amazon, the largest cloud service provider in the US and worldwide.

Each peering consists of one or multiple (unique) interconnections between

Amazon and a neighboring Autonomous System (AS) that are typically established

at different colocation facilities around the globe. Our study only utilizes publicly

available information and data (i.e. no Amazon-proprietary data is used) and is

therefore also applicable for discovering the peerings of other large cloud providers.

We describe our technique for inferring peerings towards Amazon and pay special

attention to inferring the VPIs associated with this largest cloud provider. We

also present and evaluate a new method for pinning (i.e. geo-locating) each end

of the inferred interconnections or peering links. Our study provides a first look

at Amazon’s peering fabric. In particular, by grouping Amazon’s peerings based

on their key features, we illustrate the specific role that each group plays in how

Amazon peers with other networks. Overall, our analysis of Amazon’s peering

fabric highlights how (e.g. using virtual and non-BGP peerings) and where (e.g.

at which metro) Amazon’s cloud traffic “goes hiding”; that is, bypasses the public

Internet. In particular, we show that as large cloud providers such as Amazon

aggressively pursue new connect locations closer to the Internet’s edge, VPIs are

an attractive interconnection option as they (i) create shortcuts between enterprises

at the edge of the network and the large cloud providers (i.e. further contributing to

the flattening of the Internet) and (ii) ensure that cloud-related traffic is primarily

carried over the large cloud providers’ private backbones (i.e. not exposed to the

unpredictability of the best-effort public Internet).

         1.2.3    Cloud Connectivity Performance. This work aims to

empirically examine the different types of multi-cloud connectivity options that

are available in today’s Internet and investigate their performance characteristics

using non-proprietary cloud-centric, active measurements. In the process, we are

also interested in attributing the observed characteristics to aspects related to

connectivity, routing strategy, or the presence of any performance bottlenecks. To

study multi-cloud connectivity from a cloud-to-cloud (C2C) perspective, we deploy and interconnect

VMs hosted within and across two different geographic regions or availability

zones (i.e. CA and VA) of three large cloud providers (i.e. Amazon Web Services

(AWS), Google Cloud Platform (GCP) and Microsoft Azure) using the TPP, CPP,

and BEP option, respectively. Using this experimental setup, we first compare

the stability and/or variability in performance across the three connectivity

options using metrics such as delay, throughput, and loss rate over time. We

find that CPP routes exhibit lower latency and are more stable when compared

to BEP and TPP routes. CPP routes also have higher throughput and exhibit

less variation compared to the other two options. In our attempt to explain the

subpar performance of TPP routes, we find that inconsistencies in performance

characteristics are caused by several factors including border routers, queuing

delays, and higher loss-rates of TPP routes. Moreover, we attribute the CPP

routes’ overall superior performance to the fact that each of the CPs has a private

optical backbone, there exists rich inter-CP connectivity, and that the CPs’ traffic

always bypasses (i.e. is invisible to) BEP transits.
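       A useful way to see why lower loss rates and lower latency jointly translate
into the observed throughput advantage is the single-flow TCP throughput ceiling
of Mathis, Semke, Mahdavi, and Ott (1997), which later chapters use (e.g. Figures
13 and 28, with an MSS of 1460 bytes) to put measured RTT and loss-rate values
into perspective. The short sketch below evaluates that bound with purely
illustrative numbers.

    # Mathis et al. (1997) upper bound on single-flow TCP throughput:
    #   rate <= (MSS / RTT) * C / sqrt(p), with C = sqrt(3/2) for periodic loss.
    from math import sqrt

    def mathis_mbps(mss_bytes=1460, rtt_ms=70.0, loss=1e-4, c=sqrt(1.5)):
        bytes_per_sec = (mss_bytes / (rtt_ms / 1000.0)) * c / sqrt(loss)
        return bytes_per_sec * 8 / 1e6

    # The same illustrative 70 ms path at two loss rates:
    print(f"{mathis_mbps(loss=1e-4):.1f} Mbps")   # ~20 Mbps at 0.01% loss
    print(f"{mathis_mbps(loss=1e-3):.1f} Mbps")   # ~6 Mbps at 0.1% loss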

         1.2.4    Optimal Cloud Overlays. This work focuses on the design

of a measurement platform for multi-cloud environments aimed at gaining a

better understanding of the connectivity and performance characteristics of inter-

cloud connectivity paths. We demonstrate the applicability of this platform by

deploying it on all available regions of the top three CPs (i.e. Amazon, Microsoft,

and Google) and measure the latency among all regions. Furthermore, we capture

the traffic cost models of each CP based on publicly available resources. The

measured latencies and cost models are utilized by our optimal overlay construction

framework that is capable of constructing overlay networks composed of network

paths within the backbone of CP networks. These overlays satisfy the deployment

requirements of an enterprise in terms of target regions and overall traffic budget.
Overall, our results demonstrate, first, that CP networks are tightly interconnected
with each other. Second, multi-cloud paths exhibit higher latency reductions than single

cloud paths; e.g., 67% of all paths, 54% of all intra-CP paths, and 74% of all inter-

CP paths experience an improvement in their latencies. Third, although traffic

costs vary from location to location and across CPs, the costs are not prohibitively

high.
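       As an illustration of the kind of decision the overlay construction framework
makes, the sketch below picks, for a pair of endpoint regions, the lowest-latency
single-relay detour whose per-GB egress cost stays within a given budget. The
region names, RTTs, and egress prices are made up for illustration, and this is not
the actual Tondbaz implementation described in Chapter VI.

    # Minimal sketch of budget-constrained relay selection for a multi-cloud overlay.
    # RTTs (ms) and egress costs ($/GB) below are illustrative placeholders.
    RTT = {
        ("aws-ca", "gcp-va"): 72.0,
        ("aws-ca", "azure-tx"): 34.0,
        ("azure-tx", "gcp-va"): 31.0,
    }
    COST = {"aws-ca": 0.09, "azure-tx": 0.087, "gcp-va": 0.12}  # egress $/GB per region

    def rtt(a, b):
        return RTT.get((a, b)) or RTT.get((b, a))

    def best_one_hop_relay(src, dst, budget_per_gb):
        """Return (path, rtt_ms) for the lowest-latency option within the cost budget."""
        best = ([src, dst], rtt(src, dst))                 # default: the direct CP path
        for relay in COST:
            if relay in (src, dst):
                continue
            detour_rtt = rtt(src, relay) + rtt(relay, dst)
            detour_cost = COST[src] + COST[relay]          # pay egress at src and relay
            if detour_cost <= budget_per_gb and detour_rtt < best[1]:
                best = ([src, relay, dst], detour_rtt)
        return best

    print(best_one_hop_relay("aws-ca", "gcp-va", budget_per_gb=0.20))
    # -> (['aws-ca', 'azure-tx', 'gcp-va'], 65.0): the relay beats the 72 ms direct path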

1.3     Dissertation Outline

        The remainder of this thesis is organized as follows. We provide a

background and overview of studies related to topology discovery and performance

characteristics of Internet routes in Chapter II. Next, in Chapter III we characterize

the locality of Internet traffic from an edge perspective and demonstrate that

the majority of Internet traffic can be attributed to CDNs and cloud providers.

Chapter IV presents our work on the discovery of Amazon’s peering ecosystem

with a special focus on VPIs. We evaluate and characterize the performance of

different connectivity options in a multi-cloud setting within Chapter V. Chapter

VI presents our proposed measurement platform for multi-cloud environments and

showcases the applicability of this measurement platform in the creation of optimal

overlays. We conclude and summarize our contributions in Chapter VII.

          1.3.1   Navigating the Chapters. This dissertation studies the effects

of cloud providers on the Internet from multiple perspectives, including (i) traffic,
(ii) topology, (iii) performance, and (iv) multi-cloud deployments. The chapters

presented in this dissertation can be read independently. A reader interested in



Table 1. Topics covered in each chapter of the dissertation.

   Chapter           Traffic         Topology       Performance      Multi-Cloud

       3                X                                  X

       4                                 X

       5                                 X                X                X

       6                                 X                X                X


individual topics can refer to Table 1 for a summary of topics that are covered in

each chapter.




                                    CHAPTER II

                                 RELATED WORK

        This chapter presents a collection of prior studies on various aspects of
Internet measurement that aim to gain insight into the topology of the Internet as
well as its implications for designing applications. For Internet measurement, we
focus on recent studies regarding the simulation and characterization of Internet
topology. Furthermore, we organize these studies based on the resolution of the
uncovered topology, with an emphasis on the utilized datasets and employed
methodologies. In the second part, we focus on various implications of Internet
topology for the design and performance of applications. These studies are
organized according to the implications of topology for the performance or
resiliency of the Internet. Furthermore, we emphasize how different resolutions of
Internet topology allow researchers to conduct different studies. Collectively, these
studies present a handful of open and interesting problems regarding the future of
Internet topology

with the advent of cloud providers and their centrality within today’s Internet.

       The rest of this chapter is organized as follows. First, in Section 2.1 we

present a primer on the Internet and introduce the reader to a few taxonomies

that are frequently used within this document. Second, an overview of the most

common datasets, platforms, and tools which are used for topology discovery is

given in Section 2.2. Third, a review of recent studies on Internet topology

discovery is presented in Section 2.3. Lastly, Section 2.4 covers the recent studies

which utilize Internet topologies to study the performance and resiliency of the

Internet.




2.1   Background

       The Internet is a globally federated network composed of many networks

each of which has complete autonomy over the structure and operation of its own

network. These autonomous systems (ASes) can be considered the building blocks
of the Internet. Each AS represents a virtual entity and may comprise a vast
network infrastructure of networking equipment such as routers and switches as
well as transmission media such as Ethernet and fiber-optic cables. These ASes can
serve various purposes such as providing transit

or connectivity for other networks, generating or offering content such as video

streams, or merely represent the network of an enterprise. Each of the connectivity

provider ASes can be categorized into multiple tiers based on their size and how

they are interconnected with other ASes. These tiers create a natural hierarchy

of connectivity that is broadly composed of 3 tiers namely, (i) Tier-1: an AS that

can reach all other networks without the need to pay for its traffic exchanges, (ii)

Tier-2: an AS which can have some transit-free relations with other ASes while

still needing to pay for transit for reachability to some portion of the Internet, and

(iii) Tier-3: an AS that solely purchases transit for connectivity to the Internet.

While each network has full control over its own internal network and can deliver data from one internal node to another, transmitting data from one AS to another requires awareness of a path that can reach the destination AS. This problem is solved by having each AS advertise its own address space to neighboring ASes through the Border Gateway Protocol (BGP). Upon receiving a BGP announcement, an AS prepends its own AS number (ASN) to the AS-path attribute of the announcement and advertises the message to its own neighbors. This procedure allows ASes to learn about other networks and the set of AS-paths, or routes, through which they can be reached. ASes can interconnect with each other by linking their

border routers at one or multiple physical locations. These border routers are responsible for advertising their prefixes in addition to performing the actual routing of traffic within the Internet. The border routers of ASes are placed within colocation facilities (colos) that offer space, power, security, and networking equipment to their tenant ASes. Each AS can have a physical presence in multiple metro areas. The collection of an AS's routers within each of these metro areas is referred to as a point of presence (PoP). Figure 1 presents a high-level abstraction of the aforementioned concepts. The figure consists of three ASes, namely ASA, ASB, and ASC, shown in red, blue, and green, respectively. The internal structure of each AS is abstracted away, presenting only its border routers. ASA and ASB have two PoPs, one in LA and another in NY, while ASC is only present in NY. ASA and ASB establish a private interconnection with each other through their LA PoPs within colo1, while they peer with each other as well as ASC at their NY PoPs in colo2 through an IXP's switching fabric.
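       To make the preceding description of BGP route propagation concrete, the following minimal Python sketch simulates how an announcement's AS-path grows as it is re-advertised hop by hop. The topology and AS numbers are hypothetical, and the model deliberately ignores policy, prefix aggregation, and best-path selection; it is a sketch of the prepending mechanism only.

    # Minimal simulation of BGP AS-path propagation over a hypothetical AS topology.
    # Each AS prepends its own ASN before re-advertising to its neighbors.
    from collections import deque

    NEIGHBORS = {            # adjacency between (hypothetical) ASNs
        65001: [65002],
        65002: [65001, 65003, 65004],
        65003: [65002],
        65004: [65002],
    }

    def propagate(origin_asn, prefix):
        """Flood an announcement and record the AS-path learned by every AS."""
        learned = {origin_asn: []}             # origin's own path is empty (locally originated)
        queue = deque([origin_asn])
        while queue:
            asn = queue.popleft()
            for neighbor in NEIGHBORS[asn]:
                if neighbor in learned:
                    continue                   # keep only the first path learned
                # the advertising AS prepends itself to the path it announces
                learned[neighbor] = [asn] + learned[asn]
                queue.append(neighbor)
        return {asn: (prefix, path) for asn, path in learned.items()}

    if __name__ == "__main__":
        for asn, (prefix, path) in sorted(propagate(65001, "192.0.2.0/24").items()):
            print(f"AS{asn} reaches {prefix} via AS-path {path or ['(local)']}")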

2.2   Tools & Datasets

       This section provides an overview of various tools and datasets that have

been commonly used by the measurement community for discovering Internet

topology. We aim to familiarize the reader with these tools and datasets as they are used throughout the literature. Researchers have utilized a wide range of tools for the discovery of topologies; they range from generic network

troubleshooting tools such as traceroute or paris-traceroute to tools developed

by the Internet measurement community such as Sibyl or MIDAR. Furthermore,

researchers have benefited from many measurement platforms such as RIPE Atlas




Figure 1. Abstract representation of the topology of ASA, ASB, and ASC in red, blue, and green, respectively. ASA and ASB establish a private interconnection inside colo1 at their LA PoP while peering with each other as well as ASC inside colo2 at their NY PoP, facilitated by an IXP's switching fabric.


or PlanetLab which enable them to perform their measurements from a diverse set

of ASes and geographic locations.

       In addition to the aforementioned toolsets, researchers have benefited from various datasets in their work. These datasets are collected by a few well-known projects in the Internet measurement community, such as Routeviews University of Oregon (2018), CAIDA's Ark CAIDA (2018), and CAIDA's AS relationships datasets, or stem from other sources such as IP-to-geolocation datasets or information readily available on colocation facility and IXP operators' websites.

       The remainder of this section is organized within two subsections. First, §2.2.1 provides an overview of the most commonly used tools and platforms for Internet topology discovery. Second, §2.2.2 gives a brief overview of the datasets that appear in the literature presented within §2.3 and §2.4.




         2.2.1     Measurement Tools & Platforms. Broadly speaking, the tools used for Internet topology discovery can be categorized into three groups, namely (i) path discovery, (ii) alias resolution, and (iii) interface name decoding.

         2.2.1.1     Path Discovery. Although originally developed for troubleshooting purposes, traceroute Jacobson (1989) has become one of the prominent tools used within the Internet measurement community. traceroute displays the set of intermediate router interfaces that are traversed towards a specific destination along the forward path. This is made possible by sending packets towards the destination with incrementally increasing TTL values; each router along the path decreases the TTL value before forwarding the packet. If a router encounters a packet whose TTL has expired, it drops the packet and sends a notification message carrying its own source address back to the originator of the packet. This, in turn, allows the originator of these packets to identify the source addresses of router interfaces along the forward path. Deployment of load-balancing mechanisms by routers that rely on packet header fields can lead to inaccurate and incomplete paths being reported by traceroute. Figure 2 illustrates an example of incorrect inferences by traceroute in the presence of load-balanced paths. Node a is a load balancer and multiplexes packets between the top and bottom paths. In this example, the TTL = 2 probe originated from the source traverses the top path and expires at node b, while the TTL = 3 probe goes through the bottom path and terminates at node e. These successive probes cause traceroute to incorrectly infer a non-existent link between nodes b and e. To address this problem, Augustin, Friedman, and Teixeira (2007) developed paris-traceroute, which relies on packet header contents to force load balancers to pick a single route for all probes of a single traceroute session. Furthermore, paris-traceroute uses a stochastic probing algorithm in order to enumerate all possible interfaces and links at each hop.
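       The TTL-expiration mechanism described above can be reproduced in a few lines of Python. The sketch below sends empty UDP probes towards a high, presumably unused port with rising TTLs and listens for the resulting ICMP messages on a raw socket; it requires raw-socket privileges, ignores load balancing, does not match replies to specific probes, and sends a single probe per hop, so it is a toy rather than a substitute for paris-traceroute or scamper.

    # Toy traceroute: UDP probes with increasing TTLs, raw ICMP receive (needs root).
    import socket

    def traceroute(dest, max_hops=30, dport=33434, timeout=2.0):
        dest_ip = socket.gethostbyname(dest)
        hops = []
        for ttl in range(1, max_hops + 1):
            recv = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP)
            send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
            recv.settimeout(timeout)
            send.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
            send.sendto(b"", (dest_ip, dport))        # probe towards a high, unused port
            try:
                # first ICMP packet seen is assumed to answer our probe (simplification)
                _, addr = recv.recvfrom(512)          # time-exceeded or port-unreachable
                hop = addr[0]
            except socket.timeout:
                hop = None                            # hop did not answer in time
            finally:
                send.close()
                recv.close()
            hops.append((ttl, hop))
            if hop == dest_ip:                        # reached the destination
                break
        return hops

    if __name__ == "__main__":
        for ttl, hop in traceroute("example.com"):
            print(ttl, hop or "*")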

       Given the scale of the Internet and its geographic span, relying on a single vantage point (VP) to conduct topology discovery studies would likely lead to incomplete or inaccurate inferences. Researchers have relied on various active measurement platforms which either host a pre-defined set of tools, e.g., Dasu, Bismark, Dimes, Periscope, and RIPE Atlas Giotsas, Dhamdhere, and Claffy (2016); RIPE NCC (2016); Sánchez et al. (2013); Shavitt and Shir (2005); Sundaresan, Burnett, Feamster, and De Donato (2014), or provide full access control, e.g., PlanetLab, CAIDA Archipelago, and GENI Berman et al. (2014); Chun et al. (2003); Hyun (2006), allowing users to conduct their measurements from a diverse set of networks and geographic locations. For example, RIPE Atlas RIPE NCC (2016) is composed of many small measurement devices (10k at the time of this survey) that are voluntarily hosted within many networks on a global scale. Hosting RIPE Atlas nodes gives credit to the hosting entity, which can later be used to conduct latency (ping) and reachability (traceroute and paris-traceroute) measurements. Periscope Giotsas et al. (2016) is another platform that provides a unified interface for probing around 1.7k publicly available looking glasses (LGs), which offer a web interface for conducting basic network commands (ping, traceroute, and bgp) on routers hosted in roughly 0.3k ASes. Periscope VPs are located at core ASes, while RIPE Atlas probes are hosted in a mix of core and edge networks. Dasu Sánchez et al. (2013), on the other hand, mainly consists of VPs at edge networks, more specifically broadband users relying on ISPs for Internet connectivity. Dasu consists of a plugin for the Vuze BitTorrent client that is able to conduct network measurements from the computers of users who have installed the plugin on their Vuze client. The authors of Dasu incentivize its adoption by reporting broadband network characteristics to its users. Cunha et al. (2016) developed a route oracle platform named Sibyl which allows users to define the path requirements for their measurements through an expressive input language based on symbolic regular expressions, after which Sibyl selects the source (LG) and destination pair that has the highest likelihood of satisfying the user's path requirements based on its internal model.

       Lastly, considering the large number of Internet hosts and networks, researchers have developed a series of tools that allow them to conduct large-scale measurements in parallel. The methodology of paris-traceroute has been incorporated in scamper Luckie (2010), an extensible packet prober that implements various common network measurement functionalities, such as traceroute, ping, and alias resolution, in a single tool. scamper is able to conduct measurements in parallel without exceeding a predefined probing rate. While scamper runs multiple measurements in parallel, each individual measurement is conducted sequentially; this can limit its probing rate and requires the probing device to maintain state for each measurement. yarrp Beverly (2016); Beverly, Durairajan, Plonka, and Rohrer (2018) is a high-rate, IPv4- and IPv6-capable, Internet-scale probing tool inspired by the stateless design principles of ZMap Durumeric, Wustrow, and Halderman (2013) and masscan Graham, Mcmillan, and Tentler (2014). yarrp randomly permutes the IP and TTL space and encodes the state information of each probe within IP and TCP header fields (which are included in the ICMP response), and is therefore able to conduct traceroute probes in parallel without incrementally increasing the TTL value.
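       The two ideas behind this statelessness, randomly permuting the probe space and recovering per-probe state from fields echoed back inside the ICMP quotation, can be illustrated with the toy sketch below. The target list, the use of the UDP source port to carry the original TTL, and the fixed shuffle seed are illustrative assumptions and do not reflect yarrp's actual wire format.

    # Toy illustration of stateless, randomly permuted probing (yarrp-style idea).
    # The probe's original TTL is encoded in a header field that the ICMP
    # time-exceeded quotation echoes back, so no per-probe state table is needed.
    import random

    TARGETS = ["192.0.2.1", "198.51.100.7", "203.0.113.9"]    # hypothetical targets
    TTLS = range(1, 17)

    def probe_schedule(seed=42):
        """Return every (target, ttl) pair exactly once, in a random order."""
        space = [(dst, ttl) for dst in TARGETS for ttl in TTLS]
        random.Random(seed).shuffle(space)
        return space

    def encode_state(ttl, base_port=33434):
        # Carry the original TTL in the UDP source port; the quoted UDP header in
        # the ICMP response lets the receiver recover it without a lookup table.
        return base_port + ttl

    def decode_state(src_port, base_port=33434):
        return src_port - base_port

    for dst, ttl in probe_schedule():
        sport = encode_state(ttl)
        # send_udp_probe(dst, ttl=ttl, sport=sport)   # actual sending omitted here
        assert decode_state(sport) == ttl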



Figure 2. Illustration of an incorrect link (b − e) inferred by traceroute due to load-balanced paths. Physical links and traversed paths are shown with black and red lines, respectively. The TTL = 2 probe traverses the top path and expires at node b while the TTL = 3 probe traverses the bottom path and expires at node e. This succession of probes causes traceroute to infer a non-existent link (b − e).


         2.2.1.2    Alias Resolution. Paths obtained via the tools outlined in §2.2.1.1 specify the router interfaces that are encountered along the forward path. It is possible to observe multiple interfaces of a single router within different traceroute paths, but the association of these interfaces with a single physical router is not clear from these outputs. Alias resolution tools have been developed to solve this issue. These tools accept a set of interface addresses as input and produce a collection of interface sets, each of which corresponds to a single router. Alias resolution tools can broadly be categorized into two groups, namely (i) probing-based Bender, Sherwood, and Spring (2008); Govindan and Tangmunarunkit (2000); Keys, Hyun, Luckie, and Claffy (2013); Spring, Mahajan, and Wetherall (2002); Tozal and Sarac (2011) and (ii) inference-based M. Gunes and Sarac (2009); M. H. Gunes and Sarac (2006); Sherwood, Bender, and Spring (2008); Spring, Dontcheva, Rodrig, and Wetherall (2004) techniques. The former require a VP which probes the interfaces in question to identify sets of interfaces that belong to the same router. Probing-based techniques mostly rely on the IP ID field, which is used for reassembling fragmented packets at the network layer. These techniques assume that routers rely on a single central incremental counter which assigns these ID values regardless of the interface. Given this assumption, Ally Spring et al. (2002) probes IPs with UDP packets having high port numbers (most likely not in use) to induce an ICMP port unreachable response. Ally infers IP addresses to be aliases if successive probes have incrementing ID values within a short distance. Radargun Bender et al. (2008) tries to address the probing complexity of Ally (O(n^2)) by iteratively probing IPs and inferring aliases based on the velocity of IP ID increments for each IP. MIDAR Keys et al. (2013) presents a precise methodology for probing a large-scale pool of IP addresses by first eliminating unlikely IP aliases using a velocity test. Aliases are then inferred by comparing the monotonicity of the IP ID time series for multiple target IP addresses. MIDAR utilizes ICMP, TCP, and UDP probes to increase the likelihood of receiving responses from each router/interface. Palmtree Tozal and Sarac (2011) probes the /30 or /31 mates of target IPs using a TTL value inferred to expire at the router in question, inducing an ICMP_TTL_EXPIRED response from another interface of that router. Assuming no path changes have happened between measuring the router's hop distance and the time the ICMP_TTL_EXPIRED message is generated, the source address of the ICMP_TTL_EXPIRED message should reside on the same router as the target IP, and the two addresses are therefore inferred to be aliases.
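       A minimal version of the shared-counter test used by Ally-style probers can be sketched as follows; probe_ip_id stands in for an actual prober that elicits a response from an address and returns the IP ID of that response, and the threshold is illustrative.

    # Sketch of an Ally-style shared IP-ID counter test (illustrative threshold).
    # Two interfaces drawing IP IDs from one central counter should return
    # interleaved, monotonically increasing IDs when probed in quick succession.
    def shared_counter(probe_ip_id, addr_a, addr_b, max_gap=200):
        """probe_ip_id(addr) -> IP ID of the response (hypothetical helper)."""
        ids = [probe_ip_id(a) for a in (addr_a, addr_b, addr_a, addr_b)]
        # allow for the 16-bit counter wrapping around between probes
        deltas = [(ids[i + 1] - ids[i]) % 65536 for i in range(len(ids) - 1)]
        # aliases: every successive ID is a small positive increment of the previous one
        return all(0 < d < max_gap for d in deltas)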

       Inference-based techniques accept a series of traceroute outputs and rely on a set of constraints and assumptions regarding the setting and environment in which routers are deployed to infer which interfaces are most likely part of the same router. Spring et al. suggest a common-successor heuristic that attributes IP addresses on the prior hop to the same router. This heuristic assumes that no layer-2 devices are present between the two routers in question. Analytical Alias Resolution (AAR) M. H. Gunes and Sarac (2006) infers aliases from symmetric traceroute pairs by pairing interface addresses using the common address-sharing convention of assigning a /30 or /31 prefix to the interfaces on both ends of a physical link. This method requires the routes between both end-pairs to be symmetrical. DisCarte Sherwood et al. (2008) relies on the IP record-route option to capture the forward and reverse interfaces for the first nine hops of a traceroute. Limited support and varying record-route implementations by routers, in addition to the high complexity of the inference algorithm, limit its applicability to large-scale scenarios.
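       The /30 and /31 address-pairing convention that AAR and Palmtree exploit is easy to compute; the helper below returns the candidate mate address(es) of an interface IP and is a simplified sketch of that convention.

    # Candidate "mate" addresses of an interface under the /31 and /30 conventions.
    import ipaddress

    def mate_candidates(addr):
        """Return (mate under /31 convention, mate under /30 convention or None)."""
        n = int(ipaddress.IPv4Address(addr))
        mate_31 = n ^ 1                       # /31: the two addresses differ in the last bit
        base = n & ~0x3                       # /30: .0 network, .1/.2 hosts, .3 broadcast
        offset = n - base
        mate_30 = base + (3 - offset) if offset in (1, 2) else None
        fmt = lambda x: str(ipaddress.IPv4Address(x)) if x is not None else None
        return fmt(mate_31), fmt(mate_30)

    print(mate_candidates("10.0.0.1"))        # ('10.0.0.0', '10.0.0.2')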

         2.2.1.3    Interface Name Decoding. Reverse DNS (RDNS) entries for observed interface addresses can be a source of information for Internet topology researchers. Port type, port speed, geolocation, interconnecting AS, and IXP name are examples of information which can be decoded from the RDNS entries of router interfaces. These pieces of information are embedded by network operators within RDNS entries for ease of management, in accordance with a (mostly) structured convention. For example, ae-4.amazon.atlnga05.us.bb.gin.ntt.net is an RDNS entry for a router interface residing on a border router of NTT (ntt.net) within Atlanta, GA (atlnga) interconnecting with Amazon. Embedding this information is completely optional, and the structure of this information varies from one AS to another. Several tools have been developed to parse and extract the embedded information within RDNS entries Chabarek and Barford (2013); Huffaker, Fomenkov, et al. (2014); Scheitle, Gasser, Sattler, and Carle (2017); Spring et al. (2002). Spring et al. extracted DNS-encoded information for the ISPs under study in their Rocketfuel project Spring et al. (2002). As part of this process, they relied on the city code names compiled in Padmanabhan and Subramanian (2001) to search for domain names which encode geo information in their name. PathAudit Chabarek and Barford (2013) is an extension to traceroute which reports encoded information within observed router hops. In addition to geo information, PathAudit reports on interface type, port speed, and the manufacturing vendor of the router. The authors of PathAudit extract common encodings (tags) from device configuration parameters, operator observations, and common naming conventions. Using this set of tags, RDNS entries from CAIDA's Ark project CAIDA (2018) are parsed to match against one or more of these tags. A clustering algorithm is employed to identify similar naming structures within domains of a common top-level domain (TLD). These common structures are translated into parsing rules which can match against other RDNS entries. DDeC Huffaker, Fomenkov, and claffy (2014) is a web service which decodes embedded information within RDNS entries by unifying the rulesets obtained by both the UNDNS Spring et al. (2002) and DRoP Huffaker, Fomenkov, et al. (2014) projects.
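       As a concrete illustration of this kind of decoding, the sketch below applies a single hand-written regular expression to the NTT-style hostname mentioned above. Real systems such as UNDNS, PathAudit, or DDeC maintain per-domain rule sets; the pattern here is an assumption that covers only this one naming scheme.

    # Toy decoder for one RDNS naming convention (NTT-style), e.g.
    # "ae-4.amazon.atlnga05.us.bb.gin.ntt.net" -> interface, peer, location, country, operator.
    import re

    PATTERN = re.compile(
        r"^(?P<iface>[\w\-]+)\.(?P<peer>[\w\-]+)\.(?P<loc>[a-z]{6}\d+)\."
        r"(?P<cc>[a-z]{2})\.bb\.gin\.(?P<operator>[\w\.]+)$"
    )

    def decode(hostname):
        m = PATTERN.match(hostname)
        return m.groupdict() if m else None

    print(decode("ae-4.amazon.atlnga05.us.bb.gin.ntt.net"))
    # {'iface': 'ae-4', 'peer': 'amazon', 'loc': 'atlnga05', 'cc': 'us', 'operator': 'ntt.net'}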

         2.2.2     Datasets. Internet topology studies have been made possible through various data sources regarding BGP routes, IXP information, colo facility listings, AS attributes, and IP-to-geolocation mappings. The following subsections provide a short overview of the data sources most commonly used by the Internet topology community.

         2.2.2.1     BGP Feeds & Route Policies. The University of Oregon's RouteViews and RIPE's Routing Information Service (RIS) RIPE (2018); University of Oregon (2018) are projects originally conceived to provide real-time information about the global routing system from the standpoint of several route feed collectors. These route collectors periodically report the set of BGP feeds that they receive back to a server where the information is made publicly accessible. The data from these collectors has been utilized by researchers to map prefixes to their origin AS or to infer AS relationships based on the set of AS-paths observed across all of the route collectors. Routeviews and RIPE RIS provide a window into the global routing system from higher-tier networks. Packet Clearing House (PCH) Packet Clearing House (2018) maintains more than 100 route collectors which are placed within IXPs around the globe and provides a view of the global routing system that complements the one presented by Routeviews and RIPE RIS. Lastly, Regional Internet Registries (RIRs) maintain databases regarding the route policies of ASes for each of the prefixes that are delegated to them, expressed in the Routing Policy Specification Language (RPSL). Historically, RPSL entries have not been well adopted and typically are not maintained or updated by ASes. The entries are heavily concentrated within the RIPE and ARIN regions but nonetheless have been leveraged by researchers to infer or validate AS relationships Giotsas, Luckie, Huffaker, and Claffy (2015); Giotsas, Luckie, Huffaker, et al. (2014).
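       A common first processing step over such feeds is to extract, for every routing-table entry, the origin AS of each prefix and the set of AS adjacencies observed in AS-paths. The sketch below performs this over a list of (prefix, AS-path) tuples and assumes the feed has already been parsed into that form, for instance with an MRT-parsing library.

    # Derive prefix -> origin-AS and AS adjacency sets from (prefix, AS-path) pairs.
    from collections import defaultdict

    def summarize(rib_entries):
        """rib_entries: iterable of (prefix, [asn, asn, ...]) with the origin AS last."""
        origins = defaultdict(set)       # prefix -> {origin ASNs} (MOAS is possible)
        adjacencies = set()              # undirected AS-level edges seen in paths
        for prefix, as_path in rib_entries:
            path = [a for i, a in enumerate(as_path)
                    if i == 0 or a != as_path[i - 1]]        # drop prepending repeats
            origins[prefix].add(path[-1])
            adjacencies.update(frozenset(pair) for pair in zip(path, path[1:]))
        return origins, adjacencies

    rib = [("192.0.2.0/24", [3356, 2914, 65010]),
           ("192.0.2.0/24", [1299, 2914, 2914, 65010])]      # prepending example
    origins, adj = summarize(rib)
    print(origins["192.0.2.0/24"], len(adj))                 # {65010} 3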

         2.2.2.2    Colocation Facility Information. Colocation facilities (colos for short) are data centers which provide space, power, cooling, security, and network equipment for other ASes to host their servers and also to establish interconnections with other ASes that have a presence within the colo. PeeringDB and PCH Packet Clearing House (2017); PeeringDB (2017) maintain information regarding the list of colo facilities and their physical locations as well as the tenant ASes within each colo. Furthermore, some colo facility operators publish on their websites, for marketing purposes, a list of tenant members as well as the list of transit networks that are available for peering within their facilities. This information has mainly been leveraged by researchers to define a set of constraints regarding the points of presence (PoPs) of ASes.

         2.2.2.3    IXP Information. IXPs are central hubs providing rich connectivity opportunities to their participating ASes. Their impact and importance for the topology of the Internet have been highlighted within many works Augustin, Krishnamurthy, and Willinger (2009); Castro, Cardona, Gorinsky, and Francois (2014); Comarela, Terzi, and Crovella (2016); Nomikos et al. (2018). IXPs provide a switching fabric within one or many colo facilities; each participating AS connects its border router to this switch to establish bi-lateral peerings with other member ASes or establishes a one-to-many (multi-lateral) peering with the route server that is maintained by the IXP operator. IXP members share a common subnet owned by the IXP operator. Information regarding the location, participating members, and prefixes of IXPs is readily available through PeeringDB, PCH, and IXP operators' websites Packet Clearing House (2017); PeeringDB (2017).

         2.2.2.4    IP Geolocation. The physical location of an IP address cannot be determined from the address itself. Additionally, IP addresses can correspond to mobile end-hosts or can be repurposed by the owner AS and therefore acquire a new geolocation. Several free and commercial databases have been developed over the years that attempt to map IP addresses to physical locations. These datasets vary in their coverage as well as in the resolution of the mapped addresses (country, state, city, or geo-coordinates). Maxmind's GeoIP2 MaxMind (2018), the IP2Location databases IP2Location (2018), and NetAcuity NetAcuity (2018) are among the most widely used IP geolocation datasets in the Internet measurement community. The majority of these datasets have been designed to geolocate end-host IP addresses. Gharaibeh et al. (2017) compare the accuracy of these datasets for geolocating router interfaces and find that, while NetAcuity has relatively higher accuracy than the Maxmind and IP2Location datasets, relying on RTT-validated geocoding of RDNS entries is more reliable for geolocating router and core addresses.
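       For completeness, the snippet below shows what a lookup against one of these databases looks like using the geoip2 Python bindings for MaxMind data; the database path is a placeholder, the address is only an example, and the accuracy caveats discussed above apply especially to router interfaces.

    # Example lookup against a MaxMind GeoIP2/GeoLite2 city database.
    # Requires the `geoip2` package and a local .mmdb file (path is a placeholder).
    import geoip2.database

    with geoip2.database.Reader("/path/to/GeoLite2-City.mmdb") as reader:
        resp = reader.city("128.223.142.13")          # example address
        print(resp.country.iso_code, resp.city.name,
              resp.location.latitude, resp.location.longitude)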

2.3   Capturing Network Topology

       This section provides an overview of Internet measurement studies which attempt to capture the Internet's topology using various methodologies motivated by different end goals. Capturing Internet topology has been the focus of many pieces of research over the past decade; while each study has made incremental improvements towards a more complete and accurate picture of Internet topology, the problem remains widely open and the subject of many recent studies.
       Internet topology discovery has been motivated by a myriad of applications, ranging from protocol design and measuring performance in terms of inter-AS congestion, to estimating resiliency towards natural disasters and service or network interruptions, to the security implications of DDoS attacks, and much more. A motivating example is the Netflix-Verizon dispute, where the subpar performance of Netflix videos for Verizon customers led to lengthy accusations from both parties Engebretson (2014). The lack of proper methodologies for independent entities to capture inter-AS congestion at the time further prolonged the dispute. Within Section 2.4 we provide a complete overview of works which rely on some aspect of Internet topology to drive their research and provide insight regarding the performance or resiliency of the Internet.

       Capturing Internet topology is hard due to many contributing factors; the following is a summary of them:

   – The Internet is by nature a decentralized entity composed of a network of networks; each of the constituent networks lacks any incentive to share its topology publicly and often stands to gain financially by obscuring this information.

   – Topology discovery studies are often based on “hackish” techniques that rely on toolsets which were designed for completely different purposes. The designers of the TCP/IP protocol stack did not envision the problem of topology discovery within their design, most likely due to the centralized nature of the Internet at its inception. The de facto tool for topology discovery has been traceroute, which is designed for troubleshooting and displaying the path between a host and a specific target address.

   – Capturing inter-AS links within Internet topology becomes even more challenging due to the lack of standardization of the proper way to establish these links. More specifically, the shared address space between two border routers could originate from either of the participating networks. Although networks typically rely on common good practices, such as using addresses from the upstream provider, the lack of any oversight or requirement within RFC standards does not guarantee consistent practice within the Internet.

   – A certain set of RFCs regarding how routers should handle TTL-expired messages has resulted in incorrect inferences of the networks which establish inter-AS interconnections. For example, responses generated by third-party interfaces on border routers could lead to the inference of an inter-AS link between networks which are not necessarily interconnected with each other.


       Topology discovery studies can be organized according to many of their features; in particular, the granularity of the obtained topology seems to be the most natural fit. Each of the studies in this section, based on its utilized dataset or devised methodology, results in topologies which capture the state of the Internet at a different granularity, namely physical-level, router-level, PoP-level, or AS-level. These resolutions of topology map naturally to the abstraction layers of the TCP/IP stack, e.g., physical-level topologies correspond to the first (physical) layer, router-level topologies can be mapped to the network layer, and PoP-level as well as AS-level topologies relate to the application layer at the top of the TCP/IP stack. These abstractions allow one to capture different features of interest without the need to deal with the complexities of the lower layers. For instance, the interplay of routing and the business relationships between different ASes can be captured through an AS-level topology without the need to understand how and where these inter-AS relationships are established.

       In the following subsections, we provide an overview of the most recent as well as the most prominent works that have captured Internet topology at various granularities. We present all studies in chronological order, starting in Section 2.3.1 with works related to AS-level topologies as the most abstract representation of Internet topology; AS-level topologies are the oldest form of Internet topology but have retained their applicability for various forms of analysis throughout the years. Later, we present router-level and physical-level topologies within Sections 2.3.2 and 2.3.4, respectively.

         2.3.1    AS-Level Topology. The Internet is composed of various networks or ASes operating autonomously within their own domains and interconnecting with each other at various locations. This high-level abstraction of the Internet's structure is captured by graphs representing AS-level topologies, where each node is an AS and each edge represents an interconnection between two ASes. These graphs lay out the virtual entities (ASes) that interconnect with each other and abstract away details such as the number of inter-AS links and the locations where they are established. For example, two large Tier-1 networks such as Level3 and AT&T can establish many inter-AS links through their border routers in various metro areas. These details are abstracted away, and all of these inter-AS links are represented by a single edge within the AS-level topology. The majority of studies rely on control plane data that is obtained either through active measurements which retrieve router dumps from available looking glasses or through passive measurements that capture BGP feeds, RPSL entries, and BGP community attributes. Path measurements captured through active or passive traceroute probes have been an additional source of information for obtaining AS-level topologies; the obtained traceroute paths are mapped to their corresponding AS-paths by translating each hop's address to its corresponding AS. Capturing AS-level topology has been challenging mainly due to limited visibility into the global routing system, more specifically the limited set of BGP feeds that each route collector is able to observe. This limited visibility is known as the topology incompleteness problem within the community. Researchers have attempted to address this issue either by modeling Internet topology, combining the limited ground-truth information with a set of constraints, or by presenting novel methodologies that merge various data sources in order to obtain a more comprehensive view of Internet topology. The latter efforts led to research that highlighted the importance of IXPs as central hubs of rich connectivity. Within the remainder of this section we organize works into the following three groups: (i) graph generation and modeling, (ii) topology incompleteness, and (iii) IXPs' internal operation and peerings.
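       Translating traceroute hops to ASes in this way boils down to a longest-prefix-match lookup against a prefix-to-origin-AS table. A naive but self-contained sketch, with an illustrative two-entry table instead of a full BGP-derived one and no radix trie, is shown below.

    # Naive longest-prefix-match IP-to-AS lookup (illustrative prefix table).
    import ipaddress

    PREFIX_TO_AS = {                        # would normally come from BGP feeds
        ipaddress.ip_network("192.0.2.0/24"): 65010,
        ipaddress.ip_network("192.0.0.0/16"): 65001,
    }

    def ip_to_as(addr):
        ip = ipaddress.ip_address(addr)
        best = max((net for net in PREFIX_TO_AS if ip in net),
                   key=lambda net: net.prefixlen, default=None)
        return PREFIX_TO_AS.get(best)

    def traceroute_to_as_path(hops):
        """Collapse a list of hop IPs into the corresponding AS-level path."""
        path = [ip_to_as(h) for h in hops if h is not None]
        return [a for i, a in enumerate(path) if i == 0 or a != path[i - 1]]

    print(traceroute_to_as_path(["192.0.0.7", "192.0.2.1", "192.0.2.9"]))   # [65001, 65010]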



         2.3.1.1     Graph Generation & Modeling. Graph generation techniques attempt to simulate network topologies by relying on a set of constraints, such as the maximum number of physical ports on a router. These constraints, coupled with the limited ground-truth information regarding the structure of networks, are used to model and generate topologies. The output of these models can be used in other studies which investigate the effects of topology on network performance and the resiliency of networks towards attacks or failures caused by natural disasters.
       Li, Alderson, Willinger, and Doyle (2004) argue that graph-generating models rely on replicating overly abstract measures, such as degree distribution, which are unable to express the complexities and realities of Internet topology. The authors aim to model ASes/ISPs, as the building blocks of the Internet, at the granularity of routers, where nodes represent routers and links are the layer-2 physical links which connect them. Furthermore, the authors argue that technological constraints on a router's switching fabric dictate the number and bandwidth of links that can exist within this topology. In addition, for economic reasons, access providers aggregate their traffic over as few links as possible, since the cost of laying physical links can surpass that of the switching/routing infrastructure. This, in turn, leads to a lower-degree core and high-degree edge elements. The authors create five graphs with the same degree distribution but based on different heuristics/models and compare the performance of these models using a single router model. Interestingly, the graphs that are least likely to be produced by statistical measures have the highest performance.

       Gregori, Improta, Lenzini, and Orsini (2011) conduct a structural interpretation of the Internet connectivity graph at AS granularity. They report on the structural properties of this graph using k-core decomposition techniques. Furthermore, they report on the effects IXPs have on the AS-level topology.
       The data for this study is compiled from various datasets, namely CAIDA's Ark, DIMES, and the Internet Topology Collection from IRL, which is a combination of BGP updates from Routeviews, RIPE RIS, and Abilene. The first two datasets consist of traceroute data and are converted to AS-level topologies by mapping each hop to its corresponding ASN. A list of IXPs was obtained from PCH, PeeringDB, Euro-IX, and bgp4.as. The list of IXP members was compiled either from IXP websites or by utilizing the show ip bgp summary command at IXPs which host an LG.
       Using the AS-level graph that results from combining these various data sources, the authors report on several characteristics of the graph, namely degree, average neighbor degree, clustering coefficient, betweenness centrality, and k-core decomposition. A k-core subgraph has a minimum degree of k for every node and is the largest subgraph with this property. The authors present statistics regarding the penetration of IXPs in different continents, with Europe having the largest share (47%) and North America (19%) in second position. Furthermore, using k-core decomposition, the authors identify a densely connected core and a loosely connected periphery which contains the majority of nodes. The authors also look at the fraction of nodes in the core which are IXP participants and find that IXPs play a fundamental role in the formation of these cores.
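       The k-core notion used in this analysis is directly available in standard graph libraries; the toy sketch below computes core numbers and extracts the densest core of a small hypothetical AS-level graph with networkx.

    # k-core decomposition of a toy AS-level graph using networkx.
    import networkx as nx

    G = nx.Graph()
    G.add_edges_from([(1, 2), (1, 3), (2, 3), (3, 4), (1, 4), (2, 4),   # densely connected core
                      (4, 5), (5, 6), (6, 7)])                          # sparse periphery

    core_number = nx.core_number(G)     # node -> largest k such that the node is in the k-core
    k_max = max(core_number.values())
    densest_core = nx.k_core(G, k=k_max)

    print(core_number)
    print("densest core:", sorted(densest_core.nodes()), "k =", k_max)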

            2.3.1.2   Topology Incompleteness. Given the limited visibility of each of the prior works, researchers have relied on a diverse set of data sources and devised new methodologies for inferring additional peerings to address the incompleteness of Internet topology. These works have led to highlighting the importance of IXPs as venues that offer the opportunity to establish many interconnections with IXP members and as a major source of missing peering links. Peerings within IXPs, with their rich connectivity fabric between many edge networks, caused topological changes to the structure of the Internet, deviating from the historical hierarchical structure and, as a consequence, creating a flatter Internet structure referred to in the literature as Internet flattening.
       He, Siganos, Faloutsos, and Krishnamurthy (2009) address AS-level topology incompleteness by presenting tools and methodologies which identify and validate missing links. BGP snapshots from various (34 in total) Routeviews, RIPE RIS, and public route servers are collected to create a baseline AS-level topology graph. The business relationship of each AS edge is identified using the PTE algorithm Xia and Gao (2004). The authors find that the majority of AS links are of the c2p type, while most of the additional links found by additional collectors are p2p links. Furthermore, the authors parse IRR datasets using Nemecis Siganos and Faloutsos (2004) to infer additional AS links. A list of IXP participants is compiled by gathering IXP prefixes from PCH, performing DNS lookups, and parsing the resulting domain names to infer the participating ASNs. Furthermore, the authors infer inter-AS links within IXPs by relying on traceroute measurements which cross IXP addresses and utilize a majority voting scheme to reliably infer the participant's ASN. By combining all of these datasets and proposed methodologies, the authors find about 300% additional links compared to prior studies, most of which are found to be established through IXPs.

       Augustin et al. (2009) attempt to expand on prior works for discovering IXP peering relationships by providing a more comprehensive view of this ecosystem. They rely on various data sources to gather as much information on IXPs as possible; their data sources are: (i) IXP databases such as PCH and PeeringDB, (ii) IXP websites, which typically list their tenants as well as the prefixes employed by them, (iii) RIRs, which may include BGP policy entries, specifically the import and export entries that expose peering relationships, (iv) DNS names of IXP addresses, which can include information about the peer, and (v) BGP dumps from LGs, Routeviews, and RIPE's RIS, which can include next-hop neighbors that are part of an IXP prefix. The authors conduct targeted traceroute measurements with the intention of revealing peering relationships between the members of each IXP. To limit the number of conducted probes, the authors either select a vantage point within one of the member ASes or, if none is available, rely on AS relationship datasets to discover a neighbor (at most two hops away) of each member that hosts a VP. Using the selected VPs, they conduct traceroutes towards live addresses (or a random address if no live address was discovered) in the target network. Inference of peerings based on traceroutes is done using a majority voting scheme similar to He et al. (2009). The authors augment their collected dataset with the data plane measurements of CAIDA's Skitter, DIMES, and traceroutes measured from about 250 PlanetLab nodes. The resulting dataset is able to identify peerings within 223 (out of 278) IXPs, which amounts to about 100% (40%) more IXPs (peerings) compared to the work of He et al. (2009).
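       The majority-voting idea used to attribute an IXP interface to a member AS can be sketched as follows; the traceroute-derived observations and addresses are hypothetical, and a real implementation would add thresholds, minimum sample counts, and tie-breaking.

    # Majority vote: attribute each IXP interface to the member AS that most often
    # appears as the next (non-IXP) hop after it across many traceroutes.
    from collections import Counter, defaultdict

    def vote(observations, min_share=0.5):
        """observations: iterable of (ixp_interface_ip, next_hop_asn) pairs."""
        votes = defaultdict(Counter)
        for ixp_ip, asn in observations:
            votes[ixp_ip][asn] += 1
        result = {}
        for ixp_ip, counter in votes.items():
            asn, count = counter.most_common(1)[0]
            if count / sum(counter.values()) >= min_share:    # keep clear majorities only
                result[ixp_ip] = asn
        return result

    obs = [("206.126.236.10", 65020), ("206.126.236.10", 65020),
           ("206.126.236.10", 65030), ("206.126.236.25", 65040)]
    print(vote(obs))    # {'206.126.236.10': 65020, '206.126.236.25': 65040}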

       Ager et al. (2012) rely on sFlow records from one of the largest European/global IXPs as another source of information for inferring peering relationships between IXP tenants and provide insight on three fronts: (i) they outline the rich connectivity occurring over the IXP fabric and contrast it with the known private peerings exposed through general topology measurement studies, (ii) they present the business dynamics between participants of the IXP and provide an explanation of their incentives to establish peering relationships with others, and (iii) they provide the traffic matrix between peers of the IXP as a microcosm of Internet traffic. Among the analyses conducted within the paper are: (i) a comparison of peering visibility from the Routeviews, RIPE, LG, and IXP perspectives, (ii) manual labeling of AS types as well as the number of established peerings per member, (iii) a breakdown of traffic into various protocols based on port numbers as well as the share of each traffic type among various AS types, and (iv) traffic asymmetry, the ratio of used/served prefixes, and the geo-distance between end-points.

       Khan, Kwon, Kim, and Choi (2013) utilize LG servers to provide a view of the AS-level Internet topology that complements Routeviews and RIPE RIS. A list of 1.2k LGs (420 of which were operational at the time of the study) was built by considering various sources including PeeringDB, traceroute.org, traceroute.net.ru, bgp4.as, bgp4.net, and virusnet. AS-level topologies from IRL, CAIDA's Ark, iPlane, and IRRs are used to compare the completeness of the identified AS-links. For the duration of a month, show ip bgp summary is issued twice a week and BGP neighbor ip advertised is issued once a week towards all LGs which support the command. The first command outputs each neighbor's address and its associated ASN, while the second command outputs the routing table of the router, consisting of reachable prefixes, next-hop IPs, and the AS-path towards each given prefix. The AS-level connectivity graph is constructed by parsing the output of these commands. Using this new data source enables the authors to identify an additional 11k AS-links and about 700 new ASes.



       Klöti, Ager, Kotronis, Nomikos, and Dimitropoulos (2016) perform a cross-comparison of three public IXP datasets, namely PeeringDB PeeringDB (2017), Euro-IX European Internet Exchange Association (2018), and PCH Packet Clearing House (2017), to study several attributes of IXPs such as location, facilities, and participants. Aside from the three aforementioned public IXP datasets, BGP feeds collected by PCH route collectors as well as data gathered from 40 IXP websites were used for validation purposes throughout the study. The three datasets lack common identifiers for IXPs; for this reason, in a first pass IXPs are linked together through an automated process relying on names and geo information, and in a second pass the linked IXPs are manually checked for correctness. As a side effect of their study, the authors present one of the largest IXP information datasets at the time.
       The geographic coverage of each dataset is examined, and the authors find relatively similar coverage across datasets except for the North America region, where PCH has the highest coverage. Facility locations for IXPs are compared across datasets; it is found that PCH lacks this information and that, in general, facility information for IXPs is limited in the other datasets as well. The complementarity of the datasets is presented using both the Jaccard and overlap indices. It is found that PeeringDB and Euro-IX have the largest overlap within Europe and that larger IXPs tend to have the greatest similarity across all pairs of datasets.
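       The two similarity measures used in this comparison are simple set statistics; a short sketch, with hypothetical member lists standing in for the per-dataset IXP attributes, is given below.

    # Jaccard index and overlap coefficient between two IXP member sets.
    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0

    def overlap(a, b):
        return len(a & b) / min(len(a), len(b)) if a and b else 0.0

    peeringdb_members = {65001, 65002, 65003, 65004}     # hypothetical member ASNs
    euroix_members = {65002, 65003, 65005}

    print(jaccard(peeringdb_members, euroix_members))    # 2/5 = 0.4
    print(overlap(peeringdb_members, euroix_members))    # 2/3 ≈ 0.67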

          2.3.1.3    IXP Peerings. The studies within this section provide insight into the inner operation of IXPs and how tenants establish peerings with other ASes. Each tenant of an IXP can establish a one-to-one (bilateral) peering with other ASes of the IXP, similar to how regular peerings are established. Given the large number of IXP members, a great number of peering sessions would need to

Figure 3. Illustration of an IXP switch and route server along with 4 tenant networks ASa, ASb, ASc, and ASd. ASa establishes a bi-lateral peering with ASd (solid red line) as well as multi-lateral peerings with ASb and ASc (dashed red lines), facilitated by the route server within the IXP.


be maintained over the IXP fabric. Route servers have been created to alleviate this issue: each member establishes a single peering session with the route server and describes its peering preferences to it. This, in turn, has enabled one-to-many (multilateral) peering relationships between IXP tenants. Figure 3 illustrates an IXP with 4 tenant networks ASa, ASb, ASc, and ASd. ASa establishes a bi-lateral peering with ASd (solid red line) as well as multi-lateral peerings with ASb and ASc (dashed red lines) that are facilitated by the route server within the IXP. The studies within this section propose methodologies for differentiating these forms of peering relationships from each other and emphasize the importance of route servers in the operation of IXPs.

       Giotsas, Zhou, Luckie, and Klaffy (2013) present a methodology for discovering multilateral peerings within IXPs using the BGP communities attribute and route server data. The BGP communities attribute, which is 32 bits wide, follows a specific encoding to indicate one of the following policies for each member of an IXP: (i) ALL: announce routes to all IXP members, (ii) EXCLUDE: block an announcement towards a specific member (usually used in conjunction with the ALL policy), (iii) NONE: block an announcement towards all members, and (iv) INCLUDE: allow an announcement towards a specific member (used with the NONE policy). Using a combination of these policies, a member AS can control which IXP members receive its BGP announcements. By leveraging available LGs at IXPs and issuing router dump commands, the authors obtain the set of participating ASes and the BGP communities values for their advertised prefixes, which in turn allows them to infer the connectivity among IXP participants. Furthermore, additional BGP communities values are obtained by parsing BGP feeds from the Routeviews and RIPE RIS archives. Giotsas et al. infer the IXP either by parsing the first 16 bits of the BGP communities attribute or by cross-checking the list of excluded ASes against IXP participants.
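       The effect of these policy communities on which members receive an announcement can be captured with a few lines of set logic; the sketch below is a simplified model that assumes each community has already been decoded into one of the four policy types listed above.

    # Simplified model of route-server export policies expressed via BGP communities.
    # policies: list of "ALL" | "NONE" | ("EXCLUDE", asn) | ("INCLUDE", asn) items.
    def receivers(policies, ixp_members):
        allowed = set()
        for p in policies:
            if p == "ALL":
                allowed = set(ixp_members)
            elif p == "NONE":
                allowed = set()
            elif p[0] == "EXCLUDE":
                allowed.discard(p[1])
            elif p[0] == "INCLUDE":
                allowed.add(p[1])
        return allowed

    members = {65001, 65002, 65003, 65004}
    # announce to everyone except AS65003 (ALL + EXCLUDE pattern)
    print(sorted(receivers(["ALL", ("EXCLUDE", 65003)], members)))   # [65001, 65002, 65004]
    # announce only to AS65002 (NONE + INCLUDE pattern)
    print(sorted(receivers(["NONE", ("INCLUDE", 65002)], members)))  # [65002]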

       By combining the passive and active measurements, the authors identify 207k multilateral peering (MLP) links between 1.3k ASes. They validate their findings by locating LGs relevant to the identified links through PeeringDB; by testing 26k different peerings, they are able to confirm 98.4% of them. Furthermore, Giotsas et al. parse the peering policies of IXP members, either from PeeringDB or from IXP websites which provide this information, and find that 72%, 24%, and 4% of members have an open, selective, and restrictive peering policy, respectively. Participation in a route server appears to be positively correlated with a network's openness in peering. The authors observe a binary pattern in the number of allowed/blocked ASes, where ASes either allow or block the majority of ASes from receiving their announcements. Peering density, the percentage of established links out of the number of possible links, is found to be between 80% and 95%.

       Giotsas and Zhou (2013) expand their prior work Giotsas et al. (2013) by inferring multi-lateral peering (MLP) links between IXP tenants relying solely on passive BGP measurements. BGP feeds are collected from both Routeviews and RIPE RIS collectors. Additionally, the list of IXP looking glasses, as well as their tenants, is gathered from PeeringDB and PCH. The authors compile a list of IXP tenants and use it to determine the setter of each BGP announcement containing the communities attribute by matching the AS-path against the list of IXP tenants. If fewer than two ASes match against the path, no MLP link can be identified. If exactly two ASes match, the AS closest to the prefix is the setter; if more than two ASes match, only the two ASes which have a p2p relationship according to CAIDA's AS relationship dataset are selected, and the one closer to the prefix is identified as the setter. Depending on the blacklist or whitelist policy that the setter AS has chosen, a list of multi-lateral peers is compiled for each setter AS.
       The methodology is applied to 11 large IXP route servers; the authors find about 73% additional peering links, of which only 3% are identified within CAIDA's Ark and DIMES datasets. For validation, the authors rely on IXP LGs and issue a show ip bgp command for each prefix. About 3k links were tested for validation, and 94% of them were found to be correct.
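       A stripped-down version of the setter-inference step can be expressed as below; the tenant list is a hypothetical stand-in for the PeeringDB/PCH input, the relationship check is optional, and the tie-breaking details of the original method are omitted.

    # Sketch of inferring the "setter" of an IXP community: the IXP tenant on the
    # AS-path that is closest to the originating prefix (simplified from the paper).
    def infer_setter(as_path, ixp_tenants, p2p_links=None):
        """as_path is ordered from the collector towards the origin (origin last)."""
        matches = [asn for asn in as_path if asn in ixp_tenants]
        if len(matches) < 2:
            return None                      # no multilateral peering can be inferred
        if len(matches) > 2 and p2p_links:
            # keep only adjacent tenant pairs with a known p2p relationship
            pairs = [(a, b) for a, b in zip(matches, matches[1:])
                     if frozenset((a, b)) in p2p_links]
            if not pairs:
                return None
            matches = list(pairs[-1])        # pair closest to the origin
        return matches[-1]                   # the tenant closest to the prefix

    tenants = {64600, 64610, 64620}          # hypothetical IXP tenant ASNs
    print(infer_setter([3356, 64600, 64610, 65010], tenants))    # 64610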

       Richter et al. (2014) outline the role and importance of route servers within IXPs. For their data, they use weekly snapshots of peer and master RIBs from two IXPs, which expose the multi-lateral peerings established at those IXPs. Furthermore, the authors have access to sFlow records which are sampled from the IXPs' switching infrastructure. This dataset allows the authors to identify peerings between IXP members which have been established without the help of route servers. Using the peer RIB snapshots, peering relationships between IXP members, as well as their symmetry, are identified. For the master RIB, Richter et al. assume peering with all members unless they find members using BGP community values to control their peering. The data plane sFlow measurements are taken to indicate a peering relationship if BGP traffic is exchanged between two members of the IXP. The proclivity for multi-lateral over bi-lateral peering is measured, and it is found that ASes favor multi-lateral peerings with ratios of 4:1 and 8:1 in the large and medium IXPs, respectively. Furthermore, the traffic volumes transmitted over multi-lateral and bi-lateral peerings are measured, and it is found that ASes tend to send more traffic over bi-lateral links, with ratios of 2:1 and 1:1 for the large and medium IXPs, respectively. It is found that ASes exhibit a binary behavior of advertising either all or none of their prefixes through the route server. Additionally, when ASes establish hybrid (multi- and bi-lateral) peerings, they do not advertise further prefixes over their bi-lateral links. The majority of additional peerings happen over the multi-lateral fabric, while the traffic ratios between multi- and bi-lateral peerings remain fairly consistent over the period of study.

       Summary: This subsection provided an overview of research concerned with AS-level topology. The majority of studies were concerned with the incompleteness of Internet topology graphs. These efforts led to highlighting the importance of IXPs as central hubs of connectivity. Furthermore, various sources of information, such as looking glasses, route collectors within IXPs, targeted traceroutes, RPSL entries, and traffic traces of IXPs, were combined to provide a more comprehensive view of inter-AS relationships within the Internet. Lastly, the importance of route servers to the inner operation of IXPs, and how they enable multi-lateral peering relationships, was brought to attention.

         2.3.2   Router-Level Topology. Although AS-level topologies provide a preliminary view into the structure and peering relations of ASes, they merely represent virtual relationships and do not reflect details such as the number of peerings and the locations where they are established. ASes establish interconnections with each other by placing their border routers within colos where other ASes are also present. Within these colos, ASes can establish one-to-one peerings through private interconnections or rely on an IXP's switching fabric to establish public peerings with the IXP participants. Furthermore, some ASes extend their presence into remote colos to establish additional peerings with other ASes by relying on layer-2 connectivity providers. Capturing these details becomes important for accurately attributing inter-AS congestion to specific links/routers or for pinpointing the links/routers that are responsible for causing outages or disruptions within the connectivity of a physical region or network. Studies within this section aim to present methodologies for inferring router-level topologies using data plane measurements in the form of traceroutes. These methods address the aforementioned shortcomings of AS-level topologies by mapping the physical entities (border routers) which are used to establish peering relations and can therefore account for multiple peering links between a pair of ASes. Furthermore, given that routers are physical entities, researchers are able to pin border routers to geographic locations using various data sources and newly devised methodologies. Creating router-level topologies of the Internet can be challenging for many reasons. First, given the span of the Internet as well as the interplay of business relationships and routing dynamics, traceroute, as the de facto tool for capturing router-level topologies, is only capable of recording a minute fraction of all possible paths. Routing dynamics caused by changes in each AS's route preferences, as well as the existence of load balancers, further complicate this task. Second, correctly inferring which pair of ASes has established an inter-AS link observed through traceroute is not trivial, due to non-standardized practices for establishing interconnections between border routers as well as several RFCs regarding the operation of routers that cause traceroute to depict paths that do not correspond to the forward path. Lastly, given the disassociation of the physical layer from the higher layers, establishing the geolocation of the set of identified routers is not trivial. Within Section 2.2 we presented a series of platforms which try to address the first problem. The following studies summarize recent works which try to address the latter two problems.








Figure 4. Illustration of address sharing for establishing an inter-AS link between border routers. Although the traceroute paths (dashed lines) are identical, the inferred ownership of the router interfaces and the placement of the inter-AS link differ between the two possibilities.


         2.3.2.1     Peering Inference. As briefly mentioned earlier, inferring inter-AS peering relationships from traceroute paths is not trivial. To highlight this issue, consider the sample topology within Figure 4, presenting the border routers of AS1 and AS2 color-coded as orange and blue, respectively. This figure shows the two possibilities for address sharing on the inter-AS link. The observed traceroute path traversing these border routers is also presented at the top of each figure with dashed lines. In the top figure AS2 provides the address space for the inter-AS link (y′ − y), while in the bottom figure AS1 provides the address space for the inter-AS link. As we can see, both traceroute paths are identical to each other, while the ownership of the router interfaces and the placement of the inter-AS link differ between the two possibilities. To further complicate the matter, a border router can respond with an interface that is not on the forward path of the traceroute (interface a in the top figure, using address space owned by AS3 and color-coded red), leading to the incorrect inference of an inter-AS link between AS1 and AS3. Lastly, the border routers of some ASes are configured not to respond to traceroute probes, which restricts the chances of inferring inter-AS peerings with those ASes. The studies within this section address these difficulties by applying sets of heuristics to collections of traceroutes.

       Spring et al. (2002) did the seminal work of mapping the networks of large ISPs and inferring their interconnections through traceroute probes. They make three contributions, namely (i) conducting selective traceroute probes to reduce the overall overhead of running measurements, (ii) providing an alias resolution technique to group IP addresses into their corresponding routers, and (iii) parsing DNS information to extract PoP/geo information. Their selective probing method is composed of two main heuristics: (i) directed probing, which utilizes Routeviews data and the advertised paths to probe prefixes which are likely to cross the target network, and (ii) path reduction, which avoids conducting traceroutes that would lead to redundant paths, i.e., similar ingress or egress points. Additionally, an alias resolution technique named Ally is devised to group the interfaces of a single network into routers. Lastly, a series of DNS parsing rules are crafted to extract geo information from router interface RDNS entries. The extracted geo information allows the authors to identify the PoPs of each AS. Looking glasses listed on traceroute.org are used to run Rocketfuel's methodology and map the networks of 10 ISPs, including AT&T, Sprint, and Verio. The obtained maps were validated through private correspondence with network operators and by comparing the set of identified BGP neighbors with those obtainable through BGP feeds.

       Nomikos and Dimitropoulos (2016) develop an augmented version of

traceroute (traIXroute) which annotates the output path and reports whether (and

at which exact hop) an IXP has been crossed along the path. The tool can operate

with either traceroute or scamper as a backend. As input, traIXroute requires IXP

membership and a list of their corresponding prefixes from PeeringDB and PCH

as well as Routeviews’ prefix to origin-AS mapping datasets. traIXroute annotates

the hops of the observed path with the origin AS and tags hops which are part

of an IXP prefix and also provides the mapping between an IXP address and the

members ASN if such a mapping exists. Using a sliding window of size three the

hops of the path are examined to find (i) hops which are part of an IXP prefix, (ii)

hops which have an IXP to ASN mapping, and (iii) whether the adjacent ASes are

IXP members or not. The authors account for a total of 16 possible combinations

and present their assessment regarding the location of the IXP link for 8 cases that

were most frequent. About 75% of observed paths matched rules which rely on IXP

to ASN mapping data. The validity of this data source is looked into by using BGP

dumps from routers that PCH operates within multiple IXPs. A list of IXP address

                                          40
to ASN mappings was compiled by using the next hop address and first AS within

the AS path from these router dumps. The authors find that 92% (93%) of the

IXP to ASN mappings reported by PeeringDB (PCH) are accurate according to the

BGP dumps. Finally, the prevalence of IXPs along Internet paths is measured by
parsing a CAIDA Ark snapshot. About 20% of paths are reported to cross IXPs, the
IXP hop on average is the 6th hop, near the middle of the path, and only a single
IXP is observed along each route, which is in accordance with valley-free routing.
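
       The hedged sketch below illustrates the core of such a sliding-window check in
Python: it walks a traceroute path three hops at a time and flags windows whose middle
hop falls inside an IXP prefix, recording the IXP-to-member mapping and the adjacent
ASNs. It implements only the windowing signal, not traIXroute's full set of 16 rules,
and all input structures are assumptions.

    import ipaddress

    def ixp_crossings(hops, ixp_prefixes, ixp_ip_to_asn):
        """Slide a window of three hops over a traceroute path and flag windows
        whose middle hop falls inside an IXP prefix.

        hops: list of (ip, asn) tuples; ixp_prefixes: list of IXP prefix strings;
        ixp_ip_to_asn: dict mapping IXP addresses to member ASNs (e.g. compiled
        from PeeringDB/PCH). All inputs are hypothetical stand-ins.
        """
        nets = [ipaddress.ip_network(p) for p in ixp_prefixes]
        crossings = []
        for i in range(1, len(hops) - 1):
            prev_hop, cur, nxt = hops[i - 1], hops[i], hops[i + 1]
            ip = ipaddress.ip_address(cur[0])
            if any(ip in net for net in nets):
                crossings.append({"hop_index": i,
                                  "ixp_member": ixp_ip_to_asn.get(cur[0]),
                                  "adjacent_asns": (prev_hop[1], nxt[1])})
        return crossings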

       Luckie, Dhamdhere, Huffaker, Clark, et al. (2016) develop bdrmap, a

method to identify inter-domain links of a target network at the granularity of

individual routers by conducting targeted traceroutes. As an input to their method,

they utilize originated prefixes from Routeviews and RIPE RIS, RIR delegation

files, list of IXP prefixes from PeeringDB and PCH, and CAIDA’s AS-to-ORG

mapping dataset. Target prefixes are constructed from the BGP datasets by splitting
overlapping prefixes into disjoint subnets; the first address within each prefix is
targeted using paris-traceroute, and neighbors' border addresses are added to
a stop list to avoid further probing within the customer's network. IP addresses

are grouped together to form a router topology by performing alias resolution

using Ally and Mercator. By utilizing the prefixscan tool, they try to eliminate

third-party responses for cases where interfaces are responsive to alias resolution.

Inferences to identify inter-AS links are done by iteratively going through a set

of 8 heuristics which are designed to minimize inference errors caused by address

sharing, third-party response, and networks blocking traceroute probes. Luckie et

al. deploy their tool within 10 networks and receive ground truth results from 4

network operators; their method is able to identify 96-99% of inter-AS links for

these networks correctly. Furthermore, the authors compare their findings against

BGP inferred relationships and find that they are able to observe between 92% -

97% of BGP links. Using a large US access network as an example, the authors

study the resiliency of prefix reachability in terms of the number of exit routers and

find that only 2% of prefixes exit through the same router while a great majority

of prefixes had about 5-15 exit routers. Finally, the authors look at the marginal

utility of using additional VPs for identifying all inter-AS links and find that results

could vary depending on the target network and the geographic distribution of the

VPs.
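
       The following Python sketch mimics, under stated assumptions, the spirit of the
target construction step: overlapping BGP prefixes are split into disjoint subnets and
the first usable address of each is returned as a traceroute target. It is a
simplification of bdrmap's actual procedure and uses only the standard ipaddress module.

    import ipaddress

    def disjoint_targets(prefixes):
        """Split overlapping BGP prefixes into disjoint subnets and return the
        first usable address of each, loosely mirroring the target construction
        described above. prefixes: prefix strings from BGP dumps (toy input).
        """
        nets = sorted({ipaddress.ip_network(p) for p in prefixes},
                      key=lambda n: n.prefixlen)
        disjoint = []
        for net in nets:
            pieces = [net]
            for other in nets:
                if other != net and other.subnet_of(net):
                    # Carve the more-specific prefix out of every remaining piece.
                    pieces = [frag for piece in pieces
                              for frag in (piece.address_exclude(other)
                                           if other.subnet_of(piece) else [piece])]
            disjoint.extend(pieces)
        # One probing target per disjoint subnet.
        return [str(next(net.hosts())) for net in disjoint]

    print(disjoint_targets(["10.0.0.0/8", "10.1.0.0/16"]))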

       Marder and Smith (2016) devise a tool named MAP-IT for identifying

inter-AS links by utilizing data-plane measurements in the form of traceroutes.

The algorithm developed in this method requires as input the set of traceroute

measurements which were conducted in addition to prefix origin-AS from BGP

data as well as a list of IXP prefixes and CAIDA’s AS to ORG mapping dataset.

For each interface a neighbor set (Ns) composed of addresses appearing on the prior
(Nb) and next (Nf) hops of the traceroute is created. Each interface is split into two

halves, the forward and backward halves. Direct inferences are made regarding the

ownership of each interface half by counting the majority ASN based on the current

IP-to-AS mapping dataset. At the end of each round, if a direct inference has been

made for an interface half, the other side will be updated with an indirect inference.

Furthermore, within each iteration of the algorithm using the current IP-to-AS

mapping, MAP-IT visits interface halves with direct inferences to check whether

the connected AS still holds the majority; if not, the inference is reduced to indirect.
After visiting all interface halves, any indirect inference without an associated
direct inference is removed. MAP-IT updates the IP-to-AS mapping dataset
based on the current inferences and continues this process until no further

inferences are made. For verification Marder et al. use Internet2’s network topology

as well as a manually compiled dataset composed of DNS names for Level3 and
TeliaSonera interfaces. The authors investigate the effect of the hyperparameter f

which controls the majority voting outcome for direct inferences and empirically

find that a value of 0.5 yields the best result. Using f=0.5 MAP-IT has a recall of

82% - 100% and a precision of 85% - 100% for each network. The authors also look

into the incremental utility of each iteration of MAP-IT; interestingly, the majority
(about 80%) of inferences can be made in the first round, which is equivalent to making

inferences based on a simple IP2AS mapping. The algorithm converges quickly

after its 2nd and 3rd iterations.
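
       A minimal sketch of the direct-inference step is shown below, assuming a simple
dictionary-based IP-to-AS mapping: the ASNs of the addresses adjacent to one interface
half are counted, and a direct inference is made when a foreign AS holds more than a
fraction f of the votes. The data structures and the exact voting rule are illustrative
rather than MAP-IT's precise definition.

    from collections import Counter

    def direct_inference(neighbor_ips, ip_to_as, own_as, f=0.5):
        """Infer the AS on the far side of an interface half by majority vote.

        neighbor_ips: addresses seen on the adjacent traceroute hops (Nb or Nf);
        ip_to_as: the current IP-to-AS mapping; own_as: the AS owning the
        interface itself. Returns an ASN when a foreign AS holds more than a
        fraction f of the votes, else None. Names are illustrative assumptions.
        """
        votes = Counter(ip_to_as[ip] for ip in neighbor_ips
                        if ip in ip_to_as and ip_to_as[ip] != own_as)
        if not votes:
            return None
        asn, count = votes.most_common(1)[0]
        return asn if count / sum(votes.values()) > f else None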

       Alexander et al. (2018) combine the best practices of bdrmap Luckie et al.

(2016) and MAP-IT Marder and Smith (2016) into bdrmapIT, a tool for identifying

the border routers that improves MAP-IT's coverage without losing bdrmap's

accuracy at identifying border routers of a single ASN. The two techniques are

mainly made compatible with the introduction of “Origin AS Sets”, which annotate
each link between routers with the set of origin ASes from the prior hop. bdrmapIT

relies on a two-step iterative process. During the first step, the owner of each router
is inferred by taking the majority vote of the router's subsequent interfaces. Exceptions
in terms of the cast vote are made for IXP interfaces, reallocated prefixes, and
multi-homed routers to account for these cases correctly. During the second

step, interfaces are annotated with an ASN using either the origin AS (if router

annotation matches that of the interface) or the majority vote of prior connected

routers (if router annotation differs from the interface). The iterative process

is repeated until no further changes are made to the connectivity graph. The

methodology is evaluated using bdrmap’s ground truth dataset, as well as the ITDK

dataset by removing the probes from a ground truth VP. The authors find that

bdrmapIT improves the coverage of MAP-IT by up to 30% while maintaining the

accuracy of bdrmap.

         2.3.2.2    Geo Locating Routers & Remote Peering. Historically

ASes would establish their peering relations with other ASes local to their PoPs and
would rely on their upstream providers for connectivity to the remainder of the
Internet. IXPs enabled ASes to establish peerings that both improved their performance
due to shorter paths and reduced their overall transit costs by offloading upstream
traffic onto p2p links instead of c2p links. With

the proliferation of IXPs and their aforementioned benefits, ASes began to expand

their presence not only within local IXPs but also within remote ones. ASes would

rely on layer2 connectivity providers to expand their virtual PoPs within remote

physical areas. Layer3 measurements are agnostic to these dynamics and are not

able to distinguish local vs. remote peering relations from each other. Researchers

have tried to solve this issue by pinpointing border routers of ASes to physical

locations. The association of routers to geolocations is not trivial; researchers have

relied on a collection of complementary information such as geocoded embeddings

within reverse DNS names or by constraining the set of possible locations through

colo listings offered by PeeringDB and similar datasets. In the following, we present

a series of recent studies which tackle this unique issue.

       Castro et al. (2014) present a methodology for identifying remote peerings,

where two networks interconnect with each other via a layer-2 connectivity

provider. Furthermore, they derive analytical conditions for the economic viability

of remote peering versus relying on transit providers. Leveraging PeeringDB, PCH,
and information available on IXP websites, a list of IXPs as well as their tenants,
prefixes, and interface-to-member mappings is obtained. For this study, IXPs which

have at least one LG or RIPE NCC probe (amounting to a total of 22) are selected.

By issuing temporally spaced probes towards all of the identified interfaces within

IXP prefixes and filtering interfaces which either do not respond frequently or do

not match an expected maximum TTL value of 255 or 64, a minimum RTT value

for each interface is obtained. By examining the distribution of minimum RTT for

each interface, a conservative threshold of 10ms is selected to consider an interface

as remote. A total of 4.5k interfaces corresponding to 1.9k ASes in 22 IXPs are

probed in the study. The authors find that 91% of IXPs have remote peering while

285 ASes have a remote interface. Findings including RTT measures as well as

remote labels for IXP members were confirmed for TorIX by the staff. One month

of Netflow data captured at the border routers of RedIRIS (Spain’s research and

education network) is used to examine the amount of inbound and outbound

traffic between RedIRIS and its transit providers, using which an upper bound

for traffic which can be offloaded is estimated. Furthermore, the authors create a

list of potential peers (2.2k) which are reachable through Euro-IX, these potential

peers are also categorized into different groups based on their peering policy which

is listed on PeeringDB. Considering all of the 2.2k networks RedIRIS can offload

27% (33%) of its inbound (outbound) traffic by remotely peering with these ASes.

Through their analytical modeling, the authors find that remote peering is viable

for networks with global traffic as well as networks which have higher ratios of

traffic-independent cost for direct peering compared to remote peering such as

networks within Africa.
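
       The classification itself reduces to a threshold on minimum RTT; the sketch
below, with hypothetical inputs and a simplified filtering rule, labels IXP member
interfaces as local or remote using the paper's 10 ms threshold.

    def classify_ixp_interfaces(rtt_samples, threshold_ms=10.0, min_probes=5):
        """Label IXP member interfaces as 'local' or 'remote' from ping RTTs.

        rtt_samples: dict mapping interface IP -> list of RTTs (ms) measured from
        a vantage point inside the IXP (hypothetical input). Interfaces with too
        few responses are skipped, loosely mirroring the paper's filtering; the
        TTL-based sanity check is omitted here.
        """
        labels = {}
        for ip, rtts in rtt_samples.items():
            if len(rtts) < min_probes:
                continue                      # unresponsive interface, no label
            labels[ip] = "remote" if min(rtts) > threshold_ms else "local"
        return labels

    print(classify_ixp_interfaces({"203.0.113.5": [0.8, 1.1, 0.9, 1.0, 1.2],
                                   "203.0.113.9": [34.0, 33.5, 35.2, 34.1, 33.9]}))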



       Giotsas, Smaragdakis, Huffaker, Luckie, and claffy (2015) attempt to

obtain a peering interconnection map at the granularity of colo facilities. The authors
gather AS-to-facility mapping information from PeeringDB as well as by manually
parsing this information from the websites of a subset of networks. IXP lists

and members were compiled by combining data from PeeringDB, PCH, and

IXP websites. For data-plane measurements, the authors utilize traceroute data

from RIPE Atlas, iPlane, CAIDA’s Ark, and a series of targeted traceroutes

conducted from looking glasses. The authors annotate traceroute hops with their

corresponding ASN and consider the segment which has a change in ASN as the

inter-AS link. Using the colo-facility listing obtained in the prior step, the authors
produce a list of candidate facilities for each inter-AS link, which can result in three
cases: (i) a single facility is found, (ii) multiple facilities match the criteria, or
(iii) no candidate facility is found. For the latter two cases, the authors further
constrain the search space by either benefiting from alias resolution results

(two alias interfaces should reside in the same facility) or by conducting further

targeted probes which are aimed at ASNs that have a common facility with the

owner AS of the interface in question. The methodology is applied to five content

providers (Google, Yahoo, Akamai, Limelight, and Cloudflare) and five transit

networks (NTT, Cogent, DT, Level3, and Telia). The authors present the effect

of each round of their constrained facility search (CFS) algorithm’s iteration (max

iteration count of 100); the majority of pinned interfaces are identified by the
40th iteration, with RIPE probes providing a better opportunity for resolving new

interfaces. The authors find that DNS-based pinning methods are able to identify

only 32% of their findings. The authors also cross-validate their findings using

direct feedback from network admins, BGP communities attribute, DNS records,

and IXP websites with 90% of the interfaces being pinned correctly and for the

remainder, the pinning accuracy was correct at a metro granularity.
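
       A heavily simplified sketch of the facility-narrowing step is given below: the
candidate set for an inter-AS link starts as the intersection of the two ASes' facility
lists (e.g., from PeeringDB) and is optionally narrowed by facilities already pinned for
alias interfaces. The function and its inputs are assumptions for illustration, not the
full CFS algorithm.

    def candidate_facilities(as_a, as_b, as_to_facilities, alias_facilities=None):
        """First step of a constrained facility search (simplified sketch).

        as_to_facilities: dict ASN -> set of colo facilities where the AS is
        present (e.g. compiled from PeeringDB); alias_facilities: optional set of
        facilities already pinned for aliases of the interface in question.
        Returns the surviving candidates; a singleton means the link is pinned.
        """
        candidates = (as_to_facilities.get(as_a, set())
                      & as_to_facilities.get(as_b, set()))
        if alias_facilities:
            narrowed = candidates & alias_facilities
            if narrowed:
                candidates = narrowed      # alias interfaces share a facility
        return candidates

    print(candidate_facilities(64500, 64501,
                               {64500: {"Equinix-CH1", "Telx-NYC"},
                                64501: {"Equinix-CH1"}}))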

       Nomikos et al. (2018) present a methodology for identifying remote peers

within IXPs; furthermore, they apply their methodology to 30 large IXPs and

characterize different aspects of the remote peering ecosystem. They define an

IXP member as a remote peer if it is not physically connected to the IXP's fabric

or reaches the IXP through a reseller. The development of the methodology and

the heuristics used by the authors are motivated by a validation dataset which

they obtain through directly contacting several IXP operators. A collection of 5

heuristics are used in order to infer whether an IXP member is peering locally

or remotely; these heuristics, in order of importance, are: (i) the port capacity of

a customer, (ii) latency measurements from VPs within IXPs towards customer

interfaces, (iii) colocation locations within an RTT radius, (iv) multi-IXP router

inferences by parsing traceroutes from publicly available datasets and corroborating

the location of these IXPs and whether the AS in question is local to any of them,

and (v) identifying private peerings (by parsing public traceroute measurements)

between the target AS and one or more local IXP members is used as a last resort

to infer whether a network is local or remote to a given IXP. The methodology

is applied to 30 large IXPs, and the authors find that a combination of RTT and

colo listings is the most effective in inferring remote peers. Overall, 28% of
interfaces are inferred to be peering remotely, and remote peers are present at 90%
of the IXPs. The

size of local and remote ASes in terms of customer cone is observed to be similar

while hybrid ASes tend to have larger network sizes. The growth of remote peering

is investigated over a 14 month period, and the authors find that the number of

remote peers grew twice as fast as the number of local peers.
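
       Conceptually, the inference is an ordered cascade of heuristics, each of which
may be inconclusive; the short Python sketch below wires such a cascade together with
toy stand-ins for the real heuristics, purely for illustration.

    def infer_remote(member, heuristics):
        """Apply remote-peering heuristics in order of importance (sketch).

        member: a dict describing one IXP member interface; heuristics: an
        ordered list of functions, each returning 'local', 'remote', or None when
        inconclusive. The wiring is illustrative; the real heuristics (port
        capacity, RTT, colo radius, multi-IXP routers, private peerings) are far
        richer than the toy checks shown here.
        """
        for heuristic in heuristics:
            verdict = heuristic(member)
            if verdict is not None:
                return verdict
        return "unknown"

    # Toy heuristics standing in for the paper's real ones.
    port_check = lambda m: "remote" if m.get("reseller_port") else None
    rtt_check = lambda m: "remote" if m.get("min_rtt_ms", 0) > 10 else "local"
    print(infer_remote({"min_rtt_ms": 2.0}, [port_check, rtt_check]))   # local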

       Motamedi et al. (2019) propose a methodology for inferring and geolocating

interconnections at a colo level. The authors obtain a list of colo facility members

from PeeringDB and colo provider webpages. A series of traceroutes towards the

address space of the ASes identified in the prior step are conducted using available
measurement platforms such as looking glasses and RIPE Atlas nodes in the geo
proximity of the targeted colo. Traceroute paths are translated to a router-level
connectivity

graph using alias resolution and a set of heuristics based on topology constraints.

The authors argue that a router-level topology coupled with the prevalence of

observations allows them to account for traceroute anomalies and they are able

to infer the correct ASes involved in each peering. To geolocate routers, an initial

set of anchor interfaces with a known location is created by parsing reverse DNS

entries for the observed router interfaces. This information is propagated/expanded

through the router-level graph by a Belief Propagation algorithm that uses a set of

co-presence rules based on membership in the same alias set and latency difference

between neighboring interfaces.

       Summary: while traceroutes have been historically utilized as a source of

information to infer inter-AS links, the methodologies did not correctly account

for the complexities of inferring BGP peerings from layer-3 probes. The common

practice of simply mapping interface addresses along the path to their origin-AS

based on BGP data does not account for the visibility of BGP collectors, address

sharing for establishing inter-AS links, third-party responses of TTL expired

messages by routers, and unresponsive routers or firewalled networks along the

traceroute path. The presented methodologies within this section attempt to account

for these difficulties by corroborating domain knowledge for common networking

practices and relying on a collection of traceroute paths and their corresponding

router view (obtained by using alias resolution techniques) to make accurate

inferences of the entities which are establishing inter-AS links. Furthermore, pin-

pointing routers to physical locations was the key enabler for highlighting remote

peerings that are simply not visible from an AS-level topology.

           2.3.3   PoP-Level Topology. PoP-level topologies present a middle

ground between AS-level and router-level topologies. A PoP-level graph presents

the points of presence for one or many networks. These topologies inherently have

geo information at the granularity of metro areas embedded within. They have

been historically at the center of focus as many ASes disclose their topologies at
a PoP-level granularity; such maps do not require detailed information regarding each
individual router and merely represent the bundle of routers within each PoP as
a single node. They have since lost traction to router-level topologies that are

able to capture the dynamics of these topologies in addition to providing finer

details of information. Regardless of this, due to the importance of some ASes

and their centrality in the operation of today’s Internet, several studies Schlinker

et al. (2017); Wohlfart, Chatzis, Dabanoglu, Carle, and Willinger (2018); Yap et

al. (2017) outlining the internal operation of these ASes within each PoP have

emerged. These studies offer insight into the challenges these ASes face for peering

and serving the vast majority of the Internet as well as the solutions that they have

devised.

       Cunha et al. (2016) develop Sibyl, a system which provides an expressive

interface that allows the user to specify the requirements for the path of a

traceroute; given the set of requirements, Sibyl utilizes all available vantage
points and relies on historical data to conduct a traceroute from a given vantage
point towards a specific destination that is most likely to satisfy the user's

constraints. Furthermore, given that each vantage point has limited probing

resources and that concurrent requests can be made, Sibyl would pick source-

destination pairs which optimize for resource utilization. Sibyl combines PlanetLab,

RIPE Atlas, traceroute servers accessible through looking glasses, DIMES,

and Dasu measurement platforms to maximize its coverage. Symbolic regular

expressions are used for the query interface where the user can express path

properties such as the set of traversed ASes, cities, and PoPs. The likelihood of

each source-destination pair matching the required path properties is calculated

using a supervised machine learning technique (RuleFit) which is trained based

on prior measurements and is continuously updated based on new measurements.

Resource utilization optimization is addressed by using a greedy algorithm: Sibyl

chooses to issue traceroutes that fit the required budget and that have the largest

marginal expected utility based on the output of the trained model.
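
       As a rough illustration of the query idea, the sketch below renders an AS-level
path as a string and checks it against a regular expression describing the required
path properties; the query syntax shown is an assumption and not Sibyl's actual grammar.

    import re

    def path_matches(as_path, requirement_regex):
        """Check whether an AS-level path satisfies a symbolic requirement.

        as_path: list of ASNs; requirement_regex: a regex over a space-separated
        rendering of the path (an illustrative stand-in for Sibyl's query syntax).
        """
        rendered = " ".join(str(asn) for asn in as_path)
        return re.search(requirement_regex, rendered) is not None

    # Require that the path crosses AS 3356 at some point before AS 2914.
    print(path_matches([64500, 3356, 2914, 64501], r"\b3356\b.*\b2914\b"))  # True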

       Schlinker et al. (2017) outline Facebook’s edge fabric within their PoPs

by utilizing an SDN-based system that alters BGP local-pref attributes to better
utilize alternative paths towards specific prefixes. The work is motivated by BGP's

shortcomings namely, lack of awareness of link capacities and incapability to

optimize path selections based on various performance metrics. More specifically

BGP makes its forwarding decisions using a combination of AS-path length and the

local-pref metric. Facebook establishes BGP connections with other ASes through

various means namely, private interconnections, public peerings through IXPs, and

peerings through route servers within IXPs. The authors report that the majority

of their interconnections are established through public peerings while the bulk of

traffic is transmitted over the private links. The latter reflects Facebook's preference

to select private peerings over public peerings while peerings established through

route servers have the lowest priority. Furthermore, the authors observe that for

all PoPs except one, all prefixes have at least two routes towards each destination

prefix. The proposed solution isolates the traffic engineering per PoP to simplify

the design; the centralized SDN controller within each PoP gathers router RIB

tables through a BMP collector. Furthermore, traffic statistics are gathered through

sampled sFlow or IPFIX records. Finally, interface information is periodically

pulled by SNMP. The collector emulates BGP’s best path selection and projects

interface utilization. For overloaded interfaces, prefixes with alternative routes are
identified, and an alternative route is selected based on a set of preferences. The output

of this step generates a set of route overrides which are enforced by setting a high

local-pref value for them. The authors report that their deployed system detours

traffic from 18% of interfaces. The median of detour time is 22 minutes and about

10% of detours last as long as 6 hours. The detoured routes resulted in 45% of

the prefixes achieving a median latency improvement of 20ms while 2% of prefixes

improved their latency by 100ms.
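
       The sketch below captures the flavor of this control loop, assuming hypothetical
per-prefix demand, route, and capacity inputs: it projects interface utilization, and
for overloaded interfaces it detours prefixes with alternative routes by generating
overrides with a higher local-pref value. Thresholds and data structures are
illustrative, not Facebook's actual configuration.

    def compute_overrides(prefix_demand, best_route, alt_routes, capacity,
                          max_util=0.95, override_local_pref=400):
        """Generate route overrides for overloaded egress interfaces (sketch).

        prefix_demand: dict prefix -> Mbps; best_route: dict prefix -> egress
        interface chosen by BGP; alt_routes: dict prefix -> list of alternative
        interfaces; capacity: dict interface -> Mbps.
        """
        load = {}
        for prefix, mbps in prefix_demand.items():
            load[best_route[prefix]] = load.get(best_route[prefix], 0) + mbps

        overrides = {}
        for prefix, mbps in sorted(prefix_demand.items(), key=lambda kv: -kv[1]):
            iface = best_route[prefix]
            if load.get(iface, 0) <= max_util * capacity[iface]:
                continue                                  # interface not overloaded
            for alt in alt_routes.get(prefix, []):
                if load.get(alt, 0) + mbps <= max_util * capacity[alt]:
                    overrides[prefix] = (alt, override_local_pref)
                    load[iface] -= mbps                   # detour this prefix
                    load[alt] = load.get(alt, 0) + mbps
                    break
        return overrides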

       Yap et al. (2017) discuss the details of Espresso, an application-aware

routing system for Google’s peering edge routing infrastructure. Similar to the

work of Schlinker et al. (2017), Espresso is motivated by the need

for a more efficient (both technically and economically) edge peering fabric that

can account for traffic engineering constraints. Unlike the work of Schlinker et al.
(2017), Espresso maintains two layers of control plane: one which is

localized to each PoP while the other is a global centralized controller that allows

Google to perform further traffic optimizations. Espresso relies on commodity

MPLS switches for peering purposes; traffic between the switches and servers is
encapsulated in IP-GRE and MPLS headers. The IP-GRE header encodes the correct

switch, and the MPLS header determines the peering port. The global controller

(GC) maintains an egress map that associates each client prefix and PoP tuple

to an edge router/switch and egress port. User traffic characteristics such as

throughput, RTT, and re-transmits are reported at a /24 granularity to the global

controller. Link utilization, drops, and port speeds are also reported back to the

global controller. A greedy algorithm is used by the GC to assign traffic to a candidate
router-port combination. The greedy algorithm starts by making its decisions using

traffic priority metrics and orders its available options based on BGP policies,

user traffic metrics, and the cost of serving on a specific link. Espresso has been

incrementally deployed within Google and at the time of the study was responsible

for serving about 22% of traffic. Espresso is able to maintain higher link utilization

while maintaining low packet drop rates even for nearly fully utilized links (less than
2.5% drops at 95% utilization). The authors report that the congestion reaction feature
of the GC results in

higher goodput and mean time between re-buffers for video traffic.
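
       The following sketch illustrates one plausible shape of such a greedy selection,
assuming hypothetical candidate options annotated with BGP policy rank, user-perceived
RTT, serving cost, and remaining capacity; it is not Google's actual logic.

    def pick_egress(candidates, demand_mbps):
        """Greedy egress selection in the spirit of a global controller (sketch).

        candidates: list of dicts with illustrative keys 'router', 'port',
        'bgp_rank' (lower is preferred by policy), 'rtt_ms', 'cost', and
        'free_mbps'. Options are ordered by policy, then performance, then cost,
        and the first one with enough headroom wins. Keys are assumptions.
        """
        ordered = sorted(candidates,
                         key=lambda c: (c["bgp_rank"], c["rtt_ms"], c["cost"]))
        for option in ordered:
            if option["free_mbps"] >= demand_mbps:
                option["free_mbps"] -= demand_mbps
                return option["router"], option["port"]
        return None    # no candidate has capacity; traffic stays on the default path

    print(pick_egress([{"router": "r1", "port": 1, "bgp_rank": 0,
                        "rtt_ms": 12, "cost": 2, "free_mbps": 50},
                       {"router": "r2", "port": 3, "bgp_rank": 0,
                        "rtt_ms": 9, "cost": 1, "free_mbps": 500}], 100))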

       Wohlfart et al. (2018) present an in-depth study of the connectivity fabric

of Akamai at its edge towards its peers. The authors account for 3.3k end-user-facing

(EUF) server deployments with varying size and capabilities which are categorized

into four main groups. Two of these groups have Akamai border routers and

therefore establish explicit peerings with peers and deliver content directly to

them while the other two groups are hosted within another AS's network and

are responsible for delivering content implicitly to other peers. Customers are

redirected to the correct EUF server through DNS; the mapping is established

by considering various inputs including BGP feeds collected by Akamai routers,

user performance metrics, and link cost information. To analyze Akamai’s peering

fabric, the authors rely on proprietary BGP snapshots obtained from Akamai

routers, which consist of 3.65M AS paths and about 1.85M IPv4 and IPv6 prefixes

within 61k ASes (ViewA). As a point of comparison, a combination of daily BGP

feeds from Routeviews, RIPE RIS, and PCH consisting of 21.1M AS paths and

900k prefixes within 59k ASes is used (ViewP). While at an AS level both datasets

seem to have a relatively similar view, ViewA (ViewP) observes 1M (0.1M) prefixes

the majority of which are prefixes longer than /25. Only 15% of AS paths within

ViewP are observed by ViewA which suggests that a large number of AS paths

within ViewP are irrelevant for the operation of Akamai. Wohlfart et al. report

6.1k unique explicit peerings between Akamai and its neighbors by counting the

unique number of next-hop ASN from the Akamai BGP router dumps. About

6k of these peerings happen through IXPs while the remainder are established

through PNIs. In comparison, only 450 peerings between Akamai and other ASes

are observed through ViewP. Using AS paths within ViewP the authors report

about 28k implicit peers which are within one AS hop from Akamai’s network.

Lastly, the performance of user sessions is examined by utilizing EUF server logs
containing the client's IP address, throughput, and a smoothed RTT value. The

performance statistics are presented for two case studies (i) serving a single ISP

and (ii) serving customers within 6 distinct metros. Overall, 90% of traffic comes
from about 1% of paths, PNIs are responsible for delivering the bulk of traffic,
and PNIs and cache servers within eyeball ASes achieve the best performance

regarding RTT.

       Nur and Tozal (2018) study the Internet AS-level topology using a

multigraph representation where AS pairs can have multiple edges between each

other. Traceroute measurements from CAIDA’s Ark and iPlane projects are

collected for this study. For IP to AS mapping Routeviews’ BGP feed is utilized.

Next hop addresses for BGP announcements are extracted from Routeviews as well

as RIPE RIS. For mapping IP addresses to their corresponding geo-location various

data sources have been employed namely, (i) UNDNS for DNS parsing, (ii) DB-IP,

(iii) Maxmind GeoLite2 City, and (iv) IP2Location DB5 Lite.

       Each AS's border interfaces are identified by tracking ASN changes along the
hops of each traceroute. Each cross-border interface (X-BI) is geolocated to the city

in which it resides by applying one of the following methods in order of precedence:

(i) relying on UNDNS for extracting geoinformation from reverse DNS names, (ii)

majority vote along three (DB-IP, Maxmind, and IP2Location) IP to GEO location

datasets, (iii) sandwich method where an unresolved IP between two IPs in the

same geolocation is mapped to the same location, (iv) RTT-based geolocating, which
relies on the geolocation of the prior or next hops of an unresolved address that
have an RTT difference smaller than 3 ms for mapping them to the same location,

and (v) if all of the prior methods fail Maxmind’s output is used for mapping the

geolocation of the X-BI. The set of inter-AS links resulting from parsing traceroutes

is augmented by benefiting from BGP data. If an AS relationship exists between

two ASes but is missing from the current AS-level graph and all identified X-BIs

corresponding to these ASes are geolocated to a single city, a link will be added

to the AS-level topology graph under the assumption that this is the only possible

location for establishing an interconnection between these two ASes.
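
       The precedence rules above can be expressed compactly; the sketch below applies
them in order for a single X-BI address, using hypothetical lookup tables for
reverse-DNS geohints, the three IP-to-geo databases, and neighboring-hop information.

    def geolocate_xbi(ip, rdns_geo, geo_dbs, prev_hop, next_hop, maxmind):
        """Geolocate a cross-border interface by the precedence rules described
        above (simplified sketch; all inputs are hypothetical lookup tables).

        rdns_geo: dict ip -> city from reverse-DNS parsing; geo_dbs: list of
        dicts ip -> city (DB-IP, MaxMind, IP2Location); prev_hop/next_hop:
        (city, rtt_diff_ms) tuples for the adjacent hops, where rtt_diff_ms is
        the RTT difference to the unresolved address; maxmind: fallback dict.
        """
        # (i) a reverse-DNS geo embedding wins outright.
        if ip in rdns_geo:
            return rdns_geo[ip]
        # (ii) majority vote among the three IP-to-geo databases.
        answers = [db[ip] for db in geo_dbs if ip in db]
        for city in set(answers):
            if answers.count(city) >= 2:
                return city
        # (iii) sandwich: both neighboring hops agree on a city.
        if prev_hop and next_hop and prev_hop[0] == next_hop[0]:
            return prev_hop[0]
        # (iv) RTT-based: adopt a neighbor's city if the RTT difference < 3 ms.
        for hop in (prev_hop, next_hop):
            if hop and abs(hop[1]) < 3.0:
                return hop[0]
        # (v) fall back to MaxMind.
        return maxmind.get(ip)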

       The inferred PoP nodes in the AS graph are validated for major research

networks as well as several commercial ISPs. The overlap of identified PoPs is

measured for networks which have publicly available PoP-level maps. The maps

align with the set of cities identified by X-AS, with deviations in terms of the number

of PoPs per city. This is a limitation of X-AS as it is only able to identify one

PoP per city. Identified AS-links are compared against CAIDA’s AS relationships

dataset, and the percentage of discrepancy for the AS links of each AS is measured. For

78% of ASes, the maps agree with each other completely, and the average link

agreement is about 85% for all ASes. Various properties of the resulting graph are

analyzed in the paper; the authors find that the number of X-BI nodes per AS,
X-BI node degree, and AS degree all follow a power-law distribution.

       Summary: PoP-level topologies can offer a middle ground between router-

level and AS-level topologies, offering an understanding of inter-AS peering

relationships while also being able to distinguish instances of these peerings

happening at various geo-locations/PoPs. Additionally, we reviewed studies that

elaborate on the challenges faced as well as the solutions devised for content

provider (Google, Facebook) and CDN (Akamai) networks which are central to the

operation of today’s Internet.




Figure 5. Fiber optic backbone map for CenturyLink’s network in continental US.
Each node represents a PoP for CenturyLink while links between these PoPs are
representative of the fiber optic conduits connecting these PoPs together. Image
courtesy of CenturyLink.



         2.3.4   Physical-Level Topology. This subsection is motivated

by the works of Knight, Nguyen, Falkner, Bowden, and Roughan (2011) and

Durairajan, Ghosh, Tang, Barford, and Eriksson (2013); Durairajan, Sommers,

and Barford (2014); Durairajan, Sommers, Willinger, and Barford (2015) which

presented the groundwork for having a comprehensive physical map of the Internet

consisting of edges corresponding to fiber optic cables providing connectivity

between metro areas and PoPs as nodes within these topologies. A sample of this

topology for CenturyLink’s fiber-optic backbone network within the continental

US is presented in Figure 5. Physical maps were mostly neglected by the Internet

topology community mainly due to two reasons: (i) the scarcity of well-formatted

information and (ii) the complete disassociation of physical layers from probes

conducted within higher layers of the TCP/IP stack. The following set of papers

try to address the former issue by gathering various sources of information and

compiling them into a unified format.

       Knight et al. (2011) present the Internet Topology Zoo, which is a collection

of physical maps of various networks within the Internet. The authors rely on

ground truth data publicly provided by the network operators on their websites.

These maps are presented in various formats such as static images or flash objects.

The authors transcribe all maps using yEd (a graph editor and diagramming

program) into a unified graph specification format (GML) and annotate nodes

and links with any additional information such as link speed, link type, longitude,

and latitude that is provided by these maps. Each map and its corresponding

network is classified as a backbone, testbed, customer, transit, access, or Internet

exchange based on the properties of their network. For example, backbone networks

should connect at least two cities together while access networks should provide

edge access to individuals. A total of 232 networks are transcribed by the authors.

About 50% of networks are found to have more than 21 PoPs and each of these

PoPs have an average degree of about 3. Lastly, similar to Gregori et al. (2011),

the core density of networks is examined by measuring the 2-core size of networks.

A wide range of 2-core sizes, from 0 (tree-like networks) to 1 (densely connected
core with hanging edges), is found within the dataset.

       Durairajan et al. (2013) create a map of the physical Internet consisting of

nodes representing colocation facilities and data-centers, links representing conduits

between these nodes and additional metadata related to these entities. The authors

rely on publicly available network maps (images, Flash objects, Google Maps

overlays) provided by ASes. The methodology for transcribing images consists of

5 steps: (i) capturing high-resolution sub-images, (ii) patching sub-images into a

composite image, (iii) extracting a link image using color masking techniques, (iv)

importing link image into ArcGIS using geographic reference points, and (v) using

link vectorization in ArcGIS to convert links into vectors. Given that each map has

a different geo resolution, different scores are attributed to nodes, with lat/lon or
street-level, city, and state granularity having corresponding scores of 1.0, 0.75,
and 0.5, respectively. All maps

have at least city level resolution with about 20% of nodes having lat/lon or street

level accuracy.

       The work of Durairajan et al. (2014) is motivated by two research questions: (i)

how do physical layer and network layer maps compare with each other? and (ii)

how can probing techniques be improved to reveal a larger portion of physical

infrastructure? For physical topologies, the authors rely on maps which are

available from the Internet Atlas project. From this repository the maps for 7 Tier-

1 networks and 71 non-Tier-1 networks which are present in North America are

gathered; these ASes collectively consist of 2.6k PoPs and 3.6k links. For network

layer topologies, traceroutes from the CAIDA Ark project during the September

2011 to March 2013 period are used. Additionally, DNS names for router interfaces

are gathered from the IPv4 Routed /24 DNS Names Dataset which includes

the domain names for IP addresses observed in the CAIDA Ark traceroutes.

Traceroute hops are annotated with their corresponding geo information (extracted

with DDeC) as well as the AS number which is collected from TeamCymru’s

service. Effects of vantage point selection on node identification are studied by

employing public traceroute servers. Different modalities depending on the AS

ownership of the traceroute server and the target address are considered ([VP_in, t_in],
[VP_in, t_out], [VP_out, t_in]). Their methodology (POPsicle) chooses VPs based

on geo proximity towards the selected targets and along the pool of destinations,

those whose squared VP-to-destination distance is greater than the sum of the squared
target-to-VP and target-to-destination distances are selected to create

a measurement cone. For this study 50 networks that have a comprehensive set of

geo-information for their physical map are considered. Out of these 50 networks,

21 of them do not have any geo information embedded in their DNS names.

Furthermore, 16 ASes were not observed in the Ark traces. This results in 13 ASes

out of the original 50 which have both traces and geo-information in the network

layer map. POPsicle was deployed in an IXP (Equinix Chicago) to identify the

PoPs of 10 tenants. Except for two networks, POPsicle was able to identify all

known PoPs of these networks. Furthermore, POPsicle was evaluated by targeting

13 ISPs through Atlas probes which were deployed in IXPs; for all of these ISPs,

POPsicle was able to match or outperform Ark and Rocketfuel. Furthermore for 8

of these ISPs POPsicle found all or the majority of PoPs present in Atlas maps.

       Durairajan et al. (2015) obtain the long-haul fiber network within the US

and study its characteristics and limitations. For the construction of the long-

haul fiber map, Durairajan et al. rely on the Internet Atlas project Durairajan

et al. (2013) as a starting point and confirm the geo-location or sharing of conduits

through legal documents which outline laying/utilization of infrastructure. The

methodology consists of four steps: (i) using Internet Atlas maps for tier-1 ASes

that have geo-coded information, a basic map is constructed, (ii) the geolocation

of nodes and links for the map is confirmed through any form of legal document

which can be obtained, (iii) the map is augmented with additional maps from large

transit ASes which lack any geo-coded information, (iv) the augmented map is

once again confirmed through any legal document that would either confirm the

geolocation of a node/link or would indicate conduit sharing with links that have

geo-coded information. The long-haul fiber map seems to be physically aligned

with roadway and railway routes; the authors use the polygon overlap feature

of ArcGIS to compare the overlap of these maps and find that most often long-

hauls run along roadways. The authors also assess shared conduit risks; for this
purpose they construct a conduit sharing matrix where rows are ASes and columns
are conduits, and the value within each row indicates the number of ASes which are

utilizing that conduit. Out of 542 identified conduits about 90% of them are shared

by at least one other AS. Using the risk matrix the hamming distance for each AS

pair is measured to identify ASes which have similar risk profiles. Using traceroute

data from Edgescape and parsing geoinformation in domain names, the authors

infer which conduits were utilized by each traceroute and utilize the frequency of

traceroutes as a proxy measure of traffic volume. Finally, a series of risk mitigation
analyses are conducted, namely: (i) the possibility of increasing network robustness
by utilizing available conduits or by peering with other networks is investigated
for each AS, (ii) the increase in network robustness through the addition of k
additional links is measured for each network, and lastly (iii) the possibility of
improving latency is investigated by comparing average latencies against right-of-way
(ROW), line-of-sight (LOS), and best-path delays.
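
       The conduit-sharing analysis boils down to simple set arithmetic; the sketch
below, with an assumed dictionary encoding of the risk matrix, computes how many ASes
share each conduit and the Hamming distance between the conduit-usage vectors of each
AS pair.

    def shared_conduit_stats(conduit_matrix):
        """Build simple risk statistics from an AS-by-conduit sharing matrix.

        conduit_matrix: dict AS -> set of conduit identifiers it uses (an
        illustrative encoding). Returns, per conduit, how many ASes share it,
        and per AS pair, the Hamming distance between their conduit-usage
        vectors (similar risk profiles yield small distances).
        """
        conduits = sorted({c for used in conduit_matrix.values() for c in used})
        share_count = {c: sum(c in used for used in conduit_matrix.values())
                       for c in conduits}
        ases = sorted(conduit_matrix)
        hamming = {}
        for i, a in enumerate(ases):
            for b in ases[i + 1:]:
                hamming[(a, b)] = sum((c in conduit_matrix[a]) != (c in conduit_matrix[b])
                                      for c in conduits)
        return share_count, hamming

    counts, dists = shared_conduit_stats({"AS1": {"c1", "c2"},
                                          "AS2": {"c1"},
                                          "AS3": {"c3"}})
    print(counts, dists)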

       Summary: the papers within this sub-section provided an overview of

groundbreaking works that reveal physical-level topologies of the Internet. The

researchers gathered various publicly available maps of ASes as well as legal

documents pertaining to the physical location of these networks to create a unified,

well-formatted repository for all these maps. Furthermore, the applicability of these

maps towards the improvement of targeted probing methodologies and the possibility

of improving and provisioning the infrastructure of each network is investigated.

Although the interplay of routing on top of these physical topologies is unknown

and remains an open problem, these physical topologies provide complementary

insight into the operation of the Internet and allow researchers to provision or

design physical infrastructure supporting lower latency Internet access or to measure

the resiliency of networks towards natural disasters.

2.4   Implications & Applications of Network Topology

       This section will provide an overview of the studies which rely on Internet

topology to provide additional insight regarding the performance, resiliency, and

various characteristics of the Internet. The studies which are outlined in this

section look into various properties of the Internet including but not limited to:

path length both in terms of router and AS hops, latency, throughput, packet loss,

redundancy, and content proximity. In a broader sense, we can categorize these

studies into three main groups: (i) studying performance characteristics of the

Internet, (ii) studying resiliency of the Internet, and (iii) classifying the type of

inter-AS relationships between ASes. Depending on the objective of the study, one
or more of the aforementioned properties of the Internet could be its focus. Each of
these studies requires a different resolution

of Internet topology. As outlined in Section 2.3 obtaining a one to one mapping

between different resolutions is not always possible. For example, each AS link can

correspond to multiple router level links while each router level link can correspond

to multiple physical links. For this reason, each study would rely on a topology

map which better captures the problem's objectives. As an example, studying the

resiliency of a transit AS's backbone to natural disasters should rely on a physical

map while performing the same analyses using an AS-level topology could lead

to erroneous conclusions given the disassociation of ASes to physical locations.

On the other hand, studying the reachability and visibility of an AS through

the Internet would require an AS-level topology and conducting the same study

using a fiber map would be inappropriate as the interplay of the global routing

system on top of this physical map is not known. The remainder of this section

is organized into three sub-sections presenting the set of studies which focus
on (i) Internet performance, (ii) Internet resiliency, and (iii) AS relationship
classification. Furthermore, each sub-section further divides the studies based

on the granularity of the topology which is employed.

         2.4.1    Performance. Raw performance metrics such as latency

and throughput can be measured using end-to-end measurements without

any attention to the underlying topology. While these measurements can be

insightful on their own, gaining a further understanding of the root cause of subpar

performance often requires knowledge of the underlying topology. For example,

high latency values reported through end-to-end measurements can be a side effect

of many factors including but not limited to congestion, a non-optimal route, an

overloaded server, and application level latencies. Many of these underlying causes

can only be identified by a correct understanding of the underlying topology.

Congestion can happen on various links along the forward and reverse path; identifying
the faulty congested link, or more specifically the inter-AS link, requires

a correct mapping for the traversed topology. Expanding infrastructure to address

congestion or subpar latency detected through end-to-end measurements is possible

through an understanding of the correct topology as well as the interplay of routing

on top of this topology. In the following section, we present studies that have

relied on router, AS, and physical level topologies to provide insight into various

network performance related issues.

            2.4.1.1   AS-Level Topology. Studies in this section rely on BGP

feeds as well as traceroute probes that have been translated to AS paths to study

performance characteristics such as increased latency and path lengths due to

insufficient network infrastructure within Africa Fanou, Francois, and Aben (2015);

Gupta et al. (2014), path stability and the latency penalties due to AS path

changes Green, Lambert, Pelsser, and Rossi (2018), IXPs centrality in Internet

connectivity as a means for reducing path distances towards popular content

Chatzis, Smaragdakis, Böttger, Krenc, and Feldmann (2013), and estimating traffic

load on inter-AS links through the popularity of traversed paths Sanchez et al.

(2014).

          Chatzis et al. (2013) demonstrate the centrality of a large European IXP

in the Internet’s traffic by relying on sampled sFlow traces captured by the IXP

operator. Peering relationships are identified by observing BGP as well as regular

traffic being exchanged between tenant members. The authors limit their focus

to web traffic as it constitutes the bulk of traffic which is observed over the IXP’s

fabric. End-host IP addresses are mapped to the country in which they reside by

using Maxmind’s IP to GEO dataset. The authors observe traffic from nearly every

country (242 out of 250). While tenant ASes generated the bulk of traffic, about

33% of traffic originated from ASes which were one or more hops away from the

IXP. The authors find that recurrent IP addresses generate about 60% of server

traffic. Finally, the authors highlight the heterogeneity of AS traffic by identifying

servers from other ASes which are hosted within another AS. Heterogeneous servers

are identified by applying a clustering algorithm on top of the SOA records of all

observed IP addresses. Lastly, the share of heterogeneous traffic on inter-AS links

is presented for Akamai and Cloudflare. It is found that about 11% (54%) of traffic

(servers) are originated (located) within 3rd-party networks.

       Sanchez et al. (2014) attempt to characterize and measure inter-domain

traffic by utilizing traceroutes as a proxy measure. Traceroute probes towards

random IP addresses from the Ono BitTorrent extension are gathered over two

separate months. Ground truth data regarding traffic volume is obtained from two

sources: (i) sampled sFlows from a large European IXP and (ii) link utilization

for the customers of a large ISP presenting the 95th percentile of utilization using

SNMP.

       AS-link traversing paths (ALTP) are constructed by mapping each hop of

traceroutes to their corresponding ASN. For each ALTP-set a relative measure

of link frequency is defined which represents the ratio of the link's cardinality to the

sum of cardinalities of all links in that set. This measure is used as a proxy for

traffic volume. The authors measure different network syntax metrics namely:

connectivity, control value, global choice, and integration for the ALTP-sets

which have common links with their ground truth traffic data. r² is measured
for regression analysis of the correlation between network syntax metrics and
traffic volume. ALTP-frequency shows the strongest correlation, with r² values

between 0.71 - 0.97 while the remainder of metrics also show strong and very-

strong correlations. The authors utilize the regression model to predict traffic

volume using ALTP-frequency as a proxy measure. Furthermore Sanchez et al.

demonstrate that the same inferences cannot be made from a simple AS-level

connectivity graph which is derived from BGP streams. Finally, the authors apply

the same methodology to CAIDA’s Ark dataset and find similar results regarding

the correlation of network syntax metrics and traffic volume.
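
       The proxy measure itself is straightforward to compute; the sketch below derives
the relative link frequency of an ALTP-set from a list of AS-level paths, with toy
inputs standing in for the Ono traceroute data.

    from collections import Counter

    def altp_frequency(as_paths):
        """Compute the relative frequency of each AS link within an ALTP-set.

        as_paths: list of AS-level paths (lists of ASNs) derived from traceroutes.
        A link's frequency is its cardinality divided by the sum of cardinalities
        of all links in the set, the proxy measure used for traffic volume.
        """
        links = Counter()
        for path in as_paths:
            for a, b in zip(path, path[1:]):
                if a != b:
                    links[(a, b)] += 1
        total = sum(links.values())
        return {link: count / total for link, count in links.items()}

    print(altp_frequency([[64500, 3356, 2914], [64500, 3356, 1299]]))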

       Gupta et al. (2014) study circuitous routes in Africa and their degrading

effect on latency. Circuitous routes are routes between two endpoints within Africa that
traverse a path outside of Africa, i.e., routes that should ideally have remained within
Africa but, due to sub-par connectivity, have detoured to a country

outside of Africa. Two major datasets are used for the study, (i) BGP routing

tables from Routeviews, PCH, and Hurricane Electric, and (ii) periodic (every 30

minutes) traceroute measurements from BISmark home routers towards MLab

servers, IXP participants, and Google cache servers deployed across Africa.

Traceroute hops are annotated with their AS owner and inter-AS links are

identified with the observation of ASN changes along the path. Circuitous routes

are identified by relying on high latency values for the given path. Latency penalty

is measured as the ratio of path latency to the best case latency between the source

node and a node in the same destination city. The authors find two main reasons

for paths with high latency penalty values namely, (i) ASes along the path are not

physically present at a local IXP, or (ii) the ASes are present at a geographically

nearby IXP but do not peer with each other due to business preferences.
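
       As a worked example of the latency-penalty metric defined above, the short
function below computes the ratio of a path's RTT to the best-case RTT towards the
destination city; the numbers shown are purely illustrative.

    def latency_penalty(path_rtt_ms, best_rtt_to_city_ms):
        """Latency penalty of a path: its RTT divided by the best-case RTT from
        the same source towards a node in the destination city. Values well
        above 1 hint at circuitous routing."""
        return path_rtt_ms / best_rtt_to_city_ms

    # Example: an intra-Africa path detouring through Europe.
    print(latency_penalty(320.0, 90.0))   # roughly a 3.6x penalty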

       Fanou et al. (2015) study Internet topology and its characteristics within

Africa. By expanding RIPE’s Atlas infrastructure within African countries, the

authors leverage this platform to conduct traceroute campaigns with the intention

of uncovering as many AS paths as possible. To this end, periodic traceroutes
were run between all Atlas nodes within Africa. These probes would target both

IPv4 and IPv6 addresses if available. Traceroute hops were mapped to their

corresponding country by leveraging six public datasets, namely OpenIPMap,

MaxMind, Team Cymru, AFRINIC DB, Whois, and reverse DNS lookup. Upon

disagreement between datasets, RIPE probes within the returned countries were

employed to measure latency towards the IP addresses in question, and the country

with the lowest latency was selected as the host country. Interface addresses are

mapped to their corresponding ASN by utilizing Team Cymru’s IP to AS service

TeamCymru (2008); using the augmented traceroute path, the AS path between the
source and destination is inferred. Using temporal data, the preference of AS pairs
to utilize the same path is studied; 73% (82%) of IPv4 (IPv6) paths utilize a path

with a frequency higher than 90%. Path lengths for AS pairs within west and south
Africa are studied, with southern countries having a slightly shorter average path
of 4 compared to 5. The AS path for pairs of addresses which reside within the same

country in each region is also measured, where it is found that southern countries

have a much shorter path compared to pairs of addresses which are in the same

western Africa countries (average of 3 compared to 5). AS-centrality (percentage

of paths in which an AS appears and is not the source or destination) is measured

to study transit roles of ASes. Impact of intercontinental transit on end-to-end

delay is measured by identifying the IP path which has the minimum RTT. It is

found that intercontinental paths typically exhibit higher RTT values while a small

fraction of these routes still have relatively low RTT values (< 100ms) and are

attributed to inaccuracies in IP to geolocation mapping datasets.

       Green et al. (2018) leverage inter-AS path stability as a measure for

conducting Internet tomography and anomaly detection. Path stability is analyzed

by the stability of a primary path. The primary path of router r towards prefix

p is defined as the path most prevalently preferred by r during a time window W.
Relying on 3 months of BGP feeds from RIPE RIS' LINX collector,

it is demonstrated that 85% (90%) of IPv4 (IPv6) primary paths are in use for at

least half of the time. Any deviations from the primary path are defined as pseudo-

events which are further categorized into two groups: (i) transient events where

a router explores additional paths before reconverging to the primary path, and

(ii) structural events where a router consistently switches to a new primary path.

For each pseudo-event, the duration and set of new paths that were explored are

recorded. About 13% of transient pseudo-events are found to be longer than an

hour while 12% of structural pseudo-events last less than 7 days. The number of

explored paths and the recurrence of each path is measured for pseudo-events. It

is found that MRAI timers and route flap damping are efficient at regulating BGP

dynamics. However, these transient events could be recurrent and require more

complex mechanisms in order to be accounted for. For anomaly detection about

2.3k AS-level outages and hijack events reported by BGPmon during the same

period of the study are used as ground truth. About 84% of outages are detected

as pseudo-events in the same time window, while for about 14% of events the detection

time was about one hour earlier than what BGPmon reported. For hijacks, the

announced prefix is looked up amongst pseudo-events; if no match is found, less

specific prefixes are used as a point of comparison with BGPmon. For about 82% of

hijacking events, a matching pseudo-event was found, and the remainder of events

are tagged as explicit disagreements.
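
       The sketch below illustrates the primary-path notion with hypothetical inputs:
given a time-ordered list of the paths a router prefers towards a prefix, it returns the
most prevalent (primary) path and the lengths of the runs spent away from it, which
loosely correspond to pseudo-events.

    from collections import Counter

    def primary_path_and_events(path_samples):
        """Identify the primary path and deviations for one (router, prefix) pair
        over a window W (sketch).

        path_samples: time-ordered list of AS-path tuples preferred by the router.
        The primary path is the most prevalent one; contiguous runs of other
        paths are reported as event lengths (in samples).
        """
        primary = Counter(path_samples).most_common(1)[0][0]
        events, run = [], 0
        for path in path_samples:
            if path != primary:
                run += 1
            elif run:
                events.append(run)      # a transient event that reconverged
                run = 0
        if run:
            events.append(run)          # still away from primary: possibly structural
        return primary, events

    print(primary_path_and_events([(1, 2, 3), (1, 2, 3), (1, 4, 3), (1, 2, 3)]))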

         2.4.1.2    Router-Level Topology. With the rise of peering disputes

highlighted by claims of throttling of Netflix's traffic, access to unbiased
measurements reflecting the underlying cause of subpar performance seems more
necessary than before. Doing so would require a topology map which captures

inter-AS links. The granularity of these links should be at the router level since

two ASes could establish many interconnections with each other, each of which

could exhibit different characteristics in terms of congestion. As outlined in Section

2.3 various methodologies have been presented that enable researchers to infer

the placement of inter-AS links from data plane measurements in the form of

traceroutes. A correct assessment of the placement of inter-AS links is necessary

to avoid attributing intra-AS congestion to inter-AS congestion; furthermore,

incorrectly identifying the ASes which are part of the inter-AS link could lead to

attributing congestion to incorrect entities.

       Dhamdhere et al. (2018) rely on prior techniques Luckie et al. (2016) to

infer both ends of an interconnect link and by conducting time series latency

probes (TSLP) try to detect windows of time where the latency time series deviates

from its usual profile. Observing asymmetric congestion for both ends of a link is

attributed to inter-AS congestion. The authors deploy 86 vantage points within

47 ASes on a global scale. By conducting similar TSLP measurements towards

the set of identified inter-AS links over the span of 21 months starting at March

2016, the authors study congestion patterns between various networks and their

upstream transit providers as well as the interconnections they establish with content

providers. Additionally, the authors conduct throughput measurements using the

Network Diagnostic Tool (NDT) M-Lab (2018) as well as SamKnows SamKnows

(2018) throughput measurements of YouTube servers and investigate the correlation

of inter-AS congestion and throughput.
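
       A minimal sketch of deviation detection over a TSLP-style series is shown below,
assuming periodic minimum-RTT samples and purely illustrative thresholds; the actual
study uses a more careful time-series analysis of both ends of the link.

    def congestion_windows(rtt_series, baseline_quantile=0.1, elevation_ms=5.0,
                           min_len=3):
        """Flag windows where a latency time series deviates from its usual
        profile (simplified sketch with illustrative thresholds).

        rtt_series: list of periodic minimum RTT samples (ms) towards a far-side
        interface. The baseline is a low quantile of the series; contiguous runs
        of samples elevated above baseline + elevation_ms are reported as
        (start_index, end_index) windows.
        """
        ordered = sorted(rtt_series)
        baseline = ordered[int(baseline_quantile * (len(ordered) - 1))]
        windows, start = [], None
        for i, rtt in enumerate(rtt_series):
            if rtt > baseline + elevation_ms:
                start = i if start is None else start
            elif start is not None:
                if i - start >= min_len:
                    windows.append((start, i - 1))
                start = None
        if start is not None and len(rtt_series) - start >= min_len:
            windows.append((start, len(rtt_series) - 1))
        return windows

    print(congestion_windows([10, 10, 11, 30, 31, 29, 28, 10, 10]))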

       Chandrasekaran, Smaragdakis, Berger, Luckie, and Ng (2015) utilize a

large content delivery network's infrastructure to assess the performance of the

Internet’s core. The authors rely on about 600 servers spanning 70 countries

and conduct pairwise path measurements in both forward and reverse directions

between the servers. Furthermore, AS paths are measured by translating router

hop interfaces to their corresponding AS owner; additionally, inter-AS segments

are inferred by relying on a series of heuristics developed by the authors based

on domain knowledge and common networking practices. Latency characteristics

of the observed paths are measured by conducting periodic ping probes between

the server pairs. Consistency and prevalence of AS paths for each server pair are

measured for a 16 month period. It is found that about 80% of paths are dominant

for at least half of the measurement period. Furthermore, about 80% of paths

experience 20 or fewer route changes during the 16 month measurement period.

The authors measure RTT inflation in comparison to optimal AS paths and find

that sub-optimal paths are often short-lived although a small number (10%) of

paths experience RTT inflation for about 30% of the measurement period. Effects

of congestion on RTT inflation are measured by initially selecting the set of server

pairs which experience RTT inflation using ping probe measurements while the

first segment that experiences congestion is pinpointed by relying on traceroute

measurements which are temporally aligned with the ping measurements. The

authors report that most inter and intra-AS links experience about 20 to 30 ms

of added RTT due to congestion.

       Chiu, Schlinker, Radhakrishnan, Katz-Bassett, and Govindan (2015) assess

path lengths and other properties for paths between popular content providers

and their clients. A collection of 4 datasets were used throughout the study

namely: (i) iPlane traceroutes from PlanetLab nodes towards 154k BGP prefixes,

(ii) aggregated query counts per /24 prefix (3.8M) towards a large CDN, (iii)

traceroute measurements towards 3.8M + 154k prefixes from Google’s Compute

Engine (GCE), Amazon Elastic Cloud, and IBM’s Softlayer VMs, and (iv)

traceroutes from RIPE Atlas probes towards cloud VMs and a number of popular

websites. Using traceroute measurements from various platforms and converting

the obtained IP hop path to its corresponding AS-level path the authors assess the

network distance between popular content providers and client prefixes. iPlane

traceroutes are used as a baseline for comparison; only 2% of these paths are
one hop away from their destination, and this value increases to 40% (60%) for paths

between GCE and iPlane (end user prefixes). This indicates that Google peers

directly with the majority of networks which host its clients. Using the CDN logs

as a proxy measure for traffic volume the authors find that Google peers with the

majority of ASes which carry large volumes of traffic. Furthermore, Chiu et al. find that the path from clients towards google.com is much shorter due to off-net hosted cache servers: 73% of queries come from ASes that either peer with Google or have an off-net server in their own network or in their upstream provider. A similar analysis for Amazon's EC2 and IBM's SoftLayer was performed, with 30% and 40% one-hop paths, respectively.



       Kotronis, Nomikos, Manassakis, Mavrommatis, and Dimitropoulos (2017)

study the possibility of improving latency performance through the employment

of relay nodes within colocation facilities. This work tries to (i) identify the best

locations/colos to place relay nodes and (ii) quantify the latency improvements

that are attainable for end pairs. The authors select a set of ASes per each country

which covers at least 10% of the countries population by using APNIC’s IPv6

measurement campaign dataset APNIC (2018). RIPE Atlas nodes within these

AS country pairs are selected which are running the latest firmware, are connected

and pingable, and have had stable connectivity during the last 30 days. Colo

relays are selected by relying on the set of pinned router interfaces from Giotsas,

Smaragdakis, et al. (2015) work. Due to the age of the dataset, a series of validity

tests including conformity with PeeringDB data, pingability, consistent ASN owner,

and RTT-based geolocation test with Periscope LGs have been conducted over the

dataset to filter out stale information. A set of PlanetLab relays and RIPE Atlas

relays are also considered as reference points in addition to the set of colo relays.

The measurement framework consists of 30-minute rounds conducted between April 20th and May 17th, 2017. Within each round, ping probes are sent between the selected end

pairs to measure direct latency. Furthermore, the relay path's latency is estimated by measuring the latency between the <src, relay> and <dst, relay> pairs. The authors observe improved latency for 83% of cases, with a median improvement of 12-14ms across the different relay types, and colo relays offer the largest improvement. The number of relays required for improved latency is also measured; the authors find that colo relays have the highest efficiency, where 10 relays account for 58% of improved cases, while achieving the same number of improved cases with RIPE relays would require more than 100 relays. Lastly, the authors list the top 10 colo facilities which host the 20 most effective colo relays; 4 of these colos are in the top 10 PeeringDB colos in terms of the number of colocated ASes, and all of them host at least 2 IXPs.
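The following short Python sketch (illustrative values; not the authors' framework) shows how the relayed latency can be approximated as the sum of the two legs and compared against the direct path to estimate the attainable improvement. The relay names and RTT values are hypothetical.

    def best_relay(direct_rtt_ms, relay_legs_ms):
        """relay_legs_ms: {relay: (rtt_src_to_relay, rtt_relay_to_dst)} in ms.
        The relayed latency is approximated as the sum of the two legs;
        returns (relay, improvement_ms) for the best relay, or None when no
        relay beats the direct path."""
        best = None
        for relay, (leg1, leg2) in relay_legs_ms.items():
            gain = direct_rtt_ms - (leg1 + leg2)
            if gain > 0 and (best is None or gain > best[1]):
                best = (relay, gain)
        return best

    if __name__ == "__main__":
        # hypothetical measurements (ms)
        print(best_relay(180.0, {"colo-relay-A": (40.0, 120.0),
                                 "atlas-relay-B": (90.0, 110.0)}))
        # -> ('colo-relay-A', 20.0)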

        Fontugne, Pelsser, Aben, and Bush (2017) introduce a statistical model

for measuring and pinpointing delay and forwarding anomalies from traceroute

measurements. Given the prevalence of route asymmetry on the Internet,

measuring the delay of two adjacent hops is not trivial. This issue is tackled by

the key insight that differential delay between two adjacent hops is composed of

two independent components. Changes in link latency can be detected by having

a diverse set of traceroute paths that traverse the link under study and observing latency values that deviate from the normal distribution of the median latency. Forwarding

patterns for each hop are established by measuring a vector accounting for the

number of times a next hop address has been observed. Pearson product-moment

correlation coefficient is used as a measure to detect deviations or anomalies within

the forwarding pattern of a hop. RIPE Atlas’ built-in and anchoring traceroute

probes for an eight-month period in 2015 are used for the study. The authors

highlight the applicability of their proposed methodology by providing insight

into three historical events namely, DDoS attacks on DNS root servers, Telekom

Malaysia’s BGP route leak, and Amsterdam IXP outage.
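A minimal Python sketch of the forwarding-pattern check (with an assumed correlation threshold; not the authors' implementation) is shown below: next-hop observation counts for a reference period and a current window are aligned into vectors and compared using the Pearson correlation coefficient.

    from math import sqrt

    def pearson(x, y):
        """Pearson product-moment correlation of two equal-length vectors."""
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = sqrt(sum((a - mx) ** 2 for a in x))
        sy = sqrt(sum((b - my) ** 2 for b in y))
        return cov / (sx * sy) if sx and sy else 0.0

    def forwarding_anomaly(reference, current, threshold=0.75):
        """reference/current: {next_hop: observation_count} for one router hop
        in the reference period and in the current window.  A low correlation
        between the aligned count vectors is flagged as an anomaly."""
        hops = sorted(set(reference) | set(current))
        ref_vec = [reference.get(h, 0) for h in hops]
        cur_vec = [current.get(h, 0) for h in hops]
        return pearson(ref_vec, cur_vec) < threshold

    if __name__ == "__main__":
        ref = {"10.0.0.1": 90, "10.0.0.2": 10}
        ok = {"10.0.0.1": 85, "10.0.0.2": 15}
        bad = {"10.0.0.1": 5, "10.0.0.3": 80}
        print(forwarding_anomaly(ref, ok))    # False: pattern unchanged
        print(forwarding_anomaly(ref, bad))   # True: a new next hop dominates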

          2.4.1.3   Physical-Level Topology. Measuring characteristics of

physical infrastructure using data plane measurements is very challenging due to

the disassociation of routing from the physical layer. Despite these challenges, we

overview two studies within this section that investigate the effects of sub-optimal

fiber infrastructure on latency between two end-points Singla, Chandrasekaran,



Godfrey, and Maggs (2014) and attempt to measure and pinpoint the causes of

observing subpar latency within fiber optic cables Bozkurt et al. (2018).

       Singla et al. (2014) outline the underlying causes of sub-par latency within

the Internet. The authors rely on about 400 PlanetLab nodes to periodically fetch

the front page of popular websites, geolocate the webserver’s location and measure

the optimal latency based on speed of light (c-latency) constraints. Interestingly

the authors find that the median fetch time is inflated by a factor of about 34 relative to the c-latency. Furthermore, the authors break down the webpage fetch time into

its constituent components namely, DNS resolution, TCP handshake, and TCP

transfer. Router path latency is calculated by conducting traceroutes towards

the servers and lastly, minimum latency towards the web server is measured

by conducting periodic ping probes. It is found that the median router path experiences about 2.3x latency inflation. The authors hypothesize that latencies

within the physical layer are due to sub-optimal fiber paths between routers. The

validity of this hypothesis is demonstrated by measuring the pairwise distance

between all nodes of Internet2 and GEANT network topologies and also computing

road distance using Google Maps API. It is found that fiber links are typically

1.5-2x longer than road distances. While this inflation is smaller in comparison to

the latency inflation of the webpage-fetch components, the effects of fiber link inflation are evident within higher layers due to the stacked nature of networking layers.
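For concreteness, the following Python sketch (hypothetical endpoint coordinates and RTT; not the authors' code, and the exact definition of c-latency follows the cited work) computes the great-circle distance between two endpoints, derives the corresponding speed-of-light round-trip time, and reports the inflation of a measured RTT relative to it.

    from math import radians, sin, cos, asin, sqrt

    C_KM_PER_MS = 299.792458   # speed of light in vacuum, km per millisecond

    def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
        """Haversine great-circle distance between two points, in km."""
        p1, p2 = radians(lat1), radians(lat2)
        dp, dl = radians(lat2 - lat1), radians(lon2 - lon1)
        a = sin(dp / 2) ** 2 + cos(p1) * cos(p2) * sin(dl / 2) ** 2
        return 2 * radius_km * asin(sqrt(a))

    def rtt_inflation(measured_rtt_ms, lat1, lon1, lat2, lon2):
        """Ratio of a measured RTT to the round-trip time that light in
        vacuum would need over the geodesic between the two endpoints."""
        one_way_ms = great_circle_km(lat1, lon1, lat2, lon2) / C_KM_PER_MS
        return measured_rtt_ms / (2 * one_way_ms)

    if __name__ == "__main__":
        # hypothetical probe: Eugene, OR to New York, NY measured at 90 ms RTT
        print(round(rtt_inflation(90.0, 44.05, -123.09, 40.71, -74.01), 2))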

       Bozkurt et al. (2018) present a detailed analysis of the causes for sub-

par latency within fiber networks. The authors rely on the InterTubes dataset of Durairajan et al. (2014) to estimate fiber lengths based on the conduits in the dataset.

Using the infrastructure of a CDN, server clusters which are within a 25km radius

of conduit endpoints were selected, and latency probes between pairs of servers at

both ends of the conduit were conducted every 3 hours over a period of 2 days.

The conduit's minimum latency is estimated from its length using the speed of light within fiber optic cables

(f-latency), and the authors find that only 11% of the links have RTTs within

25% of the f-latency for their corresponding conduit. Bozkurt et al. enumerate

various factors which can contribute to the inflated latency that they observed

within their measurements namely, (i) refraction index for different fiber optic

cables varies, (ii) slack loops within conduits to account for fiber cuts, (iii) latency

within optoelectrical and optical amplifier equipment, (iv) extra fiber spools to

compensate for chromatic dispersion, (v) publication of mock routes by network

operators to hide competitive details, and (vi) added fiber to increase latency

for price differentiation. Using published latency measurements from AT&T and

CenturyLink, RTT inflation in comparison to the f-latency derived from the InterTubes dataset is measured to have a median of 1.5x (2x) for AT&T's (CenturyLink's) network.

The accuracy of the InterTubes dataset is verified against Zayo's network, which publishes detailed fiber routes on its website. The authors find close agreement for the

majority of fiber conduit lengths while for 12% of links the length difference is more

than 100km.
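The sketch below (with an assumed refractive index of 1.468 and a hypothetical conduit length; not the authors' code) derives the idealized round-trip f-latency for a conduit and checks whether a measured RTT falls within a given slack of it, mirroring the 25% comparison described above.

    C_VACUUM_KM_PER_MS = 299.792458

    def f_latency_rtt_ms(fiber_km, refractive_index=1.468):
        """Idealized round-trip latency for a fiber span of fiber_km,
        assuming light propagates at c / refractive_index in the fiber
        (the index is an assumed typical value; real cables vary)."""
        speed_km_per_ms = C_VACUUM_KM_PER_MS / refractive_index
        return 2 * fiber_km / speed_km_per_ms

    def within_slack(measured_rtt_ms, fiber_km, slack=0.25):
        """True when the measured RTT is within `slack` (e.g. 25%) of the
        idealized f-latency for the conduit."""
        return measured_rtt_ms <= f_latency_rtt_ms(fiber_km) * (1 + slack)

    if __name__ == "__main__":
        # hypothetical 1,000 km conduit measured at 11.5 ms RTT
        print(round(f_latency_rtt_ms(1000.0), 2))   # ~9.79 ms
        print(within_slack(11.5, 1000.0))           # True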

         2.4.2    Resiliency. Studying the resiliency of Internet infrastructure

has been the subject of many research efforts over the past decade. While many

of these studies have reported postmortems regarding natural disasters and their

effects on Internet connectivity, others have focused on simulating what-if scenarios

to examine the resiliency of the Internet towards various types of disruptions.

Within these studies, researchers have utilized Internet topologies which were

contemporary to their time, and the resolution of these topologies varies in

accordance with the stated problem. For example, the resiliency of long haul fiber

infrastructure to rising sea levels due to global warming is measured by relying on

physical topology maps Durairajan, Barford, and Barford (2018) while the effects

of router outages on BGP paths and AS reachability are studied using a combination

of router and AS level topologies within Luckie and Beverly (2017). The remainder

of this section is organized according to the resolution of the underlying topology

which is used by these studies.

         2.4.2.1    AS-Level Topology. Katz-Bassett et al. (2012) propose

LIFEGUARD as a system for recovering from outages by rerouting traffic.

Outages are categorized into two groups of forward and reverse path outages.

Outages are detected and pinpointed by conducting periodic ping and traceroute

measurements towards the routers along the path. A historical list of responsive

routers for each destination is maintained. Prolonged unresponsive ping probes

are attributed to outages. For forward path outages, the authors suggest the use

of alternative upstream providers which traverse AS-paths that do not overlap

with the unresponsive router. For reverse path outages, the authors propose a

BGP poisoning solution where the origin AS would announce a path towards its

own prefix which includes the faulty AS within the advertised path. This, in turn,

causes the faulty AS to withdraw its advertisement of the prefix (to avoid a loop) and therefore causes alternative routes to be explored on the reverse path. A less-

specific sentinel prefix is advertised by LIFEGUARD to detect the recovery of the

previous path.

       Luckie and Beverly (2017) correlate BGP outage events to inferred

router outages by relying on time-series of IPID values obtained through active

measurements. This work is motivated by the fact that certain routers rely on

central incremental counters for the generated IPID values; given this assumption, one would expect to observe monotonically increasing IPID values for a single router. Any

disruption in this pattern can be linked to a router reboot. IPID values for

IPv4 packets are susceptible to counter rollover since they are only 16 bits wide.

The authors rely on IPID values obtained by inducing fragmentation within

IPv6 packets. The authors rely on a hit list of IPv6 router addresses which is

obtained from intermediate hops of CAIDA’s Ark traceroute measurements. By

analyzing the time series of IPID values, an outage window is defined for each

router. Router outages are correlated with their corresponding BGP control

plane events by looking at BGP feeds and finding withdrawal and announcement

messages occurring during the same time frame. It is found that for about 50% of router/prefix pairs at least 1-2 peers withdrew the prefix, while for about 10% of the router/prefix pairs nearly all peers withdrew their prefix announcement.

Luckie et al. find that about half of the ASes which had outages were completely

unrouted during the outage period and had single points of failure.
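The following Python sketch (illustrative only; the IPv6 probing and hit-list construction of the original work are omitted, and the sample values are hypothetical) captures the core reboot-detection idea: in a sequence of IPID samples expected to increase monotonically, a sample whose value effectively goes backwards signals a counter reset.

    def detect_resets(ipid_samples, counter_bits=32):
        """ipid_samples: list of (timestamp, ipid) values collected from one
        router, expected to increase monotonically if the router uses a single
        central counter.  A sample whose value effectively moves backwards
        (beyond what a normal wrap-around would produce) marks a counter
        reset, i.e. a candidate reboot window."""
        wrap = 2 ** counter_bits
        resets = []
        for (t_prev, prev), (t_cur, cur) in zip(ipid_samples, ipid_samples[1:]):
            if (cur - prev) % wrap > wrap // 2:   # value went backwards
                resets.append((t_prev, t_cur))
        return resets

    if __name__ == "__main__":
        samples = [(0, 100), (60, 220), (120, 340), (180, 12), (240, 130)]
        print(detect_resets(samples))   # -> [(120, 180)]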

       Unlike Luckie et al.'s approach, which relied on empirical data to assess the resiliency of the Internet, Lad, Oliveira, Zhang, and Zhang (2007) investigate both

the impact and resiliency of various ASes to prefix hijacking attacks by simulating

different attacks using AS-level topologies obtained through BGP streams. Impact

of prefix hijacking is measured as the fraction of ASes which believe the false

advertisement by a malicious AS. Similarly, the resiliency of an AS against prefix

hijacks is measured as the number of ASes which believe the true prefix origin

announcement. Surprisingly it is found that 50% of stub and transit ASes are more

resilient than Tier-1 ASes; this is mainly attributed to valley-free route preferences.

       Fontugne, Shah, and Aben (2018) look into structural properties, more

specifically AS centrality, of AS-level IPv4 and IPv6 topology graphs. AS-level

topologies are constructed using BGP feeds of Routeviews, RIPE RIS, and

BGPmon monitors. The authors illustrate the sampling bias of the betweenness

centrality (BC) measure by sub-sampling the set of available monitors and

measuring the variation of BC for each sample. AS hegemony is used as an

alternative metric for measuring the centrality of ASes which accounts for monitor

biases by eliminating monitors too close or far from the AS in question and

averaging the BC score across all valid monitors. Additionally, BC is normalized to

account for the size of advertised prefixes. The AS hegemony score is measured for

the AS-level graphs from 2004 to 2017. The authors find a marked decrease in the hegemony score over the years, supporting reports of Internet flattening.

Despite these observations, the hegemony score for ASes with the largest scores

has remained consistent throughout the years, pointing to the importance of large

transit ASes in the operation of the Internet. AS hegemony for Akamai and Google

is also measured; the authors report little to no dependence of these content providers on any specific upstream provider.

            2.4.2.2   Router-Level Topology. Palmer, Siganos, Faloutsos,

Faloutsos, and Gibbons (2001) rely on topology graphs gathered by SCAN and

Lucent projects consisting of 285k (430k) nodes (links) to simulate the effects

of link and node failures within the Internet connectivity graph. The number of

reachable pairs is used as a proxy measure to assess the impact of link or node

failures. It is found that the number of reachable pairs does not vary significantly for up to 50k link failures, while this tolerance drops to about 10k for node removals.

       Kang and Gligor (2014); Kang, Lee, and Gligor (2013) propose the Crossfire

denial of service attack that targets links which are critical for Internet connectivity

of ASes, cities, regions, or countries. The authors rely on a series of traceroute

measurements towards addresses within the target entity and construct topological

maps from various VPs towards these targets. The attacker would choose links

that are “close” to the target (3-4 router hops) and appear with a high frequency

within all paths. The attacker could cut these entities from the Internet by utilizing

a botnet to launch coordinated low-rate requests towards various destinations in

the target entity. Furthermore, the attacker can avoid detection by the target by

targeting addresses which are in close proximity to the target entity, e.g. sending probes towards addresses within the same city in which the target AS resides. The

pervasiveness and applicability of the Crossfire attack is investigated by relying

on 250 PlanetLab Chun et al. (2003) nodes to conduct traceroutes towards 1k web

servers located within 15 target countries and cities. Links are ranked according

to their occurrence within traceroutes and for all target cities and countries,

the authors observe a very skewed power-law distribution. This observation is

attributed to cost minimization within Internet routing (shortest path for intra-

domain and hot-potato for inter-domain routing). Bottleneck links are measured to

be on average about 7.9 (1.84) router (AS) hops away from the target.
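A minimal Python sketch of the link-ranking step (hypothetical router-level paths; not the authors' implementation) is given below: links that appear within a few hops of the target across many traceroutes are counted and ranked by frequency.

    from collections import Counter

    def rank_target_links(traceroutes, max_hops_from_dst=4):
        """traceroutes: router-level paths (lists of router identifiers), each
        ending at an address inside the target area.  Links that appear within
        a few hops of the destination across many paths are candidate
        bottlenecks."""
        counts = Counter()
        for path in traceroutes:
            tail = path[-(max_hops_from_dst + 1):]   # last few router hops
            counts.update(zip(tail, tail[1:]))       # links as hop pairs
        return counts.most_common()

    if __name__ == "__main__":
        paths = [["a", "b", "c", "d", "x1"],
                 ["e", "b", "c", "d", "x2"],
                 ["f", "g", "c", "d", "x3"]]
        print(rank_target_links(paths)[:2])   # ('c','d') and ('b','c') dominate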

       Giotsas et al. (2017) develop Kepler, a system that is able to detect peering outages. Kepler relies on BGP community values that have geocoded

embeddings. Although BGP community values are not standardized, they have

been utilized by ASes for traffic engineering, traffic blackholing, and network

troubleshooting. Certain ASes use the lower 16 bits of the BGP communities

attribute as a unique identifier for each of their border routers. These encodings

are typically documented on RIR webpages. The authors compile a dictionary of

BGP community values and their corresponding physical location (colo or IXP) by

parsing RIR entries. Furthermore, a baseline of stable BGP paths is established by

monitoring BGP feeds and removing transient announcements. Lastly, the tenants

of colo facilities as well as the available IXPs and their members are compiled from PeeringDB, DataCenterMap, and individual ASes' websites. Deviations in stable BGP paths

such as explicit withdrawal or change in BGP community values are considered as

outage signals.

         2.4.2.3     Physical-Level Topology. Schulman and Spring (2011)

investigate outages within the last mile of Internet connectivity which are caused

by severe weather conditions. The authors design a tool called ThunderPing

which relies on weather alerts from the US National Weather Service to conduct

connectivity probes prior, during, and after a severe weather condition towards

the residential users of the affected regions. A list of residential IP addresses is

compiled by parsing the reverse DNS entry for 3 IP addresses within each /24

prefix. If any of the addresses have a known residential ISP such as Comcast

or Verizon within their name, the remainder of the addresses within that block are

analyzed as well. IP addresses are mapped to their corresponding geolocation

by relying on Maxmind’s IP to GEO dataset. Upon the emergence of a weather

alert ThunderPing would ping residential IP addresses within the affected region

for 6 hours before, during, and after the forecasted event using 10 geographically

distributed PlanetLab nodes. A sliding window containing 3 pings is used to

determine the state of a host. A host responding with more than half of the

pings is considered to be UP, a host not responding to any pings is considered to be DOWN, and a host responding to less than half of the pings is in a HOSED state.

The authors find that failure rates are more than double during thunderstorms

compared to other weather conditions. Furthermore, the median for the duration

of DOWN times is almost an order of magnitude larger (10^4 seconds) during

thunderstorms compared to clear weather conditions.
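The UP/DOWN/HOSED classification can be sketched in Python as follows (illustrative code with hypothetical ping results; not the ThunderPing implementation), using a sliding window of three ping results per host.

    def host_state(window):
        """window: the last 3 ping results for a host (True = reply)."""
        replies = sum(window)
        if replies == 0:
            return "DOWN"
        if replies > len(window) / 2:
            return "UP"
        return "HOSED"

    def classify_series(pings, window_size=3):
        """Slide a 3-ping window over a boolean ping series and return the
        inferred state after each complete window."""
        return [host_state(pings[i:i + window_size])
                for i in range(len(pings) - window_size + 1)]

    if __name__ == "__main__":
        series = [True, True, False, False, False, True, False, False, False]
        print(classify_series(series))
        # -> ['UP', 'HOSED', 'DOWN', 'HOSED', 'HOSED', 'HOSED', 'DOWN']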

       Eriksson, Durairajan, and Barford (2013) present a framework (RiskRoute)

for measuring the risks associated with various Internet routes. RiskRoute has two

main objectives, namely (i) computing backup routes and (ii) measuring new paths

for network provisioning. The authors introduce the bit-risk miles measure which

quantifies the geographic distance that is traveled by traffic in addition to the

outage risk along the path, both in historical and immediate terms. Furthermore, the bit-risk miles measure is scaled to account for the impact of an outage by considering the

population that is in the proximity of an outage. The likelihood of historical outage

for a specific location is estimated using a Gaussian kernel which relies on observed

disaster events at all locations. For two PoPs, RiskRoute aims to calculate the path

which minimizes the bit-risk mile measure. For intra-domain routes, this is simply

calculated as the path which minimizes the bit-risk mile measure among all possible

paths which connect the two PoPs. For inter-domain routing the authors estimate

BGP decisions using geographic proximity and rely on shortest path routes. Using

the RiskRoute framework, improvements in the robustness of networks are analyzed

by finding an edge which would result in the largest increase in bit-risk measure

among all possible paths. It is found that the Sprint and TeliaSonera networks observe

the greatest improvement in robustness while Level3’s robustness remains fairly

consistent mostly due to rich connectivity within its network.

       Durairajan et al. (2018) assess the impact of rising sea levels on the Internet

infrastructure within the US. The authors align the data from the sea level rise

inundation dataset from the National Oceanic and Atmospheric Administration

(NOAA) with long-haul fiber maps from the Internet Atlas project Durairajan et

al. (2013) using the overlap feature of ArcGIS. The amount of affected fibers as well

as the number of PoPs, colos, and IXPs that will be at risk due to the rising sea

levels is measured. The authors find that New York, Seattle, and Miami are among

the cities with the highest amount of vulnerable infrastructure.

         2.4.3     AS Relationship Inference. ASes form inter-AS connections

motivated by different business relationships. These relationships can be in the

form of a transit AS providing connectivity to a smaller network as a customer

(c2p) by charging them based on the provided bandwidth or as a settlement-free

connection between both peers (p2p) where both peers exchange equal amounts of

traffic through their inter-AS link. These different types of inter-AS connections are indistinguishable in topologies obtained from control or data plane measurements. The studies within

this section overview a series of methodologies developed based on these business

relationships in conjunction with the valley-free routing principle to distinguish

these peering relationships from each other.

         2.4.3.1     AS-Level. Luckie, Huffaker, Dhamdhere, and Giotsas (2013)

develop an algorithm for inferring the business relationships between ASes by solely

relying on BGP data. Relationships are categorized as a customer to provider (c2p)

relationship, where a customer AS pays a provider AS for its connectivity to the Internet, or a peer-to-peer (p2p) relationship, where two ASes provide connectivity

to each other and often transmit equal amounts of traffic through their inter-

AS link(s). Inference of these relationships is based on BGP data using three

assumptions: (i) there is a clique of large transit providers at the top of the

Internet hierarchy, (ii) customers enter a transit agreement to be globally reachable,

and (iii) there should be no cycle of customer-to-provider (c2p) relationships.

The authors validate a subset (43k) of their inferences, which is the largest validation set at the time of publication, and finally they provide a new solution for inferring customer

cones of ASes. For their analyses, the authors rely on various data sources namely

BGP paths from Routeviews and RIPE’s RIS, any path containing origin ASes

which do not contain valid ASNs (based on RIRs) is excluded from the dataset. For

validation Luckie et al. use three data sources: validation data reported by network

operators to their website, routing policies reported to RIRs in export and import

fields, and finally they use the communities attribute of BGP announcements based

on the work of Giotsas and Zhou (2012). The authors define two metrics, node degree and transit degree, which can be measured from the AS relationship graph.
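The two metrics can be sketched in Python as follows (illustrative code with hypothetical AS paths; not CAIDA's implementation): node degree counts an AS's unique neighbors seen anywhere in the observed paths, while transit degree counts only the neighbors seen when the AS appears in the middle of a path.

    from collections import defaultdict

    def degrees(as_paths):
        """Compute node degree (unique neighbours seen anywhere in a path) and
        transit degree (unique neighbours seen only when the AS appears in the
        middle of a path, i.e. while providing transit) from BGP AS paths."""
        neighbours = defaultdict(set)
        transit_neighbours = defaultdict(set)
        for path in as_paths:
            for i, asn in enumerate(path):
                if i > 0:
                    neighbours[asn].add(path[i - 1])
                if i < len(path) - 1:
                    neighbours[asn].add(path[i + 1])
                if 0 < i < len(path) - 1:        # AS sits in the middle of the path
                    transit_neighbours[asn].update((path[i - 1], path[i + 1]))
        return ({a: len(n) for a, n in neighbours.items()},
                {a: len(n) for a, n in transit_neighbours.items()})

    if __name__ == "__main__":
        paths = [(64500, 3356, 15169), (64501, 3356, 15169), (64500, 64501)]
        node_deg, transit_deg = degrees(paths)
        print(node_deg[3356], transit_deg.get(3356, 0))     # 3 3
        print(node_deg[15169], transit_deg.get(15169, 0))   # 1 0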

       Giotsas, Luckie, et al. (2015) modify CAIDA’s IPv4 relationship inference

algorithm Luckie, Huffaker, Dhamdhere, and Giotsas (2013) and adapt it to IPv6

networks with the intention of addressing the lack of a fully-connected transit-

free clique within IPv6 networks. BGP dumps from Routeviews and RIPE RIS

which announce reachability towards IPv6 prefixes are used throughout this study.

For validation of inferred relationships three sources are used: BGP communities,

RPSL which is a route policy specification language that is available in WHOIS

datasets and is mandated for IXPs within EU by RIPE, and local preference

(LocPref) which is used to indicate route preference by an AS where ASes assign

higher values to customers and lower values to providers to minimize transit cost.

Data is sanitized by removing paths with artifacts such as loops or invalid ASNs.

The remainder of the algorithm is identical to Luckie, Huffaker, Dhamdhere, and

Giotsas (2013) with modifications to two steps: i) inferring the IPv6 clique and

ii) removing c2p inferences made between stub and clique ASes. In addition to

considering the transit degree and reachability, peering policy of ASes is also taken

into account for identifying cliques. Peering policy is extracted from PeeringDB,

and a restrictive policy is assumed for ASes who do not report this value. ASes with

selective or restrictive policies are selected as seeds to the clique algorithm. For an

AS to be part of the clique, it should provide BGP feeds to Routeviews or RIPE

RIS and announce routes to at least 90% of IPv6 prefixes available in BGP. The

accuracy of inferences is validated using the three validation sources described above; a consistent accuracy of at least 96% was observed for p2c and p2p

relationships for the duration of the study. The fraction of congruent relationships

where the relationship type is identical for IPv4 and IPv6 networks is measured.

The authors find that this fraction increases from 85% in 2006 to 95% in 2014.

         2.4.3.2     PoP-Level. Giotsas et al. (2014) provide a methodology for

extending traditional AS relationship models to include two complex relationships

namely: hybrid and partial transit relationships. Hybrid relations indicate different

peering relations at different locations. Partial transit relations restrict the scope of

a customer's relation by not exporting all provider paths to the customer. AS paths,

prefixes, and community strings are gathered from Routeviews and RIPE RIS

datasets. CAIDA’s Ark traceroutes in addition to a series of targeted traceroutes

launched from various looking glasses are employed to confirm the existence of

various AS relationships. Finally, geoinformation for AS-links is gathered from

BGP community information, PeeringDB’s reverse DNS scan of IXP prefixes, DNS

parsing of hostnames by CAIDA’s DRoP service, and NetAcuity’s IP geolocation

dataset is used as a fallback when other methods do not return a result. Each

AS relationship is labeled into one of the following export policies: i) full transit

(FT) where the provider exports prefixes from its provider, ii) partial transit (PT)

where only prefixes of peers and customers are exported, and iii) peering (P) where only prefixes of customers are exported. Each identified relationship defaults to

peering unless counter-evidence is found through traceroute measurements which

indicate PT or FT relationships. Out of 90k p2c relationships, 4k are classified as complex, with 1k and 3k being hybrid and partial-transit, respectively.

For validation (i) direct feedback from network operators, (ii) parsed BGP

community values, and (iii) RPSL objects are used. Overall 19% (7%) of hybrid

(partial-transit) relationships were confirmed.




                                    CHAPTER III

                             LOCALITY OF TRAFFIC

       This chapter provides a study on the share of cloud providers and CDNs in

Internet traffic from the perspective of an edge network (UOnet). Furthermore, this

work quantifies the degree to which the serving infrastructure for cloud providers

and CDNs is close/local to UOnet’s network and investigates the implications of

this proximity on end-user performance.

       The content in this chapter is derived entirely from Yeganeh, Rejaie,

and Willinger (2017) as a result of collaboration with co-authors listed in the

manuscript. Bahador Yeganeh is the primary author of this work and responsible

for conducting all the presented analyses.

3.1   Introduction

       During the past two decades, various efforts among different Internet players

such as large Internet service providers (ISP), commercial content distribution

networks (CDN) and major content providers have focused on supporting the

localization of Internet traffic. Improving traffic localization has been argued to

ensure better user experience (in terms of shorter delays and higher throughput)

and to result in less traffic traversing an ISP's backbone or the interconnections

(i.e., peering links) between the involved parties (e.g., eyeball ASes, transit

providers, CDNs, content providers). As a result, it typically lowers a network

operator’s cost and also improves the scalability of the deployed infrastructure in

both the operator’s own network and the Internet at large.

       The main idea behind traffic localization is to satisfy a user request for

a certain piece of content by re-directing the request to a cache or front-end

server that is in close proximity to that user and can serve the desired piece

of content. However, different commercial content distribution companies use

different strategies and deploy different types of infrastructures to implement their

business model for getting content closer to the end users. For example, while

Akamai Akamai (2017) operates and maintains a global infrastructure consisting

of more than 200K servers located in more than 1.5K different ASes to bring the

content requested by its customers closer to the edge of the network where this

content is consumed, other CDNs such as Limelight or EdgeCast rely on existing

infrastructure in the form of large IXPs to achieve this task Limelight (2017).

Similar to Akamai but smaller in scale, major content providers such as Google and

Netflix negotiate with third-party networks to deploy their own caches or servers

that are then used to serve exclusively the content provider’s own content. In fact,

traffic localization efforts in today’s Internet continue as the large cloud providers

(e.g., Amazon, Microsoft) are in the process of boosting their presence at the edge

of the network by increasingly deploying infrastructure in the newly emerging 2nd-tier datacenters

(e.g., EdgeConneX EdgeConneX (2018)) that target the smaller- or medium-sized

cities in the US instead of the major metropolitan areas.

       These continued efforts by an increasing number of interested parties to

implement ever more effective techniques and deploy increasingly more complex

infrastructures to support traffic localization have motivated numerous studies on

designing new methods and evaluating existing infrastructures to localize Internet

traffic. While some of these studies Adhikari et al. (2012); Böttger, Cuadrado,

Tyson, Castro, and Uhlig (2016); Calder et al. (2013); Fan, Katz-Bassett, and

Heidemann (2015) have focused on measurement-based assessments of different

deployed CDNs to reveal their global Böttger et al. (2016); Calder et al. (2013)

or local Gehlen, Finamore, Mellia, and Munafò (2012); Torres et al. (2011)

infrastructure nodes, others have addressed the problems of reverse-engineering a

CDN’s strategy for mapping users to their close-by servers or examining whether

or not the implemented re-direction techniques achieve the desired performance

improvements for the targeted end users Adhikari et al. (2012); Fan et al. (2015);

Gehlen et al. (2012). However, to our knowledge, none of the existing studies

provides a detailed empirical assessment of the nature and impact of traffic

localization as seen from the perspective of an actual stub-AS. In particular,

the existing literature on the topic of traffic localization provides little or no

information about the makeup of the content that the users of an actual stub-AS

request on a daily basis, the proximity of servers that serve the content requested

by these users (overall or per major content provider), and the actual performance

benefits that traffic localization entails for the consumers of this content (i.e., end

users inside the stub-AS).

       In this chapter, we fill this gap in the existing literature and report on a

measurement study that provides a detailed assessment of different aspects of the

content that arrives at an actual stub-AS as a result of the requests made by its

end users. To this end, we consider multiple daily snapshots of unsampled Netflow

data for all exchanged traffic between a stub-AS that represents a Research &

Education network (i.e., UOnet operated by the University of Oregon) and the

Internet (Section 3.2). We show that some 20 content providers are responsible for most of

the delivered traffic to UOnet and that for each of these 20 content providers, the

content provider-specific traffic typically comes from only a small fraction of

source IPs (Section 3.3). Using RTT to measure the distance of these individual

source IPs from UOnet, we present a characterization of this stub-AS’ traffic

footprint; that is, empirical findings about the locality properties of delivered

traffic to UOnet, both in aggregate and at the level of individual content providers

(Section 3.4). In particular, we examine how effective the individual content

providers are in utilizing their infrastructure nodes to localize their delivered

traffic to UOnet and discuss the role that guest servers (i.e., front-end servers or

caches that some of these content providers deploy in third-party networks) play

in localizing traffic for this stub-AS (Section 3.5). As part of this effort, we focus

on Akamai and develop a technique that uses our data to identify all of Akamai’s

guest servers that delivered content to UOnet. We then examine different features

of the content that arrived at UOnet from those guest servers as compared to the

content that reached UOnet via servers located in Akamai’s own AS. Finally, we

investigate whether or not a content provider’s ability to localize its traffic has

implications on end user-perceived performance, especially in terms of observed

throughput (Section 3.6).

3.2   Data Collection for a Stub-AS: UOnet

       The stub-AS that we consider for this study is the campus network of the

University of Oregon (UO), called UOnet (ASN3582). UOnet serves more than 24K

(international and domestic) students and 4.5K faculty/staff during the academic

year. These users can access the Internet through UOnet using wireless (through

2000+ access points) or wired connections. Furthermore, more than 4,400 of the

students reside on campus and can access the Internet through UOnet using their

residential connections. UOnet has three upstream providers, Neronet (AS3701),

Oregon Gigapop (AS4600) and the Oregon IX exchange. Given the types of offered

connectivity and the large size and diversity of the UOnet user population, we

consider the daily traffic that is delivered from the rest of the Internet to UOnet



to be representative of the traffic that a stub-AS that is classified as a US Research

& Education network is likely to experience.

       To conduct our analysis, we rely on un-sampled Netflow (v5) data that is

captured at the different campus border routers. As a result, our Netflow data

contains all of the flows between UOnet users and the Internet. The Netflow

dataset contains a separate record for each incoming (and outgoing) flow from

(to) an IP address outside of UOnet, and each record includes the following flow

attributes: (i) source and destination IP addresses, (ii) source and destination port

numbers, (iii) start and end timestamps, (iv) IP protocol, (v) number of packets,

and (vi) number of bytes. We leverage Routeviews data to map all the external IPs

to their corresponding Autonomous Systems (ASes) and use this information to

map individual flows to particular providers (based on their AS number) and then

determine the number of incoming (and outgoing) flows (and corresponding bytes)

associated with each provider. In our analysis, we only consider the incoming flows

since we are primarily interested in delivered content and services from major

content providers to UOnet users. An incoming flow refers to a flow with the source

IP outside and destination IP inside UOnet. We select 10 daily (24 hour) snapshots

of Netflow data that consist of Tuesday and Wednesday from five consecutive

weeks when the university was in session, starting with the week of Oct 3rd and

ending with the week of Oct 31st in 2016. Table 2 summarizes the main features

of the selected snapshots, namely their date, the number of incoming flows and

associated bytes, and the number of unique external ASes and unique external

IPs that exchanged traffic with UOnet during the given snapshot. In each daily

snapshot, wireless connections are responsible for roughly 62% (25%) of delivered



Table 2. Main features of the selected daily snapshots of our UOnet Netflow data.

             Snapshot     Flows (M)       TBytes      ASes (K)     IPs (M)
             10/04/16        196           8.7           39          3.3
             10/05/16        193           8.5           37          3.0
             10/11/16        199           9.0           41          4.1
             10/12/16        198           9.1           41          4.7
             10/18/16        202           8.8           40          3.7
             10/19/16        200           9.1           38          3.3
             10/25/16        205           8.7           37          2.9
             10/26/16        209           9.1           40          4.1
             11/01/16        212           8.6           39          3.5
             11/02/16        210           8.7           40          4.3


bytes (flows) and residential users contributed to about 17% (10%) of incoming

bytes (flows).
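The per-provider accounting described above can be sketched in Python as follows (illustrative code with hypothetical prefixes and flows; not the actual processing pipeline used in this study): each incoming flow's source IP is mapped to an origin AS via a longest-prefix match over a Routeviews-derived table, and delivered bytes are summed per AS.

    import ipaddress
    from collections import defaultdict

    # Hypothetical prefix-to-origin-AS entries derived from a Routeviews RIB dump.
    PREFIX_TO_AS = [
        (ipaddress.ip_network("192.0.2.0/24"), 64500),
        (ipaddress.ip_network("198.51.100.0/24"), 64501),
    ]

    def origin_as(ip):
        """Longest-prefix match of ip against the table (linear scan for
        clarity; a production mapper would use a radix/Patricia trie)."""
        addr = ipaddress.ip_address(ip)
        matches = [(net.prefixlen, asn) for net, asn in PREFIX_TO_AS if addr in net]
        return max(matches)[1] if matches else None

    def per_as_bytes(flows):
        """flows: iterable of (src_ip, dst_ip, nbytes) for incoming flows.
        Returns total delivered bytes keyed by the origin AS of the source IP."""
        totals = defaultdict(int)
        for src_ip, _dst_ip, nbytes in flows:
            asn = origin_as(src_ip)
            if asn is not None:
                totals[asn] += nbytes
        return dict(totals)

    if __name__ == "__main__":
        flows = [("192.0.2.10", "10.1.1.1", 5000),
                 ("198.51.100.7", "10.1.1.2", 1200),
                 ("192.0.2.44", "10.1.1.3", 800)]
        print(per_as_bytes(flows))   # {64500: 5800, 64501: 1200}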

3.3   Identifying Major Content Providers

       Our main objective is to leverage the UOnet dataset to provide an

empirical assessment of traffic locality for delivered flows to UOnet and examine

its implications for the end users served by UOnet. Here by “locality” we refer to a notion of the network distance between UOnet and the servers in the larger Internet that provide the content/service requested from within UOnet. Since the level of locality of

delivered traffic by each content provider depends on both the relative network

distance of its infrastructure and its strategy for utilizing this infrastructure, we

conduct our analysis at the granularity of individual content providers and focus

only on those that are responsible for the bulk of delivered content to UOnet.

Moreover, because the number of unique source IPs that send traffic to UOnet on

a daily basis is prohibitively large, we identify and focus only on those IPs that are

responsible for a significant fraction of the delivered traffic.

Inferring Top Content Providers: Figure 6 (left y-axis) shows the histogram of

delivered traffic (in TB) to UOnet by those content providers that have the largest

Figure 6. The volume of delivered traffic from individual top content providers
to UOnet along with the CDF of aggregate fraction of traffic by top 21 content
providers in the 10/04/16 snapshot.

contributions in the 10/04/16 snapshot. It also shows (right y-axis) the CDF of

the fraction of aggregate traffic that is delivered by the top-k content providers in

this snapshot. The figure is in full agreement with earlier studies such as Ager,

Mühlbauer, Smaragdakis, and Uhlig (2011); Chatzis et al. (2013) and clearly

illustrates the extreme skewness of this distribution – the top 21 content providers

(out of some 39K ASes) are responsible for 90% of all the delivered daily traffic to

UOnet.

                           To examine the stability of these top content providers across our 10 daily

snapshots, along the x-axis of Figure 7, we list any content provider that is among

the top content providers (with 90% aggregate contributions in delivered traffic) in

at least one daily snapshot (the ordering is in terms of mean rank, from small to

large for content providers with the same prevalence). This figure shows the number

of daily snapshots in which a content provider has been among the top content

providers (i.e. content provider’s prevalence, left y-axis) along with the summary

distribution (i.e., box plot) of each content provider's ranking among the

top content providers across different snapshots (rank distribution, right y-axis).

We observe that the same 21 content providers consistently appear among the

top content providers. These 21 content providers are among the well-recognized

players of today’s Internet and include major content providers (e.g. Netflix,

Twitter), widely-used CDNs (e.g. Akamai, LimeLight and EdgeCast), and large

providers that offer hosting, Internet access, and cloud services (e.g. Comcast,

Level3, CenturyLink, Amazon). In the following, we only focus on these 21 content

providers (called target content providers) that are consistently among the top

content providers in all of our snapshots. These target content providers are also

listed in Figure 6 and collectively contribute about 90% of the incoming daily bytes

in each of our snapshots.

Inferring Top IPs per Target Content Providers: To assess the locality of

the traffic delivered to UOnet from each target content provider, we consider the

source IP addresses for all of the incoming flows in each daily snapshot. While

for some target content providers, the number of unique source IP addresses is

as high as a few tens of thousands, the distribution of delivered traffic across

these IPs exhibits again a high degree of skewness; i.e. for each target content

provider, only a small fraction of source IPs (called top IPs) is responsible for

90% of delivered traffic. Figure 8 shows the summary distribution (in the form

of box plots) of the number of top IPs across different snapshots along with the

cumulative number of unique top IPs (blue line) and all IPs (red line) across all


Figure 7. The prevalence and distribution of rank for any content provider that has
appeared among the top content providers in at least one daily snapshot.

of our 10 snapshots. The log-scale on the y-axis shows that the number of top

IPs is often significantly smaller than the number of all IP addresses (as a result

of the skewed distribution of delivered content by different IPs per target content

provider). A small gap between the total number of top IPs and their distribution

across different snapshots illustrates that for many of the target content providers,

the top IPs do not vary widely across different snapshots. In our analysis of traffic

locality below, we only consider the collection of all top IPs associated with each of

the target content providers across different snapshots. Focusing on these roughly

50K IPs allows us to capture a rather complete view of delivered traffic to UOnet

without considering the millions of observed source IPs.
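The selection of top IPs can be sketched in Python as follows (hypothetical volumes; not the actual analysis code): source IPs are sorted by contributed bytes and accumulated until they cover 90% of the provider's delivered traffic.

    def top_ips(bytes_per_ip, coverage=0.9):
        """bytes_per_ip: {source_ip: delivered_bytes} for one content provider
        in one snapshot.  Returns the smallest prefix of IPs (in decreasing
        order of contribution) that together account for `coverage` of the
        delivered bytes."""
        total = sum(bytes_per_ip.values())
        selected, accumulated = [], 0
        for ip, nbytes in sorted(bytes_per_ip.items(), key=lambda kv: kv[1], reverse=True):
            selected.append(ip)
            accumulated += nbytes
            if accumulated >= coverage * total:
                break
        return selected

    if __name__ == "__main__":
        demo = {"a": 700, "b": 200, "c": 60, "d": 40}   # hypothetical volumes
        print(top_ips(demo))   # -> ['a', 'b']  (covers 90% of 1,000 bytes)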







Figure 8. Distribution of the number of top IPs across different snapshots in
addition to total number of unique top IP addresses (blue line) and the total
number of unique IPs across all snapshots (red line) for each target content
provider.

Measuring the Distance of Top IPs: Using the approximately 50K top IPs

for all 21 target content providers, we conducted a measurement campaign (on

11/10/16) that consisted of launching 10 rounds of traceroutes1 from UOnet to

all of these 50K top IPs to infer their minimum RTT.

             Note that the value of RTT for each top IP accounts for possible path

asymmetry between the launching location and the target IP and is therefore largely

insensitive to the direction of the traceroute probe (i.e. from UOnet to a top IP vs.

from a top IP to UOnet). Our traceroute probes successfully reached 81% of the

targeted IP addresses. We exclude three target content providers (i.e., Internap,

   1
    We use all three types of traceroute probes (TCP, UDP, ICMP) and spread them throughout the day to reach most IPs and reliably capture the minimum RTT.



Figure 9. Radar plots showing the aggregate view of locality based on RTT of
delivered traffic in terms of bytes (left plot) and flows (right plot) to UOnet in a
daily snapshot (10/04/2016).


Amazon and Twitch) from our analysis because their servers did not respond

to more than 90% of our traceroute probes. All other target content providers

responded to more than 90% of our probes.

       The outcome of our measurement campaign is the list of top IPs along

with their min RTT and the percentage of delivered traffic (in terms of bytes

and flows) for each target content provider. With the help of this information,

we can now assess the locality properties of the content that is delivered from

each target content provider to UOnet. Note that in theory, any distance measure

could be used for this purpose. However, in practice, neither AS distance (i.e.,

number of AS hops), nor hop-count distance (i.e., number of traceroute hops), nor

geographic distance are reliable metrics. While the first two ignore the commonly

encountered asymmetry of IP-level routes in today’s Internet Sánchez et al. (2013),

the last metric suffers from known inaccuracies in commercial databases such

as IP2Location IP2Location (2015) and Maxmind MaxMind (2018) that are

commonly used for IP geolocation. We choose the RTT distance (i.e., measured

by min RTT value) as our metric-of-choice for assessing the locality of delivered


traffic since it is the most reliable distance measure and also the most relevant in

terms of user-perceived delay.

3.4   Traffic Locality for Content Providers

Overall View of Traffic Locality: We use radar plots to present an overall view

of the locality of aggregate delivered traffic from our target content providers to

UOnet based on RTT distance. Radar plots are well suited for displaying multi-

variable data where individual variables are shown as a sequence of equiangular

spokes, called radii. We use each spoke to represent the locality of traffic for

a given target content provider by showing the RTT values for 50th, 75th and

90th percentiles of delivered traffic (in bytes or flows). In essence, the spoke

corresponding to a particular target content provider shows what percentage of

the traffic that this content provider delivers to UOnet originates from within 10,

20,..., or 60ms distance from our stub-AS. Figure 9 shows two such radar plots

for a single daily snapshot (10/04/16). In these plots, the target CPs are placed

around the plot in a clock-wise order (starting from 12 o’ clock) based on their

relative contributions in delivered bytes (as shown in Figure 6), and the distances

(in terms of min RTT ranges) are marked on the 45-degree spoke. The left and

right plots in Figure 9 show the RTT distance for 50, 75 and 90th percentile of

delivered bytes and flows for each content provider, respectively. By connecting the

same percentile points on the spokes associated with the different target content

providers, we obtain a closed contour where the sources for 50, 75 or 90% of the

delivered content from our target content providers to UOnet are located. We

refer to this collection of contours as the traffic footprint of UOnet. While more

centrally-situated contours indicate a high degree of overall traffic locality for the



considered stub-AS, contours that are close to the radar plot’s boundary for some

spokes suggest poor localization properties for some content providers.

       The radar plots in Figure 9 show that while there are variations in traffic

locality for different target content providers, 90% of the delivered traffic for the

top 13 content providers is delivered from within a 60ms RTT distance from

UOnet and for 9 of them from within 20ms RTT. Moreover, considering the case

of Cogent, while 50% of bytes from Cogent are delivered from an RTT distance of

20ms, 50% of the flows are delivered from a distance of 60ms. Such an observed

higher level of traffic locality with respect to bytes compared to flows suggests

that a significant fraction of the corresponding target content provider’s (in this

case, Cogent) large or “elephant” flows are delivered from servers that are in closer

proximity to UOnet than those that serve the target content provider’s smaller

flows. Collectively, these findings indicate that for our stub-AS, the overall level of

traffic locality for delivered bytes and flows is high but varies among the different

target content providers. These observations are by and large testimony to the

success of past and ongoing efforts by the different involved parties to bring content

closer to the edge of the network where it is requested and consumed. As such, the

results are not surprising, but to our knowledge, they provide the first quantitative

assessment of the per-content provider traffic footprint (based on RTT distance) of

a stub-AS.

Variations in Traffic Locality: After providing an overall view of the locality

of the delivered traffic to UOnet for a single snapshot, we next turn our attention

to how traffic locality of a content provider (with respect to UOnet) varies over

time. To simplify our analysis, we consider all flows of each target content provider

and bin them based on their RTTs using a bin size of 2ms. The flows in each bin

are considered as a single group with an RTT value given by the mid-bin RTT

value. We construct the histogram of percentages of delivered bytes from each

group of flows in each bin and define the notion of Normalized Weighted Locality

for delivered traffic from a provider P in snapshot s as:
    NWL(s, P) = \sum_{i \in RTTBins(P)} \frac{FracBytes(i) \times RTT(i)}{minRTT(P)}

N W L(s,P) is simply the sum of the fraction of delivered traffic from each RTT bin

(F racBytes(i)) that is weighted by its RT T and then normalized by the lowest

RTT among all bin (minRT T (P )) for a content provider across all snapshots.

N W L is an aggregate measure that illustrates how effectively a content provider

localizes its delivered traffic over its own infrastructure. A N W L value of 1 implies

that all of the traffic is delivered from the closest servers while larger values

indicate more contribution from servers that are further from UOnet.
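
To make this computation concrete, the following sketch (in Python, with function and variable names of our own choosing) computes NWL for one provider in one snapshot from a list of (RTT, bytes) flow records, using the 2ms bin size described above; it is an illustrative aid rather than the exact code used in our analysis.

    from collections import defaultdict

    BIN_MS = 2  # bin width in milliseconds, as described above

    def nwl(flows, min_rtt_provider):
        """flows: iterable of (rtt_ms, nbytes) tuples for one provider in one snapshot.
        min_rtt_provider: lowest RTT observed for this provider across all snapshots."""
        bytes_per_bin = defaultdict(int)
        for rtt_ms, nbytes in flows:
            bytes_per_bin[int(rtt_ms // BIN_MS)] += nbytes

        total_bytes = sum(bytes_per_bin.values())
        score = 0.0
        for bin_idx, nbytes in bytes_per_bin.items():
            mid_rtt = (bin_idx + 0.5) * BIN_MS          # mid-bin RTT value
            frac = nbytes / total_bytes                 # FracBytes(i)
            score += frac * mid_rtt / min_rtt_provider  # weighted, normalized term
        return score

    # Example: all bytes delivered from the closest bin yield an NWL of 1.
    print(nwl([(10.5, 4000), (11.0, 6000)], min_rtt_provider=11.0))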

The top plot in Figure 10 presents the summary distribution of NWL(s, P)

across different daily snapshots for each content provider. The bottom plot in

Figure 10 depicts min RTT for each content provider. These two plots together

show how local the closest server of a content provider is and how effective each

content provider is in utilizing its infrastructure. The plots also demonstrate the

following points about the locality of traffic. First, for many target content

providers (e.g. Netflix, Comcast, Valve), the NWL values exhibit small or no

variations across different snapshots. Such behavior suggests that the pattern

of delivery from different servers is stable across different snapshots. In contrast,

for content providers with varying NWL values, the contribution of various servers

(i.e. the pattern of content delivery from various content provider servers) changes

over time. Second, the value of NWL is less than 2 (and often very close to 1)

for many content providers. This in turn indicates that these content providers
effectively localize their delivered traffic to UOnet over their infrastructure.

The value of NWL for other content providers is larger and often exhibits larger

variation due to their inability to effectively utilize their nodes to localize delivered

traffic to UOnet.

Figure 10. Two measures of traffic locality, from top to bottom: summary
distribution of NWL and the RTT of the closest servers per content provider (or
minRTT).


3.5        Traffic From Guest Servers

           To improve the locality properties of their delivered content and services to

end users, some content providers expand their infrastructure by deploying some

of their servers in other networks. We refer to such servers as guest servers and to

the third-party networks hosting them as host networks or host ASes. For example,

Akamai is known to operate some 200K such servers in over 1.5K different host


networks, with the servers using IP addresses that belong to the host networks Fan

et al. (2015); Triukose, Wen, and Rabinovich (2011).

       We present two examples to illustrate the deployment of guest servers.

First, our close examination of delivered traffic from Neronet, which is one of

UOnet’s upstream providers, revealed that all of its flows are delivered from a

small number of IPs (see Figure 8) associated with Google servers, i.e. Google

caches Calder et al. (2013) that are deployed in Neronet. This implies that all

of Google’s traffic for UOnet is delivered from Neronet-based Google caches and

explains why Google is not among our target content providers. Second, Netflix

is known to deliver its content to end users through its own caches (called Open

Connect Appliances Netflix (2017b)) that are either deployed within different host

networks or placed at critical IXPs Böttger et al. (2016). When examining the

DNS names for all the source IPs of our target content providers, we observed a

number of source IPs that are within another network and whose DNS names follow

the *.pdx001.ix.nflxvideo.net format. This is a known Netflix convention for

DNS names and clearly indicates that these guest servers are located at an IXP in

Portland, Oregon Böttger et al. (2016).

         3.5.1   Detecting Guest Servers. Given the special nature of content

delivery to UOnet from Google (via Neronet) and Netflix (via a close-by IXP),

we focus on Akamai to examine how its use of guest servers impacts the locality of

delivered traffic to UOnet. However, since our basic methodology that relies on a

commonly-used IP-to-AS mapping technique cannot identify Akamai’s guest servers

and simply associates them with their host network, we present in the following

a new methodology for identifying Akamai’s guest servers that deliver content to

UOnet.

        Our proposed method leverages Akamai-specific information and proceeds

in two steps. The first step consists of identifying the URLs for a few small, static

and popular objects that are likely to be cached at many Akamai servers. Then, in

a second step, we probe the observed source IP addresses at other target content

providers with properly-formed HTTP requests for the identified objects. Any

third-party server that provides the requested objects is considered an Akamai

guest server. More precisely, we first identify a few Akamai customer websites and

interact with them to identify small, static and popular objects (i.e., “reference

objects"). Since JavaScript or CSS files are less likely to be modified compared to

other types of objects and thus are more likely to be cached by Akamai servers,

we used in our experiments two JavaScript objects and a logo from Akamai client

web sites (e.g. Apple, census.gov, NBA). Since an Akamai server is responsible

for hosting content from multiple domain names, the web server needs a way to

distinguish requests that are redirected from clients of different customer websites.

This differentiation is achieved with the help of the HOST field of the HTTP

header. Specifically, when constructing an HTTP request to probe an IP address,

we set the HOST field to the original domain name of the reference object (e.g.

apple.com, census.gov, nba.com). Next, for each reference object, we send a

separate HTTP request to each of the 50K top source IP addresses in our datasets

(see Section 3). If we receive the HTTP OK/200 status code in response to our

request and the first 100 bytes of the provided object match the requested reference

object 2 , we consider the server to be an Akamai guest server and identify its AS as the

host AS. We repeat our request using other reference objects if the HTTP request

  2
    The second condition is necessary since some servers provide a positive response to any HTTP
requests.



fails or times out. If all of our requests time out or receive an HTTP error code, we

mark the IP address as a non-Akamai IP address.
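
The following Python sketch illustrates this probing step; the reference URL, HOST value, and 100-byte prefix are placeholders for the actual reference objects used in our experiments, and the helper name is our own.

    import requests

    def probe_akamai_guest(ip, reference_path, host, reference_prefix, timeout=5):
        """Request a known Akamai-cached object directly from `ip`, setting the HTTP
        HOST header to the customer domain, and compare the first 100 bytes of the
        response body against the reference object."""
        url = f"http://{ip}/{reference_path.lstrip('/')}"
        try:
            resp = requests.get(url, headers={"Host": host}, timeout=timeout, stream=True)
            if resp.status_code != 200:
                return False
            first_bytes = next(resp.iter_content(chunk_size=100), b"")
            # The byte comparison guards against servers that return 200 for any request.
            return first_bytes == reference_prefix
        except requests.RequestException:
            return False

    # Hypothetical usage: reference_prefix holds the first 100 bytes of the object
    # fetched once from the customer's own website (e.g. a JavaScript file or logo).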

       To evaluate our proposed methodology, we consider all the 601 servers in our

dataset whose IP addresses are mapped to Akamai (based on IP to AS mapping)

and send our HTTP requests to all of them. Since all Akamai servers are expected

to behave similarly, the success rate of our technique in identifying these Akamai

servers demonstrates its accuracy. Indeed, we find that 585 (97%) of these servers

properly respond to our request and are thus identified as Akamai servers. The

remaining 3% either do not respond or respond with various HTTP error codes.

When examining these 16 failed servers more closely, we discovered that 11 of

them were running a mail server and would terminate a connection to their web

server regardless of the requested content. This suggests that these Akamai servers

perform functions other than serving web content.

       Using our proposed technique, we probed all 50K top source IP addresses

associated with our 21 target content providers in all of our snapshots. When

performing this experiment (on 11/20/16), we discovered between 143-295 Akamai

guest servers in 3-7 host ASes across the different snapshots. In total, there were

658 unique guest servers from 7 unique host ASes, namely NTT, CenturyLink,

OVH, Cogent, Comcast, Dropbox and Amazon. Moreover, these identified Akamai

guest servers deliver between 121-259 GBytes to UOnet in their corresponding daily

snapshots which is between 9-20% of the aggregate daily traffic delivered from

Akamai to UOnet. These results imply that the 34-103 Akamai-owned servers in

each snapshot deliver on average 12 times more content to UOnet than Akamai’s

143-295 guest servers. Moreover, we observed that the bulk of delivered bytes from

Akamai’s guest servers to UOnet (i.e., 98%) is associated with guest servers that


Figure 11. Locality (based on RTT in ms) of delivered traffic (bytes, left plot;
flows, right plot) for Akamai-owned servers as well as Akamai guest servers residing
within three target ASes for snapshot 2016-10-04.


are deployed in two host networks, namely NTT (76.1%) and CenturyLink

(21.9%).

           3.5.2   Relative Locality of Guest Servers. Deploying guest servers

in various host ASes enables a content provider to either improve the locality of

its traffic or provide better load balancing among its servers. To examine these

two objectives, we compare the level of locality of traffic delivered from Akamai-

owned servers vs Akamai’s guest servers. The radar plots in Figure 11 illustrate the

locality (based on RTT) of delivered content from Akamai-owned servers shown at

12 o’clock (labeled as Akamai) as well as from Akamai’s guest servers in all three

host networks in the snapshot from 10/04/16. The guest servers are grouped by

their host ASes and ordered based on their aggregate contribution in delivered


bytes (for Akamai flows) in clockwise order. We observe that traffic delivered

from Akamai-owned servers exhibits a higher locality – 75% (90%) of the bytes

(flows) are delivered from servers that are 4ms (8ms) RTT away. The Akamai

traffic from CenturyLink, NTT and OVH is delivered from servers that are at RTT

distance of 8, 15 and 20ms, respectively. While these guest servers serve content

from further away than the Akamai-owned servers, they are all relatively close to

UOnet, which suggests that they are not intended to offer a higher level of locality for

delivered content to UOnet users.

3.6    Implications of Traffic Locality

        Improving end user-perceived performance (i.e. decreasing delay and/or

increasing throughput) is one of the main motivations for major content providers

to bring their front-end servers closer to the edge of the network. In the following,

we examine whether such performance improvements are indeed experienced by

the end users served by UOnet and to what extent, for a given content provider, the

observed performance is correlated with that content provider’s traffic locality.

        We already showed in Figure 9 that the measured min RTT values for a

majority of content providers (with some exceptions such as OVH, Quantil, Cogent)

are consistently low (<20ms) across all flows. The average throughput of each

flow can be easily estimated by dividing the total number of delivered bytes by its

duration 3 . To get an overall sense of the observed average throughput, Figure 12

shows the summary distributions of the measured throughput across delivered flows

by each target content provider. We observe that 90% of the flows for all target

content providers (except Level3) experience low throughput (< 0.5MB/s, and in

   3
    Note that we may have fragmented flows for this analysis. This means that long flows will be
divided into 5min intervals. However, 5min is sufficiently long to estimate average throughput of
individual flows.


Figure 12. Summary distribution of average throughput for delivered flows from
individual target content providers towards UOnet users across all of our snapshots.

most cases even < 0.25MB/s). This raises the question of why these very localized

flows do not achieve higher throughput.

                      In general, reliably identifying the main factors that limit the throughput

of individual flows is challenging Sundaresan, Feamster, and Teixeira (2015). The

cause could be any combination of factors that include

         – Content Bottleneck: the flow does not have sufficient amount of content to “fill

                     the pipe";

         – Receiver Bottleneck: the receiver’s access link (i.e. client type) or flow control

                     is the limiting factor;

         – Network Bottleneck: the fair share of network bandwidth is limited due to

                     cross traffic (and resulting loss rate);
   – Server Rate Limit: a content provider’s server may limit its transmission

rate implicitly due to its limited capacity or explicitly as a result of the

       bandwidth requirements of the content (e.g. Netflix videos do not require

       more than 0.6 MB/s for a Full-HD stream Netflix (2017a)).

Rather than inferring the various factors that affect individual flows, our goal is to

identify the primary factor from the above list that limits the maximum achievable

throughput by individual content providers. To this end, we only consider 3-4%

(or 510-570K) of all flows for each target content provider whose size exceeds

1 MB and refer to them as “elephant" flows.4 These elephant flows typically have

several hundreds of packets and are thus able to fully utilize the available bandwidth in the

absence of other limiting factors (i.e. content bottleneck does not occur). More than

0.5 million elephant flows for individual content providers are delivered to end users

in UOnet with diverse connection types (wireless, residential, wired). Therefore,

receiver bottleneck should not be the limiting factor for the maximum achievable

throughput by individual content providers. This in turn suggests that either the

network or the server is responsible for limiting the achievable throughput.

        To estimate the Maximum Achievable Throughput (MAT) for each content

provider, we group all elephant flows associated with that content provider based

on their RTT into 2ms bins and select the 95th-percentile throughput value (i.e. the median of the

top 10%) in the bin as its MAT with its mid-bin RTT value as the corresponding

RTT. Since a majority (96%) of these flows are associated with TCP connections

and thus are congestion controlled, we can examine the key factors responsible

for limiting throughput. Figure 13 shows a scatter plot where each labelled dot

   4
    Selecting the 1 MB threshold for flow size strikes a balance between having sufficiently large
flows Sodagar (2011) and obtaining a large set of flows for each content provider.


Figure 13. Maximum Achievable Throughput (MAT) vs MinRTT for all content
providers. The curves show the change in the estimated TCP throughput as a
function of RTT for different loss rates.

represents a target content provider with its y-value denoting its MAT and its

x-value denoting the associated RTT. We also group all Akamai flows from its

guest servers at each host ASx , determine their separate MAT and exclude them

from ASx ’s own flows to avoid double-counting them. For example, Akamai flows

that are delivered from OVH are marked as AK-OVH. To properly compare the

measured MAT values across different RTTs, we also plot an estimated TCP

throughput as a function of RTT for three different loss rates that we obtain by
applying the commonly-used equation Mathis et al. (1997): $T < \frac{MSS}{RTT} \cdot \frac{1}{\sqrt{L}}$. In

this equation, MSS denotes the Maximum Segment Size, which we set to 1460; L

represents the loss rate. We consider three different loss rate values, namely $10^{-2}$,

$10^{-3}$, and $10^{-4}$, to cover a wide range of "realistic" values.
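
The sketch below illustrates, under the same assumptions (2ms RTT bins and an MSS of 1460 bytes), how the per-bin MAT and the Mathis et al. throughput bound can be computed; the function names are ours and the example values are purely illustrative.

    import math
    from collections import defaultdict

    MSS_BYTES = 1460
    BIN_MS = 2

    def mat_per_bin(elephant_flows):
        """elephant_flows: iterable of (rtt_ms, throughput_MBps) for flows over 1 MB.
        Returns {mid_bin_rtt: MAT}, where MAT is the median of the top 10% (i.e. the
        95th-percentile throughput) within each 2ms RTT bin."""
        bins = defaultdict(list)
        for rtt_ms, tput in elephant_flows:
            bins[int(rtt_ms // BIN_MS)].append(tput)
        result = {}
        for idx, tputs in bins.items():
            tputs.sort()
            k = int(0.95 * (len(tputs) - 1))        # index of the 95th percentile
            result[(idx + 0.5) * BIN_MS] = tputs[k]
        return result

    def mathis_limit_MBps(rtt_ms, loss_rate):
        """TCP throughput bound T < (MSS / RTT) * 1/sqrt(L), converted to MB/s."""
        rtt_s = rtt_ms / 1000.0
        return (MSS_BYTES / rtt_s) / math.sqrt(loss_rate) / 1e6

    # e.g. at a 10ms RTT and a loss rate of 1e-3 the bound is roughly 4.6 MB/s.
    print(round(mathis_limit_MBps(10, 1e-3), 1))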

          Examining Figure 13, we notice that the relative location of each labelled

dot with respect to the TCP throughput lines reveals the average “virtual" loss

rate across all elephant flows of a content provider if bandwidth bottleneck were

the main limiting factor. The figure shows that this virtual loss rate for many

content providers is at or above $10^{-3}$. However, in practice, average loss rates

higher than $10^{-3}$ over such short RTTs (<20ms) are very unlikely in our setting

(e.g., UOnet is well provisioned and most incoming flows traverse the paths with

similar or identical tail ends). To test this hypothesis, we directly measure the loss

rate between UOnet and the closest servers for each content provider using 170K

ping probes per content provider.5 Figure 14 depicts the average loss rate for each

target content provider and shows that the measured average loss rate for all of the

target content providers is at least an order of magnitude lower than the virtual

loss rate for each content provider. This confirms that all of the measured MAT

values must be rate-limited by the server, either explicitly (due to the bandwidth

requirements of the content) or implicitly (due to server overload).
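
As a simple illustration of this comparison, the sketch below inverts the Mathis et al. relation to obtain the virtual loss rate implied by a measured MAT and contrasts it with a directly measured ping loss rate; all numbers in the example are illustrative.

    MSS_BYTES = 1460

    def virtual_loss_rate(mat_MBps, rtt_ms):
        """Invert T < (MSS / RTT) * 1/sqrt(L): the loss rate that would explain a
        given MAT if the network (congestion) were the limiting factor."""
        t_bytes_per_s = mat_MBps * 1e6
        rtt_s = rtt_ms / 1000.0
        return (MSS_BYTES / (rtt_s * t_bytes_per_s)) ** 2

    def measured_loss_rate(sent, received):
        """Average loss rate over a ping campaign (covers both directions)."""
        return 1.0 - received / sent

    # A MAT of 2 MB/s at a 10ms RTT implies a virtual loss rate of roughly 5e-3,
    # while 25 lost probes out of 170K correspond to about 1.5e-4.
    print(virtual_loss_rate(2.0, 10), measured_loss_rate(170_000, 169_975))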

          Figure 13 also shows that the measured MAT values for Akamai guest

servers are often much larger than those for the servers owned by the host AS. For

example, the MAT value for AK-CLINK (AK-DRPBX or AK-NTT) is much higher

than the MAT for CLINK (DRPBX, or NTT). Furthermore, the measured MAT

value for all the flows from Akamai’s guest servers is lower than its counterpart for

all flows from Akamai-owned servers.

          To summarize, there are two main take-aways from our examination of the

performance implications of traffic locality. On the one hand, traffic locality is key

to achieving the generally and uniformly very small measured delays for traffic

  5
      Note that ping measures loss in both directions of a connection.



Figure 14. Average loss rate of closest servers per target content provider measured
over 24 hours using ping probes with 1 second intervals. For each content provider
we choose at most 10 of the closest IP addresses.

delivered to UOnet. On the other hand, our results show that a majority of flows

for all target content providers are associated with small files and thus do not reach

a high throughput. Furthermore, the throughput for most of the larger flows is

not limited by the network but rather by the front-end servers. In other words,

high throughput delivery of content at the edge is either not relevant (for small

objects) or not required by applications.

3.7                 Summary

                    Our work contributes to the existing literature on content delivery by

providing a unique view of different aspects of content delivery as experienced by

the end users served by a stub-AS (i.e., a Research & Education network). To this

end, we examine the complete flow-level view of traffic delivered to this stub-AS

from all major content providers and characterize this stub-AS’ traffic footprint (i.e.

a detailed assessment of the locality properties of the delivered traffic).

       We also study the impact that this traffic footprint has on the performance

experienced by its end users and report on two main takeaways. First, this

stub-AS’ traffic locality is uniformly high across the main CPs; i.e., the traffic that

these CPs deliver to this stub-AS experiences in general only very small delays.

Second, the throughput of the delivered traffic remains far below the maximum

achievable throughput and is not limited by the network but rather by the front-

end servers.

       Lastly, to complement the effort described in this chapter, assessing the

locality properties of the traffic that constitutes the (long) tail of the distribution

in Figure 6 and is typically delivered from source IP addresses that are rarely seen

in our data or are responsible for only minuscule portions of the traffic delivered to

UOnet looms as an interesting open problem and is part of future work.




                                           CHAPTER IV

                               CLOUD PEERING ECOSYSTEM

          Chapter III presented an overview of CPs and content providers’ share in

Internet traffic and the degree of locality for their infrastructure. In this chapter,

we focus on the topology and connectivity of CPs to the rest of the Internet. We

pay special attention to the new form of peering relationships that CPs are forming

with edge networks.

          The content in this chapter is derived entirely from Yeganeh, Durairajan,

Rejaie, and Willinger (2019) as a result of collaboration with co-authors listed

in the manuscript. Bahador Yeganeh is the primary author of this work and

responsible for conducting all measurements and producing the presented analyses.

4.1     Introduction

          In this chapter, we present a third-party, cloud-centric measurement study

aimed at discovering and characterizing the unique peerings (along with their

types) of Amazon, the largest cloud service provider in the US and worldwide.

Each peering typically consists of one or multiple (unique) interconnections between

Amazon and a neighboring Autonomous System (AS) that are established

at different colocation facilities around the globe. Our study only utilizes publicly

available information and data (i.e. no Amazon-proprietary data is used) and is

therefore also applicable for discovering the peerings of other large cloud providers.1

          We start by presenting the required background on Amazon’s serving

infrastructure, including the different types of peerings an enterprise network can

establish with Amazon at a colo facility in § 4.2. § 4.3 describes the first round of

our data collection; that is, launching cloud-centric traceroute probes from different

  1
      As long as the cloud provider does not filter traceroute probes.


regions of Amazon’s infrastructure toward all the /24 (IPv4) prefixes to infer a

subset of Amazon’s peerings. We present our methodology for inferring Amazon’s

peerings across the captured traceroutes in § 4.4.1. Our second round of data

collection consists of using traceroute probes that target the prefixes around the

peerings discovered in the first round and are intended to identify all the remaining

(IPv4) peerings of Amazon (§ 4.4.2). In § 4.5, we present a number of heuristics

to resolve the inherent ambiguity in inferring the specific traceroute segment that

is associated with a peering. We further confirm our inferred peerings by assessing

the consistency of border interfaces at both the Amazon side and client side of an

inferred interconnection.

       Pinning (or geo-locating) each end of individual interconnections associated

with Amazon’s peerings at the metro level forms another contribution of this study

(§ 4.6). To this end, we develop a number of methods to identify border interfaces

that have a reliable location and which we refer to as anchors. Next, we establish

a set of co-presence rules to conservatively propagate the location of anchors to

other close-by interfaces. We then identify the main factors that limit our ability

to pin all border interfaces at the metro level and present ways to pin most of the

interfaces at the regional level. Finally, we evaluate the accuracy and coverage of

our pinning technique and characterize the pinned interconnections.

       The final contribution of this work is a new method for inferring the

client border interface that is associated with that client’s VPI with Amazon. In

particular, by examining the reachability of a given client border interface from

a number of other cloud providers (§ 4.7) and identifying overlapping interfaces

between Amazon and those other cloud providers, our method provides a lower

bound on the number of Amazon’s VPIs. We then assign all inferred Amazon

peerings to different groups based on their key attributes such as being public or

private, visible or not visible in BGP, and physical or virtual. We then carefully

examine these groups of peerings to infer their purpose and explore hybrid peering

scenarios. In particular, we show that one-third of Amazon’s inferred peerings are

either virtual or not visible in BGP and thus hidden from public measurement.

Finally, we characterize the inferred Amazon connectivity graph as a whole.

4.2   Background

       Amazon’s Ecosystem. The focus of our study of peerings in today’s

Internet is Amazon, arguably the largest cloud service provider in the US and

worldwide. Amazon operates several data centers worldwide. While these data

centers’ street addresses are not explicitly published by Amazon, their geographic

locations have been reported elsewhere Burrington (2016); DatacenterMap (2018);

Miller (2015); Plaven (2017); WikiLeaks (2018); Williams (2016). Each data center

hosts a large number of Amazon servers that, in turn, host user VMs as well as

other services (e.g. Lambda). Amazon’s data center locations are divided into

independent and distinct geographic regions to achieve fault tolerance/stability.

Specifically, each region has multiple, isolated availability zones (AZs) that provide

redundancy and offer high availability in case of failures. AZs are virtual and their

mapping to a specific location within their region is not known Amazon (2018f).

As of 2018, Amazon had 18 regions (55 AZs) across the world, with five of them

(four public + one US government cloud) located in the US. For our study, we were

not able to utilize three of these regions. Two of them are located in China, are not

offered on Amazon’s AWS portal, and require approval requests by Amazon staff.

The third region is assigned to the US government and is not offered to the public.



Peering with Amazon at Colo Facilities. Clients can connect to Amazon

through a specific set of colo facilities. Amazon is considered a native tenant in

these facilities, and their locations are publicly announced by Amazon Amazon

(2018d). Amazon is also reachable through a number of other colo facilities via

layer-2 connectivity offered by third-party providers (e.g. Megaport).2


Figure 15. Overview of Amazon’s peering fabric. Native routers of Amazon &
Microsoft (orange & blue) establishing private interconnections (AS3 - yellow
router), public peering through IXP switch (AS4 - red router), and virtual private
interconnections through cloud exchange switch (AS1 , AS2 , and AS5 - green
routers) with other networks. Remote peering (AS5 ) as well as connectivity to
non-ASN businesses through layer-2 tunnels (dashed lines) happens through
connectivity partners.

        Figure 15 depicts an example of different types of peerings offered by cloud

providers at two colo facilities. Both Amazon (AWS) and Microsoft (Azure) are

native (i.e. house their border routers) in the CoreSite LA1 colo facility and are

both present at that facility’s IXP and cloud exchange. (Open) cloud exchanges

are switching fabrics specifically designed to facilitate interconnections among

network providers, cloud providers, and enterprises in ways that provide the

scalability and elasticity essential for cloud-based services and applications (e.g.

see CoreSite (2018); Equinix (2017)). Major colo facility providers (e.g. Equinix

   2
    These entities are called “AWS Direct Connect Partners" at a particular facility and are listed
online along with their points of presence Amazon (2018c).
and CoreSite) also offer a new interconnection service option called “virtual private

interconnection (VPI).” VPIs enable local enterprises (that may or may not own

an ASN) to connect to multiple cloud providers that are present at the cloud

exchange switching fabric by means of purchasing a single port on that switch. In

addition, VPIs provide their customers access to a programmable, real-time cloud

interconnection management portal. Through this portal, the operators of these

new switching fabrics make it possible for individual enterprises to establish their

VPIs in a highly-flexible, on-demand, and near real-time manner. This portal also

enables enterprises to monitor in real-time the performance of their cloud-related

traffic that traverses these VPIs.

       While cloud exchanges rely on switching fabrics that are similar to those

used by IXPs, there are two important differences. For one, cloud exchanges enable

each customer to establish virtualized peerings with multiple cloud providers

through a single port. Moreover, they provide exclusive client connectivity to

cloud providers without requiring a client to use its pre-allocated IP addresses.

Operationally, a cloud customer establishes VPIs using either public or private IP

addresses depending on the set of cloud services that this customer is trying to

reach through these interconnections. On the one hand, VPIs relying on private

addresses are limited to the customer’s virtual private cloud (VPC) through VLAN

isolation. On the other hand, VPIs with public addresses can reach compute

resources in addition to other AWS offerings such as S3 and DynamoDB Amazon

(2018b). Given the isolation of network paths for VPIs with private addresses, any

peerings associated with these VPIs are not visible to the probes from VMs owned

by other Amazon customers. This makes it, in practice, impossible to discover

established VPIs that rely on private addresses. In Figure 15, the different colors

of the client routers indicate the type of their peerings; e.g. public peering through

the IXP (for AS4 ), direct physical interconnection (also called “cross-connect") (for

AS3 ), private virtual peerings that are either local (for AS1 and AS2 ) or remote

(for AS5 ). Here, a local virtual private peering (e.g. AS2 ) could be associated

with an enterprise that is brought to the cloud exchange by its access network

(e.g. Comcast) using layer-2 technology; based on traceroute measurements, such

a peering would appear to be between Amazon and the access network. In contrast,

a remote private virtual peering could be established by an enterprise (e.g. AS5 )

that is present at a colo facility (e.g. Databank in Salt Lake City in Figure 15)

where Amazon is not native but that houses an “AWS Direct Connect Partner"

(e.g. Megaport) which in turn provides layer-2 connectivity to AWS.

4.3    Data Collection & Processing

        To infer all peerings between Amazon and the rest of the Internet, we

perform traceroute campaigns from Amazon’s 15 available global regions to the .1 address

in each /24 prefix of the IPv4 address space.3 To this end, we create a t2-micro

instance VM within each of the 15 regions and break down the IPv4 address

space into /24 prefixes. While we exclude broadcast and multicast prefixes, we

deliberately consider addresses that are associated with private and shared address

spaces since these addresses can be used internally in Amazon’s own network. This

process resulted in 15.6M target IPv4 addresses.
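
The following sketch illustrates how such a target list can be generated; the exclusion of multicast and reserved space shown here is a coarse approximation of the filtering described above, and the exact scamper invocation is omitted.

    def candidate_targets():
        """Yield the .1 address of every /24 in IPv4 unicast space. Here 0.0.0.0/8 and
        224.0.0.0/3 (multicast/reserved) are skipped, a simplification of the exclusions
        described above; private and shared address space is deliberately kept since it
        may be used inside Amazon's network."""
        for first_octet in range(1, 224):
            for second in range(256):
                for third in range(256):
                    yield f"{first_octet}.{second}.{third}.1"

    # The resulting targets can be written to a file and handed to scamper for the
    # UDP traceroute campaign described next.
    with open("targets.txt", "w") as f:
        for ip in candidate_targets():
            f.write(ip + "\n")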

        To probe these target IPs from our VMs, we use the Scamper tool Luckie

(2010) with UDP probes as they provide the highest visibility (i.e. response

rate). Individual probes are terminated upon encountering five consecutive

unresponsive hops in order to limit the overall measurement time while reaching

   3
     We observed a negligible difference in the visibility of interconnections across probes from
different AZs in each region. Therefore, we only consider a single AZ from each region.
Amazon’s border routers. We empirically set our probing rate to 300pps to prevent

blacklisting or rate control of our probe packets by Amazon. With this probing

rate, our traceroute campaign took nearly 16 days to complete (from 08/03/2018 to

08/19/2018). Each collected traceroute is associated with a status flag indicating

how the probe was terminated. We observed that the fraction of completed

traceroutes across different regions is fairly consistent but rather small (mean 7.7%

and std $5 \times 10^{-4}$), which suggests a limited yield. However, since our main objective

is to identify Amazon interconnections and not to maximize traceroute yield, we

consider any traceroute that leaves Amazon’s network (i.e. reaches an IP outside of

Amazon’s network) as a candidate for revealing the presence of an interconnection,

and the percentage of these traceroutes is about 77%.

Annotating Traceroute Data. To identify any Amazon interconnection

traversed by our traceroutes, we annotate every IP hop with the following

information: (i) its corresponding ASN, (ii) its organization (ORG), and (iii)

whether it belongs to an IXP prefix. To map each IP address to its ASN, we rely

on BGP snapshots from RouteViews and RIPE RIS (taken at the same time as

our traceroute campaign). For ORG, we rely on CAIDA’s AS-to-ORG dataset

Huffaker, Keys, Fomenkov, and Claffy (2018) and map the inferred ASN of each

hop from the previous step to its unique ORG identifier. ORG information allows

us to correctly identify the border interface of a customer in cases where a traceroute

traverses hops in multiple Amazon ASes prior to reaching a customer

network4 . Finally, to determine if an IP hop is part of an IX prefix, we rely on

PeeringDB PeeringDB (2017), Packet Clearing House (PCH) Packet Clearing House

  4
    We observed AS7224, AS16509, AS19047, AS14618, AS38895, AS39111, AS8987, and AS9059
for Amazon.



(2017), and CAIDA’s IXP dataset CAIDA (2018) to obtain prefixes assigned to

IXPs.

        In our traceroutes, we observe IP hops that do not map to any ASN. These

IPs can be divided into two groups. The first group consists of the IPs that belong

to either a private or a shared address space (20.3%); we set the ASN of these IPs

to 0. The second group consists of all the IPs that belong to the public address

space but were not announced by any AS during our traceroute campaign (7%); for

these IPs, we infer the AS owner by relying on WHOIS-provided information (i.e.

name or ASN of the entity/company assigned by an RIR).
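
A minimal sketch of this per-hop annotation is shown below, assuming pre-built lookup structures (a prefix table derived from the BGP snapshots, CAIDA's AS-to-ORG mapping, a list of IXP prefixes, and a WHOIS fallback); the helper names and the linear-scan longest-prefix match are illustrative simplifications.

    import ipaddress

    SHARED = ipaddress.ip_network("100.64.0.0/10")  # RFC 6598 shared address space

    def annotate_hop(ip, bgp_table, as_to_org, ixp_prefixes, whois_lookup):
        """Return (asn, org, is_ixp) for one traceroute hop.
        bgp_table: dict mapping ipaddress.IPv4Network -> origin ASN.
        as_to_org: dict mapping ASN -> ORG identifier.
        ixp_prefixes: list of ipaddress.IPv4Network objects for IXP address space.
        whois_lookup: fallback callable for unannounced public addresses."""
        addr = ipaddress.ip_address(ip)

        if addr.is_private or addr in SHARED:      # private/shared space -> ASN 0
            return 0, None, False

        is_ixp = any(addr in net for net in ixp_prefixes)

        # longest-prefix match over the BGP snapshot (linear scan for clarity)
        matches = [net for net in bgp_table if addr in net]
        if matches:
            asn = bgp_table[max(matches, key=lambda n: n.prefixlen)]
        else:
            asn = whois_lookup(ip)                 # public but unannounced address
        return asn, as_to_org.get(asn), is_ixp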

4.4     Inferring Interconnections

        In this section, we describe our basic inference strategy for identifying an

Amazon-related interconnection segment across a given traceroute probe (§ 4.4.1)

and discuss the potential ambiguity in the output of this strategy. We then

discuss the extra steps we take to leverage these identified segments in an effort

to efficiently expand the number of discovered Amazon-related interconnections

(§ 4.4.2).

             4.4.1   Basic Inference Strategy. Given the ASN-annotated

traceroute data, we start from the source and sequentially examine each hop until

we detect a hop that belongs to an organization other than Amazon (i.e. its ORG

number is neither 0 nor 7224, which is Amazon). We refer to this hop as customer

border hop and to its IP as a Customer Border Interface (CBI). The presence of

a CBI indicates that the traceroute has exited Amazon’s network; that is, the

traceroute hop right before a CBI is the Amazon Border Interface (ABI), and

the corresponding traceroute probe thus must have traversed an Amazon-related

interconnection segment. For the remainder of our analysis, we only consider these

initial portions of traceroutes between a source and an encountered CBI .5 Next, for

each CBI , we check to confirm that the AS owners of all the downstream hops in

each traceroute do not include any ASN owned by Amazon (i.e. a sanity check

that the traceroute does not re-enter Amazon); all of our traceroutes meet this

condition. Finally, because of their unreliable nature, we exclude all traceroutes

that contain either an (IP-level) loop, unresponsive hop(s) prior to Amazon’s

border, a CBI as the destination of a traceroute Baker (1995), or duplicate hops

before Amazon’s border. The first two rows of Table 3 summarize the number of

ABIs and CBIs that we identified in our traceroute data, along with the fraction

of interfaces in each group for which we have BGP, Whois, and IXP-association

information. As highlighted in § 2.3.2, in certain cases, our basic strategy may not

identify the correct Amazon-related interconnection segment on a given traceroute.

Given that our traceroutes are always launched from Amazon to a client’s network,

when Amazon provides addresses for the physical interconnection, our strategy

incorrectly identifies the next downstream segment as an interconnection Amazon

(2018b).
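
The basic scan over an ASN/ORG-annotated traceroute can be sketched as follows; AMAZON_ORG is a placeholder for Amazon's ORG identifier, and the additional sanity checks described above (no re-entry into Amazon, no loops, unresponsive hops, or duplicates) are omitted for brevity.

    AMAZON_ORG = "amazon-org-id"  # placeholder for Amazon's ORG identifier

    def find_candidate_segment(hops):
        """hops: ordered list of (ip, asn, org) tuples from the probing VM outward.
        Returns (abi_ip, cbi_ip) for the first candidate interconnection segment,
        or None if the traceroute never leaves Amazon's network."""
        prev = None
        for ip, asn, org in hops:
            inside_amazon = (asn == 0) or (org == AMAZON_ORG)
            if not inside_amazon and prev is not None:
                return prev[0], ip     # the hop before the CBI is the ABI
            prev = (ip, asn, org)
        return None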

         In summary, the described method always reveals the presence of an

Amazon-related interconnection segment in a traceroute. The actual Amazon-

specific interconnection segment is either the one between the identified ABI and

CBI or the immediately preceding segment. Because of this ambiguity in accurately

inferring the Amazon-specific interconnection segments, we refer to them as

candidate interconnection segments. In § 4.5, we present techniques for a more

precise determination of these inferred candidate interconnection segments.

  5
      In fact, we only need the CBI and the prior two ABIs.




Table 3. Number of unique ABIs and CBIs, along with the fraction of each having
various metadata, prior to (rows 2-3) and after (rows 4-5) /24 expansion probing.

                 All              BGP%               Whois%               IXP%

 ABI             3.68k             38.4%               61.6%                 -

 CBI           21.73k             54.74%               24.8%              20.46%

 eABI            3.78k            38.85%              61.15%                 -

 eCBI          24.75k             79.82%              22.32%              17.86%


         4.4.2     Second Round of Probing to Expand Coverage. We

perform our traceroute probes from each Amazon region in two rounds. First,

as described in § 4.4.1, we target .1 in each /24 prefix of the IPv4 address space

(§ 4.3) and identify the pool of candidate interconnection segments. However,

it is unlikely that our traceroute probes in this first round traverse through all

the Amazon interconnections. Therefore, to increase the number of discovered

interconnections, in a second round, we launch traceroutes from each region

towards all other IP addresses in the /24 prefixes that are associated with each CBI

that we discovered in the first round. Our reasoning for this “expansion probing" is

that the IPs in these prefixes have a better chance to be allocated to CBIs than the

IPs in other prefixes. Similar to round one, we annotate the resulting traceroutes

and identify their interconnection segments (and the corresponding ABIs and

CBIs). The bottom two rows in Table 3 show the total number of identified ABIs

and CBIs after processing the collected expansion probes. In particular, while the

first column of Table 3 shows a significant increase in the number of discovered

CBIs (from 21.73k to 24.99k) and even some increase in the number of peering




ASNs (from 3.52k to 3.55k) as a result of the expansion probing, the number of

ABIs remains relatively constant.
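
A sketch of how the second-round target list can be derived from the first-round CBIs is shown below; the names are illustrative.

    import ipaddress

    def expansion_targets(cbi_ips):
        """Given the CBIs discovered in the first round, yield every other address in
        each CBI's /24 prefix as a second-round traceroute target."""
        seen_prefixes = set()
        for ip in cbi_ips:
            net = ipaddress.ip_network(f"{ip}/24", strict=False)
            if net in seen_prefixes:
                continue
            seen_prefixes.add(net)
            for host in net.hosts():
                if str(host) != ip:
                    yield str(host)

    # For example, a single CBI 203.0.113.7 expands into the 253 remaining host
    # addresses of 203.0.113.0/24.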

4.5   Verifying Interconnections

       To address the potential ambiguity in identifying the correct Amazon-

specific segment of each inferred interconnection (§ 4.4.1), we first check these

interconnections against three different heuristics (§ 4.5.1) and then rely on the

router-level connectivity among border routers (§ 4.5.2) to verify (and possibly

correct) the inferred ABIs and CBIs.

         4.5.1    Checking Against Heuristics. We develop a few heuristics to

check the aforementioned ambiguity of our approach with respect to inferring the

correct interconnection segment. Since the actual interconnection segment could

be the segment prior to the identified candidate segment (i.e. we might have to

shift the interconnection to the previous segment), our heuristics basically check

for specific pieces of evidence to decide whether an inferred ABI is correct or

should be changed to its corresponding CBI . Once an ABI is confirmed, all of its

corresponding CBIs are also confirmed. The heuristics are described below and are

ordered (high to low) based on our level of confidence in their outcome.

IXP-Client. An IP address that is part of an IXP prefix always belongs to

a specific IXP member. Therefore, if the IP address for a CBI in a candidate

interconnection segment is part of an IXP prefix, then that CBI and its

corresponding ABI are correctly identified Nomikos and Dimitropoulos (2016).

Hybrid IPs. We observe ABI interfaces with hybrid connectivity. For example,

in Figure 16, interface a represents such an interface with hybrid connectivity; it

appears prior to the client interface b in one traceroute and prior to the Amazon

interface c in another traceroute. Even if we are uncertain about the owner of

an interface c (i.e. it may belong to the same or different Amazon client), we can

reliably conclude that interface a has hybrid connectivity and must be an ABI .


Figure 16. Illustration of a hybrid interface (a) that has both Amazon and client-
owned interfaces as next hop.

Interface Reachability. Our empirical examination of traceroutes revealed that

while ABIs are generally reachable from their corresponding clients, for security

reasons, they are often not visible/reachable from the public Internet (e.g. a

campus or residential networks). However, depending on the client configuration,

CBIs may or may not be publicly reachable. Based on this empirical observation,

we apply a heuristic that probes all candidate ABIs and CBIs from a vantage

point in the public Internet (i.e. a node at the University of Oregon). Reachability

(or unreachability) of a candidate CBI (or ABIs) from the public Internet offers

independent evidence in support of our inference.

       Table 4 summarizes the fraction of identified ABIs (and thus their

corresponding CBIs) that are confirmed by our individual (first row) and combined

(second row) heuristics. We observe that our heuristics collectively

confirmed 87.8% of all the inferred ABIs and thus 96.96% of the CBIs. The

remaining 0.37k (or 9.81%) ABIs that do not match with any heuristic are

interconnected with one (or multiple) CBIs that belong to a single organization.

The resulting low rate of error in detecting the correct interconnection segments

implies high confidence in the correctness of our inferred Amazon peerings.
Table 4. Number of candidate ABIs (and corresponding CBIs) that are confirmed
by individual (first row) and cumulative (second row) heuristics.

                         IXP                    Hybrid                Reachable

 Individual         0.83k (13.66k)          2.05k (14.44k)           2.8k (15.14k)

 Cumulative         0.83k (13.66k)          2.26k (15.14k)          3.31k (24.23k)


         4.5.2    Verifying Against Alias Sets. To further improve our ability

to eliminate possible ambiguities in inferring the correct interconnection segments,

we infer the router-level topology associated with all the candidate interconnections

segments and determine the AS owner of individual routers. We consider any

inferred interconnection segment to be correct if its ABI is on an Amazon router

and its CBI is on a client router. In turn, for any incorrect segment, we first adjust

the ownership of its corresponding ABI and CBI so as to be consistent with the

determined router ownership and then identify the correct interconnection segment.

       To this end, we utilize MIDAR Bender et al. (2008) to perform alias

resolution from VMs in all the regions where all the candidate ABIs and CBIs

were observed. Each instance of this alias resolution effort outputs a set of (two

or more) interfaces that reside on a single router. Given the potentially limited

visibility of routers from different regions, we combine the alias sets from different

regions that have any overlapping interfaces. Overall, we identify 2.64k alias sets

containing 8.68k (2.31k ABI plus 6.37k CBI ) interfaces and their sizes have a

skewed distribution.

       The direction of our traceroute probes (from Amazon towards client

networks) and the fact that each router typically responds with the incoming

interface suggest that the observed interfaces of individual Amazon (or client)

border routers in our traceroute (i.e. IPs in each alias set) should typically belong

to the same AS. This implies that there should be a majority AS owner among

interfaces in an alias set. To identify the AS owner of each router, we simply

examine the AS owner of individual IPs in the corresponding alias set. The AS

that owns a clear majority of interfaces in an alias set is considered as the owner

of the corresponding router and all the interfaces in the alias set.6 We observe that

for more than 94% (92%) of all alias sets, there is a single AS that owns >50%

(100%) of all of an alias set’s interfaces. The remaining 6% of alias sets comprise

343 interfaces with a median set size of 2. We consider the majority AS owner

of each alias set as the AS owner of (all interfaces for) that router. Using this

information, we check all of the inferred ABIs and CBIs to ensure that they are on

a router owned by Amazon and the corresponding client, respectively. Otherwise,

we change their labels. This consistency check results in changing the status of

only 45 interfaces (i.e. 18, 2, and 25 change from ABI → CBI, CBI → ABI, and

CBI → CBI7 , respectively). These changes ultimately result in 3.77k ABIs and

24.76k CBIs associated with 3.55k unique ASes.
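
The majority-owner rule and the resulting relabeling can be sketched as follows; amazon_asns stands for the set of Amazon ASNs listed earlier, and the handling of ties or empty alias sets is omitted.

    from collections import Counter

    def majority_owner(alias_set, ip_to_asn):
        """Return the AS that owns a strict majority (>50%) of the interfaces in an
        alias set, or None if no such AS exists."""
        counts = Counter(ip_to_asn[ip] for ip in alias_set)
        asn, votes = counts.most_common(1)[0]
        return asn if votes * 2 > len(alias_set) else None

    def relabel(current_label, alias_owner, amazon_asns):
        """Flip an ABI/CBI label when it disagrees with the owner of its router."""
        if alias_owner is None:
            return current_label
        on_amazon_router = alias_owner in amazon_asns
        if current_label == "ABI" and not on_amazon_router:
            return "CBI"
        if current_label == "CBI" and on_amazon_router:
            return "ABI"
        return current_label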

4.6     Pinning Interfaces

          In this section, we first explore techniques to pin (i.e. geo-locate) each end

of the inferred Amazon peerings (i.e. all ABIs and CBIs) to a specific colo facility,

metro area, or a region and then evaluate our pinning methodology.

            4.6.1     Methodology for Pinning. Our method for pinning individual

interfaces to specific locations involves two basic steps. In a first step, we identify

   6
     We also examined router ownership at the organization level by considering all ASNs that
belong to a single organization. This strategy allows us to group all Amazon/client interfaces
regardless of their ASN to accurately detect the AS owner. However, since we observed one ASN
per ORG in 99% of the identified alias sets, we present here only the owner AS of each router.
  7
      This simply implies that the CBI interface belongs to another client.

a set of border interfaces with known locations that we call anchors. Then, in a

second step, we establish two co-presence rules to iteratively infer the location of

individual unpinned interfaces based on the location of co-located anchors or other

already pinned interfaces. That is, in each iteration, we propagate the location of

pinned interfaces to their co-located unpinned neighbors.

Identifying Anchors. For ABIs or CBIs to serve as anchors for pinning other

interfaces, we leverage the following four sources of information and consider them

as reliable indicators of interface-specific locations.

         DNS Information (CBIs): A CBI 8 with specific location information

embedded in its DNS name can be pinned to the corresponding colo or metro

area. For example, a DNS name such as ae-4.amazon.atlnga05.us.bb.gin.ntt.net

indicates that the CBI associated with NTT interconnects with Amazon

in Atlanta, GA (atlnga). We use DNS parsing tools such as DRoP Huffaker,

Fomenkov, and claffy (2014) along with a collection of hand-crafted rules to

extract the location information (using 3-letter airport codes and full city names)

from the DNS names of identified CBIs. In the absence of any ground truth,

we check the inferred geolocation against the footprint of the corresponding AS

from its PeeringDB listings or information on its webpage. Furthermore, we

perform an RTT-constraint check using the measured RTTs from different Amazon

regions to ensure that the inferred geolocation is feasible. This check, similar to

DRoP Huffaker, Fomenkov, and claffy (2014), conservatively excludes 0.87k CBIs

whose inferred locations do not satisfy this RTT constraint.
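
A simplified, DRoP-style extraction of location hints from DNS names, together with the RTT feasibility check, might look as follows; the airport-code table, the regular expression, and the assumed propagation speed of roughly 200 km per millisecond in fiber are illustrative choices rather than the exact rules we use.

    import re

    # Illustrative subset of a 3-letter airport-code table used for DNS hints.
    AIRPORTS = {"atl", "sea", "iad", "pdx"}

    CODE_RE = re.compile(r"\.([a-z]{3})[a-z]{0,3}\d*(?=\.)")

    def location_hint(dns_name):
        """Extract a candidate airport code embedded in a router's DNS name,
        e.g. 'ae-4.amazon.atlnga05.us.bb.gin.ntt.net' -> 'atl'."""
        for match in CODE_RE.finditer(dns_name.lower()):
            if match.group(1) in AIRPORTS:
                return match.group(1)
        return None

    def rtt_feasible(min_rtt_ms, distance_km):
        """Reject a candidate location whose distance to the probing region cannot be
        covered within the measured RTT (about 200 km per ms one way in fiber)."""
        return distance_km <= (min_rtt_ms / 2.0) * 200.0

    print(location_hint("ae-4.amazon.atlnga05.us.bb.gin.ntt.net"))  # -> atl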

         IXP Association (CBIs): CBIs that are part of an IXP prefix can be pinned

to the colo(s) in a metro area where the IXP is present. In total we have identified

  8
      None of the ABIs had a reverse domain name associated with them.


671 IXPs within 471 (117) unique cities (countries) but exclude 10 IXPs (and their

corresponding 366 CBIs) that are present in multiple metro areas as they cannot

be pinned to a specific colo or metro area. Furthermore, we exclude all interfaces

belonging to members that peer remotely. To determine those members, we first

identified minIXRegion, the closest Amazon region to each IXP. We did this by

measuring minIXRTT, the minimum RTT between the various regions and all

interfaces that are part of the IXP and selecting minIXRegion as the Amazon

region where minIXRTT is attained. Then we measure the minimum RTT between

all interfaces and minIXRegion and label an interface as “local" if its RTT value

is no more than 2ms higher than minIXRTT. We note that for about 80% of

IXPs, the measured minIXRTT is less than 1.5ms (i.e. most IXPs are in very close

proximity to at least one AWS region). This effort results in labeling about 2k out

of the encountered 3.5k IXP interfaces in our measurements as “local." Conversely,

there are some 1.5k interfaces belonging to members that peer remotely.
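
The minIXRTT-based labeling can be sketched as follows, assuming per-interface minimum RTT measurements from every region; the data layout and names are our own.

    LOCAL_SLACK_MS = 2.0  # an interface is "local" if within 2ms of minIXRTT

    def label_ixp_members(rtts):
        """rtts: dict mapping each member interface IP of one IXP to a dict of
        {region: minimum RTT in ms}. Returns {ip: 'local' or 'remote'}."""
        # minIXRTT and minIXRegion: the smallest RTT seen from any region to any member.
        min_ix_rtt, min_ix_region = min(
            (rtt, region)
            for per_region in rtts.values()
            for region, rtt in per_region.items()
        )
        labels = {}
        for ip, per_region in rtts.items():
            rtt_from_min_region = per_region.get(min_ix_region, float("inf"))
            local = rtt_from_min_region <= min_ix_rtt + LOCAL_SLACK_MS
            labels[ip] = "local" if local else "remote"
        return labels

    # Example: with a minIXRTT of 1.2ms attained from one region, a member interface
    # measured at 9ms from that region would be labeled "remote".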

Figure 17. (a) Distribution of min-RTT for ABIs from the closest Amazon region,
and (b) Distribution of min-RTT difference between ABI and CBI for individual
peering links.

            Single Colo/Metro Footprint (CBIs): CBIs of an AS that are present only at

a single colo or at multiple colos in a given metro area can be pinned to that metro

area. To identify those ASes that are only present in a single colo or a single metro
area, we collect the list of all tenant ASes for 2.6k colo facilities from PeeringDB

Lodhi, Larson, Dhamdhere, Dovrolis, et al. (2014) as well as the list of all IXP

participants from PeeringDB and PCH.

       Native Amazon Colos (ABIs): Intuitively, ABIs that are located at colo

facilities where Amazon is native (i.e. facilities that house Amazon’s main border

routers) must exhibit the shortest RTT from the VM in the corresponding region.

To examine this intuition, we use two data sources for RTT measurements: (i) RTT
values obtained through active probing 9 of CBIs and ABIs; and (ii) RTT values

collected as part of the traceroute campaign. Figure 17a shows the distribution of

the minimum RTT between VMs in different regions of Amazon and individual

ABIs. We observe a clear knee at 2ms: around 40% of all the ABIs exhibit an RTT shorter than 2ms from a single VM. Given that all Amazon peerings have to be

established through colo facilities where Amazon is native, we pin all these ABIs

to the native colo closest to the corresponding VM. In some metro areas where

Amazon has more than one native colo, we conservatively pinned the ABIs to the

corresponding metro area rather than to a specific native colo.

Consistency Checking of Anchors. We perform two sets of consistency

checks on the identified anchors. First, we check whether the inferred locations

are consistent for those interfaces (i.e. 1.1k in total) that satisfy more than one

of the four indicators we used to classify them as anchors. Second, we check for

consistency across the inferred geolocation of different interfaces in any given alias

set. These checks flagged a total of 66 interfaces (48 by the first check and 18 by the second) that had inconsistent

geolocations and that we therefore excluded from our anchor list. These checks also

highlight the conservative nature of our approach. In particular, by removing any

  9
    This probing was done for a full day and used exclusively ICMP echo reply messages that can
only be generated by intermediate hops and not by the target itself.
anchors with inconsistent locations, we avoid the propagation of unreliable location

information in our subsequent iterative pinning procedure (see below). The middle

part of Table 5 presents the exclusive and cumulative numbers of CBI and ABI

anchors (excluding the flagged ones) that resulted from leveraging the four utilized

sources of information.

Inferring Co-located Interfaces. We use two co-presence rules to infer whether

two interfaces are co-located in the same facility or same metro area. (i) Rule 1

(Alias sets): This rule states that all interfaces in an alias set must be co-located

in the same facility. Therefore, if an alias set contains one (or more) anchor(s),

all interfaces in that set can be pinned to the location of that (those) anchor(s).

(ii) Rule 2 (Interconnections in a Single Metro Area): An Amazon peering is

established between an Amazon border router and a client border router, and these

routers are either in the same or in different colo/metro areas. Therefore, a small

RTT between the two ends of an interconnection segment is an indication of their

co-presence in at least the same metro area. The key issue is to determine a proper

threshold for RTT delay to identify these co-located pairs. To this end, Figure 17b

shows the distribution of the min-RTT differences between the two ends of all the

inferred Amazon interconnection segments. While the min-RTT difference varies

widely across all interconnection segments, the distribution exhibits a pronounced

knee at 2ms, with approximately half of the inferred interconnection segments

having min-RTT values less than this threshold. We use this threshold to separate

interconnection segments that reside within a metro area (i.e. both ends are in the

metro area) from those that extend beyond the metro area. Therefore, if one end

of such a “short" interconnection segment is pinned, its other end can be pinned to

the same metro area.

Table 5. The exclusive and cumulative number of anchor interfaces by each type of
evidence and pinned interfaces by our co-presence rules.

                         Anchor Interface                    Pinned Interface

                 DNS      IXP      Metro     Native          Alias     min-RTT

         Exc.    5.31k    2.0k     1.66k     1.42k           0.65k     5.38k

         Cum.    5.31k    6.73k    7.22k     8.64k           9.21k     14.37k

Iterative Pinning. Given a set of initial anchors at known locations as input, we

identify and pin the following two groups of interfaces in an iterative fashion: (i)

all unpinned alias sets that contain one (or more) anchor(s), and (ii) the unpinned

end of all the short interconnection segments that have only one end pinned. For

both steps, we extend our pinning knowledge to other interfaces only if all anchors
unanimously agree on the geolocation of the unpinned interface. 10 This iterative

process ends when there is no more interface that meets our co-presence rules. Our

pinning process requires only four rounds to complete. The right-hand side of Table

5 summarizes the exclusive and cumulative number of interfaces pinned by each

co-presence rule. Including all the anchors, we are able to pin 45.05% (75.87%) of

all the inferred CBIs (ABIs), and 50.21% of all border interfaces associated with

Amazon’s peerings.
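
The following Python sketch captures the essence of this iterative procedure under the two co-presence rules; the data structures (an anchor map, alias sets, and interconnection segments annotated with their min-RTT differences) are simplified stand-ins for the ones used in the study.

    def iterative_pinning(anchors, alias_sets, segments, seg_rtt_diff, rtt_thresh_ms=2.0):
        """Propagate anchor locations using the two co-presence rules.

        anchors: {interface: metro}; alias_sets: iterable of sets of interfaces;
        segments: list of (abi, cbi) pairs; seg_rtt_diff: min-RTT difference (ms)
        of each segment, aligned with `segments`.
        """
        pinned = dict(anchors)
        changed = True
        while changed:
            changed = False
            # Rule 1: all interfaces of an alias set share one facility/metro.
            for aliases in alias_sets:
                metros = {pinned[i] for i in aliases if i in pinned}
                if len(metros) == 1:                  # propagate only on unanimity
                    metro = metros.pop()
                    for i in aliases:
                        if i not in pinned:
                            pinned[i], changed = metro, True
            # Rule 2: a "short" segment (< 2 ms) keeps both ends in one metro.
            for (a, b), diff in zip(segments, seg_rtt_diff):
                if diff >= rtt_thresh_ms:
                    continue
                if a in pinned and b not in pinned:
                    pinned[b], changed = pinned[a], True
                elif b in pinned and a not in pinned:
                    pinned[a], changed = pinned[b], True
        return pinned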

Pinning at a Coarser Resolution. To better understand the reasons for being

able to map only about half of all inferred interfaces associated with Amazon at the

metro level, we next explore whether the remaining (14.21k) unpinned interfaces

can be associated with a specific Amazon region based on their relative RTT

distance. To this end, we examine the ratio of the two smallest min-RTT values for

individual unpinned interfaces from each of the 15 Amazon regions. 1.11k of these

  10
    We observed such a conflict in the propagation of pinning information only for 179 (1.2%)
interfaces.

[Figure 18 appears here: a CDF over the min-RTT ratio (x-axis from 1 to 5).]
Figure 18. Distribution of the ratio of two lowest min-RTT from different Amazon
regions to individual unpinned border interfaces.

interfaces are only visible from a single region and therefore the aforementioned

ratio is not defined for these interfaces. We associate these interfaces to the only

region from which they are visible. Figure 18 depicts the CDF of the ratio for

the remaining (13.1k) unpinned interfaces that are reachable from at least two

regions and shows that for 57% of these interfaces the ratio of the two lowest min-RTTs is larger than 1.5, i.e. the min-RTT from the second-closest region is at least 50% larger than that from the closest region. We

map these interfaces to the region with the lowest delay. The relatively balanced

min-RTT values for the remaining 43% of interfaces are mainly caused by the limited

geographic separation of some regions. For example, the relatively short distance

between Virginia and Canada, or between neighboring European countries makes it

difficult to reliably associate some of the interfaces that are located between them

using min-RTT values. This coarser pinning strategy can map 8.67k (30.37%)

of the remaining interfaces (0.62k ABIs and 8.05k CBIs) to a specific region,

which improves the overall coverage of the pinning process to a total of 80.58%.

However, because of the coarser nature of pinning, we do not consider these 30.37%

of interfaces for the rest of our analysis and only focus on those 50.21% that we

pinned at the metro (or finer) level.
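
A sketch of this coarser, region-level mapping is given below; the 1.5 ratio threshold follows the text, while the function name and region names are illustrative.

    def pin_to_region(min_rtts, ratio_thresh=1.5):
        """Map an unpinned interface to a region from its per-region min-RTTs.

        min_rtts: {region: min RTT in ms}. Returns a region name, or None when the
        two lowest RTTs are too close to call (ratio below the 1.5 threshold).
        """
        if not min_rtts:
            return None
        if len(min_rtts) == 1:                        # visible from a single region
            return next(iter(min_rtts))
        (r1, d1), (_, d2) = sorted(min_rtts.items(), key=lambda kv: kv[1])[:2]
        return r1 if d2 >= ratio_thresh * d1 else None

    pin_to_region({"eu-central-1": 3.0, "eu-west-1": 11.0})     # -> "eu-central-1"
    pin_to_region({"us-east-1": 4.0, "ca-central-1": 5.2})      # -> None (too close)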

         4.6.2    Evaluation of Pinning. Accuracy. Given the lack of ground

truth information for the exact location of Amazon’s peering interfaces, we perform

cross-validation on the set of identified anchors to enhance the confidence in our

pinning results. Specifically, we perform a 10-fold stratified cross-validation with

a 70-30 split for train-test samples. We employ stratified sampling Diamantidis,

Karlis, and Giakoumakis (2000) to maintain the distribution of anchors within each

metro area and avoid cases where test samples are selected from metro areas with

fewer anchors. We run our pinning process over the training set and measure both the fraction of all test interfaces that are pinned to their correct location (recall) and the fraction of pinned test interfaces whose geolocation agrees with the test set (precision). The

results across all rounds are very consistent, with a mean value of 99.34% (57.21%)

for precision (recall) and a standard deviation of 1.6 × 10⁻³ (5.5 × 10⁻³). The

relatively low recall can be attributed to the lack of known anchors in certain metro

areas that prevented pinning information from propagating. The high precision

attests to the conservative nature of our propagation technique (i.e. inconsistent

anchors are removed and interfaces are only pinned when reliable (location)

information is available) and highlights the low false positive rate of our pinning

approach.
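
The evaluation can be sketched as follows. This is not the exact evaluation code; it shows one straightforward way to realize a stratified 70-30 train-test split over the anchors and the precision/recall definitions used above, with hypothetical function names.

    import random
    from collections import defaultdict

    def stratified_split(anchors, test_frac=0.3, seed=0):
        """Split {interface: metro} into train/test, sampling per metro so that each
        metro keeps roughly the same share of anchors in both sets."""
        rng = random.Random(seed)
        by_metro = defaultdict(list)
        for iface, metro in anchors.items():
            by_metro[metro].append(iface)
        train, test = {}, {}
        for metro, ifaces in by_metro.items():
            rng.shuffle(ifaces)
            cut = max(1, int(round(len(ifaces) * (1 - test_frac))))
            for i in ifaces[:cut]:
                train[i] = metro
            for i in ifaces[cut:]:
                test[i] = metro
        return train, test

    def precision_recall(pinned, test):
        """Precision over pinned test anchors; recall over all test anchors."""
        hits = sum(1 for i, m in test.items() if pinned.get(i) == m)
        attempted = sum(1 for i in test if i in pinned)
        precision = hits / attempted if attempted else 0.0
        recall = hits / len(test) if test else 0.0
        return precision, recall

Running a pinning pass (e.g. the iterative sketch shown earlier) on the train split and scoring its output against the held-out test split yields one precision/recall sample per round.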

Geographic Coverage. We examine the coverage of our pinning results by

comparing the cities where Amazon is known to be present against the metros

where we have pinned border interfaces. Combining the reported list of served cities

by Amazon Amazon (2018d) and the list of PeeringDB-provided cities PeeringDB

(2017) where Amazon establishes public or private peerings shows that Amazon is

present in 74 metro areas. Our pinning strategy has geo-located Amazon-related

border interfaces to 305 different metro areas across the world that cover all but

Table 6. Number (and percentage) of Amazon’s VPIs. These are CBIs that are
also observed by probes originated from Microsoft, Google, IBM, and Oracle’s cloud
networks.

                  Microsoft (%)     Google (%)        IBM (%)           Oracle (%)

  Pairwise        4.69k (18.93)     0.79k (3.17)      0.23k (0.94)      0 (0)

  Cumulative      4.69k (18.93)     4.93k (19.91)     5.01k (20.23)     5.01k (20.23)


three metro areas from Amazon’s list, namely Bangalore (India), Zhongwei (China),

and Cape Town (South Africa). While it is possible for some of our discovered, but

unpinned CBIs to be located in these metros, we lack anchors in these three metros

to reliably pin any interface to these locations. Finally, the fact that our pinning strategy results in a significantly larger number of observed metros than the 74 metro areas reported by Amazon should not come as a surprise in view of the many inferred remote peerings for which we have sufficient evidence to reliably pin the corresponding CBIs.

4.7     Amazon’s Peering Fabric

        In this section, we first present a method to detect whether an inferred

Amazon-related interconnection is virtual (§ 4.7.1). Then we utilize various

attributes of Amazon’s inferred peerings to group them based on their type

(§ 4.7.2) and reason about the differences in peerings across the identified groups

(§ 4.7.3). Finally, we characterize the entire inferred Amazon connectivity graph

(§ 4.7.4).

             4.7.1   Detecting Virtual Interconnections. To identify private

peerings that rely on virtual interconnections, we recall that a VPI is associated

with a single (CBI) port that is utilized by a client to exchange traffic with one or

more cloud providers (or other networks) over a layer-2 switching fabric. Therefore,
a CBI that is common to two or more cloud providers must be associated with

a VPI. Motivated by this observation, our method for detecting VPIs consists of

the following three steps. First, we create a pool of target IP addresses that is

composed of all identified non-IXP CBIs for Amazon, the next (+1) IP address of each such CBI, and all the destination IPs of those traceroutes that led to the discovery of

individual unique CBIs. Second, we probe each of these target IPs from a number

of major cloud providers other than Amazon and infer all the ABIs and CBIs observed by the probes that were launched from these other cloud providers (using the

methodology described in § 4.4). Finally, we identify any overlapping CBIs that

were visible from two (or more) cloud providers and consider the corresponding

interconnection to be a VPI. Note that this method yields a lower bound for the

number of Amazon-related VPIs as it can only identify VPIs whose CBIs are

visible from the considered cloud service providers. Any VPI that is not used for

exchanging traffic with multiple cloud providers is not identified by this method.

Furthermore, we are only capable of identifying VPIs which utilize public IP

addresses for their CBIs Amazon (2018b). VPIs utilizing private addresses are

confined to the virtual private cloud (VPC) of the customer and are not visible

from anywhere within or outside of Amazon’s network.
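
At its core, the detection step is a set intersection. The sketch below assumes the per-provider CBI sets have already been inferred; the addresses are placeholders drawn from documentation IP ranges.

    def detect_vpis(cbi_sets):
        """Given {provider: set of CBIs observed from that provider's VMs}, flag
        Amazon CBIs that are also seen from at least one other cloud as VPIs."""
        amazon = cbi_sets["amazon"]
        vpis = set()
        for provider, cbis in cbi_sets.items():
            if provider != "amazon":
                vpis |= amazon & cbis      # overlap => shared layer-2 port => VPI
        return vpis

    cbi_sets = {
        "amazon":    {"198.51.100.1", "198.51.100.2", "198.51.100.3"},
        "microsoft": {"198.51.100.1", "192.0.2.9"},
        "google":    {"198.51.100.2"},
    }
    detect_vpis(cbi_sets)                  # -> {"198.51.100.1", "198.51.100.2"}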

       Applying this method, we probed nearly 327k IPs in our pool of target IP

addresses from VMs in all regions of each one of the following four large cloud

providers: Microsoft, Google, IBM, and Oracle. The results are shown in Table 6

where the first row shows the number of pairwise common CBIs between Amazon

and other cloud providers. The second row shows the cumulative number of

overlapping CBIs. From this table, we observe that roughly 20% of Amazon’s CBIs

are related to VPIs as they are visible from at least one of the four other considered
cloud providers. While roughly 19% of Amazon’s CBIs are shared with Microsoft,
there is no overlap in VPIs between Amazon and Oracle. Only 0.1%

of Amazon’s CBIs are common with Microsoft, Google and IBM.

       Note that our method incorrectly identifies a VPI if a customer’s border

router is directly connected to Amazon but responds to our probe with a default

or 3rd party interface. However, either of these two scenarios is very unlikely. For

one, recall (§ 4.4) that we use UDP probes and never consider the target interface itself as a CBI, which avoids responses from a default interface Baker (1995). Furthermore, our

method selects +1 IP addresses as traceroute targets (i.e. during the expansion

probing) to increase the likelihood that the corresponding traceroutes cross

the same CBI without directly probing the CBI itself. Also, the presence of a

customer border router that responds with a third party interface implies that the

customer relies on the third party for reaching Amazon while directly receiving

downstream traffic from Amazon. However, such a setting is very unlikely for

Amazon customers.

         4.7.2   Grouping Amazon’s Peerings. To study Amazon’s inferred

peering fabric, we first group all the inferred peerings/interconnections based on

the following three key attributes: (i) whether the type of peering relationship

is public or private, (ii) whether the corresponding AS link is present in public

BGP feeds, and (iii) in the case of private peerings, whether the corresponding

interconnection is physical or virtual (VPI). A peering is considered to be public

(bi-lateral or multi-lateral) if its CBI belongs to an IXP prefix. We also check

whether the corresponding AS relationship is present in the public BGP data

by utilizing CAIDA’s AS Relationships dataset CAIDA (2018) corresponding

to the dates of our data collection. Although this dataset is widely used for AS

Table 7. Breakdown of all Amazon peerings based on their key attributes.

 Group          ASes (%)        CBIs (%)         ABIs (%)

 Pb-nB          2.52k (71)      3.93k (16)       0.79k (21)
 Pb-B           0.20k (5)       0.56k (2)        0.56k (15)
 Pb             2.69k (76)      4.46k (18)       0.83k (22)

 Pr-nB-V        0.24k (7)       2.99k (12)       0.54k (14)
 Pr-nB-nV       1.1k (31)       10.24k (41)      2.59k (69)
 Pr-nB          1.18k (33)      13.24k (53)      2.68k (71)

 Pr-B-nV        0.11k (3)       5.67k (23)       2.07k (55)
 Pr-B-V         0.06k (2)       2.09k (8)        0.33k (9)
 Pr-B           0.12k (3)       7.76k (31)       2.11k (56)


relationship information, its coverage is known to be limited by the number and

placement of BGP feed collectors (e.g., see Luckie, Huffaker, Dhamdhere, Giotsas,

et al. (2013) and references therein).

        Table 7 gives the breakdown of all of Amazon’s inferred peerings into six

groups based on the aforementioned three attributes. We use the labels Pr/Pb

to denote private/public peerings, B/nB for being visible/not visible in public

BGP feeds, and V/nV for virtual/non-virtual peerings (applies only in the case of

private interconnections). For example, Pr-nB-nV refers to the number of Amazon’s

(unique) inferred private peerings that are not seen in public BGP feeds and are

not virtual (e.g. cross connections). Each row in Table 7 shows the number (and


percentage) of unique AS peers that establish certain types of peerings, along with

the number (and percentage) of corresponding CBIs and ABIs for those peers.

Since there are overlapping ASes and interfaces between different groups, Table 7

also presents three rows (i.e. rows 3, 6, and 9) that aggregate the information for the two closely related preceding rows/groups. These three

aggregate rows provide an overall view of Amazon’s inferred peering fabric that

highlight two points of general interest: (i) While 76% of Amazon’s peers use Pb

peering, only 33% of Amazon’s peers use Pr-nB (virtual or physical) peerings, with

an overlap of about 10% of peer ASes relying on both Pr-nB and Pb peerings, and

the fraction of Pr-B peerings being very small (3%). (ii) The average number of

CBIs (and ABIs) for ASes that use Pr-B, Pr-nB and Pb peerings to interconnect

with Amazon is 65 (17), 11 (2), and 2 (0.3), respectively.
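
The group labels themselves follow directly from the three attributes; a small helper such as the hypothetical one below reproduces the naming scheme of Table 7.

    def peering_group(is_public, in_bgp, is_virtual):
        """Build the group label used in Table 7, e.g. Pr-nB-V or Pb-B.
        The virtual/non-virtual flag only applies to private peerings."""
        label = "Pb" if is_public else "Pr"
        label += "-B" if in_bgp else "-nB"
        if not is_public:
            label += "-V" if is_virtual else "-nV"
        return label

    peering_group(is_public=False, in_bgp=False, is_virtual=True)    # -> "Pr-nB-V"
    peering_group(is_public=True, in_bgp=True, is_virtual=False)     # -> "Pb-B"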

Hidden Peerings. Note that there are groups of Amazon’s inferred peerings

shown in Table 7 (together with their associated traffic) that remain in general

hidden from the measurement techniques that are commonly used for inferring

peerings (e.g. traceroute). One such group consists of all the virtual peerings (Pr-

*-V) since they are used to exchange traffic between customer ASes of Amazon (or

their downstream ASes) and Amazon. The second group is made up of all other

non-virtual peerings that are not visible in BGP data, namely Pr-nB-nV and even

Pb-nB. The presence of these peerings cannot be inferred from public BGP data

and their associated traffic is only visible along the short AS path to the customer

AS. These hidden peerings make up 33.29% of all of Amazon’s inferred peerings

and their associated traffic is carried over Amazon’s private backbone and not over

the public Internet.



Table 8. Hybrid peering groups along with the number of unique ASes for each
group.
              Different Types of Hybrid Peering              #ASN
              Pb-nB                                             2187
              Pr-nB-nV                                           686
              Pr-nB-nV; Pb-nB                                    207
              Pb-B                                               117
              Pr-nB-nV; Pr-nB-V                                   83
              Pr-nB-nV; Pb-nB; Pr-nB-V                            60
              Pb-nB; Pr-nB-V                                      41
              Pr-nB-V                                             38
              Pr-B-nV; Pb-B                                       37
              Pr-B-V; Pr-B-nV; Pb-B                               31
              Pr-B-nV                                             24
              Pr-B-V; Pr-B-nV                                     16
              Pr-nB-nV; Pr-B-nV; Pr-B-V                            5
              Pr-B-V; Pb-B                                         4
              Pr-B-V                                               4
              Pb-nB; Pb-B                                          2
              Pr-nB-nV; Pr-B-nV; Pr-B-V; Pb-B                      2
              Pr-nB-nV; Pr-B-nV                                    1
              Pr-nB-nV; Pr-B-nV; Pb-B                              1
              Pr-nB-nV; Pr-nB-V; Pr-B-nV                           1
              Pr-nB-nV; Pr-nB-V; Pr-B-nV; Pr-B-V; Pb-B             1

Hybrid Peering. Individual ASes may establish multiple peerings of different

types (referred to as “hybrid” peering) with Amazon; that is, appear as a member

of two (or more) groups in Table 7. We group all ASes that establish such hybrid

peering based on the combination of peering types (listed in Table 7) that they maintain with Amazon. The following are two of the most common

hybrid peering scenarios we observe. Pr-nB-nV + Pb-nB: With 207 ASes, this is

the largest group of ASes which utilize hybrid peering. Members of this group use

both types of peerings to exchange their own traffic with Amazon and include ASes

such as Akamai, Intercloud, Datapipe, Cloudnet, and Dell. Pr-nB-nV; Pb-nB;

Pr-nB-V: This group is similar to the first one, but its members also utilize

virtual peerings to exchange their own traffic with Amazon. This group consists

of 60 ASes that include large providers such as Google, Microsoft, Facebook, and

Limelight. Table 8 gives a detailed breakdown of the observed hybrid (and non-

hybrid) peering groups and shows for each group the number of ASes that use that combination of peering types. Note that each AS is counted only once in the group that has the

most specific peering types.
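
Counting ASes per combination of peering types, as in Table 8, can be sketched as follows; the ASNs in the example are placeholders from the private-use range.

    from collections import Counter

    def hybrid_groups(peerings):
        """Count ASes per combination of peering types (cf. Table 8).
        peerings: iterable of (asn, group_label); each AS is counted once under the
        full set of group labels it maintains with the cloud provider."""
        per_as = {}
        for asn, label in peerings:
            per_as.setdefault(asn, set()).add(label)
        return Counter(frozenset(labels) for labels in per_as.values())

    counts = hybrid_groups([(64500, "Pr-nB-nV"), (64500, "Pb-nB"), (64501, "Pb-nB")])
    # counts[frozenset({"Pr-nB-nV", "Pb-nB"})] == 1 and counts[frozenset({"Pb-nB"})] == 1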

          4.7.3    Inferring the Purpose of Peerings. In an attempt to gain

insight into how each of the six different groups of Amazon’s peerings is being

used in practice, we consider a number of additional characteristics of the peers

in each group and depict those characteristics using stacked boxplots as shown

in Figure 19. In particular, starting with the top row in Figure 19, we consider
summary distributions of 11 (i) size of customer cone of peering AS (i.e. number

of /24 prefixes that are reachable through the AS (labeled as "BGP /24"); (ii)

number of /24 prefixes that are reachable from Amazon through the identified CBIs

associated with each peering; (iii) number of ABIs for individual peering AS; (iv)

number of CBIs for individual peering AS; (v) min RTT difference between both

ends of individual peering; (vi) number of unique metro areas that the CBIs of each

peering AS have been pinned to (see § 4.6).

        For example, we view the number of /24 prefixes in the customer cone of

an AS to reflect the AS’s size/role (i.e. as tier-1 or tier-2 AS) in routing Internet

traffic. Moreover, comparing the number of /24 prefixes in the customer cone

with the number of reachable /24 prefixes through a specific peering for an AS

reveals the purpose of the corresponding peering to route traffic to/from Amazon

from/to its downstream networks. In the following, we discuss how the combined

  11
    For ASes that utilize hybrid peering with Amazon, the reported information in each group
only includes peerings related to that group.
information in Table 7 and Figure 19 sheds light on Amazon’s global-scale peering

fabric and illuminates the different roles of the six groups of peering ASes.

Pb-nB. The peers in this group are typically edge networks with a small customer

cone (including content, enterprise, and smaller transit/access networks) that

exchange traffic with Amazon through a single CBI at an IXP. The corresponding

routes are between Amazon and these edge networks and are thus not announced in

BGP. Peers in this group include CDNs like Akamai, small transit/access providers

like Etisalat, BT, and Floridanet, and enterprises such as Adobe, Cloudflare,

Datapipe (Rackspace), Google, Symantec, LinkedIn, and Yandex.

Pb-B. This group consists mostly of tier-2 transit networks with moderate-sized

customer cones. These networks are present at a number of IXPs to connect their

downstream customer networks to Amazon. The corresponding routes must

be announced to downstream ASes and are thus visible in BGP. Example peers in

this group are CW, DigitalOcean, Fastweb, Seabone, Shaw Cable, Google Fiber,

and Vodafone.

Pr-nB-V. The peers in this group are a combination of small transit providers and

some content and enterprise networks. They establish VPIs at a single location to

exchange either their own traffic or the traffic of their downstream networks with

Amazon through a VPI. Therefore, their peering is not visible in BGP. About 85%

of these peers are visible from two cloud providers while the rest are visible from

more than two cloud providers. Examples of enterprise and content networks in

this group are Apple, UCSD, UIOWA, LG, and Edgecast, and examples of transit

networks are Rogers, Charter, and CenturyLink.

Pr-nB-nV. These peers appear to establish physical interconnections (i.e. cross-

connects) with Amazon since they are not reachable from other cloud providers.

[Figure 19 appears here: stacked boxplots, one panel per metric (BGP /24, Reachable /24, ABIs, CBIs, RTT Diff (ms), Metros), each shown across the six peering groups Pb-nB, Pb-B, Pr-nB-V, Pr-nB-nV, Pr-B-nV, and Pr-B-V.]

Figure 19. Key features of the six groups of Amazon’s peerings (presented in
Table 7) showing (from top to bottom): the number of /24 prefixes within the
customer cone of a peering AS, the number of probed /24 prefixes that are reachable
through the CBIs of the associated peerings of an AS, the number of ABIs and CBIs
of the associated peerings of an AS, the difference in RTT between both ends of the
associated peerings of an AS, and the number of metro areas to which the CBIs of
each peering AS have been pinned.
However, given the earlier-mentioned under-counting of VPIs by our method,

we hypothesize that some or all of these peerings could be associated with VPIs,
similar to the previous group. The composition of the peers in this group is

comparable to Pr-nB-V but includes a larger fraction of enterprise networks (i.e.

main users of VPIs) which in turn is consistent with our hypothesis. Examples of

peers in this group are enterprises such as Datapipe (Rackspace), Chevron, Vox-

Media, UToronto, and Georgia-Tech, CDNs such as Akamai and Limelight and

transit/access providers like Comcast. To further examine our hypothesis, we parse

the DNS names of 4.85k CBIs associated with peers in the Pr-nB group. 170 of

these DNS names (100 from Pr-nB-nV and 70 from Pr-nB-V interfaces) contain

VLAN tags, indicating the presence of a virtual private interconnection. We also

observe some commonly used (albeit not required) keywords Amazon (2018e) such

as dxvif (Amazon terminology for “direct connect virtual interface"), dxcon, awsdx

and aws-dx for 125 (out of 170) CBIs where the “dx”-notation is synonymous with
an interface’s use for “direct interconnections”. We consider the appearance of these

keywords in the DNS names of CBIs for this group of peerings (and only in this

group) as strong evidence that the interconnections in question are indeed VPIs.

Therefore, a subset of Pr-nB-nV interconnections is likely to be virtual as well.
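
The DNS evidence check can be sketched as a simple keyword scan; the dx-related keywords follow the text above, whereas the VLAN pattern is an illustrative guess rather than a documented naming format.

    import re

    DX_KEYWORDS = ("dxvif", "dxcon", "awsdx", "aws-dx")
    VLAN_RE = re.compile(r"(?:^|[.\-])vlan[\-.]?(\d{1,4})(?:[.\-]|$)", re.I)

    def vpi_dns_evidence(dns_name):
        """Return which pieces of VPI evidence a CBI's DNS name carries."""
        name = dns_name.lower()
        return {
            "dx_keyword": any(k in name for k in DX_KEYWORDS),
            "vlan_tag": bool(VLAN_RE.search(name)),
        }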

Pr-B-nV. The peers in this group are very large transit networks that establish

cross-connections at various locations (many CBIs and ABIs) across the world.

The large number of prefixes that are reachable through them from Amazon and

the visibility of the peerings in BGP suggest that these peers simply provide

connectivity for their downstream clients to Amazon. Given the large size of

these transit networks, the visibility of these peerings in BGP is due to the

announcement of routes from Amazon to all of their downstream networks.

Intuitively, given the volume of aggregate traffic exchanged between Amazon

and these large transit networks, the peers in this group have the largest number

of CBIs, and these CBIs are located at different metro areas across the world.

Example networks in this group are AT&T, Level3 (now CenturyLink), GTT,

Cogent, HE, XO, Zayo, and NTT.

Pr-B-V. This group consists mostly of a subset of the very large transit networks in Pr-B-nV, and the peers in this group also establish a few VPIs (at different

locations) with Amazon. The small number of prefixes that are reachable from

Amazon through these peers along with the large number of CBIs per peer

indicates that these peers bring specific Amazon clients (a provider or enterprise,

perhaps even without an ASN) to a colo facility to exchange traffic with Amazon

Amazon (2018c). The presence of these peerings in BGP is due to the role these networks play as transit providers in the Pr-B-nV group, which is separate from the virtual peerings they maintain in this group. Example networks in this group are Cogent, Comcast,

CW, GTT, CenturyLink, HE, and TimeWarner, all of which are listed as Amazon

cloud connectivity partners Amazon (2018c); Google (2018c); Microsoft (2018b)

and connect enterprises to Amazon. When examining the min RTT difference

between both ends of peerings across different groups (row 5 in Figure 19), we

observe that both groups with virtual interconnections (Pr-B-V and Pr-nB-V) have

in general larger values than the other groups. This observation is in agreement

with the fact that many of these VPIs are associated with enterprises that are

brought to the cloud exchange by access networks using layer-2 connections.

Coverage of Amazon’s Interconnections. Although the total number of

peerings that Amazon has with its customers is not known, our goal here is to

provide a baseline comparison between Amazon’s peering fabric that is visible in

public BGP data and Amazon’s peering fabric as inferred by our approach. Using

our approach, we have identified 3.3k unique peerings for Amazon. In contrast,

there are only 250 unique Amazon peerings reported in BGP, and 226 of them

are also discovered by our approach. Upon closer examination, for some of the

24 peerings that are seen in BGP but not by our approach, we observed a sibling

of the corresponding peer ASes. This brings the total coverage of our method to

about 93% of all reported Amazon peerings in BGP. In addition, we report on more

than 3k unique Amazon peerings that are not visible in public BGP data. These

peerings with Amazon and their associated traffic are not visible when relying on

more conventional measurement techniques.

         4.7.4    Characterizing Amazon’s Connectivity Graph. Having

focused so far on groups of peerings of certain types or individual AS peers, we

next provide a more holistic view of Amazon’s inferred peering graph and examine

some of its basic characteristics. We first produce the Interface Connectivity Graph

(ICG) between all the inferred border interfaces. ICG is a bipartite graph where

each node is a border interface (an ABI or a CBI) and each edge corresponds to
the traceroute interconnection segment (ICS) between an ABI and a CBI. We also

annotate each edge with the difference in the minimum RTT from the closest VM

to each end of the ICS.12

       Intuitively, we expect the resulting ICG to have a separate partition that

consists of interconnections associated with each region, i.e. ABIs of a region

connecting to CBIs that are supported by them. However, we observe that the

ICG’s largest connected component consists of the vast majority (92.3%) of all

nodes. This implies that there are links between ABIs in each Amazon region

and CBIs in several other regions. Upon closer examination of the 57.85% of all the

peerings that have both of their ends pinned, we notice that a majority of these

 12
   We identify the VM that has the shortest RTT from an ABI and use the min-RTT of the
same VM from the corresponding CBI to determine the RTT of an ICS.
[Figure 20 appears here: two CDF panels — (a) degree of ABIs (log-scale x-axis), (b) degree of CBIs.]

Figure 20. Distribution of ABI degree (log scale, left) and CBI degree (right).

peerings (98%) are indeed contained within individual Amazon regions. However,

we do encounter remote peerings between regions that are a significant geographical

distance apart. For example, there are peerings between FR and KR, US-VA and

SG, AU and CA. The large fraction of peerings with only one end or no end pinned

(about 42%) suggests that the actual number of remote peerings is likely to be

much larger. These remote peerings are the main reason for why the ICG’s largest

connected component contains more than 92% of all border interfaces.

         To illustrate the basic connectivity features of the bi-partite ICG, Figures

20a and 20b show the distributions of the number of CBIs that are associated

with each individual ABIs (degree of ABIs) and the number of ABIs associated

with individual CBIs (degree of CBIs). We observe a skewed distribution for ABI

degree where 30%, 70%, and 95% of ABIs are associated with 1, <10, and <100

CBIs, respectively. Roughly 50% of CBIs are associated with a single ABI, and 90% with at most 8 ABIs. A closer examination shows that high degree CBIs are mainly associated

with Amazon’s public peerings with large transit networks (e.g. GTT, Cogent,

NTT, CenturyLink). In contrast, a majority of high degree ABIs is associated with

private, non-BGP, non-virtual peerings (see § 4.7).
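
Building the ICG and computing the two degree distributions amounts to the following sketch, where the interconnection segments are assumed to be given as (ABI, CBI) pairs.

    from collections import defaultdict

    def icg_degrees(segments):
        """Build the bipartite Interface Connectivity Graph from (ABI, CBI)
        interconnection segments and return the degree of each side."""
        abi_neighbors, cbi_neighbors = defaultdict(set), defaultdict(set)
        for abi, cbi in segments:
            abi_neighbors[abi].add(cbi)
            cbi_neighbors[cbi].add(abi)
        abi_deg = {a: len(cbis) for a, cbis in abi_neighbors.items()}
        cbi_deg = {c: len(abis) for c, abis in cbi_neighbors.items()}
        return abi_deg, cbi_deg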

4.8    Inferring Peering with bdrmap

       As stated earlier in § 4.2, bdrmap Luckie et al. (2016)13 is the only other

existing tool for inferring border routers of a given network from traceroute data.

With Amazon as the network of interest, our setting appears to be a perfect fit for

the type of target settings assumed by bdrmap. However, there are two important

differences between the cloud service provider networks we are interested in (e.g.

Amazon) and the more traditional service provider network that bdrmap targets

(e.g. a large US Tier-1 network). First, not only can the visibility of different

prefixes vary widely across different Amazon regions, but roughly one-third of

Amazon’s peerings are not visible in BGP and even some of the BGP-visible

peerings of a network are related to other instances of its peerings with Amazon

(§ 4.7). At the same time, bdrmap relies on peering relationships in BGP to

determine the targets for its traceroute probes and also uses them as input for some

of its heuristics. Therefore, bdrmap’s outcome is affected by any inconsistent or

missing peering relationship in BGP. Second, as noted earlier, our traceroute probes

reveal hybrid Amazon border routers that have both Amazon and client routers as

their next hop and connect to them. This setting is not consistent with bdrmap’s

assumption that border routers should be situated exclusively in the host or peering

network. Given these differences, the comparison below is intended as a guideline

for how bdrmap could be improved to apply in a cloud-centric setting.

       Thanks to special efforts by the authors of bdrmap who modified their

tool so it could be used for launching traceroutes from cloud-based vantage points

(i.e., VMs), we were able to run it in all Amazon regions to compare the bdrmap-

inferred border routers with our inference results. bdrmap identified 4.83k ABIs

  13
     MAP-IT Marder and Smith (2016) and bdrmapIT Alexander et al. (2018) are not suitable for
this setting since we have layer-2 devices at the border.
and 9.65k CBIs associated with 2.66k ASes from all global regions. 3.23k of these

CBIs belong to IXP prefixes and are associated with 1.81k ASes. Given bdrmap’s

customized probing strategy and its extensive use of different heuristics, it is

not feasible to identify the exact reasons for all the observed differences between

bdrmap’s and our findings. However, we were able to identify the following three

major inconsistencies in bdrmap’s output.

       First, bdrmap does not report an AS owner for 0.32k of its inferred CBIs

(i.e. owner is AS0). Second, instances of bdrmap that run in different Amazon

regions report different AS owners for more than 500 CBIs, sometimes as many

as 4 or 5 different AS owners for an interface. Third, running instances of bdrmap

in different Amazon regions results in inconsistent views of individual border router

interfaces; e.g. one and the same interface is inferred to be an ABI from one region

and a CBI from another region. We identified 872 interfaces that exhibit this

inconsistency. Furthermore, the fact that 97% (846 out of 872) of the interfaces

with this type of inconsistency are advertised by Amazon’s ASNs indicates that the

AS owner for these interfaces has been inferred by bdrmap’s heuristics.

       When comparing the findings of bdrmap against our methodology in more

detail, we observed that our methodology and bdrmap have 1.85k ABIs, 5.48k CBIs, and 2k ASes in common. However, without access to ground truth, a full

investigation into the various points of disagreement is problematic. To make the

problem more tractable, we limit our investigation to the 0.65k ASes that were

exclusively identified by bdrmap and try to rely on other sources of information to

confirm or dismiss bdrmap’s findings. These exclusive ASNs belong to 0.18k (0.49k)

IXP (private) peerings. For IXP peerings, we compare bdrmap’s findings against

IP-to-ASN mappings that are published by IXP operators or rely on embedded

information within DNS names. The inferences of bdrmap are aligned with this information for only 42 of

these peers. For the 0.49k private peerings we focus on inferences that were made

by the thirdparty heuristic as it constitutes the largest (62%) fraction of bdrmap-

exclusive private peerings (for details, see § 5.4 in Luckie et al. (2016)). These ASes

are associated with 375 CBIs and we observe 66 (60 ASNs) of these interfaces in

our data. For each of these 66 CBIs, we calculate the set of reachable destination

ASNs through these CBIs and determine the upstream provider network for each

one of these destination ASes using BGP data CAIDA (2018). Observing more

than one or no common provider network among reachable destination ASes for

individual CBIs would invalidate the application of bdrmap’s thirdparty heuristic,

i.e. bdrmap wouldn’t have applied this heuristic if it had done more extensive

probing that revealed an additional set of reachable destination ASes for these

CBIs. We find that 50 (44 ASNs) out of the 66 common CBIs have more than one

or no common providers for the target ASNs. Note that this observation does not

invalidate bdrmap’s thirdparty heuristics but highlights its reliance on high-quality

BGP snapshots and AS-relationship information.
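
Our consistency check for this heuristic can be sketched as follows, assuming that the destination ASNs reachable through each CBI and an AS-to-upstream-providers map are given; the ASNs below are placeholders from the documentation range.

    def common_providers_per_cbi(reachable_asns, upstreams):
        """For each CBI, intersect the upstream-provider sets of the destination ASNs
        reached through it; zero or multiple common providers is the signal used
        above to question a thirdparty-based inference."""
        result = {}
        for cbi, dests in reachable_asns.items():
            provider_sets = [upstreams.get(d, set()) for d in dests]
            result[cbi] = set.intersection(*provider_sets) if provider_sets else set()
        return result

    reachable = {"203.0.113.5": {64496, 64497}}
    upstreams = {64496: {64500, 64501}, 64497: {64501}}
    common_providers_per_cbi(reachable, upstreams)   # -> {"203.0.113.5": {64501}}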

4.9   Limitations of Our Study

Because ours is a third-party measurement study of Amazon’s peering fabric that

makes no use of Amazon-proprietary data and only relies on generally-available

measurement techniques, there are inherent limitations to our efforts aimed at

inferring and geo-locating all interconnections between Amazon and the rest of the

Internet. This section collects and organizes the key limitations in one place and

details their impact on our findings.

Inferring Interconnections. Border routers responding to traceroute probes

using a third-party address are a well-known cause for artifacts in traceroute

measurement output, and our IXP-client and Hybrid-IP heuristics used in § 4.5.1

are not immune to this problem. However, as reported in Luckie et al. (2014), the

fraction of routers that respond with their incoming interface is in general above

50% and typically even higher in the U.S.

       In contrast, because of the isolation of network paths for VPIs of Amazon’s

clients that use private addresses, any peerings associated with these VPIs are not

visible to probes from VMs owned by other Amazon customers. As a result, our

inference methodology described in § 4.4 cannot discover established VPIs that

leverage private IP addresses.

Pinning Interconnections. In § 4.6, we reported being able to pin only about

half of all the inferred peering interfaces at the metro level. In an attempt to

understand what is limiting our ability to pin the rest of the inferred interfaces,

we identified two main reasons. First, there is a lack of anchors in certain regions,

and second, there is the common use of remote peering. These two factors in

conjunction with our conservative iterative strategy for pinning interfaces to the

metro level make it difficult to provide enough and sufficiently reliable indicators of

interface-specific locations.

       One way to overcome some of these limiting factors is by using a coarser

scale for pinning (e.g. regional level). In fact, as shown in § 4.6, at the regional

level, we are able to pin some 30% of the remaining interfaces which improves the

overall coverage of our pinning strategy at the granularity of regions to about 80%.

Other Observations. Although our study does not consider IPv6 addresses, we

argue that the proposed methodology only requires minimal modifications (e.g.

incorporating IPv6 target selection techniques Beverly et al. (2018); Gasser et al.



(2018)) to be applicable to infer IPv6 peerings. We will explore IPv6 peerings as

part of future work.

       Like others before us, as third-party researchers, we found it challenging to

validate our Amazon-specific findings. Like most of the large commercial provider

networks, Amazon makes little, if any, ground truth data about its global-scale

serving infrastructure publicly available, and our attempts at obtaining peering-

related ground truth information from either Amazon, Amazon’s customers,

operators of colo facilities where Amazon is native, or AWS Direct Connect

Partners have been futile.

       Faced with the reality of a dearth of ground truth data, whenever possible,

we relied on extensive consistency-checking of our results (e.g. see § 4.5, § 4.6). At

the same time, many of our heuristics are conservative in nature, typically requiring

agreement when provided with input from multiple complementary sources of

information. As a result, the reported quantities in this chapter are in general

lower bounds but nevertheless demonstrate the existence of a substantial number

of Amazon-related peerings that are not visible to more conventional measurement

studies and/or inference techniques.

4.10   Summary

       In this chapter, we present a measurement study of the interconnection

fabric that Amazon utilizes on a global scale to run its various businesses, including

AWS. We show that in addition to some 0.12k private peerings and about 2.69k

public peerings (i.e., bi-lateral and multi-lateral peerings), Amazon also utilizes

at least 0.24k (and likely many more) virtual private interconnections or VPIs.

VPIs are a new and increasingly popular interconnection option for entities such as

enterprises that desire highly elastic and flexible connections to the cloud providers

that offer the type of services that these entities deem critical for running their

business. Our study makes no use of Amazon-proprietary data and can be used to

map the interconnection fabric of any large cloud provider, provided the provider in

question does not filter traceroute probes.

       Our findings emphasize that new methods are needed to track and study

the type of “hybrid” connectivity that is in use today at the Internet’s edge.

This hybrid connectivity describes an emerging strategy whereby one part of an

Internet player’s traffic bypasses the public Internet (i.e. cloud service-related

traffic traversing cloud exchange-provided VPIs), another part is handled by its

upstream ISP (i.e. traversing colo-provided private interconnections), and yet

another portion of its traffic is exchanged over a colo-owned and colo-operated

IXP. As the number of businesses investing in cloud services is expected to continue

to increase rapidly, multi-cloud strategies are predicted to become mainstream,

and the majority of future workload-related traffic is anticipated to be handled by

cloud-enabled colos Gartner (2016), tracking and studying this hybrid connectivity

will require significant research efforts on the part of the networking community.

Knowing the structure of this hybrid connectivity, for instance, is a prerequisite

for studying which types of interconnections will handle the bulk of tomorrow’s

Internet traffic, and how much of that traffic will bypass the public Internet, with

implications on the role that traditional players such as Internet transit providers

and emerging players such as cloud-centric data center providers may play in the

future Internet.




                                        CHAPTER V

                     CLOUD CONNECTIVITY PERFORMANCE

5.1    Introduction

        In Chapter IV we presented and characterized different peering relationships

that CPs form with various networks. This chapter focuses on the performance of

various connectivity options that are at the disposal of enterprises for establishing

end-to-end connectivity with cloud resources.

        The content in this chapter is the result of a collaboration between Bahador

Yeganeh and Ramakrishnan Durairajan, Reza Rejaie, and Walter Willinger.

Bahador Yeganeh is the primary author of this work and responsible for conducting

all measurements and producing the presented analyses.

5.2    Introduction

        For enterprises, the premise of deploying a multi-cloud strategy1 is succinctly

captured by the phrase “not all clouds are equal". That is, instead of considering

and consuming compute resources as a utility from a single cloud provider (CP), to

better satisfy their specific requirements, enterprise networks can pick-and-choose

services from multiple participating CPs (e.g. rent storage from one CP, compute

resources from another) and establish end-to-end connectivity between them and

their on-premises server(s) at the same or different locations. In the process, they

also avoid vendor lock-in, enhance the reliability and performance of the selected

services, and can reduce the operational cost of deployments. Indeed, according

to an industry report from late 2018 Krishna et al. (2018), 85% of the enterprises

have already adopted multi-cloud strategies, and that number is expected to rise

to 98% by 2021. Because of their popularity with enterprise networks, multi-

   1
   This is different from hybrid cloud computing, where a direct connection exists between a
public cloud and private on-premises enterprise server(s).
cloud strategies are here to stay and can be expected to be one of the drivers of

innovation in future cloud services The enterprise deployment game-plan: why

multi-cloud is the future (2018); Five Reasons Why Multi-Cloud Infrastructure is

the Future of Enterprise IT (2018); The Future of IT Transformation Is Multi-

Cloud (2018); The Future of Multi-Cloud: Common APIs Across Public and

Private Clouds (2018); The Future of the Datacenter is Multicloud (2018); How

multi-cloud business models will shape the future (2018); IBM bets on a multi-cloud

future (2018).

       Fueled by the deployment of multi-cloud strategies, we are witnessing

two new trends in Internet connectivity. First, there is the emergence of new

Internet players in the form of third-party private connectivity providers (e.g.

DataPipe, HopOne, among others Amazon (2018c); Google (2018b); Microsoft

(2018c)). These entities offer direct, secure, private, layer 3 connectivity between

CPs (henceforth referred to as third-party private (TPP)), at a cost of a few

hundreds of dollars per month. TPP routes bypass the public Internet at Cloud

Exchanges CoreSite (2018); Demchenko et al. (2013) and offer additional benefits to

users (e.g. enterprise networks can connect to CPs without owning an Autonomous

System Number, or ASN, or physical infrastructure). Second, the large CPs are

aggressively expanding the footprint of their serving infrastructures, including

the number of direct connect locations where enterprises can reach the cloud

via direct, private connectivity (henceforth referred to as cloud-provider private

(CPP)) using either new CP-specific interconnection services (e.g. Amazon (2018a);

Google (2018a); Microsoft (2018a)) or third-party private connectivity providers

at colocation facilities. Of course, a user can forgo the TPP and CPP options

altogether and rely instead on the traditional, best-effort connectivity over the

public Internet—henceforth referred to as (transit provider-based) best-effort public

(Internet) (BEP)—to employ a multi-cloud strategy.
[Figure 21 appears here: a schematic of an enterprise network and virtual machines in CP1 and CP2 connected through three options — the cloud-provider private (CPP) backbone, the best-effort public (BEP) Internet via transit providers 1..N, and the third-party private (TPP) backbone; legend: cloud exchange, private peering, cloud router (CR).]

Figure 21. Three different multi-cloud connectivity options.


       To illustrate the problem, consider, for example, the case of a modern

enterprise whose goal is to adopt a multi-cloud strategy (i.e. establishing end-to-end

connectivity between (i) two or more CPs, i.e. cloud-to-cloud; and (ii) enterprise

servers and the participating CPs, i.e. enterprise-to-cloud) that is performance- and

cost-aware. For this scenario, let us assume that (a) the enterprise’s customers are

geo-dispersed and different CPs are available in different geographic regions (i.e.

latency matters for all customers); (b) regulations are in place (e.g. for file sharing

and storing data in EU; hence, throughput matters for data transfers Example

Applications Services (2018)); (c) cloud reliability and disaster recovery are

important, especially in the face of path failures (i.e. routing matters); and (d) cost

savings play an important role in connectivity decisions. Given these requirements,

the diversity of CPs, the above-mentioned different connectivity options, and the

lack of visibility into the performance tradeoffs, routing choices, and topological

features associated with these multi-cloud connectivity options, the enterprise faces
the “problem of plenty”: how to best leverage the different CPs’ infrastructures, the

various available connectivity choices, and the possible routing options to deploy a

multi-cloud strategy that achieves the enterprise’s performance and cost objectives?

       With multi-cloud connectivity being the main focus of this chapter, we note

that existing measurement techniques are a poor match in this context. For one,

they fall short of providing the data needed to infer the type of connectivity (i.e.

TPP, CPP, and BEP) between (two or more) participating CPs. Second, they

are largely incapable of providing the visibility needed to study the topological

properties, performance differences, or routing strategies associated with different

connectivity options. Last but not least, while mapping the connectivity from

cloud/content providers to users has been considered in prior work (e.g. Anwar

et al. (2015); Calder, Flavel, Katz-Bassett, Mahajan, and Padhye (2015); Calder

et al. (2018); Chiu et al. (2015); Cunha et al. (2016); Schlinker et al. (2017)

and references therein), multi-cloud connectivity from a cloud-to-cloud (C2C)

perspective has remained largely unexplored to date.

       This chapter aims to empirically examine the different types of multi-

cloud connectivity options that are available in today’s Internet and investigate

their performance characteristics using non-proprietary cloud-centric, active

measurements. In the process, we are also interested in attributing the observed

characteristics to aspects related to connectivity, routing strategy, or the presence

of any performance bottlenecks. To study multi-cloud connectivity from a C2C

perspective, we deploy and interconnect VMs hosted within and across two different

geographic regions or availability zones (i.e. CA and VA) of three large cloud

providers (i.e. Amazon Web Services (AWS), Google Cloud Platform (GCP) and

Microsoft Azure) using the TPP, CPP, and BEP option, respectively. We note that

the high cost of using the services of commercial third-party private connectivity

providers for implementing the TPP option prevents us from having a more global-

scale deployment that utilizes more than one such provider.

       Using this experimental setup as a starting point, we first compare the

stability and/or variability in performance across the three connectivity options

using metrics such as delay, throughput, and loss rate over time. We find that

CPP routes exhibit lower latency and are more stable when compared to BEP and

TPP routes. CPP routes also have higher throughput and exhibit less variation

compared to the other two options. Given that using the TPP option is expensive,

this finding is puzzling. In our attempt to explain this observation, we find

that inconsistencies in performance characteristics are caused by several factors

including border routers, queuing delays, and higher loss-rates of TPP routes.

Moreover, we attribute the CPP routes’ overall superior performance to the fact

that each of the CPs has a private optical backbone, there exists rich inter-CP

connectivity, and that the CPs’ traffic always bypasses (i.e. is invisible to) BEP

transits.

       In summary, this chapter makes the following contributions:

 • To the best of our knowledge, this is one of the first efforts to perform a

   comparative characterization of multi-cloud connectivity in today’s Internet.

   To facilitate independent validation of our results, we will release all relevant

   datasets (properly anonymized; e.g. with all TPP-related information removed).

 • We identify issues, differences, and tradeoffs associated with three popular

   multi-cloud connectivity options and strive to elucidate/discuss the underlying

   reasons. Our results highlight the critical need for open measurement platforms

   and more transparency by the multi-cloud connectivity providers.

         The rest of the chapter is organized as follows. We describe the

measurement framework, cloud providers, performance metrics, and data collection

in § 5.3. Our measurement results and root causes, both from C2C and E2C perspectives, are in § 5.4 and § 5.5, respectively. We present the open issues and

future work in § 5.6. Finally, we summarize the key findings of this chapter in

§ 5.7.

5.3      Measurement Methodology

         In this section, we describe our measurement setting and how we

examine the various multi-cloud connectivity options, the cloud providers under

consideration, and the performance metrics of interest.

           5.3.1   Deployment Strategy. As shown in Figure 21, we explore in

this chapter three different types of multi-cloud connectivity options: third-party

private (TPP) connectivity between CP VMs that bypasses the public Internet,

cloud-provider private (CPP) connectivity enabled by private peering between the

CPs, and best-effort public (BEP) connectivity via transit providers. To establish

TPPs, we identify the set of colocation facilities where connectivity partners offer

their services Amazon (2018c); Google (2018b); Microsoft (2018c). Using this

information, we select colocation facilities of interest (e.g. in the geo-proximity

of cloud VMs) and deploy the third-party providers’ cloud routers (CRs) that

interconnect virtual private cloud networks within a region or regions. The selection

of CR locations can also leverage latency information obtained from the third-party

connectivity providers.

         Next, based on the set of selected VMs and CRs we utilize third-party

connectivity APIs to deploy CRs and establish virtual cloud interconnections

between VMs and CRs to create TPPs. At a high level, this step involves (i)

establishing a virtual circuit between the CP and a connectivity partner, (ii)

establishing a BGP peering session between the CP’s border routers and the

partner’s CR, (iii) connecting the virtual private cloud gateway to the CP’s border

routers, and (iv) configuring each cloud instance to route any traffic destined

to the overlay network towards the configured virtual gateway. Establishing

CPP connectivity is similar to TPP. The only difference is in the user-specified

connectivity graph where in the case of CPP, CR information is omitted. To

establish CPP connectivity, participating CPs automatically select private

peering locations to stitch the multi-cloud VMs together. Finally, we have two

measurement settings for BEP. The first setting connects a non-native colocation facility in AZ to our VMs over the BEP Internet; the second targets Looking Glasses (LGs) residing in the colocation facilities hosting our CRs, also traverses the BEP Internet, and only yields latency measurements.
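To make steps (i)-(iv) above concrete, the following minimal Python sketch outlines how such a provisioning workflow could be orchestrated. The client objects, method names, and parameters are hypothetical placeholders standing in for the vendor-specific APIs of the third-party connectivity provider and of the CPs; they do not correspond to any actual library.

    # Hypothetical sketch of the TPP provisioning workflow (steps i-iv above).
    # All classes and method names are illustrative placeholders, not real provider APIs.
    from dataclasses import dataclass

    @dataclass
    class CloudRouter:
        facility: str   # colocation facility hosting the CR
        asn: int        # private ASN the CR uses for BGP peering

    def provision_tpp(provider_api, cp_api, cr, vpc_id, overlay_prefix):
        # (i) virtual circuit between the CP and the connectivity partner
        circuit = provider_api.create_virtual_circuit(facility=cr.facility,
                                                      cloud=cp_api.name, bandwidth_mbps=50)
        # (ii) BGP session between the CP's border routers and the partner's CR
        provider_api.create_bgp_session(circuit_id=circuit.id,
                                        local_asn=cr.asn, peer_asn=cp_api.border_asn)
        # (iii) attach the virtual private cloud gateway to the CP's border routers
        gateway = cp_api.attach_gateway(vpc_id=vpc_id, circuit_id=circuit.id)
        # (iv) route overlay-destined traffic from the instances towards the gateway
        cp_api.add_route(vpc_id=vpc_id, destination=overlay_prefix, next_hop=gateway.id)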

          Our network measurements are performed in rounds. Each round consists

of path, latency, and throughput measurements between all pairs of VMs (in

both directions to account for route asymmetry) but can be expanded to include

additional measurements as well. Furthermore, the measurements are performed

over the public BEPs as well as the two private options (i.e. CPP and TPP). We

avoid cross-measurement interference by tracking the current state of ongoing

measurements and limit measurement activities to one active measurement per

cloud VM. The results of the measurements are stored locally on the VMs (hard

disks) and are transmitted to a centralized storage at the end of our measurement

period.
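As an illustration of how such rounds can be scheduled while honoring the one-active-measurement-per-VM constraint, consider the following sketch; the VM labels, the run_measurement helper, and the use of per-endpoint locks are assumptions made for illustration rather than the exact implementation.

    # Illustrative scheduler: pairs with disjoint endpoints may run concurrently,
    # but each VM participates in at most one active measurement at a time.
    import itertools, threading
    from concurrent.futures import ThreadPoolExecutor

    VMS = ["aws-ca", "aws-va", "gcp-ca", "gcp-va", "azr-ca", "azr-va"]  # hypothetical labels
    locks = {vm: threading.Lock() for vm in VMS}

    def run_measurement(src, dst):
        pass  # placeholder for the ping / paris-traceroute / iperf3 invocations

    def measure_pair(src, dst):
        first, second = sorted((src, dst))      # fixed lock order avoids deadlock
        with locks[first], locks[second]:
            run_measurement(src, dst)

    def one_round():
        with ThreadPoolExecutor(max_workers=3) as pool:  # at most 3 disjoint pairs among 6 VMs
            for src, dst in itertools.permutations(VMS, 2):
                pool.submit(measure_pair, src, dst)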



           5.3.2    Measurement Scenario & Cloud Providers. As mentioned

earlier, the measurement setting is designed to provide visibility into multi-cloud

deployments so as to be able to study aspects related to the topology, routing, and

performance tradeoffs. Unfortunately, while several TPP providers are available, the costs incurred for connecting multiple clouds using TPP connections are very high. For example, for each 1 Gbps link to a CP network, third-party providers charge anywhere from about 300 to 700 USD per month Megaport (2019a); PacketFabric (2019); Pureport (2019)2 . Such high costs of TPP connections prevent us from having a global-scale deployment and from examining multiple TPP providers. Due to the costly nature of

establishing TPP connections, we empirically measure and examine only one coast-

to-coast, multi-cloud deployment in the US. The deployment we consider in this

study is nevertheless representative of a typical multi-cloud strategy that is adopted

by modern enterprises Megaport (2019b).

        More specifically, our study focuses on connectivity between three major

CPs (AWS, Azure, and GCP) and one enterprise. To emulate realistic multi-cloud

scenarios, each entity is associated with a geographic location. The deployments are

shown in Figure 22. We select the three CPs as they collectively have a significant

market share and are used by many clients concurrently ZDNet (2019). Using these

CPs, we create a realistic multi-cloud scenario by deploying three CRs using one

of the top third-party connectivity provider’s network; one in the Santa Clara, CA

(CR-CA) region, one in the Phoenix, AZ (CR-AZ) region, and one in the Ashburn,

VA (CR-VA) region. CR-CA is interconnected to CR-VA and CR-AZ. Furthermore,

CR-CA and CR-VA are interconnected with native cloud VMs from Amazon,

   2
    Note that these price points do not take into consideration the additional charges that are
incurred by CPs for establishing connectivity to their network.
Google, and Microsoft. To emulate an enterprise leveraging the multi-clouds, CR-

AZ is connected to a physical server hosted within a colocation facility in Phoenix,

AZ (server-AZ).

[Figure 22 diagram omitted from the plaintext version; it depicts CP1–CP3 at colocation facilities in California and Virginia, the enterprise in Arizona, and the BEP/CPP/TPP links between them.]
Figure 22. Our measurement setup showing the locations of our VMs from AWS,
GCP and Azure. A third-party provider’s CRs and line-of-sight links for TPP,
BEP, and CPP are also shown.


       The cloud VMs and server-AZ are all connected to CRs with 50Mb/s links.

We select the colocation facilities hosting the CRs based on two criteria: (i) CPs offer native cloud connectivity within that colo, and (ii) geo-proximity to the target CPs’ datacenters. CRs are interconnected with each other using 150Mb/s links, a capacity that supports the maximum number of concurrent measurements that we perform (3 in total, since we avoid more than 1 ongoing measurement per VM). Each cloud VM has at least 2 vCPU cores, 4GB of memory,

and runs Ubuntu server 18.04 LTS. Our VMs were purposefully over-provisioned

to reduce any measurement noise within virtualized environments. Throughout

our measurements, the VMs’ CPU utilization always remained below 2%. We also

cap the VM interfaces at 50Mb/s to have a consistent measurement setting for

both public (BEP) and private (TPP and CPP) routes. We perform measurements

between all CP VMs within regions (intra-region), across regions (inter-region) for

C2C analysis, and from server-AZ to VMs in CA for E2C analysis. Additionally,
we also perform measurements between our cloud VMs and two LGs that are

located within the same facility as CR-CA and CR-VA, respectively, and use

these measurements as baselines for comparisons (C2LG). Together, these efforts

resulted in 60 pairs of measurements between CP instances (P(6, 2) ∗ 2: 2-permutations of the 6 CP VMs over 2 types of unidirectional network paths), 24 pairs of measurements between CP VMs and LGs (6 CP VMs ∗ 2 LGs ∗ 2 types of unidirectional network paths), and 12 pairs of measurements between server-AZ and west coast CP VMs (P(3, 2) ∗ 2: 2-permutations of the 3 west coast CP VMs over 2 types of unidirectional network paths).
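As a quick sanity check, the counts above can be reproduced with a few lines of Python (the VM labels are illustrative):

    from itertools import permutations

    cp_vms   = ["aws-ca", "gcp-ca", "azr-ca", "aws-va", "gcp-va", "azr-va"]
    west_vms = [vm for vm in cp_vms if vm.endswith("ca")]

    c2c  = len(list(permutations(cp_vms, 2))) * 2    # P(6,2) ordered pairs x 2 path types = 60
    c2lg = len(cp_vms) * 2 * 2                       # 6 CP VMs x 2 LGs x 2 = 24
    e2c  = len(list(permutations(west_vms, 2))) * 2  # P(3,2) x 2 = 12
    print(c2c, c2lg, e2c)                            # 60 24 12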

             5.3.3     Data Collection & Performance Metrics. Using our

measurement setting, we conducted measurements for about a month in the

Spring of 2019.3 We conduct measurements in 10-minute rounds. In each round,

we performed latency, path, and throughput measurements between all pairs of

relevant nodes. For each round, we measure and report the latency using 10 ping

probes. We refrain from using a more accurate one-way latency measurement tool

such as OWAMP as the authors of OWAMP caution its use within virtualized

environments One-Way Ping (OWAMP) (2019). Similarly, paths are measured

by performing 10 attempts of paris-traceroute using scamper Luckie (2010) towards

each destination. We used ICMP probes for path discovery as they maximized the

number of responsive hops along the forward path. Lastly, throughput is measured

using the iperf3 tool, which was configured to transmit data over a 10-second

interval using TCP. We discard the first 5 seconds of our throughput measurement

to account for TCP’s slow-start phase and consider the median of throughput for

  3
      See § 5.3.5 for more details.




the remaining 5 seconds. These efforts resulted in about 30k latency and path

samples and some 15k throughput samples between each measurement pair.
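To illustrate how the throughput value for a round can be extracted, the sketch below invokes iperf3 with JSON output and takes the median of the last 5 one-second intervals; it is a simplified approximation of our processing and assumes iperf3 reports per-second intervals by default.

    # Sketch: 10-second TCP iperf3 run; keep the median throughput of the last 5 seconds.
    import json, statistics, subprocess

    def measure_throughput_mbps(dst_ip):
        out = subprocess.run(["iperf3", "-c", dst_ip, "-t", "10", "-J"],
                             capture_output=True, text=True, check=True).stdout
        intervals = json.loads(out)["intervals"]               # one entry per second
        bps = [iv["sum"]["bits_per_second"] for iv in intervals]
        return statistics.median(bps[5:]) / 1e6                # discard TCP slow-start, Mb/s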

       To infer inter-AS interconnections, the resulting traceroute hops from

our measurements were translated to their corresponding AS paths using BGP

prefix announcements from Routeviews and RIPE RIS RIPE (2019); University of

Oregon (2018). Missing hops were attributed to their surrounding ASN if the prior

and next hop ASNs were identical. The existence of IXP hops along the forward

path was detected by matching hop addresses against IXP prefixes published by

PeeringDB PeeringDB (2017) and Packet Clearing House (PCH) Packet Clearing

House (2017). Lastly, we mapped each ASN to its corresponding ORG number

using CAIDA’s AS-to-ORG mapping dataset Huffaker et al. (2018).
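The sketch below illustrates this annotation logic with standard-library longest-prefix matching; the input formats (a prefix-to-ASN table from the BGP feeds, an IXP prefix table, and CAIDA's AS-to-ORG mapping as a dictionary) are simplifying assumptions.

    # Sketch: map hop IPs to ASNs, fill unresponsive hops between identical ASNs,
    # and flag hops that fall within published IXP prefixes.
    import ipaddress

    def longest_match(addr, table):
        ip, best, best_len = ipaddress.ip_address(addr), None, -1
        for prefix, value in table.items():          # linear scan keeps the sketch simple
            net = ipaddress.ip_network(prefix)
            if ip in net and net.prefixlen > best_len:
                best, best_len = value, net.prefixlen
        return best

    def annotate(hops, bgp_table, ixp_prefixes, as2org):
        asns = [longest_match(h, bgp_table) if h else None for h in hops]
        for i in range(1, len(asns) - 1):            # attribute missing hops
            if asns[i] is None and asns[i - 1] is not None and asns[i - 1] == asns[i + 1]:
                asns[i] = asns[i - 1]
        is_ixp = [bool(h and longest_match(h, ixp_prefixes)) for h in hops]
        orgs = [as2org.get(a) for a in asns]
        return asns, orgs, is_ixp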

       CPs are heterogeneous in handling path measurements. In our

mappings, we observed the use of private IP addresses internally by CPs as well

as on traceroutes traversing the three connectivity options. We measured the

number of observed AS/ORGs (excluding hops utilizing private IP addresses) for

inter-cloud, intra-cloud, and cloud-to-LG, and made the following two observations.

First, of the three CPs, only AWS used multiple ASNs (i.e. ASes 8987, 14618, and

16509). Second, not surprisingly, we observed a striking difference between how

CPs respond to traceroute probes. In particular, we noted that the differences

in responses are dependent on the destination network and path type (public

vs. private). For example, GCP does not expose any of its routers unless the

target address is within another GCP region. Similarly, Azure does not expose its internal routers except for the border routers that are involved in peering with other networks. Finally, we found that AWS heavily relies on private/shared IP addresses for its internal network. These observations serve as motivation for our

characterization of the various multi-cloud connectivity options in § 5.4 and § 5.5

below.

           5.3.4    Representation of Results. Distributions in this chapter are

presented using letter-value plots Hofmann, Kafadar, and Wickham (2011). Letter-

value plots, similar to boxplots, are helpful for summarizing the distribution of data

points but offer finer details beyond the quartiles. The median is shown using a

dark horizontal line and the 1/2i quantile is encoded using the box width, with the

widest boxes surrounding the median representing the quartiles, the 2nd widest

boxes corresponding to the octiles, etc. Distributions with low variance centered

around a single value appear as a narrow horizontal bar while distributions with

diverse values appear as vertical bars.
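For readers who want to reproduce this style of plot, seaborn provides letter-value plots via its boxenplot function; a minimal sketch with synthetic RTT samples follows (the numbers are made up for illustration).

    # Minimal letter-value plot of synthetic RTT samples using seaborn's boxenplot.
    import numpy as np, pandas as pd, seaborn as sns, matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "path":   ["CPP"] * 1000 + ["TPP"] * 1000,
        "rtt_ms": np.concatenate([rng.normal(60, 0.5, 1000), rng.normal(70, 3.0, 1000)]),
    })
    sns.boxenplot(data=df, x="path", y="rtt_ms")
    plt.show()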

         Throughout this chapter we try to present full distributions of latency when

it is illustrative. Furthermore, we compare latency characteristics of different paths

using the median and variance measures and specifically refrain from relying on

minimum latency as it does not capture the stability and dynamics of this measure

across each path.

           5.3.5    Ethical and Legal Considerations. This study does not

raise any ethical issues. Overall, our goal in this study is to measure and improve

multi-cloud connectivity without attributing particular features to any of the

utilized third-party providers which might be in violation of their terms of service.

Hence, we obfuscate, and wherever possible, omit all information that can be

used to identify the colocation and third-party connectivity providers. This

information includes names, supported measurement APIs, costs, time and date

of measurements, topology information, and any other potential identifiers.



5.4   Characteristics of C2C routes

       In this section, we characterize the performance of C2C routes and attribute

their characteristics to connectivity and routing.

         5.4.1   Latency Characteristics. CPP routes exhibit lower

latency than TPP routes and are stable. Figure 23 depicts the distribution

of RTT values between different CPs across different connectivity options. The

rows (from top to bottom) correspond to AWS, GCP, and Azure as the source CP,

respectively. Intra-region (inter-region) measurements are shown in the left (right)

columns, and CPP (TPP) paths are depicted in blue (orange). To complement

Figure 23, the median RTT values comparing CPP and TPP routes are shown in

Figure 24.

       From Figures 23 and 24, we see that, surprisingly, CPP routes typically

exhibit lower medians of RTT compared to TPP routes, suggesting that CPP

routes traverse the CP’s optical private backbone. We also observe a median

RTT of ∼2ms between AWS and Azure VMs in California which is in accordance

with the relative proximity of their datacenters for this region. The GCP VM in

California has a median RTT of 13ms to other CPs in California, which can be

attributed to the geographical distance between GCP’s California datacenter in

LA and the Silicon Valley datacenters for AWS and Azure. Similarly, we notice

that the VMs in Virginia all exhibit low median RTTs between them. We attribute

this behavior to the geographical proximity of the datacenters for these CPs. At

the same time, the inter-region latencies within a CP are about 60ms with the

exception of Azure which has a higher median of latency of about 67ms. Finally,

the measured latencies (and hence the routes) are asymmetric in the two directions, although the median RTT values in Figure 24 show near-symmetric latencies (differences <0.1ms).

[Figure 23 letter-value plots omitted from the plaintext version.]
Figure 23. Rows from top to bottom represent the distribution of RTT (using
letter-value plots) between AWS, GCP, and Azure’s network as the source CP and
various CP regions for intra (inter) region paths in left (right) columns. CPP and
TPP routes are depicted in blue and orange, respectively. The first two characters
of the X axis labels encode the source CP region with the remaining characters
depicting the destination CP and region.




[Figure 24 heatmaps of median RTT values omitted from the plaintext version.]
Figure 24. Comparison of median RTT values (in ms) for CPP and TPP routes
between different pairs.


Also, the median of the measured latency between our CRs is in line with the

published values by third-party connectivity providers, but the high variance of

latency indicates that the TPP paths are in general a less reliable connectivity

option compared to CPP routes. Lastly, BEP routes for C2LG measurements

always have an equal or higher median of latency compared to CPP paths with

much higher variability (order of magnitude larger standard deviation). Results are

omitted for brevity and to avoid skewed scales in current figures.

         5.4.2         Why do CPP routes have better latency than TPP

routes?. CPP routes are short, stable, and private. Figure 25a depicts

the distribution of ORG hops for different connectivity options. We observe

that intra-cloud paths always have a single ORG, indicating that regardless of

the target region, the CP routes traffic internally towards the destination VM.

More interestingly, the majority of inter-cloud paths only observe two ORGs

corresponding to the source and destination CPs. Only a small fraction (<4%)

of paths involves three ORGs, and upon closer examination of the corresponding

paths, we find that they traverse IXPs and involve traceroutes that originate from

Azure and are destined to Amazon’s network in another region. We reiterate that

                                                                 164
single ORG inter-CP paths correspond to traceroutes which are originated from

GCP’s network and does not reveal any internal hops of its network. For the cloud-

to-LG paths, we observe a different number of ORGs depending on the source CP

as well as the physical location of the target LG. The observations range from only

encountering the target LG’s ORG to seeing intermediary IXP hops as points of

peering. Lastly, we measure the stability of routes at the AS-level and observe

that all paths remain consistently stable over time with the exception of routes

sourced at Azure California and destined to Amazon Virginia. The latter usually

pass through private peerings between the CPs, and only less than 1% of our path

measurements go through an intermediary IXP. In short, we did not encounter any

transit providers in our measured CPP routes.

[Figure 25 plots omitted from the plaintext version.]

Figure 25. (a) Distribution for number of ORG hops observed on intra-cloud, inter-
cloud, and cloud to LG paths. (b) Distribution of IP (AS/ORG) hop lengths for all
paths in left (right) plot.


       CPs are tightly interconnected with each other in the US. Not

observing any transit AS along our measured C2C paths motivated us to measure

the prevalence of this phenomenon by launching VM instances within all US

regions for our target CP networks. This results in a total of 17 VM instances

corresponding to 8, 5, and 4 regions within Azure, GCP, and AWS. We perform

UDP and ICMP paris-traceroutes using scamper between all VM instances (272

unique pairs) in 10-minute rounds for four days and remove the small fraction

(9 × 10^-5) of traceroutes that encountered a loop along the path. Overall, we

observe that ICMP probes are better in revealing intermediate hops as well as

reaching the destination VMs. Similar to § 5.3.3, we annotate the hops of the

collected traceroutes with their corresponding ASN/ORG and infer the presence

of IXP hops along the path. For each path, we measure its IP and AS/ORG hop

length and show in Figure 25b the corresponding distributions. C2C paths exhibit

a median (90th percentile) IP hop length of 22 (33). Similar to our initial C2C path measurements, with respect to AS/ORG hop length, we only observe ORGs

corresponding to the three target CPs as well as IXP ASNs for Coresite Any2

and Equinix. All ORG-level paths passing through an IXP correspond to paths that are sourced from Azure and destined to AWS. The measurements further

extend our initial observation regarding the rich connectivity of our three large CPs

and their tendency to avoid exchanging traffic through the public Internet.

       On the routing models of multi-cloud backbones. By leveraging

the AS/ORG paths described in § 5.3, we next identify the peering points

between the CPs. Identifying the peering point between two networks from

traceroute measurements is a challenging problem and the subject of many recent

studies Alexander et al. (2018); Luckie et al. (2016); Marder and Smith (2016).

For our study, we utilized the latest version of bdrmapIT Alexander et al. (2018)

to infer the interconnection segment on the collection of traceroutes that we have

gathered. Additionally, we manually inspected the inferred peering segments and,

where applicable, validated their correctness using (i) IXP address to tenant ASN

mapping and (ii) DNS names such as amazon.sjc-96cbe-1a.ntwk.msn.net

 which is suggestive of peering between AWS and Azure. We find that bdrmapIT

 is unable to identify peering points between GCP and the other CPs since GCP

 only exposes external IP addresses for paths destined outside of its network,

i.e. bdrmapIT is unaware of the source CP’s network as it does not observe any addresses from that network on the initial set of hops. For these paths, we choose the first hop of the traceroute as the peering point only if it has an ASN equal to the target IP address’s ASN. Using this information, we measure the RTT between

 the source CP and the border interface to infer the geo-proximity of the peering

 point from the source CP. Using this heuristic allows us to analyze each CP’s

 inclination to use hot-potato routing.
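A compact sketch of this fallback heuristic and the border-latency extraction is shown below; it assumes each traceroute hop has already been annotated with its ASN and RTT, which is a simplification of our pipeline.

    # Fallback peering-point inference for GCP-sourced traceroutes: pick the first hop
    # whose ASN equals the destination's ASN and use its RTT as the border latency.
    def border_rtt(hops, dst_asn):
        """hops: ordered list of (asn, rtt_ms); returns the peering hop's RTT or None."""
        for asn, rtt in hops:
            if asn == dst_asn:
                return rtt
        return None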

[Figure 26 plots omitted from the plaintext version.]
 Figure 26. Distribution of RTT between the source CP and the peering hop.
 From left to right plots represent AWS, GCP, and Azure as the source CP.
 Each distribution is split based on intra (inter) region values into the left/blue
 (right/orange) halves, respectively.


            Figure 26 shows the distribution of RTT for the peering points between each

 CP. From left to right, the plots represent AWS, GCP, and Azure as the source

 CP. Each distribution is split based on intra (inter) region values into the left/blue

 (right/orange) halves, respectively. We observe that AWS’ peering points with

 other CPs are very close to their networks and therefore, AWS is employing hot-

 potato routing. For GCP, we find that hot-potato routing is never employed and

traffic is always handed off near the destination region. The bi-modal distribution

of RTT values for each destination CP is centered at around 2ms, 12ms, 58ms, and

65ms corresponding to the intra-region latency for VA and CA, inter-region latency

to GCP, and inter-region latency to other CPs, respectively. Finally, Azure exhibits

mixed routing behavior. Specifically, Azure’s routing behavior depends on the

target network – Azure employs hot-potato routing for GCP, its Virginia-California

traffic destined to AWS is handed off in Los Angeles, and for inter-region paths

from California to AWS Virginia, the traffic is usually (99%) handed off in Dallas

TX and for the remainder is being exchanged through Digital Realty Atlanta’s IXP.

       From these observations, the routing behavior for each path can be modeled

with a simple threshold-based method. More concretely, for each path i with an end-to-end latency of l_e^i and a border latency of l_b^i, we infer that the source CP employs hot-potato routing if l_b^i < (1/10) l_e^i. Otherwise, the source CP employs cold-potato routing (i.e. l_b^i > (9/10) l_e^i). The fractions (i.e. 1/10 and 9/10) are not prescriptive and are derived based on the latency distributions depicted in Figure 26.
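Written as code, the rule amounts to a direct transcription of the two inequalities above (paths falling between the thresholds are simply left unclassified in this sketch):

    # Classify a path's routing model from its border and end-to-end latencies.
    def routing_model(border_ms, end_to_end_ms):
        if border_ms < end_to_end_ms / 10:
            return "hot-potato"            # hand-off close to the source CP
        if border_ms > 9 * end_to_end_ms / 10:
            return "cold-potato"           # hand-off close to the destination
        return "unclassified"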

         5.4.3    Throughput Characteristics. CPP routes exhibit higher

and more stable throughput than TPP routes. Figure 27 depicts the

distribution of throughput values between different CPs using different connectivity

options. While intra-region measurements tend to have a similar median and

variance of throughput, we observe that with respect to inter-region measurements,

TPPs exhibit a lower median throughput with higher variance. Degradation of

throughput seems to be directly correlated with higher RTT values as shown in

Figure 23. Using our latency measurements, we also approximate loss-rate to

be 10^-3 and 10^-4 for TPP and CPP routes, respectively. Using the formula of



[Figure 27 letter-value plots omitted from the plaintext version.]
Figure 27. Rows from top to bottom in the letter-value plots represent the
distribution of throughput between AWS’, GCP’s, and Azure’s network as the
source CP and various CP regions for intra- (inter-) region paths in left (right)
columns. CPP and TPP routes are depicted in blue and orange respectively.

Mathis et al. Mathis et al. (1997) to approximate TCP throughput4 , we can obtain

      4
    We do not have access to parameters such as TCP timeout delay and number of acknowledged
packets by each ACK to use more elaborate TCP models (e.g. Padhye, Firoiu, Towsley, and
Kurose (1998)).
an upper bound for throughput for our measured loss-rate and latency values.

Figure 28 shows the upper bound of throughput for an MSS of 1460 bytes and

several modes of latency and loss-rate. For example, the upper bound of TCP

throughput for a 70ms latency and loss-rate of 10^-3 (corresponding to the average

measured values for TPP routes between two coasts) is about 53Mb/s. While

this value is higher than our interface/link bandwidth cap of 50Mb/s, bursts of

packet loss or transient increases in latency could easily lead to sub-optimal TCP

throughput for TPP routes.


[Figure 28 plot omitted from the plaintext version.]
Figure 28. Upper bound for TCP throughput using the formula of Mathis et
al. Mathis et al. (1997) with an MSS of 1460 bytes and various latency (X axis)
and loss-rates (log-scale Y axis) values.
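The sketch below shows this upper-bound computation; the constant C ≈ 1.22 (i.e. sqrt(3/2)) is the value commonly associated with the Mathis et al. formula and is an assumption here, so the absolute numbers it produces may differ somewhat from the ones quoted above.

    # Sketch: Mathis et al. upper bound on TCP throughput, BW <= (MSS * C) / (RTT * sqrt(p)).
    from math import sqrt

    def mathis_upper_bound_mbps(mss_bytes=1460, rtt_s=0.070, loss_rate=1e-3, c=1.22):
        return (mss_bytes * 8 * c) / (rtt_s * sqrt(loss_rate)) / 1e6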


         5.4.4                     Why do CPP routes have better throughput than TPP

routes?. TPPs have higher loss-rates than CPPs. Our initial methodology

for measuring loss-rate relied on our low-rate ping probes (outlined in § 5.3.3).

While this form of probing can produce a reliable estimate of average loss-rate over

a long period of time Tariq, Dhamdhere, Dovrolis, and Ammar (2005), it does not

capture the dynamics of packet loss at finer resolutions. We thus modified our

probing methodology to incorporate an additional iperf3 measurement using UDP

probes between all CP instances. Each measurement is performed for 5 seconds and

packets are sent at a 50Mb/s rate.5 We measure the number of transmitted and

lost packets during each second and also count the number of packets that were

delivered out of order at the receiver. We perform these loss-rate measurements

for a full week. Based on this new set of measurements, we estimate the overall

loss-rate to be 5 × 10^-3 and 10^-2 for CPP and TPP paths, respectively. Moreover,

we experience 0 packet loss in 76% (37%) of our sampling periods for CPP (TPP)

routes, indicating that losses for CPP routes tend to be more bursty than for TPP

routes. The bursty nature of packet losses for CPP routes could be detrimental to real-time applications, which can only tolerate certain levels of loss, and should be

factored in by the client. The receivers did not observe any out-of-order packets

during our measurement period.
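A sketch of one such UDP probe and its loss-rate extraction is shown below; it assumes the lost and total packet counts are available in iperf3's end-of-test JSON summary, and the exact field names should be verified against the installed iperf3 version.

    # Sketch: 5-second, 50 Mb/s UDP probe; overall loss rate from iperf3's summary.
    import json, subprocess

    def udp_loss_rate(dst_ip):
        out = subprocess.run(["iperf3", "-c", dst_ip, "-u", "-b", "50M", "-t", "5", "-J"],
                             capture_output=True, text=True, check=True).stdout
        summary = json.loads(out)["end"]["sum"]
        return summary["lost_packets"] / max(summary["packets"], 1)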

        Figure 29 shows the distribution of loss rate for various paths. The rows

(from top to bottom) correspond to AWS, GCP, and Azure as the source CP,

respectively. Intra-region (inter-region) measurements are shown in the left

(right) columns, and CPP (TPP) paths are depicted in blue (orange). We observe

consistently higher loss-rates for TPP routes compared to their CPP counterparts

and lower loss-rates for intra-CP routes in Virginia compared to California.

Moreover, paths destined to VMs in the California region show higher loss-rates

regardless of where the traffic has been sourced from, with asymmetrically lower

loss-rate on the reverse path indicating the presence of congested ingress points

for CPs within the California region. We also notice extremely low loss-rates for

intra-CP (except Azure) CPP routes between the US east and west coasts and for

   5
    In an ideal setting, we should not experience any packet losses as we are limiting our probing
rate at the source.
 inter-CP CPP routes between the two coasts for certain CP pairs (e.g. AWS CA to

 GCP VA or Azure CA to AWS VA).


[Figure 29 letter-value plots omitted from the plaintext version.]
 Figure 29. Rows from top to bottom in the letter-value plots represent the
 distribution of loss-rate between AWS, GCP, and Azure as the source CP and
 various CP regions for intra- (inter-) region paths in left (right) columns. CPP and
 TPP routes are depicted using blue and orange respectively.



            5.4.5   Summary. To summarize, our measurements for characterizing

C2C routes reveal the following important insights:

 • CPP routes are better than TPP routes in terms of latency as well as

throughput. This finding raises the question: given the sub-optimal performance

      of TPP routes and their cost implications, why should an enterprise seek

      connectivity from third-party providers when deciding on its multi-cloud

      strategy?

 • The better performance of CPP routes as compared to their TPP counterparts

      can be attributed to two factors: (a) the CPs’ rich (private) connectivity

      in different regions with other CPs (traffic is by-passing the BEP Internet

      altogether) and (b) more stable and better provisioned CPP (private)

      backbones.

5.5     Characteristics of E2C routes

         In this section, we turn our attention to E2C routes, characterize their

performance and attribute the observations to connectivity and routing.

            5.5.1   Latency Characteristics. TPP routes offer better latency

than BEP routes. Figure 30a shows the distribution of latency for our measured

E2C paths. We observe that TPP routes consistently outperform their BEP

counterparts by having a lower baseline of latency and also exhibiting less variation.

We observe a median latency of 11ms, 20ms, and 21ms for TPP routes towards

GCP, AWS, and Azure VM instances in California, respectively. We also observe

symmetric distributions on the reverse path but omit the results for brevity.

            5.5.2   Why do TPP routes offer better latency than BEP

routes?. In the case of our E2C paths, we always observe direct peerings between

the upstream provider (e.g. Cox Communications (AS22773)) and the CP network.

[Figure 30 plots omitted from the plaintext version.]
Figure 30. (a) Distribution of latency for E2C paths between our server in AZ
and CP instances in California through TPP and BEP routes. Outliers on the Y-
axis have been deliberately cut-off to increase the readability of distributions. (b)
Distribution of RTT on the inferred peering hop for E2C paths sourced from CP
instances in California. (c) Distribution of throughput for E2C paths between our
server in AZ and CP instances in California through TPP and BEP routes.


Relying on bdrmapIT to infer the peering points from the traceroutes associated

with our E2C paths, we measure the latency on the peering hop. Figure 30b

shows the distribution of the latency for the peering hop for E2C paths originated

from the CPs’ instances in CA towards our enterprise server in AZ. While the

routing policies of GCP and Azure for E2C paths are similar to our observations

for C2C paths, Amazon seems to hand off traffic near the destination, which is unlike its hot-potato tendencies for C2C paths. We hypothesize that this change

in AWS’ policy is to minimize the operational costs via their Transit Gateway

service Amazon (2019b). In addition, observing an equal or lower minimum latency

for TPP routes as compared to BEP routes suggests that TPP routes are shorter

than BEP paths6 . We also find (not shown here) that the average loss rate on

TPP routes is 6 × 10^-4, which is an order of magnitude lower than the loss rate

experienced on BEP routes (1.6 × 10^-3).

               5.5.3     Throughput Characteristics. TPP offers consistent

throughput for E2C paths. Figure 30c depicts the distribution of throughput

     6
   In the absence of information regarding the physical fiber paths, we rely on latency as a proxy
measure of path length.

for E2C paths between our server in AZ and CP instances in CA via TPP and BEP

routes, respectively. While we observe very consistent throughput values near the

purchased link capacity for TPP paths, BEP paths exhibit higher variability which

is expected given the best effort nature of public Internet paths.

            5.5.4   Summary. In summary, our measurements for characterizing

E2C routes support the following observations:

 • TPP routes exhibit better latency and throughput characteristics when

      compared with BEP routes.

 • The key reasons for the better performance of TPP routes as compared to

their BEP counterparts include shorter (e.g. no transit providers) and more performant (e.g. lower loss rate) paths.

 • For an enterprise deciding on a suitable multi-cloud strategy, CPP routes are

      better only when enterprises are closer to the CPs’ native locations. Given that

      TPPs are present at many geographic locations where the CPs are not native,

      third-party providers offer better connectivity options compared to relying on

      the public Internet (i.e. using BEP routes).

5.6     Discussion and Future Work

         In this section, we discuss the limitations of our study and open issues. We

also discuss ongoing and future work.

         Representativeness. While the measurement setup depicted in Figure 22

represents a realistic enterprise network employing a multi-cloud strategy, it is

not the only representative setting. We note that there are a number of other

multi-cloud connectivity scenarios (e.g. distinct CPs in different continents,

different third-party providers in different countries, etc.), which we do not discuss

in this study. For example, what are the inter-cloud connectivity and routing
characteristics between intercontinental VMs e.g. in USA and EU? Unfortunately,

the costs associated with establishing TPP paths prevent an exhaustive exploration

of multi-cloud connectivity in general and TPP connectivity in particular.

       Additional Cloud and Third-party Providers. Our study focuses on

multi-cloud connectivity options between three major CPs (i.e. AWS, Azure, and

GCP) as they collectively have a significant market share. We plan to consider

additional cloud providers (e.g. Alibaba, IBM Softlayer, Oracle, etc.) as part of

future work.

       Similar to the availability of other CPs, TPP connectivity between CPs are

offered via new services by a number of third-party connectivity providers Amazon

(2018c); Google (2018b); Microsoft (2018c). Exploring the TPP connectivity

provided by the ecosystem and economics of these different third-party providers

is an open problem. In addition, there has been no attempt to date to compare

their characteristics in terms of geography, routing, and performance, and we intend

to explore this aspect as part of future work.

       Longitudinal Analysis & Invariants. Despite the fact that we conduct

our measurements for about a month in the Spring of 2019 (as mentioned

in § 5.3.5), we note that our study is a short-term characterization of multi-

cloud connectivity options. Identifying the invariants in this context requires a

longitudinal analysis of measurements which is the focus of our ongoing work.

       Impact of Connectivity Options on Cloud-hosted Applications.

Modern cloud applications pose a wide variety of latency and throughput

requirements. For example, key-value stores are latency sensitive Tokusashi,

Matsutani, and Zilberman (2018), whereas applications like streaming and

geo-distributed analytics require low latency as well as high throughput Lai,

Chowdhury, and Madhyastha (2018). In the face of such diverse requirements, what

is critically lacking is a systematic benchmarking of the impact of performance

tradeoffs between the BEP, CPP and TPP routes on the cloud-hosted applications

(e.g. key-value stores, streaming, etc.) While tackling WAN heterogeneity is the

focus of a recent effort Jonathan, Chandra, and Weissman (2018), dealing with

multi-cloud connectivity options and their impacts on applications is an open

problem.

       Connectivity and Routing Implications. In terms of routing and

connectivity, our study has two implications. First, while it is known that the

CPs are contributing to the ongoing “flattening” of the Internet Dhamdhere and

Dovrolis (2010); Gill, Arlitt, Li, and Mahanti (2008); Labovitz, Iekel-Johnson,

McPherson, Oberheide, and Jahanian (2010), our findings underscore the fact

that the third-party private connectivity providers act as a catalyst to the ongoing

flattening of the Internet. In addition, our study offers additional insights into the

ongoing “cloudification” of the Internet in terms of where and why cloud traffic

bypasses the BEP transits. Our study also implies that compared to the public

Internet, CPP backbones are more performant, more stable, and more secure

(invisible and isolated from the BEP transits), making them first-class citizens

for future Internet connectivity. In light of these two implications, our study also

warrants revisiting existing efforts from the multi-cloud perspective. In particular,

we plan to pursue issues such as failure detection and characterization for multi-

cloud services (e.g. Zhang, Zhang, Pai, Peterson, and Wang (2004)) and multi-

cloud reliability (e.g. Quan, Heidemann, and Pradkin (2013)). Other open problems

concern inferring inter-CP congestion (e.g. Dhamdhere et al. (2018)) and examining



the economics of multi-cloud strategies (e.g. Zarchy, Dhamdhere, Dovrolis, and

Schapira (2018)).

5.7   Summary

       Enterprises are connecting to multiple CPs at an unprecedented pace and

multi-cloud strategies are here to stay. Due to this development, in addition

to best-effort public (BEP) transit provider-based connectivity, two additional

connectivity options are available in today’s Internet: third-party private (TPP)

connectivity and cloud-provider private (CPP) connectivity.

       In this work, we perform a first-of-its-kind measurement study to understand

the tradeoffs between three popular multi-cloud connectivity options (CPP vs.

TPP vs. BEP). Based on our cloud-centric measurements, we find that CPP routes

are better than TPP routes in terms of latency as well as throughput. The better

performance of CPPs can be attributed to (a) CPs’ rich connectivity in different

regions with other CPs (by-passing the BEP Internet altogether) and (b) CPs’

stable and well-designed private backbones. In addition, we find that TPP routes

exhibit better latency and throughput characteristics when compared with BEP

routes. The key reasons include shorter paths and lower loss rates compared to

the BEP transits. Although limited in scale, our work highlights the need for more

transparency and access to open measurement platforms by all the entities involved

in interconnecting enterprises with multiple clouds.




                                       CHAPTER VI

                             OPTIMAL CLOUD OVERLAYS

        Motivated by the observations in Chapter V on the diversity of performance

characteristics of various cloud connectivity paths, in this chapter, we design an

extensible measurement platform for cloud environments. Furthermore, we create

a decision support framework that facilitates enterprises in creating optimal multi-

cloud deployments.

        The content in this chapter is the result of a collaboration between Bahador

Yeganeh with Ramakrishnan Durairajan, Reza Rejaie, and Walter Willinger.

Bahador Yeganeh is the primary author of this work and responsible for designing

all systems, conducting measurements and producing the presented analyses.

6.1    Introduction

        Modern enterprises are adopting multi-cloud strategies1 at a rapid pace.

Among the benefits of pursuing such strategies are competitive pricing, avoiding vendor lock-in, global reach, and meeting data sovereignty requirements. According to a recent

industry report, more than 85% of enterprises have already adopted multi-cloud

strategies Krishna et al. (2018).

        Despite this existing market push for multi-cloud strategies, we posit that

there is a technology pull: seamlessly connecting resources across disparate, already-

competitive cloud providers (CPs) in a performance- and cost-aware manner is

an open problem. This problem is further complicated by two key issues. First,

prior research on overlays has focused either on the public Internet-based Andersen,

Balakrishnan, Kaashoek, and Morris (2001) or on CP paths in isolation Costa,

Migliavacca, Pietzuch, and Wolf (2012); Haq, Raja, and Dogar (2017); Lai et al.

   1
   This is different from hybrid cloud computing, where a direct connection exists between a
public cloud and private on-premises enterprise server(s).
(2018). Second, because CP backbones are private and are invisible to traditional

measurement techniques, we lack a basic understanding of their performance, path,

and traffic-cost characteristics.




[Figure 31 map omitted from the plaintext version.]

Figure 31. Global regions for AWS, Azure, and GCP.


       To examine the benefits of multi-cloud overlays, we perform a third-party,

cloud-centric measurement study2 to understand the performance, path, and traffic-

cost characteristics of three major global-scale private cloud backbones (i.e., AWS,

Azure and GCP). Our measurements were run across 6 continents and 23 countries

for 2 weeks (see Figure 31). Our measurements reveal a number of key insights.

First, the cloud backbones (a) are optimal (i.e., 2x reduction in latency inflation

ratio, which is defined as the ratio between line-of-sight and latency-based speed-

of-light distances, w.r.t. public Internet), (b) lack path and delay asymmetry, and

(c) are tightly interconnected with other CPs. Second, multi-cloud paths exhibit

higher latency reductions than single cloud paths; e.g., 67% of all paths, 54% of

all intra-CP paths, and 74% of all inter-CP paths experience an improvement in

their latencies. Third, although traffic costs vary from location to location and

  2
   Code and datasets used in this study will be openly available to the community upon
publication.

across CPs, the costs are not prohibitively high. Based on these insights, we argue

that enterprises and cloud users can indeed benefit from future efforts aimed at

constructing high-performance overlay networks atop multi-cloud underlays in a

performance- and cost-aware manner.

       While our initial findings suggest that multi-cloud overlays are indeed

beneficial for enterprises, establishing overlay-based connectivity to route enterprise

traffic in a cost- and performance-aware manner among islands of disparate CP

resources is an open and challenging problem. For one, the problem is complicated

by the lack of continuous multi-cloud measurements and vendor-agnostic APIs.

       To tackle these challenges, the main goal of this chapter is to create a service

to establish and manage overlays on top of multi-cloud underlays. The starting

point of our approach is to create a cloud-centric measurement and management

service called Tondbaz that continuously monitors the inter- and intra-CP links. At

the core of Tondbaz are vendor-agnostic APIs to connect the disparate islands of CP

resources. With the measurement service and APIs in place, Tondbaz constructs a

directed graph consisting of nodes that represent VM instances, given two locations

(e.g., cities) as input by a cloud user. Edges in the graph will be annotated with

latencies and traffic-cost values from the measurement service.
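A minimal sketch of this graph abstraction, using the networkx library and made-up latency and cost annotations, illustrates how a latency-optimal overlay path could then be extracted:

    # Sketch: annotated multi-cloud graph and a latency-minimizing overlay path.
    import networkx as nx

    G = nx.DiGraph()
    # edge attributes: measured latency (ms) and egress cost (USD/GB); values are illustrative
    G.add_edge("aws-ca", "gcp-ca", latency=13.0, cost=0.01)
    G.add_edge("gcp-ca", "gcp-va", latency=60.0, cost=0.01)
    G.add_edge("aws-ca", "aws-va", latency=62.0, cost=0.02)
    G.add_edge("aws-va", "gcp-va", latency=2.0, cost=0.01)

    path = nx.shortest_path(G, "aws-ca", "gcp-va", weight="latency")
    total_latency = sum(G[u][v]["latency"] for u, v in zip(path, path[1:]))
    total_cost = sum(G[u][v]["cost"] for u, v in zip(path, path[1:]))
    print(path, total_latency, total_cost)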

       This study makes the following contributions:


   – We propose and design an extensible system called Tondbaz to facilitate the

     measurement of multi-CP network paths.

   – We design a decision-support framework for constructing optimal cloud

     overlay paths using insights gleaned using Tondbaz .




   – We demonstrate the cost and performance benefits of utilizing a decision-

      support framework by integrating it into the snitch mechanism of Cassandra,

      a distributed key-value store.


The remainder of this chapter is organized as follows. We first present the design objectives of our measurement platform and provide formal definitions for our optimization framework in §6.2. In §6.3 we utilize our measurement platform to

measure the path characteristics of the top 3 CPs on a global scale and apply our

optimization framework to obtain optimal paths between all pairs of CP regions.

Next, we demonstrate the applicability of our overlays for a handful of paths and

discuss the operational trade-offs of overlays in §6.4.2. Lastly, we conclude this

chapter by summarizing our findings in §6.5.

6.2   Tondbaz Design

In this section, we describe Tondbaz’s components and their corresponding design principles and objectives. At a high level, Tondbaz consists of 2 main components, namely (i) a measurement platform for conducting cloud-to-cloud measurements (§6.2.1) and (ii) a decision support framework for obtaining optimal cloud paths based on a set of constraints (§6.2.3).

         6.2.1    Measurement Platform. The measurement platform is

designed with low resource overhead and extensibility in mind. It consists of the following 3 main components:


   – an agent for conducting/gathering multi-cloud performance measurements

   – a centralized data-store for collecting and archiving the measurement results

      from each agent


[Figure 32 diagram omitted from the plaintext version.]




Figure 32. Overview of components for the measurement system including the
centralized controller, measurement agents, and data-store.

   – a centralized controller/scheduler that configures each measurement agent and

     schedules measurement tasks

       Figure 32 shows a high-level overview of the components for the

measurement platform as well as how they communicate with each other. The

agents communicate with the centralized controller in a client-server model

over a control channel. Furthermore, the agents store the result of running

the measurements in a data-store. The data-store and controller by design are

decoupled from each other although they can reside on the same node.

         6.2.1.1    Measurement Agent. The measurement agent is designed

with multiple objectives in mind, namely (i) ease of deployment, (ii) low resource
overhead, and (iii) extensibility of measurements. In the following, we describe how
each of these design objectives is achieved within our measurement agents.

Ease of Deployment: The measurement subsystem is designed to be installed
as a daemon on the host system with minimal dependencies (other than a
Python distribution) using a simple shell script. The only required installation
parameter is the address of the controller that the agent will communicate with.
Upon installation, the agent announces itself to the controller on a predefined
channel. After registration with the controller, the agent's configuration, including
(but not limited to) target addresses, output destination, and execution of
measurement tasks, is managed through a configuration channel and can therefore
be controlled from a centralized location. We rely on the MQTT protocol
OASIS (2019) for communication between agents and the centralized controller.
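       To make the registration flow concrete, the following is a minimal sketch of
an agent announcing itself to the controller over MQTT using the Eclipse Paho
client (v1-style callbacks). The topic names (e.g., tondbaz/anc) and the controller
hostname are illustrative assumptions, not the actual Tondbaz implementation.

    # Hypothetical agent-side registration over MQTT; topic names are assumptions.
    import json
    import socket

    import paho.mqtt.client as mqtt

    CONTROLLER_HOST = "controller.example.net"   # passed in at install time

    def on_connect(client, userdata, flags, rc):
        agent_id = socket.gethostname()
        # Announce presence on a predefined channel (anc).
        client.publish("tondbaz/anc", json.dumps({"agent": agent_id}))
        # Listen for configuration and measurement commands addressed to this agent.
        client.subscribe(f"tondbaz/cfg/{agent_id}")
        client.subscribe(f"tondbaz/run/{agent_id}")

    def on_message(client, userdata, msg):
        command = json.loads(msg.payload)
        print(f"received {msg.topic}: {command}")  # dispatch to the proper handler here

    client = mqtt.Client()
    client.on_connect = on_connect
    client.on_message = on_message
    client.connect(CONTROLLER_HOST, 1883)
    client.loop_forever()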

Low Resource Overhead: A barebone agent is simply a daemon listening for

incoming commands from the centralized controller on its control channel. Using

this minimal design, the agent is implemented in less than 1k lines of Python code
with a single dependency on the Eclipse Paho MQTT library Eclipse (2019). The
agent uses 10 MB of memory at runtime and requires less than 100 KB of memory
to maintain state for ongoing measurements.

Extensibility of Measurements: The agents should support a wide range

of measurements, including standard network measurement tools such as ping,
traceroute, and iperf, as well as any custom executables. Each measurement tool
should be implemented as a container image. In addition to the container image,
the developer should implement a Python class that inherits from a standard
interface specifying how the agent communicates with the measurement tool.

Measurement results should be serialized into a predefined JSON schema prior to

being stored on the data-store.
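       As an illustration of this plugin contract, the sketch below shows what such a
measurement plugin could look like. The MeasurementTool interface, its method
names, and the JSON fields are hypothetical stand-ins for Tondbaz's actual
interface, and the tool is invoked directly rather than inside its container image.

    # Hypothetical sketch of a measurement plugin; names and fields are assumptions.
    import json
    import subprocess
    import time
    from abc import ABC, abstractmethod

    class MeasurementTool(ABC):
        """Standard interface the agent uses to drive a measurement tool."""

        @abstractmethod
        def run(self, target: str) -> dict:
            """Execute the measurement and return its parsed output."""

    class PingTool(MeasurementTool):
        def run(self, target: str) -> dict:
            start = time.time()
            # In the real system the tool would run inside its container image;
            # here we simply shell out to ping for illustration.
            proc = subprocess.run(["ping", "-c", "5", target],
                                  capture_output=True, text=True)
            return {
                "measurement": "ping",
                "target": target,
                "start_time": start,
                "end_time": time.time(),
                "raw_output": proc.stdout,
            }

    if __name__ == "__main__":
        result = PingTool().run("8.8.8.8")
        print(json.dumps(result))  # serialized into a JSON document for the data-store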

         6.2.1.2    Centralized Controller. The coordinator awaits incoming
connections from agents that announce their presence and register themselves with
the coordinator (anc). After the initial registration, the coordinator can schedule
and conduct measurements on the agent as needed. The coordinator maintains
a control channel with each agent, which is used for (i) monitoring the health of
each agent through heartbeat messages (hbt), (ii) sending configuration parameters
(cfg), (iii) scheduling and issuing measurement commands (run and fin), and (iv)
monitoring/reporting the status of ongoing measurements (sta).
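       A minimal sketch of the coordinator's side of this control channel is shown
below; again, the MQTT topic names and message fields are assumptions used only
for illustration.

    # Hypothetical coordinator: react to agent announcements, push configuration,
    # schedule a measurement, and watch status updates. Topic names are assumptions.
    import json

    import paho.mqtt.client as mqtt

    def on_connect(client, userdata, flags, rc):
        client.subscribe("tondbaz/anc")        # agent announcements
        client.subscribe("tondbaz/sta/#")      # status of ongoing measurements

    def on_message(client, userdata, msg):
        body = json.loads(msg.payload)
        if msg.topic == "tondbaz/anc":
            agent = body["agent"]
            # Push configuration, then schedule a ping measurement on the new agent.
            client.publish(f"tondbaz/cfg/{agent}",
                           json.dumps({"datastore": "mongodb://datastore.example.net"}))
            client.publish(f"tondbaz/run/{agent}",
                           json.dumps({"tool": "ping", "target": "10.0.0.2", "id": "m-001"}))
        else:
            print(f"status from {msg.topic}: {body}")

    coordinator = mqtt.Client()
    coordinator.on_connect = on_connect
    coordinator.on_message = on_message
    coordinator.connect("localhost", 1883)
    coordinator.loop_forever()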

         6.2.2   Data Collector. Tondbaz agents can store the results of each

measurement locally for later aggregation in a centralized data-store. Additionally,

each agent can stream the results of each measurement back to the centralized

data-store. Each measurement result is represented as a JSON object containing
generic fields (start time, end time, measurement id, agent address) in addition
to a JSON-serialized representation of the measurement output provided by the
command's plugin for the Tondbaz agent. We rely on MongoDB for our centralized
data collector given that our measurement results are free-form JSON documents.
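       For illustration, a result document of this shape could be persisted with
pymongo as sketched below; the database and collection names are assumptions,
and the measurement values are placeholders.

    # Sketch of persisting one measurement result into the centralized MongoDB
    # collector; "tondbaz"/"results" and the values below are placeholders.
    import time

    from pymongo import MongoClient

    client = MongoClient("mongodb://datastore.example.net:27017")
    collection = client["tondbaz"]["results"]

    result = {
        # generic fields attached to every measurement
        "measurement_id": "m-001",
        "agent": "aws-us-east-1",
        "start_time": time.time() - 1.0,
        "end_time": time.time(),
        # tool-specific output, serialized by the measurement plugin
        "output": {"tool": "ping", "target": "10.0.0.2",
                   "rtt_ms": [1.2, 1.3, 1.1, 1.2, 1.4]},
    }
    collection.insert_one(result)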

         6.2.3   Optimization Framework. In addition to the measurement

platform, we have designed an optimization framework that can identify cloud

overlay paths that optimize a network performance metric while satisfying certain

constraints specified by the user. The optimization framework relies on the stream
of measurements reported by all agents to the data-store and uses them to create
an internal model of the network as a directed graph G, where nodes represent
agent instances and edges depict the network path between each pair of instances.

    G = (V, E)
    V = \{v_1, v_2, \ldots, v_N\}                                                    (6.1)
    E = \{e_{ij} = (v_i, v_j) \mid \forall v_i, v_j \in V,\ (v_i, v_j) \neq (v_j, v_i)\}
       Measurement results pertaining to the network path are added as edge
attributes. Additionally, the optimization framework relies on an internal cost
model that calculates the cost of transmitting traffic over each path based on
the policies that each CP advertises on their websites Amazon (2019c); Google
(2019); Microsoft (2019). The details of each CP's pricing policy differ from
one CP to another, but at a high level they are governed by four common rules,
namely (i) CPs only charge for egress traffic from a compute instance, (ii) customers
are charged based on the volume of exchanged traffic, (iii) traffic remaining within a
CP's network has a lower charge rate, and (iv) each source/destination region (or a
combination of both) has a specific charging rate. While the measurement platform
is designed to be extensible and supports a wide variety of measurement tools, the
optimization framework currently utilizes only latency and cost measurements.
Extensions to the framework to support additional network metrics are part of our
future work.
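       A minimal sketch of this internal model, using a networkx directed graph
annotated with latency and cost edge attributes, is shown below; the region names
and numeric values are placeholders chosen for illustration only.

    # Sketch (not Tondbaz's actual code) of the internal network model: a directed
    # graph whose nodes are agent/VM regions and whose edges carry the observed
    # median latency and per-GB egress cost between them.
    import networkx as nx

    G = nx.DiGraph()

    # Placeholder annotations; the numbers below are made up for illustration only.
    edges = [
        ("aws-us-east-1", "gcp-us-east1", {"latency_ms": 2.1, "cost_per_gb": 0.09}),
        ("gcp-us-east1", "aws-us-east-1", {"latency_ms": 2.2, "cost_per_gb": 0.12}),
        ("aws-us-east-1", "azr-us-east", {"latency_ms": 1.8, "cost_per_gb": 0.09}),
    ]
    for src, dst, attrs in edges:
        G.add_edge(src, dst, **attrs)

    # Edge attributes are later consumed by the decision-support framework, e.g.:
    print(G["aws-us-east-1"]["gcp-us-east1"]["latency_ms"])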

       The optimization framework requires the user to specify a series of

constraints, namely (i) the set of target regions where the user needs to have a
deployment (R), (ii) the set of regions that should be avoided when constructing an
optimal path (A), (iii) the set of region pairs that communicate with each other
through the overlay (T), and (iv) an overall budget for traffic cost (B).
Equation (6.2) formally defines the aforementioned constraints. This formulation of
the optimization problem can be mapped to the Steiner tree problem Hwang
and Richards (1992), which is known to be NP-complete.

    A \subset V
    V' = V - A
    R \subset V'                                                                     (6.2)
    T_{ij} = (v_i, v_j);\ \forall v_i, v_j \in R
       We approximate the solution (if any) to this optimization problem by (i)
creating an induced graph by removing all regions in A from the internal directed
graph G (Equation (6.3)),

    G' = (V', E')
    V' = V - A,\quad E' = \{(v_i, v_j) \mid \forall v_i, v_j \in V'\}                (6.3)

(ii) performing a breadth-first search (BFS) to obtain all paths P between each
pair of regions within T that have an overall cost (given by the cost function C)
within the budget B and do not have an inflated end-to-end latency compared to
the default path latency l_{ij} (Equations (6.4) and (6.5)),

    P_{ij} = \{p^x_{ij} \mid p^x_{ij} = (v_{x_1}, \ldots, v_{x_n}),
               \forall 1 \le k < n\ (v_{x_k}, v_{x_{k+1}}) \in E' \text{ and } v_{x_1} = v_i,\ v_{x_n} = v_j,   (6.4)
               1 \le x \le \lfloor (|V|-2)!\, e \rfloor\}

    P'_{ij} = \{p^x_{ij} \mid C(p^x_{ij}) \le B,\ L(p^x_{ij}) \le l_{ij}\}
    C(p^x_{ij}) = \sum_{e_{wz} \in p^x_{ij}} c_{wz}                                  (6.5)
    L(p^x_{ij}) = \sum_{e_{wz} \in p^x_{ij}} l_{wz}
       and (iii) selecting the overlay that has the overall greatest reduction in
latency among all possible sets of overlays (Equation (6.6)).

    O = \{p_{ij} \mid \forall p_{ij} \in P'_{ij} \text{ and } \forall i, j\ e_{ij} \in T\}
    L(O) = \sum_{p_{ij} \in O} \left( l_{ij} - L(p_{ij}) \right)                     (6.6)
    OPT = O_x;\quad x = \arg\max_x L(O_x)
       The time complexity of this approach is equal to performing a BFS
(O(|V'| + |E'|)) for each pair of nodes in T, in addition to selecting the set of
paths that results in the greatest overall latency reduction. The latter step has a
time complexity of O((|P'|!)^{|T|}), where |P'| = \lfloor (|V'|-2)!\, e \rfloor. While
the high complexity of the second step might seem intractable, our BFS algorithm
backtracks whenever it encounters a path that exceeds our total budget B or has an
end-to-end latency greater than that of the default path (l_{ij}). Additionally, based
on our empirical evaluation we observe that each relay point adds about 1 ms
of forwarding latency, and therefore our search backtracks from paths that yield
less than 1 ms of latency improvement per relay hop. Through our analysis, we
observed that on average only 31% of paths do not exceed the default end-to-end
latency even with an unlimited budget, effectively making our solution tractable.
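       The following sketch illustrates the pruned path search described above
under simplifying assumptions (a directed graph such as the networkx DiGraph
sketched earlier, with latency_ms and cost_per_gb edge attributes); it is not
Tondbaz's actual implementation.

    # Enumerate simple candidate paths between src and dst, backtracking whenever a
    # partial path exceeds the budget, exceeds the default latency, or no longer
    # leaves at least ~1 ms of improvement per relay hop already used.
    def candidate_paths(G, src, dst, budget, default_latency, min_gain_per_hop=1.0):
        paths = []

        def extend(path, latency, cost):
            node = path[-1]
            if node == dst:
                paths.append((list(path), latency, cost))
                return
            for nxt in G.successors(node):
                if nxt in path:                      # keep paths simple (no loops)
                    continue
                edge = G[node][nxt]
                new_latency = latency + edge["latency_ms"]
                new_cost = cost + edge["cost_per_gb"]
                relay_hops = len(path) - 1           # relays already used on this branch
                if new_cost > budget or new_latency > default_latency:
                    continue
                if default_latency - new_latency < min_gain_per_hop * relay_hops:
                    continue                         # not improving enough per relay
                path.append(nxt)
                extend(path, new_latency, new_cost)
                path.pop()

        extend([src], 0.0, 0.0)
        return paths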

6.3     A Case for Multi-cloud Overlays

       In this section, we demonstrate the use of Tondbaz to conduct path and
latency measurements in a multi-cloud setting (§ 6.3.1), followed by an assessment
of the optimality of single-CP paths (§ 6.3.2) and the motivating performance gains
of multi-cloud paths (§ 6.3.3). Next, we present the challenge of inferring traffic-cost
profiles, which hinders the realization of multi-cloud overlays (§ 6.3.4). Lastly, we
investigate the possibility of utilizing IXPs for the creation of further optimized
overlays in § 6.3.6.

            6.3.1    Measurement Setting & Data Collection. We target the

top 3 CPs namely, Amazon Web Services (AWS), Microsoft Azure, and Google

Cloud Platform (GCP). We create small VM instances within all global regions

of these CPs, resulting in a total of 68 regions (17, 31, and 20 for AWS, Azure,
and GCP, respectively). We exclude regions that are dedicated to government
agencies and are not available to the public. Furthermore, we were not able to
allocate VMs in 5 Azure regions3. Through our private correspondence with the
support team, we learned that those regions either are mainly designed for storage
redundancy of nearby regions or did not have free resources available at the time of
this study.

  3
      Central India, Canada East, France South, South Africa West, and Australia Central


Additionally, we identify the datacenter’s geo-location for each CP. Although CPs

are secretive with respect to the location of their datacenters, various sources do

point to their exact or approximate location Build Azure (2019); Burrington (2016);

Google (2019); Miller (2015); Plaven (2017); WikiLeaks (2018); Williams (2016)

and in the absence of any online information we resort to the nearest metro area

that the CP advertises.

       We conduct pairwise latency and path measurements between all VM
instances in 10-minute rounds for a duration of 2 weeks in October 2019,
resulting in about 20k latency and path samples between each pair of VMs. Each
round of measurement consists of 5 latency probes and 2 paris-traceroute path
measurements (UDP and TCP). The resultant traceroute hops from our path

measurements are annotated with their corresponding ASN using BGP feeds

of Routeviews University of Oregon (2018) and RIPE RIPE (2018) collectors

aggregated by BGPStream Orsini, King, Giordano, Giotsas, and Dainotti (2016).

Furthermore, we map each hop to its owner ORG by relying on CAIDA’s AS-

to-ORG dataset Huffaker et al. (2018). Lastly, the existence of IXP hops along

the path is checked by matching hop addresses against the set of IXP prefixes

published by PeeringDB PeeringDB (2017), Packet Clearing House (PCH) Packet

Clearing House (2017), and Hurricane Electric (HE) using CAIDA’s aggregate IXP

dataset CAIDA (2018).
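       The sketch below illustrates this kind of hop annotation using longest-prefix
matching; the prefix-to-AS, AS-to-ORG, and IXP prefix tables shown are small
placeholders for the BGPStream-derived RIBs, CAIDA's AS-to-ORG mapping, and
CAIDA's aggregate IXP dataset used in practice.

    # Annotate a traceroute hop with its origin AS, owner ORG, and IXP membership.
    import ipaddress

    PREFIX_TO_ASN = {"54.239.0.0/16": 16509, "8.8.8.0/24": 15169}      # placeholders
    ASN_TO_ORG = {16509: "Amazon", 15169: "Google"}                    # placeholders
    IXP_PREFIXES = [ipaddress.ip_network("80.249.208.0/21")]           # e.g., AMS-IX

    def annotate_hop(addr: str) -> dict:
        ip = ipaddress.ip_address(addr)
        asn, best_len = None, -1
        for prefix, origin in PREFIX_TO_ASN.items():
            net = ipaddress.ip_network(prefix)
            if ip in net and net.prefixlen > best_len:
                asn, best_len = origin, net.prefixlen   # keep the longest match
        return {
            "addr": addr,
            "asn": asn,
            "org": ASN_TO_ORG.get(asn),
            "is_ixp": any(ip in net for net in IXP_PREFIXES),
        }

    print(annotate_hop("54.239.1.10"))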

         6.3.2     Are Cloud Backbones Optimal?.

         6.3.2.1    Path Characteristics of CP Backbones. As mentioned

above, we measure the AS and ORG path for all of the collected traceroutes. In

all our measurements, we observe multiple ASes for AWS only (AS14618 and
AS16509). Hence, without loss of generality, from this point onward we only
present statistics using the ORG measure. We measure the ORG-hop length for

all unique paths and find that for 97.86% of our measurements, we only observe 2

ORGs (i.e. the source and destination CP networks). Out of the remaining paths,

we observe that 2.12% and 0.02% have 3 and 4 ORG hops, respectively. These

observations indicate two key results. First, all intra-CP measurements (and, hence,

traffic) remain almost always within the CPs’ backbones. Second, the CP networks

are tightly interconnected and establish private peerings with each other on a
global scale. Surprised by these findings, we take a closer look at

the 2.14% of paths which include other networks along their path. About 76%

of these paths have a single IXP hop between the source and destination CPs.

That is, the CPs are peering directly with each other over an IXP fabric. For the

remaining 24% of paths, we observe 2 prominent patterns: (i) paths sourced from

AWS in Seoul and Singapore as well as various GCP regions that are destined to

Azure in UAE; and (ii) paths sourced from various AWS regions in Europe and

destined to Azure in Busan, Korea.

         Main findings: All intra-CP and the majority of inter-CP traffic remains

within the CPs’ network and is transmitted between the CPs’ networks over private

and public peerings. CP’s backbones are tightly interconnected and can be leveraged

for creating a global multi-cloud overlay.

           6.3.2.2          Performance Characteristics of CP Backbones. Using

the physical location of datacenters for each CP, we measure the geo-distance

between each pair of regions within a CP’s network using the Haversine distance

Robusto (1957) and approximate the optimal latency using speed of light (SPL)

constraints.4 Figure 33 depicts the CDF of latency inflation, which is defined as the

  4 We use (2/3) · c within our calculations Singla et al. (2014)
ratio of measured latency and SPL latency calculated using line-of-sight distances

for each CP.

Figure 33. Distribution of latency inflation between network latency and RTT
approximation using speed of light constraints for all regions of each CP.
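       For reference, the SPL lower bound used in this inflation ratio can be
computed as sketched below, assuming propagation at (2/3) · c over the Haversine
distance between datacenter locations; the coordinates in the example are
placeholders.

    # Speed-of-light RTT lower bound from great-circle (Haversine) distance.
    import math

    C_KM_PER_MS = 299_792.458 / 1000          # speed of light, km per millisecond
    FIBER_FACTOR = 2.0 / 3.0                  # propagation speed in fiber ~ (2/3)*c

    def haversine_km(lat1, lon1, lat2, lon2):
        lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
        a = (math.sin((lat2 - lat1) / 2) ** 2
             + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
        return 2 * 6371.0 * math.asin(math.sqrt(a))   # Earth radius ~ 6371 km

    def spl_rtt_ms(lat1, lon1, lat2, lon2):
        one_way_ms = haversine_km(lat1, lon1, lat2, lon2) / (C_KM_PER_MS * FIBER_FACTOR)
        return 2 * one_way_ms

    # Example: inflation ratio for a measured RTT between two placeholder locations.
    measured_rtt_ms = 70.0
    print(measured_rtt_ms / spl_rtt_ms(45.52, -122.68, 39.04, -77.49))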


       We observe median latency inflation of about 1.68, 1.63, and 1.67 for intra-

CP paths of AWS, Azure, and GCP, respectively. Compared to a median latency

inflation ratio of 3.2 for public Internet paths Singla et al. (2014), these low latency

inflation ratios attest to the optimal fiber paths and routes that are employed by

CPs. Furthermore, Azure and GCP paths have long tails in their latency inflation
distributions, while all intra-CP paths for AWS have a ratio of less than 3.6, making

it the most optimal backbone among all CPs.

       Main findings: CPs employ an optimal fiber backbone with near line-

of-sight latencies to create a global network. This result opens up a tantalizing

opportunity to construct multi-cloud overlays in a performance-aware manner.

         6.3.2.3    Latency Characteristics of CP Backbones. Next, we turn

our attention to the latency characteristics of the CP backbones toward the goal

of creating CP-specific latency profiles. Figure 34 shows the distribution of RTT

and standard deviation across different measurements for all paths between VM
pairs. We observe a wide range of RTT values between VM instances, which can

be explained by the geographic distance between CP regions. Furthermore, latency

between each pair is relatively stable across different measurements with a
90th-percentile coefficient of variation of less than 0.05.

Figure 34. Distribution of median RTT and coefficient of variation for latency
measurements between all VM pairs.


       In addition to stability characteristics, we also compare the forward and

reverse path latencies by measuring the difference between the median of latencies

in each direction. We find that paths exhibit symmetric latencies with a
95th-percentile latency difference of 0.22 ms among all paths, as shown in Figure 35.

       Main findings: Cloud paths exhibit a stable and symmetric latency profile

over our measurement period, making them ideal for reliable multi-cloud overlays.

          6.3.3     Are Multi-Cloud Paths Better Than Single Cloud Paths?.

          6.3.3.1     Overall Latency Improvements. The distribution of latency

reduction percentage for all, intra-CP, and inter-CP paths is shown in Figure

36. From this figure, we observe that about 55%, 76%, and 69% of all, intra-CP,

and inter-CP paths experience an improvement in their latency using an indirect
Figure 35. Distribution for difference in latency between forward and reverse paths
for unique paths.


optimal path. These optimal paths can be constructed by relaying traffic through

one or multiple intermediary CP regions. We provide more details on the intra- and

inter-CP optimal overlay paths below.


Figure 36. Distribution for RTT reduction ratio through all, intra-CP, and inter-CP
optimal paths.


       To complement Figure 36, Figure 37-(left) shows the distribution of the

number of relay hops along optimal paths. From this figure, we find that the

majority (64%) of optimal paths can be constructed using only one relay hop while

some paths can go through as many as 5 relay hops. Almost all of the optimal
paths with latency reductions greater than 30% have fewer than 4 relay hops, as

shown in Figure 37-(right). In addition, we observe that the median of latency

reduction percentage increases with the number of relay hops. We note that (a)

forwarding traffic through additional relay hops might have negative effects (e.g.,

increase in latencies) and (b) optimal paths with many relay hops might have an

alternative path with fewer hops and comparable performance.




Figure 37. Distribution for the number of relay hops along optimal paths (left) and
the distribution of latency reduction percentage for optimal paths grouped based on
the number of relay hops (right).


       Lastly, we measure the prevalence of each CP along optimal paths and find

that AWS, Azure, and GCP nodes are selected as relays for 55%, 48%, and 28% of

optimal paths.

         6.3.3.2      Intra-CP Latency Improvements. We present statistics

on the possibility of optimal overlay paths that are sourced and destined towards

the same CP network (i.e. intra-CP overlays). Figure 38 depicts the distribution

of latency reduction ratio for intra-CP paths of each CP. The distributions are

grouped based on the CP network. Furthermore, each boxplot’s color represents

the ownership of relay nodes with A, Z, and G corresponding to AWS, Azure,

and GCP relays respectively. From this figure, we observe that intra-CP paths
Figure 38. Distribution of latency reduction percentage for intra-CP paths of each
CP, divided based on the ownership of the relay node.


can benefit from relay nodes within their own network in addition to nodes from

other CPs. Furthermore, we observe that intra-CP paths within GCP's network
experience the greatest reduction in latency among all CPs, with AWS relays being
the most effective in lowering the end-to-end latency. Upon closer examination, we

observe that the majority of these paths correspond to GCP regions within Europe

communicating with GCP regions in either India or Hong Kong.

       Main findings: Our measurements demonstrate that surprisingly, intra-

CP paths can observe end-to-end latency reductions via optimal paths that are

constructed with relay hops that belong to a different CP.

               6.3.3.3             Inter-CP Latency Improvements. We next focus on the

possibility of overlay paths that are sourced from one CP but destined towards

a different CP (i.e. inter-CP overlays). Figure 39 presents the latency reduction

percentage for inter-CP paths. For brevity, only one direction of each CP pair

is presented as the reverse direction is identical. Similar to Figure 38 the color

and label encoding of each boxplot represent the ownership of relay nodes. From

this figure, we make a number of observations. First, optimal paths constructed

Figure 39. Distribution of latency reduction ratio for inter-CP paths of each CP,
divided based on the ownership of the relay nodes.


using GCP nodes as relays exhibit the least amount of latency reduction. Second,

AWS-AZR paths have lower values of latency reduction with equal amounts of

reduction across each relay type. This is indicative of a tight coupling between

these networks. Lastly, optimal paths with AWS relays tend to have higher latency

reductions which are in line with our observations in §6.3.2.1 regarding AWS’

backbone.

       Main findings: Similar to intra-CP paths, inter-CP paths can benefit from

relay nodes to construct new, optimal paths with lower latencies. Moreover, inter-

CP paths tend to experience greater reductions in their latency.

               6.3.4               Are there Challenges in Creating Multi-Cloud Overlays?.

               6.3.4.1               Traffic Costs of CP Backbones. We turn our focus to the

cost of sending traffic via CP backbones. Commonly, CPs charge their customers

for traffic that is transmitted from their VM instances. That is, customers are

charged only for egress traffic; all ingress traffic is free. Moreover, traffic is billed

by volume (e.g., per GB of egress traffic), but each CP has a different set of
rules and rates that govern its pricing policy. For example, we find
that AWS and GCP have lower rates for traffic that remains within their network

(i.e. is sourced and destined between different regions of their network) while Azure

is agnostic to the destination of the traffic. Furthermore, GCP has different rates

for traffic destined to the Internet based on the geographic region of the destination

address. We compile all these pricing policies based on the information that each

CP provides on their webpage Amazon (2019c); Google (2019); Microsoft (2019)

into a series of rules that allow us to infer the cost of transmitting traffic from each

CP instance to other destinations.
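       The sketch below illustrates the form these compiled rules take; the per-GB
rates shown are placeholders rather than actual CP prices, and the real rule set
additionally captures region-pair-specific rates and tiered volume discounts.

    # Illustrative cost rules; the rates are placeholders, not actual CP prices.
    RATES_PER_GB = {
        # (provider, scope) -> USD per GB of egress traffic
        ("aws", "intra"): 0.02,
        ("aws", "inter"): 0.09,
        ("gcp", "intra"): 0.01,
        ("gcp", "inter"): 0.12,
        ("azure", "intra"): 0.087,   # Azure is agnostic to the destination
        ("azure", "inter"): 0.087,
    }

    def egress_cost_usd(provider: str, stays_within_cp: bool, volume_gb: float) -> float:
        """Cost of sending volume_gb from a VM of provider; ingress is free."""
        scope = "intra" if stays_within_cp else "inter"
        return RATES_PER_GB[(provider, scope)] * volume_gb

    # Example: 1 TB of inter-CP traffic leaving a hypothetical AWS region.
    print(egress_cost_usd("aws", stays_within_cp=False, volume_gb=1000))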

       Traffic costs for AWS. For AWS (see Figure 40), we observe that intra-

CP traffic is always cheaper than inter-CP traffic with the exception of traffic that

is sourced from Australia and Korea. Furthermore, traffic sourced from the US,

Canada, and European regions has the lowest rate, while traffic sourced from
Brazil has the highest charge rate per volume of traffic. Lastly, traffic is priced in
multiple tiers defined based on the volume of exchanged traffic, and we see that
exchanging more traffic leads to lower charging rates.


Figure 40. Cost of transmitting traffic sourced from different groupings of AWS
regions. Dashed (solid) lines present inter-CP (intra-CP) traffic cost.



       Traffic costs for Azure. Azure's pricing policy is much simpler (see
Figure 41). Global regions are split into a few large areas, namely (i) North
America and Europe excluding Germany, (ii) Asia and Pacific, (iii) South America,
and (iv) Germany. Each of these areas has a different rate, with North America
and Europe being the cheapest, while traffic sourced from South America can cost
up to 3x more than traffic sourced from North America. Lastly, as mentioned
earlier, Azure is agnostic to the destination of traffic and does not differentiate
between intra-CP traffic and traffic destined to the Internet.


Figure 41. Cost of transmitting traffic sourced from different groupings of Azure
regions.


       Traffic costs for GCP. GCP’s pricing policy is the most complicated

among the top 3 CPs (see Figure 42). At a high level, GCP’s pricing policy can

be determined based on (i) source region, (ii) destination geographic location, and

(iii) whether the destination is within GCP’s network or the Internet (intra-CP vs

inter-CP). Intra-CP traffic generally has a lower rate compared to inter-CP traffic.

Furthermore, traffic destined to China (excluding Hong Kong) and Australia has
higher rates compared to other global destinations.


Figure 42. Cost of transmitting traffic sourced from different groupings of GCP
regions, shown in two panels (Inter-CP, top; Intra-CP, bottom). Solid, dashed, and
dotted lines represent the cost of traffic destined to China (excluding Hong Kong),
Australia, and all other global regions, respectively.


         6.3.5    Cost Penalty for Multi-Cloud Overlays. Next, we quantify
the cost incurred by using relay nodes from other CPs. Figure 43 depicts the
distribution of the cost penalty (i.e. the difference between the optimal overlay cost
and the default path cost) within various latency reduction percentage bins for
transmitting 1 TB of traffic.



Figure 43. Distribution of cost penalty within different latency reduction ratio bins
for intra-CP and inter-CP paths.


       From Figure 43, we make a number of key observations. First, we find that

optimal paths between intra-CP endpoints incur higher cost penalties compared to

inter-CP paths. This is expected as intra-CP paths tend to have lower charging

rates and optimal overlays usually pass through a 3rd party CP’s backbone.

Counter-intuitively, we next observe that the median cost penalty for paths with
the greatest latency reduction is less than or equal to that of less optimal overlay
paths. Lastly, we find that 2 of our optimal overlay paths have a negative cost
penalty. That is, the optimal path costs less than transmitting traffic directly
between the endpoints. Upon closer inspection, we find that both of these paths are
destined to the AWS Australia region and are sourced from GCP regions in Oregon,
US and Montreal, Canada, respectively. Both paths benefit from AWS' lower transit

cost towards Australia by handing off their traffic towards a nearby AWS region.

Motivated by this observation, for each set of endpoint pairs we find the path with

the minimum cost. We find that the cost of traffic sourced from all GCP regions

(except for GCP Australia) and destined to AWS Australia can be reduced by 28%
by relying on AWS’ network as a relay hop. These cost-optimal paths on average

experience a 72% inflation in their latency.

       Main findings: The added cost of overlay networks is not highly

prohibitive. In addition to the inherent benefits of multi-cloud settings, our results

demonstrate that enterprises and cloud users can construct high-performance

overlay networks atop multi-cloud underlays in a cost-aware manner.

         6.3.6    Further Optimization Through IXPs. Motivated by the

observations within Kotronis et al. (2016), we investigate the possibility of creating

optimal inter-CP paths via IXP relays. Using this approach, an enterprise (or
possibly a third-party relay service provider) would peer with CPs at IXPs where
multiple CPs are present and would relay traffic between their networks. We should
note that the results presented in this section offer upper bounds on the amount
of latency reduction, and realizing these values in practice depends on several
factors: (i) the enterprise or third-party entity must be present at the IXP relay
points and has to peer with the corresponding CPs, (ii) relay nodes must implement
an address translation scheme since CPs would only route traffic to destination
addresses within a peer's address space, and (iii) CPs could have restrictions on
which portion of their network is reachable from each peering point, and therefore a
customer's cloud traffic might not be routable to certain IXP relays.

       Towards this goal, we gather a list of ∼20k IXP tenant interface addresses

using CAIDA’s aggregate IXP dataset CAIDA (2018) corresponding to 741 IXPs

in total. We limit our focus to 143 IXPs which host more than one of our target

CPs (i.e. an enterprise or third-party relay provider has the opportunity to peer

with more than one CP). Given that IXP tenants can peer remotely, we limit
our focus to the interface addresses of CPs within an IXP and perform path and
latency probes using the same methodology described in § 6.3.1. We approximate

the latency of each CP region towards each unique IXP by relying on the median of

measured latencies.

       We augment our connectivity graph by creating nodes for each IXP and

place an edge between IXPs and the regions of each CP that is a tenant of that

IXP. Furthermore, we annotate edges with their corresponding measured minimum

latency. Using this augmented graph we measure the optimal overlay paths between

CP region pairs. Out of the 4.56k paths, about 3.21k (compared to 3.16k for
CP-based overlays) can benefit from overlay paths that have IXP relay nodes. About

0.19k of the optimized overlay paths exclusively rely on IXP relays, i.e. CPs do

not appear as relay nodes along the path. Figure 44 depicts the distribution of

latency reduction percentage for optimal paths using CP relays, IXP relays, and a

combination of CP and IXP relays. From this figure, we observe that IXP relays

offer minimal improvement to multi-cloud overlay paths indicating that CP paths

are extremely optimized and that CPs tend to leverage peering opportunities with

other CPs when available. We should note that the results in this section explore a

hypothetical relay service provider that only operates within IXPs that have more

than one CP. Further improvements in multi-cloud connectivity via dark fiber paths

between IXPs/colos hosting a single CP are part of future work that we would like

to investigate.

6.4   Evaluation of Tondbaz

         6.4.1    Case Studies of Optimal Paths. Given the large number of

possible paths between all CP regions, we select a handful of large-scale areas that
are most likely to be utilized by enterprises with WAN deployments. For



Figure 44. Distribution for RTT reduction percentage through CP, IXP, and
CP+IXP relay paths.


each set of regions, we present the optimal path and discuss the cost penalty for

traversing through this path.

US East - US West: All of our target CPs have representative regions near

northern Virginia on the east coast of the US. Contrary to the east coast, CP

regions on the west coast are not concentrated in a single area. AZR is the only CP

with a region within the state of Washington, GCP and AWS have deployments

within Oregon, and AWS and AZR have regions in northern California while

GCP has a region in southern California. The shortest path between US coasts

is possible through an overlay between AZR on the east coast and AWS in northern

California with a median RTT of 59.1 ms and a traffic cost of about $86 for

transmitting 1 TB of data. By accepting a 1 ms increase in RTT the cost of

transmitting 1 TB of traffic can be reduced to $10 by utilizing GCP regions on

both US coasts.

US East - Europe: For brevity, we group all European regions together. The

optimal path between the east coast of US and Europe is between AWS in northern

Virginia and AZR in Ireland with a median RTT of 66.4 ms and a traffic cost of
about $90 for transmitting 1 TB of data. The cost of traffic can be reduced to

$20 for 1 TB of data by remaining within AWS’ network and transmitting traffic

between AWS in northern Virginia and AWS in Ireland with an RTT of 74.7 ms.

US East - South America: All CP regions within South America are located
in São Paulo, Brazil. The optimal path between these areas is established through
AZR in northern Virginia and AWS in Brazil. The median RTT for this path is

116.7 ms and transmitting 1 TB of data would cost about $87. Interestingly, this

optimal path also has the lowest cost for transmitting data between these areas.

US East - South Africa: AZR is the only CP present in South Africa. The
optimal paths between each CP's region in northern Virginia and AZR's region in
South Africa all have roughly the same RTT of about 231 ms, while the traffic cost
for sourcing traffic from AZR, AWS, and GCP in northern Virginia is $86, $90, and
$110, respectively.

US West - South America: As stated earlier, the CP regions on the US west
coast are not concentrated in a single area. The most optimal path from all CP
regions on the west coast is between GCP in southern California and GCP in Brazil,
with an RTT of 167.5 ms and a traffic cost of $80 for exchanging 1 TB of data.

The optimal path from northern California is made possible through AZR’s region

in northern California and AWS in Brazil with an RTT of 169.5 ms and a traffic

cost of $86.5 for 1 TB of data. The cheapest path for exchanging 1 TB of traffic is

made possible through AWS in northern California and AWS in Brazil for $20 with

an RTT of 192.2 ms.

US West - Asia East: For east Asia, we consider regions within Japan, South

Korea, Hong Kong, and Singapore. The optimal path between the west coast of

US and east Asia is possible through GCP's region on the US west coast and GCP
in Tokyo, Japan, with an RTT of 88.5 ms and a traffic cost of $80 for transmitting 1

TB of traffic. Optimal paths destined to AZR in Japan tend to go through GCP

relays. The cheapest path for transmitting 1 TB of data from US west coast to east

Asia is possible through AWS in Oregon and AWS in Japan for $20 and a median

RTT of 98.6 ms.

US West - Australia: AWS and GCP both have regions within Sydney, Australia,
while AZR has regions in Sydney, Canberra, and Melbourne. The optimal
path from the US west coast towards Australia is between GCP in southern
California and GCP in Australia, with a median RTT of 137 ms and a traffic cost of

$150 for 1 TB of data. The next optimal path is possible through AWS in Oregon

and AWS in Australia with a median RTT of 138.9 ms and a traffic cost of $20 for

1 TB of data. Optimal paths for other combinations of regions typically benefit

from going through GCP and AWS relays with the latter option having lower traffic

cost.

India - Europe: All 3 CPs have regions within Mumbai, India. Furthermore, AZR
has 2 more regions within India, namely Pune and Chennai. The optimal path

from India towards Europe is sourced from AWS in Mumbai and destined to AWS

in France with a median RTT of 103 ms and a traffic cost of $86 for 1 TB of data.

The cheapest path is sourced from GCP India and destined to GCP in Belgium

with a traffic cost of $80 for 1 TB of data and a median RTT of 110 ms.

         6.4.2    Deployment of Overlays. In this section, we demonstrate how

Tondbaz creates multi-cloud overlays, empirically measure the latency reductions
through the overlays, and contrast them with Tondbaz's estimated latency reductions

based on its internal model.



Figure 45. Overlay network composed of two nodes (VM1 and VM3) and one relay
node (VM2). Forwarding rules are depicted below each node.


       Network overlays can be created either at the application layer or
transparently at the network layer. In the former case, each application is

responsible for incorporating the forwarding logic into the program while in the

latter case applications need not be aware of the forwarding logic within the overlay

and simply need to utilize IP addresses within the overlay domain. Given the wide

range of applications that could be deployed within a cloud environment, we chose

to create multi-cloud overlays at the network level.

       The construction of overlays consists of several high-level steps, namely (i)

identifying a private overlay subnet which does not overlap with the private address

space of participating nodes, (ii) assigning unique IP addresses to each overlay node

including relay nodes, (iii) creating virtual tunneling interfaces and assigning their

next-hop address based on the inferred optimal overlay path, and (iv) creating

forwarding rules for routing traffic through the correct tunneling interface.

       To illustrate these steps, consider the example overlay network in Figure
45, composed of two nodes (VM1 and VM3) and one relay node (VM2). Each
node has a default interface (highlighted in blue) that is connected to the public
Internet. Furthermore, each node can have one or two virtual tunneling interfaces
depending on whether it is a regular or a relay node in the overlay, respectively.
Below each node, the forwarding rules to support the overlay network are given.
Based on the given forwarding rules, a data packet sourced from an application
on VM1 destined to ip_d on VM3 would be forwarded to interface ip_a, where the
packet would be encapsulated inside an additional IP header and forwarded to ip_y
on VM2. Upon the receipt of this packet, VM2 would decapsulate the outer IP
header and, since the packet is destined to ip_d, it would be forwarded to ip_c,
where it would be encapsulated once again inside an IP header destined to ip_z.
Once VM3 receives the packet, it would decapsulate the outer IP header and
forward the data packet to the corresponding application on VM3.

       Initially, we implemented the overlay construction mechanism using the IPIP

module of Linux which simply encapsulates packets within an IP header without

applying any encryption to the payload. Although we were able to establish overlay

tunnels within GCP and AWS’ network, for an unknown reason Azure’s network

would drop our tunneled packets. For this reason, we migrated our tunneling

mechanism to WireGuard WireGuard (2019) which encrypts the payload and

encapsulates the encrypted content within an IP+UDP header. This encapsulation

mechanism has a minimum of 28 bytes of overhead, corresponding to 8 bytes for
the UDP header plus a minimum of 20 bytes for the IP header, which translates to
less than 2% overhead for a 1500-byte MTU.
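       To make the construction steps concrete, the following sketch shows the kind
of per-node configuration generated for VM1 in the example of Figure 45; the
WireGuard keys, the 10.9.0.0/24 overlay subnet, and the relay's public endpoint are
placeholders rather than values from our deployment.

    # Emit a WireGuard tunnel configuration and overlay route for VM1 (placeholders).
    import textwrap

    OVERLAY_SUBNET = "10.9.0.0/24"
    VM1_OVERLAY_IP = "10.9.0.1/32"
    VM3_OVERLAY_IP = "10.9.0.3/32"
    VM2_PUBLIC_ENDPOINT = "203.0.113.2:51820"   # relay's public address (placeholder)

    wg_conf = textwrap.dedent(f"""
        [Interface]
        # VM1's end of the tunnel toward the relay (VM2)
        Address = {VM1_OVERLAY_IP}
        PrivateKey = <VM1-private-key>

        [Peer]
        # Relay node VM2; overlay traffic destined to VM3 is sent through it
        PublicKey = <VM2-public-key>
        Endpoint = {VM2_PUBLIC_ENDPOINT}
        AllowedIPs = {OVERLAY_SUBNET}
    """)

    print(wg_conf)                                       # e.g., saved as /etc/wireguard/wg0.conf
    print("wg-quick up wg0")                             # bring up the tunnel interface
    print(f"ip route replace {VM3_OVERLAY_IP} dev wg0")  # route VM3's overlay address via the tunnel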

         6.4.2.1    Empirical vs. Estimated Overlay Latencies. As stated
earlier, given the large number of possibilities for creating overlay networks, we limit
our focus to a handful of cases where Tondbaz estimated a reduction in end-to-end

Table 9. List of selected overlay endpoints (first two columns) along with the number
of relay nodes for each overlay (third column). The default RTT, estimated overlay
RTT, empirical overlay RTT, and resulting RTT saving are presented in the last four
columns.

 source            destination       relays   default RTT   overlay RTT   empiric RTT   RTT saving
                                                  (ms)          (ms)          (ms)          (ms)

 AWS Hong Kong     GCP Hong Kong        1        15.79          2.14          2.25         13.54

 AZR Wyoming       GCP Oregon           1        49.91         33.74         33.86         16.05

 GCP India         GCP Germany          2       351.5         148.8         149.02        202.48

 GCP Singapore     AZR UAE              3       250.08         83.56         84.42        165.66


latency through an overlay network. Table 9 lists the set of selected end-points,

number of relay nodes, the RTT of the default path, and Tondbaz’s estimated RTT

through the optimal overlay. While limited in number, the selected overlay paths

represent a different combination of CP networks, geographic regions, number of

relays, and latency reductions. Additionally, we list the set of relay nodes for each
selected endpoint pair in Table 10.

Table 10. List of selected overlay endpoints (first two columns) along with the
optimal relay nodes (third column).

 source              destination         relays

 AWS Hong Kong       GCP Hong Kong       AZR Hong Kong

 AZR Wyoming         GCP Oregon          AZR Washington

 GCP India           GCP Germany         AWS India - AWS Germany

 GCP Singapore       AZR UAE             AZR Singapore - AZR S.India - AZR W.India



       For each overlay network, we conduct latency probes over the default and

overlay paths for the full duration of a day using 5-minute rounds. Within each

round we send 5 latency probes towards each destination address, resulting in a

total of about 1.4k measurement samples per endpoint. Additionally, we also probe

each VM’s default interface address to obtain a baseline of latency that is needed

to traverse the network stack on each VM node. Similar to our observations in

§ 6.3.2, the measured latencies exhibit tight distributions over both default and

overlay paths with a coefficient of variation of less than 0.06. The empiric RTT
column in Table 9 presents the median empirical latency over the overlay paths. For
all overlay paths, we observe that Tondbaz's estimate deviates by less than 1 ms
from the empirical measurements, with paths having a greater number of relays
exhibiting larger deviations. The observed deviation values are in line with our
estimates of the network stack traversal overhead for each VM (median of 0.1 ms).

       Summary: In this section, we demonstrated Tondbaz's overlay construction
strategy and showcased its applicability through the construction of 4 optimal
overlays. Although limited in number, these overlays attest to the accuracy of
Tondbaz's internal model in assessing overlay end-to-end latency.

6.5   Summary

       Market push indicates that the future of enterprises is multi-cloud.

Unfortunately, there is a technology pull: what is critically lacking is a

framework for seamlessly gluing the public cloud resources together in a cost-

and performance-aware manner. A key reason behind this technology pull is

the lack of understanding of the path, delay, and traffic-cost characteristics

of CPs’ private backbones. In this chapter, we presented Tondbaz as a cloud-

centric measurement platform and decision support framework for multi-cloud

environments. We demonstrate the applicability of our framework by deploying it
on the global cloud regions of AWS, Azure, and GCP. Our cloud-centric measurement

study sheds light on the characteristics of CPs’ (private) backbones and reveals

several new/interesting insights including optimal cloud backbones, lack of delay

and path asymmetries in cloud paths, possible latency improvements in inter- and

intra-cloud paths, and traffic-cost characteristics. We present recommendations

regarding optimal inter-CP paths for select geographic region pairs. Lastly, we

construct a handful of overlay networks and empirically measure the latency

through the overlay network and contrast our measures with Tondbaz’s internal

model.




                                   CHAPTER VII

                        CONCLUSIONS & FUTURE WORK

7.1   Conclusions

       Cloud providers have been transformative to how enterprises conduct their
business. By virtualizing vast compute and storage resources through centralized
data-centers, cloud providers have become an attractive alternative to maintaining
in-house infrastructure and have been widely adopted by the private and public
sectors. While cloud resources have been the center of many research studies,
little attention has been paid to the connectivity of cloud providers and their effect
on the topological structure of the Internet. In this dissertation, we presented a
holistic analysis of cloud providers and their role in today's Internet and drew the
following conclusions:

   – Cloud providers in conjunction with CDNs are the major content providers
      in the Internet and collectively are responsible for a significant portion of an
      edge network's traffic;

   – Similar to CDN networks, cloud providers have been making efforts to reduce

      their network distance by expanding the set of centralized compute regions in

      addition to offering new peering services (VPIs) to edge networks;

   – In terms of an enterprise's connectivity towards cloud providers, many factors,
      including the type of connectivity (CPP, TPP, and BEP), the cloud provider's
      routing strategy, the geo-proximity of cloud resources, and cross-traffic and
      congestion on TPP networks, should be taken into consideration;




   – The optimal backbone of cloud providers in combination with the tight

     interconnectivity of cloud provider networks with each other can be leveraged

     towards the creation of optimal overlays that have a global span;

Specifically, we have made the following contributions in each chapter.

       In Chapter III we utilized traffic traces from an edge network (UOnet) to

study the traffic footprint of major content providers and more specifically outline

the degree to which their content is served from nearby locations. We demonstrated

that the majority of traffic is associated with CDN and cloud provider networks.
Furthermore, we devised a technique to identify cache servers residing within
other networks, which further enlarges the share of traffic attributed to CDN
networks. Lastly, we quantified the effects of content locality on user-perceived
performance and observed that other factors, such as last-mile connectivity, are the
main performance bottlenecks for end-users.

       In Chapter IV we present a measurement study of the interconnectivity

fabric of Amazon as the largest cloud provider. We pay special attention to VPIs

as an emergent and increasingly popular interconnection option for entities such

as enterprises that desire highly elastic and flexible connections to cloud providers

which bypass the public Internet. We present a methodology for capturing VPIs

and offer lower bound estimates on the number of VPI peerings that Amazon

utilizes. Next, we present a methodology for geolocating both ends of our inferred

peerings. Lastly, we characterize customer networks that peer over various peering

options (private, public, VPI) and offer insight into the visibility and routing

implications of each peering type from the cloud providers’ perspective.

       In Chapter V we perform a third-party measurement study to understand

the tradeoffs between three multi-cloud connectivity options (CPP, TPP, and

BEP). Based on our cloud-centric measurements, we find that CPP routes are

better than TPP routes in terms of latency as well as throughput. We attribute

the observed performance benefits to CPs’ rich connectivity with other CPs and

CPs’ stable and well-designed private backbones. Additionally, we characterize

the routing strategies of CPs (hot- vs. cold-potato routing) and highlight their

implications on end-to-end network performance metrics. Lastly, we identify that

subpar performance characteristics of TPP routes are caused by several factors

including border routers, queuing delays, and higher loss-rates on these paths.

       In Chapter VI we propose and design Tondbaz as a measurement platform

and decision support framework for multi-cloud settings. We demonstrate its

applicability by conducting path and latency measurements between the global

regions of AWS, Azure, and GCP networks. Our measurements highlight the tight

interconnectivity of cloud providers' networks on a global scale, with backbones
offering reliable connectivity to their customers. We utilize Tondbaz to measure
optimal cloud overlays between various endpoints and, by establishing traffic-cost
models for each cloud provider and feeding them to Tondbaz's decision-support
framework, we offer insight into the tradeoffs between cost and performance.

Next, we offer recommendations regarding the best connectivity paths between

various geographic regions. Lastly, we deploy a handful of overlay networks and,
through empirical measurements, demonstrate the accuracy of Tondbaz's network

performance estimates based on its internal model.

7.2   Future Work

       In the following, we present several possible directions for future work that

are in line with the presented dissertation.



– Exploring the possibility of further improving the connectivity of multi-cloud

  paths via the utilization of dark fiber links either by cloud providers or third-

  party connectivity providers is an open research problem. Investigating these
  possibilities can be beneficial for obtaining improved multi-cloud connectivity
  in addition to improving the connectivity of poorly connected cloud regions;

– Complementary to our work in Chapter VI, one could measure and profile

  the connectivity and performance of edge networks towards cloud providers.

  Profiling the last mile of connectivity between edge users and cloud providers

  is equally important to the study of cloud providers' backbone performance.
  This study, in conjunction with the optimal cloud overlays generated by
  Tondbaz, would enable us to provide estimates of the performance
  characteristics of connectivity between edge-users that is facilitated via optimal
  cloud overlays. In light of the rapid expansion of cloud providers' backbones
  and their increasing role in the transit of Internet traffic, conducting this study
  is of high importance and can provide insight into possible directions for
  end-user connectivity;

– In the current state of the Internet, end-users are accustomed to utilizing

  many free services such as email, video streaming, social media networks,

  etc. The majority of these free services are funded via targeted advertisement

  platforms that rely on constructing accurate profiles of users based on their

  personal interests. These Internet services are based on an economic model in

  which users exchange their personal data and time for free services. Recent

  years have seen increased interest in the development

  and adoption of decentralized alternatives Calendar (2019); Docs (2019);

  Fediverse (2019); Forms.id (2019); IPFS (2019); Mastodon (2019); PeerTube
  (2019) for many Internet services. These decentralized applications rely

  on strong cryptography to ensure that only users with proper access/keys

  have access to the data. Furthermore, given their decentralized nature, the

  governance of data is not in the hands of a single entity. Decentralized or P2P

  services can have varying performance depending on the state of the network.

  The constant push of cloud providers to increase their proximity to end-users,

  in conjunction with the vast amount of storage and compute resources within

  cloud regions, makes them ideal candidates for hybrid deployments of these

  decentralized services, where part of the deployment resides in cloud regions

  and the remainder is deployed on end-user devices. Studying the performance

  of decentralized services in a hybrid deployment and contrasting it with their

  centralized counterparts would facilitate the wide adoption of these services

  by end-users. Furthermore, estimating the per-user operational cost of running

  these services within cloud environments could help in advocating for

  democratized Internet services;

– The rise in multi-cloud deployments by enterprises has fueled the

  emergence/expansion of cloud providers as well as third-party connectivity

  providers. The stakeholders in a multi-cloud setting (cloud providers,

  third-party connectivity providers, and enterprises) can have incongruent goals

  or objectives. For example, cloud providers are interested in maximizing their

  profit by following certain routing policies, while an enterprise is interested in

  maximizing its performance at the lowest operational cost via the adoption

  of multi-cloud overlays. Furthermore, stakeholders may lack incentives to

  share information pertaining to their internal operations with each other. For

  example, cloud providers host applications on heterogeneous hardware, which

  in turn can introduce performance variability for enterprises.

  Measuring, modeling, and mitigating the tussles among all stakeholders

  of a multi-cloud ecosystem is crucial for the advancement of multi-cloud

  deployments;

– The optimal overlays outlined in Chapter VI would only be beneficial to

  enterprises that maintain and manage their own compute resources, i.e., they

  do not rely on the added/managed services that cloud providers offer. For

  example, an enterprise can maintain its stream processing pipeline using

  Apache Kafka within its cloud instances or rely on managed services like

  Amazon MSK Amazon (2019a) or Confluent for GCP users Google (2019). In

  the former case, given that the enterprise is in complete control of the service,

  it can benefit from the overlays constructed with Tondbaz, while in the latter

  case the network connectivity paths are maintained by the cloud provider.

  The seamless operation of these managed services in a multi-cloud

  setting would require the development of interoperability layers between

  managed services of cloud providers. Furthermore, the optimal operation of

  these managed services requires additional APIs that expose the network

  layer and provide finer control to cloud users;

– Evaluating the connectivity performance of various third-party connectivity

  providers (TPPs), and pushing for the disclosure of such information via public

  measurement platforms, would be beneficial for enterprises seeking optimal

  hybrid or multi-cloud deployments;

– Exploring the adoption of VPIs by the customers of other cloud providers,

  in addition to repeating the measurements outlined in Chapter IV on a

  temporal basis, would offer a more comprehensive picture of Internet topology

  while also capturing the micro-dynamics of Internet peering enabled by

  VPIs.




                              REFERENCES CITED



Adhikari, V. K., Guo, Y., Hao, F., Varvello, M., Hilt, V., Steiner, M., & Zhang,
      Z.-l. (2012). Unreeling Netflix: Understanding and Improving Multi-CDN
      Movie Delivery. In INFOCOM. IEEE.

Ager, B., Chatzis, N., Feldmann, A., Sarrar, N., Uhlig, S., & Willinger, W. (2012).
       Anatomy of a large European IXP. SIGCOMM CCR.

Ager, B., Mühlbauer, W., Smaragdakis, G., & Uhlig, S. (2011). Web content
       cartography. In Internet Measurement Conference (IMC). ACM.

Akamai. (2017). Akamai Technologies Facts & Figures.
     https://www.akamai.com/us/en/about/facts-figures.jsp.

Alexander, M., Luckie, M., Dhamdhere, A., Huffaker, B., KC, C., & Jonathan,
      S. M. (2018). Pushing the boundaries with bdrmapit: Mapping router
      ownership at internet scale. In Internet Measurement Conference (IMC).

Amazon. (2018a). AWS Direct Connect.
     https://aws.amazon.com/directconnect/.

Amazon. (2018b). AWS Direct Connect Frequently Asked Questions.
     https://aws.amazon.com/directconnect/faqs/.

Amazon. (2018c). AWS Direct Connect Partners.
     https://aws.amazon.com/directconnect/partners/.

Amazon. (2018d). AWS Direct Connect | Product Details.
     https://aws.amazon.com/directconnect/details/.

Amazon. (2018e). Describe virtual interfaces.
     https://docs.aws.amazon.com/cli/latest/reference/directconnect/
     describe-virtual-interfaces.html.

Amazon. (2018f). Regions and Availability Zones - Amazon Elastic Compute Cloud.
     https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/
     using-regions-availability-zones.html#concepts-regions
     -availability-zones.

Amazon. (2019a). Amazon Managed Streaming for Apache Kafka.
     https://aws.amazon.com/msk/.

Amazon. (2019b). AWS Transit Gateway.
     https://aws.amazon.com/transit-gateway/.
Amazon. (2019c). EC2 Instance Pricing.
     https://aws.amazon.com/ec2/pricing/on-demand/.

Andersen, D., Balakrishnan, H., Kaashoek, F., & Morris, R. (2001). Resilient
      overlay networks. In SOSP. ACM.

Anwar, R., Niaz, H., Choffnes, D., Cunha, Í., Gill, P., & Katz-Bassett, E. (2015).
      Investigating interdomain routing policies in the wild. In Internet
      Measurement Conference (IMC).

APNIC. (2018). Measuring IPv6. https://labs.apnic.net/measureipv6/.

Augustin, B., Friedman, T., & Teixeira, R. (2007). Multipath tracing with paris
      traceroute. In End-to-End Monitoring Techniques and Services. IEEE.

Augustin, B., Krishnamurthy, B., & Willinger, W. (2009). IXPs: Mapped? In
      Internet Measurement Conference (IMC). ACM.

Baker, F. (1995). Requirements for IP Version 4 Routers (Tech. Rep.). Cisco
       Systems.

Bender, A., Sherwood, R., & Spring, N. (2008). Fixing ally’s growing pains with
      velocity modeling. In SIGCOMM. ACM.

Berman, M., Chase, J. S., Landweber, L., Nakao, A., Ott, M., Raychaudhuri, D.,
     . . . Seskar, I. (2014). GENI: A federated testbed for innovative network
     experiments. Computer Networks.

Beverly, R. (2016). Yarrp’ing the internet: Randomized high-speed active topology
       discovery. In Internet Measurement Conference (IMC). ACM.

Beverly, R., Durairajan, R., Plonka, D., & Rohrer, J. P. (2018). In the IP of the
       beholder: Strategies for active IPv6 topology discovery. In Internet
       Measurement Conference (IMC). ACM.

Böttger, T., Cuadrado, F., Tyson, G., Castro, I., & Uhlig, S. (2016). Open connect
      everywhere: A glimpse at the internet ecosystem through the lens of the
      netflix cdn. arXiv preprint arXiv:1606.05519 .

Bozkurt, I. N., Aqeel, W., Bhattacherjee, D., Chandrasekaran, B., Godfrey, P. B.,
      Laughlin, G., . . . Singla, A. (2018). Dissecting latency in the internet’s fiber
      infrastructure. arXiv preprint arXiv:1811.10737 .

Build Azure. (2019). Microsoft Azure Region Map.
      https://map.buildazure.com/.


Burrington, I. (2016). Why Amazon’s Data Centers Are Hidden in Spy Country.
      https://www.theatlantic.com/technology/archive/2016/01/
      amazon-web-services-data-center/423147/.
CAIDA. (2018). Archipelago (Ark) measurement infrastructure.
     http://www.caida.org/projects/ark/.
CAIDA. (2018). AS Relationships.
     http://www.caida.org/data/as-relationships/.
CAIDA. (2018). The CAIDA UCSD IXPs Dataset.
     http://www.caida.org/data/ixps.xml.
Calder, M., Fan, X., Hu, Z., Katz-Bassett, E., Heidemann, J., & Govindan, R.
       (2013). Mapping the Expansion of Google’s Serving Infrastructure. In
       Internet Measurement Conference (IMC).
Calder, M., Flavel, A., Katz-Bassett, E., Mahajan, R., & Padhye, J. (2015).
       Analyzing the Performance of an Anycast CDN. In Internet Measurement
       Conference (IMC). ACM.
Calder, M., Gao, R., Schröder, M., Stewart, R., Padhye, J., Mahajan, R., . . .
       Katz-Bassett, E. (2018). Odin: Microsoft’s Scalable Fault-Tolerant {CDN}
       Measurement System. In NSDI. USENIX.
Calendar, S. (2019). Secure Calendar - Free Encrypted Calendar.
      https://securecalendar.online/.
Castro, I., Cardona, J. C., Gorinsky, S., & Francois, P. (2014). Remote Peering:
       More Peering without Internet Flattening. CoNEXT .
Chabarek, J., & Barford, P. (2013). What’s in a name?: decoding router interface
      names. In Hotplanet. ACM.
Chandrasekaran, B., Smaragdakis, G., Berger, A. W., Luckie, M. J., & Ng, K.-C.
      (2015). A server-to-server view of the internet. In CoNEXT. ACM.
Chatzis, N., Smaragdakis, G., Böttger, J., Krenc, T., & Feldmann, A. (2013). On
      the Benefits of Using a Large IXP as an Internet Vantage Point. In Internet
      Measurement Conference (IMC). ACM.
Chiu, Y.-C., Schlinker, B., Radhakrishnan, A. B., Katz-Bassett, E., & Govindan, R.
      (2015). Are we one hop away from a better internet? In Internet
      Measurement Conference (IMC). ACM.
Chun, B., Culler, D., Roscoe, T., Bavier, A., Peterson, L., Wawrzoniak, M., &
      Bowman, M. (2003). Planetlab: an overlay testbed for broad-coverage
      services. SIGCOMM CCR.
Comarela, G., Terzi, E., & Crovella, M. (2016). Detecting unusually-routed ASes:
     Methods and applications. In Internet Measurement Conference (IMC).
     ACM.

CoreSite. (2018). The Coresite Open Cloud Exchange - One Connection. Countless
      Cloud Options. https://www.coresite.com/solutions/cloud-services/
      open-cloud-exchange.

Costa, P., Migliavacca, M., Pietzuch, P., & Wolf, A. L. (2012). Naas:
       Network-as-a-service in the cloud. In Workshop on hot topics in
       management of internet, cloud, and enterprise networks and services.

Cunha, Í., Marchetta, P., Calder, M., Chiu, Y.-C., Machado, B. V., Pescapè, A., . . .
      Katz-Bassett, E. (2016). Sibyl: a practical internet route oracle. NSDI .

DatacenterMap. (2018). Amazon EC2.
      http://www.datacentermap.com/cloud/amazon-ec2.html.

Demchenko, Y., Van Der Ham, J., Ngo, C., Matselyukh, T., Filiposka, S., de Laat,
     C., & Escalona, E. (2013). Open cloud exchange (OCX): Architecture and
     functional components. In Cloud Computing Technology and Science. IEEE.


Dhamdhere, A., Clark, D. D., Gamero-Garrido, A., Luckie, M., Mok, R. K.,
     Akiwate, G., . . . Claffy, K. (2018). Inferring persistent interdomain
     congestion. In SIGCOMM. ACM.

Dhamdhere, A., & Dovrolis, C. (2010). The Internet is Flat: Modeling the
     Transition from a Transit Hierarchy to a Peering Mesh. In CoNEXT. ACM.

Diamantidis, N., Karlis, D., & Giakoumakis, E. A. (2000). Unsupervised
     stratification of cross-validation for accuracy estimation. Artificial
     Intelligence.

Docs, A. (2019). Arcane Docs – Blockchain-based alternative for Google Docs.
      https://docs.arcaneoffice.com/signup/.

Durairajan, R., Barford, C., & Barford, P. (2018). Lights Out: Climate Change
      Risk to Internet Infrastructure. In Proceedings of the applied networking
      research workshop. ACM.

Durairajan, R., Ghosh, S., Tang, X., Barford, P., & Eriksson, B. (2013). Internet
      atlas: a geographic database of the internet. In Hotplanet.

Durairajan, R., Sommers, J., & Barford, P. (2014). Layer 1-Informed Internet
      Topology Measurement. In Internet Measurement Conference (IMC). ACM.

Durairajan, R., Sommers, J., Willinger, W., & Barford, P. (2015). InterTubes: A
      Study of the US Long-haul Fiber-optic Infrastructure. In SIGCOMM. ACM.


Durumeric, Z., Wustrow, E., & Halderman, J. A. (2013). Zmap: Fast internet-wide
     scanning and its security applications. In Usenix security symposium.
Eclipse. (2019). Eclipse Paho - MQTT and MQTT-SN Software.
       http://www.eclipse.org/paho/.
EdgeConneX. (2018). Space, power and connectivity.
     http://www.edgeconnex.com/company/about/.
Engebretson, J. (2014). Verizon-netflix dispute: Is netflix using direct connections
      or not? https://www.telecompetitor.com/
      verizon-netflix-dispute-netflix-using-direct-connections/.
The enterprise deployment game-plan: why multi-cloud is the future. (2018).
      https://blog.ubuntu.com/2018/08/30/the-enterprise-deployment
      -game-plan-why-multi-cloud-is-the-future.
Equinix. (2017). Cloud Exchange. http://www.equinix.com/services/
      interconnection-connectivity/cloud-exchange/.
Eriksson, B., Durairajan, R., & Barford, P. (2013). RiskRoute: A Framework for
        Mitigating Network Outage Threats. In CoNEXT. doi:
        10.1145/2535372.2535385
European Internet Exchange Association. (2018). https://www.euro-ix.net/.
Example Applications Services. (2018). https://builtin.com/cloud-computing/
     examples-applications-services.
Fan, X., Katz-Bassett, E., & Heidemann, J. (2015). Assessing Affinity Between
      Users and CDN Sites. In Traffic monitoring and analysis. Springer.
Fanou, R., Francois, P., & Aben, E. (2015). On the diversity of interdomain
      routing in Africa. In Passive and active measurements (pam).
Fediverse. (2019). Fediverse. https://fediverse.party/.
Five Reasons Why Multi-Cloud Infrastructure is the Future of Enterprise IT.
      (2018). https://www.cloudindustryforum.org/content/five-reasons
      -why-multi-cloud-infrastructure-future-enterprise-it.
Fontugne, R., Pelsser, C., Aben, E., & Bush, R. (2017). Pinpointing delay and
      forwarding anomalies using large-scale traceroute measurements. In Internet
      Measurement Conference (IMC). ACM.
Fontugne, R., Shah, A., & Aben, E. (2018). The (thin) bridges of as connectivity:
      Measuring dependency using as hegemony. In Passive and Active
      Measurement (PAM). Springer.

Forms.id. (2019). Private, simple, forms. | Forms.id. https://forms.id/.

The Future of IT Transformation Is Multi-Cloud. (2018).
      https://searchcio.techtarget.com/Rackspace/
      The-Future-of-IT-Transformation-Is-Multi-Cloud.

The Future of Multi-Cloud: Common APIs Across Public and Private Clouds.
      (2018). https://blog.rackspace.com/
      future-multi-cloud-common-apis-across-public-private-clouds.

The Future of the Datacenter is Multicloud. (2018). https://www.nutanix.com/
      2018/11/01/future-datacenter-multicloud/.

Gartner. (2016). https://www.gartner.com/doc/3396633/
      market-trends-cloud-adoption-trends.

Gasser, O., Scheitle, Q., Foremski, P., Lone, Q., Korczyński, M., Strowes, S. D., . . .
       Carle, G. (2018). Clusters in the expanse: Understanding and unbiasing
       IPv6 hitlists. In Internet Measurement Conference (IMC). ACM.

Gehlen, V., Finamore, A., Mellia, M., & Munafò, M. M. (2012). Uncovering the big
      players of the web. In Lecture notes in computer science. Springer.

Gharaibeh, M., Shah, A., Huffaker, B., Zhang, H., Ensafi, R., & Papadopoulos, C.
      (2017). A look at router geolocation in public and commercial databases. In
      Internet Measurement Conference (IMC). ACM.

Gill, P., Arlitt, M., Li, Z., & Mahanti, A. (2008). The Flattening Internet
        Topology: Natural Evolution, Unsightly Barnacles or Contrived Collapse?
        In Passive and Active Measurement (PAM). Springer.

Giotsas, V., Dhamdhere, A., & Claffy, K. C. (2016). Periscope: Unifying looking
       glass querying. In Passive and Active Measurement (PAM). Springer.

Giotsas, V., Dietzel, C., Smaragdakis, G., Feldmann, A., Berger, A., & Aben, E.
       (2017). Detecting peering infrastructure outages in the wild. In SIGCOMM.
       ACM.

Giotsas, V., Luckie, M., Huffaker, B., & Claffy, K. (2015). Ipv6 as relationships,
       cliques, and congruence. In Passive and Active Measurements (PAM).

Giotsas, V., Luckie, M., Huffaker, B., et al. (2014). Inferring Complex AS
       Relationships. In Internet Measurement Conference (IMC). ACM.
Giotsas, V., Smaragdakis, G., Huffaker, B., Luckie, M., & claffy, k. (2015).
       Mapping Peering Interconnections to a Facility. In CoNEXT.

Giotsas, V., & Zhou, S. (2012). Valley-free violation in internet routing—analysis
       based on bgp community data. In International conference on
       communications.

Giotsas, V., & Zhou, S. (2013). Improving the discovery of IXP peering links
       through passive BGP measurements. In INFOCOM.

Giotsas, V., Zhou, S., Luckie, M., & Claffy, K. (2013). Inferring Multilateral
       Peering. In CoNEXT. ACM.

Google. (2018a). GCP Direct Peering. https://cloud.google.com/
      interconnect/docs/how-to/direct-peering.

Google. (2018b). Google supported service providers. https://cloud.google.com/
      interconnect/docs/concepts/service-providers.

Google. (2018c). Partner Interconnect | Google Cloud.
      https://cloud.google.com/interconnect/partners/.

Google. (2019). Apache Kafka for GCP Users.
      https://cloud.google.com/blog/products/gcp/apache-kafka-for-gcp
      -users-connectors-for-pubsub-dataflow-and-bigquery.

Google. (2019). Data center locations. https://www.google.com/about/
      datacenters/inside/locations/index.html.

Google. (2019). Google Compute Engine Pricing.
      https://cloud.google.com/compute/pricing#network.

Govindan, R., & Tangmunarunkit, H. (2000). Heuristics for Internet map
      discovery. In INFOCOM.

Graham, R., Mcmillan, P., & Tentler, D. (2014). Mass Scanning the Internet: Tips,
     Tricks, Results. In Def Con 22.

Green, T., Lambert, A., Pelsser, C., & Rossi, D. (2018). Leveraging inter-domain
       stability for bgp dynamics analysis. In Passive and Active Measurement
       (PAM). Springer.

Gregori, E., Improta, A., Lenzini, L., & Orsini, C. (2011). The impact of IXPs on
      the AS-level topology structure of the Internet. Computer Communications.

Gunes, M., & Sarac, K. (2009). Resolving IP aliases in building traceroute-based
      Internet maps. Transactions on Networking (ToN).
Gunes, M. H., & Sarac, K. (2006). Analytical IP alias resolution. In International
      conference on communications.

Gupta, A., Calder, M., Feamster, N., Chetty, M., Calandro, E., & Katz-Bassett, E.
      (2014). Peering at the Internet’s Frontier: A First Look at ISP
      Interconnectivity in Africa. Passive and Active Measurements (PAM).

Haq, O., Raja, M., & Dogar, F. R. (2017). Measuring and improving the reliability
      of wide-area cloud paths. In WWW. ACM.

He, Y., Siganos, G., Faloutsos, M., & Krishnamurthy, S. (2009). Lord of the links:
       a framework for discovering missing links in the Internet topology.
       Transactions on Networking (ToN).

Hofmann, H., Kafadar, K., & Wickham, H. (2011). Letter-value plots: Boxplots for
     large data (Tech. Rep.). had.co.nz.

How multi-cloud business models will shape the future. (2018).
     https://www.cloudcomputing-news.net/news/2018/oct/05/
     how-multi-cloud-business-models-will-shape-future/.

Huffaker, B., Fomenkov, M., & claffy, k. (2014). DRoP:DNS-based Router
      Positioning. SIGCOMM CCR.

Huffaker, B., Fomenkov, M., et al. (2014). Drop: Dns-based router positioning.
      SIGCOMM CCR.

Huffaker, B., Keys, K., Fomenkov, M., & Claffy, K. (2018). As-to-organization
      dataset. http://www.caida.org/research/topology/as2org/.

Hwang, F. K., & Richards, D. S. (1992). Steiner tree problems. Networks.

Hyun, Y. (2006). Archipelago measurement infrastructure. In CAIDA-WIDE
      Workshop.

IBM bets on a multi-cloud future. (2018).
      https://www.zdnet.com/article/ibm-bets-on-a-multi-cloud-future/.

IP2Location. (2015). IP2Location DB9, 2015. http://www.ip2location.com/.

IP2Location. (2018). IP address geolocation.
      https://www.ip2location.com/database/ip2location.

IPFS. (2019). IPFS is the Distributed Web. https://ipfs.io/.

Jacobson, V. (1989). traceroute. ftp://ftp.ee.lbl.gov/traceroute.tar.gz.


Jonathan, A., Chandra, A., & Weissman, J. (2018). Rethinking adaptability in
      wide-area stream processing systems. In Hot topics in cloud computing.
      USENIX.

Kang, M. S., & Gligor, V. D. (2014). Routing bottlenecks in the internet: Causes,
      exploits, and countermeasures. In Computer and communications security.
      ACM.

Kang, M. S., Lee, S. B., & Gligor, V. D. (2013). The crossfire attack. Symposium
      on Security and Privacy.

Katz-Bassett, E., Scott, C., Choffnes, D. R., Cunha, Í., Valancius, V., Feamster, N.,
      . . . Krishnamurthy, A. (2012). LIFEGUARD: Practical repair of persistent
      route failures. In SIGCOMM.

Keys, K., Hyun, Y., Luckie, M., & Claffy, K. (2013). Internet-Scale IPv4 Alias
      Resolution with MIDAR. Transactions on Networking (ToN).

Khan, A., Kwon, T., Kim, H.-c., & Choi, Y. (2013). AS-level topology collection
      through looking glass servers. In Internet Measurement Conference (IMC).

Klöti, R., Ager, B., Kotronis, V., Nomikos, G., & Dimitropoulos, X. (2016). A
       comparative look into public ixp datasets. SIGCOMM CCR.

Knight, S., Nguyen, H. X., Falkner, N., Bowden, R., & Roughan, M. (2011). The
      Internet topology zoo. Selected Areas in Communications.

Kotronis, V., Klöti, R., Rost, M., Georgopoulos, P., Ager, B., Schmid, S., &
      Dimitropoulos, X. (2016). Stitching Inter-Domain Paths over IXPs. In
      Symposium on SDN Research. ACM.

Kotronis, V., Nomikos, G., Manassakis, L., Mavrommatis, D., & Dimitropoulos, X.
      (2017). Shortcuts through colocation facilities. In Internet Measurement
      Conference (IMC). ACM.

Krishna, A., Cowley, S., Singh, S., & Kesterson-Townes, L. (2018). Assembling
      your cloud orchestra: A field guide to multicloud management.
      https://www.ibm.com/thought-leadership/institute-business-value/
      report/multicloud.

Labovitz, C., Iekel-Johnson, S., McPherson, D., Oberheide, J., & Jahanian, F.
      (2010). Internet inter-domain traffic. In SIGCOMM. ACM.

Lad, M., Oliveira, R., Zhang, B., & Zhang, L. (2007). Understanding resiliency of
      internet topology against prefix hijack attacks. In International conference
      on dependable systems and networks.

Lai, F., Chowdhury, M., & Madhyastha, H. V. (2018). To relay or not to relay for
       inter-cloud transfers? In Workshop on hot topics in cloud computing.

Li, L., Alderson, D., Willinger, W., & Doyle, J. (2004). A first-principles approach
        to understanding the internet’s router-level topology. In SIGCOMM CCR.

Limelight. (2017). Private global content delivery network.
       https://www.limelight.com/network/.

Lodhi, A., Larson, N., Dhamdhere, A., Dovrolis, C., et al. (2014). Using peeringDB
       to understand the peering ecosystem. SIGCOMM CCR.

Luckie, M. (2010). Scamper: a scalable and extensible packet prober for active
       measurement of the internet. In Internet Measurement Conference (IMC).

Luckie, M., & Beverly, R. (2017). The impact of router outages on the as-level
        internet. In SIGCOMM.

Luckie, M., Dhamdhere, A., Huffaker, B., Clark, D., et al. (2016). bdrmap:
        Inference of Borders Between IP Networks. In Internet Measurement
        Conference (IMC).

Luckie, M., Huffaker, B., Dhamdhere, A., & Giotsas, V. (2013). AS Relationships,
        Customer Cones, and Validation. IMC. doi: 10.1145/2504730.2504735

Luckie, M., Huffaker, B., Dhamdhere, A., Giotsas, V., et al. (2013). As
       relationships, customer cones, and validation. In Internet Measurement
       Conference (IMC). ACM.

Luckie, M., et al. (2014). A second look at detecting third-party addresses in
       traceroute traces with the IP timestamp option. In Passive and Active
       Measurement (PAM).

Marder, A., & Smith, J. M. (2016). MAP-IT: Multipass Accurate Passive
      Inferences from Traceroute. In Internet Measurement Conference (IMC).
      ACM.

Mastodon. (2019). Giving social networking back to you.
      https://joinmastodon.org/.

Mathis, M., Semke, J., Mahdavi, J., & Ott, T. (1997). The macroscopic behavior of
      the TCP congestion avoidance algorithm. SIGCOMM CCR.

MaxMind. (2018). GeoIP2 databases.
     https://www.maxmind.com/en/geoip2-databases.

Megaport. (2019a). Megaport Pricing. https://www.megaport.com/pricing/.
Megaport. (2019b). Nine Common Scenarios of multi-cloud design.
     https://knowledgebase.megaport.com/megaport-cloud-router/
     nine-common-scenarios-for-multicloud-design/.

Microsoft. (2018a). Azure ExpressRoute.
      https://azure.microsoft.com/en-us/services/expressroute/.

Microsoft. (2018b). ExpressRoute connectivity partners.
      https://azure.microsoft.com/en-us/services/expressroute/
      connectivity-partners/.

Microsoft. (2018c). Expressroute partners and peering locations.
      https://docs.microsoft.com/en-us/azure/expressroute/
      expressroute-locations.

Microsoft. (2019). Bandwidth Pricing.
      https://azure.microsoft.com/en-us/pricing/details/bandwidth/.

Miller, R. (2015). Regional Data Center Clusters Power Amazon’s Cloud.
        https://datacenterfrontier.com/
        regional-data-center-clusters-power-amazons-cloud/.

M-Lab. (2018). NDT (Network Diagnostic Tool).
      https://www.measurementlab.net/tests/ndt/.

Motamedi, R., Yeganeh, B., Chandrasekaran, B., Rejaie, R., Maggs, B., &
     Willinger, W. (2019). On Mapping the Interconnections in Today’s Internet.
     Transactions on Networking (ToN).

NetAcuity. (2018). Industry-standard geolocation.
      https://www.digitalelement.com/solutions/.

Netflix. (2017a). Internet connection speed requirements.
       https://help.netflix.com/en/node/306.

Netflix. (2017b). Open Connect Appliance Overview.
       https://openconnect.netflix.com/en/appliances-overview/.

Nomikos, G., & Dimitropoulos, X. (2016). traIXroute: Detecting IXPs in
     traceroute paths. In Passive and Active Measurement (PAM).

Nomikos, G., Kotronis, V., Sermpezis, P., Gigis, P., Manassakis, L., Dietzel, C., . . .
     Giotsas, V. (2018). O Peer, Where Art Thou?: Uncovering Remote Peering
     Interconnections at IXPs. In Internet Measurement Conference (IMC).

Nur, A. Y., & Tozal, M. E. (2018). Cross-as (x-as) internet topology mapping.
      Computer Networks.
OASIS. (2019). MQTT. http://mqtt.org/.

One-Way Ping (OWAMP). (2019). http://software.internet2.edu/owamp/.

Orsini, C., King, A., Giordano, D., Giotsas, V., & Dainotti, A. (2016).
       BGPStream: a software framework for live and historical BGP data analysis.
       In Internet Measurement Conference (IMC). ACM.

Packet Clearing House. (2017). Routing archive. https://www.pch.net.

Packet Clearing House. (2018). MRT Routing Updates.
       https://www.pch.net/resources/Raw_Routing_Data/.

PacketFabric. (2019). Cloud Connectivity.
      https://www.packetfabric.com/packetcor#pricing.

Padhye, J., Firoiu, V., Towsley, D., & Kurose, J. (1998). Modeling tcp throughput:
      A simple model and its empirical validation. SIGCOMM CCR.

Padmanabhan, V. N., & Subramanian, L. (2001). An investigation of geographic
     mapping techniques for internet hosts. In SIGCOMM CCR.

Palmer, C. R., Siganos, G., Faloutsos, M., Faloutsos, C., & Gibbons, P. (2001).
      The connectivity and fault-tolerance of the internet topology. In Workshop
      on Network-Related Data Management (NRDM).

PeeringDB. (2017). Exchange Points List. https://peeringdb.com/.

PeerTube. (2019). Join PeerTube. https://joinpeertube.org/.

Plaven, G. (2017). Amazon keeps building data centers in umatilla, morrow
       counties. http://www.eastoregonian.com/eo/local-news/20170317/
       amazon-keeps-building-data-centers-in-umatilla-morrow-counties.

Pureport. (2019). Pricing - Pureport. https://www.pureport.com/pricing/.

Quan, L., Heidemann, J., & Pradkin, Y. (2013). Trinocular: Understanding
      internet reliability through adaptive probing. In SIGCOMM CCR.

Richter, P., Smaragdakis, G., Feldmann, A., Chatzis, N., Boettger, J., & Willinger,
       W. (2014). Peering at peerings: On the role of IXP route servers. In
       Internet Measurement Conference (IMC).

RIPE. (2018). Routing information service (ris). https://www.ripe.net/
      analyse/internet-measurements/routing-information-service-ris.

RIPE. (2019). RIPE RIS.

RIPE NCC. (2016). RIPE Atlas.

Robusto, C. C. (1957). The cosine-haversine formula. The American Mathematical
      Monthly.

SamKnows. (2018). The internet measurement standard.
     https://www.samknows.com/.

Sanchez, M. A., Bustamante, F. E., Krishnamurthy, B., Willinger, W.,
      Smaragdakis, G., & Erman, J. (2014). Inter-domain traffic estimation for
      the outsider. In Internet Measurement Conference (IMC).

Sánchez, M. A., Otto, J. S., Bischof, Z. S., Choffnes, D. R., Bustamante, F. E.,
      Krishnamurthy, B., & Willinger, W. (2013). Dasu: Pushing experiments to
      the Internet’s edge. In NSDI.

Scheitle, Q., Gasser, O., Sattler, P., & Carle, G. (2017). Hloc: Hints-based
       geolocation leveraging multiple measurement frameworks. In Network Traffic
       Measurement and Analysis Conference.

Schlinker, B., Kim, H., Cui, T., Katz-Bassett, E., Madhyastha, H. V., Cunha, I., . . .
       Zeng, H. (2017). Engineering egress with edge fabric: Steering oceans of
       content to the world. In SIGCOMM.

Schulman, A., & Spring, N. (2011). Pingin’in the rain. In Internet Measurement
      Conference (IMC).

Shavitt, Y., & Shir, E. (2005). Dimes: Let the internet measure itself. SIGCOMM
       CCR.

Sherwood, R., Bender, A., & Spring, N. (2008). DisCarte: A Disjunctive Internet
      Cartographer. In SIGCOMM CCR.

Siganos, G., & Faloutsos, M. (2004). Analyzing BGP policies: Methodology and
      tool. In INFOCOM.

Singla, A., Chandrasekaran, B., Godfrey, P., & Maggs, B. (2014). The internet at
       the speed of light. In Proceedings of hot topics in networks.

Sodagar, I. (2011). The MPEG-dash standard for multimedia streaming over the
      internet. In IEEE MultiMedia.

Spring, N., Dontcheva, M., Rodrig, M., & Wetherall, D. (2004). How to resolve IP
       aliases (Tech. Rep.). Univ. Michigan, UW CSE.

Spring, N., Mahajan, R., & Wetherall, D. (2002). Measuring isp topologies with
       rocketfuel. SIGCOMM CCR.
Sundaresan, S., Burnett, S., Feamster, N., & De Donato, W. (2014). BISmark: A
      Testbed for Deploying Measurements and Applications in Broadband Access
      Networks. In Usenix annual technical conference.

Sundaresan, S., Feamster, N., & Teixeira, R. (2015). Measuring the Performance of
      User Traffic in Home Wireless Networks. In Passive and Active
      Measurement (PAM). ACM.

Tariq, M. M. B., Dhamdhere, A., Dovrolis, C., & Ammar, M. (2005). Poisson
       versus periodic path probing (or, does pasta matter?). In Internet
       Measurement Conference (IMC).

TeamCymru. (2008). IP to ASN mapping.
     https://www.team-cymru.com/IP-ASN-mapping.html.

Tokusashi, Y., Matsutani, H., & Zilberman, N. (2018). Lake: An energy efficient,
      low latency, accelerated key-value store. arXiv preprint arXiv:1805.11344 .

Torres, R., Finamore, A., Kim, J. R., Mellia, M., Munafò, M. M., & Rao, S. (2011).
       Dissecting video server selection strategies in the YouTube CDN. In
       International conference on distributed computing systems. IEEE.

Tozal, M. E., & Sarac, K. (2011). Palmtree: An ip alias resolution algorithm with
       linear probing complexity. Computer Communications.

Triukose, S., Wen, Z., & Rabinovich, M. (2011). Measuring a commercial content
       delivery network. In World Wide Web (WWW). ACM.

University of Oregon. (2018). University of oregon route views project.
      http://www.routeviews.org/routeviews/.

WikiLeaks. (2018). Amazon Atlas. https://wikileaks.org/amazon-atlas/.

Williams, M. (2016). Amazon’s central Ohio data centers now open.
       http://www.dispatch.com/content/stories/business/2016/10/18/
       amazon-data-centers-in-central-ohio-now-open.html.

WireGuard. (2019). WireGuard: fast, modern, secure VPN tunnel.

Wohlfart, F., Chatzis, N., Dabanoglu, C., Carle, G., & Willinger, W. (2018).
      Leveraging interconnections for performance: the serving infrastructure of a
      large CDN. In SIGCOMM.

Xia, J., & Gao, L. (2004). On the evaluation of AS relationship inferences [Internet
        reachability/traffic flow applications]. In GLOBECOM.


Yap, K.-K., Motiwala, M., Rahe, J., Padgett, S., Holliman, M., Baldus, G., . . .
      others (2017). Taking the edge off with espresso: Scale, reliability and
      programmability for global internet peering. In SIGCOMM.

Yeganeh, B., Durairajan, R., Rejaie, R., & Willinger, W. (2019). How cloud traffic
      goes hiding: A study of amazon’s peering fabric. In Internet Measurement
      Conference (IMC).

Yeganeh, B., Rejaie, R., & Willinger, W. (2017). A view from the edge: A stub-as
      perspective of traffic localization and its implications. In Network Traffic
      Measurement and Analysis Conference (TMA).

Zarchy, D., Dhamdhere, A., Dovrolis, C., & Schapira, M. (2018). Nash-peering: A
       new techno-economic framework for internet interconnections. In INFOCOM
       Computer Communications Workshops.

ZDNet. (2019). Top cloud providers 2019. https://tinyurl.com/y526vneg.

Zhang, M., Zhang, C., Pai, V. S., Peterson, L. L., & Wang, R. Y. (2004).
      Planetseer: Internet path failure monitoring and characterization in
      wide-area services. In OSDI.



