Authors Bahador Yeganeh
License CC-BY-4.0
MEASURING THE EVOLVING INTERNET IN THE CLOUD COMPUTING ERA: INFRASTRUCTURE, CONNECTIVITY, AND PERFORMANCE
by
BAHADOR YEGANEH
A DISSERTATION
Presented to the Department of Computer and Information Science and the Graduate School of the University of Oregon in partial fulfillment of the requirements for the degree of Doctor of Philosophy
December 2019

DISSERTATION APPROVAL PAGE
Student: Bahador Yeganeh
Title: Measuring the Evolving Internet in the Cloud Computing Era: Infrastructure, Connectivity, and Performance
This dissertation has been accepted and approved in partial fulfillment of the requirements for the Doctor of Philosophy degree in the Department of Computer and Information Science by:
Prof. Reza Rejaie, Chair
Prof. Ramakrishnan Durairajan, Co-Chair
Prof. Jun Li, Core Member
Prof. Allen Malony, Core Member
Dr. Walter Willinger, Core Member
Prof. David Levin, Institutional Representative
and Kate Mondloch, Interim Vice Provost and Dean of the Graduate School
Original approval signatures are on file with the University of Oregon Graduate School.
Degree awarded December 2019

© 2019 Bahador Yeganeh
This work is licensed under a Creative Commons Attribution 4.0 License.

DISSERTATION ABSTRACT
Bahador Yeganeh
Doctor of Philosophy
Department of Computer and Information Science
December 2019
Title: Measuring the Evolving Internet in the Cloud Computing Era: Infrastructure, Connectivity, and Performance

The advent of cloud computing as a means of offering virtualized computing and storage resources has radically transformed how modern enterprises run their business and has also fundamentally changed how today's large cloud providers operate. For example, as these large cloud providers offer an increasing number of ever-more bandwidth-hungry cloud services, they end up carrying a significant fraction of today's Internet traffic. In response, they have started to build out and operate their own private backbone networks and have expanded their service infrastructure by establishing a presence in a growing number of colocation facilities at the Internet's edge. As a result, more and more enterprises across the globe can directly connect (i.e., peer) with any of the large cloud providers so that much of the resulting traffic will traverse these providers' private backbones instead of being exchanged over the public Internet. Furthermore, to reap the benefits of the diversity of these cloud providers' service offerings, enterprises are rapidly adopting multi-cloud deployments in conjunction with multi-cloud strategies (i.e., end-to-end connectivity paths between multiple cloud providers). While prior studies have focused mainly on various topological and performance-related aspects of the Internet as a whole, little to no attention has been given to how these emerging cloud-based developments impact connectivity and performance in today's cloud traffic-dominated Internet. This dissertation presents the findings of an active measurement study of the cloud ecosystem of today's Internet. In particular, the study explores the connectivity options available to modern enterprises and examines the performance of the cloud traffic that utilizes the corresponding end-to-end paths.
The study's main contributions include (i) studying the locality of traffic for major content providers (including cloud providers) from the edge of the network, (ii) capturing and characterizing the peering fabric of a major cloud provider, (iii) characterizing the performance of different multi-cloud strategies and associated end-to-end paths, and (iv) designing a cloud measurement platform and decision support framework for the construction of optimal multi-cloud overlays. This dissertation contains previously published co-authored material.

CURRICULUM VITAE
NAME OF AUTHOR: Bahador Yeganeh
GRADUATE AND UNDERGRADUATE SCHOOLS ATTENDED:
University of Oregon, Eugene, OR, USA
Isfahan University of Technology, Isfahan, Iran
DEGREES AWARDED:
Doctor of Philosophy, Computer and Information Science, 2019, University of Oregon
Bachelor of Science, Computer Engineering, 2013, Isfahan University of Technology
AREAS OF SPECIAL INTEREST:
Internet Measurement
Internet Topology
Cloud Computing
Network Overlays
PROFESSIONAL EXPERIENCE:
Graduate Research Assistant, University of Oregon, Eugene, OR, USA, 2013-2019
Software Engineer, PANA Co, Isfahan, Iran, 2010-2013
Summer Intern, InfoProSys, Isfahan, Iran, 2008
GRANTS, AWARDS AND HONORS:
Internet Measurement Conference (IMC) Travel Grant, 2018
Gurdeep Pall Scholarship in Computer & Information Science, University of Oregon, 2018
Phillip Seeley Scholarship in Computer & Information Science, University of Oregon, 2017
J. Hubbard Scholarship in Computer & Information Science, University of Oregon, 2014
PUBLICATIONS:
Bahador Yeganeh, Ramakrishnan Durairajan, Reza Rejaie, & Walter Willinger (2020). Tondbaz: A Measurement-Informed Multi-cloud Overlay Service. SIGCOMM - In Preparation
Bahador Yeganeh, Ramakrishnan Durairajan, Reza Rejaie, & Walter Willinger (2020). A First Comparative Characterization of Multi-cloud Connectivity in Today's Internet. Passive and Active Measurement Conference (PAM) - In Submission
Bahador Yeganeh, Ramakrishnan Durairajan, Reza Rejaie, & Walter Willinger (2019). How Cloud Traffic Goes Hiding: A Study of Amazon's Peering Fabric. Internet Measurement Conference (IMC)
Reza Motamedi, Bahador Yeganeh, Reza Rejaie, Walter Willinger, Balakrishnan Chandrasekaran, & Bruce Maggs (2019). On Mapping the Interconnections in Today's Internet. Transactions on Networking (TON)
Bahador Yeganeh, Reza Rejaie, & Walter Willinger (2017). A View From the Edge: A Stub-AS Perspective of Traffic Localization and its Implications. Network Traffic Measurement and Analysis Conference (TMA)

ACKNOWLEDGEMENTS
I would like to thank my parents and sister for their sacrifices, their endless support throughout all stages of my life, and for encouraging me to always pursue my goals and dreams even if it meant that I would be living thousands of miles away from them. I thank my advisor Prof. Reza Rejaie for providing me with the opportunity to pursue a doctoral degree at UO. I am grateful to him as well as my co-advisor Prof. Ramakrishnan Durairajan and collaborator Dr. Walter Willinger for their guidance and feedback in every step of my PhD, the numerous late nights they stayed awake to help me meet deadlines, and for instilling perseverance in me. Completing this dissertation wouldn't have been possible without each one of you. I am thankful to my committee members Prof. Jun Li, Prof. Allen Malony, and Prof.
David Levin for their valuable input and guidance on shaping the direction of my dissertation and for always being available even on the shortest notice. Lastly, I am grateful for all the wonderful friendship relations that I have formed through the past years. These friends have been akin to a second family and their support and help through the highs and lows of my life has been invaluable to me. viii TABLE OF CONTENTS Chapter Page I. INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1. Challenges in Topology Discovery & Internet Measurement . . . . . . 3 1.2. Dissertation Scope & Contributions . . . . . . . . . . . . . . . . . . . 4 1.2.1. Locality of Traffic Footprint . . . . . . . . . . . . . . . . . . . 5 1.2.2. Discovery of Cloud Peering Topology . . . . . . . . . . . . . . 5 1.2.3. Cloud Connectivity Performance . . . . . . . . . . . . . . . . 6 1.2.4. Optimal Cloud Overlays . . . . . . . . . . . . . . . . . . . . . 7 1.3. Dissertation Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 1.3.1. Navigating the Chapters . . . . . . . . . . . . . . . . . . . . . 8 II. RELATED WORK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 2.1. Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.2. Tools & Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 2.2.1. Measurement Tools & Platforms . . . . . . . . . . . . . . . . . 14 2.2.1.1. Path Discovery . . . . . . . . . . . . . . . . . . . . . 14 2.2.1.2. Alias Resolution . . . . . . . . . . . . . . . . . . . . 17 2.2.1.3. Interface Name Decoding . . . . . . . . . . . . . . . 19 2.2.2. Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 2.2.2.1. BGP Feeds & Route Policies . . . . . . . . . . . . . 20 2.2.2.2. Colocation Facility Information . . . . . . . . . . . . 21 2.2.2.3. IXP Information . . . . . . . . . . . . . . . . . . . . 22 2.2.2.4. IP Geolocation . . . . . . . . . . . . . . . . . . . . . 22 ix Chapter Page 2.3. Capturing Network Topology . . . . . . . . . . . . . . . . . . . . . . 23 2.3.1. AS-Level Topology . . . . . . . . . . . . . . . . . . . . . . . . 25 2.3.1.1. Graph Generation & Modeling . . . . . . . . . . . . 27 2.3.1.2. Topology Incompleteness . . . . . . . . . . . . . . . . 28 2.3.1.3. IXP Peerings . . . . . . . . . . . . . . . . . . . . . . 32 2.3.2. Router-Level Topology . . . . . . . . . . . . . . . . . . . . . . 37 2.3.2.1. Peering Inference . . . . . . . . . . . . . . . . . . . . 38 2.3.2.2. Geo Locating Routers & Remote Peering . . . . . . . 44 2.3.3. PoP-Level Topology . . . . . . . . . . . . . . . . . . . . . . . 49 2.3.4. Physical-Level Topology . . . . . . . . . . . . . . . . . . . . . 56 2.4. Implications & Applications of Network Topology . . . . . . . . . . . 60 2.4.1. Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 2.4.1.1. AS-Level Topology . . . . . . . . . . . . . . . . . . . 62 2.4.1.2. Router-Level Topology . . . . . . . . . . . . . . . . . 67 2.4.1.3. Physical-Level Topology . . . . . . . . . . . . . . . . 71 2.4.2. Resiliency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 2.4.2.1. AS-Level Topology . . . . . . . . . . . . . . . . . . . 74 2.4.2.2. Router-Level Topology . . . . . . . . . . . . . . . . . 76 2.4.2.3. Physical-Level Topology . . . . . . . . . . . . . . . . 78 2.4.3. AS Relationship Inference . . . . . . . . . . . . . . . . . . . . 80 2.4.3.1. AS-Level . . . . . . . . . . . . . . . . . . . . . . . . 80 2.4.3.2. PoP-Level . . . . . . . . . 
. . . . . . . . . . . . . . . 82 III. LOCALITY OF TRAFFIC . . . . . . . . . . . . . . . . . . . . . . . . . . 84 3.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84 3.2. Data Collection for a Stub-AS: UOnet . . . . . . . . . . . . . . . . . 87 x Chapter Page 3.3. Identifying Major Content Providers . . . . . . . . . . . . . . . . . . 89 3.4. Traffic Locality for Content Providers . . . . . . . . . . . . . . . . . . 95 3.5. Traffic From Guest Servers . . . . . . . . . . . . . . . . . . . . . . . . 98 3.5.1. Detecting Guest Servers . . . . . . . . . . . . . . . . . . . . . 99 3.5.2. Relative Locality of Guest Servers . . . . . . . . . . . . . . . . 102 3.6. Implications of Traffic Locality . . . . . . . . . . . . . . . . . . . . . . 103 3.7. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 IV. CLOUD PEERING ECOSYSTEM . . . . . . . . . . . . . . . . . . . . . 110 4.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110 4.2. Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112 4.3. Data Collection & Processing . . . . . . . . . . . . . . . . . . . . . . 115 4.4. Inferring Interconnections . . . . . . . . . . . . . . . . . . . . . . . . 117 4.4.1. Basic Inference Strategy . . . . . . . . . . . . . . . . . . . . . 117 4.4.2. Second Round of Probing to Expand Coverage . . . . . . . . . 119 4.5. Verifying Interconnections . . . . . . . . . . . . . . . . . . . . . . . . 120 4.5.1. Checking Against Heuristics . . . . . . . . . . . . . . . . . . . 120 4.5.2. Verifying Against Alias Sets . . . . . . . . . . . . . . . . . . . 122 4.6. Pinning Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 4.6.1. Methodology for Pinning . . . . . . . . . . . . . . . . . . . . . 123 4.6.2. Evaluation of Pinning . . . . . . . . . . . . . . . . . . . . . . 130 4.7. Amazon’s Peering Fabric . . . . . . . . . . . . . . . . . . . . . . . . . 131 4.7.1. Detecting Virtual Interconnections . . . . . . . . . . . . . . . 131 4.7.2. Grouping Amazon’s Peerings . . . . . . . . . . . . . . . . . . . 133 4.7.3. Inferring the Purpose of Peerings . . . . . . . . . . . . . . . . 137 4.7.4. Characterizing Amazon’s Connectivity Graph . . . . . . . . . 142 xi Chapter Page 4.8. Inferring Peering with bdrmap . . . . . . . . . . . . . . . . . . . . . . 144 4.9. Limitations of Our Study . . . . . . . . . . . . . . . . . . . . . . . . . 146 4.10. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148 V. CLOUD CONNECTIVITY PERFORMANCE . . . . . . . . . . . . . . . 150 5.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150 5.2. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150 5.3. Measurement Methodology . . . . . . . . . . . . . . . . . . . . . . . . 155 5.3.1. Deployment Strategy . . . . . . . . . . . . . . . . . . . . . . . 155 5.3.2. Measurement Scenario & Cloud Providers . . . . . . . . . . . 157 5.3.3. Data Collection & Performance Metrics . . . . . . . . . . . . . 159 5.3.4. Representation of Results . . . . . . . . . . . . . . . . . . . . 161 5.3.5. Ethical and Legal Considerations . . . . . . . . . . . . . . . . 161 5.4. Characteristics of C2C routes . . . . . . . . . . . . . . . . . . . . . . 162 5.4.1. Latency Characteristics . . . . . . . . . . . . . . . . . . . . . . 162 5.4.2. Why do CPP routes have better latency than TPP routes? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164 5.4.3. 
Throughput Characteristics . . . . . . . . . . . . . . . . . . . 168 5.4.4. Why do CPP routes have better throughput than TPP routes? . . . . . . . . . . . . . . . . . . . . . . . . . 170 5.4.5. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173 5.5. Characteristics of E2C routes . . . . . . . . . . . . . . . . . . . . . . 173 5.5.1. Latency Characteristics . . . . . . . . . . . . . . . . . . . . . . 173 5.5.2. Why do TPP routes offer better latency than BEP routes? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173 5.5.3. Throughput Characteristics . . . . . . . . . . . . . . . . . . . 174 xii Chapter Page 5.5.4. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175 5.6. Discussion and Future Work . . . . . . . . . . . . . . . . . . . . . . . 175 5.7. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178 VI. OPTIMAL CLOUD OVERLAYS . . . . . . . . . . . . . . . . . . . . . . 179 6.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179 6.2. Tondbaz Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182 6.2.1. Measurement Platform . . . . . . . . . . . . . . . . . . . . . . 182 6.2.1.1. Measurement Agent . . . . . . . . . . . . . . . . . . 183 6.2.1.2. Centralized Controller . . . . . . . . . . . . . . . . . 184 6.2.2. Data Collector . . . . . . . . . . . . . . . . . . . . . . . . . . 185 6.2.3. Optimization Framework . . . . . . . . . . . . . . . . . . . . . 185 6.3. A Case for Multi-cloud Overlays . . . . . . . . . . . . . . . . . . . . . 188 6.3.1. Measurement Setting & Data Collection . . . . . . . . . . . . 188 6.3.2. Are Cloud Backbones Optimal? . . . . . . . . . . . . . . . . . 189 6.3.2.1. Path Characteristics of CP Backbones . . . . . . . . 189 6.3.2.2. Performance Characteristics of CP Backbones . . . . 190 6.3.2.3. Latency Characteristics of CP Backbones . . . . . . 191 6.3.3. Are Multi-Cloud Paths Better Than Single Cloud Paths? . . . . . . . . . . . . . . . . . . . . . . . . . . . 192 6.3.3.1. Overall Latency Improvements . . . . . . . . . . . . 192 6.3.3.2. Intra-CP Latency Improvements . . . . . . . . . . . 194 6.3.3.3. Inter-CP Latency Improvements . . . . . . . . . . . 195 6.3.4. Are there Challenges in Creating Multi-Cloud Overlays? . . . 196 6.3.4.1. Traffic Costs of CP Backbones . . . . . . . . . . . . 196 6.3.5. Cost Penalty for Multi-Cloud Overlays . . . . . . . . . . . . . 199 xiii Chapter Page 6.3.6. Further Optimization Through IXPs . . . . . . . . . . . . . . 201 6.4. Evaluation of Tondbaz . . . . . . . . . . . . . . . . . . . . . . . . . . 202 6.4.1. Case Studies of Optimal Paths . . . . . . . . . . . . . . . . . . 202 6.4.2. Deployment of Overlays . . . . . . . . . . . . . . . . . . . . . 205 6.4.2.1. Empirical vs Estimated Overlay Latencies . . . . . . 207 6.5. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209 VII.CONCLUSIONS & FUTURE WORK . . . . . . . . . . . . . . . . . . . . 211 7.1. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211 7.2. Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213 REFERENCES CITED . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218 xiv LIST OF FIGURES Figure Page 1. Abstract representation for topology of ASA , ASB , and ASC in red, blue, and green accordingly. 
ASA and ASB establish a private interconnection inside colo1 at their LA PoP while peering with each other as well as ASC inside colo2 at their NY PoP facilitated by an IXP’s switching fabric. . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2. Illustration of inferring and incorrect link (b − e) by traceroute due to load balanced paths. Physical links and traversed paths are shown with black and red lines accrodingly. The T T L = 2 probe traverses the top path and expires at node b while the T T L = 3 probe traverses the bottom path and expires at node e. This succession of probes causes traceroute to infer a non-existent link (b − e). . . . . . . . . . . . . . . . . . . . . . . . . 17 3. Illustration of an IXP switch and route server along with 4 tenant networks ASa , ASb , ASc , and ASd . ASa establishes a bi-lateral peering with ASd (solid red line) as well as multi-lateral peerings with ASb and ASc (dashed red lines) facilitated by the route server within the IXP. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 xv Figure Page 4. Illustration of address sharing for establishing an inter-AS link between border routers. Although the traceroute paths (dashed lines) are identical the inferred ownership of router interfaces and the placement of the inter-AS link differs for these two possibilities. . . . 38 5. Fiber optic backbone map for CenturyLink’s network in continental US. Each node represents a PoP for CenturyLink while links between these PoPs are representative of the fiber optic conduits connecting these PoPs together. Image courtesy of CenturyLink. . . . . . . . . . 55 6. The volume of delivered traffic from individual top content providers to UOnet along with the CDF of aggregate fraction of traffic by top 21 content providers in the 10/04/16 snapshot. . . . . . . . . . . . . . . . . . . . 90 7. The prevalence and distribution of rank for any content provider that has appeared among the top content providers in at least one daily snapshot. . . . . . . . . . . . 92 8. Distribution of the number of top IPs across different snapshots in addition to total number of unique top IP addresses (blue line) and the total number of unique IPs across all snapshots (red line) for each target content provider. . . . . . . . . . . . . . . . . . . . . . . . . . . 93 xvi Figure Page 9. Radar plots showing the aggregate view of locality based on RTT of delivered traffic in terms of bytes (left plot) and flows (right plot) to UOnet in a daily snapshot (10/04/2016). . . . . . . . . . . . . . . . . . . . . . . . . . . 94 10. Two measures of traffic locality, from top to bottom, Summary distribution of NWL and the RTT of the closest servers per content provider (or minRTT). . . . . . . . . . . . 98 11. Locality (based on RTT in ms) of delivered traffic (bytes, left plot; flows, right plot) for Akamai-owned servers as well as Akamai guest servers residing within three target ASes for snapshot 2016-10-04. . . . . . . . . . . . . . . . 102 12. Summary distribution of average throughput for delivered flows from individual target content providers towards UOnet users across all of our snapshots. . . . . . . 104 13. Maximum Achievable Throughput (MAT) vs MinRTT for all content providers. The curves show the change in the estimated TCP throughput as a function of RTT for different loss rates. . . . . . . . . . . . . . . . . . . . . . . . 106 14. Average loss rate of closest servers per target content provider measured over 24 hours using ping probes with 1 second intervals. 
For each content provider we choose at most 10 of the closest IP addresses. . . . . . . . . . . . . . 108 xvii Figure Page 15. Overview of Amazon’s peering fabric. Native routers of Amazon & Microsoft (orange & blue) establishing private interconnections (AS3 - yellow router), public peering through IXP switch (AS4 - red router), and virtual private interconnections through cloud exchange switch (AS1 , AS2 , and AS5 - green routers) with other networks. Remote peering (AS5 ) as well as connectivity to non-ASN businesses through layer-2 tunnels (dashed lines) happens through connectivity partners. . . . . 113 16. Illustration of a hybrid interface (a) that has both Amazon and client-owned interfaces as next hop. . . . . . . . . . . . 121 17. (a) Distribution of min-RTT for ABIs from the closest Amazon region, and (b) Distribution of min-RTT difference between ABI and CBI for individual peering links. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125 18. Distribution of the ratio of two lowest min-RTT from different Amazon regions to individual unpinned border interfaces. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129 xviii Figure Page 19. Key features of the six groups of Amazon’s peerings (presented in Table 7) showing (from top to bottom): the number of /24 prefixes within the customer cone of peering AS, the number of probed /24 prefixes that are reachable through the CBIs of associated peerings of an AS, the number of ABIs and CBIs of associated of an AS, the difference in RTT of both ends of associated peerings of an AS, and the number of metro areas which the CBIs of each peering AS have been pinned to. . . . . . . . . . . . . . . . . . . . . . . . . . . . 139 20. Distribution of ABIs (log scale) and CBIs degree in left and right figures accordingly. . . . . . . . . . . . . . . . . . . . . 143 21. Three different multi-cloud connectivity options. . . . . . . . . . . . . 152 22. Our measurement setup showing the locations of our VMs from AWS, GCP and Azure. A third-party provider’s CRs and line-of-sight links for TPP, BEP, and CPP are also shown. . . . . . . . . . . . . . . . . . . . . . . . . . 158 xix Figure Page 23. Rows from top to bottom represent the distribution of RTT (using letter-value plots) between AWS, GCP, and Azure’s network as the source CP and various CP regions for intra (inter) region paths in left (right) columns. CPP and TPP routes are depicted in blue and orange, respectively. The first two characters of the X axis labels encode the source CP region with the remaining characters depicting the destination CP and region. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163 24. Comparison of median RTT values (in ms) for CPP and TPP routes between different pairs. . . . . . . . . . . . . . . . . 164 25. (a) Distribution for number of ORG hops observed on intra-cloud, inter-cloud, and cloud to LG paths. (b) Distribution of IP (AS/ORG) hop lengths for all paths in left (right) plot. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165 26. Distribution of RTT between the source CP and the peering hop. From left to right plots represent AWS, GCP, and Azure as the source CP. Each distribution is split based on intra (inter) region values into the left/blue (right/orange) halves, respectively. . . . . . . . . . . . . . . 167 xx Figure Page 27. 
Rows from top to bottom in the letter-value plots represent the distribution of throughput between AWS’, GCP’s, and Azure’s network as the source CP and various CP regions for intra- (inter-) region paths in left (right) columns. CPP and TPP routes are depicted in blue and orange respectively. . . . . . . . . . . . . . . 169 28. Upper bound for TCP throughput using the formula of Mathis et al. Mathis, Semke, Mahdavi, and Ott (1997) with an MSS of 1460 bytes and various latency (X axis) and loss-rates (log-scale Y axis) values. . . . . . . . . . . . . 170 29. Rows from top to bottom in the letter-value plots represent the distribution of loss-rate between AWS, GCP, and Azure as the source CP and various CP regions for intra- (inter-) region paths in left (right) columns. CPP and TPP routes are depicted using blue and orange respectively. . . . . . . . . . . . . . . . . . . . . . . . 172 30. (a) Distribution of latency for E2C paths between our server in AZ and CP instances in California through TPP and BEP routes. Outliers on the Y-axis have been deliberately cut-off to increase the readability of distributions. (b) Distribution of RTT on the inferred peering hop for E2C paths sourced from CP instances in California. (c) Distribution of throughput for E2C paths between our server in AZ and CP instances in California through TPP and BEP routes. . . . . . . . . . . . . . . . . 174 xxi Figure Page 31. Global regions for AWS, Azure, and GCP. . . . . . . . . . . . . . . . 180 32. Overview of components for the measurement system including the centralized controller, measurement agents, and data-store. . . . . . . . . . . . . . . . . . . . . . . . . . . 183 33. Distribution of latency inflation between network latency and RTT approximation using speed of light constraints for all regions of each CP. . . . . . . . . . . . . . . . . . . 191 34. Distribution of median RTT and coefficient of variation for latency measurements between all VM pairs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192 35. Distribution for difference in latency between forward and reverse paths for unique paths. . . . . . . . . . . . . . . . . . . . 193 36. Distribution for RTT reduction ratio through all, intra-CP, and inter-CP optimal paths. . . . . . . . . . . . . . . . . . 193 37. Distribution for the number of relay hops along optimal paths (left) and the distribution of latency reduction percentage for optimal paths grouped based on the number of relay hops (right). . . . . . . . . . . . . . . . . . . . 194 38. Distribution of latency reduction percentage for intra- CP paths of each CP, divided based on the ownership of the relay node. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195 39. Distribution of latency reduction ratio for inter-CP paths of each CP, divided based on the ownership of the relay nodes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196 xxii Figure Page 40. Cost of transmitting traffic sourced from different groupings of AWS regions. Dashed (solid) lines present inter-CP (intra-CP) traffic cost. . . . . . . . . . . . . . . . . . 197 41. Cost of transmitting traffic sourced from different groupings of Azure regions. . . . . . . . . . . . . . . . . . . . . . . . . 198 42. Cost of transmitting traffic sourced from different groupings of GCP regions. Solid, dashed, and dotted lines represent cost of traffic destined to China (excluding Hong Kong), Australia, and all other global regions accordingly. . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . 199 43. Distribution of cost penalty within different latency reduction ratio bins for intra-CP and inter-CP paths. . . . . . . . . . 200 44. Distribution for RTT reduction percentage through CP, IXP, and CP+IXP relay paths. . . . . . . . . . . . . . . . . . . . 203 45. Overlay network composed of 2 nodes (V M1 and V M3 ) and 1 relay node (V M2 ). Forwarding rules are depicted below each node. . . . . . . . . . . . . . . . . . . . . . . . . 206 xxiii LIST OF TABLES Table Page 1. Topics covered in each chapter of the dissertation. . . . . . . . . . . . 9 2. Main features of the selected daily snapshots of our UOnet Netflow data. . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 3. Number of unique ABIs and CBIs along with their fraction with various meta data, prior (rows 2-3) and after (rows 4-5) /24 expansion probing. . . . . . . . . . . . . . . . . . 119 4. Number of candidate ABIs (and corresponding CBIs) that are confirmed by individual (first row) and cumulative (second row) heuristics. . . . . . . . . . . . . . . . . . . . 122 5. The exclusive and cumulative number of anchor interfaces by each type of evidence and pinned interfaces by our co-presence rules. . . . . . . . . . . . . . . . . . . . 128 6. Number (and percentage) of Amazon’s VPIs. These are CBIs that are also observed by probes originated from Microsoft, Google, IBM, and Oracle’s cloud networks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131 7. Breakdown of all Amazon peerings based on their key attributes. . . . 134 8. Hybrid peering groups along with the number of unique ASes for each group. . . . . . . . . . . . . . . . . . . . . . . . 136 xxiv Table Page 9. List of selected overlay endpoints (first two columns) along the number of relay nodes for each overlay presented in the third column. The default RTT, estimated overlay RTT, and empirical RTT are presented in the last three columns respectively. . . . . . . . . . . . . 208 10. List of selected overlay endpoints (first two columns) along with the optimal relay nodes (third column). . . . . . . . . . . 208 xxv CHAPTER I INTRODUCTION The Internet since its inception as a network for interconnecting a handful of academic and military networks has gone through constant evolution throughout the years and has become a large scale distributed network spanning the globe that is intertwined with every aspect of our daily lives. Given its importance, we need to study its health, vulnerability, and connectivity. This is only made possible through constant network measurements. Researchers have conducted measurements in order to gain a better understanding of traffic routing through this network, its connectivity structure as well as its performance. Our interest and ability to conduct network measurements can vary in both scopes with respect to the number or size of networks under study as well as the resolution with regards to focusing on networks as a single unit or paying attention to finer network elements such as routers. The advent of cloud computing can be considered among the most recent and notable changes in the Internet. Cloud providers (CPs) offer an abundance of compute and storage resources in centralized regions in an on-demand basis. Reachability to these remote resources has been made possible via the Internet. Conversely, this shift in computing paradigm has resulted in cloud providers to become one of the main end-points of traffic within today’s Internet. 
These cloud service offerings have fundamentally changed how business is conducted in all segments of the private and public sectors. This, in turn, has transformed the way these companies connect to major cloud service providers to utilize these services. In particular, many companies prefer to bypass the public Internet and directly connect to major cloud service providers at a close-by colocation (or colo) facility to experience better performance when using these cloud services. In response to these demands, some of the major colo facilities have started to deploy and operate new switching infrastructure called cloud exchanges CoreSite (2018); Demchenko et al. (2013). Importantly, in conjunction with this new infrastructure, these colo providers have also introduced a new interconnection service offering called "virtual private interconnection" (VPI) Amazon (2018a); Google (2018a); Microsoft (2018a). By purchasing a single port on the cloud exchange switching fabric in a given facility, VPIs enable enterprises that are natively deployed in that facility to establish direct peering to any number of cloud service providers that are present on that exchange. Furthermore, new Internet players have emerged in the form of third-party private connectivity providers (e.g., DataPipe and HopOne, among others Amazon (2018c); Google (2018b); Microsoft (2018c)). These entities offer direct, secure, private, layer-3 connectivity between CPs (henceforth referred to as third-party private (TPP) connectivity) and extend the reach of peering points towards CPs to non-native colo facilities across a wider geographic footprint. TPP routes bypass the public Internet at cloud exchanges CoreSite (2018); Demchenko et al. (2013) and offer additional benefits to users (e.g., enterprise networks can connect to CPs without owning an Autonomous System Number (ASN) or physical infrastructure). The implications of this transformation for the Internet's interconnection ecosystem have been profound. First, the on-demand nature of VPIs introduces a degree of dynamism into the Internet's interconnection fabric that has been missing in the past, where setting up traditional interconnections of the public or private peering type took days or weeks. Second, once the growing volume of an enterprise's traffic enters an existing VPI to a cloud provider, it is handled entirely by that cloud provider's private infrastructure (i.e., the cloud provider's private backbone that interconnects its own datacenters) and completely bypasses the public Internet. The extensive means of connectivity towards cloud providers, coupled with the competitive marketplace of multiple CPs, has led enterprises to adopt a multi-cloud strategy: instead of consuming compute resources as a utility from a single CP, enterprise networks can pick and choose services from multiple participating CPs to better satisfy their specific requirements (e.g., rent storage from one CP and compute resources from another) and establish end-to-end connectivity between those services and their on-premises server(s) at the same or different locations. In the process, they also avoid vendor lock-in, enhance the reliability and performance of the selected services, and can reduce the operational cost of deployments. Indeed, according to an industry report from late 2018 Krishna, Cowley, Singh, and Kesterson-Townes (2018), 85% of enterprises have already adopted multi-cloud strategies, and that number is expected to rise to 98% by 2021.
These disparate resources from various CP regions are connected together either via TPP networks, via the cloud providers' private (CPP) backbones, or simply via the best-effort public Internet (BEP). The aforementioned market trends collectively showcase the implications of the cloud computing paradigm for the Internet's structure and topology and highlight the need to focus on these emergent technologies in order to have a correct understanding of the Internet's structure and operation.

1.1 Challenges in Topology Discovery & Internet Measurement
The topology of the Internet has been a key enabler for studying the routing of traffic in addition to gaining a better understanding of Internet performance and resiliency. Measuring the Internet in general, and capturing Internet topology in particular, is challenging due to many factors, namely (i) scale: the vast scale of the Internet as a network spanning the globe limits our ability to fully capture its structure; (ii) visibility: our view of the Internet is constrained to the perspective that we are able to glean from the limited number of vantage points from which we can observe it; (iii) dynamics: the Internet is an ever-evolving entity under constant structural change, and the existence of redundant routes, backup links, and load-balanced paths further limits our ability to fully capture the current state of its topology; (iv) tools: researchers have relied on tools which were originally designed for troubleshooting purposes, and the Internet's protocol stack lacks any inherent mechanism for revealing topology; and (v) intellectual property: many of the participating entities within the Internet lack incentives for sharing or disclosing data pertaining to their internal structure, as these data are often key to their competitive edge.

1.2 Dissertation Scope & Contributions
In this dissertation, we study and assess the impact of the wide adoption of CPs on today's Internet traffic and topology. In a broad sense, this dissertation can be categorized into four main parts, namely (i) studying the locality of traffic for major content providers (including CPs) from the edge of the network, (ii) presenting methodologies for capturing the topology surrounding cloud providers, with a special focus on VPIs, which have been overlooked up to this point, (iii) characterizing and evaluating the performance of various connectivity options towards CPs, and (iv) designing and presenting a measurement platform to support the measurement of cloud environments in addition to a decision support framework for the optimal utilization of cloud paths. The following presents an overview of the main contributions of this dissertation.

1.2.1 Locality of Traffic Footprint. Serving user requests from nearby caches or servers has been a powerful technique for localizing Internet traffic with the intent of providing lower delay and higher throughput to end users while also lowering the cost for network operators. This basic concept has led to the deployment of different types of infrastructures of varying degrees of complexity that large CDNs, CPs, ISPs, and content providers operate to localize their user traffic. This work assesses the nature and implications of traffic localization as experienced by end-users at an actual stub-AS. We report on the localization of traffic for the stub-AS UOnet (AS3582), a Research & Education network operated by the University of Oregon.
Based on a complete flow-level view of the delivered traffic from the Internet to UOnet, we characterize the stub-AS's traffic footprint (i.e., a detailed assessment of the locality of the delivered traffic by all major content providers), examine how effectively individual content providers utilize their built-out infrastructures for localizing their delivered traffic to UOnet, and investigate the impact of traffic localization on the throughput perceived by end-users served by UOnet. Our empirical findings offer valuable insights into important practical aspects of content delivery to real-world stub-ASes such as UOnet.

1.2.2 Discovery of Cloud Peering Topology. This work's main contribution is a third-party, cloud-centric measurement study aimed at discovering and characterizing the unique peerings (along with their types) of Amazon, the largest cloud service provider in the US and worldwide. Each peering typically consists of one or multiple (unique) interconnections between Amazon and a neighboring Autonomous System (AS) that are typically established at different colocation facilities around the globe. Our study only utilizes publicly available information and data (i.e., no Amazon-proprietary data is used) and is therefore also applicable for discovering the peerings of other large cloud providers. We describe our technique for inferring peerings towards Amazon and pay special attention to inferring the VPIs associated with this largest cloud provider. We also present and evaluate a new method for pinning (i.e., geo-locating) each end of the inferred interconnections or peering links. Our study provides a first look at Amazon's peering fabric. In particular, by grouping Amazon's peerings based on their key features, we illustrate the specific role that each group plays in how Amazon peers with other networks. Overall, our analysis of Amazon's peering fabric highlights how (e.g., using virtual and non-BGP peerings) and where (e.g., at which metro) Amazon's cloud traffic "goes hiding", that is, bypasses the public Internet. In particular, we show that as large cloud providers such as Amazon aggressively pursue new connect locations closer to the Internet's edge, VPIs are an attractive interconnection option as they (i) create shortcuts between enterprises at the edge of the network and the large cloud providers (i.e., further contributing to the flattening of the Internet) and (ii) ensure that cloud-related traffic is primarily carried over the large cloud providers' private backbones (i.e., not exposed to the unpredictability of the best-effort public Internet).

1.2.3 Cloud Connectivity Performance. This work aims to empirically examine the different types of multi-cloud connectivity options that are available in today's Internet and investigate their performance characteristics using non-proprietary, cloud-centric, active measurements. In the process, we are also interested in attributing the observed characteristics to aspects related to connectivity, routing strategy, or the presence of any performance bottlenecks. To study multi-cloud connectivity from a cloud-to-cloud (C2C) perspective, we deploy and interconnect VMs hosted within and across two different geographic regions or availability zones (i.e., CA and VA) of three large cloud providers (i.e., Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure) using the TPP, CPP, and BEP options, respectively.
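To make the C2C measurement matrix described above concrete, the following is a minimal, purely illustrative Python sketch of how one might enumerate the source/destination VM pairs and connectivity options; the provider and region labels, the option names, and the measure_rtt helper are hypothetical placeholders, not part of the dissertation's actual tooling.

```python
from itertools import product

# Illustrative inventory: three cloud providers, two regions each
# (assumed labels mirroring the CA and VA deployments described above).
PROVIDERS = ["AWS", "GCP", "Azure"]
REGIONS = ["CA", "VA"]
OPTIONS = ["CPP", "TPP", "BEP"]  # private backbone, third-party, best-effort Internet


def measure_rtt(src, dst, option):
    """Placeholder for an RTT probe (e.g., repeated pings) over one connectivity option."""
    raise NotImplementedError  # hypothetical helper, not real tooling


def c2c_pairs():
    """Yield every ordered cloud-to-cloud VM pair together with a connectivity option."""
    vms = list(product(PROVIDERS, REGIONS))
    for src, dst in product(vms, vms):
        if src == dst:
            continue  # skip measuring a VM against itself
        for option in OPTIONS:
            yield src, dst, option


if __name__ == "__main__":
    for src, dst, option in c2c_pairs():
        print(f"{src} -> {dst} via {option}")
```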
Using this experimental setup, we first compare the stability and/or variability in performance across the three connectivity options using metrics such as delay, throughput, and loss rate over time. We find that CPP routes exhibit lower latency and are more stable when compared to BEP and TPP routes. CPP routes also have higher throughput and exhibit less variation compared to the other two options. In our attempt to explain the subpar performance of TPP routes, we find that the inconsistencies in performance characteristics are caused by several factors, including border routers, queuing delays, and the higher loss rates of TPP routes. Moreover, we attribute the CPP routes' overall superior performance to the fact that each of the CPs has a private optical backbone, that there exists rich inter-CP connectivity, and that the CPs' traffic always bypasses (i.e., is invisible to) BEP transits.

1.2.4 Optimal Cloud Overlays. This work focuses on the design of a measurement platform for multi-cloud environments aimed at gaining a better understanding of the connectivity and performance characteristics of inter-cloud paths. We demonstrate the applicability of this platform by deploying it on all available regions of the top three CPs (i.e., Amazon, Microsoft, and Google) and measuring the latency among all regions. Furthermore, we capture the traffic cost models of each CP based on publicly published resources. The measured latencies and cost models are utilized by our optimal overlay construction framework, which is capable of constructing overlay networks composed of network paths within the backbones of CP networks. These overlays satisfy the deployment requirements of an enterprise in terms of target regions and overall traffic budget. Overall, our results demonstrate, first, that CP networks are tightly interconnected with each other. Second, multi-cloud paths exhibit higher latency reductions than single-cloud paths; e.g., 67% of all paths, 54% of all intra-CP paths, and 74% of all inter-CP paths experience an improvement in their latencies. Third, although traffic costs vary from location to location and across CPs, the costs are not prohibitively high.
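The core idea of relaying traffic through another cloud region can be expressed compactly. The sketch below is illustrative only: the RTT values are made up, the region names are hypothetical, and restricting the search to a single relay is a simplification of the dissertation's optimization framework, which also accounts for traffic costs and multi-hop relays.

```python
def best_one_relay_path(rtt, src, dst):
    """Return (latency, relay) for the best single-relay detour, or the direct path."""
    best = (rtt[src][dst], None)                    # default: direct path, no relay
    for relay in rtt:
        if relay in (src, dst):
            continue
        detour = rtt[src][relay] + rtt[relay][dst]  # additive latency approximation
        if detour < best[0]:
            best = (detour, relay)
    return best


if __name__ == "__main__":
    # Hypothetical median RTTs (ms) between three cloud regions.
    rtt = {
        "aws-ca":   {"aws-ca": 0,   "gcp-va": 70, "azure-eu": 150},
        "gcp-va":   {"aws-ca": 70,  "gcp-va": 0,  "azure-eu": 75},
        "azure-eu": {"aws-ca": 150, "gcp-va": 75, "azure-eu": 0},
    }
    # The detour via gcp-va (70 + 75 = 145 ms) beats the 150 ms direct path.
    print(best_one_relay_path(rtt, "aws-ca", "azure-eu"))
```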
1.3 Dissertation Outline
The remainder of this thesis is organized as follows. We provide a background and an overview of studies related to topology discovery and the performance characteristics of Internet routes in Chapter II. Next, in Chapter III we characterize the locality of Internet traffic from an edge perspective and demonstrate that the majority of Internet traffic can be attributed to CDNs and cloud providers. Chapter IV presents our work on the discovery of Amazon's peering ecosystem with a special focus on VPIs. We evaluate and characterize the performance of different connectivity options in a multi-cloud setting in Chapter V. Chapter VI presents our proposed measurement platform for multi-cloud environments and showcases its applicability in the creation of optimal overlays. We conclude and summarize our contributions in Chapter VII.

1.3.1 Navigating the Chapters. This dissertation studies the effects of cloud providers on the Internet from multiple perspectives, including (i) traffic, (ii) topology, (iii) performance, and (iv) multi-cloud deployments. The chapters presented in this dissertation can be read independently. A reader interested in individual topics can refer to Table 1 for a summary of the topics that are covered in each chapter.

Table 1. Topics covered in each chapter of the dissertation.
Chapter Traffic Topology Performance Multi-Cloud
3 X X
4 X
5 X X X
6 X X X

CHAPTER II
RELATED WORK
This chapter presents a collection of prior studies on various aspects of Internet measurement, covering both the insights they provide into the topology of the Internet and the implications of that topology for designing applications. For Internet measurement, we focus on recent studies regarding the simulation and characterization of Internet topology. Furthermore, we organize these studies based on the resolution of the uncovered topology, with an emphasis on the utilized datasets and employed methodologies. In the second part, we focus on various implications of Internet topology for the design and performance of applications. These studies are organized according to the implications of topology for the performance or resiliency of the Internet. Furthermore, we emphasize how different resolutions of Internet topology allow researchers to conduct different studies. Collectively, these studies point to a handful of open and interesting problems regarding the future of Internet topology given the advent of cloud providers and their centrality within today's Internet. The rest of this chapter is organized as follows. First, in Section 2.1 we present a primer on the Internet and introduce the reader to a few taxonomies that are frequently used within this document. Second, an overview of the most common datasets, platforms, and tools used for topology discovery is given in Section 2.2. Third, a review of recent studies on Internet topology discovery is presented in Section 2.3. Lastly, Section 2.4 covers recent studies which utilize Internet topologies to study the performance and resiliency of the Internet.

2.1 Background
The Internet is a globally federated network composed of many networks, each of which has complete autonomy over the structure and operation of its own network. These autonomous systems, or networks (ASes), can be considered the building blocks of the Internet. Each AS represents a virtual entity and can be composed of a vast network infrastructure comprising networking equipment such as routers and switches as well as transmission media such as Ethernet and fiber-optic cables. ASes can serve various purposes, such as providing transit or connectivity for other networks, generating or offering content such as video streams, or merely representing the network of an enterprise. Each of the connectivity-provider ASes can be categorized into one of multiple tiers based on its size and how it is interconnected with other ASes. These tiers create a natural hierarchy of connectivity that is broadly composed of three tiers, namely (i) Tier-1: an AS that can reach all other networks without the need to pay for its traffic exchanges, (ii) Tier-2: an AS which has some transit-free relations with other ASes while still needing to pay for transit for reachability to some portion of the Internet, and (iii) Tier-3: an AS that solely purchases transit for connectivity to the Internet. While each network has full control over its own internal network and can deliver data from one internal node to another, transmitting data from one AS to another requires awareness of a path that can reach the destination AS. This problem is solved by having each AS advertise its own address space to neighboring ASes through the Border Gateway Protocol (BGP). Upon receiving a BGP announcement, each AS prepends its own AS number (ASN) to the AS-path attribute of the announcement and re-advertises the message to its own neighbors. This procedure allows ASes to learn about other networks and the set of AS-paths, or routes, through which they can be reached.
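The AS-path mechanism just described can be illustrated with a small, self-contained simulation. The following Python sketch is purely illustrative (the toy topology, ASN values, and function names are ours, not part of the dissertation): an AS re-advertising an announcement prepends its own ASN, and announcements whose AS-path already contains the receiving AS are discarded, which is BGP's basic loop prevention.

```python
from collections import deque

# Toy AS-level topology: ASN -> set of neighbor ASNs (assumed for illustration).
TOPOLOGY = {
    100: {200, 300},
    200: {100, 300, 400},
    300: {100, 200, 400},
    400: {200, 300},
}


def propagate(origin_asn, prefix):
    """Flood a prefix announcement and record the AS-path learned by each AS."""
    learned = {origin_asn: [origin_asn]}          # the origin's path is just itself
    queue = deque([(origin_asn, [origin_asn])])   # (advertiser, path it advertises)
    while queue:
        advertiser, advertised_path = queue.popleft()
        for neighbor in TOPOLOGY[advertiser]:
            if neighbor in advertised_path or neighbor in learned:
                continue                          # loop prevention / already learned
            learned[neighbor] = advertised_path   # route towards the origin via this advertiser
            # When re-advertising, the neighbor prepends its own ASN to the path.
            queue.append((neighbor, [neighbor] + advertised_path))
    return learned


if __name__ == "__main__":
    for asn, path in propagate(100, "192.0.2.0/24").items():
        print(f"AS{asn} reaches 192.0.2.0/24 via AS-path {path}")
```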
ASes can interconnect with each other by linking their border routers at one or multiple physical locations. These border routers are responsible for advertising their prefixes in addition to performing the actual routing of traffic within the Internet. The border routers of ASes are placed within colocation facilities (colos) that offer space, power, security, and networking equipment to tenant ASes. Each AS can have a physical presence in multiple metro areas. The collection of its routers within each of these metro areas is referred to as a point of presence (PoP) of that AS. Figure 1 presents a high-level abstraction of the aforementioned concepts. The figure consists of three ASes, namely ASA, ASB, and ASC in red, blue, and green, respectively. The internal structure of the ASes is abstracted away, presenting only the border routers of each AS. ASA and ASB have two PoPs, one in LA and another in NY, while ASC is only present in NY. ASA and ASB establish a private interconnection with each other through their LA PoP within colo1, while they peer with each other as well as ASC at their NY PoP in colo2 through an IXP's switching fabric.

Figure 1. Abstract representation of the topology of ASA, ASB, and ASC in red, blue, and green, respectively. ASA and ASB establish a private interconnection inside colo1 at their LA PoP while peering with each other as well as ASC inside colo2 at their NY PoP, facilitated by an IXP's switching fabric.

2.2 Tools & Datasets
This section provides an overview of various tools and datasets that have been commonly used by the measurement community for discovering Internet topology. We aim to familiarize the reader with these tools and datasets as they are used throughout the literature by researchers. Researchers have utilized a wide range of tools for the discovery of topologies; they range from generic network troubleshooting tools such as traceroute or paris-traceroute to tools developed by the Internet measurement community such as Sibyl or MIDAR. Furthermore, researchers have benefited from many measurement platforms such as RIPE Atlas or PlanetLab, which enable them to perform their measurements from a diverse set of ASes and geographic locations. In addition to the aforementioned toolsets, researchers have benefited from various datasets within their work. These datasets are collected by a few well-known projects in the Internet measurement community, such as Routeviews University of Oregon (2018), CAIDA's Ark CAIDA (2018), and CAIDA's AS relationships datasets, or stem from other sources such as IP-to-geolocation datasets or information readily available on colocation facility or IXP operators' websites. The remainder of this section is organized within two subsections. First, §2.2.1 provides an overview of the most commonly used tools and platforms for Internet topology discovery. Second, §2.2.2 gives a brief overview of the datasets that appear in the literature presented within §2.3 and §2.4.

2.2.1 Measurement Tools & Platforms. Broadly speaking, the tools used for Internet topology discovery can be categorized into three groups, namely (i) path discovery, (ii) alias resolution, and (iii) interface name decoding.

2.2.1.1 Path Discovery. Although originally developed for troubleshooting purposes, traceroute Jacobson (1989) has become one of the prominent tools used within the Internet measurement community. traceroute displays the set of intermediate router interfaces that are traversed towards a specific destination on the forward path. This is made possible by sending packets towards the destination with incremental TTL values; each router along the path decreases the TTL value before forwarding the packet. If a router encounters a packet with a TTL value of 0, the packet is dropped and a notification message carrying the router's source address is sent back to the originator of the packet. This, in turn, allows the originator of these packets to identify the source addresses of router interfaces along the forward path. The deployment of load-balancing mechanisms by routers which rely on packet header fields can lead traceroute to report inaccurate and incomplete paths. Figure 2 illustrates an example of incorrect inferences by traceroute in the presence of load-balanced paths. Node a is a load balancer and multiplexes packets between the top and bottom paths. In this example, the TTL = 2 probe originated from the source traverses the top path and expires at node b, while the TTL = 3 probe goes through the bottom path and terminates at node e. These successive probes cause traceroute to incorrectly infer a non-existent link between nodes b and e. To address this problem, Augustin, Friedman, and Teixeira (2007) developed paris-traceroute, which relies on packet header contents to force load balancers to pick a single route for all probes of a single traceroute session. Furthermore, paris-traceroute uses a stochastic probing algorithm in order to enumerate all possible interfaces and links at each hop.
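As a concrete illustration of the TTL-based probing just described, here is a minimal Python sketch of a traceroute-style probe loop built on the scapy packet library. It is illustrative only and is not the tooling used in this dissertation: the destination address, hop limit, and timeout are arbitrary choices, and the script must run with raw-socket (root) privileges.

```python
from scapy.all import IP, ICMP, sr1  # scapy must be installed; requires root privileges


def simple_traceroute(dst, max_hops=30, timeout=2):
    """Send ICMP echo probes with increasing TTLs and print each responding hop."""
    for ttl in range(1, max_hops + 1):
        probe = IP(dst=dst, ttl=ttl) / ICMP()
        reply = sr1(probe, timeout=timeout, verbose=False)
        if reply is None:
            print(f"{ttl:2d}  *")                 # no response within the timeout
            continue
        print(f"{ttl:2d}  {reply.src}")           # intermediate hops answer with ICMP time-exceeded
        if reply.haslayer(ICMP) and reply[ICMP].type == 0:
            break                                 # type 0 (echo-reply): the destination answered


if __name__ == "__main__":
    simple_traceroute("192.0.2.1")  # placeholder destination (TEST-NET-1)
```

Because each probe here is an independent packet, a flow-based load balancer may route consecutive probes differently; this is precisely the failure mode that paris-traceroute avoids by holding the flow-identifying header fields constant across a probing session.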
Given the scale of the Internet and its geographic span, relying on a single vantage point (VP) to conduct topology discovery studies would likely lead to incomplete or inaccurate inferences. Researchers have therefore relied on various active measurement platforms which either host a pre-defined set of tools, e.g., Dasu, Bismark, Dimes, Periscope, and RIPE Atlas Giotsas, Dhamdhere, and Claffy (2016); RIPE NCC (2016); Sánchez et al. (2013); Shavitt and Shir (2005); Sundaresan, Burnett, Feamster, and De Donato (2014), or provide full access control to the user, e.g., PlanetLab, CAIDA Archipelago, and GENI Berman et al. (2014); Chun et al. (2003); Hyun (2006), allowing them to conduct measurements from a diverse set of networks and geographic locations. For example, RIPE Atlas RIPE NCC (2016) is composed of many small measurement devices (10k at the time of this survey) that are voluntarily hosted within many networks on a global scale. Hosting RIPE Atlas nodes gives credits to the hosting entity, which can later be used to conduct latency (ping) and reachability (traceroute and paris-traceroute) measurements. Periscope Giotsas et al. (2016) is another platform that provides a unified interface for probing around 1.7k publicly available looking glasses (LGs), which provide a web interface for running basic network commands (ping, traceroute, and bgp) on routers hosted in roughly 0.3k ASes. Periscope VPs are located at core ASes, while RIPE Atlas probes are hosted in a mix of core and edge networks. Dasu Sánchez et al.
(2013), on the other hand, mainly consists of VPs at edge networks, and more specifically broadband users relying on ISPs for Internet connectivity. Dasu consists of a plugin for the Vuze BitTorrent client that is able to conduct network measurements from the computers of users who have installed the plugin on their Vuze client. The authors of Dasu incentivize its adoption by reporting broadband network characteristics to its users. Cunha et al. (2016) developed a route oracle platform named Sibyl which allows users to define the path requirements for their measurement through an expressive input language based on symbolic regular expressions, after which Sibyl selects the source (LG) and destination pair that has the highest likelihood of satisfying the user's path requirements based on its internal model. Lastly, considering the large number of Internet hosts and networks, researchers have developed a series of tools that allow them to conduct large-scale measurements in parallel. The methodology of paris-traceroute has been incorporated in scamper Luckie (2010), an extensible packet prober that implements various common network measurement functionalities such as traceroute, ping, and alias resolution in a single tool. scamper is able to conduct measurements in parallel without exceeding a predefined probing rate. While scamper is able to run measurements in parallel, each measurement is conducted sequentially; this in turn can hinder its rate or induce overhead on the probing device in order to maintain the state of each measurement. yarrp Beverly (2016); Beverly, Durairajan, Plonka, and Rohrer (2018) is a high-rate, IPv4- and IPv6-capable, Internet-scale probing tool inspired by the stateless design principles of ZMap Durumeric, Wustrow, and Halderman (2013) and masscan Graham, Mcmillan, and Tentler (2014). yarrp randomly permutes the IP and TTL space and encodes the state information of each probe within the IP and TCP header fields (which are included in the ICMP response) and is therefore able to conduct traceroute probes in parallel without incrementally increasing the TTL value.

Figure 2. Illustration of an incorrect link (b − e) inferred by traceroute due to load-balanced paths. Physical links and traversed paths are shown with black and red lines, respectively. The TTL = 2 probe traverses the top path and expires at node b, while the TTL = 3 probe traverses the bottom path and expires at node e. This succession of probes causes traceroute to infer a non-existent link (b − e).

2.2.1.2 Alias Resolution. Paths obtained via the tools outlined in §2.2.1.1 specify the router interfaces that are encountered along the forward path. It is possible to observe multiple interfaces of a single router within different traceroute paths, but the association of these interfaces with a single physical router is not clear from these outputs. Alias resolution tools have been developed to solve this issue. These tools accept a set of interface addresses as input and produce a collection of interface sets, each of which corresponds to a single router. Alias resolution tools can broadly be categorized into two groups, namely (i) probing-based techniques Bender, Sherwood, and Spring (2008); Govindan and Tangmunarunkit (2000); Keys, Hyun, Luckie, and Claffy (2013); Spring, Mahajan, and Wetherall (2002); Tozal and Sarac (2011) and (ii) inference-based techniques M. Gunes and Sarac (2009); M. H.
The former require a VP which probes the interfaces in question to identify sets of interfaces which belong to the same router. Probing-based techniques mostly rely on the IP ID field, which is used for reassembling fragmented packets at the network layer. These techniques assume that routers rely on a single central incremental counter which assigns these ID values regardless of the interface. Given this assumption, Ally Spring et al. (2002) probes IPs with UDP packets having high port numbers (most likely not in use) to induce an ICMP port unreachable response. Ally infers IP addresses to be aliases if successive probes have incrementing ID values within a short distance. Radargun Bender et al. (2008) tries to address the probing complexity of Ally (O(n²)) by iteratively probing IPs and inferring aliases based on the velocity of IP ID increments for each IP. MIDAR Keys et al. (2013) presents a precise methodology for probing large-scale pools of IP addresses by eliminating unlikely IP aliases using a velocity test. Furthermore, aliases are inferred by comparing the monotonicity of the IP ID time series for multiple target IP addresses. MIDAR utilizes ICMP, TCP, and UDP probes to increase the likelihood of receiving responses from each router/interface. Palmtree Tozal and Sarac (2011) probes the /30 or /31 mates of target IPs, using a TTL value inferred to expire at the router in question, to induce an ICMP_TTL_EXPIRED response from another interface of that router. Assuming no path changes have happened between measuring the router's hop distance and the time the ICMP_TTL_EXPIRED message is generated, the source address of the ICMP_TTL_EXPIRED message should reside on the same router as the target IP, and the two are therefore inferred to be aliases. Inference-based techniques accept a series of traceroute outputs and rely on a set of constraints and assumptions regarding the setting and environment in which these routers are deployed to make inferences about interfaces that are most likely part of the same router. Spring et al. suggest a common-successor heuristic to attribute IP addresses on the prior hop to the same router. This heuristic assumes that no layer-2 devices are present between the two routers in question. Analytical Alias Resolution (AAR) M. H. Gunes and Sarac (2006) infers aliases using symmetric traceroute pairs by pairing interface addresses according to the common address-sharing convention of utilizing a /30 or /31 prefix for the interfaces on both ends of a physical link. This method requires the routes between both end-pairs to be symmetric. DisCarte Sherwood et al. (2008) relies on the route record option to capture the forward and reverse interfaces for the first nine hops of a traceroute. Limited support and varying route record implementations by routers, in addition to the high complexity of the inference algorithm, limit its applicability to large-scale scenarios.
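The probing-based techniques above share one core test: whether IP ID values sampled from two candidate interfaces look like they were drawn from a single shared counter. A minimal sketch of that monotonicity check, with hypothetical probe data, follows; tools such as Ally and MIDAR add velocity tests, multiple probe types, and careful wrap-around handling at much larger scale.

```python
# Sketch of the shared IP ID counter test behind probing-based alias resolution
# (Ally/MIDAR style). Samples are (timestamp, ip_id) pairs collected per interface;
# the values here are hypothetical. Two interfaces sharing one central counter
# should produce a merged sequence that only moves forward (allowing 16-bit wrap).
def monotonic_mod16(samples):
    ids = [ip_id for _, ip_id in sorted(samples)]            # order by probe time
    steps = [(b - a) % 2**16 for a, b in zip(ids, ids[1:])]
    # Small positive steps are consistent with one shared counter; a huge "step"
    # really means the sequence went backwards (i.e., separate counters).
    return all(0 < step < 2**15 for step in steps)

def likely_aliases(samples_a, samples_b):
    return monotonic_mod16(samples_a + samples_b)

# Hypothetical probe results for two interfaces of (possibly) the same router.
if_a = [(0.00, 110), (0.20, 118), (0.40, 131)]
if_b = [(0.10, 114), (0.30, 125), (0.50, 140)]
print(likely_aliases(if_a, if_b))   # True: the interleaved IDs advance together
```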
2.2.1.3 Interface Name Decoding. Reverse DNS (RDNS) entries for observed interface addresses can be a source of information for Internet topology researchers. Port type, port speed, geolocation, interconnecting AS, and IXP name are examples of information which can be decoded from the RDNS entries of router interfaces. These information sets are embedded by network operators within RDNS entries for ease of management, in accordance with a (mostly) structured convention. For example, ae-4.amazon.atlnga05.us.bb.gin.ntt.net is an RDNS entry for an interface residing on a border router of NTT (ntt.net) within Atlanta, GA (atlnga) interconnecting with Amazon. Embedding this information is completely optional, and the structure of this information varies from one AS to another. Several tools have been developed to parse and extract the embedded information within RDNS entries Chabarek and Barford (2013); Huffaker, Fomenkov, et al. (2014); Scheitle, Gasser, Sattler, and Carle (2017); Spring et al. (2002). Spring et al. extracted DNS-encoded information for the ISPs under study in their Rocketfuel project Spring et al. (2002). As part of this process, they relied on the city code names compiled in Padmanabhan and Subramanian (2001) to search for domain names which encode geo information in their name. PathAudit Chabarek and Barford (2013) is an extension to traceroute which reports the encoded information within observed router hops. In addition to geo information, PathAudit reports on the interface type, port speed, and manufacturing vendor of the router. The authors of PathAudit extract common encodings (tags) from device configuration parameters, operator observations, and common naming conventions. Using this set of tags, RDNS entries from CAIDA's Ark project CAIDA (2018) are parsed to match against one or multiple of these tags. A clustering algorithm is employed to identify similar naming structures within domains under a common top-level domain (TLD). These common structures are translated into parsing rules which can be matched against other RDNS entries. DDeC Huffaker, Fomenkov, and claffy (2014) is a web service which decodes the embedded information within RDNS entries by unifying the rulesets obtained by both the UNDNS Spring et al. (2002) and DRoP Huffaker, Fomenkov, et al. (2014) projects.
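As an illustration of interface name decoding, the sketch below applies a single hand-written rule to hostnames that follow the NTT-style convention quoted above. The regular expression, the field names, and the CLLI-like reading of the location code are assumptions for this one convention; systems such as UNDNS, DRoP, and PathAudit maintain per-domain rulesets that are curated or learned at scale.

```python
# Illustrative interface-name decoding rule for hostnames following the NTT-style
# convention quoted above (e.g., ae-4.amazon.atlnga05.us.bb.gin.ntt.net).
import re

NTT_RULE = re.compile(
    r"^(?P<iface>[a-z]+-\d+)\."                     # interface name, e.g. ae-4
    r"(?P<peer>[a-z0-9]+)\."                        # interconnecting network, e.g. amazon
    r"(?P<place>[a-z]{4})(?P<state>[a-z]{2})\d+\."  # CLLI-like location code, e.g. atln + ga + 05
    r"(?P<cc>[a-z]{2})\.bb\.gin\.ntt\.net\.?$"      # country code + backbone suffix
)

def decode(hostname: str):
    m = NTT_RULE.match(hostname.lower())
    return m.groupdict() if m else None             # None when the rule does not apply

print(decode("ae-4.amazon.atlnga05.us.bb.gin.ntt.net"))
# {'iface': 'ae-4', 'peer': 'amazon', 'place': 'atln', 'state': 'ga', 'cc': 'us'}
```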
2.2.2 Datasets. Internet topology studies have been made possible through various data sources regarding BGP routes, IXP information, colo facility listings, AS attributes, and IP-to-geolocation mappings. The following sub-sections provide a short overview of the data sources most commonly used by the Internet topology community.

2.2.2.1 BGP Feeds & Route Policies. University of Oregon's RouteViews and RIPE's Routing Information Service (RIS) RIPE (2018); University of Oregon (2018) are projects originally conceived to provide real-time information about the global routing system from the standpoint of several route feed collectors. These route collectors periodically report the set of BGP feeds that they receive back to a server, where the information is made publicly accessible. The data from these collectors has been utilized by researchers to map prefixes to their origin AS or to infer AS relationships based on the set of AS paths observed across all the route collectors. Routeviews and RIPE RIS provide a window into the global routing system from the vantage point of higher-tier networks. Packet Clearing House (PCH) Packet Clearing House (2018) maintains more than 100 route collectors which are placed within IXPs around the globe and provides a view complementary to the one presented by Routeviews and RIPE RIS. Lastly, Regional Internet Registries (RIRs) maintain databases regarding the route policies of ASes for each of the prefixes delegated to them, expressed using the Route Policy Specification Language (RPSL). Historically, RPSL entries have not been well adopted and typically are not maintained/updated by ASes. The entries are heavily concentrated within the RIPE and ARIN regions but nonetheless have been leveraged by researchers to infer or validate AS relationships Giotsas, Luckie, Huffaker, and Claffy (2015); Giotsas, Luckie, Huffaker, et al. (2014).

2.2.2.2 Colocation Facility Information. Colocation facilities (colos for short) are data-centers which provide space, power, cooling, security, and network equipment for other ASes to host their servers and also to establish interconnections with other ASes that have a presence within the colo. PeeringDB and PCH Packet Clearing House (2017); PeeringDB (2017) maintain information regarding the list of colo facilities and their physical locations, as well as the tenant ASes within each colo. Furthermore, some colo facility operators provide on their websites, for marketing purposes, a list of tenant members as well as the list of transit networks available for peering within their facilities. This information has mainly been leveraged by researchers to define a set of constraints regarding the points of presence (PoPs) of ASes.

2.2.2.3 IXP Information. IXPs are central hubs providing rich connectivity opportunities to the participating ASes. Their impact and importance regarding the topology of the Internet have been highlighted within many works Augustin, Krishnamurthy, and Willinger (2009); Castro, Cardona, Gorinsky, and Francois (2014); Comarela, Terzi, and Crovella (2016); Nomikos et al. (2018). IXPs provide a switching fabric within one or many colo facilities, where each participating AS connects its border router to this switch to establish bi-lateral peerings with other member ASes or establishes one-to-many (multi-lateral) peerings with the route server maintained by the IXP operator. IXP members share a common subnet owned by the IXP operator. Information regarding the location, participating members, and prefixes of IXPs is readily available through PeeringDB, PCH, and the IXP operators' websites Packet Clearing House (2017); PeeringDB (2017).

2.2.2.4 IP Geolocation. The physical location of an IP address is not directly known from the address itself. Additionally, IP addresses can correspond to mobile end-hosts or can be repurposed by the owner AS and therefore acquire a new geolocation. Several free and commercial databases that attempt to map IP addresses to physical locations have been made available throughout the years. These datasets can vary in their coverage as well as in the resolution of the mapped addresses (country, state, city, and geo-coordinates). Maxmind's GeoIP2 MaxMind (2018), the IP2Location databases IP2Location (2018), and NetAcuity NetAcuity (2018) are among the most widely used IP geolocation datasets in the Internet measurement community. The majority of these datasets have been designed to geolocate end-host IP addresses. Gharaibeh et al. (2017) compare the accuracy of these datasets for geolocating router interfaces; while NetAcuity is relatively more accurate than the Maxmind and IP2Location datasets, relying on RTT-validated geocoding of RDNS entries is more reliable for geolocating router and core addresses.
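A primitive that recurs when combining these datasets is mapping an individual IP address to the origin AS of its most specific covering prefix, as derived from RouteViews/RIPE RIS feeds. A minimal longest-prefix-match sketch, with a hypothetical three-entry table, is shown below.

```python
# Minimal longest-prefix-match lookup from an IP address to its origin AS, the kind
# of mapping commonly derived from RouteViews/RIPE RIS BGP feeds. The tiny table
# below is hypothetical; real tables hold hundreds of thousands of IPv4 prefixes.
import ipaddress

RIB = [
    ("93.184.216.0/24", 15133),
    ("8.8.8.0/24", 15169),
    ("8.0.0.0/9", 3356),
]
TABLE = [(ipaddress.ip_network(p), asn) for p, asn in RIB]

def origin_as(ip: str):
    addr = ipaddress.ip_address(ip)
    matches = [(net.prefixlen, asn) for net, asn in TABLE if addr in net]
    return max(matches)[1] if matches else None   # most-specific prefix wins

print(origin_as("8.8.8.8"))      # 15169 (the /24 beats the covering /9)
print(origin_as("203.0.113.7"))  # None: not covered by this toy table
```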
2.3 Capturing Network Topology

This section provides an overview of Internet measurement studies which attempt to capture the Internet's topology using various methodologies motivated by different end goals. Capturing Internet topology has been the focus of many research efforts over the past decade; while each study has made incremental improvements towards a more complete and accurate picture of Internet topology, the problem remains widely open and the subject of many recent studies. Internet topology discovery has been motivated by a myriad of applications ranging from protocol design, performance measurement in terms of inter-AS congestion, estimating resiliency towards natural disasters and service or network interruptions, security implications of DDoS attacks, and much more. A motivating example is the Netflix-Verizon dispute, where the subpar performance of Netflix videos for Verizon customers led to lengthy accusations from both parties Engebretson (2014). The lack of proper methodologies for capturing inter-AS congestion by independent entities at the time further prolonged the dispute. Within Section 2.4 we provide a complete overview of works which rely on some aspect of Internet topology to drive their research and provide insight regarding the performance or resiliency of the Internet. Capturing Internet topology is hard due to many contributing factors; the following is a summary of them:

– The Internet is by nature a decentralized entity composed of a network of networks; each of the constituent networks lacks any incentive to share its topology publicly and often has financial incentives to obscure this information.

– Topology discovery studies are often based on "hackish" techniques that rely on toolsets which were designed for completely different purposes. The designers of the TCP/IP protocol stack did not envision the problem of topology discovery within their design, most likely due to the centralized nature of the Internet in its inception. The de facto tool for topology discovery has been traceroute, which is designed for troubleshooting and displaying paths between a host and a specific target address.

– Capturing inter-AS links within Internet topology becomes even more challenging due to the lack of standardization for the proper way to establish these links. More specifically, the shared address between two border routers can originate from either of the participating networks. Although networks typically rely on common good practices, such as using addresses from the upstream provider, the lack of any oversight or requirement within RFC standards does not guarantee its proper execution within the Internet.

– A certain set of RFCs regarding how routers should handle TTL-expired messages has resulted in incorrect inferences of the networks which are establishing inter-AS interconnections. For example, responses generated by third-party interfaces on border routers can lead to the inference of an inter-AS link between networks which are not necessarily interconnected with each other.

Topology discovery studies can be organized according to many of their features; in particular, the granularity of the obtained topology seems to be the most natural fit. Each of the studies in this section, based on the utilized dataset or devised methodology, results in topologies which capture the state of the Internet at different granularities, namely physical-level, router-level, PoP-level, and AS-level.
These resolutions of topology have a direct mapping to the abstract layers of the TCP/IP stack, e.g., the physical level corresponds to the first (physical) layer, the router level can be mapped to the network layer, and PoP-level as well as AS-level topologies are related to the application layer at the top of the TCP/IP stack. These abstractions allow one to capture different features of interest without the need to deal with the complexities of lower layers. For instance, the interplay of routing and the business relationships between different ASes can be captured through an AS-level topology without the need to understand how and where these inter-AS relationships are established. In the following subsections, we provide an overview of the most recent as well as the most prominent works that have captured Internet topology at various granularities. We present all studies in chronological order, starting with works related to AS-level topologies, the most abstract representation of Internet topology, within Section §2.3.1; AS-level topologies are the oldest form of Internet topology but have retained their applicability for various forms of analyses throughout the years. Later, we present router-level and physical-level topologies within Sections §2.3.2 and §2.3.4, respectively.

2.3.1 AS-Level Topology. The Internet is composed of various networks, or ASes, operating autonomously within their own domains and interconnecting with each other at various locations. This high-level abstraction of the Internet's structure is captured by graphs representing AS-level topologies, where each node is an AS and each edge represents an interconnection between two ASes. These graphs lay out the virtual entities (ASes) that interconnect with each other and abstract away details such as the number and location of these inter-AS links. For example, two large Tier-1 networks such as Level3 and AT&T can establish many inter-AS links through their border routers in various metro areas. These details are abstracted away, and all of these inter-AS links are represented by a single edge within the AS-level topology. The majority of studies rely on control-plane data obtained either by active measurements that retrieve router dumps through available looking glasses or by passive measurements that capture BGP feeds, RPSL entries, and BGP community attributes. Path measurements captured through active or passive traceroute probes have been an additional source of information for obtaining AS-level topologies; the obtained traceroute paths are mapped to their corresponding AS paths by translating each hop's address to its corresponding AS. Capturing AS-level topology has been challenging mainly due to limited visibility into the global routing system, more specifically the limited set of BGP feeds that each route collector is able to observe. This limited visibility is known as the topology incompleteness problem within the community. Researchers have attempted to address this issue either by modeling Internet topology, combining the limited ground-truth information with a set of constraints, or by presenting novel methodologies that merge various data sources in order to obtain a comprehensive view of Internet topology. The latter efforts led to studies that highlighted the importance of IXPs as central hubs of rich connectivity. Within the remainder of this section we organize works into the following three groups: (i) graph generation and modeling, (ii) topology incompleteness, and (iii) IXPs' internal operation and peerings.
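As a concrete illustration of the traceroute-based approach mentioned above, the sketch below maps each responding hop to an origin AS and collapses consecutive hops within the same AS into an AS path, from which candidate AS-level edges are extracted. The addresses and ASNs are hypothetical, and, as discussed later in this chapter, such naive IP-to-AS translation is exactly where inference errors creep in.

```python
# Sketch of deriving AS-level edges from a traceroute: map each responding hop to
# its origin AS (e.g., via a longest-prefix-match table as sketched in Section
# 2.2.2) and collapse consecutive hops in the same AS. All values are hypothetical.
def as_path(hops, ip_to_as):
    path = []
    for hop in hops:
        asn = ip_to_as.get(hop)            # None for '*' or unmapped addresses
        if asn is not None and (not path or path[-1] != asn):
            path.append(asn)
    return path

def as_edges(path):
    return set(zip(path, path[1:]))        # candidate inter-AS links

ip_to_as = {"10.0.0.1": 64500, "192.0.2.1": 64500,
            "198.51.100.9": 64501, "203.0.113.3": 64502}
hops = ["10.0.0.1", "192.0.2.1", "*", "198.51.100.9", "203.0.113.3"]
path = as_path(hops, ip_to_as)
print(path)             # [64500, 64501, 64502]
print(as_edges(path))   # edges (64500, 64501) and (64501, 64502)
```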
2.3.1.1 Graph Generation & Modeling. Graph generation techniques attempt to simulate network topologies by relying on a set of constraints, such as the maximum number of physical ports on a router. These constraints, coupled with the limited ground-truth information regarding the structure of networks, are used to model and generate topologies. The output of these models can be used in other studies which investigate the effects of topology on network performance and the resiliency of networks towards attacks or failures caused by natural disasters. Li, Alderson, Willinger, and Doyle (2004) argue that graph-generating models rely on replicating overly abstract measures, such as degree distribution, which are not able to express the complexities and realities of Internet topology. The authors aim to model ASes/ISPs as the building blocks of the Internet at the granularity of routers, where nodes represent routers and links are the layer-2 physical links which connect them together. Furthermore, the authors argue that technological constraints on routers' switching fabrics dictate a trade-off between the number of links and their bandwidth within this topology. In addition, for economic reasons access providers aggregate their traffic over as few links as possible, since the cost of laying physical links can surpass that of the switching/routing infrastructure. This, in turn, leads to lower-degree core and higher-degree edge elements. The authors create five graphs with the same degree distribution but based on different heuristics/models and compare the performance of these models using a single router model. Interestingly, the graphs that are less likely to be produced using statistical measures have the highest performance. Gregori, Improta, Lenzini, and Orsini (2011) conduct a structural interpretation of the Internet connectivity graph at an AS granularity. They report on the structural properties of this graph using k-core decomposition techniques. Furthermore, they report on the effect IXPs have on the AS-level topology. The data for this study is compiled from various datasets, namely CAIDA's Ark, DIMES, and the Internet Topology Collection from IRL, which is a combination of BGP updates from Routeviews, RIPE RIS, and Abilene. The first two datasets consist of traceroute data and are converted to AS-level topologies by mapping each hop to its corresponding ASN. A list of IXPs was obtained from PCH, PeeringDB, Euro-IX, and bgp4.as. The list of IXP members was compiled either from the IXP websites or by utilizing the show ip bgp summary command on IXPs which host an LG. Using the AS-level graph resulting from combining these various data sources, the authors report on several characteristics of the graph, namely: degree, average neighbor degree, clustering coefficient, betweenness centrality, and k-core decomposition. A k-core subgraph has a minimum degree of k for every node and is the largest subgraph with this property. The authors present statistics regarding the penetration of IXPs in different continents, with Europe having the largest share (47%) and North America (19%) in second position. Furthermore, using k-core decomposition, the authors identify a densely connected core and a loosely connected periphery which contains the majority of nodes. The authors also look at the fraction of nodes in the core which are IXP participants and find that IXPs play a fundamental role in the formation of these cores.
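A k-core can be computed by the standard peeling procedure: repeatedly remove nodes of degree less than k until none remain. The sketch below implements this on a hypothetical toy graph; it is meant only to illustrate the decomposition used in studies such as Gregori et al. (2011).

```python
# Sketch of k-core extraction by iterative peeling: repeatedly remove nodes whose
# degree falls below k. The toy topology below is hypothetical; AS-level studies
# run this on graphs with tens of thousands of nodes.
from collections import defaultdict

def k_core(adj, k):
    adj = {u: set(vs) for u, vs in adj.items()}   # work on a copy
    changed = True
    while changed:
        changed = False
        for u in [u for u, vs in adj.items() if len(vs) < k]:
            for v in adj.pop(u):                  # remove u and its edges
                adj[v].discard(u)
            changed = True
    return adj                                    # the surviving k-core subgraph

graph = defaultdict(set)
for a, b in [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]:   # small toy topology
    graph[a].add(b)
    graph[b].add(a)

print(sorted(k_core(graph, 2)))   # [1, 2, 3]: the densely connected "core"
```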
2.3.1.2 Topology Incompleteness. Given the limited visibility of each of the prior works, researchers have relied on a diverse set of data sources and devised new methodologies for inferring additional peerings to address the incompleteness of Internet topology. These works have led to highlighting the importance of IXPs as a means of establishing many interconnections with IXP members and as a major source for identifying missing peering links. Peerings within IXPs and their rich connectivity fabric between many edge networks caused topological changes to the structure of the Internet, deviating from the historically hierarchical structure and, as a consequence, creating a flatter Internet structure referred to as Internet flattening within the literature. He, Siganos, Faloutsos, and Krishnamurthy (2009) address AS-level topology incompleteness by presenting tools and methodologies which identify and validate missing links. BGP snapshots from various (34 in total) Routeviews, RIPE RIS, and public route servers are collected to create a baseline AS-level topology graph. The business relationship of each AS edge is identified by using the PTE algorithm Xia and Gao (2004). The authors find that the majority of AS links are of a c2p type, while most of the additional links found by additional collectors are p2p links. Furthermore, they parse IRR datasets using Nemecis Siganos and Faloutsos (2004) to infer additional AS links. A list of IXP participants is compiled by gathering IXP prefixes from PCH, performing DNS lookups, and parsing the resulting domain names to infer the participating ASNs. Furthermore, the authors infer inter-AS links within IXPs by relying on traceroute measurements which cross IXP addresses and utilize a majority voting scheme to reliably infer a participant's ASN. By combining all these datasets and the proposed methodologies, the authors find about 300% additional links compared to prior studies, most of which are found to be established through IXPs. Augustin et al. (2009) attempt to expand on prior works for discovering IXP peering relationships by providing a more comprehensive view of this ecosystem. They rely on various data sources to gather as much information on IXPs as possible; their data sources are: (i) IXP databases such as PCH and PeeringDB, (ii) IXP websites, which typically list their tenants as well as the prefixes employed by them, (iii) RIRs, which may include BGP policy entries, specifically the import and export entries that expose peering relationships, (iv) DNS names of IXP addresses, which include information about the peer, and (v) BGP dumps from LGs, Routeviews, and RIPE's RIS, which can include next-hop neighbors that are part of an IXP prefix. The authors conduct targeted traceroute measurements with the intention of revealing peering relationships between members of each IXP. To limit the number of conducted probes, the authors either select a vantage point within one of the member ASes or, if one is not available, rely on AS relationship datasets to discover a neighbor (at most 2 hops away) of each member which has a VP. Using the selected VPs, they conduct traceroutes towards live addresses (or a random address if such an address was not discovered) in the target network. Inference of peerings based on traceroutes is done using a majority voting scheme similar to He et al. (2009). The authors augment their collected dataset with the data-plane measurements of CAIDA's Skitter, DIMES, and traceroutes measured from about 250 PlanetLab nodes.
The resulting dataset is able to identify peerings within 223 (out of 278) IXPs, which amounts to about 100% more IXPs and 40% more peerings compared to the work of He et al. (2009). Ager et al. (2012) rely on sFlow records from one of the largest European/global IXPs as another source of information for inferring peering relationships between IXP tenants and provide insight on three fronts: (i) they outline the rich connectivity which occurs over the IXP fabric and contrast it with the known private peerings that are exposed through general topology measurement studies, (ii) they present the business dynamics between participants of the IXP and provide explanations for their incentives to establish peering relationships with others, and (iii) they provide the traffic matrix between peers of the IXP as a microcosm of Internet traffic. Among the set of analyses conducted within the paper one can point to: (i) a comparison of peering visibility from Routeviews, RIPE, LGs, and the IXP's perspective, (ii) manual labeling of AS types as well as the number of established peerings per member, (iii) a breakdown of traffic into various protocols based on port numbers as well as the share of each traffic type among various AS types, and (iv) traffic asymmetry, the ratio of used/served prefixes, and the geo-distance between end-points. Khan, Kwon, Kim, and Choi (2013) utilize LG servers to provide a view of the AS-level Internet topology complementary to that of Routeviews and RIPE RIS. A list of 1.2k LGs (420 were operational at the time of the study) was built by considering various sources including PeeringDB, traceroute.org, traceroute.net.ru, bgp4.as, bgp4.net, and virusnet. AS-level topologies from IRL, CAIDA's Ark, iPlane, and IRRs are used to compare the completeness of the identified AS-links. For the duration of a month, show ip bgp summary is issued twice a week and BGP neighbor ip advertised is issued once a week towards all LGs which support the command. The first command outputs each neighbor's address and its associated ASN, while the second command outputs the routing table of the router, consisting of reachable prefixes, the next-hop IP, and the AS path towards the given prefix. An AS-level connectivity graph is constructed by parsing the output of these commands. Using this new data source enables the authors to identify an additional 11k AS-links and about 700 new ASes. Klöti, Ager, Kotronis, Nomikos, and Dimitropoulos (2016) perform a cross-comparison of three public IXP datasets, namely PeeringDB PeeringDB (2017), Euro-IX European Internet Exchange Association (2018), and PCH Packet Clearing House (2017), to study several attributes of IXPs such as location, facilities, and participants. Aside from the three aforementioned public IXP datasets, BGP feeds collected by PCH route collectors as well as data gathered from 40 IXP websites were used for validation purposes throughout the study. The three datasets lack common identifiers for IXPs; for this reason, in a first pass IXPs are linked together through an automated process relying on names and geo information, and in a second pass the linked IXPs are manually checked for correctness. The authors present one of the largest IXP information datasets at the time as a side effect of their study. The geographic coverage of each dataset is examined, and the authors find relatively similar coverage across datasets except for the North America region, where PCH has the highest coverage.
Facility locations for IXPs are compared across datasets, and it is found that PCH lacks this information; in general, facility information for IXPs is limited in the other datasets as well. The complementarity of the datasets is presented using both the Jaccard and overlap indices. It is found that PeeringDB and Euro-IX have the largest overlap within Europe, and larger IXPs tend to have the greatest similarity across all pairs of datasets.

2.3.1.3 IXP Peerings. The studies within this section provide insight into the inner operation of IXPs and how tenants establish peerings with other ASes. Each tenant of an IXP can establish a one-to-one (bi-lateral) peering with other ASes of the IXP, similar to how regular peerings are established. Given the large number of IXP members, a great number of peering sessions would have to be maintained over the IXP fabric. Route servers have been created to alleviate this issue: each member establishes a peering session with the route server and describes its peering preferences. This, in turn, has enabled one-to-many (multi-lateral) peering relationships between IXP tenants.

Figure 3. Illustration of an IXP switch and route server along with 4 tenant networks ASa, ASb, ASc, and ASd. ASa establishes a bi-lateral peering with ASd (solid red line) as well as multi-lateral peerings with ASb and ASc (dashed red lines) facilitated by the route server within the IXP.

Figure 3 illustrates an IXP with 4 tenant networks ASa, ASb, ASc, and ASd. ASa establishes a bi-lateral peering with ASd (solid red line) as well as multi-lateral peerings with ASb and ASc (dashed red lines) that are facilitated by the route server within the IXP. Studies within this section propose methodologies for differentiating these forms of peering relationships from each other and emphasize the importance of route servers in the operation of IXPs. Giotsas, Zhou, Luckie, and claffy (2013) present a methodology to discover multilateral peerings within IXPs using the BGP communities attribute and route server data. The BGP communities attribute, which is 32 bits long, follows a specific encoding to indicate any of the following policies by each member of an IXP: (i) ALL, announce routes to all IXP members; (ii) EXCLUDE, block an announcement towards a specific member (usually used in conjunction with the ALL policy); (iii) NONE, block announcements towards all members; and (iv) INCLUDE, allow an announcement towards a specific member (used together with the NONE policy). Using a combination of these policies, a member AS can control which IXP members receive its BGP announcements. By leveraging available LGs at IXPs and issuing router dump commands, the authors obtain the set of participating ASes and the BGP communities values for their advertised prefixes, which in turn allows them to infer the connectivity among IXP participants. Furthermore, additional BGP communities values are obtained by parsing BGP feeds from the Routeviews and RIPE RIS archives. Giotsas et al. infer the IXP by either parsing the first 16 bits of the BGP communities attribute or by cross-checking the list of excluded ASes against IXP participants. By combining the passive and active measurements, the authors identify 207k multilateral peering (MLP) links between 1.3k ASes. They validate their findings by locating LGs relevant to the identified links through PeeringDB; by testing 26k different peerings they are able to confirm 98.4% of them.
Furthermore, Giotsas et al. parse the peering policies of IXP members, either from PeeringDB or from IXP websites which provide this information, and find that 72%, 24%, and 4% of members have an open, selective, and restrictive peering policy, respectively. Participation in a route server appears to be positively correlated with a network's openness in peering. The authors show the existence of a binary pattern in terms of the number of allowed/blocked ASes, where ASes either allow or block the majority of ASes from receiving their announcements. Peering density, representing the percentage of established links out of the number of possible links, is found to be between 80% and 95%. Giotsas and Zhou (2013) expand their prior work Giotsas et al. (2013) by inferring multi-lateral peering (MLP) links between IXP tenants relying merely on passive BGP measurements. BGP feeds are collected from both Routeviews and RIPE RIS collectors. Additionally, the list of IXP looking glasses, as well as their tenants, is gathered from PeeringDB and PCH. The authors compile a list of IXP tenants, using which the setter of each BGP announcement containing the communities attribute is determined by matching the AS path against the list of IXP tenants. If fewer than two ASes match against the path, no MLP link can be identified. From the two matching ASes, the AS closest to the prefix is taken to be the setter; if more than two ASes match, only the two ASes which have a p2p relationship according to CAIDA's AS relationship dataset are selected, and the one closer to the prefix is identified as the setter. Depending on the blacklist or whitelist policy that the setter AS has chosen, a list of multi-lateral peers for each setter AS is compiled. The methodology is applied to 11 large IXP route servers; the authors find about 73% additional peering links, out of which only 3% are identified within CAIDA's Ark and DIMES datasets. For validation, the authors rely on IXP LGs and issue a show ip bgp command for each prefix. About 3k links were tested for validation and 94% of them were found to be correct.
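The following sketch illustrates the kind of route-server export control that the BGP communities attribute encodes in these studies. The concrete encoding is an assumption here, as conventions differ across IXPs: (0, RS) blocks all members (NONE), (0, peer) excludes one member, and (RS, peer) includes one member on top of a NONE policy, with ALL as the default.

```python
# Route-server export control via BGP communities, following one common convention
# (an assumption; encodings differ across IXPs): (0, RS_ASN) = announce to NONE,
# (RS_ASN, peer) = INCLUDE one member, (0, peer) = EXCLUDE one member, and the
# default announces to all members.
RS_ASN = 64700                                  # hypothetical route-server ASN
MEMBERS = {64701, 64702, 64703, 64704}          # hypothetical IXP members

def receivers(communities, sender):
    """Return the set of members that receive the sender's announcement."""
    others = MEMBERS - {sender}
    if (0, RS_ASN) in communities:              # NONE policy: whitelist via INCLUDEs
        return {peer for high, peer in communities if high == RS_ASN and peer in others}
    # ALL policy (default): blacklist via EXCLUDEs
    return others - {peer for high, peer in communities if high == 0}

print(receivers({(0, 64703)}, sender=64701))                    # {64702, 64704}
print(receivers({(0, RS_ASN), (RS_ASN, 64704)}, sender=64702))  # {64704}
```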
Richter et al. (2014) outline the role and importance of route servers within IXPs. For their data, weekly snapshots of the peer and master RIBs from two IXPs, which expose the multi-lateral peerings taking place at these IXPs, are used. Furthermore, the authors have access to sFlow records sampled from the IXPs' switching infrastructure. This dataset allows the authors to identify peerings between IXP members which have been established without the help of route servers. Using the peer RIB snapshots, peering relationships between IXP members, as well as their symmetrical nature, are identified. For the master RIB, Richter et al. assume peering with all members unless they find members using BGP community values to control their peering. The data-plane sFlow measurements are taken to correspond to a peering relationship if BGP traffic is exchanged between two members of the IXP. The proclivity for multi-lateral peering over bi-lateral peering is measured, and it is found that ASes favor multi-lateral peerings with a ratio of 4:1 and 8:1 in the large and medium IXPs, respectively. Furthermore, traffic volumes transmitted over multi-lateral and bi-lateral peerings are measured, and it is found that ASes tend to send more traffic over bi-lateral links, with a ratio of 2:1 and 1:1 for the large and medium IXPs, respectively. It is found that ASes exhibit a binary behavior of either advertising all or none of their prefixes through the route server. Additionally, when ASes establish hybrid (multi- and bi-lateral) peerings, they do not advertise further prefixes over their bi-lateral links. The majority of additional peerings happen over the multi-lateral fabric, while the traffic ratios between multi- and bi-lateral peerings remain fairly consistent over the period of study.

Summary: This subsection provided an overview of research concerned with AS-level topology. The majority of studies were concerned with the incompleteness of Internet topology graphs. These efforts led to highlighting the importance of IXPs as central hubs of connectivity. Furthermore, various sources of information such as looking glasses, route collectors within IXPs, targeted traceroutes, RPSL entries, and traffic traces of IXPs were combined to provide a more comprehensive view of inter-AS relationships within the Internet. Lastly, the importance of route servers to the inner operation of IXPs and how they enable multi-lateral peering relationships was brought to attention.

2.3.2 Router-Level Topology. Although AS-level topologies provide a preliminary view into the structure and peering relations of ASes, they merely represent virtual relationships and do not reflect details such as the number and location of the established peerings. ASes establish interconnections with each other by placing their border routers within colos where other ASes are also present. Within these colos, ASes can establish one-to-one peerings through private interconnections or rely on an IXP's switching fabric to establish public peerings with the IXP participants. Furthermore, some ASes extend their presence into remote colos to establish additional peerings with other ASes by relying on layer-2 connectivity providers. Capturing these details becomes important for accurately attributing inter-AS congestion to specific links/routers or for pinpointing links/routers that are responsible for causing outages or disruptions within the connectivity of a physical region or network. Studies within this section present methodologies to infer router-level topologies using data-plane measurements in the form of traceroute. These methods address the aforementioned shortcomings of AS-level topologies by mapping the physical entities (border routers) which are used to establish peering relations and can therefore account for multiple peering links between each pair of ASes. Furthermore, given that routers are physical entities, researchers are able to pinpoint these border routers to geographic locations using various data sources and newly devised methodologies. Creating router-level topologies of the Internet can be challenging for many reasons. First, given the span of the Internet as well as the interplay of business relationships and routing dynamics, traceroute, as the de-facto tool for capturing router-level topologies, is only capable of recording a minute fraction of all possible paths. Routing dynamics caused by changes in each AS's route preferences, as well as the existence of load-balancers, further complicate this task. Second, correctly inferring which set of ASes have established an inter-AS link through traceroute is not trivial due to non-standardized practices for establishing interconnections between border routers, as well as several RFCs regarding the operation of routers that cause traceroute to depict paths that do not correspond to the forward path.
Lastly, given the disassociation of the physical layer from the upper layers, establishing the geolocation of the set of identified routers is not trivial. Within Section 2.2 we presented a series of platforms which try to address the first problem. The following studies summarize recent works which try to address the latter two problems.

Figure 4. Illustration of address sharing for establishing an inter-AS link between border routers. Although the traceroute paths (dashed lines) are identical, the inferred ownership of router interfaces and the placement of the inter-AS link differ for these two possibilities.

2.3.2.1 Peering Inference. As briefly mentioned earlier, inferring inter-AS peering relationships using traceroute paths is not trivial. To highlight this issue, consider the sample topology within Figure 4 presenting the border routers of AS1 and AS2, color coded as orange and blue, respectively. This figure shows the two possibilities for address sharing on the inter-AS link. The observed traceroute path traversing these border routers is also presented at the top of each figure with dashed lines. Within the top figure, AS2 provides the address space for the inter-AS link (y′ − y), while AS1 provides the address space for the inter-AS link in the bottom figure. As we can see, both traceroute paths are identical to each other, while the ownership of the router interfaces and the placement of the inter-AS link differ for these two possibilities. To further complicate the matter, a border router can respond with an interface not on the forward path of the traceroute (a in the top figure, using address space owned by AS3, color coded in red), leading to the incorrect inference of an inter-AS link between AS1 and AS3. Lastly, the border routers of some ASes are configured not to respond to traceroute probes, which restricts the chances of inferring inter-AS peerings with those ASes. The studies within this section address these difficulties by applying a set of heuristics to collections of traceroutes. Spring et al. (2002) performed the seminal work of mapping the networks of large ISPs and inferring their interconnections through traceroute probes. They make three contributions, namely: (i) conducting selective traceroute probes to reduce the overall overhead of running measurements, (ii) providing an alias resolution technique to group IP addresses into their corresponding routers, and (iii) parsing DNS information to extract PoP/geo information. Their selective probing method is composed of two main heuristics: (i) directed probing, which utilizes Routeviews data and the advertised paths to probe prefixes which are likely to cross the target network, and (ii) path reduction, which avoids conducting traceroutes that would lead to redundant paths, i.e., similar ingress or egress points. Additionally, an alias resolution technique named Ally is devised to group interfaces from a single network into routers. Lastly, a series of DNS parsing rules are crafted to extract geo information from router interface RDNS entries. The extracted geo information allows the authors to identify the PoPs of each AS. Looking glasses listed on traceroute.org are used to run Rocketfuel's methodology to map the networks of 10 ISPs including AT&T, Sprint, and Verio. The obtained maps were validated through private correspondence with network operators and by comparing the set of identified BGP neighbors with those obtainable through BGP feeds.
Nomikos and Dimitropoulos (2016) develop an augmented version of traceroute (traIXroute) which annotates the output path and reports whether (and at which exact hop) an IXP has been crossed along the path. The tool can operate with either traceroute or scamper as a backend. As input, traIXroute requires IXP memberships and a list of their corresponding prefixes from PeeringDB and PCH, as well as Routeviews' prefix to origin-AS mapping datasets. traIXroute annotates the hops of the observed path with the origin AS, tags hops which are part of an IXP prefix, and also provides the mapping between an IXP address and the member's ASN if such a mapping exists. Using a sliding window of size three, the hops of the path are examined to find (i) hops which are part of an IXP prefix, (ii) hops which have an IXP-to-ASN mapping, and (iii) whether the adjacent ASes are IXP members or not. The authors account for a total of 16 possible combinations and present their assessment regarding the location of the IXP link for the 8 cases that were most frequent. About 75% of observed paths matched rules which rely on IXP-to-ASN mapping data. The validity of this data source is examined by using BGP dumps from routers that PCH operates within multiple IXPs. A list of IXP address-to-ASN mappings was compiled by using the next-hop address and the first AS within the AS path from these router dumps. The authors find that 92% (93%) of the IXP-to-ASN mappings reported by PeeringDB (PCH) are accurate according to the BGP dumps. Finally, the prevalence of IXPs along Internet paths is measured by parsing a CAIDA Ark snapshot. About 20% of paths are reported to cross IXPs, the IXP hop on average is located around the 6th hop in the middle of the path, and only a single IXP is observed along each route, which is in accordance with valley-free routing.
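A heavily simplified version of one such sliding-window rule is sketched below: a hop that falls inside an IXP prefix and whose PeeringDB/PCH mapping agrees with the next hop's origin AS is flagged as an IXP crossing between the surrounding ASes. The hop annotations are hypothetical, and traIXroute itself evaluates many more combinations than this single rule.

```python
# Simplified sketch of a traIXroute-style sliding-window check over an annotated
# path. All hop annotations below are hypothetical.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Hop:
    addr: str
    origin_as: Optional[int]              # from prefix-to-AS data
    ixp: Optional[str] = None             # IXP name if addr lies in an IXP prefix
    ixp_member_as: Optional[int] = None   # PeeringDB/PCH mapping of addr to a member ASN

def ixp_crossings(path):
    hits = []
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        # Middle hop is an IXP address whose member mapping matches the next hop's AS.
        if mid.ixp and mid.ixp_member_as and mid.ixp_member_as == nxt.origin_as:
            hits.append((prev.origin_as, mid.ixp, nxt.origin_as))
    return hits

path = [
    Hop("192.0.2.1", 64500),
    Hop("198.51.100.17", None, ixp="HYPO-IX", ixp_member_as=64501),
    Hop("203.0.113.9", 64501),
]
print(ixp_crossings(path))    # [(64500, 'HYPO-IX', 64501)]
```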
Luckie, Dhamdhere, Huffaker, Clark, et al. (2016) develop bdrmap, a method to identify the inter-domain links of a target network at the granularity of individual routers by conducting targeted traceroutes. As input to their method, they utilize originated prefixes from Routeviews and RIPE RIS, RIR delegation files, lists of IXP prefixes from PeeringDB and PCH, and CAIDA's AS-to-ORG mapping dataset. Target prefixes are constructed from the BGP datasets by splitting overlapping prefixes into disjoint subnets; the first address within each prefix is targeted using paris-traceroute, and neighbors' border addresses are added to a stop list to avoid further probing within the customer's network. IP addresses are grouped together to form a router topology by performing alias resolution using Ally and Mercator. By utilizing the prefixscan tool, they try to eliminate third-party responses for cases where interfaces are responsive to alias resolution. Inter-AS links are identified by iteratively going through a set of 8 heuristics designed to minimize inference errors caused by address sharing, third-party responses, and networks blocking traceroute probes. Luckie et al. deploy their tool within 10 networks and receive ground truth from 4 network operators; their method correctly identifies 96-99% of the inter-AS links for these networks. Furthermore, the authors compare their findings against BGP-inferred relationships and find that they are able to observe between 92% and 97% of BGP links. Using a large US access network as an example, the authors study the resiliency of prefix reachability in terms of the number of exit routers and find that only 2% of prefixes exit through the same router, while the great majority of prefixes have about 5-15 exit routers. Finally, the authors look at the marginal utility of using additional VPs for identifying all inter-AS links and find that results can vary depending on the target network and the geographic distribution of the VPs. Marder and Smith (2016) devise a tool named MAP-IT for identifying inter-AS links by utilizing data-plane measurements in the form of traceroutes. The algorithm requires as input the set of conducted traceroute measurements, prefix-to-origin-AS mappings from BGP data, a list of IXP prefixes, and CAIDA's AS-to-ORG mapping dataset. For each interface, a neighbor set (Ns) composed of the addresses appearing on the prior (Nb) and next (Nf) hops of traceroutes is created. Each interface is split into two halves, the forward and backward halves. Direct inferences are made regarding the ownership of each interface half by counting the majority ASN based on the current IP-to-AS mapping dataset. At the end of each round, if a direct inference has been made for an interface half, the other side is updated with an indirect inference. Furthermore, within each iteration of the algorithm, using the current IP-to-AS mapping, MAP-IT visits interface halves with direct inferences to check whether the connected AS still holds the majority; if not, the inference is reduced to indirect, and after visiting all interface halves, any indirect inference without an associated direct inference is removed. MAP-IT updates the IP-to-AS mapping dataset based on the current inferences and continues this process until no further inferences are made. For verification, Marder and Smith use Internet2's network topology as well as a manually compiled dataset composed of DNS names for Level3 and TeliaSonera interfaces. The authors investigate the effect of the hyperparameter f, which controls the majority-voting outcome for direct inferences, and empirically find that a value of 0.5 yields the best result. Using f=0.5, MAP-IT has a recall of 82% - 100% and a precision of 85% - 100% for each network. The authors also look into the incremental utility of each iteration of MAP-IT; interestingly, the majority (about 80%) of inferences can be made in the first round, which is equivalent to making inferences based on a simple IP2AS mapping. The algorithm converges quickly after its 2nd and 3rd iterations.
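The core of MAP-IT's direct inference is a thresholded majority vote over the origin ASes seen adjacent to one half of an interface. The sketch below shows only that step, with hypothetical neighbor ASNs; the full algorithm iterates it together with the indirect updates and re-checks described above.

```python
# Sketch of the thresholded majority vote behind MAP-IT-style direct inferences:
# count the origin ASes of addresses seen adjacent to one interface half and make
# an inference only if one AS exceeds the threshold f. Neighbor ASNs are hypothetical.
from collections import Counter

def infer_half(neighbor_asns, f=0.5):
    if not neighbor_asns:
        return None
    (asn, votes), = Counter(neighbor_asns).most_common(1)
    return asn if votes / len(neighbor_asns) > f else None

print(infer_half([64501, 64501, 64501, 64502]))   # 64501 (3/4 > 0.5): direct inference
print(infer_half([64501, 64502]))                 # None: no clear majority, defer
```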
Alexander et al. (2018) combine the best practices of bdrmap Luckie et al. (2016) and MAP-IT Marder and Smith (2016) into bdrmapIT, a tool for identifying border routers that improves MAP-IT's coverage without losing bdrmap's accuracy at identifying the border routers of a single ASN. The two techniques are mainly made compatible with the introduction of "Origin AS Sets", which annotate each link between routers with the set of origin ASes from the prior hop. bdrmapIT relies on a two-step iterative process. During the first step, the owner of each router is inferred by a majority vote among the origin ASes of its subsequent interfaces. Exceptions in terms of the cast vote are made for IXP interfaces, reallocated prefixes, and multi-homed routers to account for these cases correctly. During the second step, interfaces are annotated with an ASN using either the origin AS (if the router annotation matches that of the interface) or the majority vote of the prior connected routers (if the router annotation differs from the interface). The iterative process is repeated until no further changes are made to the connectivity graph. The methodology is evaluated using bdrmap's ground truth dataset, as well as the ITDK dataset by removing the probes from a ground truth VP. The authors find that bdrmapIT improves the coverage of MAP-IT by up to 30% while maintaining the accuracy of bdrmap.

2.3.2.2 Geo Locating Routers & Remote Peering. Historically, ASes established their peering relations with other ASes local to their PoPs and relied on their upstream providers for connectivity to the remainder of the Internet. IXPs enabled ASes to establish peerings that both improved their performance due to shorter paths and reduced their overall transit costs by offloading upstream traffic onto p2p links instead of c2p links. With the proliferation of IXPs and their aforementioned benefits, ASes began to expand their presence not only within local IXPs but within remote ones as well. ASes rely on layer-2 connectivity providers to extend their virtual PoPs into remote physical areas. Layer-3 measurements are agnostic to these dynamics and are not able to distinguish local from remote peering relations. Researchers have tried to solve this issue by pinpointing the border routers of ASes to physical locations. The association of routers with geolocations is not trivial; researchers have relied on a collection of complementary information, such as geocoded embeddings within reverse DNS names, or have constrained the set of possible locations through colo listings offered by PeeringDB and similar datasets. In the following, we present a series of recent studies which tackle this unique issue. Castro et al. (2014) present a methodology for identifying remote peerings, where two networks interconnect with each other via a layer-2 connectivity provider. Furthermore, they derive analytical conditions for the economic viability of remote peering versus relying on transit providers. Leveraging PeeringDB, PCH, and information available on IXP websites, a list of IXPs as well as their tenants, prefixes, and interface-to-member mappings is obtained. For this study, IXPs which have at least one LG or RIPE NCC probe (amounting to a total of 22) are selected. By issuing temporally spaced probes towards all of the identified interfaces within IXP prefixes and filtering interfaces which either do not respond frequently or do not match an expected maximum TTL value of 255 or 64, a minimum RTT value for each interface is obtained. By examining the distribution of minimum RTTs for each interface, a conservative threshold of 10ms is selected for considering an interface as remote. A total of 4.5k interfaces corresponding to 1.9k ASes in 22 IXPs are probed in the study. The authors find that 91% of IXPs have remote peering, while 285 ASes have a remote interface. Findings including RTT measures as well as remote labels for IXP members were confirmed for TorIX by its staff. One month of Netflow data captured at the border routers of RedIRIS (Spain's research and education network) is used to examine the amount of inbound and outbound traffic between RedIRIS and its transit providers, from which an upper bound for the traffic which can be offloaded is estimated. Furthermore, the authors create a list of potential peers (2.2k) which are reachable through Euro-IX; these potential peers are also categorized into different groups based on their peering policy as listed on PeeringDB. Considering all of the 2.2k networks, RedIRIS can offload 27% (33%) of its inbound (outbound) traffic by remotely peering with these ASes. Through their analytical modeling, the authors find that remote peering is viable for networks with global traffic as well as for networks which have higher ratios of traffic-independent cost for direct peering compared to remote peering, such as networks within Africa.
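The latency heuristic at the heart of this style of remote-peering inference can be sketched as follows: probe each member's IXP interface repeatedly, keep the minimum RTT as a propagation-delay floor, and label interfaces whose floor exceeds a threshold (10 ms in Castro et al.) as remote. The sample RTTs, addresses, and the minimum-probe-count guard are hypothetical.

```python
# Sketch of minimum-RTT-based remote-peer labeling. RTT samples (in milliseconds)
# and interface addresses are hypothetical; the min_probes guard is an assumption.
def label_interfaces(rtt_samples, threshold_ms=10.0, min_probes=5):
    labels = {}
    for iface, samples in rtt_samples.items():
        if len(samples) < min_probes:
            labels[iface] = "insufficient data"   # unresponsive or filtered interface
        else:
            # The minimum over many probes approximates the propagation delay floor.
            labels[iface] = "remote" if min(samples) > threshold_ms else "local"
    return labels

samples = {
    "198.51.100.10": [1.2, 0.9, 1.4, 1.1, 1.0],        # sits in the IXP's metro area
    "198.51.100.22": [38.5, 41.0, 37.9, 39.2, 40.3],   # reaches the IXP over a long layer-2 circuit
    "198.51.100.30": [12.0, 11.5],
}
print(label_interfaces(samples))
# {'198.51.100.10': 'local', '198.51.100.22': 'remote', '198.51.100.30': 'insufficient data'}
```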
Giotsas, Smaragdakis, Huffaker, Luckie, and claffy (2015) attempt to obtain a peering interconnection map at the granularity of colo facilities. The authors gather AS-to-facility mapping information from PeeringDB and also manually parse this information from the websites of a subset of networks. IXP lists and members are compiled by combining data from PeeringDB, PCH, and IXP websites. For data-plane measurements, the authors utilize traceroute data from RIPE Atlas, iPlane, and CAIDA's Ark, as well as a series of targeted traceroutes conducted from looking glasses. The authors annotate traceroute hops with their corresponding ASN and consider the segment exhibiting a change in ASN as the inter-AS link. Using the colo facility listing obtained in the prior step, the authors produce a list of candidate facilities for each inter-AS link, which can result in three cases: (i) a single facility is found, (ii) multiple facilities match the criteria, or (iii) no candidate facility is found. For the latter two cases, the authors further constrain the search space either by benefiting from alias resolution results (two aliased interfaces should reside in the same facility) or by conducting further targeted probes aimed at ASNs that have a common facility with the owner AS of the interface in question. The methodology is applied to five content providers (Google, Yahoo, Akamai, Limelight, and Cloudflare) and five transit networks (NTT, Cogent, DT, Level3, and Telia). The authors present the effect of each iteration of their constrained facility search (CFS) algorithm (with a maximum iteration count of 100); the majority of pinned interfaces are identified by the 40th iteration, with RIPE probes providing a better opportunity for resolving new interfaces. The authors find that DNS-based pinning methods are able to identify only 32% of their findings. The authors also cross-validate their findings using direct feedback from network operators, the BGP communities attribute, DNS records, and IXP websites, with 90% of the interfaces being pinned correctly; for the remainder, the pinning was correct at a metro granularity. Nomikos et al. (2018) present a methodology for identifying remote peers within IXPs; furthermore, they apply their methodology to 30 large IXPs and characterize different aspects of the remote peering ecosystem. They define an IXP member as a remote peer if it is not physically connected to the IXP's fabric or reaches the IXP through a reseller. The development of the methodology and the heuristics used by the authors are motivated by a validation dataset which they obtain by directly contacting several IXP operators.
A collection of 5 heuristics is used to infer whether an IXP member is peering locally or remotely; in order of importance, these heuristics are: (i) the port capacity of a customer, (ii) latency measurements from VPs within IXPs towards customer interfaces, (iii) colocation facilities within an RTT-derived radius, (iv) multi-IXP router inferences made by parsing traceroutes from publicly available datasets and corroborating the locations of these IXPs and whether the AS in question is local to any of them, and (v) identifying private peerings (by parsing public traceroute measurements) between the target AS and one or more local IXP members, which is used as a last resort to infer whether a network is local or remote to a given IXP. The methodology is applied to 30 large IXPs, and the authors find a combination of RTT measurements and colo listings to be the most effective heuristics for inferring remote peers. Overall, 28% of interfaces are inferred to be peering remotely, and remote peering is observed at about 90% of the studied IXPs. The sizes of local and remote ASes in terms of customer cone are observed to be similar, while hybrid ASes tend to have larger network sizes. The growth of remote peering is investigated over a 14-month period, and the authors find that the number of remote peers grew twice as fast as the number of local peers. Motamedi et al. (2019) propose a methodology for inferring and geolocating interconnections at a colo level. The authors obtain a list of colo facility members from PeeringDB and colo provider webpages. A series of traceroutes towards the address space of the ASes identified in the prior step is conducted using available measurement platforms, such as looking glasses and RIPE Atlas nodes, in the geographic proximity of the targeted colo. traceroute paths are translated to a router-level connectivity graph using alias resolution and a set of heuristics based on topology constraints. The authors argue that a router-level topology, coupled with the prevalence of observations, allows them to account for traceroute anomalies and to infer the correct ASes involved in each peering. To geolocate routers, an initial set of anchor interfaces with known locations is created by parsing the reverse DNS entries of the observed router interfaces. This information is propagated and expanded through the router-level graph by a belief propagation algorithm that uses a set of co-presence rules based on membership in the same alias set and the latency difference between neighboring interfaces.

Summary: While traceroutes have historically been utilized as a source of information for inferring inter-AS links, early methodologies did not correctly account for the complexities of inferring BGP peerings from layer-3 probes. The common practice of simply mapping interface addresses along the path to their origin AS based on BGP data does not account for the visibility of BGP collectors, address sharing for establishing inter-AS links, third-party responses to TTL-expired messages by routers, and unresponsive routers or firewalled networks along the traceroute path. The methodologies presented within this section attempt to account for these difficulties by corroborating domain knowledge of common networking practices and by relying on a collection of traceroute paths and their corresponding router view (obtained using alias resolution techniques) to make accurate inferences about the entities which establish inter-AS links.
Furthermore, pin- pointing routers to physical locations was the key enabler for highlighting remote peerings that are simply not visible from an AS-level topology. 2.3.3 PoP-Level Topology. PoP-level topologies present a middle ground between AS-level and router-level topologies. A PoP-level graph presents the points of presence for one or many networks. These topologies inherently have geo information at the granularity of metro areas embedded within. They have been historically at the center of focus as many ASes disclose their topologies at a PoP level granularity and do not require detailed information regarding each individual router and merely represent a bundle of routers within each PoP as a single node. They have lost their traction to router-level topologies that are able to capture the dynamics of these topologies in addition to providing finer details of information. Regardless of this, due to the importance of some ASes and their centrality in the operation of today’s Internet, several studies Schlinker et al. (2017); Wohlfart, Chatzis, Dabanoglu, Carle, and Willinger (2018); Yap et al. (2017) outlining the internal operation of these ASes within each PoP have emerged. These studies offer insight into the challenges these ASes face for peering and serving the vast majority of the Internet as well as the solutions that they have devised. Cunha et al. (2016) develop Sibyl, a system which provides an expressive interface that allows the user to specify the requirements for the path of a traceroute, given the set of requirements Sibyl would utilize all available vantage points and rely on historical data to conduct a traceroute from a given vantage point towards a specific destination that is most likely to satisfy the users 49 constraints. Furthermore, given that each vantage point has limited probing resources and that concurrent requests can be made, Sibyl would pick source- destination pairs which optimize for resource utilization. Sibyl combines PlanetLab, RIPE Atlas, traceroute servers accessible through looking glasses, DIMES, and Dasu measurement platforms to maximize its coverage. Symbolic regular expressions are used for the query interface where the user can express path properties such as the set of traversed ASes, cities, and PoPs. The likelihood of each source-destination pair matching the required path properties is calculated using a supervised machine learning technique (RuleFit) which is trained based on prior measurements and is continuously updated based on new measurements. Resource utilization optimization is addressed by using a greedy algorithm, Sibyl chooses to issue traceroutes that fit the required budget and that have the largest marginal expected utility based on the output of the trained model. Schlinker et al. (2017) outline Facebook’s edge fabric within their PoPs by utilizing an SDN based system that alters BGP local-pref attributes to utilize alternative paths towards specific prefixes better. The work is motivated by BGP’s shortcomings namely, lack of awareness of link capacities and incapability to optimize path selections based on various performance metrics. More specifically BGP makes its forwarding decisions using a combination of AS-path length and the local-perf metric. Facebook establishes BGP connections with other ASes through various means namely, private interconnections, public peerings through IXPs, and peerings through router servers within IXPs. 
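As a rough illustration of the two decision steps just mentioned (prefer the highest local-pref, then the shortest AS path), the following sketch compares candidate routes; the route attributes are hypothetical and the many remaining tie-breakers of the full BGP decision process are omitted.

```python
from dataclasses import dataclass

@dataclass
class Route:
    prefix: str
    peer_type: str      # "private", "ixp_public", or "route_server"
    local_pref: int
    as_path: tuple

def best_path(routes):
    """Pick the route with the highest local-pref, breaking ties on shortest AS path
    (the two steps highlighted above); remaining BGP tie-breakers are omitted."""
    return max(routes, key=lambda r: (r.local_pref, -len(r.as_path)))

# Hypothetical example: a higher local-pref on the private interconnect wins even
# though the route learned over the public IXP fabric has a shorter AS path.
candidates = [
    Route("192.0.2.0/24", "private", 300, ("64500", "64510", "64520")),
    Route("192.0.2.0/24", "ixp_public", 200, ("64501", "64520")),
]
print(best_path(candidates).peer_type)  # -> "private"
```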
The authors report that the majority of their interconnections are established through public peerings, while the bulk of traffic is transmitted over private links. The latter reflects Facebook's preference for private peerings over public peerings, while peerings established through route servers have the lowest priority. Furthermore, the authors observe that for all PoPs except one, there are at least two routes towards each destination prefix. The proposed solution isolates traffic engineering per PoP to simplify the design; the centralized SDN controller within each PoP gathers router RIB tables through a BMP collector. Furthermore, traffic statistics are gathered through sampled sFlow or IPFIX records. Finally, interface information is periodically pulled via SNMP. The collector emulates BGP's best path selection and projects interface utilization. For overloaded interfaces, prefixes with alternative routes are selected, and an alternative route is chosen based on a set of preferences. The output of this step is a set of route overrides which are enforced by assigning them a high local-pref value. The authors report that their deployed system detours traffic from 18% of interfaces. The median detour time is 22 minutes, and about 10% of detours last as long as 6 hours. The detoured routes resulted in 45% of the prefixes achieving a median latency improvement of 20ms, while 2% of prefixes improved their latency by 100ms. Yap et al. (2017) discuss the details of Espresso, an application-aware routing system for Google's peering edge routing infrastructure. Similar to the work of Schlinker et al. (2017), Espresso is motivated by the need for a more efficient (both technically and economically) edge peering fabric that can account for traffic engineering constraints. Unlike the work of Schlinker et al. (2017), Espresso maintains two layers of control plane: one is localized to each PoP, while the other is a global centralized controller that allows Google to perform further traffic optimizations. Espresso relies on commodity MPLS switches for peering purposes; traffic between the switches and servers is encapsulated in IP-GRE and MPLS headers. The IP-GRE header encodes the correct switch, and the MPLS header determines the peering port. The global controller (GC) maintains an egress map that associates each client prefix and PoP tuple with an edge router/switch and egress port. User traffic characteristics such as throughput, RTT, and re-transmits are reported at a /24 granularity to the global controller. Link utilization, drops, and port speeds are also reported back to the global controller. A greedy algorithm is used by the GC to assign traffic to a candidate router and port combination. The greedy algorithm starts by making its decisions using traffic priority metrics and orders its available options based on BGP policies, user traffic metrics, and the cost of serving on a specific link. Espresso has been incrementally deployed within Google and at the time of the study was responsible for serving about 22% of traffic. Espresso is able to maintain higher link utilization while keeping packet drop rates low even for fully utilized links (95th percentile below 2.5%). The authors report that the congestion reaction feature of the GC results in higher goodput and a longer mean time between re-buffers for video traffic. Wohlfart et al. (2018) present an in-depth study of the connectivity fabric of Akamai at its edge towards its peers.
The authors account 3.3k end-user facing (EUF) server deployments with varying size and capabilities which are categorized into four main groups. Two of these groups have Akamai border routers and therefore establish explicit peerings with peers and deliver content directly to them while the other two groups are hosted within another ASes network and are responsible for delivering content implicitly to other peers. Customers are redirected to the correct EUF server through DNS, the mapping is established by considering various inputs including BGP feeds collected by Akamai routers, user performance metrics, and link cost information. To analyze Akamai’s peering fabric, the authors rely on proprietary BGP snapshots obtained from Akamai 52 routers and consist of 3.65M AS paths and about 1.85M IPv4 and IPv6 prefixes within 61k ASes (ViewA). As a point of comparison, a combination of daily BGP feeds from Routeviews, RIPE RIS, and PCH consisting of 21.1M AS paths and 900k prefixes within 59k ASes is used (ViewP). While at an AS level both datasets seem to have a relatively similar view, ViewA (ViewP) observes 1M (0.1M) prefixes the majority of which are prefixes longer than /25. Only 15% of AS paths within ViewP are observed by ViewA which suggests that a large number of AS paths within ViewP are irrelevant for the operation of Akamai. Wohlfart et al. report 6.1k unique explicit peerings between Akamai and its neighbors by counting the unique number of next-hop ASN from the Akamai BGP router dumps. About 6k of these peerings happen through IXPs while the remainder are established through PNIs. In comparison, only 450 peerings between Akamai and other ASes are observed through ViewP. Using AS paths within ViewP the authors report about 28k implicit peers which are within one AS hop from Akamai’s network. Lastly, the performance of users sessions are looked into by utilizing EUF server logs containing the clients IP address, throughput, and a smoothed RTT value. The performance statistics are presented for two case studies (i) serving a single ISP and (ii) serving customers within 6 distinct metros. Overall 90% of traffic is coming from about 1% of paths and PNIs are responsible for delivering the bulk of traffic and PNIs and cache servers within eyeball ASes achieve the best performance regarding RTT. Nur and Tozal (2018) study the Internet AS-level topology using a multigraph representation where AS pairs can have multiple edges between each other. Traceroute measurements from CAIDA’s Ark and iPlane projects are collected for this study. For IP to AS mapping Routeviews’ BGP feed is utilized. 53 Next hop addresses for BGP announcements are extracted from Routeviews as well as RIPE RIS. For mapping IP addresses to their corresponding geo-location various data sources have been employed namely, (i) UNDNS for DNS parsing, (ii) DB-IP, (iii) Maxmind GeoLite2 City, and (iv) IP2Location DB5 Lite. Each ASes border interface is identified by tracking ASN changes along the hops of each traceroute. 
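The border-interface identification step just described can be sketched as follows, assuming traceroute hops have already been annotated with an ASN (the hop list here is a hypothetical placeholder):

```python
def border_interfaces(annotated_hops):
    """Return ((near_ip, far_ip), from_asn, to_asn) tuples where the ASN changes."""
    borders = []
    prev_ip, prev_asn = None, None
    for ip, asn in annotated_hops:
        if asn is None:          # unresponsive or unmapped hop: skip conservatively
            continue
        if prev_asn is not None and asn != prev_asn:
            borders.append(((prev_ip, ip), prev_asn, asn))
        prev_ip, prev_asn = ip, asn
    return borders

hops = [("10.0.0.1", 64500), ("203.0.113.9", 64500),
        ("198.51.100.1", 64501), ("198.51.100.77", 64501)]
print(border_interfaces(hops))  # [(('203.0.113.9', '198.51.100.1'), 64500, 64501)]
```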
Each cross border interface X-BI is geolocated to the city in which it resides by applying one of the following methods in order of precedence: (i) relying on UNDNS for extracting geoinformation from reverse DNS names, (ii) majority vote along three (DB-IP, Maxmind, and IP2Location) IP to GEO location datasets, (iii) sandwich method where an unresolved IP between two IPs in the same geolocation is mapped to the same location, (iv) RTT based geo locating which relies on the geolocation of prior or next hops of an unresolved address that have a RTT difference smaller than 3 ms for mapping them to the same location, and (v) if all of the prior methods fail Maxmind’s output is used for mapping the geolocation of the X-BI. The set of inter-AS links resulting from parsing traceroutes is augmented by benefiting from BGP data. If an AS relationship exists between two ASes but is missing from the current AS-level graph and all identified X-BIs corresponding to these ASes are geolocated to a single city, a link will be added to the AS-level topology graph under the assumption that this is the only possible location for establishing an interconnection between these two ASes. The inferred PoP nodes in the AS graph are validated for major research networks as well as several commercial ISPs. The overlap of identified PoPs is measured for networks which have publicly available PoP-level maps. The maps align with the set of identified cities by X-AS with deviations in terms of number of PoPs per city. This is a limitation of X-AS as it is only able to identify one 54 PoP per city. Identified AS-links are compared against CAIDA’s AS relationships dataset, the percentage of discrepancy for AS links of each AS is measured. For 78% of ASes, the maps agree with each other completely, and the average link agreement is about 85% for all ASes. Various properties of the resulting graph are analyzed in the paper, the authors find that the number of X-BI nodes per AS, X-BI nodes degree, and AS degree all follow a power law distribution. Summary: PoP-level topologies can offer a middle ground between router- level and AS-level topologies offering an understanding of inter-AS peering relationships while also being able to distinguish instances of these peerings happening at various geo-locations/PoPs. Additionally, we reviewed studies that elaborate on the faced challenges as well as the devised solutions for content provider (Google, Facebook) and CDN (Akamai) networks which are central to the operation of today’s Internet. Figure 5. Fiber optic backbone map for CenturyLink’s network in continental US. Each node represents a PoP for CenturyLink while links between these PoPs are representative of the fiber optic conduits connecting these PoPs together. Image courtesy of CenturyLink. 55 2.3.4 Physical-Level Topology. This subsection is motivated by the works of Knight, Nguyen, Falkner, Bowden, and Roughan (2011) and Durairajan, Ghosh, Tang, Barford, and Eriksson (2013); Durairajan, Sommers, and Barford (2014); Durairajan, Sommers, Willinger, and Barford (2015) which presented the groundwork for having a comprehensive physical map of the Internet consisting of edges corresponding to fiber optic cables providing connectivity between metro areas and PoPs as nodes within these topologies. A sample of this topology for CenturyLink’s fiber-optic backbone network within the continental US is presented in Figure 5. 
Physical maps were mostly neglected by the Internet topology community mainly due to two reasons: (i) the scarcity of well-formatted information and (ii) the complete disassociation of physical layers from probes conducted within higher layers of the TCP/IP stack. The following set of papers try to address the former issue by gathering various sources of information and compiling them into a unified format. Knight et al. (2011) present the Internet topology Zoo which is a collection of physical maps of various networks within the Internet. The authors rely on ground truth data publicly provided by the network operators on their websites. These maps are presented in various formats such as static images or flash objects. The authors transcribe all maps using yEd (a graph editor and diagraming program) into a unified graph specification format (GML) and annotate nodes and links with any additional information such as link speed, link type, longitude, and latitudes that is provided by these maps. Each map and its corresponding network is classified as a backbone, testbed, customer, transit, access or internet exchange based on the properties of their network. For example, backbone networks should connect at least two cities together while access networks should provide 56 edge access to individuals. A total of 232 networks are transcribed by the authors. About 50% of networks are found to have more than 21 PoPs and each of these PoPs have an average degree of about 3. Lastly similar to Gregori et al. (2011) the core density of networks is examined by measuring the 2-core size of networks. A wide degree of 2-core sizes ranging from 0 (tree-like networks) to 1 (densely connected core with hanging edges) are found within the dataset. Durairajan et al. (2013) create a map of the physical Internet consisting of nodes representing colocation facilities and data-centers, links representing conduits between these nodes and additional metadata related to these entities. The authors rely on publicly available network maps (images, Flash objects, Google Maps overlays) provided by ASes. The methodology for transcribing images consists of 5 steps: (i) capturing high-resolution sub-images, (ii) patching sub-images into a composite image, (iii) extracting a link image using color masking techniques, (iv) importing link image into ArcGIS using geographic reference points, and (v) using link vectorization in ArcGIS to convert links into vectors. Given that each map has a different geo resolution, different scores are attributed to nodes with lat/lon or street level, city, and state having a corresponding score of 1.0, 0.75, 0.5. All maps have at least city level resolution with about 20% of nodes having lat/lon or street level accuracy. Durairajan et al. (2014) work is motivated by two research questions: (i) how do physical layer and network layer maps compare with each other? and (ii) how can probing techniques be improved to reveal a larger portion of physical infrastructure? For physical topologies, the authors rely on maps which are available from the Internet Atlas project. From this repository the maps for 7 Tier- 1 networks and 71 non-Tier-1 networks which are present in North America are 57 gathered, these ASes collectively consist of 2.6k PoPs and 3.6k links. For network layer topologies, traceroutes from the CAIDA Ark project during the September 2011 to Match 2013 period are used. 
Additionally, DNS names for router interfaces are gathered from the IPv4 Routed /24 DNS Names Dataset, which includes the domain names for IP addresses observed in the CAIDA Ark traceroutes. Traceroute hops are annotated with their corresponding geo information (extracted with DDeC) as well as the AS number, which is collected from Team Cymru's service. Effects of vantage point selection on node identification are studied by employing public traceroute servers. Different modalities depending on the AS ownership of the traceroute server and the target address are considered ([VP_in, t_in], [VP_in, t_out], [VP_out, t_in]). Their methodology (POPsicle) chooses VPs based on geographic proximity to the selected targets; from the pool of destinations, those for which the squared VP-to-destination distance is greater than the sum of the squared VP-to-target and target-to-destination distances are selected, forming a measurement cone beyond the target. For this study, 50 networks that have a comprehensive set of geo-information for their physical map are considered. Out of these 50 networks, 21 do not have any geo information embedded in their DNS names. Furthermore, 16 ASes were not observed in the Ark traces. This leaves 13 ASes out of the original 50 which have both traces and geo-information in the network layer map. POPsicle was deployed in an IXP (Equinix Chicago) to identify the PoPs of 10 tenants. Except for two networks, POPsicle was able to identify all known PoPs of these networks. Furthermore, POPsicle was evaluated by targeting 13 ISPs through Atlas probes which were deployed in IXPs; for all of these ISPs POPsicle was able to match or outperform Ark and Rocketfuel, and for 8 of these ISPs POPsicle found all or the majority of PoPs present in Atlas maps. Durairajan et al. (2015) obtain the long-haul fiber network within the US and study its characteristics and limitations. For the construction of the long-haul fiber map, Durairajan et al. rely on the Internet Atlas project Durairajan et al. (2013) as a starting point and confirm the geolocation or sharing of conduits through legal documents which outline the laying/utilization of infrastructure. The methodology consists of four steps: (i) using Internet Atlas maps for tier-1 ASes that have geo-coded information, a basic map is constructed, (ii) the geolocation of nodes and links in the map is confirmed through any form of legal document which can be obtained, (iii) the map is augmented with additional maps from large transit ASes which lack any geo-coded information, and (iv) the augmented map is once again confirmed through any legal document that would either confirm the geolocation of a node/link or would indicate conduit sharing with links that have geo-coded information. The long-haul fiber map appears to be physically aligned with roadway and railway routes; the authors use the polygon overlap feature of ArcGIS to compare the overlap of these maps and find that long-haul links most often run along roadways. The authors also assess shared conduit risks. For this purpose they construct a conduit-sharing matrix where rows are ASes and columns are conduits, so that each column reveals the number of ASes utilizing that conduit. Out of 542 identified conduits, about 90% are shared by at least one other AS. Using the risk matrix, the Hamming distance for each AS pair is measured to identify ASes which have similar risk profiles.
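The risk-profile comparison described above can be illustrated with a minimal sketch that computes pairwise Hamming distances over the rows of a small, purely hypothetical AS-by-conduit sharing matrix:

```python
from itertools import combinations

conduit_matrix = {           # AS -> tuple of 0/1 flags, one per conduit
    "AS_A": (1, 1, 0, 1, 0),
    "AS_B": (1, 1, 0, 0, 0),
    "AS_C": (0, 0, 1, 1, 1),
}

def hamming(row_a, row_b):
    """Number of conduits on which two ASes' usage differs."""
    return sum(a != b for a, b in zip(row_a, row_b))

# Column sums give the number of ASes sharing each conduit.
shared_per_conduit = [sum(col) for col in zip(*conduit_matrix.values())]
print(shared_per_conduit)

for a, b in combinations(sorted(conduit_matrix), 2):
    print(a, b, hamming(conduit_matrix[a], conduit_matrix[b]))
# small distances indicate ASes with similar shared-conduit risk profiles
```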
Using traceroute data from Edgescape and parsing geoinformation in domain names the authors infer which conduits were utilized by each traceroute and utilize the frequency of traceroutes as a proxy measure of traffic volume. Finally a series of risk mitigation analysis are conducted namely: (i) the possibility of increasing network robustness 59 by utilizing available conduits or by peering with other networks is investigated for each AS (ii) increasing network robustness through the addition of additional k links is measured for each network, and lastly (iii) possibility for improving latency is investigated by comparing avg latencies against right of way (ROW), line of sight (LOS), and best path delays. Summary: the papers within this sub-section provided an overview of groundbreaking works that reveal physical-level topologies of the Internet. The researchers gathered various publicly available maps of ASes as well as legal documents pertaining to the physical location of these networks to create a unified, well-formatted repository for all these maps. Furthermore, the applicability of these maps towards the improvement of targeted probing methodologies and the possibility of improving and provisioning the infrastructure of each network is investigated. Although the interplay of routing on top of these physical topologies is unknown and remains as an open problem, these physical topologies provide complementary insight into the operation of the Internet and allow researchers to provision or design physical infrastructure supporting lower latency Internet access or to measure the resiliency of networks towards natural disasters. 2.4 Implications & Applications of Network Topology This section will provide an overview of the studies which rely on Internet topology to provide additional insight regarding the performance, resiliency, and various characteristics of the Internet. The studies which are outlined in this section look into various properties of the Internet including but not limited to: path length both in terms of router and AS hops, latency, throughput, packet loss, redundancy, and content proximity. In a more broad sense, we can categories these studies into three main groups: (i) studying performance characteristics of the 60 Internet, (ii) studying resiliency of the Internet, and (iii) classifying the type of inter-AS relationships between ASes. Depending on the objective of the study one or more of the aforementioned properties of the Internet could be the subject which these studies focus on. Each of these studies would require different resolutions of Internet topology. As outlined in Section 2.3 obtaining a one to one mapping between different resolutions is not always possible. For example, each AS link can correspond to multiple router level links while each router level link can correspond to multiple physical links. For this reason, each study would rely on a topology map which better captures the problems objectives. As an example, studying the resiliency of a transit ASes backbone to natural disasters should rely on a physical map while performing the same analyses using an AS-level topology could lead to erroneous conclusions given the disassociation of ASes to physical locations. While on the other hand studying the reachability and visibility of an AS through the Internet would require an AS-level topology and conducting the same study using a fiber map would be inappropriate as the interplay of the global routing system on top of this physical map is not known. 
The remainder of this section would be organized into three sub-sections presenting the set of studies which focus on the (i) Internet performance, (ii) Internet resiliency, and (iii) AS relationship classification. Furthermore, each sub-section would further divide the studies based on the granularity of the topology which is employed. 2.4.1 Performance. Raw performance metrics such as latency and throughput can be conducted using end-to-end measurements without any attention to the underlying topology. While these measurements can be insightful on their own, gaining a further understanding of the root cause of subpar performance often requires knowledge of the underlying topology. For example, 61 high latency values reported through end-to-end measurements can be a side effect of many factors including but not limited to congestion, a non-optimal route, an overloaded server, and application level latencies. Many of these underlying causes can only be identified by a correct understanding of the underlying topology. Congestion can happen on various links along the forward and reverse path, identifying the faulty congested link or more specifically the inter-AS link requires a correct mapping for the traversed topology. Expanding infrastructure to address congestion or subpar latency detected through end-to-end measurements is possible through an understanding of the correct topology as well as the interplay of routing on top of this topology. In the following Section, we will present studies that have relied on router, AS, and physical level topologies to provide insight into various network performance related issues. 2.4.1.1 AS-Level Topology. Studies in this section rely on BGP feeds as well as traceroute probes that have been translated to AS paths to study performance characteristics such as increased latency and path lengths due to insufficient network infrastructure within Africa Fanou, Francois, and Aben (2015); Gupta et al. (2014), path stability and the latency penalties due to AS path changes Green, Lambert, Pelsser, and Rossi (2018), IXPs centrality in Internet connectivity as a means for reducing path distances towards popular content Chatzis, Smaragdakis, Böttger, Krenc, and Feldmann (2013), and estimating traffic load on inter-AS links through the popularity of traversed paths Sanchez et al. (2014). Chatzis et al. (2013) demonstrate the centrality of a large European IXP in the Internet’s traffic by relying on sampled sFlow traces captured by the IXP operator. Peering relationships are identified by observing BGP as well as regular 62 traffic being exchanged between tenant members. The authors limit their focus to web traffic as it constitutes the bulk of traffic which is observed over the IXP’s fabric. Endhost IP addresses are mapped to the country which they reside in by using Maxmind’s IP to GEO dataset. The authors observe traffic from nearly every country (242 out of 250). While tenant ASes generated the bulk of traffic, about 33% of traffic originated from ASes which were one or more hops away from the IXP. The authors find that recurrent IP addresses generate about 60% of server traffic. Finally, the authors highlight the heterogeneity of AS traffic by identifying servers from other ASes which are hosted within another AS. Heterogeneous servers are identified by applying a clustering algorithm on top of the SOA records of all observed IP addresses. Lastly, the share of heterogeneous traffic on inter-AS links is presented for Akamai and Cloudflare. 
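The grouping idea behind this heterogeneity analysis can be sketched roughly as follows, assuming the SOA master name and origin AS have already been resolved for each observed server address (both mappings below are hypothetical placeholders, and the paper's actual clustering is more involved):

```python
from collections import defaultdict

# Hypothetical inputs: server IPs with the SOA master of their reverse zone and the
# AS that announces them. Servers whose SOA points to another organization's zone
# are candidates for "heterogeneous" (off-net) servers hosted inside that AS.
ip_to_soa = {
    "198.51.100.10": "dns.cdn-example.net",
    "198.51.100.11": "dns.cdn-example.net",
    "203.0.113.5": "ns.eyeball-isp.example",
}
ip_to_origin_as = {
    "198.51.100.10": 64501,
    "198.51.100.11": 64501,
    "203.0.113.5": 64501,
}

clusters = defaultdict(list)
for ip, soa in ip_to_soa.items():
    clusters[soa].append(ip)

for soa, ips in clusters.items():
    hosting_ases = {ip_to_origin_as[ip] for ip in ips}
    print(soa, "->", ips, "hosted in", hosting_ases)
```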
It is found that about 11% (54%) of traffic (servers) are originated (located) within 3rd-party networks. Sanchez et al. (2014) attempt to characterize and measure inter-domain traffic by utilizing traceroutes as a proxy measure. Traceroute probes towards random IP addresses from the Ono BitTorrent extension are gathered over two separate months. Ground truth data regarding traffic volume is obtained from two sources: (i) sampled sFlows from a large European IXP and (ii) link utilization for the customers of a large ISP presenting the 95th percentile of utilization using SNMP. AS-link traversing paths (ALTP) are constructed by mapping each hop of traceroutes to their corresponding ASN. For each ALTP-set a relative measure of link frequency is defined which represents the cardinality of the link to the sum of cardinalities of all links in that set. This measure is used as a proxy for traffic volume. The authors measure different network syntax metrics namely: 63 connectivity, control value, global choice, and integration for the ALTP-sets which have common links with their ground truth traffic data. r2 is measured for regression analysis of the correlation between network syntax metrics and traffic volume. ALTP-frequency shows the strongest correlation with r2 values between 0.71 - 0.97 while the remainder of metrics also show strong and very- strong correlations. The authors utilize the regression model to predict traffic volume using ALTP-frequency as a proxy measure. Furthermore Sanchez et al. demonstrate that the same inferences cannot be made from a simple AS-level connectivity graph which is derived from BGP streams. Finally, the authors apply the same methodology to CAIDA’s Ark dataset and find similar results regarding the correlation of network syntax metrics and traffic volume. Gupta et al. (2014) study circuitous routes in Africa and their degrading effect on latency. Circuitous routes are between two endpoints within Africa that traverse a path outside of Africa, i.e. the traversed route should have ideally remained within Africa but due to sub-par connectivity has detoured to a country outside of Africa. Two major datasets are used for the study, (i) BGP routing tables from Routeviews, PCH, and Hurricane Electric, and (ii) periodic (every 30 minutes) traceroute measurements from BISmark home routers towards MLab servers, IXP participants, and Google cache servers deployed across Africa. Traceroute hops are annotated with their AS owner and inter-AS links are identified with the observation of ASN changes along the path. Circuitous routes are identified by relying on high latency values for the given path. Latency penalty is measured as the ratio of path latency to the best case latency between the source node and a node in the same destination city. The authors find two main reasons for paths with high latency penalty values namely, (i) ASes along the path are not 64 physically present at a local IXP, or (ii) the ASes are present at a geographically closeby IXP but do not peer with each other due to business preferences. Fanou et al. (2015) study Internet topology and its characteristics within Africa. By expanding RIPE’s Atlas infrastructure within African countries, the authors leverage this platform to conduct traceroute campaigns with the intention of uncovering as many as possible AS paths. To this end, periodic traceroutes were ran between all Atlas nodes within Africa. These probes would target both IPv4 and IPv6 addresses if available. 
Traceroute hops were mapped to their corresponding country by leveraging six public datasets, namely OpenIPMap, MaxMind, Team Cymru, AFRINIC DB, Whois, and reverse DNS lookup. Upon disagreement between datasets, RIPE probes within the returned countries were employed to measure latency towards the IP addresses in question, the country with the lowest latency was selected as the host country. Interface addresses are mapped to their corresponding ASN by utilizing Team Cymru’s IP to AS service TeamCymru (2008), using the augmented traceroute path the AS path between the source and destination is inferred. Using temporal data the preference of AS pairs to utilize the same path is studied, 73% (82%) of IPv4 (IPv6) paths utilize a path with a frequency higher than 90%. Path length for AS pairs within west and south Africa are studied, with southern countries having a slightly shorter average path of 4 compare to 5. AS path for pairs of addresses which reside within the same country in each region is also measured where it’s found that southern countries have a much shorter path compared to pairs of addresses which are in the same western Africa countries (average of 3 compared to 5). AS-centrality (percentage of paths which AS appears in and is not the source or destination) is measured to study transit roles of ASes. Impact of intercontinental transit on end-to-end 65 delay is measured by identifying the IP path which has the minimum RTT. It is found that intercontinental paths typically exhibit higher RTT values while a small fraction of these routes still have relatively low RTT values (< 100ms) and are attributed to inaccuracies in IP to geolocation mapping datasets. Green et al. (2018) leverage inter-AS path stability as a measure for conducting Internet tomography and anomaly detection. Path stability is analyzed by the stability of a primary path. The primary path of router r towards prefix p is defined as the most prevalent preferred path by r during the window time- frame of W. Relying on 3 months of BGP feeds from RIPE RIS’ LINX collector it is demonstrated that 85% (90%) of IPv4 (IPv6) primary paths are in use for at least half of the time. Any deviation from the primary path are defined as pseudo- events which are further categorized into two groups: (i) transient events where a router explores additional paths before reconverging to the primary path, and (ii) structural events where a router consistently switches to a new primary path. For each pseudo-event, the duration and set of new paths that were explored are recorded. About 13% of transient pseudo-events are found to be longer than an hour while 12% of structural pseudo-events last less than 7 days. The number of explored paths and the recurrence of each path is measured for pseudo-events. It is found that MRAI timers and route flap damping are efficient at regulating BGP dynamics. However, these transient events could be recurrent and require more complex mechanisms in order to be accounted for. For anomaly detection about 2.3k AS-level outages and hijack events reported by BGPmon during the same period of the study are used as ground truth. About 84% of outages are detected as pseudo-events in the same time window while about 14% of events the detection time was about one hour earlier than what BGPmon reported. For hijacks, the 66 announced prefix is looked-up amongst pseudo-events if no match is found less specific prefixes are used as a point of comparison with BGPmon. 
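The prefix-matching step just described can be sketched as follows; the table of prefixes with pseudo-events is a hypothetical placeholder, and the fallback simply walks to progressively less specific covering prefixes:

```python
import ipaddress

pseudo_events = {                     # prefixes for which a pseudo-event was detected
    ipaddress.ip_network("198.51.100.0/22"),
    ipaddress.ip_network("203.0.113.0/24"),
}

def match_hijack(announced_prefix):
    net = ipaddress.ip_network(announced_prefix)
    while True:
        if net in pseudo_events:
            return net                # matching pseudo-event found
        if net.prefixlen == 0:
            return None               # no match: an explicit disagreement
        net = net.supernet()          # fall back to a less specific prefix

print(match_hijack("198.51.100.128/25"))  # matches the covering /22
print(match_hijack("192.0.2.0/24"))       # no match -> None
```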
For about 82% of hijacking events, a matching pseudo-event was found, and the remainder of events are tagged as explicit disagreements. 2.4.1.2 Router-Level Topology. With the rise of peering disputes highlighted by claims of throttling for Netflix’s traffic access to unbiased measurements reflecting the underlying cause of subpar performance seems necessary more than before. Doing so would require a topology map which captures inter-AS links. The granularity of these links should be at the router level since two ASes could establish many interconnections with each other, each of which could exhibit different characteristics in terms of congestion. As outlined in Section 2.3 various methodologies have been presented that enable researchers to infer the placement of inter-AS links from data plane measurements in the form of traceroutes. A correct assessment of the placement of inter-AS links is necessary to avoid attributing intra-AS congestion to inter-AS congestion, furthermore incorrectly identifying the ASes which are part of the inter-AS link could lead to attributing congestion to incorrect entities. Dhamdhere et al. (2018) rely on prior techniques Luckie et al. (2016) to infer both ends of an interconnect link and by conducting time series latency probes (TSLP) try to detect windows of time where the latency time series deviates from its usual profile. Observing asymmetric congestion for both ends of a link is attributed to inter-AS congestion. The authors deploy 86 vantage points within 47 ASes on a global scale. By conducting similar TSLP measurements towards the set of identified inter-AS links over the span of 21 months starting at March 2016, the authors study congestion patterns between various networks and their 67 upstream transit providers as well the interconnections they establish with content providers. Additionally, the authors conduct throughput measurements using the Network Diagnostic Tool (NDT) M-Lab (2018) as well as SamKnows SamKnows (2018) throughput measurements of Youtube servers and investigate the correlation of inter-AS congestion and throughput. Chandrasekaran, Smaragdakis, Berger, Luckie, and Ng (2015) utilize a large content delivery networks infrastructure to assess the performance of the Internet’s core. The authors rely on about 600 servers spanning 70 countries and conduct pairwise path measurements in both forward and reverse directions between the servers. Furthermore, AS paths are measured by translating router hop interfaces to their corresponding AS owner, additionally inter-AS segments are inferred by relying on a series of heuristics developed by the authors based on domain knowledge and common networking practices. Latency characteristics of the observed paths are measured by conducting periodic ping probes between the server pairs. Consistency and prevalence of AS paths for each server pair are measured for a 16 month period. It is found that about 80% of paths are dominant for at least half of the measurement period. Furthermore, about 80% of paths experience 20 or fewer route changes during the 16 month measurement period. The authors measure RTT inflation in comparison to optimal AS paths and find that sub-optimal paths are often short-lived although a small number (10%) of paths experience RTT inflation for about 30% of the measurement period. 
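The path-prevalence notion used above can be illustrated with a minimal sketch that, for one server pair, computes the fraction of measurement rounds in which the dominant AS path was observed (the observations are hypothetical):

```python
from collections import Counter

observations = [                       # one AS path per measurement round
    ("AS64500", "AS64510", "AS64520"),
    ("AS64500", "AS64510", "AS64520"),
    ("AS64500", "AS64530", "AS64520"),
    ("AS64500", "AS64510", "AS64520"),
]

def dominant_path_prevalence(paths):
    """Return the most common path and the fraction of rounds it was observed."""
    path, hits = Counter(paths).most_common(1)[0]
    return path, hits / len(paths)

path, prevalence = dominant_path_prevalence(observations)
print(path, round(prevalence, 2))      # dominant path observed in 75% of rounds
```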
Effects of congestion on RTT inflation are measured by initially selecting the set of server pairs which experience RTT inflation using ping probe measurements while the first segment that experiences congestion is pinpointed by relying on traceroute measurements which are temporally aligned with the ping measurements. The 68 authors report that most inter and intra-AS links experience about 20 to 30 ms of added RTT due to congestion. Chiu, Schlinker, Radhakrishnan, Katz-Bassett, and Govindan (2015) assess path lengths and other properties for paths between popular content providers and their clients. A collection of 4 datasets were used throughout the study namely: (i) iPlane traceroutes from PlanetLab nodes towards 154k BGP prefixes, (ii) aggregated query counts per /24 prefix (3.8M) towards a large CDN, (iii) traceroute measurements towards 3.8M + 154k prefixes from Google’s Compute Engine (GCE), Amazon Elastic Cloud, and IBM’s Softlayer VMs, and (iv) traceroutes from RIPE Atlas probes towards cloud VMs and a number of popular websites. Using traceroute measurements from various platforms and converting the obtained IP hop path to its corresponding AS-level path the authors assess the network distance between popular content providers and client prefixes. iPlane traceroutes are used as a baseline for comparison, only 2% of these paths are one hop away from their destination this value increases to 40% (60%) for paths between GCE and iPlane (end user prefixes). This indicates that Google peers directly with the majority of networks which host its clients. Using the CDN logs as a proxy measure for traffic volume the authors find that Google peers with the majority of ASes which carry large volumes of traffic. Furthermore Chiu et al. find that the path from clients towards google.com due to off-net hosted cache servers is much shorter where 73% of queries come from ASes that either peer with Google or have an off-net server in their network or their upstream provider. A similar analysis for Amazon’s EC2 and IBM’s Softlayer was performed each having 30% and 40% one hop paths accordingly. 69 Kotronis, Nomikos, Manassakis, Mavrommatis, and Dimitropoulos (2017) study the possibility of improving latency performance through the employment of relay nodes within colocation facilities. This work tries to (i) identify the best locations/colos to place relay nodes and (ii) quantify the latency improvements that are attainable for end pairs. The authors select a set of ASes per each country which covers at least 10% of the countries population by using APNIC’s IPv6 measurement campaign dataset APNIC (2018). RIPE Atlas nodes within these AS country pairs are selected which are running the latest firmware, are connected and pingable, and have had stable connectivity during the last 30 days. Colo relays are selected by relying on the set of pinned router interfaces from Giotsas, Smaragdakis, et al. (2015) work. Due to the age of the dataset, a series of validity tests including conformity with PeeringDB data, pingability, consistent ASN owner, and RTT-based geolocation test with Periscope LGs have been conducted over the dataset to filter out stale information. A set of PlanetLab relays and RIPE Atlas relays are also considered as reference points in addition to the set of colo relays. The measurement framework consists of 30 minute rounds between April 20th - May 17th 2017. Within each round, ping probes are sent between the selected end pairs to measure direct latency. 
Furthermore, the latency of a relay path is estimated by measuring the latency between the <src, relay> and <dst, relay> pairs. The authors observe improved latency for 83% of cases, with a median improvement of 12-14ms across the different relay types and with colo relays providing the largest improvement. The number of relays required for improved latency is also measured; the authors find that colo relays have the highest efficiency, where 10 relays account for 58% of the improved cases, while achieving the same number of improved cases with RIPE relays would require more than 100 relays. Lastly, the authors list the top 10 colo facilities which host the 20 most effective colo relays; 4 of these colos are in the top 10 PeeringDB colos in terms of the number of colocated ASes, and all of them host at least two IXPs. Fontugne, Pelsser, Aben, and Bush (2017) introduce a statistical model for measuring and pinpointing delay and forwarding anomalies from traceroute measurements. Given the prevalence of route asymmetry on the Internet, measuring the delay between two adjacent hops is not trivial. This issue is tackled by the key insight that the differential delay between two adjacent hops is composed of two independent components. Changes in link latency can be detected by having a diverse set of traceroute paths that traverse the link under study and observing latency values that disrupt the normal distribution of the latency median. Forwarding patterns for each hop are established by measuring a vector accounting for the number of times each next-hop address has been observed. The Pearson product-moment correlation coefficient is used as a measure to detect deviations or anomalies within the forwarding pattern of a hop. RIPE Atlas' built-in and anchoring traceroute probes for an eight-month period in 2015 are used for the study. The authors highlight the applicability of their proposed methodology by providing insight into three historical events, namely DDoS attacks on DNS root servers, Telekom Malaysia's BGP route leak, and the Amsterdam IXP outage. 2.4.1.3 Physical-Level Topology. Measuring characteristics of physical infrastructure using data plane measurements is very challenging due to the disassociation of routing from the physical layer. Despite these challenges, we overview two studies within this section that investigate the effects of sub-optimal fiber infrastructure on latency between two end-points Singla, Chandrasekaran, Godfrey, and Maggs (2014) and attempt to measure and pinpoint the causes of subpar latency within fiber optic cables Bozkurt et al. (2018). Singla et al. (2014) outline the underlying causes of sub-par latency within the Internet. The authors rely on about 400 PlanetLab nodes to periodically fetch the front page of popular websites, geolocate the web server's location, and measure the optimal latency based on speed-of-light (c-latency) constraints. Interestingly, the authors find that the median latency inflation is about 34 times the c-latency. Furthermore, the authors break down the webpage fetch time into its constituent components, namely DNS resolution, TCP handshake, and TCP transfer. Router path latency is calculated by conducting traceroutes towards the servers, and lastly, the minimum latency towards the web server is measured by conducting periodic ping probes. It is found that the median router path experiences about 2.3x latency inflation. The authors hypothesize that latencies within the physical layer are due to sub-optimal fiber paths between routers.
The validity of this hypothesis is demonstrated by measuring the pairwise distance between all nodes of Internet2 and GEANT network topologies and also computing road distance using Google Maps API. It is found that fiber links are typically 1.5-2x longer than road distances. While this inflation is smaller in comparison to webpage fetching component’s latency the effects of fiber link inflation are evident within higher layers due to the stacked nature of networking layers. Bozkurt et al. (2018) present a detailed analysis of the causes for sub- par latency within fiber networks. The authors rely on Durairajan et al. (2014) InterTubes dataset to estimate fiber lengths based on their conduits in the dataset. Using the infrastructure of a CDN, server clusters which are within a 25km radius of conduit endpoints were selected, and latency probes between pairs of servers at 72 both ends of the conduit were conducted every 3 hours for the length of 2 days. The conduit length is estimated using the speed of light within fiber optic cables (f-latency), and the authors find that only 11% of the links have RTTs within 25% of the f-latency for their corresponding conduit. Bozkurt et al. enumerate various factors which can contribute to the inflated latency that they observed within their measurements namely, (i) refraction index for different fiber optic cables varies, (ii) slack loops within conduits to account for fiber cuts, (iii) latency within optoelectrical and optical amplifier equipment, (iv) extra fiber spools to compensate for chromatic dispersion, (v) publication of mock routes by network operators to hide competitive details, and (vi) added fiber to increase latency for price differentiation. Using published latency measurements from AT&T and CenturyLink RTT inflation in comparison to f-latency from InterTubes dataset is measured to have a median of 1.5x (2x) for AT&T and CenturyLink’s networks. The accuracy of InterTubes dataset is verified for Zayo’s network. Zayo published detailed fiber routes on their website. The authors find great conformity for the majority of fiber conduit lengths while for 12% of links the length difference is more than 100km. 2.4.2 Resiliency. Studying the resiliency of Internet infrastructure has been the subject of many types of research over the past decade. While many of these studies have reported postmortems regarding natural disasters and their effects on Internet connectivity, others have focused on simulating what-if scenarios to examine the resiliency of the Internet towards various types of disruptions. Within these studies, researchers have utilized Internet topologies which were contemporary to their time. The resolution of these topologies would vary in accordance with the stated problem. For example, the resiliency of long haul fiber 73 infrastructure to rising sea levels due to global warming is measured by relying on physical topology maps Durairajan, Barford, and Barford (2018) while the effects of router outages on BGP paths and AS reachability is studied using a combination of router and AS level topologies within Luckie and Beverly (2017). The remainder of this section is organized according to the resolution of the underlying topology which is used by these studies. 2.4.2.1 AS-Level Topology. Katz-Bassett et al. (2012) propose LIFEGUARD as a system for recovering from outages by rerouting traffic. Outages are categorized into two groups of forward and reverse path outages. 
Outages are detected and pinpointed by conducting periodic ping and traceroute measurements towards the routers along the path. A historical list of responsive routers for each destination is maintained. Prolonged unresponsive ping probes are attributed to outages. For forward path outages, the authors suggest the use of alternative upstream providers which traverse AS-paths that do not overlap with the unresponsive router. For reverse path outages, the authors propose a BGP poisoning solution where the origin AS would announce a path towards its own prefix which includes the faulty AS within the advertised path. This, in turn, causes the faulty AS to withdraw the advertisement (to avoid a loop) of the prefix and therefore cause alternative routes to be explored in the reverse path. A less- specific sentinel prefix is advertised by LIFEGUARD to detect the recovery of the previous path. Luckie and Beverly (2017) correlate BGP outage events to inferred router outages by relying on time-series of IPID values obtained through active measurements. This work is motivated by the fact that certain routers rely on central incremental counters for the generated IPID values, given this assumption 74 one would expect to observe increasing IPID values for a single router. Any disruption in this pattern can be linked to a router reboot. IPID values for IPv4 packets are susceptible to counter rollover since they are only 16 bits wide. The authors rely on IPID values obtained by inducing fragmentation within IPv6 packets. The authors rely on a hit list of IPv6 router addresses which is obtained from intermediate hops of CAIDA’s Ark traceroute measurements. By analyzing the time series of IPID values, an outage window is defined for each router. Router outages are correlated with their corresponding BGP control plane events by looking at BGP feeds and finding withdrawal and announcement messages occurring during the same time frame. It is found that for about 50% of router/prefix pairs at least 1-2 peers withdrew the prefix and nearly all peers withdrew their prefix announcement for about 10% of the router/prefix pairs. Luckie et al. find that about half of the ASes which had outages were completely unrouted during the outage period and had single points of failure. Unlike Luckie et al. approach which relied on empirical data to assess the resiliency of Internet, Lad, Oliveira, Zhang, and Zhang (2007) investigate both the impact and resiliency of various ASes to prefix hijacking attacks by simulating different attacks using AS-level topologies obtained through BGP streams. Impact of prefix hijacking is measured as the fraction of ASes which believe the false advertisement by a malicious AS. Similarly, the resiliency of an AS against prefix hijacks is measured as the number of ASes which believe the true prefix origin announcement. Surprisingly it is found that 50% of stub and transit ASes are more resilient than Tier-1 ASes this is mainly attributed to valley-free route preferences. Fontugne, Shah, and Aben (2018) look into structural properties, more specifically AS centrality, of AS-level IPv4 and IPv6 topology graphs. AS-level 75 topologies are constructed using BGP feeds of Routeviews, RIPE RIS, and BGPmon monitors. The authors illustrate the sampling bias of betweenness centrality (BC) measure by sub-sampling the set of available monitors and measuring the variation of BC for each sample. 
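The monitor sub-sampling experiment described above can be sketched roughly as follows: an AS-level graph is built from the paths seen by a random subset of monitors, and the betweenness centrality of the same AS is compared across subsets. The monitor-to-path mapping is a hypothetical toy input.

```python
import random
import networkx as nx

monitor_paths = {
    "mon1": [("A", "B", "C", "D"), ("A", "B", "E")],
    "mon2": [("F", "C", "D"), ("F", "C", "B", "A")],
    "mon3": [("G", "E", "B", "A"), ("G", "E", "C", "D")],
}

def bc_from_sample(sample):
    """Betweenness centrality of the AS graph seen by a subset of monitors."""
    g = nx.Graph()
    for mon in sample:
        for path in monitor_paths[mon]:
            nx.add_path(g, path)
    return nx.betweenness_centrality(g)

random.seed(0)
for _ in range(3):
    sample = random.sample(sorted(monitor_paths), 2)
    print(sample, round(bc_from_sample(sample).get("C", 0.0), 2))
# the score of the same AS ("C") varies with the monitor sample, illustrating the bias
```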
AS hegemony is used as an alternative metric for measuring the centrality of ASes which accounts for monitor biases by eliminating monitors too close or far from the AS in question and averaging the BC score across all valid monitors. Additionally, BC is normalized to account for the size of advertised prefixes. The AS hegemony score is measured for the AS-level graphs starting from 2004 till 2017. The authors find a great decrease in the hegemony score throughout the years supporting Internet flattening reports. Despite these observations, the hegemony score for ASes with the largest scores have remained consistent throughout the years pointing to the importance of large transit ASes in the operation of the Internet. AS hegemony for Akamai and Google is measured, the authors report little to no dependence for these content providers to any specific upstream provider. 2.4.2.2 Router-Level Topology. Palmer, Siganos, Faloutsos, Faloutsos, and Gibbons (2001) rely on topology graphs gathered by SCAN and Lucent projects consisting of 285k (430k) nodes (links) to simulate the effects of link and node failures within the Internet connectivity graph. The number of reachable pairs is used as a proxy measure to assess the impact of link or node failures. It is found that the number of reachable nodes does not vary significantly up to the removal of 50k links failures while this value drops to about 10k for node removals. Kang and Gligor (2014); Kang, Lee, and Gligor (2013) propose the Crossfire denial of service attack that targets links which are critical for Internet connectivity 76 of ASes, cities, regions, or countries. The authors rely on a series of traceroute measurements towards addresses within the target entity and construct topological maps from various VPs towards these targets. The attacker would choose links that are “close" to the target (3-4 router hops) and appear with a high frequency within all paths. The attacker could cut these entities from the Internet by utilizing a bot-net to launch coordinated low rate requests towards various destinations in the target entity. Furthermore, the attacker can avoid detection by the target by targeting addresses which are in close proximity of the target entity, e.g. sending probes towards addresses within the same city where an AS resides within. The pervasiveness and applicability of the Crossfire attack is investigated by relying on 250 PlanetLab Chun et al. (2003) nodes to conduct traceroutes towards 1k web servers located within 15 target countries and cities. Links are ranked according to their occurrence within traceroutes and for all target cities and countries, the authors observe a very skewed power-law distribution. This observation is attributed to cost minimization within Internet routing (shortest path for intra- domain and hot-potato for inter-domain routing). Bottleneck links are measured to be on average about 7.9 (1.84) router (AS) hops away from the target. Giotsas et al. (2017) develop Kepler a system that is able to detect peering outages. Kepler relies on BGP communities values that have geocoded embeddings. Although BGP community values are not standardized, they have been utilized by ASes for traffic engineering, traffic blackholing, and network troubleshooting. Certain ASes use the lower 16bits of the BGP communities attribute as a unique identifier for each of their border routers. These encodings are typically documented on RIR webpages. 
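The location-encoding idea just described can be sketched as follows; the per-AS dictionary of community values is a hypothetical placeholder rather than any real operator's documented encoding:

```python
# Hypothetical per-AS mapping from the lower 16 bits of a geocoded community value
# to the physical ingress location (colo or IXP) it denotes.
LOCATION_CODES = {
    64500: {3001: "Equinix Ashburn", 3002: "Telehouse London"},
}

def decode_communities(communities):
    """Map geocoded 'ASN:value' communities on a BGP route to ingress locations."""
    locations = []
    for community in communities:
        asn, value = (int(x) for x in community.split(":"))
        loc = LOCATION_CODES.get(asn, {}).get(value)
        if loc:
            locations.append(loc)
    return locations

route_communities = ["64500:3002", "64500:666"]   # the second value is not geocoded
print(decode_communities(route_communities))       # ['Telehouse London']
```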
The authors compile a dictionary of BGP community values and their corresponding physical location (colo or IXP) by 77 parsing RIR entries. Furthermore, a baseline of stable BGP paths is established by monitoring BGP feeds and removing transient announcements. Lastly, the tenants of colo facilities and available IXPs and their members is compiled from PeeringDB, DataCenterMap, and individual ASes websites. Deviations in stable BGP paths such as explicit withdrawal or change in BGP community values are considered as outage signals. 2.4.2.3 Physical-Level Topology. Schulman and Spring (2011) investigate outages within the last mile of Internet connectivity which are caused by severe weather conditions. The authors design a tool called ThunderPing which relies on weather alerts from the US National Weather Service to conduct connectivity probes prior, during, and after a severe weather condition towards the residential users of the affected regions. A list of residential IP addresses is compiled by parsing the reverse DNS entry for 3 IP addresses within each /24 prefix. If any of the addresses have a known residential ISP such as Comcast or Verizon within their name the remainder of addresses within that block are analyzed as well. IP addresses are mapped to their corresponding geolocation by relying on Maxmind’s IP to GEO dataset. Upon the emergence of a weather alert ThunderPing would ping residential IP addresses within the affected region for 6 hours before, during, and after the forecasted event using 10 geographically distributed PlanetLab nodes. A sliding window containing 3 pings is used to determine the state of a host. A host responding with more than half of the pings is considered to be UP, not responding to any pings is considered to be DOWN, and host responding to less than half of the pings is in a HOSED state. The authors find that failure rates are more than double during thunderstorms compared to other weather conditions. Furthermore, the median for the duration 78 of DOWN times is almost an order of magnitude larger (104 seconds) during thunderstorms compared to clear weather conditions. Eriksson, Durairajan, and Barford (2013) present a framework (RiskRoute) for measuring the risks associated with various Internet routes. RiskRoute has two main objectives namely, (i) computing backup routes and (ii) to measure new paths for network provisioning. The authors introduce the bit-risk miles measure which quantifies the geographic distance that is traveled by traffic in addition to the outage risk along the path both in historical and immediate terms. Furthermore bit-risk miles is scaled to account for the impact of an outage by considering the population that is in the proximity of an outage. The likelihood of historical outage for a specific location is estimated using a Gaussian kernel which relies on observed disaster events at all locations. For two PoPs, RiskRoute aims to calculate the path which minimizes the bit-risk mile measure. For intra-domain routes, this is simply calculated as the path which minimizes the bit-risk mile measure among all possible paths which connect the two PoPs. For inter-domain routing the authors estimate BGP decisions using geographic proximity and rely on shortest path routes. Using the RiskRoute framework, improvements in the robustness of networks is analyzed by finding an edge which would result in the largest increase in bit-risk measure among all possible paths. 
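A minimal sketch of the bit-risk-mile idea is shown below, under the simplifying assumption that an edge's cost is its mileage scaled by an outage-risk factor (the actual RiskRoute formulation also incorporates historical outage likelihood and population impact); the PoP graph and candidate conduit are toy inputs:

```python
import networkx as nx

g = nx.Graph()
edges = [("PoP_A", "PoP_B", 800, 0.30),   # (src, dst, miles, assumed risk factor)
         ("PoP_B", "PoP_C", 600, 0.10),
         ("PoP_A", "PoP_D", 900, 0.05),
         ("PoP_D", "PoP_C", 700, 0.05)]
for u, v, miles, risk in edges:
    g.add_edge(u, v, bit_risk=miles * (1.0 + risk))

def bit_risk_miles(graph, src, dst):
    """Weighted shortest-path cost between two PoPs under the toy bit-risk weight."""
    return nx.shortest_path_length(graph, src, dst, weight="bit_risk")

base = bit_risk_miles(g, "PoP_A", "PoP_C")
# Score a hypothetical new conduit by how much it lowers the bit-risk-mile cost.
g.add_edge("PoP_A", "PoP_C", bit_risk=1200 * (1.0 + 0.05))
print(base, "->", bit_risk_miles(g, "PoP_A", "PoP_C"))
```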
It is found that the Sprint and TeliaSonera networks observe the greatest improvement in robustness while Level3's robustness remains fairly consistent, mostly due to the rich connectivity within its network. Durairajan et al. (2018) assess the impact of rising sea levels on the Internet infrastructure within the US. The authors align the data from the sea level rise inundation dataset of the National Oceanic and Atmospheric Administration (NOAA) with long-haul fiber maps from the Internet Atlas project Durairajan et al. (2013) using the overlap feature of ArcGIS. The amount of affected fiber as well as the number of PoPs, colos, and IXPs that will be at risk due to rising sea levels is measured. The authors find that New York, Seattle, and Miami are among the cities with the highest amount of vulnerable infrastructure. 2.4.3 AS Relationship Inference. ASes form inter-AS connections motivated by different business relationships. These relationships can take the form of a transit AS providing connectivity to a smaller network as a customer (c2p) by charging it based on the provided bandwidth, or of a settlement-free connection between peers (p2p) where both peers exchange roughly equal amounts of traffic through their inter-AS link. These inter-AS connections are indistinguishable in topologies obtained from control- or data-plane measurements. The studies within this section overview a series of methodologies developed based on these business relationships in conjunction with the valley-free routing principle to distinguish these peering relationships from each other. 2.4.3.1 AS-Level. Luckie, Huffaker, Dhamdhere, and Giotsas (2013) develop an algorithm for inferring the business relationships between ASes by solely relying on BGP data. Relationships are categorized as either a customer-to-provider (c2p) relationship, where a customer AS pays a provider AS for its connectivity to the Internet, or a peer-to-peer (p2p) relationship, where two ASes provide connectivity to each other and often transmit equal amounts of traffic through their inter-AS link(s). Inference of these relationships is based on BGP data using three assumptions: (i) there is a clique of large transit providers at the top of the Internet hierarchy, (ii) customers enter a transit agreement to be globally reachable, and (iii) there should be no cycle in customer-to-provider (c2p) relationships. The authors validate a subset (43k) of their inferences, which was the largest at the time of publication, and finally they provide a new solution for inferring the customer cones of ASes. For their analyses, the authors rely on various data sources, namely BGP paths from Routeviews and RIPE's RIS; any path whose origin AS does not have a valid ASN (based on RIRs) is excluded from the dataset. For validation, Luckie et al. use three data sources: validation data reported by network operators to their website, routing policies reported to RIRs in export and import fields, and finally the communities attribute of BGP announcements based on the work of Giotsas and Zhou (2012). The authors define two metrics, node degree and transit degree, which can be measured from the AS relationship graph. Giotsas, Luckie, et al. (2015) modify CAIDA's IPv4 relationship inference algorithm Luckie, Huffaker, Dhamdhere, and Giotsas (2013) and adapt it to IPv6 networks with the intention of addressing the lack of a fully-connected transit-free clique within IPv6 networks.
BGP dumps from Routeviews and RIPE RIS which announce reachability towards IPv6 prefixes are used throughout this study. For validation of inferred relationships, three sources are used: BGP communities; RPSL, a route policy specification language that is available in WHOIS datasets and is mandated for IXPs within the EU by RIPE; and local preference (LocPref), which is used to indicate route preference by an AS, where ASes assign higher values to customers and lower values to providers to minimize transit cost. Data is sanitized by removing paths with artifacts such as loops or invalid ASNs. The remainder of the algorithm is identical to Luckie, Huffaker, Dhamdhere, and Giotsas (2013) with modifications to two steps: i) inferring the IPv6 clique and ii) removing c2p inferences made between stub and clique ASes. In addition to considering the transit degree and reachability, the peering policy of ASes is also taken into account for identifying cliques. Peering policy is extracted from PeeringDB; a restrictive policy is assumed for ASes that do not report this value. ASes with selective or restrictive policies are selected as seeds to the clique algorithm. For an AS to be part of the clique, it should provide BGP feeds to Routeviews or RIPE RIS and announce routes to at least 90% of IPv6 prefixes available in BGP. The accuracy of inferences is validated using the three validation sources described above; a consistent accuracy of at least 96% was observed for p2c and p2p relationships for the duration of the study. The fraction of congruent relationships, where the relationship type is identical for IPv4 and IPv6 networks, is measured. The authors find that this fraction increases from 85% in 2006 to 95% in 2014. 2.4.3.2 PoP-Level. Giotsas et al. (2014) provide a methodology for extending traditional AS relationship models to include two complex relationships, namely hybrid and partial transit relationships. Hybrid relationships indicate different peering relations at different locations. Partial transit relationships restrict the scope of a customer's relationship by not exporting all provider paths to the customer. AS paths, prefixes, and community strings are gathered from Routeviews and RIPE RIS datasets. CAIDA's Ark traceroutes in addition to a series of targeted traceroutes launched from various looking glasses are employed to confirm the existence of various AS relationships. Finally, geoinformation for AS-links is gathered from BGP community information, PeeringDB's reverse DNS scan of IXP prefixes, DNS parsing of hostnames by CAIDA's DRoP service, and NetAcuity's IP geolocation dataset, which is used as a fallback when other methods do not return a result. Each AS relationship is labeled with one of the following export policies: i) full transit (FT), where the provider exports prefixes from its provider, ii) partial transit (PT), where only prefixes of peers and customers are exported, and iii) peering (P), where only prefixes of customers are exported. Each identified relationship defaults to peering unless counter-evidence is found through traceroute measurements indicating a PT or FT relationship. Out of 90k p2c relationships, 4k are classified as complex, with 1k and 3k being hybrid and partial-transit, respectively. For validation, (i) direct feedback from network operators, (ii) parsed BGP community values, and (iii) RPSL objects are used. Overall, 19% (7%) of hybrid (partial-transit) relationships were confirmed.
83 CHAPTER III LOCALITY OF TRAFFIC This chapter provides a study on the share of cloud providers and CDNs in Internet traffic from the perspective of an edge network (UOnet). Furthermore, this work quantifies the degree to which the serving infrastructure for cloud providers and CDNs is close/local to UOnet’s network and investigates the implications of this proximity on end-users performance. The content in this chapter is derived entirely from Yeganeh, Rejaie, and Willinger (2017) as a result of collaboration with co-authors listed in the manuscript. Bahador Yeganeh is the primary author of this work and responsible for conducting all the presented analyses. 3.1 Introduction During the past two decades, various efforts among different Internet players such as large Internet service providers (ISP), commercial content distribution networks (CDN) and major content providers have focused on supporting the localization of Internet traffic. Improving traffic localization has been argued to ensure better user experience (in terms of shorter delays and higher throughput) and also results in less traffic traversing an ISP’s backbone or the interconnections (i.e., peering links) between the involved parties (e.g., eyeball ASes, transit providers, CDNs, content providers). As a result, it typically lowers a network operator’s cost and also improves the scalability of the deployed infrastructure in both the operator’s own network and the Internet at large. The main idea behind traffic localization is to satisfy a user request for a certain piece of content by re-directing the request to a cache or front-end server that is in close proximity to that user and can serve the desired piece 84 of content. However, different commercial content distribution companies use different strategies and deploy different types of infrastructures to implement their business model for getting content closer to the end users. For example, while Akamai Akamai (2017) operates and maintains a global infrastructure consisting of more then 200K servers located in more than 1.5K different ASes to bring the requested content by its customers closer to the edge of the network where this content is consumed, other CDNs such as Limelight or EdgeCast rely on existing infrastructure in the form of large IXPs to achieve this task Limelight (2017). Similar to Akamai but smaller in scale, major content providers such as Google and Netflix negotiate with third-party networks to deploy their own caches or servers that are then used to serve exclusively the content provider’s own content. In fact, traffic localization efforts in today’s Internet continue as the large cloud providers (e.g., Amazon, Microsoft) are in the process of boosting their presence at the edge of the network by deploying increasingly in the newly emerging 2nd-tier datacenters (e.g., EdgeConneX EdgeConneX (2018)) that target the smaller- or medium-sized cities in the US instead of the major metropolitan areas. These continued efforts by an increasing number of interested parties to implement ever more effective techniques and deploy increasingly more complex infrastructures to support traffic localization has motivated numerous studies on designing new methods and evaluating existing infrastructures to localize Internet traffic. While some of these studies Adhikari et al. (2012); Böttger, Cuadrado, Tyson, Castro, and Uhlig (2016); Calder et al. 
(2013); Fan, Katz-Bassett, and Heidemann (2015) have focused on measurement-based assessments of different deployed CDNs to reveal their global Böttger et al. (2016); Calder et al. (2013) or local Gehlen, Finamore, Mellia, and Munafò (2012); Torres et al. (2011) 85 infrastructure nodes, others have addressed the problems of reverse-engineering a CDN’s strategy for mapping users to their close-by servers or examining whether or not the implemented re-direction techniques achieve the desired performance improvements for the targeted end users Adhikari et al. (2012); Fan et al. (2015); Gehlen et al. (2012). However, to our knowledge, none of the existing studies provides a detailed empirical assessment of the nature and impact of traffic localization as seen from the perspective of an actual stub-AS. In particular, the existing literature on the topic of traffic localization provides little or no information about the makeup of the content that the users of an actual stub-AS request on a daily basis, the proximity of servers that serve the content requested by these users (overall or per major content provider), and the actual performance benefits that traffic localization entails for the consumers of this content (i.e., end users inside the stub-AS). In this chapter, we fill this gap in the existing literature and report on a measurement study that provides a detailed assessment of different aspects of the content that arrives at an actual stub-AS as a result of the requests made by its end users. To this end, we consider multiple daily snapshots of unsampled Netflow data for all exchanged traffic between a stub-AS that represents a Research & Education network (i.e., UOnet operated by the University of Oregon) and the Internet 3.2. We show that some 20 content providers are responsible for most of the delivered traffic to UOnet and that for each of these 20 content providers, the content provider specific traffic is typically coming from only a small fraction of source IPs (Section 3.3). Using RTT to measure the distance of these individual source IPs from UOnet, we present a characterization of this stub-AS’ traffic footprint; that is, empirical findings about the locality properties of delivered 86 traffic to UOnet, both in aggregate and at the level of individual content providers (Section 3.4). In particular, we examine how effective the individual content providers are in utilizing their infrastructure nodes to localize their delivered traffic to UOnet and discuss the role that guest servers (i.e., front-end servers or caches that some of these content providers deploy in third-party networks) play in localizing traffic for this stub-AS (Section 3.5). As part of this effort, we focus on Akamai and develop a technique that uses our data to identify all of Akamai’s guest servers that delivered content to UOnet. We then examine different features of the content that arrived at UOnet from those guest servers as compared to the content that reached UOnet via servers located in Akamai’s own AS. Finally, we investigate whether or not a content provider’s ability to localize its traffic has implications on end user-perceived performance, especially in terms of observed throughput (Section 3.6). 3.2 Data Collection for a Stub-AS: UOnet The stub-AS that we consider for this study is the campus network of the University of Oregon (UO), called UOnet (ASN3582). UOnet serves more than 24K (international and domestic) students and 4.5K faculty/staff during the academic year. 
These users can access the Internet through UOnet using wireless (through 2000+ access points) or wired connections. Furthermore, more than 4,400 of the students reside on campus and can access the Internet through UOnet using their residential connections. UOnet has three upstream providers, Neronet (AS3701), Oregon Gigapop (AS4600) and the Oregon IX exchange. Given the types of offered connectivity and the large size and diversity of the UOnet user population, we consider the daily traffic that is delivered from the rest of the Internet to UOnet to be representative of the traffic that a stub-AS that is classified as a US Research & Education network is likely to experience. To conduct our analysis, we rely on un-sampled Netflow (v5) data that is captured at the different campus border routers. As a result, our Netflow data contains all of the flows between UOnet users and the Internet. The Netflow dataset contains a separate record for each incoming (and outgoing) flow from (to) an IP address outside of UOnet, and each record includes the following flow attributes: (i) source and destination IP addresses, (ii) source and destination port numbers, (iii) start and end timestamps, (iv) IP protocol, (v) number of packets, and (vi) number of bytes. We leverage Routeviews data to map all the external IPs to their corresponding Autonomous Systems (ASes) and use this information to map individual flows to particular providers (based on their AS number) and then determine the number of incoming (and outgoing) flows (and corresponding bytes) associated with each provider. In our analysis, we only consider the incoming flows since we are primarily interested in delivered content and services from major content providers to UOnet users. An incoming flow refers to a flow with the source IP outside and the destination IP inside UOnet. We select 10 daily (24-hour) snapshots of Netflow data that consist of Tuesday and Wednesday from five consecutive weeks when the university was in session, starting with the week of Oct 3rd and ending with the week of Oct 31st in 2016. Table 2 summarizes the main features of the selected snapshots, namely their date, the number of incoming flows and associated bytes, and the number of unique external ASes and unique external IPs that exchanged traffic with UOnet during the given snapshot. In each daily snapshot, wireless connections are responsible for roughly 62% (25%) of delivered bytes (flows) and residential users contributed about 17% (10%) of incoming bytes (flows).

Table 2. Main features of the selected daily snapshots of our UOnet Netflow data.

Snapshot   Flows (M)   TBytes   ASes (K)   IPs (M)
10/04/16   196         8.7      39         3.3
10/05/16   193         8.5      37         3.0
10/11/16   199         9.0      41         4.1
10/12/16   198         9.1      41         4.7
10/18/16   202         8.8      40         3.7
10/19/16   200         9.1      38         3.3
10/25/16   205         8.7      37         2.9
10/26/16   209         9.1      40         4.1
11/01/16   212         8.6      39         3.5
11/02/16   210         8.7      40         4.3

3.3 Identifying Major Content Providers Our main objective is to leverage the UOnet dataset to provide an empirical assessment of traffic locality for delivered flows to UOnet and examine its implications for the end users served by UOnet. Here by “locality” we refer to a notion of network distance between the servers in the larger Internet that provide the content/service requested from within UOnet.
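For concreteness, the per-provider accounting described in Section 3.2 can be sketched in a few lines of Python. The file name, the CSV layout, and the use of the pyasn library with a pre-built Routeviews snapshot are illustrative assumptions rather than the exact pipeline used in this study.

```python
# Sketch of the per-provider accounting: map the source IP of every incoming
# flow to its origin AS (here via pyasn and a RIB snapshot converted to
# pyasn's format) and sum bytes/flows per AS. File names and the CSV field
# names are illustrative assumptions.
import csv
from collections import defaultdict

import pyasn  # offline IP-to-ASN lookups against a BGP RIB snapshot

asndb = pyasn.pyasn("routeviews_rib.pyasn")  # assumed pre-built database

bytes_per_as = defaultdict(int)
flows_per_as = defaultdict(int)

with open("incoming_flows.csv") as f:   # one record per incoming flow
    for rec in csv.DictReader(f):       # assumed fields: src_ip, bytes, ...
        asn, _prefix = asndb.lookup(rec["src_ip"])
        if asn is None:
            continue  # unannounced or private source address
        bytes_per_as[asn] += int(rec["bytes"])
        flows_per_as[asn] += 1

# Rank providers by delivered bytes (cf. Figure 6).
top = sorted(bytes_per_as.items(), key=lambda kv: kv[1], reverse=True)[:20]
for asn, nbytes in top:
    print(f"AS{asn}: {nbytes / 1e12:.2f} TB, {flows_per_as[asn]} flows")
```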
Since the level of locality of delivered traffic by each content provider depends on both the relative network distance of its infrastructure and its strategy for utilizing this infrastructure, we conduct our analysis at the granularity of individual content providers and focus only on those that are responsible for the bulk of delivered content to UOnet. Moreover, because the number of unique source IPs that send traffic to UOnet on a daily basis is prohibitively large, we identify and focus only on those IPs that are responsible for a significant fraction of the delivered traffic. Inferring Top Content Providers: Figure 6 (left y-axis) shows the histogram of delivered traffic (in TB) to UOnet by those content providers that have the largest contributions in the 10/04/16 snapshot. It also shows (right y-axis) the CDF of the fraction of aggregate traffic that is delivered by the top-k content providers in this snapshot.

Figure 6. The volume of delivered traffic from individual top content providers to UOnet along with the CDF of the aggregate fraction of traffic by the top 21 content providers in the 10/04/16 snapshot.

The figure is in full agreement with earlier studies such as Ager, Mühlbauer, Smaragdakis, and Uhlig (2011); Chatzis et al. (2013) and clearly illustrates the extreme skewness of this distribution – the top 21 content providers (out of some 39K ASes) are responsible for 90% of all the delivered daily traffic to UOnet. To examine the stability of these top content providers across our 10 daily snapshots, along the x-axis of Figure 7 we list any content provider that is among the top content providers (with 90% aggregate contributions in delivered traffic) in at least one daily snapshot (the ordering is in terms of mean rank, from small to large, for content providers with the same prevalence). This figure shows the number of daily snapshots in which a content provider has been among the top content providers (i.e. the content provider's prevalence, left y-axis) along with the summary distribution (i.e., box plot) of each content provider's rankings among the top content providers across different snapshots (rank distribution, right y-axis). We observe that the same 21 content providers consistently appear among the top content providers. These 21 content providers are among the well-recognized players of today's Internet and include major content providers (e.g. Netflix, Twitter), widely-used CDNs (e.g. Akamai, LimeLight and EdgeCast), and large providers that offer hosting, Internet access, and cloud services (e.g. Comcast, Level3, CenturyLink, Amazon). In the following, we only focus on these 21 content providers (called target content providers) that are consistently among the top content providers in all of our snapshots. These target content providers are also listed in Figure 6 and collectively contribute about 90% of the incoming daily bytes in each of our snapshots. Inferring Top IPs per Target Content Provider: To assess the locality of the traffic delivered to UOnet from each target content provider, we consider the source IP addresses for all of the incoming flows in each daily snapshot.
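Both the selection of top content providers and the per-provider selection of top IPs described in this section come down to the same cumulative-coverage cut; a minimal sketch of that cut is shown below, with an assumed input dictionary and placeholder example values.

```python
# Sketch: find the smallest set of contributors (providers or, per provider,
# source IPs) that accounts for 90% of the delivered bytes. The input
# dictionary and example values are assumptions for illustration.
def top_contributors(bytes_by_key, coverage=0.90):
    total = sum(bytes_by_key.values())
    ranked = sorted(bytes_by_key.items(), key=lambda kv: kv[1], reverse=True)
    picked, covered = [], 0
    for key, nbytes in ranked:
        if covered >= coverage * total:
            break
        picked.append(key)
        covered += nbytes
    return picked

example = {"192.0.2.1": 8_500_000, "192.0.2.2": 1_000_000, "192.0.2.3": 500_000}
print(top_contributors(example))  # first two IPs already cover 95% of the bytes
```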
While for some target content providers the number of unique source IP addresses is as high as a few tens of thousands, the distribution of delivered traffic across these IPs again exhibits a high degree of skewness; i.e., for each target content provider, only a small fraction of source IPs (called top IPs) is responsible for 90% of delivered traffic.

Figure 7. The prevalence and distribution of rank for any content provider that has appeared among the top content providers in at least one daily snapshot.

Figure 8 shows the summary distribution (in the form of box plots) of the number of top IPs across different snapshots along with the cumulative number of unique top IPs (blue line) and all IPs (red line) across all of our 10 snapshots. The log scale on the y-axis shows that the number of top IPs is often significantly smaller than the number of all IP addresses (as a result of the skewed distribution of delivered content by different IPs per target content provider). A small gap between the total number of top IPs and their distribution across different snapshots illustrates that for many of the target content providers, the top IPs do not vary widely across different snapshots. In our analysis of traffic locality below, we only consider the collection of all top IPs associated with each of the target content providers across different snapshots. Focusing on these roughly 50K IPs allows us to capture a rather complete view of delivered traffic to UOnet without considering the millions of observed source IPs.

Figure 8. Distribution of the number of top IPs across different snapshots in addition to the total number of unique top IP addresses (blue line) and the total number of unique IPs across all snapshots (red line) for each target content provider.

Measuring the Distance of Top IPs: Using the approximately 50K top IPs for all 21 target content providers, we conducted a measurement campaign (on 11/10/16) that consisted of launching 10 rounds of traceroutes from UOnet to all of these 50K top IPs to infer their minimum RTT. (We use all three types of traceroute probes, i.e. TCP, UDP, and ICMP, and spread them throughout the day to reach most IPs and reliably capture the minimum RTT.) Note that the value of RTT for each top IP accounts for possible path asymmetry between the launching location and the target IP and is therefore largely insensitive to the direction of the traceroute probe (i.e. from UOnet to a top IP vs. from a top IP to UOnet). Our traceroute probes successfully reached 81% of the targeted IP addresses.
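As a rough illustration of this step, the sketch below derives a per-target minimum RTT from repeated system pings rather than from the multi-protocol traceroutes actually used in the study; the target list, the round count, and the Linux-style ping flags are assumptions made for illustration only.

```python
# Simplified stand-in for the RTT campaign: several rounds of single pings
# per target, keeping the smallest observed RTT. Linux ping flags assumed.
import re
import subprocess

def min_rtt_ms(ip, rounds=10):
    best = None
    for _ in range(rounds):
        out = subprocess.run(["ping", "-c", "1", "-W", "1", ip],
                             capture_output=True, text=True).stdout
        m = re.search(r"time[=<]([\d.]+) ?ms", out)
        if m:
            rtt = float(m.group(1))
            best = rtt if best is None or rtt < best else best
    return best  # None if the host never answered

targets = ["192.0.2.10", "198.51.100.20"]  # placeholder "top IPs"
for ip in targets:
    print(ip, min_rtt_ms(ip, rounds=3))
```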
We exclude three target content providers (i.e., Internap, Amazon and Twitch) from our analysis because their servers did not respond to more than 90% of our traceroute probes. All other target content providers responded to more than 90% of our probes. The outcome of our measurement campaign is the list of top IPs along with their min RTT and the percentage of delivered traffic (in terms of bytes and flows) for each target content provider. With the help of this information, we can now assess the locality properties of the content that is delivered from each target content provider to UOnet. Note that in theory, any distance measure could be used for this purpose. However, in practice, neither AS distance (i.e., number of AS hops), nor hop-count distance (i.e., number of traceroute hops), nor geographic distance are reliable metrics. While the first two ignore the commonly encountered asymmetry of IP-level routes in today's Internet Sánchez et al. (2013), the last metric suffers from known inaccuracies in commercial databases such as IP2Location IP2Location (2015) and Maxmind MaxMind (2018) that are commonly used for IP geolocation. We choose the RTT distance (i.e., measured by the min RTT value) as our metric of choice for assessing the locality of delivered traffic since it is the most reliable distance measure and also the most relevant in terms of user-perceived delay. 3.4 Traffic Locality for Content Providers Overall View of Traffic Locality: We use radar plots to present an overall view of the locality of aggregate delivered traffic from our target content providers to UOnet based on RTT distance. Radar plots are well suited for displaying multi-variable data where individual variables are shown as a sequence of equiangular spokes, called radii. We use each spoke to represent the locality of traffic for a given target content provider by showing the RTT values for the 50th, 75th and 90th percentiles of delivered traffic (in bytes or flows). In essence, the spoke corresponding to a particular target content provider shows what percentage of the traffic that this content provider delivers to UOnet originates from within 10, 20, ..., or 60ms distance from our stub-AS. Figure 9 shows two such radar plots for a single daily snapshot (10/04/16).

Figure 9. Radar plots showing the aggregate view of locality based on RTT of delivered traffic in terms of bytes (left plot) and flows (right plot) to UOnet in a daily snapshot (10/04/2016).

In these plots, the target CPs are placed around the plot in a clockwise order (starting from 12 o'clock) based on their relative contributions in delivered bytes (as shown in Figure 6), and the distances (in terms of min RTT ranges) are marked on the 45-degree spoke. The left and right plots in Figure 9 show the RTT distance for the 50th, 75th and 90th percentile of delivered bytes and flows for each content provider, respectively. By connecting the same percentile points on the spokes associated with the different target content providers, we obtain a closed contour within which the sources for 50, 75 or 90% of the delivered content from our target content providers to UOnet are located. We refer to this collection of contours as the traffic footprint of UOnet. While more centrally-situated contours indicate a high degree of overall traffic locality for the considered stub-AS, contours that are close to the radar plot's boundary for some spokes suggest poor localization properties for some content providers.
The radar plots in Figure 9 show that while there are variations in traffic locality for different target content providers, 90% of the delivered traffic for the top 13 content providers is delivered from within a 60ms RTT distance from UOnet, and for 9 of them from within 20ms RTT. Moreover, considering the case of Cogent, while 50% of the bytes from Cogent are delivered from an RTT distance of 20ms, 50% of the flows are delivered from a distance of 60ms. Such an observed higher level of traffic locality with respect to bytes compared to flows suggests that a significant fraction of the corresponding target content provider's (in this case, Cogent) large or “elephant” flows are delivered from servers that are in closer proximity to UOnet than those that serve the target content provider's smaller flows. Collectively, these findings indicate that for our stub-AS, the overall level of traffic locality for delivered bytes and flows is high but varies among the different target content providers. These observations are by and large testimony to the success of past and ongoing efforts by the different involved parties to bring content closer to the edge of the network where it is requested and consumed. As such, the results are not surprising, but to our knowledge, they provide the first quantitative assessment of the per-content provider traffic footprint (based on RTT distance) of a stub-AS. Variations in Traffic Locality: After providing an overall view of the locality of the delivered traffic to UOnet for a single snapshot, we next turn our attention to how the traffic locality of a content provider (with respect to UOnet) varies over time. To simplify our analysis, we consider all flows of each target content provider and bin them based on their RTTs using a bin size of 2ms. The flows in each bin are considered as a single group with an RTT value given by the mid-bin RTT value. We construct the histogram of percentages of delivered bytes from each group of flows in each bin and define the notion of Normalized Weighted Locality for delivered traffic from a provider P in snapshot s as:

NWL(s, P) = \frac{\sum_{i \in RTTBins(P)} FracBytes(i) \cdot RTT(i)}{minRTT(P)}

NWL(s, P) is simply the sum of the fractions of delivered traffic from each RTT bin (FracBytes(i)), each weighted by its RTT, normalized by the lowest RTT among all bins (minRTT(P)) for a content provider across all snapshots. NWL is an aggregate measure that illustrates how effectively a content provider localizes its delivered traffic over its own infrastructure. An NWL value of 1 implies that all of the traffic is delivered from the closest servers, while larger values indicate more contribution from servers that are further from UOnet.
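The following minimal Python sketch illustrates the NWL computation just defined for a single provider and snapshot; the per-flow (RTT, bytes) pairs and the example values are assumptions for illustration, not data from the study.

```python
# Sketch of the NWL computation: bin flows into 2 ms RTT bins, weight each
# bin's byte fraction by its mid-bin RTT, and normalize by the provider's
# lowest observed RTT. Input structures are assumed for illustration.
from collections import defaultdict

def nwl(flows, min_rtt, bin_ms=2):
    bytes_per_bin = defaultdict(int)
    for rtt, nbytes in flows:                       # flows: [(rtt_ms, bytes)]
        mid = int(rtt // bin_ms) * bin_ms + bin_ms / 2  # mid-bin RTT value
        bytes_per_bin[mid] += nbytes
    total = sum(bytes_per_bin.values())
    weighted = sum((b / total) * rtt for rtt, b in bytes_per_bin.items())
    return weighted / min_rtt

flows = [(3.1, 5_000_000), (3.8, 2_000_000), (17.5, 1_000_000)]
print(round(nwl(flows, min_rtt=3.0), 2))  # 1.58: most bytes come from the closest bin
```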
The top plot in Figure 10 presents the summary distribution of NWL(s, P) across different daily snapshots for each content provider. The bottom plot in Figure 10 depicts the min RTT for each content provider. These two plots together show how local the closest server of a content provider is and how effective each content provider is in utilizing its infrastructure. The plots also demonstrate the following points about the locality of traffic. For one, for many target content providers (e.g. Netflix, Comcast, Valve), the NWL values exhibit small or no variations across different snapshots. Such a behavior suggests that the pattern of delivery from different servers is stable across different snapshots. In contrast, for content providers with varying NWL values, the contribution of various servers (i.e. the pattern of content delivery from various content provider servers) changes over time. Second, the value of NWL is less than 2 (and often very close to 1) for many content providers. This in turn indicates that these content providers effectively localize their delivered traffic to UOnet over their infrastructure. The value of NWL for other content providers is larger and often exhibits larger variation due to their inability to effectively utilize their nodes to localize delivered traffic to UOnet.

Figure 10. Two measures of traffic locality, from top to bottom: the summary distribution of NWL and the RTT of the closest servers per content provider (or minRTT).

3.5 Traffic From Guest Servers To improve the locality properties of their delivered content and services to end users, some content providers expand their infrastructure by deploying some of their servers in other networks. We refer to such servers as guest servers and to the third-party networks hosting them as host networks or host ASes. For example, Akamai is known to operate some 200K such servers in over 1.5K different host networks, with the servers using IP addresses that belong to the host networks Fan et al. (2015); Triukose, Wen, and Rabinovich (2011). We present two examples to illustrate the deployment of guest servers. First, our close examination of delivered traffic from Neronet, which is one of UOnet's upstream providers, revealed that all of its flows are delivered from a small number of IPs (see Figure 8) associated with Google servers, i.e. Google caches Calder et al. (2013) that are deployed in Neronet. This implies that all of Google's traffic for UOnet is delivered from Neronet-based Google caches and explains why Google is not among our target content providers. Second, Netflix is known to deliver its content to end users through its own caches (called Open Connect Appliances Netflix (2017b)) that are either deployed within different host networks or placed at critical IXPs Böttger et al. (2016). When examining the DNS names for all the source IPs of our target content providers, we observed a number of source IPs that are within another network and whose DNS names follow the *.pdx001.ix.nflxvideo.net format. This is a known Netflix convention for DNS names and clearly indicates that these guest servers are located at an IXP in Portland, Oregon Böttger et al. (2016). 3.5.1 Detecting Guest Servers. Given the special nature of content delivery to UOnet from Google (via Neronet) and Netflix (via a close-by IXP), we focus on Akamai to examine how its use of guest servers impacts the locality of delivered traffic to UOnet. However, since our basic methodology that relies on a commonly-used IP-to-AS mapping technique cannot identify Akamai's guest servers and simply associates them with their host network, we present in the following a new methodology for identifying Akamai's guest servers that deliver content to UOnet. Our proposed method leverages Akamai-specific information and proceeds in two steps. The first step consists of identifying the URLs for a few small, static and popular objects that are likely to be cached at many Akamai servers.
Then, in a second step, we probe the observed source IP addresses at other target content providers with properly-formed HTTP requests for the identified objects. Any third-party server that provides the requested objects is considered an Akamai guest server. More precisely, we first identify a few Akamai customer websites and interact with them to identify small, static and popular objects (i.e., “reference objects”). Since JavaScript or CSS files are less likely to be modified compared to other types of objects and thus are more likely to be cached by Akamai servers, we used in our experiments two JavaScript objects and a logo from Akamai client web sites (e.g. Apple, census.gov, NBA). Since an Akamai server is responsible for hosting content from multiple domain names, the web server needs a way to distinguish requests that are redirected from clients of different customer websites. This differentiation is achieved with the help of the HOST field of the HTTP header. Specifically, when constructing an HTTP request to probe an IP address, we set the HOST field to the original domain name of the reference object (e.g. apple.com, census.gov, nba.com). Next, for each reference object, we send a separate HTTP request to each of the 50K top source IP addresses in our datasets (see Section 3). If we receive the HTTP OK/200 status code in response to our request and the first 100 bytes of the provided object match the requested reference object (the second condition is necessary since some servers provide a positive response to any HTTP request), we consider the server to be an Akamai guest server and identify its AS as the host AS. We repeat our request using other reference objects if the HTTP request fails or times out. If all of our requests time out or receive an HTTP error code, we mark the IP address as a non-Akamai IP address. To evaluate our proposed methodology, we consider all the 601 servers in our dataset whose IP addresses are mapped to Akamai (based on IP-to-AS mapping) and send our HTTP requests to all of them. Since all Akamai servers are expected to behave similarly, the success rate of our technique in identifying these Akamai servers demonstrates its accuracy. Indeed, we find that 585 (97%) of these servers properly respond to our request and are thus identified as Akamai servers. The remaining 3% either do not respond or respond with various HTTP error codes. When examining these 16 failed servers more closely, we discovered that 11 of them were running a mail server and would terminate a connection to their web server regardless of the requested content. This suggests that these Akamai servers perform functions other than serving web content. Using our proposed technique, we probed all 50K top source IP addresses associated with our 21 target content providers in all of our snapshots. When performing this experiment (on 11/20/16), we discovered between 143 and 295 Akamai guest servers in 3 to 7 host ASes across the different snapshots. In total, there were 658 unique guest servers from 7 unique host ASes, namely NTT, CenturyLink, OVH, Cogent, Comcast, Dropbox and Amazon. Moreover, these identified Akamai guest servers deliver between 121 and 259 GBytes to UOnet in their corresponding daily snapshots, which is between 9 and 20% of the aggregate daily traffic delivered from Akamai to UOnet. These results imply that the 34-103 Akamai-owned servers in each snapshot deliver on average 12 times more content to UOnet than Akamai's 143-295 guest servers.
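A minimal sketch of this probing step is given below; the customer domain, object path, and expected byte prefix are placeholders rather than the actual reference objects used in the experiment, and the requests-based implementation is only one possible realization.

```python
# Sketch of the guest-server probe in Section 3.5.1: request a small cached
# "reference object" directly from a candidate IP while setting the HTTP Host
# header to the Akamai customer's domain, then compare the first 100 bytes.
# The host name, path, and expected prefix are placeholders.
import requests

REFERENCE_HOST = "www.example.com"          # Akamai customer domain (assumed)
REFERENCE_PATH = "/static/site.js"          # small, static, popular object
EXPECTED_PREFIX = b"/*! example bundle */"  # first bytes of the real object

def is_akamai_guest(ip, timeout=5):
    try:
        r = requests.get(f"http://{ip}{REFERENCE_PATH}",
                         headers={"Host": REFERENCE_HOST},
                         timeout=timeout, stream=True)
        if r.status_code != 200:
            return False
        # Some servers answer 200 to anything, so also check the content.
        return r.raw.read(100).startswith(EXPECTED_PREFIX)
    except requests.RequestException:
        return False

print(is_akamai_guest("203.0.113.7"))  # candidate top IP (placeholder)
```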
Moreover, we observed that the bulk of delivered bytes from Akamai's guest servers to UOnet (i.e., 98%) is associated with guest servers that are deployed in two content providers, namely NTT (76.1%) and CenturyLink (21.9%).

Figure 11. Locality (based on RTT in ms) of delivered traffic (bytes, left plot; flows, right plot) for Akamai-owned servers as well as Akamai guest servers residing within three target ASes for snapshot 2016-10-04.

3.5.2 Relative Locality of Guest Servers. Deploying guest servers in various host ASes enables a content provider to either improve the locality of its traffic or provide better load balancing among its servers. To examine these two objectives, we compare the level of locality of traffic delivered from Akamai-owned servers vs. Akamai's guest servers. The radar plots in Figure 11 illustrate the locality (based on RTT) of delivered content from Akamai-owned servers shown at 12 o'clock (labeled as Akamai) as well as from Akamai's guest servers in all three host networks in the snapshot from 10/04/16. The guest servers are grouped by their host ASes and ordered based on their aggregate contribution in delivered bytes (for Akamai flows) in a clockwise order. We observe that traffic delivered from Akamai-owned servers exhibits a higher locality – 75% (90%) of the bytes (flows) are delivered from servers that are 4ms (8ms) RTT away. The Akamai traffic from CenturyLink, NTT and OVH is delivered from servers that are at RTT distances of 8, 15 and 20ms, respectively. While these guest servers serve content from further away than the Akamai-owned servers, they are all relatively close to UOnet, which suggests that they are not intended to offer a higher level of locality for delivered content to UOnet users. 3.6 Implications of Traffic Locality Improving end user-perceived performance (i.e. decreasing delay and/or increasing throughput) is one of the main motivations for major content providers to bring their front-end servers closer to the edge of the network. In the following, we examine whether such performance improvements are indeed experienced by the end users served by UOnet and to what extent, for a given content provider, the observed performance is correlated with that content provider's traffic locality. We already showed in Figure 9 that the measured min RTT values for a majority of content providers (with some exceptions such as OVH, Quantil, Cogent) are consistently low (<20ms) across all flows. The average throughput of each flow can be easily estimated by dividing the total number of delivered bytes by its duration. (Note that we may have fragmented flows for this analysis. This means that long flows will be divided into 5min intervals. However, 5min is sufficiently long to estimate the average throughput of individual flows.) To get an overall sense of the observed average throughput, Figure 12 shows the summary distributions of the measured throughput across delivered flows by each target content provider. We observe that 90% of the flows for all target content providers (except Level3) experience low throughput (< 0.5MB/s, and in
most cases even < 0.25MB/s). This raises the question of why these very localized flows do not achieve higher throughput.

Figure 12. Summary distribution of average throughput for delivered flows from individual target content providers towards UOnet users across all of our snapshots.

In general, reliably identifying the main factors that limit the throughput of individual flows is challenging Sundaresan, Feamster, and Teixeira (2015). The cause could be any combination of factors that include:

– Content Bottleneck: the flow does not have a sufficient amount of content to “fill the pipe”;

– Receiver Bottleneck: the receiver's access link (i.e. client type) or flow control is the limiting factor;

– Network Bottleneck: the fair share of network bandwidth is limited due to cross traffic (and the resulting loss rate);

– Server Rate Limit: a content provider's server may limit its transmission rate implicitly due to its limited capacity or explicitly as a result of the bandwidth requirements of the content (e.g. Netflix videos do not require more than 0.6 MB/s for a Full-HD stream Netflix (2017a)).

Rather than inferring the various factors that affect individual flows, our goal is to identify the primary factor from the above list that limits the maximum achievable throughput by individual content providers. To this end, we only consider the 3-4% (or 510-570K) of all flows for each target content provider whose size exceeds 1 MB and refer to them as “elephant” flows. (Selecting the 1 MB threshold for flow size strikes a balance between having sufficiently large flows Sodagar (2011) and obtaining a large set of flows for each content provider.) These elephant flows typically have several hundreds of packets and are thus able to fully utilize the available bandwidth in the absence of other limiting factors (i.e. a content bottleneck does not occur). More than 0.5 million elephant flows for individual content providers are delivered to end users in UOnet that have diverse connection types (wireless, residential, wired). Therefore, a receiver bottleneck should not be the limiting factor for the maximum achievable throughput by individual content providers. This in turn suggests that either the network or the server is responsible for limiting the achievable throughput. To estimate the Maximum Achievable Throughput (MAT) for each content provider, we group all elephant flows associated with that content provider based on their RTT into 2ms bins and select the 95th-percentile throughput value (i.e. the median of the top 10%) in the bin as its MAT, with its mid-bin RTT value as the corresponding RTT. Since a majority (96%) of these flows are associated with TCP connections and thus are congestion controlled, we can examine the key factors responsible for limiting throughput. Figure 13 shows a scatter plot where each labelled dot represents a target content provider with its y-value denoting its MAT and its x-value denoting the associated RTT.

Figure 13. Maximum Achievable Throughput (MAT) vs MinRTT for all content providers. The curves show the change in the estimated TCP throughput as a function of RTT for different loss rates.

We also group all Akamai flows from its guest servers at each host ASx, determine their separate MAT, and exclude them from ASx's own flows to avoid double-counting them.
For example, Akamai flows that are delivered from OVH are marked as AK-OVH. To properly compare the measured MAT values across different RTTs, we also plot the estimated TCP throughput as a function of RTT for three different loss rates, obtained by applying the commonly-used equation Mathis et al. (1997): T < \frac{MSS}{RTT \cdot \sqrt{L}}. In this equation, MSS denotes the Maximum Segment Size, which we set to 1460, and L represents the loss rate. We consider three different loss rate values, namely 10^-2, 10^-3, and 10^-4, to cover a wide range of "realistic" values. Examining Figure 13, we notice that the relative location of each labelled dot with respect to the TCP throughput lines reveals the average “virtual” loss rate across all elephant flows of a content provider if a bandwidth bottleneck were the main limiting factor. The figure shows that this virtual loss rate for many content providers is at or above 10^-3. However, in practice, average loss rates higher than 10^-3 over such short RTTs (<20ms) are very unlikely in our setting (e.g., UOnet is well provisioned and most incoming flows traverse paths with similar or identical tail ends). To test this hypothesis, we directly measure the loss rate between UOnet and the closest servers for each content provider using 170K ping probes per content provider. (Note that ping measures loss in both directions of a connection.) Figure 14 depicts the average loss rate for each target content provider and shows that the measured average loss rate for all of the target content providers is at least an order of magnitude lower than the virtual loss rate for each content provider. This confirms that all of the measured MAT values must be rate-limited by the server, either explicitly (due to the bandwidth requirements of the content) or implicitly (due to server overload).

Figure 14. Average loss rate of the closest servers per target content provider measured over 24 hours using ping probes with 1-second intervals. For each content provider we choose at most 10 of the closest IP addresses.

Figure 13 also shows that the measured MAT values for Akamai guest servers are often much larger than those for the servers owned by the host AS. For example, the MAT value for AK-CLINK (AK-DRPBX or AK-NTT) is much higher than the MAT for CLINK (DRPBX, or NTT). Furthermore, the measured MAT value for all the flows from Akamai's guest servers is lower than its counterpart for all flows from Akamai-owned servers. To summarize, there are two main take-aways from our examination of the performance implications of traffic locality. On the one hand, traffic locality is key to achieving the generally and uniformly very small measured delays for traffic delivered to UOnet. On the other hand, our results show that a majority of flows for all target content providers are associated with small files and thus do not reach a high throughput. Furthermore, the throughput for most of the larger flows is not limited by the network but rather by the front-end servers. In other words, high-throughput delivery of content at the edge is either not relevant (for small objects) or not required by applications.
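For reference, the TCP throughput curves plotted in Figure 13 follow directly from this bound; the short sketch below evaluates it for the three loss rates considered in the text, with MSS fixed at 1460 bytes as in the text and RTT values chosen only for illustration.

```python
# Sketch of the Mathis et al. (1997) bound used for the curves in Figure 13:
# T < MSS / (RTT * sqrt(L)), with MSS = 1460 bytes and the result converted
# to MB/s for comparison with the measured MAT values.
from math import sqrt

MSS = 1460  # bytes

def mathis_throughput_mb_s(rtt_ms, loss_rate):
    rtt_s = rtt_ms / 1000.0
    return MSS / (rtt_s * sqrt(loss_rate)) / 1e6  # MB/s

for loss in (1e-2, 1e-3, 1e-4):
    print(f"loss={loss:g}:",
          [round(mathis_throughput_mb_s(rtt, loss), 2) for rtt in (10, 20, 60)])
# e.g. at 20 ms and a loss rate of 1e-3, the bound is roughly 2.3 MB/s
```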
3.7 Summary Our work contributes to the existing literature on content delivery by providing a unique view of different aspects of content delivery as experienced by the end users served by a stub-AS (i.e., a Research & Education network). To this end, we examine the complete flow-level view of traffic delivered to this stub-AS from all major content providers and characterize this stub-AS' traffic footprint (i.e. a detailed assessment of the locality properties of the delivered traffic). We also study the impact that this traffic footprint has on the performance experienced by its end users and report on two main takeaways. First, this stub-AS' traffic locality is uniformly high across the main CPs; i.e., the traffic that these CPs deliver to this stub-AS experiences in general only very small delays. Second, the throughput of the delivered traffic remains far below the maximum achievable throughput and is not limited by the network but rather by the front-end servers. Lastly, to complement the effort described in this chapter, assessing the locality properties of the traffic that constitutes the (long) tail of the distribution in Figure 6 and is typically delivered from source IP addresses that are rarely seen in our data or are responsible for only minuscule portions of the traffic delivered to UOnet looms as an interesting open problem and is part of future work. CHAPTER IV CLOUD PEERING ECOSYSTEM Chapter III presented an overview of CPs and content providers' share in Internet traffic and the degree of locality of their infrastructure. In this chapter, we focus on the topology and connectivity of CPs to the rest of the Internet. We pay special attention to the new form of peering relationships that CPs are forming with edge networks. The content in this chapter is derived entirely from Yeganeh, Durairajan, Rejaie, and Willinger (2019) as a result of collaboration with co-authors listed in the manuscript. Bahador Yeganeh is the primary author of this work and responsible for conducting all measurements and producing the presented analyses. 4.1 Introduction In this chapter, we present a third-party, cloud-centric measurement study aimed at discovering and characterizing the unique peerings (along with their types) of Amazon, the largest cloud service provider in the US and worldwide. Each peering typically consists of one or multiple (unique) interconnections between Amazon and a neighboring Autonomous System (AS) that are typically established at different colocation facilities around the globe. Our study only utilizes publicly available information and data (i.e. no Amazon-proprietary data is used) and is therefore also applicable for discovering the peerings of other large cloud providers (as long as the cloud provider does not filter traceroute probes). We start by presenting the required background on Amazon's serving infrastructure, including the different types of peerings an enterprise network can establish with Amazon at a colo facility, in § 4.2. § 4.3 describes the first round of our data collection; that is, launching cloud-centric traceroute probes from different regions of Amazon's infrastructure toward all the /24 (IPv4) prefixes to infer a subset of Amazon's peerings. We present our methodology for inferring Amazon's peerings across the captured traceroutes in § 4.4.1.
Our second round of data collection consists of using traceroute probes that target the prefixes around the peerings discovered in the first round and are intended to identify all the remaining (IPv4) peerings of Amazon (§ 4.4.2). In § 4.5, we present a number of heuristics to resolve the inherent ambiguity in inferring the specific traceroute segment that is associated with a peering. We further confirm our inferred peerings by assessing the consistency of border interfaces at both the Amazon side and client side of an inferred interconnection. Pinning (or geo-locating) each end of individual interconnections associated with Amazon’s peerings at the metro level forms another contribution of this study (§ 4.6). To this end, we develop a number of methods to identify border interfaces that have a reliable location and which we refer to as anchors. Next, we establish a set of co-presence rules to conservatively propagate the location of anchors to other close-by interfaces. We then identify the main factors that limit our ability to pin all border interfaces at the metro level and present ways to pin most of the interfaces at the regional level. Finally, we evaluate the accuracy and coverage of our pinning technique and characterize the pinned interconnections. The final contribution of this work is a new method for inferring the client border interface that is associated with that client’s VPI with Amazon. In particular, by examining the reachability of a given client border interface from a number of other cloud providers (§ 4.7) and identifying overlapping interfaces between Amazon and those other cloud providers, our method provides a lower bound on the number of Amazon’s VPIs. We then assign all inferred Amazon 111 peerings to different groups based on their key attributes such as being public or private, visible or not visible in BGP, and physical or virtual. We then carefully examine these groups of peerings to infer their purpose and explore hybrid peering scenarios. In particular, we show that one-third of Amazon’s inferred peerings are either virtual or not visible in BGP and thus hidden from public measurement. Finally, we characterize the inferred Amazon connectivity graph as a whole. 4.2 Background Amazon’s Ecosystem. The focus of our study of peerings in today’s Internet is Amazon, arguably the largest cloud service provider in the US and worldwide. Amazon operates several data centers worldwide. While these data centers’ street addresses are not explicitly published by Amazon, their geographic locations have been reported elsewhere Burrington (2016); DatacenterMap (2018); Miller (2015); Plaven (2017); WikiLeaks (2018); Williams (2016). Each data center hosts a large number of Amazon servers that, in turn, host user VMs as well as other services (e.g. Lambda). Amazon’s data center locations are divided into independent and distinct geographic regions to achieve fault tolerance/stability. Specifically, each region has multiple, isolated availability zones (AZs) that provide redundancy and offer high availability in case of failures. AZs are virtual and their mapping to a specific location within their region is not known Amazon (2018f). As of 2018, Amazon had 18 regions (55 AZs) across the world, with five of them (four public + one US government cloud) located in the US. For our study, we were not able to utilize three of these regions. Two of them are located in China, are not offered on Amazon’s AWS portal, and require approval requests by Amazon staff. 
The third region is assigned to the US government and is not offered to the public. Peering with Amazon at Colo Facilities. Clients can connect to Amazon through a specific set of colo facilities. Amazon is considered a native tenant in these facilities, and their locations are publicly announced by Amazon Amazon (2018d). Amazon is also reachable through a number of other colo facilities via layer-2 connectivity offered by third-party providers (e.g. Megaport); these entities are called “AWS Direct Connect Partners” at a particular facility and are listed online along with their points of presence Amazon (2018c).

Figure 15. Overview of Amazon's peering fabric. Native routers of Amazon & Microsoft (orange & blue) establishing private interconnections (AS3 - yellow router), public peering through the IXP switch (AS4 - red router), and virtual private interconnections through the cloud exchange switch (AS1, AS2, and AS5 - green routers) with other networks. Remote peering (AS5) as well as connectivity to non-ASN businesses through layer-2 tunnels (dashed lines) happens through connectivity partners.

Figure 15 depicts an example of the different types of peerings offered by cloud providers at two colo facilities. Both Amazon (AWS) and Microsoft (Azure) are native (i.e. house their border routers) in the CoreSite LA1 colo facility and are both present at that facility's IXP and cloud exchange. (Open) cloud exchanges are switching fabrics specifically designed to facilitate interconnections among network providers, cloud providers, and enterprises in ways that provide the scalability and elasticity essential for cloud-based services and applications (e.g. see CoreSite (2018); Equinix (2017)). Major colo facility providers (e.g. Equinix and CoreSite) also offer a new interconnection service option called “virtual private interconnection (VPI).” VPIs enable local enterprises (that may or may not own an ASN) to connect to multiple cloud providers that are present at the cloud exchange switching fabric by means of purchasing a single port on that switch. In addition, VPIs provide their customers access to a programmable, real-time cloud interconnection management portal. Through this portal, the operators of these new switching fabrics make it possible for individual enterprises to establish their VPIs in a highly-flexible, on-demand, and near real-time manner. This portal also enables enterprises to monitor in real time the performance of their cloud-related traffic that traverses these VPIs. While cloud exchanges rely on switching fabrics that are similar to those used by IXPs, there are two important differences. For one, cloud exchanges enable each customer to establish virtualized peerings with multiple cloud providers through a single port. Moreover, they provide exclusive client connectivity to cloud providers without requiring a client to use its pre-allocated IP addresses. Operationally, a cloud customer establishes VPIs using either public or private IP addresses depending on the set of cloud services that this customer is trying to reach through these interconnections. On the one hand, VPIs relying on private addresses are limited to the customer's virtual private cloud (VPC) through VLAN isolation.
On the other hand, VPIs with public addresses can reach compute resources in addition to other AWS offerings such as S3 and DynamoDB Amazon (2018b). Given the isolation of network paths for VPIs with private addresses, any peerings associated with these VPIs are not visible to the probes from VMs owned by other Amazon customers. This makes it, in practice, impossible to discover established VPIs that rely on private addresses. In Figure 15, the different colors 114 of the client routers indicate the type of their peerings; e.g. public peering through the IXP (for AS4 ), direct physical interconnection (also called “cross-connect") (for AS3 ), private virtual peerings that are either local (for AS1 and AS2 ) or remote (for AS5 ). Here, a local virtual private peering (e.g. AS2 ) could be associated with an enterprise that is brought to the cloud exchange by its access network (e.g. Comcast) using layer-2 technology; based on traceroute measurements, such a peering would appear to be between Amazon and the access network. In contrast, a remote private virtual peering could be established by an enterprise (e.g. AS5 ) that is present at a colo facility (e.g. Databank in Salt Lake City in Figure 15) where Amazon is not native but that houses an “AWS Direct Connect Partner" (e.g. Megaport) which in turn provides layer-2 connectivity to AWS. 4.3 Data Collection & Processing To infer all peerings between Amazon and the rest of the Internet, we perform traceroute campaigns from Amazon’s 15 available global regions to a .1 in each /24 prefix of the IPv4 address space.3 To this end, we create a t2-micro instance VM within each of the 15 regions and break down the IPv4 address space into /24 prefixes. While we exclude broadcast and multicast prefixes, we deliberately consider addresses that are associated with private and shared address spaces since these addresses can be used internally in Amazon’s own network. This process resulted in 15.6M target IPv4 addresses. To probe these target IPs from our VMs, we use the Scamper tool Luckie (2010) with UDP probes as they provide the highest visibility (i.e. response rate). Individual probes are terminated upon encountering five consecutive unresponsive hops in order to limit the overall measurement time while reaching 3 We observed a negligible difference in the visibility of interconnections across probes from different AZs in each region. Therefore, we only consider a single AZ from each region. 115 Amazon’s border routers. We empirically set our probing rate to 300pps to prevent blacklisting or rate control of our probe packets by Amazon. With this probing rate, our traceroute campaign took nearly 16 days to complete (from 08/03/2018 to 08/19/2018). Each collected traceroute is associated with a status flag indicating how the probe was terminated. We observed that the total number of completed traceroutes across different regions is fairly consistent but rather small (mean 7.7% and std 5 ∗ 10−4 ) which suggests a limited yield. However, since our main objective is to identify Amazon interconnections and not to maximize traceroute yield, we consider any traceroute that leaves Amazon’s network (i.e. reaches an IP outside of Amazon’s network) as a candidate for revealing the presence of an interconnection, and the percentage of these traceroutes is about 77%. Annotating Traceroute Data. 
To identify any Amazon interconnection traversed by our traceroutes, we annotate every IP hop with the following information: (i) its corresponding ASN, (ii) its organization (ORG), and (iii) whether it belongs to an IXP prefix. To map each IP address to its ASN, we rely on BGP snapshots from RouteViews and RIPE RIS (taken at the same time as our traceroute campaign). For ORG, we rely on CAIDA’s AS-to-ORG dataset Huffaker, Keys, Fomenkov, and Claffy (2018) and map the inferred ASN of each hop from the previous step to its unique ORG identifier. ORG information allows us to correctly identify the border interface of a customer in cases where traceroute traverses through hops in multiple Amazon ASes prior to reaching a customer network4 . Finally, to determine if an IP hop is part of an IX prefix, we rely on PeeringDB PeeringDB (2017), Packet Clearing House (PCH) Packet Clearing House 4 We observed AS7224, AS16509, AS19047, AS14618, AS38895, AS39111, AS8987, and AS9059 for Amazon. 116 (2017), and CAIDA’s IXP dataset CAIDA (2018) to obtain prefixes assigned to IXPs. In our traceroutes, we observe IP hops that do not map to any ASN. These IPs can be divided into two groups. The first group consists of the IPs that belong to either a private or a shared address space (20.3%); we set the ASN of these IPs to 0. The second group consists of all the IPs that belong to the public address space but were not announced by any AS during our traceroute campaign (7%); for these IPs, we infer the AS owner by relying on WHOIS-provided information (i.e. name or ASN of the entity/company assigned by an RIR). 4.4 Inferring Interconnections In this section, we describe our basic inference strategy for identifying an Amazon-related interconnection segment across a given traceroute probe (§ 4.4.1) and discuss the potential ambiguity in the output of this strategy. We then discuss the extra steps we take to leverage these identified segments in an effort to efficiently expand the number of discovered Amazon-related interconnections (§ 4.4.2). 4.4.1 Basic Inference Strategy. Given the ASN-annotated traceroute data, we start from the source and sequentially examine each hop until we detect a hop that belongs to an organization other than Amazon (i.e. its ORG number is neither 0 nor 7224, which is Amazon). We refer to this hop as customer border hop and to its IP as a Customer Border Interface (CBI). The presence of a CBI indicates that the traceroute has exited Amazon’s network; that is, the traceroute hop right before a CBI is the Amazon Border Interface (ABI), and the corresponding traceroute probe thus must have traversed an Amazon-related interconnection segment. For the remainder of our analysis, we only consider these 117 initial portions of traceroutes between a source and an encountered CBI .5 Next, for each CBI , we check to confirm that the AS owners of all the downstream hops in each traceroute does not include any ASN owned by Amazon (i.e. a sanity check that the traceroute does not re-enter Amazon); all of our traceroutes meet this condition. Finally, because of their unreliable nature, we exclude all traceroutes that contain either an (IP-level) loop, unresponsive hop(s) prior to Amazon’s border, a CBI as the destination of a traceroute Baker (1995), or duplicate hops before Amazon’s border. 
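To make this inference strategy concrete, the following minimal Python sketch walks an annotated traceroute and returns the candidate (ABI, CBI) segment, applying the same filtering rules. The hop record fields, the AMAZON_ORG constant, and the helper names are illustrative assumptions rather than the exact implementation used in this study.

# Minimal sketch of the basic inference strategy of Section 4.4.1.
# Each traceroute is assumed to be a list of annotated hops; field names are illustrative.

from dataclasses import dataclass
from typing import List, Optional, Tuple

AMAZON_ORG = "AMAZON"   # placeholder for Amazon's ORG identifier (covers all its ASNs)

@dataclass
class Hop:
    ip: Optional[str]   # None for unresponsive hops ("*")
    asn: int            # 0 for private/shared address space
    org: str            # organization identifier derived from the AS-to-ORG dataset

def is_amazon_side(hop: Hop) -> bool:
    """A hop still counts as 'inside Amazon' if its ORG is Amazon's or its ASN is 0."""
    return hop.org == AMAZON_ORG or hop.asn == 0

def candidate_segment(trace: List[Hop], dst_ip: str) -> Optional[Tuple[Hop, Hop]]:
    """Return the (ABI, CBI) candidate segment, or None if the traceroute is discarded."""
    for i, hop in enumerate(trace):
        if i == 0 or is_amazon_side(hop):
            continue
        abi, cbi = trace[i - 1], hop
        pre_border = trace[:i]
        # Discard unreliable traces: unresponsive or duplicate hops before the border,
        # a CBI that is the traceroute destination, or re-entry into Amazon downstream.
        if any(h.ip is None for h in pre_border):
            return None
        ips = [h.ip for h in pre_border]
        if len(ips) != len(set(ips)):
            return None
        if cbi.ip == dst_ip:
            return None
        if any(h.org == AMAZON_ORG and h.asn != 0 for h in trace[i + 1:]):
            return None
        return (abi, cbi)
    return None   # the traceroute never left Amazon's network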
The first two rows of Table 3 summarize the number of ABIs and CBIs that we identified in our traceroute data, along with the fraction of interfaces in each group for which we have BGP, Whois, and IXP-association information. As highlighted in § 2.3.2, in certain cases our basic strategy may not identify the correct Amazon-related interconnection segment on a given traceroute. Given that our traceroutes are always launched from Amazon toward a client's network, when Amazon provides the addresses for the physical interconnection, our strategy incorrectly identifies the next downstream segment as the interconnection Amazon (2018b). In summary, the described method always reveals the presence of an Amazon-related interconnection segment in a traceroute. The actual Amazon-specific interconnection segment is either the one between the identified ABI and CBI or the immediately preceding segment. Because of this ambiguity in accurately inferring the Amazon-specific interconnection segments, we refer to them as candidate interconnection segments. In § 4.5, we present techniques for a more precise determination of these inferred candidate interconnection segments.

5 In fact, we only need the CBI and the prior two ABIs.

Table 3. Number of unique ABIs and CBIs, along with the fraction of each group with various metadata, before (rows ABI and CBI) and after (rows eABI and eCBI) the /24 expansion probing.

        All       BGP%      Whois%    IXP%
ABI     3.68k     38.4%     61.6%     -
CBI     21.73k    54.74%    24.8%     20.46%
eABI    3.78k     38.85%    61.15%    -
eCBI    24.75k    79.82%    22.32%    17.86%

4.4.2 Second Round of Probing to Expand Coverage. We perform our traceroute probes from each Amazon region in two rounds. First, as described in § 4.4.1, we target .1 in each /24 prefix of the IPv4 address space (§ 4.3) and identify the pool of candidate interconnection segments. However, it is unlikely that our traceroute probes in this first round traverse all of Amazon's interconnections. Therefore, to increase the number of discovered interconnections, in a second round we launch traceroutes from each region towards all other IP addresses in the /24 prefixes that are associated with each CBI discovered in the first round. Our reasoning for this "expansion probing" is that the IPs in these prefixes have a better chance of being allocated to CBIs than the IPs in other prefixes. Similar to round one, we annotate the resulting traceroutes and identify their interconnection segments (and the corresponding ABIs and CBIs). The bottom two rows in Table 3 show the total number of identified ABIs and CBIs after processing the collected expansion probes. In particular, while the first column of Table 3 shows a significant increase in the number of discovered CBIs (from 21.73k to 24.99k) and even some increase in the number of peering ASNs (from 3.52k to 3.55k) as a result of the expansion probing, the number of ABIs remains relatively constant.
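As an illustration of this expansion probing, the following sketch derives the round-two targets from the /24 prefixes of the CBIs discovered in round one. The input and output formats are assumptions made for this example.

# Sketch of round-two ("expansion probing") target selection: every other address in
# each discovered CBI's /24 prefix, excluding what was already probed and the CBI itself.

import ipaddress
from typing import Iterable, Set

def expansion_targets(cbis: Iterable[str], probed_first_round: Set[str]) -> Set[str]:
    targets: Set[str] = set()
    for cbi in cbis:
        prefix = ipaddress.ip_network(f"{cbi}/24", strict=False)
        for addr in prefix.hosts():          # hosts() skips the network/broadcast addresses
            ip = str(addr)
            if ip != cbi and ip not in probed_first_round:
                targets.add(ip)
    return targets

# Example with documentation prefixes: the .1 of each /24 was probed in round one.
cbis = {"198.51.100.45", "203.0.113.77"}
already = {"198.51.100.1", "203.0.113.1"}
print(len(expansion_targets(cbis, already)))   # 2 * (254 - 2) = 504 new targets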
4.5 Verifying Interconnections

To address the potential ambiguity in identifying the correct Amazon-specific segment of each inferred interconnection (§ 4.4.1), we first check these interconnections against three different heuristics (§ 4.5.1) and then rely on the router-level connectivity among border routers (§ 4.5.2) to verify (and possibly correct) the inferred ABIs and CBIs.

4.5.1 Checking Against Heuristics. We develop a few heuristics to check the aforementioned ambiguity of our approach with respect to inferring the correct interconnection segment. Since the actual interconnection segment could be the segment prior to the identified candidate segment (i.e. we might have to shift the interconnection to the previous segment), our heuristics check for specific pieces of evidence to decide whether an inferred ABI is correct or should be changed to its corresponding CBI. Once an ABI is confirmed, all of its corresponding CBIs are also confirmed. The heuristics are described below and are ordered (high to low) based on our level of confidence in their outcome.

IXP-Client. An IP address that is part of an IXP prefix always belongs to a specific IXP member. Therefore, if the IP address for a CBI in a candidate interconnection segment is part of an IXP prefix, then that CBI and its corresponding ABI are correctly identified Nomikos and Dimitropoulos (2016).

Hybrid IPs. We observe ABI interfaces with hybrid connectivity. For example, in Figure 16, interface a represents such an interface with hybrid connectivity; it appears prior to the client interface b in one traceroute and prior to the Amazon interface c in another traceroute. Even if we are uncertain about the owner of an interface c (i.e. it may belong to the same or a different Amazon client), we can reliably conclude that interface a has hybrid connectivity and must be an ABI.

Figure 16. Illustration of a hybrid interface (a) that has both Amazon- and client-owned interfaces as next hops.

Interface Reachability. Our empirical examination of traceroutes revealed that while ABIs are generally reachable from their corresponding clients, for security reasons they are often not visible/reachable from the public Internet (e.g. a campus or residential network). However, depending on the client configuration, CBIs may or may not be publicly reachable. Based on this empirical observation, we apply a heuristic that probes all candidate ABIs and CBIs from a vantage point in the public Internet (i.e. a node at the University of Oregon). Reachability (or unreachability) of a candidate CBI (or ABI) from the public Internet offers independent evidence in support of our inference.

Table 4 summarizes the fraction of identified ABIs (and thus their corresponding CBIs) that are confirmed by our individual (first row) and combined (second row) heuristics, respectively. We observe that our heuristics collectively confirm 87.8% of all the inferred ABIs and thus 96.96% of the CBIs. The remaining 0.37k (or 9.81%) ABIs that do not match any heuristic are interconnected with one (or multiple) CBIs that belong to a single organization. The resulting low rate of error in detecting the correct interconnection segments implies high confidence in the correctness of our inferred Amazon peerings.

Table 4. Number of candidate ABIs (and corresponding CBIs) that are confirmed by individual (first row) and cumulative (second row) heuristics.

             IXP              Hybrid           Reachable
Individual   0.83k (13.66k)   2.05k (14.44k)   2.8k (15.14k)
Cumulative   0.83k (13.66k)   2.26k (15.14k)   3.31k (24.23k)
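The following sketch shows one way the three heuristics above could be combined to confirm candidate ABIs (and, by extension, their CBIs). The input data structures (the IXP prefix list, per-ABI next-hop observations, and public-reachability results) are assumed to be precomputed and are named for illustration only.

# Sketch of combining the three confirmation heuristics of Section 4.5.1.

import ipaddress
from typing import Dict, Iterable, Set

def in_ixp_prefix(ip: str, ixp_prefixes: Iterable[str]) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(p) for p in ixp_prefixes)

def confirm_abis(candidates: Dict[str, Set[str]],        # ABI -> set of CBIs observed behind it
                 ixp_prefixes: Iterable[str],
                 amazon_next_hops: Dict[str, Set[str]],   # ABI -> next-hop IPs owned by Amazon
                 publicly_reachable: Set[str]) -> Set[str]:
    confirmed: Set[str] = set()
    for abi, cbis in candidates.items():
        # Heuristic 1 (IXP-Client): a CBI inside an IXP prefix confirms the segment.
        if any(in_ixp_prefix(cbi, ixp_prefixes) for cbi in cbis):
            confirmed.add(abi)
            continue
        # Heuristic 2 (Hybrid IPs): the ABI also precedes Amazon-owned interfaces.
        if amazon_next_hops.get(abi):
            confirmed.add(abi)
            continue
        # Heuristic 3 (Interface Reachability): the ABI is hidden from the public
        # Internet while at least one of its CBIs is publicly reachable.
        if abi not in publicly_reachable and any(c in publicly_reachable for c in cbis):
            confirmed.add(abi)
    return confirmed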
4.5.2 Verifying Against Alias Sets. To further improve our ability to eliminate possible ambiguities in inferring the correct interconnection segments, we infer the router-level topology associated with all the candidate interconnection segments and determine the AS owner of individual routers. We consider any inferred interconnection segment to be correct if its ABI is on an Amazon router and its CBI is on a client router. In turn, for any incorrect segment, we first adjust the ownership of its corresponding ABI and CBI so as to be consistent with the determined router ownership and then identify the correct interconnection segment. To this end, we utilize MIDAR Bender et al. (2008) to perform alias resolution from VMs in all the regions where all the candidate ABIs and CBIs were observed. Each instance of this alias resolution effort outputs a set of (two or more) interfaces that reside on a single router. Given the potentially limited visibility of routers from different regions, we combine the alias sets from different regions that have any overlapping interfaces. Overall, we identify 2.64k alias sets containing 8.68k (2.31k ABI plus 6.37k CBI) interfaces, and their sizes have a skewed distribution.

The direction of our traceroute probes (from Amazon towards client networks) and the fact that each router typically responds with the incoming interface suggest that the observed interfaces of individual Amazon (or client) border routers in our traceroutes (i.e. the IPs in each alias set) should typically belong to the same AS. This implies that there should be a majority AS owner among the interfaces in an alias set. To identify the AS owner of each router, we simply examine the AS owner of the individual IPs in the corresponding alias set. The AS that owns a clear majority of interfaces in an alias set is considered the owner of the corresponding router and of all the interfaces in the alias set. (We also examined router ownership at the organization level by considering all ASNs that belong to a single organization; this strategy allows us to group all Amazon/client interfaces regardless of their ASN to accurately detect the AS owner. However, since we observed one ASN per ORG in 99% of the identified alias sets, we present here only the owner AS of each router.) We observe that for more than 94% (92%) of all alias sets, there is a single AS that owns >50% (100%) of all of an alias set's interfaces. The remaining 6% of alias sets comprise 343 interfaces with a median set size of 2. We consider the majority AS owner of each alias set as the AS owner of (all interfaces of) that router. Using this information, we check all of the inferred ABIs and CBIs to ensure that they are on a router owned by Amazon and the corresponding client, respectively. Otherwise, we change their labels. This consistency check results in changing the status of only 45 interfaces (i.e. 18, 2, and 25 change from ABI → CBI, CBI → ABI, and CBI → CBI, respectively, where the last case simply implies that the CBI interface belongs to another client). These changes ultimately result in 3.77k ABIs and 24.76k CBIs associated with 3.55k unique ASes.
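A minimal sketch of the alias-set processing described above, namely merging per-region alias sets that share interfaces and assigning each merged set a majority AS owner, is given below. The data structures are illustrative; ip_to_asn stands in for the same IP-to-ASN mapping used to annotate the traceroutes.

# Sketch of alias-set merging and majority AS ownership (Section 4.5.2).

from collections import Counter
from typing import Dict, List, Optional, Set

def merge_alias_sets(per_region_sets: List[Set[str]]) -> List[Set[str]]:
    """Union alias sets (possibly from different regions) that share any interface."""
    merged: List[Set[str]] = []
    for s in per_region_sets:
        s = set(s)
        overlapping = [m for m in merged if m & s]
        for m in overlapping:
            s |= m
            merged.remove(m)
        merged.append(s)
    return merged

def majority_owner(alias_set: Set[str], ip_to_asn: Dict[str, int]) -> Optional[int]:
    """Return the AS that owns a clear majority (>50%) of the set's interfaces, if any."""
    if not alias_set:
        return None
    counts = Counter(ip_to_asn.get(ip, 0) for ip in alias_set)
    asn, n = counts.most_common(1)[0]
    return asn if n > len(alias_set) / 2 else None

# Interfaces whose ABI/CBI label disagrees with their router's majority owner would
# then be relabeled (e.g. ABI -> CBI) to stay consistent with router ownership.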
4.6 Pinning Interfaces

In this section, we first explore techniques to pin (i.e. geo-locate) each end of the inferred Amazon peerings (i.e. all ABIs and CBIs) to a specific colo facility, metro area, or region, and then evaluate our pinning methodology.

4.6.1 Methodology for Pinning. Our method for pinning individual interfaces to specific locations involves two basic steps. In a first step, we identify a set of border interfaces with known locations that we call anchors. Then, in a second step, we establish two co-presence rules to iteratively infer the location of individual unpinned interfaces based on the location of co-located anchors or other already pinned interfaces. That is, in each iteration, we propagate the location of pinned interfaces to their co-located unpinned neighbors.

Identifying Anchors. For ABIs or CBIs to serve as anchors for pinning other interfaces, we leverage the following four sources of information and consider them as reliable indicators of interface-specific locations.

DNS Information (CBIs): A CBI with specific location information embedded in its DNS name can be pinned to the corresponding colo or metro area. (None of the ABIs had a reverse domain name associated with them.) For example, a DNS name such as ae-4.amazon.atlnga05.us.bb.gin.ntt.net indicates that the CBI associated with NTT interconnects with Amazon in Atlanta, GA (atlnga). We use DNS parsing tools such as DRoP Huffaker, Fomenkov, and claffy (2014) along with a collection of hand-crafted rules to extract the location information (using 3-letter airport codes and full city names) from the DNS names of identified CBIs. In the absence of any ground truth, we check the inferred geolocation against the footprint of the corresponding AS from its PeeringDB listings or information on its webpage. Furthermore, we perform an RTT-constraint check using the measured RTTs from different Amazon regions to ensure that the inferred geolocation is feasible. This check, similar to DRoP Huffaker, Fomenkov, and claffy (2014), conservatively excludes 0.87k CBIs whose inferred locations do not satisfy this RTT constraint.

IXP Association (CBIs): CBIs that are part of an IXP prefix can be pinned to the colo(s) in the metro area where the IXP is present. In total, we identified 671 IXPs within 471 (117) unique cities (countries) but exclude 10 IXPs (and their corresponding 366 CBIs) that are present in multiple metro areas, as they cannot be pinned to a specific colo or metro area. Furthermore, we exclude all interfaces belonging to members that peer remotely. To determine those members, we first identified minIXRegion, the closest Amazon region to each IXP. We did this by measuring minIXRTT, the minimum RTT between the various regions and all interfaces that are part of the IXP, and selecting minIXRegion as the Amazon region where minIXRTT is attained. Then we measure the minimum RTT between all interfaces and minIXRegion and label an interface as "local" if its RTT value is no more than 2ms higher than minIXRTT. We note that for about 80% of IXPs, the measured minIXRTT is less than 1.5ms (i.e. most IXPs are in very close proximity to at least one AWS region). This effort results in labeling about 2k of the 3.5k IXP interfaces encountered in our measurements as "local". Conversely, some 1.5k interfaces belong to members that peer remotely.
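The following sketch illustrates the local-versus-remote labeling of IXP member interfaces using minIXRegion, minIXRTT, and the 2ms margin described above. The layout of the RTT measurements is an assumption made for this example.

# Sketch of labeling IXP member interfaces as "local" or remote (Section 4.6.1).
# min_rtt[region][ip] holds the minimum RTT measured from a VM in `region` to `ip`.

from typing import Dict, Set, Tuple

LOCAL_MARGIN_MS = 2.0

def label_ixp_members(ixp_interfaces: Set[str],
                      min_rtt: Dict[str, Dict[str, float]]) -> Tuple[str, Set[str]]:
    # minIXRegion: the Amazon region attaining the smallest RTT to any member interface.
    best_region, min_ix_rtt = "", float("inf")
    for region, rtts in min_rtt.items():
        for ip in ixp_interfaces:
            rtt = rtts.get(ip)
            if rtt is not None and rtt < min_ix_rtt:
                best_region, min_ix_rtt = region, rtt
    # An interface is "local" if its RTT from minIXRegion is within 2ms of minIXRTT;
    # the remaining member interfaces are treated as remote peers and excluded.
    local = {ip for ip in ixp_interfaces
             if min_rtt.get(best_region, {}).get(ip, float("inf")) <= min_ix_rtt + LOCAL_MARGIN_MS}
    return best_region, local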
Single Colo/Metro Footprint (CBIs): CBIs of an AS that is present only at a single colo, or at multiple colos in a given metro area, can be pinned to that metro area. To identify those ASes that are only present in a single colo or a single metro area, we collect the list of all tenant ASes for 2.6k colo facilities from PeeringDB Lodhi, Larson, Dhamdhere, Dovrolis, et al. (2014) as well as the list of all IXP participants from PeeringDB and PCH.

Native Amazon Colos (ABIs): Intuitively, ABIs that are located at colo facilities where Amazon is native (i.e. facilities that house Amazon's main border routers) must exhibit the shortest RTT from the VM in the corresponding region. To examine this intuition, we use two data sources for RTT measurements: (i) RTT values obtained through active probing of CBIs and ABIs (this probing was done for a full day and used exclusively ICMP echo reply messages, which can only be generated by intermediate hops and not by the target itself); and (ii) RTT values collected as part of the traceroute campaign. Figure 17a shows the distribution of the minimum RTT between VMs in different regions of Amazon and individual ABIs. We observe a clear knee at 2ms, where around 40% of all the ABIs exhibit a shorter RTT from a single VM. Given that all Amazon peerings have to be established through colo facilities where Amazon is native, we pin all these ABIs to the native colo closest to the corresponding VM. In some metro areas where Amazon has more than one native colo, we conservatively pinned the ABIs to the corresponding metro area rather than to a specific native colo.

Figure 17. (a) Distribution of min-RTT for ABIs from the closest Amazon region, and (b) distribution of the min-RTT difference between the ABI and CBI of individual peering links.

Consistency Checking of Anchors. We perform two sets of consistency checks on the identified anchors. First, we check whether the inferred locations are consistent for those interfaces (1.1k in total) that satisfy more than one of the four indicators we used to classify them as anchors. Second, we check for consistency across the inferred geolocations of different interfaces in any given alias set. These checks flagged a total of 66 (48 and 18) interfaces that had inconsistent geolocations and that we therefore excluded from our anchor list. These checks also highlight the conservative nature of our approach. In particular, by removing any anchors with inconsistent locations, we avoid the propagation of unreliable location information in our subsequent iterative pinning procedure (see below). The middle part of Table 5 presents the exclusive and cumulative numbers of CBI and ABI anchors (excluding the flagged ones) that resulted from leveraging the four utilized sources of information.

Inferring Co-located Interfaces. We use two co-presence rules to infer whether two interfaces are co-located in the same facility or the same metro area. (i) Rule 1 (Alias sets): All interfaces in an alias set must be co-located in the same facility. Therefore, if an alias set contains one (or more) anchor(s), all interfaces in that set can be pinned to the location of that (those) anchor(s). (ii) Rule 2 (Interconnections in a Single Metro Area): An Amazon peering is established between an Amazon border router and a client border router, and these routers are either in the same or in different colo/metro areas. Therefore, a small RTT between the two ends of an interconnection segment is an indication of their co-presence in at least the same metro area. The key issue is to determine a proper RTT threshold for identifying these co-located pairs. To this end, Figure 17b shows the distribution of the min-RTT differences between the two ends of all the inferred Amazon interconnection segments. While the min-RTT difference varies widely across all interconnection segments, the distribution exhibits a pronounced knee at 2ms, with approximately half of the inferred interconnection segments having min-RTT differences below this threshold. We use this threshold to separate interconnection segments that reside within a metro area (i.e. both ends are in the metro area) from those that extend beyond the metro area. Therefore, if one end of such a "short" interconnection segment is pinned, its other end can be pinned to the same metro area.
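A minimal sketch of co-presence Rule 2, flagging interconnection segments whose end-to-end min-RTT difference falls below the 2ms knee as residing within a single metro area, is shown below. The segment representation and the rtt_to_closest_vm helper are illustrative assumptions.

# Sketch of co-presence Rule 2 (Section 4.6.1): identify "short" segments.

from typing import Callable, Iterable, List, Tuple

METRO_RTT_THRESHOLD_MS = 2.0

def short_segments(segments: Iterable[Tuple[str, str]],
                   rtt_to_closest_vm: Callable[[str], float]) -> List[Tuple[str, str]]:
    intra_metro = []
    for abi, cbi in segments:
        diff = abs(rtt_to_closest_vm(cbi) - rtt_to_closest_vm(abi))
        if diff <= METRO_RTT_THRESHOLD_MS:
            # If one end of such a segment is pinned, the other end can be
            # pinned to the same metro area.
            intra_metro.append((abi, cbi))
    return intra_metro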
Table 5. The exclusive and cumulative number of anchor interfaces by each type of evidence, and of pinned interfaces by each co-presence rule.

        Anchor Interface                     Pinned Interface
        DNS      IXP      Metro    Native    Alias    min-RTT
Exc.    5.31k    2.0k     1.66k    1.42k     0.65k    5.38k
Cum.    5.31k    6.73k    7.22k    8.64k     9.21k    14.37k

Iterative Pinning. Given a set of initial anchors at known locations as input, we identify and pin the following two groups of interfaces in an iterative fashion: (i) all unpinned alias sets that contain one (or more) anchor(s), and (ii) the unpinned end of all the short interconnection segments that have only one end pinned. For both steps, we extend our pinning knowledge to other interfaces only if all anchors unanimously agree on the geolocation of the unpinned interface (we observed such a conflict in the propagation of pinning information for only 179, or 1.2%, of the interfaces). This iterative process ends when no more interfaces meet our co-presence rules; our pinning process requires only four rounds to complete (a sketch of this iterative procedure is given at the end of § 4.6.1). The right-hand side of Table 5 summarizes the exclusive and cumulative number of interfaces pinned by each co-presence rule. Including all the anchors, we are able to pin 45.05% (75.87%) of all the inferred CBIs (ABIs), and 50.21% of all border interfaces associated with Amazon's peerings.

Pinning at a Coarser Resolution. To better understand why we are able to map only about half of all the inferred Amazon-related interfaces at the metro level, we next explore whether the remaining (14.21k) unpinned interfaces can be associated with a specific Amazon region based on their relative RTT distance. To this end, we examine the ratio of the two smallest min-RTT values for individual unpinned interfaces across the 15 Amazon regions. 1.11k of these interfaces are only visible from a single region, and therefore the ratio is not defined for them; we associate these interfaces with the only region from which they are visible. Figure 18 depicts the CDF of the ratio for the remaining (13.1k) unpinned interfaces that are reachable from at least two regions and shows that for 57% of these interfaces the ratio of the two lowest min-RTTs is larger than 1.5, i.e. the interface's RTT is at least 50% larger for one region than the other. We map these interfaces to the region with the lowest delay. The relatively balanced min-RTT values for the remaining 43% of interfaces are mainly caused by the limited geographic separation of some regions. For example, the relatively short distance between Virginia and Canada, or between neighboring European countries, makes it difficult to reliably associate some of the interfaces located between them using min-RTT values. This coarser pinning strategy maps 8.67k (30.37%) of the remaining interfaces (0.62k ABIs and 8.05k CBIs) to a specific region, which improves the overall coverage of the pinning process to a total of 80.58%. However, because of the coarser nature of this pinning, we do not consider these 30.37% of interfaces for the rest of our analysis and only focus on the 50.21% that we pinned at the metro (or finer) level.

Figure 18. Distribution of the ratio of the two lowest min-RTTs from different Amazon regions to individual unpinned border interfaces.
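The iterative pinning procedure referenced above can be sketched as follows, with anchors, merged alias sets, and the "short" segments from Rule 2 as inputs. This is a simplified illustration (metro-level labels only), not the exact implementation used in this study.

# Sketch of the iterative pinning procedure (Section 4.6.1): starting from anchors,
# repeatedly apply Rule 1 (alias sets) and Rule 2 (short segments) until fixpoint.

from typing import Dict, Iterable, List, Set, Tuple

def iterative_pinning(anchors: Dict[str, str],                  # interface -> metro label
                      alias_sets: List[Set[str]],
                      short_segs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    segs = list(short_segs)
    pinned = dict(anchors)
    changed = True
    while changed:
        changed = False
        # Rule 1: propagate only when all pinned members of an alias set agree.
        for aset in alias_sets:
            locs = {pinned[ip] for ip in aset if ip in pinned}
            if len(locs) == 1:
                loc = locs.pop()
                for ip in aset:
                    if ip not in pinned:
                        pinned[ip] = loc
                        changed = True
        # Rule 2: a "short" segment with exactly one pinned end pins the other end
        # to the same metro area.
        for a, b in segs:
            if (a in pinned) != (b in pinned):
                src, dst = (a, b) if a in pinned else (b, a)
                pinned[dst] = pinned[src]
                changed = True
    return pinned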
4.6.2 Evaluation of Pinning.

Accuracy. Given the lack of ground-truth information for the exact location of Amazon's peering interfaces, we perform cross-validation on the set of identified anchors to enhance the confidence in our pinning results. Specifically, we perform a 10-fold stratified cross-validation with a 70-30 train-test split. We employ stratified sampling Diamantidis, Karlis, and Giakoumakis (2000) to maintain the distribution of anchors within each metro area and avoid cases where test samples are selected from metro areas with few anchors. We run our pinning process over the training set and measure both the fraction of test-set anchors that our process pins (recall) and the fraction of pinned test-set anchors whose inferred geolocation agrees with their known location (precision). The results across all rounds are very consistent, with a mean value of 99.34% (57.21%) for precision (recall) and a standard deviation of 1.6 × 10^-3 (5.5 × 10^-3). The relatively low recall can be attributed to the lack of known anchors in certain metro areas, which prevented pinning information from propagating. The high precision attests to the conservative nature of our propagation technique (i.e. inconsistent anchors are removed, and interfaces are only pinned when reliable location information is available) and highlights the low false-positive rate of our pinning approach.

Geographic Coverage. We examine the coverage of our pinning results by comparing the cities where Amazon is known to be present against the metros where we have pinned border interfaces. Combining the list of served cities reported by Amazon Amazon (2018d) and the list of PeeringDB-provided cities PeeringDB (2017) where Amazon establishes public or private peerings shows that Amazon is present in 74 metro areas. Our pinning strategy has geo-located Amazon-related border interfaces to 305 different metro areas across the world that cover all but three metro areas from Amazon's list, namely Bangalore (India), Zhongwei (China), and Cape Town (South Africa). While it is possible that some of our discovered but unpinned CBIs are located in these metros, we lack anchors in these three metros to reliably pin any interface to these locations. Finally, that our pinning strategy results in a significantly larger number of observed metros than the 74 metro areas reported by Amazon should not come as a surprise in view of the many inferred remote peerings for which we have sufficient evidence to reliably pin the corresponding CBIs.

4.7 Amazon's Peering Fabric

In this section, we first present a method to detect whether an inferred Amazon-related interconnection is virtual (§ 4.7.1). Then we utilize various attributes of Amazon's inferred peerings to group them based on their type (§ 4.7.2) and reason about the differences in peerings across the identified groups (§ 4.7.3). Finally, we characterize the entire inferred Amazon connectivity graph (§ 4.7.4).

Table 6. Number (and percentage) of Amazon's VPIs, i.e. CBIs that are also observed by probes originating from Microsoft, Google, IBM, and Oracle's cloud networks.

             Microsoft (%)    Google (%)       IBM (%)          Oracle (%)
Pairwise     4.69k (18.93)    0.79k (3.17)     0.23k (0.94)     0 (0)
Cumulative   4.69k (18.93)    4.93k (19.91)    5.01k (20.23)    5.01k (20.23)

4.7.1 Detecting Virtual Interconnections.
To identify private peerings that rely on virtual interconnections, we recall that a VPI is associated with a single (CBI ) port that is utilized by a client to exchange traffic with one or more cloud providers (or other networks) over a layer-2 switching fabric. Therefore, 131 a CBI that is common to two or more cloud providers must be associated with a VPI. Motivated by this observation, our method for detecting VPIs consists of the following three steps. First, we create a pool of target IP addresses that is composed of all identified non-IXP CBIs for Amazon, each of their +1 next IP address, and all the destination IPs of those traceroutes that led to the discovery of individual unique CBIs. Second, we probe each of these target IPs from a number of major cloud providers other than Amazon and infer all the ABIs and CBIs along with the probes that were launched from these other cloud providers (using the methodology described in § 4.4). Finally, we identify any overlapping CBIs that were visible from two (or more) cloud providers and consider the corresponding interconnection to be a VPI. Note that this method yields a lower bound for the number of Amazon-related VPIs as it can only identify VPIs whose CBIs are visible from the considered cloud service providers. Any VPI that is not used for exchanging traffic with multiple cloud provider is not identified by this method. Furthermore, we are only capable of identifying VPIs which utilize public IP addresses for their CBIs Amazon (2018b). VPIs utilizing private addresses are confined to the virtual private cloud (VPC) of the customer and are not visible from anywhere within or outside of Amazon’s network. Applying this method, we probed nearly 327k IPs in our pool of target IP addresses from VMs in all regions of each one of the following four large cloud providers: Microsoft, Google, IBM, and Oracle. The results are shown in Table 6 where the first row shows the number of pairwise common CBIs between Amazon and other cloud providers. The second row shows the cumulative number of overlapping CBIs. From this table, we observe that roughly 20% of Amazon’s CBIs are related to VPIs as they are visible from at least one other of the four considered 132 cloud provider. While roughly 19% of VPIs are common between Amazon and Microsoft, there is no overlap in VPIs between Amazon and Oracle. Only 0.1% of Amazon’s CBIs are common with Microsoft, Google and IBM. Note that our method incorrectly identifies a VPI if a customer’s border router is directly connected to Amazon but responds to our probe with a default or 3rd party interface. However, either of these two scenarios is very unlikely. For one, recall (§ 4.4) that we use UDP probes and do not consider a target interface as a CBI to avoid a response by the default interface Baker (1995). Furthermore, our method selects +1 IP addresses as traceroute targets (i.e. during the expansion probing) to increase the likelihood that the corresponding traceroutes cross the same CBI without directly probing the CBI itself. Also, the presence of a customer border router that responds with a third party interface implies that the customer relies on the third party for reaching Amazon while directly receiving downstream traffic from Amazon. However, such a setting is very unlikely for Amazon customers. 4.7.2 Grouping Amazon’s Peerings. 
To study Amazon’s inferred peering fabric, we first group all the inferred peerings/interconnections based on the following three key attributes: (i) whether the type of peering relationship is public or private, (ii) whether the corresponding AS link is present in public BGP feeds, and (iii) in the case of private peerings, whether the corresponding interconnection is physical or virtual (VPI). A peering is considered to be public (bi-lateral or multi-lateral) if its CBI belongs to an IXP prefix. We also check whether the corresponding AS relationship is present in the public BGP data by utilizing CAIDA’s AS Relationships dataset CAIDA (2018) corresponding to the dates of our data collection. Although this dataset is widely used for AS 133 Table 7. Breakdown of all Amazon peerings based on their key attributes. Group ASes(%) CBIs(%) ABIs(%) Pb-nB 2.52k (71) 3.93k (16) 0.79k (21) Pb-B 0.20k (5) 0.56k (2) 0.56k (15) Pb 2.69k (76) 4.46k (18) 0.83k (22) Pr-nB-V 0.24k (7) 2.99k (12) 0.54k (14) Pr-nB-nV 1.1k (31) 10.24k (41) 2.59k (69) Pr-nB 1.18k (33) 13.24k (53) 2.68k (71) Pr-B-nV 0.11k (3) 5.67k (23) 2.07k (55) Pr-B-V 0.06k (2) 2.09k (8) 0.33k (9) Pr-B 0.12k (3) 7.76k (31) 2.11k (56) relationship information, its coverage is known to be limited by the number and placement of BGP feed collectors (e.g., see Luckie, Huffaker, Dhamdhere, Giotsas, et al. (2013) and references therein). Table 7 gives the breakdown of all of Amazon’s inferred peerings into six groups based on the aforementioned three attributes. We use the labels Pr/Pb to denote private/public peerings, B/nB for being visible/not visible in public BGP feeds, and V/nV for virtual/non-virtual peerings (applies only in the case of private interconnections). For example, Pr-nB-nV refers to the number of Amazon’s (unique) inferred private peerings that are not seen in public BGP feeds and are not virtual (e.g. cross connections). Each row in Table 7 shows the number (and 134 percentage) of unique AS peers that establish certain types of peerings, along with the number (and percentage) of corresponding CBIs and ABIs for those peers. Since there are overlapping ASes and interfaces between different groups, Table 7 also presents three rows (i.e. rows 3, 6, and 9 with italic fonts) that aggregate the information for the two closely related prior pair of rows/groups. These three aggregate rows provide an overall view of Amazon’s inferred peering fabric that highlight two points of general interest: (i) While 76% of Amazon’s peers use Pb peering, only 33% of Amazon’s peers use Pr-nB (virtual or physical) peerings, with the overlap of about 10% of peer ASes relying on both Pr-nB and Pb peerings, and the fraction of Pr-B peerings being very small (3%). (ii) The average number of CBIs (and ABIs) for ASes that use Pr-B, Pr-nB and Pb peerings to interconnect with Amazon is 65 (17), 11 (2), and 2 (0.3), respectively. Hidden Peerings. Note that there are groups of Amazon’s inferred peerings shown in Table 7 (together with their associated traffic) that remain in general hidden from the measurement techniques that are commonly used for inferring peerings (e.g. traceroute). One such group consists of all the virtual peerings (Pr- *-V) since they are used to exchange traffic between customer ASes of Amazon (or their downstream ASes) and Amazon. The second group is made up of all other non-virtual peerings that are not visible in BGP data, namely Pr-nB-nV and even Pb-nB. 
The presence of these peerings cannot be inferred from public BGP data and their associated traffic is only visible along the short AS path to the customer AS. These hidden peerings make up 33.29% of all of Amazon’s inferred peerings and their associated traffic is carried over Amazon’s private backbone and not over the public Internet. 135 Table 8. Hybrid peering groups along with the number of unique ASes for each group. Different Types of Hybrid Peering #ASN Pb-nB 2187 Pr-nB-nV 686 Pr-nB-nV; Pb-nB 207 Pb-B 117 Pr-nB-nV; Pr-nB-V 83 Pr-nB-nV; Pb-nB; Pr-nB-V 60 Pb-nB; Pr-nB-V 41 Pr-nB-V 38 Pr-B-nV; Pb-B 37 Pr-B-V; Pr-B-nV; Pb-B 31 Pr-B-nV 24 Pr-B-V; Pr-B-nV 16 Pr-nB-nV; Pr-B-nV; Pr-B-V 5 Pr-B-V; Pb-B 4 Pr-B-V 4 Pb-nB; Pb-B 2 Pr-nB-nV; Pr-B-nV; Pr-B-V; Pb-B 2 Pr-nB-nV; Pr-B-nV 1 Pr-nB-nV; Pr-B-nV; Pb-B 1 Pr-nB-nV; Pr-nB-V; Pr-B-nV 1 Pr-nB-nV; Pr-nB-V; Pr-B-nV; Pr-B-V; Pb-B 1 Hybrid Peering. Individual ASes may establish multiple peerings of different types (referred to as “hybrid" peering) with Amazon; that is, appear as a member of two (or more) groups in Table 7. We group all ASes that establish such hybrid peering based on the combination of peering types that are listed in Table 7 types and that they maintain with Amazon. The following are two of the most common hybrid peering scenarios we observe. Pr-nB-nV + Pb-nB: With 207 ASes, this is the largest group of ASes which utilize hybrid peering. Members of this group use both types of peerings to exchange their own traffic with Amazon and include ASes such as Akamai, Intercloud, Datapipe, Cloudnet, and Dell. Pr-nB-nV; Pb-nB; Pr-nB-V: This group is similar to the first group one but its members also utilize 136 virtual peerings to exchange their own traffic with Amazon. This group consists of 60 ASes that include large providers such as Google, Microsoft, Facebook, and Limelight. Table 8 gives a detailed breakdown of the observed hybrid (and non- hybrid) peering groups and shows for each group the number of ASes that use that peering group. Note that each AS is counted only once in the group that has the most specific peering types. 4.7.3 Inferring the Purpose of Peerings. In an attempt to gain insight into how each of the six different groups of Amazon’s peerings is being used in practice, we consider a number of additional characteristics of the peers in each group and depict those characteristics using stacked boxplots as shown in Figure 19. In particular, starting with the top row in Figure 19, we consider 11 summary distributions of (i) size of customer cone of peering AS (i.e. number of /24 prefixes that are reachable through the AS (labeled as "BGP /24"); (ii) number of /24 prefixes that are reachable from Amazon through the identified CBIs associated with each peering; (iii) number of ABIs for individual peering AS; (iv) number of CBIs for individual peering AS; (v) min RTT difference between both ends of individual peering; (vi) number of unique metro areas that the CBIs of each peering AS have been pinned to (see § 4.6). For example, we view the number of /24 prefixes in the customer cone of an AS to reflect the AS’s size/role (i.e. as tier-1 or tier-2 AS) in routing Internet traffic. Moreover, comparing the number of /24 prefixes in the customer cone with the number of reachable /24 prefixes through a specific peering for an AS reveals the purpose of the corresponding peering to route traffic to/from Amazon from/to its downstream networks. 
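To make the per-group comparison concrete, the following sketch shows how the per-AS features listed above (customer-cone /24s, reachable /24s, numbers of ABIs and CBIs, RTT difference, and pinned metros) could be aggregated by peering group, as in Figure 19. The records, field names, and example values are hypothetical and serve only to illustrate the aggregation.

# Sketch of aggregating per-AS features by peering group (cf. Figure 19).

from collections import defaultdict
from statistics import median
from typing import Dict, List

# One record per inferred peering AS; values below are made-up examples.
peerings: List[Dict] = [
    {"group": "Pb-nB", "asn": 65551, "cone_24s": 12, "reachable_24s": 3,
     "num_abis": 1, "num_cbis": 1, "rtt_diff_ms": 0.4, "num_metros": 1},
    {"group": "Pr-B-nV", "asn": 65552, "cone_24s": 250_000, "reachable_24s": 180_000,
     "num_abis": 40, "num_cbis": 120, "rtt_diff_ms": 1.1, "num_metros": 14},
]

FEATURES = ["cone_24s", "reachable_24s", "num_abis", "num_cbis", "rtt_diff_ms", "num_metros"]

def group_summaries(records: List[Dict]) -> Dict[str, Dict[str, float]]:
    by_group: Dict[str, List[Dict]] = defaultdict(list)
    for r in records:
        by_group[r["group"]].append(r)
    # Median per feature; a full analysis would keep the whole distribution (boxplots).
    return {g: {f: median(r[f] for r in rows) for f in FEATURES}
            for g, rows in by_group.items()}

print(group_summaries(peerings))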
In the following, we discuss how the combined information in Table 7 and Figure 19 sheds light on Amazon's global-scale peering fabric and illuminates the different roles of the six groups of peering ASes. (For ASes that utilize hybrid peering with Amazon, the reported information in each group only includes the peerings related to that group.)

Figure 19. Key features of the six groups of Amazon's peerings (presented in Table 7), showing (from top to bottom): the number of /24 prefixes within the customer cone of a peering AS, the number of probed /24 prefixes that are reachable through the CBIs of an AS's associated peerings, the number of ABIs and CBIs of an AS's associated peerings, the difference in RTT between both ends of an AS's associated peerings, and the number of metro areas to which the CBIs of each peering AS have been pinned.

Pb-nB. The peers in this group are typically edge networks with a small customer cone (including content, enterprise, and smaller transit/access networks) that exchange traffic with Amazon through a single CBI at an IXP. The corresponding routes are between Amazon and these edge networks and are thus not announced in BGP. Peers in this group include CDNs like Akamai, small transit/access providers like Etisalat, BT, and Floridanet, and enterprises such as Adobe, Cloudflare, Datapipe (Rackspace), Google, Symantec, LinkedIn, and Yandex.

Pb-B. This group consists mostly of tier-2 transit networks with moderate-sized customer cones. These networks are present at a number of IXPs to connect their downstream customer networks to Amazon. The corresponding routes must be announced to downstream ASes and are thus visible in BGP. Example peers in this group are CW, DigitalOcean, Fastweb, Seabone, Shaw Cable, Google Fiber, and Vodafone.

Pr-nB-V. The peers in this group are a combination of small transit providers and some content and enterprise networks. They establish VPIs at a single location to exchange either their own traffic or the traffic of their downstream networks with Amazon. Therefore, their peering is not visible in BGP. About 85% of these peers are visible from two cloud providers, while the rest are visible from more than two cloud providers. Examples of enterprise and content networks in this group are Apple, UCSD, UIOWA, LG, and Edgecast, and examples of transit networks are Rogers, Charter, and CenturyLink.

Pr-nB-nV. These peers appear to establish physical interconnections (i.e. cross-connects) with Amazon since they are not reachable from other cloud providers. However, given the earlier-mentioned under-counting of VPIs by our method, we hypothesize that some or all of these peerings could be associated with VPIs, similar to the previous group. The composition of the peers in this group is comparable to Pr-nB-V but includes a larger fraction of enterprise networks (i.e. the main users of VPIs), which in turn is consistent with our hypothesis. Examples of peers in this group are enterprises such as Datapipe (Rackspace), Chevron, Vox-Media, UToronto, and Georgia-Tech, CDNs such as Akamai and Limelight, and transit/access providers like Comcast. To further examine our hypothesis, we parse the DNS names of 4.85k CBIs associated with peers in the Pr-nB group.
170 of these DNS names (100 from Pr-nB-nV and 70 from Pr-nB-V interfaces) contain VLAN tags, indicating the presence of a virtual private interconnection. We also observe some commonly used (albeit not required) keywords Amazon (2018e) such as dxvif (Amazon terminology for “direct connect virtual interface"), dxcon, awsdx and aws-dx for 125 (out of 170) CBIs where the “dx"-notation is synonymous with an interface’s use for “direct interconnections". We consider the appearance of these keywords in the DNS names of CBIs for this group of peerings (and only in this group) as strong evidence that the interconnections in question are indeed VPIs. Therefore, a subset of Pr-nB-nV interconnections is likely to be virtual as well. Pr-B-nV. The peers in this group are very large transit networks that establish cross-connections at various locations (many CBIs and ABIs) across the world). The large number of prefixes that are reachable through them from Amazon and the visibility of the peerings in BGP suggest that these peers simply provide connectivity for their downstream clients to Amazon. Given the large size of these transit networks, the visibility of these peerings in BGP is due to the announcement of routes from Amazon to all of their downstream networks. Intuitively, given the volume of aggregate traffic exchanged between Amazon and these large transit networks, the peers in this group have the largest number 140 of CBIs, and these CBIs are located at different metro areas across the world. Example networks in this group are AT&T, Level3 (now CenturyLink), GTT, Cogent, HE, XO, Zayo, and NTT. Pr-B-V. This group consists mostly a subset of the very large transit networks in Pr-B-nV and the peers in this group also establish a few VPIs (at different locations) with Amazon. The small number of prefixes that are reachable from Amazon through these peers along with the large number of CBIs per peer indicates that these peers bring specific Amazon clients (a provider or enterprise, perhaps even without an ASN) to a colo facility to exchange traffic with Amazon Amazon (2018c). The presence of these peerings in BGP is due to the role they play as transit networks in the Pr-B-nV group that is separate from peers in this group using virtual peerings. Example networks in this group are Cogent, Comcast, CW, GTT, CenturyLink, HE, and TimeWarner, all of which are listed as Amazon cloud connectivity partners Amazon (2018c); Google (2018c); Microsoft (2018b)) and connect enterprises to Amazon. When examining the min RTT difference between both ends of peerings across different groups (row 5 in Figure 19), we observe that both groups with virtual interconnections (Pr-B-V and Pr-nB-V) have in general larger values than the other groups. This observation is in agreement with the fact that many of these VPIs are associated with enterprises that are brought to the cloud exchange by access networks using layer-2 connections. Coverage of Amazon’s Interconnections. Although the total number of peerings that Amazon has with its customers is not known, our goal here is to provide a baseline comparison between Amazon’s peering fabric that is visible in public BGP data and Amazon’s peering fabric as inferred by our approach. Using our approach, we have identified 3.3k unique peerings for Amazon. In contrast, 141 there are only 250 unique Amazon peerings reported in BGP, and 226 of them are also discovered by our approach. 
Upon closer examination, for some of the 24 peerings that are seen in BGP but not by our approach, we observed a sibling of the corresponding peer ASes. This brings the total coverage of our method to about 93% of all Amazon peerings reported in BGP. In addition, we report on more than 3k unique Amazon peerings that are not visible in public BGP data. These peerings with Amazon and their associated traffic are not visible when relying on more conventional measurement techniques.

4.7.4 Characterizing Amazon's Connectivity Graph. Having focused so far on groups of peerings of certain types or on individual AS peers, we next provide a more holistic view of Amazon's inferred peering graph and examine some of its basic characteristics. We first produce the Interface Connectivity Graph (ICG) between all the inferred border interfaces. The ICG is a bipartite graph where each node is a border interface (an ABI or a CBI) and each edge corresponds to the traceroute interconnection segment (ICS) between an ABI and a CBI. We also annotate each edge with the difference in the minimum RTT from the closest VM to each end of the ICS (we identify the VM that has the shortest RTT from an ABI and use the min-RTT of the same VM from the corresponding CBI to determine the RTT of an ICS). Intuitively, we expect the resulting ICG to have a separate partition for each region, i.e. the ABIs of a region connecting to the CBIs that they support. However, we observe that the ICG's largest connected component contains the vast majority (92.3%) of all nodes. This implies that there are links between ABIs in each Amazon region and CBIs in several other regions. Upon closer examination of the 57.85% of all peerings that have both of their ends pinned, we notice that a majority of these peerings (98%) are indeed contained within individual Amazon regions. However, we do encounter remote peerings between regions that are a significant geographical distance apart; for example, there are peerings between FR and KR, US-VA and SG, and AU and CA. The large fraction of peerings with only one end or no end pinned (about 42%) suggests that the actual number of remote peerings is likely to be much larger. These remote peerings are the main reason why the ICG's largest connected component contains more than 92% of all border interfaces.

Figure 20. Distribution of (a) ABI degree (log scale) and (b) CBI degree.

To illustrate the basic connectivity features of the bipartite ICG, Figures 20a and 20b show the distributions of the number of CBIs associated with each individual ABI (degree of ABIs) and the number of ABIs associated with each individual CBI (degree of CBIs). We observe a skewed distribution for ABI degree, where 30%, 70%, and 95% of ABIs are associated with 1, <10, and <100 CBIs, respectively. Roughly 50% (90%) of CBIs are associated with a single (≤ 8) ABI(s). A closer examination shows that high-degree CBIs are mainly associated with Amazon's public peerings with large transit networks (e.g. GTT, Cogent, NTT, CenturyLink). In contrast, a majority of high-degree ABIs are associated with private, non-BGP, non-virtual peerings (see § 4.7).
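A small sketch of how the ICG can be assembled from the inferred (ABI, CBI) segments, and how its degree distributions and largest connected component can be computed, is given below. The node naming and the example segments are purely illustrative.

# Sketch of building the Interface Connectivity Graph (ICG) and computing the
# quantities discussed above: node degrees and the largest connected component.

from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple

def build_icg(segments: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    adj: Dict[str, Set[str]] = defaultdict(set)
    for abi, cbi in segments:
        adj[abi].add(cbi)          # bipartite: edges only between ABIs and CBIs
        adj[cbi].add(abi)
    return adj

def largest_component_fraction(adj: Dict[str, Set[str]]) -> float:
    seen: Set[str] = set()
    best = 0
    for start in adj:
        if start in seen:
            continue
        size, queue = 0, deque([start])
        seen.add(start)
        while queue:                # BFS over one connected component
            node = queue.popleft()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        best = max(best, size)
    return best / len(adj) if adj else 0.0

# Toy example with made-up interface identifiers.
segments = [("abi1", "cbi1"), ("abi1", "cbi2"), ("abi2", "cbi2"), ("abi3", "cbi3")]
adj = build_icg(segments)
abi_degrees = {n: len(adj[n]) for n in adj if n.startswith("abi")}
print(abi_degrees, largest_component_fraction(adj))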
4.8 Inferring Peering with bdrmap

As stated earlier in § 4.2, bdrmap Luckie et al. (2016) is the only other existing tool for inferring the border routers of a given network from traceroute data. (MAP-IT Marder and Smith (2016) and bdrmapIT Alexander et al. (2018) are not suitable for this setting since we have layer-2 devices at the border.) With Amazon as the network of interest, our setting appears to be a perfect fit for the type of target settings assumed by bdrmap. However, there are two important differences between the cloud service provider networks we are interested in (e.g. Amazon) and the more traditional service provider networks that bdrmap targets (e.g. a large US Tier-1 network). First, not only can the visibility of different prefixes vary widely across different Amazon regions, but roughly one-third of Amazon's peerings are not visible in BGP, and even some of the BGP-visible peerings of a network are related to other instances of its peerings with Amazon (§ 4.7). At the same time, bdrmap relies on peering relationships in BGP to determine the targets for its traceroute probes and also uses them as input for some of its heuristics. Therefore, bdrmap's outcome is affected by any inconsistent or missing peering relationship in BGP. Second, as noted earlier, our traceroute probes reveal hybrid Amazon border routers that have both Amazon and client routers as their next hops. This setting is not consistent with bdrmap's assumption that border routers should be situated exclusively in the host or peering network. Given these differences, the comparison below is intended as a guideline for how bdrmap could be improved to apply in a cloud-centric setting.

Thanks to special efforts by the authors of bdrmap, who modified their tool so it could be used for launching traceroutes from cloud-based vantage points (i.e. VMs), we were able to run it in all Amazon regions to compare the bdrmap-inferred border routers with our inference results. bdrmap identified 4.83k ABIs and 9.65k CBIs associated with 2.66k ASes from all global regions. 3.23k of these CBIs belong to IXP prefixes and are associated with 1.81k ASes. Given bdrmap's customized probing strategy and its extensive use of different heuristics, it is not feasible to identify the exact reasons for all the observed differences between bdrmap's findings and ours. However, we were able to identify the following three major inconsistencies in bdrmap's output. First, bdrmap does not report an AS owner for 0.32k of its inferred CBIs (i.e. the owner is AS0). Second, instances of bdrmap that run in different Amazon regions report different AS owners for more than 500 CBIs, sometimes as many as 4 or 5 different AS owners for an interface. Third, running instances of bdrmap in different Amazon regions results in inconsistent views of individual border router interfaces; e.g. one and the same interface is inferred to be an ABI from one region and a CBI from another region. We identified 872 interfaces that exhibit this inconsistency. Furthermore, the fact that 97% (846 out of 872) of the interfaces with this type of inconsistency are advertised by Amazon's ASNs indicates that the AS owner of these interfaces has been inferred by bdrmap's heuristics. When comparing the findings of bdrmap against our methodology in more detail, we observed that our methodology and bdrmap have 1.85k ABIs, 5.48k CBIs, and 2k ASes in common. However, without access to ground truth, a full investigation into the various points of disagreement is problematic.
To make the problem more tractable, we limit our investigation to the 0.65k ASes that were exclusively identified by bdrmap and try to rely on other sources of information to confirm or dismiss bdrmap’s findings. These exclusive ASNs belong to 0.18k (0.49k) IXP (private) peerings. For IXP peerings, we compare bdrmap’s findings against IP-to-ASN mappings that are published by IXP operators or rely on embedded 145 information within DNS names. The inferences of bdrmap is only aligned for 42 of these peers. For the 0.49k private peerings we focus on inferences that were made by the thirdparty heuristic as it constitutes the largest (62%) fraction of bdrmap- exclusive private peerings (for details, see § 5.4 in Luckie et al. (2016)). These ASes are associated with 375 CBIs and we observe 66 (60 ASNs) of these interfaces in our data. For each of these 66 CBIs, we calculate the set of reachable destination ASNs through these CBIs and determine the upstream provider network for each one of these destination ASes using BGP data CAIDA (2018). Observing more than one or no common provider network among reachable destination ASes for individual CBIs would invalidate the application of bdrmap’s thirdparty heuristic, i.e. bdrmap wouldn’t have applied this heuristic if it had done more extensive probing that revealed an additional set of reachable destination ASes for these CBIs. We find that 50 (44 ASNs) out of the 66 common CBIs have more than one or no common providers for the target ASNs. Note that this observation does not invalidate bdrmap’s thirdparty heuristics but highlights its reliance on high-quality BGP snapshots and AS-relationship information. 4.9 Limitations of Our Study As a third-party measurement study of Amazon’s peering fabric that makes no use of Amazon-proprietary data and only relies on generally-available measurement techniques, there are inherent limitations to our efforts aimed at inferring and geo-locating all interconnections between Amazon and the rest of the Internet. This section collects and organizes the key limitations in one place and details their impact on our findings. Inferring Interconnections. Border routers responding to traceroute probes using a third-party address are a well-known cause for artifacts in traceroute 146 measurement output, and our IXP-client and Hybrid-IP heuristics used in § 4.5.1 are not immune to this problem. However, as reported in Luckie et al. (2014), the fraction of routers that respond with their incoming interface is in general above 50% and typically even higher in the U.S. In contrast, because of the isolation of network paths for VPIs of Amazon’s clients that use private addresses, any peerings associated with these VPIs are not visible to probes from VMs owned by other Amazon customers. As a result, our inference methodology described in § 4.4 cannot discover established VPIs that leverage private IP addresses. Pinning Interconnections. In § 4.6, we reported being able to pin only about half of all the inferred peering interfaces at the metro level. In an attempt to understand what is limiting our ability to pin the rest of the inferred interfaces, we identified two main reasons. First, there is a lack of anchors in certain regions, and second, there is the common use of remote peering. These two factors in conjunction with our conservative iterative strategy for pinning interfaces to the metro level make it difficult to provide enough and sufficiently reliable indicators of interface-specific locations. 
One way to overcome some of these limiting factors is to use a coarser scale for pinning (e.g. the regional level). In fact, as shown in § 4.6, at the regional level we are able to pin some 30% of the remaining interfaces, which improves the overall coverage of our pinning strategy at the granularity of regions to about 80%.

Other Observations. Although our study does not consider IPv6 addresses, we argue that the proposed methodology requires only minimal modifications (e.g. incorporating IPv6 target selection techniques Beverly et al. (2018); Gasser et al. (2018)) to be applicable to inferring IPv6 peerings. We will explore IPv6 peerings as part of future work. Like others before us, as third-party researchers, we found it challenging to validate our Amazon-specific findings. Like most of the large commercial provider networks, Amazon makes little, if any, ground truth data about its global-scale serving infrastructure publicly available, and our attempts at obtaining peering-related ground truth information from either Amazon, Amazon's customers, operators of colo facilities where Amazon is native, or AWS Direct Connect Partners have been futile. Faced with the reality of a dearth of ground truth data, whenever possible, we relied on extensive consistency-checking of our results (e.g. see § 4.5, § 4.6). At the same time, many of our heuristics are conservative in nature, typically requiring agreement when provided with input from multiple complementary sources of information. As a result, the reported quantities in this chapter are in general lower bounds but nevertheless demonstrate the existence of a substantial number of Amazon-related peerings that are not visible to more conventional measurement studies and/or inference techniques.

4.10 Summary

In this chapter, we present a measurement study of the interconnection fabric that Amazon utilizes on a global scale to run its various businesses, including AWS. We show that in addition to some 0.12k private peerings and about 2.69k public peerings (i.e., bi-lateral and multi-lateral peerings), Amazon also utilizes at least 0.24k (and likely many more) virtual private interconnections or VPIs. VPIs are a new and increasingly popular interconnection option for entities such as enterprises that desire highly elastic and flexible connections to the cloud providers that offer the type of services that these entities deem critical for running their business. Our study makes no use of Amazon-proprietary data and can be used to map the interconnection fabric of any large cloud provider, provided the provider in question does not filter traceroute probes. Our findings emphasize that new methods are needed to track and study the type of “hybrid” connectivity that is in use today at the Internet's edge. This hybrid connectivity describes an emerging strategy whereby one part of an Internet player's traffic bypasses the public Internet (i.e. cloud service-related traffic traversing cloud exchange-provided VPIs), another part is handled by its upstream ISP (i.e. traversing colo-provided private interconnections), and yet another portion of its traffic is exchanged over a colo-owned and colo-operated IXP.
As the number of businesses investing in cloud services is expected to continue to increase rapidly, multi-cloud strategies are predicted to become mainstream, and the majority of future workload-related traffic is anticipated to be handled by cloud-enabled colos Gartner (2016), tracking and studying this hybrid connectivity will require significant research efforts on the part of the networking community. Knowing the structure of this hybrid connectivity, for instance, is a prerequisite for studying which types of interconnections will handle the bulk of tomorrow's Internet traffic, and how much of that traffic will bypass the public Internet, with implications for the role that traditional players such as Internet transit providers and emerging players such as cloud-centric data center providers may play in the future Internet.

CHAPTER V

CLOUD CONNECTIVITY PERFORMANCE

5.1 Introduction

In Chapter IV we presented and characterized the different peering relationships that CPs form with various networks. This chapter focuses on the performance of the various connectivity options that are at the disposal of enterprises for establishing end-to-end connectivity with cloud resources. The content in this chapter is the result of a collaboration between Bahador Yeganeh and Ramakrishnan Durairajan, Reza Rejaie, and Walter Willinger. Bahador Yeganeh is the primary author of this work and responsible for conducting all measurements and producing the presented analyses.

5.2 Introduction

For enterprises, the premise of deploying a multi-cloud strategy (different from hybrid cloud computing, where a direct connection exists between a public cloud and private on-premises enterprise server(s)) is succinctly captured by the phrase “not all clouds are equal”. That is, instead of considering and consuming compute resources as a utility from a single cloud provider (CP), to better satisfy their specific requirements, enterprise networks can pick and choose services from multiple participating CPs (e.g. rent storage from one CP, compute resources from another) and establish end-to-end connectivity between them and their on-premises server(s) at the same or different locations. In the process, they also avoid vendor lock-in, enhance the reliability and performance of the selected services, and can reduce the operational cost of deployments. Indeed, according to an industry report from late 2018 Krishna et al. (2018), 85% of enterprises have already adopted multi-cloud strategies, and that number is expected to rise to 98% by 2021. Because of their popularity with enterprise networks, multi-cloud strategies are here to stay and can be expected to be one of the drivers of innovation in future cloud services The enterprise deployment game-plan: why multi-cloud is the future (2018); Five Reasons Why Multi-Cloud Infrastructure is the Future of Enterprise IT (2018); The Future of IT Transformation Is Multi-Cloud (2018); The Future of Multi-Cloud: Common APIs Across Public and Private Clouds (2018); The Future of the Datacenter is Multicloud (2018); How multi-cloud business models will shape the future (2018); IBM bets on a multi-cloud future (2018).

Fueled by the deployment of multi-cloud strategies, we are witnessing two new trends in Internet connectivity. First, there is the emergence of new Internet players in the form of third-party private connectivity providers (e.g. DataPipe, HopOne, among others Amazon (2018c); Google (2018b); Microsoft (2018c)).
These entities offer direct, secure, private, layer 3 connectivity between CPs (henceforth referred to as third-party private (TPP)) at a cost of a few hundred dollars per month. TPP routes bypass the public Internet at Cloud Exchanges CoreSite (2018); Demchenko et al. (2013) and offer additional benefits to users (e.g. enterprise networks can connect to CPs without owning an Autonomous System Number, or ASN, or physical infrastructure). Second, the large CPs are aggressively expanding the footprint of their serving infrastructures, including the number of direct connect locations where enterprises can reach the cloud via direct, private connectivity (henceforth referred to as cloud-provider private (CPP)) using either new CP-specific interconnection services (e.g. Amazon (2018a); Google (2018a); Microsoft (2018a)) or third-party private connectivity providers at colocation facilities. Of course, a user can forgo the TPP and CPP options altogether and rely instead on the traditional, best-effort connectivity over the public Internet—henceforth referred to as (transit provider-based) best-effort public Internet (BEP)—to employ a multi-cloud strategy.

Figure 21. Three different multi-cloud connectivity options.

To illustrate the problem, consider, for example, the case of a modern enterprise whose goal is to adopt a multi-cloud strategy (i.e. establishing end-to-end connectivity between (i) two or more CPs, i.e. cloud-to-cloud; and (ii) enterprise servers and the participating CPs, i.e. enterprise-to-cloud) that is performance- and cost-aware. For this scenario, let us assume that (a) the enterprise's customers are geo-dispersed and different CPs are available in different geographic regions (i.e. latency matters for all customers); (b) regulations are in place (e.g. for file sharing and storing data in the EU; hence, throughput matters for data transfers Example Applications Services (2018)); (c) cloud reliability and disaster recovery are important, especially in the face of path failures (i.e. routing matters); and (d) cost savings play an important role in connectivity decisions. Given these requirements, the diversity of CPs, the above-mentioned different connectivity options, and the lack of visibility into the performance tradeoffs, routing choices, and topological features associated with these multi-cloud connectivity options, the enterprise faces the “problem of plenty”: how to best leverage the different CPs' infrastructures, the various available connectivity choices, and the possible routing options to deploy a multi-cloud strategy that achieves the enterprise's performance and cost objectives?

With multi-cloud connectivity being the main focus of this chapter, we note that existing measurement techniques are a poor match in this context. For one, they fall short of providing the data needed to infer the type of connectivity (i.e. TPP, CPP, and BEP) between (two or more) participating CPs. Second, they are largely incapable of providing the visibility needed to study the topological properties, performance differences, or routing strategies associated with different connectivity options. Last but not least, while mapping the connectivity from cloud/content providers to users has been considered in prior work (e.g.
Anwar et al. (2015); Calder, Flavel, Katz-Bassett, Mahajan, and Padhye (2015); Calder et al. (2018); Chiu et al. (2015); Cunha et al. (2016); Schlinker et al. (2017) and references therein), multi-cloud connectivity from a cloud-to-cloud (C2C) perspective has remained largely unexplored to date.

This chapter aims to empirically examine the different types of multi-cloud connectivity options that are available in today's Internet and investigate their performance characteristics using non-proprietary, cloud-centric, active measurements. In the process, we are also interested in attributing the observed characteristics to aspects related to connectivity, routing strategy, or the presence of any performance bottlenecks. To study multi-cloud connectivity from a C2C perspective, we deploy and interconnect VMs hosted within and across two different geographic regions or availability zones (i.e. CA and VA) of three large cloud providers (i.e. Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure) using the TPP, CPP, and BEP options. We note that the high cost of using the services of commercial third-party private connectivity providers for implementing the TPP option prevents us from having a more global-scale deployment that utilizes more than one such provider.

Using this experimental setup as a starting point, we first compare the stability and/or variability in performance across the three connectivity options using metrics such as delay, throughput, and loss rate over time. We find that CPP routes exhibit lower latency and are more stable when compared to BEP and TPP routes. CPP routes also have higher throughput and exhibit less variation compared to the other two options. Given that using the TPP option is expensive, this finding is puzzling. In our attempt to explain this observation, we find that the inconsistencies in performance characteristics are caused by several factors, including the traversed border routers, queuing delays, and the higher loss rates of TPP routes. Moreover, we attribute the CPP routes' overall superior performance to the fact that each of the CPs has a private optical backbone, that there exists rich inter-CP connectivity, and that the CPs' traffic always bypasses (i.e. is invisible to) BEP transits.

In summary, this chapter makes the following contributions:

• To the best of our knowledge, this is one of the first efforts to perform a comparative characterization of multi-cloud connectivity in today's Internet. To facilitate independent validation of our results, we will release all relevant datasets (properly anonymized; e.g. with all TPP-related information removed).

• We identify issues, differences, and tradeoffs associated with three popular multi-cloud connectivity options and strive to elucidate/discuss the underlying reasons. Our results highlight the critical need for open measurement platforms and more transparency by the multi-cloud connectivity providers.

The rest of the chapter is organized as follows. We describe the measurement framework, cloud providers, performance metrics, and data collection in § 5.3. Our measurement results and root causes, from both C2C and E2C perspectives, are presented in § 5.4 and § 5.5, respectively. We present the open issues and future work in § 5.6. Finally, we summarize the key findings of this chapter in § 5.7.
5.3 Measurement Methodology

In this section, we describe our measurement setting and how we examine the various multi-cloud connectivity options, the cloud providers under consideration, and the performance metrics of interest.

5.3.1 Deployment Strategy. As shown in Figure 21, we explore in this chapter three different types of multi-cloud connectivity options: third-party private (TPP) connectivity between CP VMs that bypasses the public Internet, cloud-provider private (CPP) connectivity enabled by private peering between the CPs, and best-effort public (BEP) connectivity via transit providers. To establish TPPs, we identify the set of colocation facilities where connectivity partners offer their services Amazon (2018c); Google (2018b); Microsoft (2018c). Using this information, we select colocation facilities of interest (e.g. in the geo-proximity of cloud VMs) and deploy the third-party provider's cloud routers (CRs) that interconnect virtual private cloud networks within a region or across regions. The selection of CR locations can also leverage latency information obtained from the third-party connectivity providers. Next, based on the set of selected VMs and CR locations, we utilize the third-party connectivity provider's APIs to deploy the CRs and establish virtual cloud interconnections between VMs and CRs, thereby creating the TPPs. At a high level, this step involves (i) establishing a virtual circuit between the CP and a connectivity partner, (ii) establishing a BGP peering session between the CP's border routers and the partner's CR, (iii) connecting the virtual private cloud gateway to the CP's border routers, and (iv) configuring each cloud instance to route any traffic destined to the overlay network towards the configured virtual gateway.

Establishing CPP connectivity is similar to TPP. The only difference is in the user-specified connectivity graph: in the case of CPP, the CR information is omitted. To establish CPP connectivity, the participating CPs automatically select private peering locations to stitch the multi-cloud VMs together. Finally, we have two measurement settings for BEP. The first setting is between a non-native colocation facility in AZ and our VMs through the BEP Internet; the second is towards Looking Glasses (LGs) residing in the colocation facilities hosting our CRs, also traverses the BEP Internet, and only yields latency measurements.

Our network measurements are performed in rounds. Each round consists of path, latency, and throughput measurements between all pairs of VMs (in both directions to account for route asymmetry) but can be expanded to include additional measurements as well. Furthermore, the measurements are performed over the public BEP paths as well as the two private options (i.e. CPP and TPP). We avoid cross-measurement interference by tracking the current state of ongoing measurements and limiting measurement activities to one active measurement per cloud VM. The results of the measurements are stored locally on the VMs (hard disks) and are transmitted to centralized storage at the end of our measurement period.

5.3.2 Measurement Scenario & Cloud Providers. As mentioned earlier, the measurement setting is designed to provide visibility into multi-cloud deployments so as to be able to study aspects related to the topology, routing, and performance tradeoffs. Unfortunately, exploring the offerings of the several available TPP providers is difficult and, more importantly, the costs incurred for connecting multiple clouds using TPP connections are very high.
For example, for each 1 Gbps link to a CP network, third-party providers charge anywhere from about 300 to 700 USD per month Megaport (2019a); PacketFabric (2019); Pureport (2019) (note that these price points do not take into consideration the additional charges that are incurred by CPs for establishing connectivity to their networks). Such high costs of TPP connections prevent us from having a global-scale deployment and from examining multiple TPP providers. Due to the costly nature of establishing TPP connections, we empirically measure and examine only one coast-to-coast, multi-cloud deployment in the US. The deployment we consider in this study is nevertheless representative of a typical multi-cloud strategy that is adopted by modern enterprises Megaport (2019b).

More specifically, our study focuses on connectivity between three major CPs (AWS, Azure, and GCP) and one enterprise. To emulate realistic multi-cloud scenarios, each entity is associated with a geographic location. The deployments are shown in Figure 22. We select the three CPs as they collectively have a significant market share and are used by many clients concurrently ZDNet (2019). Using these CPs, we create a realistic multi-cloud scenario by deploying three CRs using the network of one of the top third-party connectivity providers: one in the Santa Clara, CA (CR-CA) region, one in the Phoenix, AZ (CR-AZ) region, and one in the Ashburn, VA (CR-VA) region. CR-CA is interconnected to CR-VA and CR-AZ. Furthermore, CR-CA and CR-VA are interconnected with native cloud VMs from Amazon, Google, and Microsoft. To emulate an enterprise leveraging the multiple clouds, CR-AZ is connected to a physical server hosted within a colocation facility in Phoenix, AZ (server-AZ).

Figure 22. Our measurement setup showing the locations of our VMs from AWS, GCP and Azure. A third-party provider's CRs and line-of-sight links for TPP, BEP, and CPP are also shown.

The cloud VMs and server-AZ are all connected to the CRs with 50Mb/s links. We select the colocation facilities hosting the CRs based on two criteria: (i) CPs offer native cloud connectivity within that colo, and (ii) geo-proximity to the target CPs' datacenters. The CRs are interconnected with each other using 150Mb/s links, a capacity that supports the maximum number of concurrent measurements we perform (3 concurrent measurements in total, since we avoid more than 1 ongoing measurement per VM). Each cloud VM has at least 2 vCPU cores, 4GB of memory, and runs Ubuntu server 18.04 LTS. Our VMs were purposefully over-provisioned to reduce any measurement noise within virtualized environments. Throughout our measurements, the VMs' CPU utilization always remained below 2%. We also cap the VM interfaces at 50Mb/s to have a consistent measurement setting for both public (BEP) and private (TPP and CPP) routes. We perform measurements between all CP VMs within regions (intra-region) and across regions (inter-region) for C2C analysis, and from server-AZ to VMs in CA for E2C analysis. Additionally, we also perform measurements between our cloud VMs and two LGs that are located within the same facility as CR-CA and CR-VA, respectively, and use these measurements as baselines for comparisons (C2LG).
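To make the resulting measurement fan-out concrete, the following minimal sketch enumerates the unidirectional measurement pairs implied by this setup. The node names are illustrative, and the grouping into directions and path types reflects one plausible reading of the counts reported next.

```python
import itertools

# Illustrative names for the 6 CP VMs, the 2 LGs, and the enterprise server.
cp_vms = [cp + ":" + region for cp in ("AWS", "GCP", "AZR") for region in ("CA", "VA")]
lgs = ["LG-CA", "LG-VA"]
west_vms = [vm for vm in cp_vms if vm.endswith(":CA")]

# C2C: ordered pairs of CP VMs (both directions) over the two private path types.
c2c = [(s, d, p) for s, d in itertools.permutations(cp_vms, 2) for p in ("CPP", "TPP")]

# C2LG: every CP VM paired with every LG, in both directions, over the BEP Internet.
c2lg = [(vm, lg, d) for vm in cp_vms for lg in lgs for d in ("fwd", "rev")]

# E2C: the enterprise server paired with each west-coast VM, both directions,
# over the TPP and BEP options.
e2c = [("server-AZ", vm, d, p) for vm in west_vms
       for d in ("fwd", "rev") for p in ("TPP", "BEP")]

print(len(c2c), len(c2lg), len(e2c))   # 60, 24, 12
```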
Together, these efforts resulted in 60 pairs of measurements between CP instances (P(6,2) × 2: ordered pairs out of the 6 CP VMs over 2 types of unidirectional network paths), 24 pairs of measurements between CP VMs and LGs (6 CP VMs × 2 LGs × 2 types of unidirectional network paths), and 12 pairs of measurements between server-AZ and the west coast CP VMs (3 west coast CP VMs × 2 directions × 2 types of unidirectional network paths).

5.3.3 Data Collection & Performance Metrics. Using our measurement setting, we conducted measurements for about a month in the Spring of 2019 (see § 5.3.5 for more details). We conduct measurements in 10-minute rounds. In each round, we performed latency, path, and throughput measurements between all pairs of relevant nodes. For each round, we measure and report the latency using 10 ping probes. We refrain from using a more accurate one-way latency measurement tool such as OWAMP as the authors of OWAMP caution against its use within virtualized environments One-Way Ping (OWAMP) (2019). Similarly, paths are measured by performing 10 attempts of paris-traceroute using scamper Luckie (2010) towards each destination. We used ICMP probes for path discovery as they maximized the number of responsive hops along the forward path. Lastly, throughput is measured using the iperf3 tool, which was configured to transmit data over a 10-second interval using TCP. We discard the first 5 seconds of our throughput measurement to account for TCP's slow-start phase and consider the median of throughput for the remaining 5 seconds. These efforts resulted in about 30k latency and path samples and some 15k throughput samples between each measurement pair.

To infer inter-AS interconnections, the resulting traceroute hops from our measurements were translated to their corresponding AS paths using BGP prefix announcements from Routeviews and RIPE RIS RIPE (2019); University of Oregon (2018). Missing hops were attributed to their surrounding ASN if the prior and next hop ASNs were identical. The existence of IXP hops along the forward path was detected by matching hop addresses against IXP prefixes published by PeeringDB PeeringDB (2017) and Packet Clearing House (PCH) Packet Clearing House (2017). Lastly, we mapped each ASN to its corresponding ORG number using CAIDA's AS-to-ORG mapping dataset Huffaker et al. (2018). (A minimal sketch of this annotation step is shown below.)

CPs are heterogeneous in handling path measurements. In our mappings, we observed the use of private IP addresses internally by CPs as well as on traceroutes traversing the three connectivity options. We measured the number of observed AS/ORGs (excluding hops utilizing private IP addresses) for inter-cloud, intra-cloud, and cloud-to-LG paths, and made the following two observations. First, of the three CPs, only AWS used multiple ASNs (i.e. ASes 8987, 14618, and 16509). Second, not surprisingly, we observed a striking difference in how CPs respond to traceroute probes. In particular, we noted that the differences in responses depend on the destination network and path type (public vs. private). For example, GCP does not expose any of its routers unless the target address is within another GCP region. Similarly, Azure does not expose its internal routers except for its border routers that are involved in peering with other networks. Finally, we found that AWS heavily relies on private/shared IP addresses for its internal network.
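The sketch below illustrates the hop-annotation step just described: a longest-prefix-match lookup that maps each responsive traceroute hop to an ASN and ORG, flags IXP addresses, and attributes missing hops to the surrounding ASN when the neighboring hops agree. The dataset formats and helper names are simplified assumptions; in practice a radix-tree lookup over the full Routeviews/RIPE RIS, PeeringDB/PCH, and CAIDA datasets would be used.

```python
import ipaddress

# Simplified, pre-loaded datasets (hypothetical formats):
#   prefix_to_asn: {"52.93.0.0/16": 16509, ...}   derived from Routeviews / RIPE RIS
#   ixp_prefixes:  ["206.126.236.0/22", ...]      derived from PeeringDB / PCH
#   asn_to_org:    {16509: "AMAZON-ORG", ...}     derived from CAIDA's AS-to-ORG mapping

def annotate_hop(hop_ip, prefix_to_asn, ixp_prefixes, asn_to_org):
    """Map one traceroute hop to (asn, org, is_ixp) via longest-prefix match."""
    addr = ipaddress.ip_address(hop_ip)
    if addr.is_private:
        return None, None, False               # private/shared address space stays unmapped
    is_ixp = any(addr in ipaddress.ip_network(p) for p in ixp_prefixes)
    best = None
    for prefix, asn in prefix_to_asn.items():  # a radix tree would replace this scan at scale
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, asn)
    if best is None:
        return None, None, is_ixp
    return best[1], asn_to_org.get(best[1]), is_ixp

def annotate_path(hops, prefix_to_asn, ixp_prefixes, asn_to_org):
    """Annotate a traceroute path; attribute missing hops to the surrounding ASN."""
    annotated = [annotate_hop(h, prefix_to_asn, ixp_prefixes, asn_to_org) if h
                 else (None, None, False) for h in hops]
    for i in range(1, len(annotated) - 1):
        prev_asn, next_asn = annotated[i - 1][0], annotated[i + 1][0]
        if annotated[i][0] is None and prev_asn is not None and prev_asn == next_asn:
            annotated[i] = (prev_asn, annotated[i - 1][1], False)
    return annotated
```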
These observations about how the CPs handle path measurements serve as motivation for our characterization of the various multi-cloud connectivity options in § 5.4 and § 5.5 below.

5.3.4 Representation of Results. Distributions in this chapter are presented using letter-value plots Hofmann, Kafadar, and Wickham (2011). Letter-value plots, similar to boxplots, are helpful for summarizing the distribution of data points but offer finer details beyond the quartiles. The median is shown using a dark horizontal line and the 1/2^i quantiles are encoded using the box widths, with the widest boxes surrounding the median representing the quartiles, the 2nd widest boxes corresponding to the octiles, etc. Distributions with low variance centered around a single value appear as a narrow horizontal bar, while distributions with diverse values appear as vertical bars. Throughout this chapter we try to present full distributions of latency when it is illustrative. Furthermore, we compare the latency characteristics of different paths using the median and variance measures and specifically refrain from relying on minimum latency as it does not capture the stability and dynamics of this measure across each path.

5.3.5 Ethical and Legal Considerations. This study does not raise any ethical issues. Overall, our goal in this study is to measure and improve multi-cloud connectivity without attributing particular features to any of the utilized third-party providers, which might be in violation of their terms of service. Hence, we obfuscate, and wherever possible, omit all information that can be used to identify the colocation and third-party connectivity providers. This information includes names, supported measurement APIs, costs, time and date of measurements, topology information, and any other potential identifiers.

5.4 Characteristics of C2C routes

In this section, we characterize the performance of C2C routes and attribute their characteristics to connectivity and routing.

5.4.1 Latency Characteristics. CPP routes exhibit lower latency than TPP routes and are stable. Figure 23 depicts the distribution of RTT values between different CPs across different connectivity options. The rows (from top to bottom) correspond to AWS, GCP, and Azure as the source CP, respectively. Intra-region (inter-region) measurements are shown in the left (right) columns, and CPP (TPP) paths are depicted in blue (orange). To complement Figure 23, the median RTT values comparing CPP and TPP routes are shown in Figure 24. From Figures 23 and 24, we see that, surprisingly, CPP routes typically exhibit lower medians of RTT compared to TPP routes, suggesting that CPP routes traverse the CPs' private optical backbones. We also observe a median RTT of ∼2ms between the AWS and Azure VMs in California, which is in accordance with the relative proximity of their datacenters for this region. The GCP VM in California has a median RTT of 13ms to the other CPs in California, which can be attributed to the geographical distance between GCP's California datacenter in LA and the Silicon Valley datacenters of AWS and Azure. Similarly, we notice that the VMs in Virginia all exhibit low median RTTs between them. We attribute this behavior to the geographical proximity of the datacenters of these CPs. At the same time, the inter-region latencies within a CP are about 60ms, with the exception of Azure, which has a higher median latency of about 67ms.
Finally, although the measured latencies (and hence the routes) are asymmetric between the two directions, the median RTT values in Figure 24 exhibit latency symmetry (differences <0.1ms).

Figure 23. Rows from top to bottom represent the distribution of RTT (using letter-value plots) between AWS, GCP, and Azure's network as the source CP and various CP regions for intra (inter) region paths in left (right) columns. CPP and TPP routes are depicted in blue and orange, respectively. The first two characters of the X axis labels encode the source CP region with the remaining characters depicting the destination CP and region.

Figure 24. Comparison of median RTT values (in ms) for CPP and TPP routes between different pairs.

Also, the median of the measured latency between our CRs is in line with the values published by third-party connectivity providers, but the high variance of latency indicates that the TPP paths are in general a less reliable connectivity option compared to CPP routes. Lastly, BEP routes for C2LG measurements always have an equal or higher median latency compared to CPP paths with much higher variability (an order of magnitude larger standard deviation). These results are omitted for brevity and to avoid skewed scales in the current figures.

5.4.2 Why do CPP routes have better latency than TPP routes?. CPP routes are short, stable, and private. Figure 25a depicts the distribution of ORG hops for different connectivity options. We observe that intra-cloud paths always have a single ORG, indicating that regardless of the target region, the CP routes traffic internally towards the destination VM. More interestingly, the majority of inter-cloud paths only observe two ORGs, corresponding to the source and destination CPs. Only a small fraction (<4%) of paths involves three ORGs, and upon closer examination of the corresponding paths, we find that they traverse IXPs and involve traceroutes that originate from Azure and are destined to Amazon's network in another region. We reiterate that single-ORG inter-CP paths correspond to traceroutes that originate from GCP's network, which does not reveal any of its internal hops. For the cloud-to-LG paths, we observe a different number of ORGs depending on the source CP as well as the physical location of the target LG. The observations range from only encountering the target LG's ORG to seeing intermediary IXP hops as points of peering.
Lastly, we measure the stability of routes at the AS-level and observe that all paths remain consistently stable over time with the exception of routes sourced at Azure California and destined to Amazon Virginia. The latter usually pass through private peerings between the CPs, and less than 1% of our path measurements go through an intermediary IXP. In short, we did not encounter any transit providers in our measured CPP routes.

Figure 25. (a) Distribution for the number of ORG hops observed on intra-cloud, inter-cloud, and cloud-to-LG paths. (b) Distribution of IP (AS/ORG) hop lengths for all paths in the left (right) plot.

CPs are tightly interconnected with each other in the US. Not observing any transit AS along our measured C2C paths motivated us to measure the prevalence of this phenomenon by launching VM instances within all US regions of our target CP networks. This results in a total of 17 VM instances corresponding to 8, 5, and 4 regions within Azure, GCP, and AWS, respectively. We perform UDP and ICMP paris-traceroutes using scamper between all VM instances (272 unique pairs) in 10-minute rounds for four days and remove the small fraction (9 × 10^-5) of traceroutes that encountered a loop along the path. Overall, we observe that ICMP probes are better at revealing intermediate hops as well as reaching the destination VMs. Similar to § 5.3.3, we annotate the hops of the collected traceroutes with their corresponding ASN/ORG and infer the presence of IXP hops along the path. For each path, we measure its IP and AS/ORG hop length and show the corresponding distributions in Figure 25b. C2C paths exhibit a median (90th-percentile) IP hop length of 22 (33). Similar to our initial C2C path measurements, with respect to AS/ORG hop length, we only observe ORGs corresponding to the three target CPs as well as IXP ASNs for CoreSite Any2 and Equinix. All ORG hop paths passing through an IXP correspond to paths that are sourced from Azure and destined to AWS. These measurements further extend our initial observation regarding the rich connectivity of our three large CPs and their tendency to avoid exchanging traffic through the public Internet.

On the routing models of multi-cloud backbones. By leveraging the AS/ORG paths described in § 5.3, we next identify the peering points between the CPs. Identifying the peering point between two networks from traceroute measurements is a challenging problem and the subject of many recent studies Alexander et al. (2018); Luckie et al. (2016); Marder and Smith (2016). For our study, we utilized the latest version of bdrmapIT Alexander et al. (2018) to infer the interconnection segment on the collection of traceroutes that we have gathered. Additionally, we manually inspected the inferred peering segments and, where applicable, validated their correctness using (i) IXP address to tenant ASN mappings and (ii) DNS names such as amazon.sjc-96cbe-1a.ntwk.msn.net, which is suggestive of peering between AWS and Azure. We find that bdrmapIT is unable to identify peering points between GCP and the other CPs since GCP only exposes external IP addresses for paths destined outside of its network, i.e. bdrmapIT is unaware of the source CP's network as it does not observe any addresses from that network on the initial set of hops.
For these paths, we choose the first hop of the traceroute as the peering point only if its ASN equals that of the target IP address. Using this information, we measure the RTT between the source CP and the border interface to infer the geo-proximity of the peering point to the source CP. Using this heuristic allows us to analyze each CP's inclination to use hot-potato routing.

Figure 26. Distribution of RTT between the source CP and the peering hop. From left to right, the plots represent AWS, GCP, and Azure as the source CP. Each distribution is split based on intra (inter) region values into the left/blue (right/orange) halves, respectively.

Figure 26 shows the distribution of RTT for the peering points between each CP. From left to right, the plots represent AWS, GCP, and Azure as the source CP. Each distribution is split based on intra (inter) region values into the left/blue (right/orange) halves, respectively. We observe that AWS' peering points with other CPs are very close to its own network and, therefore, AWS employs hot-potato routing. For GCP, we find that hot-potato routing is never employed and traffic is always handed off near the destination region. The bi-modal distributions of RTT values for each destination CP are centered at around 2ms, 12ms, 58ms, and 65ms, corresponding to the intra-region latency for VA and CA, the inter-region latency to GCP, and the inter-region latency to other CPs, respectively. Finally, Azure exhibits mixed routing behavior. Specifically, Azure's routing behavior depends on the target network: Azure employs hot-potato routing for GCP, its Virginia-California traffic destined to AWS is handed off in Los Angeles, and for inter-region paths from California to AWS Virginia, the traffic is usually (99%) handed off in Dallas, TX, while the remainder is exchanged through Digital Realty Atlanta's IXP. From these observations, the routing behavior for each path can be modeled with a simple threshold-based method. More concretely, for each path i with an end-to-end latency of l_e^i and a border latency of l_b^i, we infer that the source CP employs hot-potato routing if l_b^i < (1/10)·l_e^i; otherwise, the source CP employs cold-potato routing (i.e. l_b^i > (9/10)·l_e^i). The fractions (i.e. 1/10 and 9/10) are not prescriptive and are derived from the latency distributions depicted in Figure 26.

5.4.3 Throughput Characteristics. CPP routes exhibit higher and more stable throughput than TPP routes. Figure 27 depicts the distribution of throughput values between different CPs using different connectivity options. While intra-region measurements tend to have a similar median and variance of throughput, we observe that for inter-region measurements, TPPs exhibit a lower median throughput with higher variance. The degradation of throughput seems to be directly correlated with the higher RTT values shown in Figure 23. Using our latency measurements, we also approximate the loss-rate to be 10^-3 and 10^-4 for TPP and CPP routes, respectively.
Figure 27. Rows from top to bottom in the letter-value plots represent the distribution of throughput between AWS', GCP's, and Azure's network as the source CP and various CP regions for intra- (inter-) region paths in left (right) columns. CPP and TPP routes are depicted in blue and orange, respectively.

Using the formula of Mathis et al. (1997) to approximate TCP throughput (we do not have access to parameters such as the TCP timeout delay and the number of packets acknowledged by each ACK that would be required by more elaborate TCP models, e.g. Padhye, Firoiu, Towsley, and Kurose (1998)), we can obtain an upper bound for throughput for our measured loss-rate and latency values. Figure 28 shows the upper bound of throughput for an MSS of 1460 bytes and several modes of latency and loss-rate. For example, the upper bound of TCP throughput for a 70ms latency and a loss-rate of 10^-3 (corresponding to the average measured values for TPP routes between the two coasts) is about 53Mb/s. While this value is higher than our interface/link bandwidth cap of 50Mb/s, bursts of packet loss or transient increases in latency could easily lead to sub-optimal TCP throughput for TPP routes.

Figure 28. Upper bound for TCP throughput using the formula of Mathis et al. (1997) with an MSS of 1460 bytes and various latency (X axis) and loss-rate (log-scale Y axis) values.

5.4.4 Why do CPP routes have better throughput than TPP routes?. TPPs have higher loss-rates than CPPs. Our initial methodology for measuring loss-rate relied on our low-rate ping probes (outlined in § 5.3.3). While this form of probing can produce a reliable estimate of average loss-rate over a long period of time Tariq, Dhamdhere, Dovrolis, and Ammar (2005), it does not capture the dynamics of packet loss at finer resolutions. We thus modified our probing methodology to incorporate an additional iperf3 measurement using UDP probes between all CP instances. Each measurement is performed for 5 seconds and packets are sent at a 50Mb/s rate (in an ideal setting, we should not experience any packet losses as we limit our probing rate at the source). We measure the number of transmitted and lost packets during each second and also count the number of packets that were delivered out of order at the receiver. We perform these loss-rate measurements for a full week. Based on this new set of measurements, we estimate the overall loss-rate to be 5 × 10^-3 and 10^-2 for CPP and TPP paths, respectively. Moreover, we experience zero packet loss in 76% (37%) of our sampling periods for CPP (TPP) routes, indicating that losses for CPP routes tend to be more bursty than for TPP routes. The bursty nature of packet losses for CPP routes could be detrimental to real-time applications that can only tolerate certain levels of loss and should be factored in by the client. The receivers did not observe any out-of-order packets during our measurement period. Figure 29 shows the distribution of loss-rate for various paths.
The rows (from top to bottom) correspond to AWS, GCP, and Azure as the source CP, respectively. Intra-region (inter-region) measurements are shown in the left (right) columns, and CPP (TPP) paths are depicted in blue (orange). We observe consistently higher loss-rates for TPP routes compared to their CPP counterparts and lower loss-rates for intra-CP routes in Virginia compared to California. Moreover, paths destined to VMs in the California region show higher loss-rates regardless of where the traffic has been sourced from, with asymmetrically lower loss-rates on the reverse paths, indicating the presence of congested ingress points for CPs within the California region. We also notice extremely low loss-rates for intra-CP (except Azure) CPP routes between the US east and west coasts and for inter-CP CPP routes between the two coasts for certain CP pairs (e.g. AWS CA to GCP VA or Azure CA to AWS VA).

Figure 29. Rows from top to bottom in the letter-value plots represent the distribution of loss-rate between AWS, GCP, and Azure as the source CP and various CP regions for intra- (inter-) region paths in left (right) columns. CPP and TPP routes are depicted in blue and orange, respectively.

5.4.5 Summary. To summarize, our measurements for characterizing C2C routes reveal the following important insights:

• CPP routes are better than TPP routes in terms of latency as well as throughput. This finding begs the question: Given the sub-optimal performance of TPP routes and their cost implications, why should an enterprise seek connectivity from third-party providers when deciding on its multi-cloud strategy?

• The better performance of CPP routes as compared to their TPP counterparts can be attributed to two factors: (a) the CPs' rich (private) connectivity with other CPs in different regions (traffic bypasses the BEP Internet altogether) and (b) more stable and better provisioned CPP (private) backbones.

5.5 Characteristics of E2C routes

In this section, we turn our attention to E2C routes, characterize their performance, and attribute the observations to connectivity and routing.

5.5.1 Latency Characteristics. TPP routes offer better latency than BEP routes. Figure 30a shows the distribution of latency for our measured E2C paths. We observe that TPP routes consistently outperform their BEP counterparts by having a lower baseline latency and also exhibiting less variation. We observe a median latency of 11ms, 20ms, and 21ms for TPP routes towards the GCP, AWS, and Azure VM instances in California, respectively. We also observe symmetric distributions on the reverse paths but omit the results for brevity.

5.5.2 Why do TPP routes offer better latency than BEP routes?. In the case of our E2C paths, we always observe direct peerings between the upstream provider (e.g.
Cox Communications (AS22773)) and the CP network.

Figure 30. (a) Distribution of latency for E2C paths between our server in AZ and CP instances in California through TPP and BEP routes. Outliers on the Y-axis have been deliberately cut off to increase the readability of the distributions. (b) Distribution of RTT on the inferred peering hop for E2C paths sourced from CP instances in California. (c) Distribution of throughput for E2C paths between our server in AZ and CP instances in California through TPP and BEP routes.

Relying on bdrmapIT to infer the peering points from the traceroutes associated with our E2C paths, we measure the latency on the peering hop. Figure 30b shows the distribution of the latency for the peering hop for E2C paths originated from the CPs' instances in CA towards our enterprise server in AZ. While the routing policies of GCP and Azure for E2C paths are similar to our observations for C2C paths, Amazon seems to hand off traffic near the destination, which is unlike its hot-potato tendencies for C2C paths. We hypothesize that this change in AWS' policy is to minimize operational costs via their Transit Gateway service Amazon (2019b). In addition, observing an equal or lower minimum latency for TPP routes as compared to BEP routes suggests that TPP routes are shorter than BEP paths (in the absence of information regarding the physical fiber paths, we rely on latency as a proxy measure of path length). We also find (not shown here) that the average loss rate on TPP routes is 6 × 10^-4, roughly a third of the loss rate experienced on BEP routes (1.6 × 10^-3).

5.5.3 Throughput Characteristics. TPP offers consistent throughput for E2C paths. Figure 30c depicts the distribution of throughput for E2C paths between our server in AZ and CP instances in CA via TPP and BEP routes, respectively. While we observe very consistent throughput values near the purchased link capacity for TPP paths, BEP paths exhibit higher variability, which is expected given the best-effort nature of public Internet paths.

5.5.4 Summary. In summary, our measurements for characterizing E2C routes support the following observations:

• TPP routes exhibit better latency and throughput characteristics when compared with BEP routes.

• The key reasons for the better performance of TPP routes as compared to their BEP counterparts include shorter (e.g. no transit providers) and better-performing (e.g. lower loss rate) paths.

• For an enterprise deciding on a suitable multi-cloud strategy, CPP routes are better only when the enterprise is close to the CPs' native locations. Given that TPPs are present at many geographic locations where the CPs are not native, third-party providers offer better connectivity options compared to relying on the public Internet (i.e. using BEP routes).

5.6 Discussion and Future Work

In this section, we discuss the limitations of our study and open issues. We also discuss ongoing and future work.

Representativeness. While the measurement setup depicted in Figure 22 represents a realistic enterprise network employing a multi-cloud strategy, it is not the only representative setting. We note that there are a number of other multi-cloud connectivity scenarios (e.g. distinct CPs in different continents, different third-party providers in different countries, etc.), which we do not discuss in this study.
For example, what are the inter-cloud connectivity and routing characteristics between intercontinental VMs, e.g. in the USA and EU? Unfortunately, the costs associated with establishing TPP paths prevent an exhaustive exploration of multi-cloud connectivity in general and TPP connectivity in particular.

Additional Cloud and Third-party Providers. Our study focuses on multi-cloud connectivity options between three major CPs (i.e. AWS, Azure, and GCP) as they collectively have a significant market share. We plan to consider additional cloud providers (e.g. Alibaba, IBM Softlayer, Oracle, etc.) as part of future work. Similar to the availability of other CPs, TPP connectivity between CPs is offered via new services by a number of third-party connectivity providers Amazon (2018c); Google (2018b); Microsoft (2018c). Exploring the ecosystem and economics of these different third-party providers and the TPP connectivity they offer is an open problem. In addition, there has been no attempt to date to compare their characteristics in terms of geography, routing, and performance, and we intend to explore this aspect as part of future work.

Longitudinal Analysis & Invariants. Despite the fact that we conducted our measurements for about a month in the Spring of 2019 (as mentioned in § 5.3.5), we note that our study is a short-term characterization of multi-cloud connectivity options. Identifying the invariants in this context requires a longitudinal analysis of measurements, which is the focus of our ongoing work.

Impact of Connectivity Options on Cloud-hosted Applications. Modern cloud applications pose a wide variety of latency and throughput requirements. For example, key-value stores are latency sensitive Tokusashi, Matsutani, and Zilberman (2018), whereas applications like streaming and geo-distributed analytics require low latency as well as high throughput Lai, Chowdhury, and Madhyastha (2018). In the face of such diverse requirements, what is critically lacking is a systematic benchmarking of the impact of the performance tradeoffs between BEP, CPP, and TPP routes on cloud-hosted applications (e.g. key-value stores, streaming, etc.). While tackling WAN heterogeneity is the focus of a recent effort Jonathan, Chandra, and Weissman (2018), dealing with multi-cloud connectivity options and their impacts on applications is an open problem.

Connectivity and Routing Implications. In terms of routing and connectivity, our study has two implications. First, while it is known that the CPs are contributing to the ongoing “flattening” of the Internet Dhamdhere and Dovrolis (2010); Gill, Arlitt, Li, and Mahanti (2008); Labovitz, Iekel-Johnson, McPherson, Oberheide, and Jahanian (2010), our findings underscore the fact that third-party private connectivity providers act as a catalyst for this ongoing flattening of the Internet. In addition, our study offers additional insights into the ongoing “cloudification” of the Internet in terms of where and why cloud traffic bypasses the BEP transits. Our study also implies that, compared to the public Internet, CPP backbones are better performing, more stable, and more secure (invisible to and isolated from the BEP transits), making them first-class citizens for future Internet connectivity. In light of these two implications, our study also warrants revisiting existing efforts from the multi-cloud perspective. In particular, we plan to pursue issues such as failure detection and characterization for multi-cloud services (e.g.
Zhang, Zhang, Pai, Peterson, and Wang (2004)) and multi-cloud reliability (e.g. Quan, Heidemann, and Pradkin (2013)). Other open problems concern inferring inter-CP congestion (e.g. Dhamdhere et al. (2018)) and examining the economics of multi-cloud strategies (e.g. Zarchy, Dhamdhere, Dovrolis, and Schapira (2018)).

5.7 Summary

Enterprises are connecting to multiple CPs at an unprecedented pace and multi-cloud strategies are here to stay. Due to this development, in addition to best-effort public (BEP) transit provider-based connectivity, two additional connectivity options are available in today's Internet: third-party private (TPP) connectivity and cloud-provider private (CPP) connectivity. In this work, we perform a first-of-its-kind measurement study to understand the tradeoffs between three popular multi-cloud connectivity options (CPP vs. TPP vs. BEP). Based on our cloud-centric measurements, we find that CPP routes are better than TPP routes in terms of latency as well as throughput. The better performance of CPPs can be attributed to (a) the CPs' rich connectivity with other CPs in different regions (bypassing the BEP Internet altogether) and (b) the CPs' stable and well-designed private backbones. In addition, we find that TPP routes exhibit better latency and throughput characteristics when compared with BEP routes. The key reasons include shorter paths and lower loss rates compared to the BEP transits. Although limited in scale, our work highlights the need for more transparency and access to open measurement platforms by all the entities involved in interconnecting enterprises with multiple clouds.

CHAPTER VI

OPTIMAL CLOUD OVERLAYS

Motivated by the observations in Chapter V on the diversity of performance characteristics of various cloud connectivity paths, in this chapter we design an extensible measurement platform for cloud environments. Furthermore, we create a decision support framework that helps enterprises create optimal multi-cloud deployments. The content in this chapter is the result of a collaboration between Bahador Yeganeh and Ramakrishnan Durairajan, Reza Rejaie, and Walter Willinger. Bahador Yeganeh is the primary author of this work and responsible for designing all systems, conducting all measurements, and producing the presented analyses.

6.1 Introduction

Modern enterprises are adopting multi-cloud strategies at a rapid pace. (Such strategies are different from hybrid cloud computing, where a direct connection exists between a public cloud and private on-premises enterprise server(s).) Among the benefits of pursuing such strategies are competitive pricing, avoidance of vendor lock-in, global reach, and meeting requirements for data sovereignty. According to a recent industry report, more than 85% of enterprises have already adopted multi-cloud strategies Krishna et al. (2018). Despite this existing market push for multi-cloud strategies, we posit that there is a technology pull: seamlessly connecting resources across disparate, already-competitive cloud providers (CPs) in a performance- and cost-aware manner is an open problem. This problem is further complicated by two key issues. First, prior research on overlays has focused either on public Internet-based overlays Andersen, Balakrishnan, Kaashoek, and Morris (2001) or on CP paths in isolation Costa, Migliavacca, Pietzuch, and Wolf (2012); Haq, Raja, and Dogar (2017); Lai et al. (2018).
Second, because CP backbones are private and invisible to traditional measurement techniques, we lack a basic understanding of their performance, path, and traffic-cost characteristics.

Figure 31. Global regions for AWS, Azure, and GCP.

To examine the benefits of multi-cloud overlays, we perform a third-party, cloud-centric measurement study (the code and datasets used in this study will be openly available to the community upon publication) to understand the performance, path, and traffic-cost characteristics of three major global-scale private cloud backbones (i.e., AWS, Azure, and GCP). Our measurements were run across 6 continents and 23 countries for 2 weeks (see Figure 31). Our measurements reveal a number of key insights. First, the cloud backbones (a) are optimal (i.e., a 2x reduction in the latency inflation ratio, which is defined as the ratio between line-of-sight and latency-based speed-of-light distances, with respect to the public Internet), (b) lack path and delay asymmetry, and (c) are tightly interconnected with other CPs. Second, multi-cloud paths exhibit higher latency reductions than single-cloud paths; e.g., 67% of all paths, 54% of all intra-CP paths, and 74% of all inter-CP paths experience an improvement in their latencies. Third, although traffic costs vary from location to location and across CPs, the costs are not prohibitively high. Based on these insights, we argue that enterprises and cloud users can indeed benefit from future efforts aimed at constructing high-performance overlay networks atop multi-cloud underlays in a performance- and cost-aware manner.

While our initial findings suggest that multi-cloud overlays are indeed beneficial for enterprises, establishing overlay-based connectivity to route enterprise traffic in a cost- and performance-aware manner among islands of disparate CP resources is an open and challenging problem. For one, the problem is complicated by the lack of continuous multi-cloud measurements and vendor-agnostic APIs. To tackle these challenges, the main goal of this chapter is to create a service to establish and manage overlays on top of multi-cloud underlays. The starting point of our approach is a cloud-centric measurement and management service called Tondbaz that continuously monitors the inter- and intra-CP links. At the core of Tondbaz are vendor-agnostic APIs to connect the disparate islands of CP resources. With the measurement service and APIs in place, Tondbaz constructs a directed graph consisting of nodes that represent VM instances, given two locations (e.g., cities) as input by a cloud user. Edges in the graph are annotated with latencies and traffic-cost values from the measurement service.

This study makes the following contributions:

– We propose and design an extensible system called Tondbaz to facilitate the measurement of multi-CP network paths.

– We design a decision-support framework for constructing optimal cloud overlay paths using insights gleaned from Tondbaz.

– We demonstrate the cost and performance benefits of utilizing such a decision-support framework by integrating it into the snitch mechanism of Cassandra, a distributed key-value store.

The remainder of this chapter is organized as follows. We first present the design objectives of our measurement platform and provide formal definitions for our optimization framework in §6.2.
In §6.3 we utilize our measurement platform to measure the path characteristics of the top 3 CPs on a global scale and apply our optimization framework to obtain optimal paths between all pairs of CP regions. Next, we demonstrate the applicability of our overlays for a handful of paths and discuss the operational trade-offs of overlays in §6.4. Lastly, we conclude this chapter by summarizing our findings in §6.5.

6.2 Tondbaz Design

In this section, we describe Tondbaz's components and their corresponding design principles and objectives. At a high level, Tondbaz consists of two main components, namely (i) a measurement platform for conducting cloud-to-cloud measurements (§6.2.1) and (ii) a decision support framework for obtaining optimal cloud paths based on a set of constraints (§6.2.3).

6.2.1 Measurement Platform. The measurement platform is designed with low resource overhead and extensibility as objectives in mind. The measurement platform consists of the following three main components:
– an agent for conducting/gathering multi-cloud performance measurements
– a centralized data-store for collecting and archiving the measurement results from each agent
– a centralized controller/scheduler that configures each measurement agent and schedules measurement tasks

Figure 32. Overview of components for the measurement system including the centralized controller, measurement agents, and data-store.

Figure 32 shows a high-level overview of the components of the measurement platform as well as how they communicate with each other. The agents communicate with the centralized controller in a client-server model over a control channel. Furthermore, the agents store the results of running the measurements in a data-store. The data-store and controller are decoupled from each other by design, although they can reside on the same node.

6.2.1.1 Measurement Agent. The measurement agent is designed with multiple objectives in mind, namely (i) ease of deployment, (ii) low resource overhead, and (iii) extensibility of measurements. In the following, we describe how each of these design objectives is achieved within our measurement agents.

Ease of Deployment: The measurement subsystem is designed to be installed as a daemon on the host system with minimal dependencies (other than a Python distribution) using a simple shell script. The only required parameter for installation is the address of the controller that the agent will be communicating with. Upon installation, the agent announces itself to the controller on a predefined channel. After registration with the controller, configuration of the agent, including (but not limited to) target addresses, output destination, and execution of measurement tasks, happens through a configuration channel and can therefore be managed from a centralized location. We rely on the MQTT protocol OASIS (2019) for communication between the agents and the centralized controller.

Low Resource Overhead: A barebone agent is simply a daemon listening for incoming commands from the centralized controller on its control channel. Using this minimal design, the agent is implemented in less than 1k lines of Python code with a single dependency on the Eclipse Paho MQTT library Eclipse (2019). The agent uses 10MB of memory at runtime and requires less than 100KB of memory to maintain the state of ongoing measurements.
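To make this registration and control flow concrete, the following minimal sketch shows how an agent along these lines might announce itself and listen for controller commands using the Eclipse Paho MQTT client. The broker address, topic names, and JSON message format are illustrative assumptions, not Tondbaz's actual protocol.

```python
# Minimal sketch of a measurement agent, assuming hypothetical MQTT topic
# names and a JSON command format; not the actual Tondbaz implementation.
import json
import socket
import paho.mqtt.client as mqtt

CONTROLLER = "controller.example.net"   # assumed broker/controller address
AGENT_ID = socket.gethostname()

def on_connect(client, userdata, flags, rc):
    # Announce this agent and subscribe to its own control channel.
    client.publish("agents/announce", json.dumps({"agent": AGENT_ID}))
    client.subscribe(f"agents/{AGENT_ID}/control")

def on_message(client, userdata, msg):
    cmd = json.loads(msg.payload)
    # A real agent would dispatch to a measurement plugin here (e.g. ping).
    if cmd.get("type") == "run":
        status = {"agent": AGENT_ID, "task": cmd.get("task"), "status": "started"}
        client.publish(f"agents/{AGENT_ID}/status", json.dumps(status))

client = mqtt.Client()
client.on_connect = on_connect
client.on_message = on_message
client.connect(CONTROLLER, 1883)
client.loop_forever()   # block and handle controller commands
```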
Extensibility of Measurements: The agents should support a wide range of measurements, including standard network measurement tools such as ping, traceroute, and iperf, as well as any custom executables. Each measurement tool should be implemented as a container image. In addition to the container image, the developer should implement a Python class that inherits from a standard interface describing how the agent can communicate with the measurement tool. Measurement results should be serialized into a predefined JSON schema prior to being stored in the data-store.

6.2.1.2 Centralized Controller. The coordinator awaits incoming connections from agents that announce their presence and register themselves with the coordinator (anc). After the initial registration the coordinator can schedule and conduct measurements on the agent if needed. The coordinator maintains a control channel with each agent, which is used for (i) monitoring the health of each agent through heartbeat messages (hbt), (ii) sending configuration parameters (cfg), (iii) scheduling and issuing measurement commands (run and fin), and (iv) monitoring/reporting the status of ongoing measurements (sta).

6.2.2 Data Collector. Tondbaz agents can store the results of each measurement locally for later aggregation in a centralized data-store. Additionally, each agent can stream the results of each measurement back to the centralized data-store. Each measurement result is presented as a JSON object containing generic fields (start time, end time, measurement id, agent address) in addition to a JSON-serialized representation of the measurement output provided by the corresponding measurement plugin of the Tondbaz agent. We rely on MongoDB for our centralized data collector given that our measurement data is represented as schemaless JSON documents.

6.2.3 Optimization Framework. In addition to the measurement platform, we have designed an optimization framework that can identify cloud overlay paths that optimize a network performance metric while satisfying certain constraints specified by the user. The optimization framework relies on the stream of measurements reported by all agents to the data-store and uses them to create an internal model of the network as a directed graph G, where nodes represent agent instances and edges depict the network path between each pair of instances.

$$
\begin{aligned}
G &= (V, E) \\
V &= \{v_1, v_2, \ldots, v_N\} \\
E &= \{e_{ij} = (v_i, v_j) \mid \forall v_i, v_j \in V,\ (v_i, v_j) \neq (v_j, v_i)\}
\end{aligned}
\tag{6.1}
$$

Measurement results pertaining to the network path are added as edge attributes. Additionally, the optimization framework relies on an internal cost model that calculates the cost of transmitting traffic over each path based on the policies that each CP advertises on their websites Amazon (2019c); Google (2019); Microsoft (2019). The details of each CP's pricing policy differ from one CP to another, but at a high level they are governed by four common rules, namely: (i) CPs only charge for egress traffic from a compute instance, (ii) customers are charged based on the volume of exchanged traffic, (iii) traffic remaining within a CP's network has a lower charge rate, and (iv) each source/destination region (or a combination of both) has a specific charging rate. While the measurement platform is designed to be extensible and supports a wide variety of measurement tools, the optimization framework only utilizes latency and cost measurements. Extensions to the framework to support additional network metrics are part of our future work.
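As a rough illustration of this internal model, the sketch below builds such a directed graph with networkx and annotates edges with latency and egress-based cost rates. The region names, latencies, and per-GB rates are made-up placeholders rather than actual measurements or CP prices.

```python
# Sketch of the internal network model: a directed graph whose nodes are
# VM instances (regions) and whose edges carry latency and cost attributes.
# All values below are illustrative placeholders, not real measurements.
import networkx as nx

G = nx.DiGraph()

# Hypothetical edges: (src, dst, median RTT in ms, egress cost in USD/GB).
edges = [
    ("aws-us-east", "gcp-us-east", 2.1, 0.09),
    ("gcp-us-east", "gcp-eu-west", 95.0, 0.08),
    ("aws-us-east", "azure-eu-west", 80.0, 0.09),
]
for src, dst, rtt, rate in edges:
    G.add_edge(src, dst, latency=rtt, cost_per_gb=rate)

def path_latency(path):
    # Sum of per-hop latencies along an overlay path (analogue of L() in Eq. 6.5).
    return sum(G[u][v]["latency"] for u, v in zip(path, path[1:]))

def path_cost(path, volume_gb):
    # Egress-only charging: each hop's source is billed per GB of traffic sent.
    return sum(G[u][v]["cost_per_gb"] * volume_gb for u, v in zip(path, path[1:]))
```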
The optimization framework requires the user to specify a series of constraints, namely (i) the set of target regions where the user needs to have a deployment (R), (ii) the set of regions that should be avoided when constructing an optimal path (A), (iii) the set of region pairs that would be communicating with each other through the overlay (T), and (iv) an overall budget for traffic cost (B). Equation (6.2) formally defines the aforementioned constraints. This formulation of the optimization problem can be mapped to the Steiner tree graph problem Hwang and Richards (1992) which is known to be NP-complete.

$$
\begin{aligned}
A &\subset V, \quad V' = V - A \\
R &\subset V' \\
T_{ij} &= (v_i, v_j); \quad \forall v_i, v_j \in R
\end{aligned}
\tag{6.2}
$$

We approximate the solution (if any) to this optimization problem by (i) creating an induced graph by removing all regions in A from the internal directed graph G (Equation (6.3)),

$$
\begin{aligned}
G' &= (V', E') \\
V' &= V - A, \quad E' = \{(v_i, v_j) \in E \mid v_i, v_j \in V'\}
\end{aligned}
\tag{6.3}
$$

(ii) performing a breadth first search (BFS) to obtain all paths (P) between each pair of regions within T that have an overall cost (C function) within the budget B and do not have an inflated end-to-end latency compared to the default path (l_ij) (Equations (6.4) and (6.5)),

$$
P_{ij} = \{p^x_{ij} \mid p^x_{ij} = (v_{x_1}, \ldots, v_{x_n}),\ \forall 1 \le k < n\ (v_{x_k}, v_{x_{k+1}}) \in E' \text{ and } v_{x_1} = v_i,\ v_{x_n} = v_j,\ 1 \le x \le \lfloor (|V'| - 2)!\,e \rfloor\}
\tag{6.4}
$$

$$
\begin{aligned}
P'_{ij} &= \{p^x_{ij} \mid C(p^x_{ij}) \le B,\ L(p^x_{ij}) \le l_{ij}\} \\
C(p^x_{ij}) &= \sum_{e_{wz} \in p^x_{ij}} c_{wz} \\
L(p^x_{ij}) &= \sum_{e_{wz} \in p^x_{ij}} l_{wz}
\end{aligned}
\tag{6.5}
$$

and (iii) selecting the overlay that has the overall greatest reduction in latency among all possible sets of overlays (Equation (6.6)).

$$
\begin{aligned}
O &= \{p_{ij} \mid p_{ij} \in P'_{ij} \text{ and } \forall i,j:\ e_{ij} \in T\} \\
L(O) &= \sum_{p_{ij} \in O} \left( l_{ij} - L(p_{ij}) \right) \\
OPT &= O_x;\quad x = \operatorname{argmax}_x\ L(O_x)
\end{aligned}
\tag{6.6}
$$

The time complexity of this approach is equal to performing a BFS (O(|V'| + |E'|)) for each pair of nodes in T, in addition to selecting the set of paths which results in the greatest overall latency reduction. The latter step has a time complexity of O(|P'|^{|T|}), where |P'| = ⌊(|V'| − 2)! e⌋. While the high complexity of the second step might seem intractable, our BFS algorithm backtracks whenever it encounters a path that exceeds our total budget B or has an end-to-end latency greater than the default path l_ij. Additionally, based on our empirical evaluation we observe that each relay point can add about 1ms of forwarding latency and therefore our search backtracks from paths that yield less than 1ms of latency improvement per relay hop. Through our analysis, we observed that on average only 31% of candidate paths do not exceed the default end-to-end latency even with an unlimited budget, effectively making our solution tractable.
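The following sketch illustrates the spirit of this pruned search on the graph model from §6.2.3: it enumerates relay paths between two regions while backtracking on the budget and latency constraints. It is a simplified depth-first variant under the assumed attribute names (latency, cost_per_gb) used in the earlier model sketch, not the exact Tondbaz algorithm.

```python
# Simplified sketch of the constrained path search: enumerate relay paths
# between src and dst, pruning branches that exceed the traffic budget or
# the default (direct) path latency. Assumes edges carry 'latency' (ms)
# and 'cost_per_gb' (USD/GB) attributes, as in the earlier model sketch.
import networkx as nx

def candidate_overlays(G, src, dst, budget, volume_gb, avoid=()):
    default_latency = G[src][dst]["latency"] if G.has_edge(src, dst) else float("inf")
    results = []

    def extend(node, path, latency, cost):
        if latency > default_latency or cost > budget:
            return  # backtrack: a constraint of Eq. 6.5 is violated
        if node == dst:
            if len(path) > 2:  # at least one relay hop
                results.append((path, latency, cost))
            return
        for nxt in G.successors(node):
            if nxt in avoid or nxt in path:
                continue  # skip excluded regions (set A) and loops
            edge = G[node][nxt]
            extend(nxt, path + [nxt],
                   latency + edge["latency"],
                   cost + edge["cost_per_gb"] * volume_gb)

    extend(src, [src], 0.0, 0.0)
    # Prefer the overlay with the largest latency reduction (Eq. 6.6).
    return sorted(results, key=lambda r: default_latency - r[1], reverse=True)
```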
6.3 A Case for Multi-cloud Overlays

In this section, we demonstrate the use of Tondbaz to conduct path and latency measurements in a multi-cloud setting (§6.3.1), followed by the optimality of single CP paths (§6.3.2) and motivating performance gains of multi-cloud paths (§6.3.3). Next, we present the challenge of inferring traffic cost profiles which hinders the realization of multi-cloud overlays (§6.3.4). Lastly, we investigate the possibility of utilizing IXP points for the creation of further optimal overlays in §6.3.6.

6.3.1 Measurement Setting & Data Collection. We target the top 3 CPs, namely Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). We create small VM instances within all global regions of these CPs, resulting in a total of 68 regions (17, 31, and 20 for AWS, Azure, and GCP respectively). We exclude a number of regions since these regions are dedicated to government agencies and are not available to the public. Furthermore, we were not able to allocate VMs in 5 Azure regions (Central India, Canada East, France South, South Africa West, and Australia Central). Through our private correspondence with the support team, we learned that those regions either are mainly designed for storage redundancy of nearby regions or did not have free resources available at the time of this study.

Additionally, we identify the datacenter geo-location for each CP. Although CPs are secretive with respect to the location of their datacenters, various sources do point to their exact or approximate location Build Azure (2019); Burrington (2016); Google (2019); Miller (2015); Plaven (2017); WikiLeaks (2018); Williams (2016), and in the absence of any online information we resort to the nearest metro area that the CP advertises. We conduct pairwise latency and path measurements between all VM instances in 10-minute rounds for the duration of 2 weeks in October of 2019, resulting in about 20k latency and path samples between each pair of VMs. Each round of measurements consists of 5 latency probes and 2 (UDP and TCP) paris-traceroute path measurements. The resultant traceroute hops from our path measurements are annotated with their corresponding ASN using BGP feeds of Routeviews University of Oregon (2018) and RIPE RIPE (2018) collectors aggregated by BGPStream Orsini, King, Giordano, Giotsas, and Dainotti (2016). Furthermore, we map each hop to its owner ORG by relying on CAIDA's AS-to-ORG dataset Huffaker et al. (2018). Lastly, the existence of IXP hops along the path is checked by matching hop addresses against the set of IXP prefixes published by PeeringDB PeeringDB (2017), Packet Clearing House (PCH) Packet Clearing House (2017), and Hurricane Electric (HE) using CAIDA's aggregate IXP dataset CAIDA (2018).

6.3.2 Are Cloud Backbones Optimal?

6.3.2.1 Path Characteristics of CP Backbones. As mentioned above, we measure the AS and ORG path for all of the collected traceroutes. In all our measurements, we observe multiple ASes for AWS only (AS14618 and AS16509). Hence, without loss of generality, from this point onward we only present statistics using the ORG measure. We measure the ORG-hop length for all unique paths and find that for 97.86% of our measurements, we only observe 2 ORGs (i.e. the source and destination CP networks). Out of the remaining paths, we observe that 2.12% and 0.02% have 3 and 4 ORG hops, respectively. These observations indicate two key results. First, all intra-CP measurements (and, hence, traffic) remain almost always within the CPs' backbones. Second, the CP networks are tightly interconnected with each other and establish private peerings between each other on a global scale. Surprised by these findings, we take a closer look at the 2.14% of paths which include other networks along their path. About 76% of these paths have a single IXP hop between the source and destination CPs. That is, the CPs are peering directly with each other over an IXP fabric. For the remaining 24% of paths, we observe 2 prominent patterns: (i) paths sourced from AWS in Seoul and Singapore as well as various GCP regions that are destined to Azure in UAE; and (ii) paths sourced from various AWS regions in Europe and destined to Azure in Busan, Korea.
Main findings: All intra-CP and the majority of inter-CP traffic remains within the CPs' networks and is transmitted between the CPs' networks over private and public peerings. CPs' backbones are tightly interconnected and can be leveraged for creating a global multi-cloud overlay.

6.3.2.2 Performance Characteristics of CP Backbones. Using the physical location of datacenters for each CP, we measure the geo-distance between each pair of regions within a CP's network using the Haversine distance Robusto (1957) and approximate the optimal latency using speed of light (SPL) constraints (we use (2/3)·c in our calculations Singla et al. (2014)). Figure 33 depicts the CDF of latency inflation, which is defined as the ratio of measured latency and SPL latency calculated using line-of-sight distances for each CP.

Figure 33. Distribution of latency inflation between network latency and RTT approximation using speed of light constraints for all regions of each CP.

We observe median latency inflation of about 1.68, 1.63, and 1.67 for intra-CP paths of AWS, Azure, and GCP, respectively. Compared to a median latency inflation ratio of 3.2 for public Internet paths Singla et al. (2014), these low latency inflation ratios attest to the optimal fiber paths and routes that are employed by CPs. Furthermore, Azure and GCP paths have long tails in their latency inflation distributions while all intra-CP paths for AWS have a ratio of less than 3.6, making it the most optimal backbone among all CPs.

Main findings: CPs employ an optimal fiber backbone with near line-of-sight latencies to create a global network. This result opens up a tantalizing opportunity to construct multi-cloud overlays in a performance-aware manner.
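For concreteness, the following sketch shows one way to compute this inflation ratio from datacenter coordinates and a measured RTT, assuming a propagation speed of (2/3)·c in fiber; the coordinates and RTT value used in the example are illustrative placeholders, not measured values.

```python
# Sketch: latency inflation ratio = measured RTT / speed-of-light RTT over
# the line-of-sight (great-circle) distance. The coordinates and sample RTT
# below are illustrative placeholders, not measured values.
from math import radians, sin, cos, asin, sqrt

C_KM_PER_MS = 299.792458        # speed of light in km per millisecond
FIBER_FACTOR = 2.0 / 3.0        # assumed propagation speed in fiber, (2/3)*c

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two points on Earth (radius ~6371 km).
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def inflation_ratio(rtt_ms, lat1, lon1, lat2, lon2):
    dist = haversine_km(lat1, lon1, lat2, lon2)
    spl_rtt = 2 * dist / (C_KM_PER_MS * FIBER_FACTOR)  # out-and-back at (2/3)c
    return rtt_ms / spl_rtt

# Example with rough placeholder coordinates for two hypothetical regions.
print(inflation_ratio(62.0, 39.0, -77.5, 45.5, -122.7))
```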
6.3.2.3 Latency Characteristics of CP Backbones. Next, we turn our attention to the latency characteristics of the CP backbones toward the goal of creating CP-specific latency profiles. Figure 34 shows the distribution of RTT and standard deviation across different measurements for all paths between VM pairs. We observe a wide range of RTT values between VM instances, which can be explained by the geographic distance between CP regions. Furthermore, latency between each pair is relatively stable across different measurements with a 90th-percentile coefficient of variation of less than 0.05.

Figure 34. Distribution of median RTT and coefficient of variation for latency measurements between all VM pairs.

In addition to stability characteristics, we also compare the forward and reverse path latencies by measuring the difference between the median of latencies in each direction. We find that paths exhibit symmetric latencies with a 95th-percentile latency difference of 0.22ms among all paths as shown in Figure 35.

Figure 35. Distribution for difference in latency between forward and reverse paths for unique paths.

Main findings: Cloud paths exhibit a stable and symmetric latency profile over our measurement period, making them ideal for reliable multi-cloud overlays.

6.3.3 Are Multi-Cloud Paths Better Than Single Cloud Paths?

6.3.3.1 Overall Latency Improvements. The distribution of latency reduction percentage for all, intra-CP, and inter-CP paths is shown in Figure 36. From this figure, we observe that about 55%, 76%, and 69% of all, intra-CP, and inter-CP paths experience an improvement in their latency using an indirect optimal path. These optimal paths can be constructed by relaying traffic through one or multiple intermediary CP regions. We provide more details on the intra- and inter-CP optimal overlay paths below.

Figure 36. Distribution for RTT reduction ratio through all, intra-CP, and inter-CP optimal paths.

To complement Figure 36, Figure 37-(left) shows the distribution of the number of relay hops along optimal paths. From this figure, we find that the majority (64%) of optimal paths can be constructed using only one relay hop while some paths can go through as many as 5 relay hops. Almost all of the optimal paths with latency reductions greater than 30% have less than 4 relay hops as shown in Figure 37-(right). In addition, we observe that the median of latency reduction percentage increases with the number of relay hops. We note that (a) forwarding traffic through additional relay hops might have negative effects (e.g., increase in latencies) and (b) optimal paths with many relay hops might have an alternative path with fewer hops and comparable performance.

Figure 37. Distribution for the number of relay hops along optimal paths (left) and the distribution of latency reduction percentage for optimal paths grouped based on the number of relay hops (right).

Lastly, we measure the prevalence of each CP along optimal paths and find that AWS, Azure, and GCP nodes are selected as relays for 55%, 48%, and 28% of optimal paths.

6.3.3.2 Intra-CP Latency Improvements. We present statistics on the possibility of optimal overlay paths that are sourced and destined towards the same CP network (i.e. intra-CP overlays). Figure 38 depicts the distribution of latency reduction ratio for intra-CP paths of each CP. The distributions are grouped based on the CP network. Furthermore, each boxplot's color represents the ownership of relay nodes with A, Z, and G corresponding to AWS, Azure, and GCP relays respectively. From this figure, we observe that intra-CP paths can benefit from relay nodes within their own network in addition to nodes from other CPs. Furthermore, we observe that intra-CP paths within GCP's network observe the greatest reduction in latency among all CPs with AWS relays being the most effective in lowering the end-to-end latency. Upon closer examination, we observe that the majority of these paths correspond to GCP regions within Europe communicating with GCP regions in either India or Hong Kong.

Figure 38. Distribution of latency reduction percentage for intra-CP paths of each CP, divided based on the ownership of the relay node.

Main findings: Our measurements demonstrate that, surprisingly, intra-CP paths can observe end-to-end latency reductions via optimal paths that are constructed with relay hops that belong to a different CP.

6.3.3.3 Inter-CP Latency Improvements. We next focus on the possibility of overlay paths that are sourced from one CP but destined towards a different CP (i.e. inter-CP overlays). Figure 39 presents the latency reduction percentage for inter-CP paths. For brevity, only one direction of each CP pair is presented as the reverse direction is identical.
Similar to Figure 38, the color and label encoding of each boxplot represent the ownership of relay nodes.

Figure 39. Distribution of latency reduction ratio for inter-CP paths of each CP, divided based on the ownership of the relay nodes.

From this figure, we make a number of observations. First, optimal paths constructed using GCP nodes as relays exhibit the least amount of latency reduction. Second, AWS-AZR paths have lower values of latency reduction with equal amounts of reduction across each relay type. This is indicative of a tight coupling between these networks. Lastly, optimal paths with AWS relays tend to have higher latency reductions, which is in line with our observations in §6.3.2.1 regarding AWS' backbone.

Main findings: Similar to intra-CP paths, inter-CP paths can benefit from relay nodes to construct new, optimal paths with lower latencies. Moreover, inter-CP paths tend to experience greater reductions in their latency.

6.3.4 Are there Challenges in Creating Multi-Cloud Overlays?

6.3.4.1 Traffic Costs of CP Backbones. We turn our focus to the cost of sending traffic via CP backbones. Commonly, CPs charge their customers for traffic that is transmitted from their VM instances. That is, customers are charged only for egress traffic; all ingress traffic is free. Moreover, traffic is billed on a per-volume basis (e.g., per GB of egress traffic) but each CP has a different set of rules and rates that govern their pricing policy. For example, we find that AWS and GCP have lower rates for traffic that remains within their network (i.e. is sourced and destined between different regions of their network) while Azure is agnostic to the destination of the traffic. Furthermore, GCP has different rates for traffic destined to the Internet based on the geographic region of the destination address. We compile all these pricing policies based on the information that each CP provides on their webpage Amazon (2019c); Google (2019); Microsoft (2019) into a series of rules that allow us to infer the cost of transmitting traffic from each CP instance to other destinations.

Traffic costs for AWS. For AWS (see Figure 40), we observe that intra-CP traffic is always cheaper than inter-CP traffic with the exception of traffic that is sourced from Australia and Korea. Furthermore, traffic sourced from the US, Canada, and European regions has the lowest rate while traffic sourced from Brazil has the highest charge rate per volume of traffic. Lastly, traffic is priced in multiple tiers defined based on the volume of exchanged traffic and we see that exchanging extra traffic leads to lower charging rates.

Figure 40. Cost of transmitting traffic sourced from different groupings of AWS regions. Dashed (solid) lines present inter-CP (intra-CP) traffic cost.

Traffic costs for Azure. Azure's pricing policy is much simpler (see Figure 41). Global regions are split into multiple large areas, namely (i) North America and Europe excluding Germany, (ii) Asia and Pacific, (iii) South America, and (iv) Germany. Each of these areas has a different rate, with North America and Europe being the cheapest while traffic sourced from South America can cost up to 3x more than North America.
Lastly, as mentioned earlier, Azure is agnostic to the destination of traffic and does not differentiate between intra-CP traffic and traffic destined to the Internet.

Figure 41. Cost of transmitting traffic sourced from different groupings of Azure regions.

Traffic costs for GCP. GCP's pricing policy is the most complicated among the top 3 CPs (see Figure 42). At a high level, GCP's pricing policy can be determined based on (i) source region, (ii) destination geographic location, and (iii) whether the destination is within GCP's network or the Internet (intra-CP vs inter-CP). Intra-CP traffic generally has a lower rate compared to inter-CP traffic. Furthermore, traffic destined to China (excluding Hong Kong) and Australia has higher rates compared to other global destinations.

Figure 42. Cost of transmitting traffic sourced from different groupings of GCP regions. Solid, dashed, and dotted lines represent cost of traffic destined to China (excluding Hong Kong), Australia, and all other global regions, respectively.

6.3.5 Cost Penalty for Multi-Cloud Overlays. Next, we seek an answer to the question of the cost incurred by using relay nodes from other CPs. Figure 43 depicts the distribution of cost penalty (i.e. the difference between the optimal overlay cost and default path cost) within various latency reduction percentage bins for transmitting 1TB of traffic.

Figure 43. Distribution of cost penalty within different latency reduction ratio bins for intra-CP and inter-CP paths.

From Figure 43, we make a number of key observations. First, we find that optimal paths between intra-CP endpoints incur higher cost penalties compared to inter-CP paths. This is expected as intra-CP paths tend to have lower charging rates and optimal overlays usually pass through a 3rd-party CP's backbone. Counter-intuitively, we next observe that the median cost penalty for paths with the greatest latency reduction is less than or equal to that of less optimal overlay paths. Lastly, we find that 2 of our optimal overlay paths have a negative cost penalty. That is, the optimal path costs are less than transmitting traffic directly between the endpoints. Upon closer inspection, we find that these paths are destined to the AWS Australia region and are sourced from GCP regions in Oregon, US, and Montreal, Canada, respectively. All of these paths benefit from AWS' lower transit cost towards Australia by handing off their traffic towards a nearby AWS region. Motivated by this observation, for each pair of endpoints we find the path with the minimum cost. We find that the cost of traffic sourced from all GCP regions (except for GCP Australia) and destined to AWS Australia can be reduced by 28% by relying on AWS' network as a relay hop. These cost-optimal paths on average experience a 72% inflation in their latency.

Main findings: The added cost of overlay networks is not highly prohibitive.
In addition to the inherent benefits of multi-cloud settings, our results demonstrate that enterprises and cloud users can construct high-performance overlay networks atop multi-cloud underlays in a cost-aware manner.

6.3.6 Further Optimization Through IXPs. Motivated by the observations within Kotronis et al. (2016), we investigate the possibility of creating optimal inter-CP paths via IXP relays. Using this approach, an enterprise (or possibly a third-party relay service provider) would peer with CPs at IXPs that have multiple CPs present and would relay traffic between their networks. We should note that the results presented in this section offer upper bounds on the amount of latency reduction, and realization of these values in practice is dependent on several factors: (i) the enterprise or a third-party entity should be present at IXP relay points and has to peer with the corresponding CPs, (ii) relay nodes should implement an address translation scheme since CPs would only route traffic to destination addresses within a peer's address space, and (iii) CPs could have restrictions on which portion of their network is reachable from each peering point and therefore a customer's cloud traffic might not be routable to certain IXP relays.

Towards this goal, we gather a list of ∼20k IXP tenant interface addresses using CAIDA's aggregate IXP dataset CAIDA (2018), corresponding to 741 IXPs in total. We limit our focus to 143 IXPs which host more than one of our target CPs (i.e. an enterprise or third-party relay provider has the opportunity to peer with more than one CP). Given that IXP tenants can peer remotely, we limit our focus to the interface addresses of CPs within an IXP and perform path and latency probes using the same methodology described in §6.3.1. We approximate the latency of each CP region towards each unique IXP by relying on the median of measured latencies. We augment our connectivity graph by creating nodes for each IXP and place an edge between IXPs and the regions of each CP that is a tenant of that IXP. Furthermore, we annotate edges with their corresponding measured minimum latency. Using this augmented graph we measure the optimal overlay paths between CP region pairs. Out of the 4.56k paths, about 3.21k (compared to 3.16k CP-based overlays) can benefit from overlay paths that have IXP relay nodes. About 0.19k of the optimized overlay paths exclusively rely on IXP relays, i.e. CPs do not appear as relay nodes along the path. Figure 44 depicts the distribution of latency reduction percentage for optimal paths using CP relays, IXP relays, and a combination of CP and IXP relays. From this figure, we observe that IXP relays offer minimal improvement to multi-cloud overlay paths, indicating that CP paths are extremely optimized and that CPs tend to leverage peering opportunities with other CPs when available.

Figure 44. Distribution for RTT reduction percentage through CP, IXP, and CP+IXP relay paths.

We should note that the results in this section explore a hypothetical relay service provider that only operates within IXPs that host more than one CP. Further improvements in multi-cloud connectivity via dark fiber paths between IXPs/colos hosting a single CP are part of future work that we would like to investigate.
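As a rough sketch of this augmentation step, the snippet below adds IXP nodes to the model graph from §6.2.3 and connects them to the CP regions present at each IXP, annotating the new edges with latency estimates. The IXP names, memberships, latencies, and the zero-cost treatment of IXP hops are simplifying placeholders, not measured values or actual pricing.

```python
# Sketch: augment the region-level graph with IXP relay nodes. IXP
# memberships and latency values below are illustrative placeholders.
import networkx as nx

def add_ixp_relays(G, ixp_members, ixp_latency_ms):
    """ixp_members maps an IXP name to the CP regions present at that IXP;
    ixp_latency_ms maps (region, ixp) pairs to an estimated latency in ms."""
    for ixp, regions in ixp_members.items():
        G.add_node(ixp, kind="ixp")
        for region in regions:
            lat = ixp_latency_ms[(region, ixp)]
            # Simplifying assumption: IXP hops add no traffic cost of their own.
            G.add_edge(region, ixp, latency=lat, cost_per_gb=0.0)
            G.add_edge(ixp, region, latency=lat, cost_per_gb=0.0)

# Example usage with made-up membership and latency values:
G = nx.DiGraph()
add_ixp_relays(
    G,
    {"ixp-example": ["aws-eu-west", "gcp-eu-west"]},
    {("aws-eu-west", "ixp-example"): 1.2, ("gcp-eu-west", "ixp-example"): 1.5},
)
```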
6.4 Evaluation of Tondbaz

6.4.1 Case Studies of Optimal Paths. Given the large number of possible paths between all CP regions, we select a handful of large-scale areas that are most likely to be utilized by enterprises with WAN deployments. For each set of regions, we present the optimal path and discuss the cost penalty of traversing this path.

US East - US West: All of our target CPs have representative regions near northern Virginia on the east coast of the US. Contrary to the east coast, CP regions on the west coast are not concentrated in a single area. AZR is the only CP with a region within the state of Washington, GCP and AWS have deployments within Oregon, and AWS and AZR have regions in northern California while GCP has a region in southern California. The shortest path between US coasts is possible through an overlay between AZR on the east coast and AWS in northern California with a median RTT of 59.1 ms and a traffic cost of about $86 for transmitting 1 TB of data. By accepting a 1 ms increase in RTT, the cost of transmitting 1 TB of traffic can be reduced to $10 by utilizing GCP regions on both US coasts.

US East - Europe: For brevity, we group all European regions together. The optimal path between the east coast of the US and Europe is between AWS in northern Virginia and AZR in Ireland with a median RTT of 66.4 ms and a traffic cost of about $90 for transmitting 1 TB of data. The cost of traffic can be reduced to $20 for 1 TB of data by remaining within AWS' network and transmitting traffic between AWS in northern Virginia and AWS in Ireland with an RTT of 74.7 ms.

US East - South America: All CP regions within South America are located in São Paulo, Brazil. The optimal path between these areas is established through AZR in northern Virginia and AWS in Brazil. The median RTT for this path is 116.7 ms and transmitting 1 TB of data would cost about $87. Interestingly, this optimal path also has the lowest cost for transmitting data between these areas.

US East - South Africa: AZR is the only CP that is present in South Africa. The optimal paths between each CP's region in northern Virginia and AZR's region in South Africa all have the same RTT of about 231 ms, with the traffic cost for sourcing traffic from AZR, AWS, and GCP in northern Virginia being $86, $90, and $110, respectively.

US West - South America: As stated earlier, the CP regions on the west coast of the US are not concentrated in a single area. The optimal path from all CP regions on the west coast is between GCP in southern California and GCP in Brazil with an RTT of 167.5 ms and a traffic cost of $80 for exchanging 1 TB of data. The optimal path from northern California is made possible through AZR's region in northern California and AWS in Brazil with an RTT of 169.5 ms and a traffic cost of $86.5 for 1 TB of data. The cheapest path for exchanging 1 TB of traffic is made possible through AWS in northern California and AWS in Brazil for $20 with an RTT of 192.2 ms.

US West - Asia East: For East Asia, we consider regions within Japan, South Korea, Hong Kong, and Singapore. The optimal path between the west coast of the US and East Asia is possible through GCP's region on the US west coast and GCP in Tokyo, Japan, with an RTT of 88.5 ms and a traffic cost of $80 for transmitting 1 TB of traffic. Optimal paths destined to AZR in Japan tend to go through GCP relays. The cheapest path for transmitting 1 TB of data from the US west coast to East Asia is possible through AWS in Oregon and AWS in Japan for $20 and a median RTT of 98.6 ms.
US West - Australia: AWS and GCP both have regions within Sydney, Australia, while AZR has regions in Sydney, Canberra, and Melbourne, Australia. The optimal path from the US west coast towards Australia is sourced from GCP in southern California and GCP in Australia with a median RTT of 137 ms and a traffic cost of $150 for 1 TB of data. The next optimal path is possible through AWS in Oregon and AWS in Australia with a median RTT of 138.9 ms and a traffic cost of $20 for 1 TB of data. Optimal paths for other combinations of regions typically benefit from going through GCP and AWS relays, with the latter option having lower traffic cost.

India - Europe: All 3 CPs have regions within Mumbai, India. Furthermore, AZR has 2 more regions within India, namely Pune and Chennai. The optimal path from India towards Europe is sourced from AWS in Mumbai and destined to AWS in France with a median RTT of 103 ms and a traffic cost of $86 for 1 TB of data. The cheapest path is sourced from GCP India and destined to GCP in Belgium with a traffic cost of $80 for 1 TB of data and a median RTT of 110 ms.

6.4.2 Deployment of Overlays. In this section, we demonstrate how Tondbaz creates multi-cloud overlays, empirically measure the latency reduction through the overlay, and contrast it with Tondbaz's estimated latency reductions based on its internal model.

Figure 45. Overlay network composed of 2 nodes (VM1 and VM3) and 1 relay node (VM2). Forwarding rules are depicted below each node.

Network overlays can be created either at the application layer or happen transparently at the network layer. In the former case, each application is responsible for incorporating the forwarding logic into the program, while in the latter case applications need not be aware of the forwarding logic within the overlay and simply need to utilize IP addresses within the overlay domain. Given the wide range of applications that could be deployed within a cloud environment, we chose to create multi-cloud overlays at the network level. The construction of overlays consists of several high-level steps, namely (i) identifying a private overlay subnet which does not overlap with the private address space of participating nodes, (ii) assigning unique IP addresses to each overlay node including relay nodes, (iii) creating virtual tunneling interfaces and assigning their next-hop address based on the inferred optimal overlay path, and (iv) creating forwarding rules for routing traffic through the correct tunneling interface.

To illustrate these steps, consider the example overlay network in Figure 45 composed of two nodes (VM1 and VM3) and one relay node (VM2). Each node has a default interface (highlighted in blue) that is connected to the public Internet. Furthermore, each node can have one or two virtual tunneling interfaces depending on whether it is a regular or a relay node in the overlay, respectively. Below each node, the forwarding rules to support the overlay network are given. Based on the given forwarding rules, a data packet sourced from an application on VM1 destined to ip_d on VM3 would be forwarded to interface ip_a where the packet would be encapsulated inside an additional IP header and forwarded to ip_y on VM2. Upon the receipt of this packet, VM2 would decapsulate the outer IP header and, since the packet is destined to ip_d, it would be forwarded to ip_c where it would be encapsulated once again inside an IP header destined to ip_z. Once VM3 receives the packet, it would decapsulate the outer IP header and forward the data packet to the corresponding application on VM3.
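A minimal sketch of steps (iii) and (iv) is shown below, using Python to drive standard iproute2 commands that create an IPIP tunnel toward the next relay hop and route overlay destinations through it. The interface names, addresses, and the choice of plain IPIP encapsulation (rather than the WireGuard setup discussed next) are assumptions for illustration only.

```python
# Sketch of overlay plumbing on one node: create a tunnel interface toward
# the next hop of the inferred optimal path and route overlay prefixes
# through it. Addresses and names are illustrative placeholders.
import subprocess

def sh(cmd):
    # Run an iproute2 command; a real agent would add error handling/rollback.
    subprocess.run(cmd.split(), check=True)

def setup_tunnel(local_pub, peer_pub, overlay_addr, overlay_dsts, dev="tnl0"):
    # (iii) virtual tunneling interface toward the next relay hop (IPIP).
    sh(f"ip link add name {dev} type ipip local {local_pub} remote {peer_pub}")
    sh(f"ip addr add {overlay_addr} dev {dev}")
    sh(f"ip link set {dev} up")
    # (iv) forwarding rules: send overlay destinations through the tunnel.
    for dst in overlay_dsts:
        sh(f"ip route add {dst} dev {dev}")

# Example: VM1 tunneling toward relay VM2 for traffic destined to VM3's
# overlay address (all values are placeholders).
setup_tunnel("198.51.100.10", "203.0.113.20", "10.10.0.1/24", ["10.10.0.3/32"])
```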
Initially, we implemented the overlay construction mechanism using the IPIP module of Linux, which simply encapsulates packets within an IP header without applying any encryption to the payload. Although we were able to establish overlay tunnels within GCP's and AWS' networks, for an unknown reason Azure's network would drop our tunneled packets. For this reason, we migrated our tunneling mechanism to WireGuard WireGuard (2019), which encrypts the payload and encapsulates the encrypted content within an IP+UDP header. This encapsulation mechanism has a minimum of 28 Bytes of overhead, corresponding to 8 Bytes for the UDP header plus a minimum of 20 Bytes for the IP header, which translates to less than 2% overhead for a 1500-Byte MTU.

6.4.2.1 Empirical vs Estimated Overlay Latencies. As stated earlier, given the large number of possibilities for creating overlay networks, we limit our focus to a handful of cases where Tondbaz estimated a reduction in end-to-end latency through an overlay network. Table 9 lists the set of selected endpoints, the number of relay nodes, the RTT of the default path, and Tondbaz's estimated RTT through the optimal overlay. While limited in number, the selected overlay paths represent different combinations of CP networks, geographic regions, numbers of relays, and latency reductions. Additionally, we list the set of relay nodes for each selected pair of endpoints within Table 10.

Table 9. List of selected overlay endpoints (first two columns) along with the number of relay nodes for each overlay (third column). The default RTT, estimated overlay RTT, empirical RTT, and RTT saving are presented in the last four columns.

source | destination | relays | default RTT (ms) | overlay RTT (ms) | empiric RTT (ms) | RTT saving (ms)
AWS Hong Kong | GCP Hong Kong | 1 | 15.79 | 2.14 | 2.25 | 13.54
AZR Wyoming | GCP Oregon | 1 | 49.91 | 33.74 | 33.86 | 16.05
GCP India | GCP Germany | 2 | 351.5 | 148.8 | 149.02 | 202.48
GCP Singapore | AZR UAE | 3 | 250.08 | 83.56 | 84.42 | 165.66

Table 10. List of selected overlay endpoints (first two columns) along with the optimal relay nodes (third column).

source | destination | relays
AWS Hong Kong | GCP Hong Kong | AZR Hong Kong
AZR Wyoming | GCP Oregon | AZR Washington
GCP India | GCP Germany | AWS India - AWS Germany
GCP Singapore | AZR UAE | AZR Singapore - AZR S.India - AZR W.India

For each overlay network, we conduct latency probes over the default and overlay paths for the full duration of a day using 5-minute rounds. Within each round we send 5 latency probes towards each destination address, resulting in a total of about 1.4k measurement samples per endpoint. Additionally, we also probe each VM's default interface address to obtain a baseline of the latency that is needed to traverse the network stack on each VM node. Similar to our observations in §6.3.2, the measured latencies exhibit tight distributions over both default and overlay paths with a coefficient of variation of less than 0.06. The last column in Table 9 presents the median empirical latency over the overlay paths. For all overlay paths, we observe that Tondbaz's estimate deviates by less than 1ms from the empirical measures, with paths having a greater number of relays exhibiting larger amounts of deviation. The observed deviation values are in line with our estimates of network stack traversal overhead for each VM (median of 0.1ms).
Summary: In this section, we demonstrated Tondbaz's overlay construction strategy and showcased its applicability through the construction of 4 optimal overlays. Although limited in number, these overlays exhibit the accuracy of Tondbaz's internal model in assessing overlay end-to-end latency.

6.5 Summary

Market push indicates that the future of enterprises is multi-cloud. Unfortunately, there is a technology pull: what is critically lacking is a framework for seamlessly gluing public cloud resources together in a cost- and performance-aware manner. A key reason behind this technology pull is the lack of understanding of the path, delay, and traffic-cost characteristics of CPs' private backbones. In this chapter, we presented Tondbaz as a cloud-centric measurement platform and decision support framework for multi-cloud environments. We demonstrate the applicability of our framework by deploying it on the global cloud regions of AWS, Azure, and GCP. Our cloud-centric measurement study sheds light on the characteristics of CPs' (private) backbones and reveals several new/interesting insights including optimal cloud backbones, lack of delay and path asymmetries in cloud paths, possible latency improvements in inter- and intra-cloud paths, and traffic-cost characteristics. We present recommendations regarding optimal inter-CP paths for select geographic region pairs. Lastly, we construct a handful of overlay networks, empirically measure the latency through the overlay network, and contrast our measures with Tondbaz's internal model.

CHAPTER VII
CONCLUSIONS & FUTURE WORK

7.1 Conclusions

Cloud providers have been transformative in how enterprises conduct their business. By virtualizing vast compute and storage resources through centralized data-centers, cloud providers have been an attractive alternative to maintaining in-house infrastructure and have been well adopted by the private and public sectors. While cloud resources have been the center of many research studies, little attention has been paid to the connectivity of cloud providers and their effect on the topological structure of the Internet. In this dissertation we presented a holistic analysis of cloud providers and their role in today's Internet and made the following conclusions:
– Cloud providers in conjunction with CDNs are the major content providers in the Internet and collectively are responsible for a significant portion of an edge network's traffic;
– Similar to CDN networks, cloud providers have been making efforts to reduce their network distance by expanding the set of centralized compute regions in addition to offering new peering services (VPIs) to edge networks;
– In terms of the connectivity of an enterprise towards cloud providers, many factors including the type of connectivity (CPP, TPP, and BEP), cloud providers' routing strategies, geo-proximity of cloud resources, and cross-traffic and congestion of TPP networks should be taken into consideration;
– The optimal backbones of cloud providers in combination with the tight interconnectivity of cloud provider networks with each other can be leveraged towards the creation of optimal overlays that have a global span.
Specifically, we have made the following contributions in each chapter. In Chapter III we utilized traffic traces from an edge network (UOnet) to study the traffic footprint of major content providers and, more specifically, outline the degree to which their content is served from nearby locations.
We demonstrated that the majority of traffic is associated with CDN and cloud providers networks. Furthermore, we devised a technique to identify cache servers residing within other networks which further enlarging the share of CDN networks towards traffic. Lastly, we quantify the effects of content locality on user-perceived performance and observe that many other factors such as last-mile connectivity are the main bottlenecks of performance for end-users. In Chapter IV we present a measurement study of the interconnectivity fabric of Amazon as the largest cloud provider. We pay special attention to VPIs as an emergent and increasingly popular interconnection option for entities such as enterprises that desire highly elastic and flexible connections to cloud providers which bypass the public Internet. We present a methodology for capturing VPIs and offer lower bound estimates on the number of VPI peerings that Amazon utilizes. Next, we present a methodology for geolocating both ends of our inferred peerings. Lastly, we characterize customer networks that peer over various peering options (private, public, VPI) and offer insight into the visibility and routing implications of each peering type from the cloud providers’ perspective. In Chapter V we perform a third-party measurement study to understand the tradeoffs between three multi-cloud connectivity options (CPP, TPP, and 212 BEP). Based on our cloud-centric measurements, we find that CPP routes are better than TPP routes in terms of latency as well as throughput. We attribute the observed performance benefits to CPs’ rich connectivity with other CPs and CPs’ stable and well-designed private backbones. Additionally, we characterize the routing strategies of CPs (hot- cold- potato routing) and highlight their implications on end-to-end network performance metrics. Lastly, we identify that subpar performance characteristics of TPP routes are caused by several factors including border routers, queuing delays, and higher loss-rates on these paths. In Chapter VI we propose and design Tondbaz as a measurement platform and decision support framework for multi-cloud settings. We demonstrate its applicability by conducting path and latency measurements between the global regions of AWS, Azure, and GCP networks. Our measurements highlight the tight interconnectivity of cloud providers networks on a global scale with backbones offering reliable connectivity to their customers. We utilize Tondbaz to measure optimal cloud overlays between various endpoints and by establishing traffic cost models for each cloud provider and inputting them to the decision support framework of Tondbaz we offer insight into the tradeoffs of cost vs performance. Next, we offer recommendations regarding the best connectivity paths between various geographic regions. Lastly, we deploy a handful of overlay networks and through empirical measurements, demonstrate the accuracy of Tondbaz’s network performance estimates based on its internal model. 7.2 Future Work In the following, we present several possible directions for future work that are in line with the presented dissertation. 213 – Exploring the possibility of further improving the connectivity of multi-cloud paths via the utilization of dark fiber links either by cloud providers or third- party connectivity providers is an open research problem. 
Investigating these possibilities can be beneficiary in obtaining improved multi-cloud connectivity in addition to improving the connectivity of poorly connected cloud regions; – Complementary to our work in Chapter VI, one could measure and profile the connectivity and performance of edge networks towards cloud providers. Profiling the last mile of connectivity between edge users and cloud providers is equally important to study of cloud providers’ backbone performance. This study in conjunction with the optimal cloud overlays generated by Tondbaz would enable us to provide estimates on the performance characteristics of connectivity between edge-users which is facilitated via optimal cloud overlays. In light of the rapid expansion of cloud providers backbones and their increasing role in the transit of Internet traffic conducting this study is of high importance and can provide insight into possible directions of end-user connectivity; – In the current state of the Internet end-users are accustomed to utilizing many free services such as email, video streaming, social media networks, etc. The majority of these free services are funded via targeted advertisement platforms that rely on constructing accurate profiles of users based on their personal interests. These Internet services are based on an economic model of exchanging a user’s personal data and time in return for utilizing free services. The past years have seen an increased interest in the development and adoption of decentralized alternatives Calendar (2019); Docs (2019); Fediverse (2019); Forms.id (2019); IPFS (2019); Mastodon (2019); PeerTube 214 (2019) for many Internet services. These decentralized applications rely on strong cryptography to ensure that only users with proper access/keys have access to the data. Furthermore, given their decentralized nature, the governance of data is not in the hands of a single entity. Decentralized or P2P services can have varying performance depending on the state of the network. The constant push of cloud providers for increasing their locality to end-users in conjunction with the vast amount of storage and compute resources within cloud regions makes them an ideal candidate for having a hybrid deployment of these decentralized services, where part of the deployment is residing on cloud regions and the remainder is deployed on end-user. Studying the performance of decentralized services in a hybrid deployment and contrasting it with their centralized counterparts would facilitate the wide adoption of these services by end-users. Furthermore, estimating the per user operational cost of running these services within cloud environments could be helpful in the advocacy of democratized Internet services; – The rise in multi-cloud deployments by enterprises has fueled the emergence/expansion of cloud providers as well as third-party connectivity providers. The stakeholders in a multi-cloud setting including cloud providers, third-party connectivity providers, and enterprises can have incongruent goals or objectives. For example, cloud providers are interested in maximizing their profit by following certain routing policies while an enterprise is interested in maximizing their performance for the lowest operational cost via the adoption of multi-cloud overlays. Furthermore, stakeholders could lack incentive for sharing information retaining to their internal operation with each other. 
For example, cloud providers host applications on a set of heterogeneous hardware 215 which in turn could introduce varying degrees of performance for enterprises. Measuring, modeling and mitigating the tussles between all stakeholders of a multi-cloud ecosystem is crucial for the advancement of multi-cloud deployments. – The optimal overlays outlined in Chapter VI would only be beneficial to enterprises that maintain and manage their compute resources, i.e. they do not rely on the added/managed services that cloud providers offer. For example, an enterprise can maintain its stream processing pipeline using Apache Kafka within their cloud instances or rely on managed services like Amazon MSK Amazon (2019a) or Confluent for GCP users Google (2019). In the former case given that an enterprise is in complete control of the service they can benefit from the overlays that are constructed with Tondbaz while in the later case the network connectivity paths are maintained by the cloud provider. The seamless operation of these managed services in a multi-cloud setting would require the development of interoperability layers between managed services of cloud providers. Furthermore, the optimal operation of these managed services requires additional APIs that expose the network layer and provide finer control to cloud users. – Evaluating the connectivity performance for various third-party connectivity providers (TPPs) and a push for the disclosure of such information via public measurement platforms would be beneficial for enterprises seeking optimal hybrid or multi-cloud deployments; – Exploring the adoption of VPIs by the customers of other cloud providers, in addition to repeating the measurements outlined in Chapter IV on a 216 temporal basis would offer a more comprehensive picture of Internet topology in addition to capturing the micro-dynamics of Internet peering enabled by VPIs; 217 REFERENCES CITED Adhikari, V. K., Guo, Y., Hao, F., Varvello, M., Hilt, V., Steiner, M., & Zhang, Z.-l. (2012). Unreeling Netflix: Understanding and Improving Multi-CDN Movie Delivery. In INFOCOM. IEEE. Ager, B., Chatzis, N., Feldmann, A., Sarrar, N., Uhlig, S., & Willinger, W. (2012). Anatomy of a large european ixp. SIGCOMM CCR. Ager, B., Mühlbauer, W., Smaragdakis, G., & Uhlig, S. (2011). Web content cartography. In Internet Measurement Conference (IMC). ACM. Akamai. (2017). Akamai Technologies Facts & Figures. https://www.akamai.com/us/en/about/facts-figures.jsp. Alexander, M., Luckie, M., Dhamdhere, A., Huffaker, B., KC, C., & Jonathan, S. M. (2018). Pushing the boundaries with bdrmapit: Mapping router ownership at internet scale. In Internet Measurement Conference (IMC). Amazon. (2018a). AWS Direct Connect. https://aws.amazon.com/directconnect/. Amazon. (2018b). AWS Direct Connect Frequently Asked Questions. https://aws.amazon.com/directconnect/faqs/. Amazon. (2018c). AWS Direct Connect Partners. https://aws.amazon.com/directconnect/partners/. Amazon. (2018d). AWS Direct Connect | Product Details. https://aws.amazon.com/directconnect/details/. Amazon. (2018e). Describe virtual interfaces. https://docs.aws.amazon.com/cli/latest/reference/directconnect/ describe-virtual-interfaces.html. Amazon. (2018f). Regions and Availability Zones - Amazon Elastic Compute Cloud. https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ using-regions-availability-zones.html#concepts-regions -availability-zones. Amazon. (2019a). Amazon Managed Streaming for Apache Kafka. https://aws.amazon.com/msk/. Amazon. (2019b). 
AWS Transit Gateway. https://aws.amazon.com/transit-gateway/. 218 Amazon. (2019c). EC2 Instance Pricing. https://aws.amazon.com/ec2/pricing/on-demand/. Andersen, D., Balakrishnan, H., Kaashoek, F., & Morris, R. (2001). Resilient overlay networks. In SOSP. ACM. Anwar, R., Niaz, H., Choffnes, D., Cunha, Í., Gill, P., & Katz-Bassett, E. (2015). Investigating interdomain routing policies in the wild. In Internet Measurement Conference (IMC). APNIC. (2018). Measuring IPv6. https://labs.apnic.net/measureipv6/. Augustin, B., Friedman, T., & Teixeira, R. (2007). Multipath tracing with paris traceroute. In End-to-End Monitoring Techniques and Services. IEEE. Augustin, B., Krishnamurthy, B., & Willinger, W. (2009). IXPs: Mapped? In Internet Measurement Conference (IMC). ACM. Baker, F. (1995). Requirements for IP Version 4 Routers (Tech. Rep.). Cisco Systems. Bender, A., Sherwood, R., & Spring, N. (2008). Fixing ally’s growing pains with velocity modeling. In SIGCOMM. ACM. Berman, M., Chase, J. S., Landweber, L., Nakao, A., Ott, M., Raychaudhuri, D., . . . Seskar, I. (2014). GENI: A federated testbed for innovative network experiments. Computer Networks. Beverly, R. (2016). Yarrp’ing the internet: Randomized high-speed active topology discovery. In Internet Measurement Conference (IMC). ACM. Beverly, R., Durairajan, R., Plonka, D., & Rohrer, J. P. (2018). In the IP of the beholder: Strategies for active IPv6 topology discovery. In Internet Measurement Conference (IMC). ACM. Böttger, T., Cuadrado, F., Tyson, G., Castro, I., & Uhlig, S. (2016). Open connect everywhere: A glimpse at the internet ecosystem through the lens of the netflix cdn. arXiv preprint arXiv:1606.05519 . Bozkurt, I. N., Aqeel, W., Bhattacherjee, D., Chandrasekaran, B., Godfrey, P. B., Laughlin, G., . . . Singla, A. (2018). Dissecting latency in the internet’s fiber infrastructure. arXiv preprint arXiv:1811.10737 . Build Azure. (2019). Microsoft Azure Region Map. https://map.buildazure.com/. 219 Burrington, I. (2016). Why Amazon’s Data Centers Are Hidden in Spy Country. https://www.theatlantic.com/technology/archive/2016/01/ amazon-web-services-data-center/423147/. CAIDA. (2018). Archipelago (Ark) measurement infrastructure. http://www.caida.org/projects/ark/. CAIDA. (2018). AS Relationships. http://www.caida.org/data/as-relationships/. CAIDA. (2018). The CAIDA UCSD IXPs Dataset. http://www.caida.org/data/ixps.xml. Calder, M., Fan, X., Hu, Z., Katz-Bassett, E., Heidemann, J., & Govindan, R. (2013). Mapping the Expansion of Google’s Serving Infrastructure. In Internet measurement conference (imc). Calder, M., Flavel, A., Katz-Bassett, E., Mahajan, R., & Padhye, J. (2015). Analyzing the Performance of an Anycast CDN. In Internet Measurement Conference (IMC). ACM. Calder, M., Gao, R., Schröder, M., Stewart, R., Padhye, J., Mahajan, R., . . . Katz-Bassett, E. (2018). Odin: Microsoft’s Scalable Fault-Tolerant {CDN} Measurement System. In NSDI. USENIX. Calendar, S. (2019). Secure Calendar - Free Encrypted Calendar. https://securecalendar.online/. Castro, I., Cardona, J. C., Gorinsky, S., & Francois, P. (2014). Remote Peering: More Peering without Internet Flattening. CoNEXT . Chabarek, J., & Barford, P. (2013). What’s in a name?: decoding router interface names. In Hotplanet. ACM. Chandrasekaran, B., Smaragdakis, G., Berger, A. W., Luckie, M. J., & Ng, K.-C. (2015). A server-to-server view of the internet. In Conext. ACM. Chatzis, N., Smaragdakis, G., Böttger, J., Krenc, T., & Feldmann, A. (2013). 
Chatzis, N., Smaragdakis, G., Böttger, J., Krenc, T., & Feldmann, A. (2013). On the Benefits of Using a Large IXP as an Internet Vantage Point. In Internet Measurement Conference (IMC). ACM.
Chiu, Y.-C., Schlinker, B., Radhakrishnan, A. B., Katz-Bassett, E., & Govindan, R. (2015). Are we one hop away from a better internet? In Internet Measurement Conference (IMC). ACM.
Chun, B., Culler, D., Roscoe, T., Bavier, A., Peterson, L., Wawrzoniak, M., & Bowman, M. (2003). PlanetLab: an overlay testbed for broad-coverage services. SIGCOMM CCR.
Comarela, G., Terzi, E., & Crovella, M. (2016). Detecting unusually-routed ASes: Methods and applications. In Internet Measurement Conference (IMC). ACM.
CoreSite. (2018). The CoreSite Open Cloud Exchange - One Connection. Countless Cloud Options. https://www.coresite.com/solutions/cloud-services/open-cloud-exchange.
Costa, P., Migliavacca, M., Pietzuch, P., & Wolf, A. L. (2012). NaaS: Network-as-a-service in the cloud. In Workshop on Hot Topics in Management of Internet, Cloud, and Enterprise Networks and Services.
Cunha, Í., Marchetta, P., Calder, M., Chiu, Y.-C., Machado, B. V., Pescapè, A., . . . Katz-Bassett, E. (2016). Sibyl: a practical internet route oracle. NSDI.
DatacenterMap. (2018). Amazon EC2. http://www.datacentermap.com/cloud/amazon-ec2.html.
Demchenko, Y., Van Der Ham, J., Ngo, C., Matselyukh, T., Filiposka, S., de Laat, C., & Escalona, E. (2013). Open cloud exchange (OCX): Architecture and functional components. In Cloud Computing Technology and Science. IEEE.
Dhamdhere, A., Clark, D. D., Gamero-Garrido, A., Luckie, M., Mok, R. K., Akiwate, G., . . . Claffy, K. (2018). Inferring persistent interdomain congestion. In SIGCOMM. ACM.
Dhamdhere, A., & Dovrolis, C. (2010). The Internet is Flat: Modeling the Transition from a Transit Hierarchy to a Peering Mesh. In CoNEXT. ACM.
Diamantidis, N., Karlis, D., & Giakoumakis, E. A. (2000). Unsupervised stratification of cross-validation for accuracy estimation. Artificial Intelligence.
Docs, A. (2019). Arcane Docs – Blockchain-based alternative for Google Docs. https://docs.arcaneoffice.com/signup/.
Durairajan, R., Barford, C., & Barford, P. (2018). Lights Out: Climate Change Risk to Internet Infrastructure. In Proceedings of the Applied Networking Research Workshop. ACM.
Durairajan, R., Ghosh, S., Tang, X., Barford, P., & Eriksson, B. (2013). Internet Atlas: a geographic database of the internet. In HotPlanet.
Durairajan, R., Sommers, J., & Barford, P. (2014). Layer 1-Informed Internet Topology Measurement. In Internet Measurement Conference (IMC). ACM.
Durairajan, R., Sommers, J., Willinger, W., & Barford, P. (2015). InterTubes: A Study of the US Long-haul Fiber-optic Infrastructure. In SIGCOMM. ACM.
Durumeric, Z., Wustrow, E., & Halderman, J. A. (2013). ZMap: Fast internet-wide scanning and its security applications. In USENIX Security Symposium.
Eclipse. (2019). Eclipse Paho - MQTT and MQTT-SN Software. http://www.eclipse.org/paho/.
EdgeConneX. (2018). Space, power and connectivity. http://www.edgeconnex.com/company/about/.
Engebretson, J. (2014). Verizon-Netflix dispute: Is Netflix using direct connections or not? https://www.telecompetitor.com/verizon-netflix-dispute-netflix-using-direct-connections/.
The enterprise deployment game-plan: why multi-cloud is the future. (2018). https://blog.ubuntu.com/2018/08/30/the-enterprise-deployment-game-plan-why-multi-cloud-is-the-future.
Equinix. (2017). Cloud Exchange. http://www.equinix.com/services/interconnection-connectivity/cloud-exchange/.
Eriksson, B., Durairajan, R., & Barford, P. (2013). RiskRoute: A Framework for Mitigating Network Outage Threats. CoNEXT. doi: 10.1145/2535372.2535385
European Internet Exchange Association. (2018). https://www.euro-ix.net/.
Example Applications Services. (2018). https://builtin.com/cloud-computing/examples-applications-services.
Fan, X., Katz-Bassett, E., & Heidemann, J. (2015). Assessing Affinity Between Users and CDN Sites. In Traffic Monitoring and Analysis. Springer.
Fanou, R., Francois, P., & Aben, E. (2015). On the diversity of interdomain routing in Africa. In Passive and Active Measurement (PAM).
Fediverse. (2019). Fediverse. https://fediverse.party/.
Five Reasons Why Multi-Cloud Infrastructure is the Future of Enterprise IT. (2018). https://www.cloudindustryforum.org/content/five-reasons-why-multi-cloud-infrastructure-future-enterprise-it.
Fontugne, R., Pelsser, C., Aben, E., & Bush, R. (2017). Pinpointing delay and forwarding anomalies using large-scale traceroute measurements. In Internet Measurement Conference (IMC). ACM.
Fontugne, R., Shah, A., & Aben, E. (2018). The (thin) bridges of AS connectivity: Measuring dependency using AS hegemony. In Passive and Active Measurement (PAM). Springer.
Forms.id. (2019). Private, simple, forms. | Forms.id. https://forms.id/.
The Future of IT Transformation Is Multi-Cloud. (2018). https://searchcio.techtarget.com/Rackspace/The-Future-of-IT-Transformation-Is-Multi-Cloud.
The Future of Multi-Cloud: Common APIs Across Public and Private Clouds. (2018). https://blog.rackspace.com/future-multi-cloud-common-apis-across-public-private-clouds.
The Future of the Datacenter is Multicloud. (2018). https://www.nutanix.com/2018/11/01/future-datacenter-multicloud/.
Gartner. (2016). https://www.gartner.com/doc/3396633/market-trends-cloud-adoption-trends.
Gasser, O., Scheitle, Q., Foremski, P., Lone, Q., Korczyński, M., Strowes, S. D., . . . Carle, G. (2018). Clusters in the expanse: Understanding and unbiasing IPv6 hitlists. In Internet Measurement Conference (IMC). ACM.
Gehlen, V., Finamore, A., Mellia, M., & Munafò, M. M. (2012). Uncovering the big players of the web. In Lecture Notes in Computer Science. Springer.
Gharaibeh, M., Shah, A., Huffaker, B., Zhang, H., Ensafi, R., & Papadopoulos, C. (2017). A look at router geolocation in public and commercial databases. In Internet Measurement Conference (IMC). ACM.
Gill, P., Arlitt, M., Li, Z., & Mahanti, A. (2008). The Flattening Internet Topology: Natural Evolution, Unsightly Barnacles or Contrived Collapse? In Passive and Active Measurement (PAM). Springer.
Giotsas, V., Dhamdhere, A., & Claffy, K. C. (2016). Periscope: Unifying looking glass querying. In Passive and Active Measurement (PAM). Springer.
Giotsas, V., Dietzel, C., Smaragdakis, G., Feldmann, A., Berger, A., & Aben, E. (2017). Detecting peering infrastructure outages in the wild. In SIGCOMM. ACM.
Giotsas, V., Luckie, M., Huffaker, B., & Claffy, K. (2015). IPv6 AS relationships, cliques, and congruence. In Passive and Active Measurement (PAM).
Giotsas, V., Luckie, M., Huffaker, B., et al. (2014). Inferring Complex AS Relationships. In Internet Measurement Conference (IMC). ACM.
Giotsas, V., Smaragdakis, G., Huffaker, B., Luckie, M., & claffy, k. (2015). Mapping Peering Interconnections to a Facility. In CoNEXT.
Giotsas, V., & Zhou, S. (2012). Valley-free violation in internet routing—analysis based on BGP community data. In International Conference on Communications.
Giotsas, V., & Zhou, S. (2013). Improving the discovery of IXP peering links through passive BGP measurements. In INFOCOM.
Giotsas, V., Zhou, S., Luckie, M., & Claffy, K. (2013). Inferring Multilateral Peering. In CoNEXT. ACM.
Google. (2018a). GCP Direct Peering. https://cloud.google.com/interconnect/docs/how-to/direct-peering.
Google. (2018b). Google supported service providers. https://cloud.google.com/interconnect/docs/concepts/service-providers.
Google. (2018c). Partner Interconnect | Google Cloud. https://cloud.google.com/interconnect/partners/.
Google. (2019). Apache Kafka for GCP Users. https://cloud.google.com/blog/products/gcp/apache-kafka-for-gcp-users-connectors-for-pubsub-dataflow-and-bigquery.
Google. (2019). Data center locations. https://www.google.com/about/datacenters/inside/locations/index.html.
Google. (2019). Google Compute Engine Pricing. https://cloud.google.com/compute/pricing#network.
Govindan, R., & Tangmunarunkit, H. (2000). Heuristics for Internet map discovery. In INFOCOM.
Graham, R., Mcmillan, P., & Tentler, D. (2014). Mass Scanning the Internet: Tips, Tricks, Results. In Def Con 22.
Green, T., Lambert, A., Pelsser, C., & Rossi, D. (2018). Leveraging inter-domain stability for BGP dynamics analysis. In Passive and Active Measurement (PAM). Springer.
Gregori, E., Improta, A., Lenzini, L., & Orsini, C. (2011). The impact of IXPs on the AS-level topology structure of the Internet. Computer Communications.
Gunes, M., & Sarac, K. (2009). Resolving IP aliases in building traceroute-based Internet maps. Transactions on Networking (ToN).
Gunes, M. H., & Sarac, K. (2006). Analytical IP alias resolution. In International Conference on Communications.
Gupta, A., Calder, M., Feamster, N., Chetty, M., Calandro, E., & Katz-Bassett, E. (2014). Peering at the Internet's Frontier: A First Look at ISP Interconnectivity in Africa. Passive and Active Measurement (PAM).
Haq, O., Raja, M., & Dogar, F. R. (2017). Measuring and improving the reliability of wide-area cloud paths. In WWW. ACM.
He, Y., Siganos, G., Faloutsos, M., & Krishnamurthy, S. (2009). Lord of the links: a framework for discovering missing links in the Internet topology. Transactions on Networking (ToN).
Hofmann, H., Kafadar, K., & Wickham, H. (2011). Letter-value plots: Boxplots for large data (Tech. Rep.). had.co.nz.
How multi-cloud business models will shape the future. (2018). https://www.cloudcomputing-news.net/news/2018/oct/05/how-multi-cloud-business-models-will-shape-future/.
Huffaker, B., Fomenkov, M., & claffy, k. (2014). DRoP: DNS-based Router Positioning. SIGCOMM CCR.
Huffaker, B., Fomenkov, M., et al. (2014). DRoP: DNS-based router positioning. SIGCOMM CCR.
Huffaker, B., Keys, K., Fomenkov, M., & Claffy, K. (2018). AS-to-organization dataset. http://www.caida.org/research/topology/as2org/.
Hwang, F. K., & Richards, D. S. (1992). Steiner tree problems. Networks.
Hyun, Y. (2006). Archipelago measurement infrastructure. In CAIDA-WIDE Workshop.
IBM bets on a multi-cloud future. (2018). https://www.zdnet.com/article/ibm-bets-on-a-multi-cloud-future/.
IP2Location. (2015). IP2Location DB9, 2015. http://www.ip2location.com/.
IP2Location. (2018). IP address geolocation. https://www.ip2location.com/database/ip2location.
IPFS. (2019). IPFS is the Distributed Web. https://ipfs.io/.
Jacobson, V. (1989). traceroute. ftp://ftp.ee.lbl.gov/traceroute.tar.gz.
Jonathan, A., Chandra, A., & Weissman, J. (2018). Rethinking adaptability in wide-area stream processing systems. In Hot Topics in Cloud Computing. USENIX.
Kang, M. S., & Gligor, V. D. (2014). Routing bottlenecks in the internet: Causes, exploits, and countermeasures. In Computer and Communications Security. ACM.
Kang, M. S., Lee, S. B., & Gligor, V. D. (2013). The Crossfire attack. Symposium on Security and Privacy.
Katz-Bassett, E., Scott, C., Choffnes, D. R., Cunha, Í., Valancius, V., Feamster, N., . . . Krishnamurthy, A. (2012). LIFEGUARD: Practical repair of persistent route failures. In SIGCOMM.
Keys, K., Hyun, Y., Luckie, M., & Claffy, K. (2013). Internet-Scale IPv4 Alias Resolution with MIDAR. Transactions on Networking (ToN).
Khan, A., Kwon, T., Kim, H.-c., & Choi, Y. (2013). AS-level topology collection through looking glass servers. In Internet Measurement Conference (IMC).
Klöti, R., Ager, B., Kotronis, V., Nomikos, G., & Dimitropoulos, X. (2016). A comparative look into public IXP datasets. SIGCOMM CCR.
Knight, S., Nguyen, H. X., Falkner, N., Bowden, R., & Roughan, M. (2011). The Internet topology zoo. Selected Areas in Communications.
Kotronis, V., Klöti, R., Rost, M., Georgopoulos, P., Ager, B., Schmid, S., & Dimitropoulos, X. (2016). Stitching Inter-Domain Paths over IXPs. In Symposium on SDN Research. ACM.
Kotronis, V., Nomikos, G., Manassakis, L., Mavrommatis, D., & Dimitropoulos, X. (2017). Shortcuts through colocation facilities. In Internet Measurement Conference (IMC). ACM.
Krishna, A., Cowley, S., Singh, S., & Kesterson-Townes, L. (2018). Assembling your cloud orchestra: A field guide to multicloud management. https://www.ibm.com/thought-leadership/institute-business-value/report/multicloud.
Labovitz, C., Iekel-Johnson, S., McPherson, D., Oberheide, J., & Jahanian, F. (2010). Internet inter-domain traffic. In SIGCOMM. ACM.
Lad, M., Oliveira, R., Zhang, B., & Zhang, L. (2007). Understanding resiliency of internet topology against prefix hijack attacks. In International Conference on Dependable Systems and Networks.
Lai, F., Chowdhury, M., & Madhyastha, H. V. (2018). To relay or not to relay for inter-cloud transfers? In Workshop on Hot Topics in Cloud Computing.
Li, L., Alderson, D., Willinger, W., & Doyle, J. (2004). A first-principles approach to understanding the internet's router-level topology. In SIGCOMM CCR.
Limelight. (2017). Private global content delivery network. https://www.limelight.com/network/.
Lodhi, A., Larson, N., Dhamdhere, A., Dovrolis, C., et al. (2014). Using peeringDB to understand the peering ecosystem. SIGCOMM CCR.
Luckie, M. (2010). Scamper: a scalable and extensible packet prober for active measurement of the internet. In Internet Measurement Conference (IMC).
Luckie, M., & Beverly, R. (2017). The impact of router outages on the AS-level internet. In SIGCOMM.
Luckie, M., Dhamdhere, A., Huffaker, B., Clark, D., et al. (2016). bdrmap: Inference of Borders Between IP Networks. In Internet Measurement Conference (IMC).
Luckie, M., Huffaker, B., Dhamdhere, A., & Giotsas, V. (2013). AS Relationships, Customer Cones, and Validation. IMC. doi: 10.1145/2504730.2504735
Luckie, M., Huffaker, B., Dhamdhere, A., Giotsas, V., et al. (2013). AS relationships, customer cones, and validation. In Internet Measurement Conference (IMC). ACM.
Luckie, M., et al. (2014). A second look at detecting third-party addresses in traceroute traces with the IP timestamp option. In Passive and Active Measurement (PAM).
Marder, A., & Smith, J. M. (2016). MAP-IT: Multipass Accurate Passive Inferences from Traceroute. In Internet Measurement Conference (IMC). ACM.
Mastodon. (2019). Giving social networking back to you. https://joinmastodon.org/.
Mathis, M., Semke, J., Mahdavi, J., & Ott, T. (1997). The macroscopic behavior of the TCP congestion avoidance algorithm. SIGCOMM CCR.
MaxMind. (2018). GeoIP2 databases. https://www.maxmind.com/en/geoip2-databases.
Megaport. (2019a). Megaport Pricing. https://www.megaport.com/pricing/.
Megaport. (2019b). Nine Common Scenarios of multi-cloud design. https://knowledgebase.megaport.com/megaport-cloud-router/nine-common-scenarios-for-multicloud-design/.
Microsoft. (2018a). Azure ExpressRoute. https://azure.microsoft.com/en-us/services/expressroute/.
Microsoft. (2018b). ExpressRoute connectivity partners. https://azure.microsoft.com/en-us/services/expressroute/connectivity-partners/.
Microsoft. (2018c). ExpressRoute partners and peering locations. https://docs.microsoft.com/en-us/azure/expressroute/expressroute-locations.
Microsoft. (2019). Bandwidth Pricing. https://azure.microsoft.com/en-us/pricing/details/bandwidth/.
Miller, R. (2015). Regional Data Center Clusters Power Amazon's Cloud. https://datacenterfrontier.com/regional-data-center-clusters-power-amazons-cloud/.
M-Lab. (2018). NDT (Network Diagnostic Tool). https://www.measurementlab.net/tests/ndt/.
Motamedi, R., Yeganeh, B., Chandrasekaran, B., Rejaie, R., Maggs, B., & Willinger, W. (2019). On Mapping the Interconnections in Today's Internet. Transactions on Networking (ToN).
NetAcuity. (2018). Industry-standard geolocation. https://www.digitalelement.com/solutions/.
Netflix. (2017a). Internet connection speed requirements. https://help.netflix.com/en/node/306.
Netflix. (2017b). Open Connect Appliance Overview. https://openconnect.netflix.com/en/appliances-overview/.
Nomikos, G., & Dimitropoulos, X. (2016). traIXroute: Detecting IXPs in traceroute paths. In Passive and Active Measurement (PAM).
Nomikos, G., Kotronis, V., Sermpezis, P., Gigis, P., Manassakis, L., Dietzel, C., . . . Giotsas, V. (2018). O Peer, Where Art Thou?: Uncovering Remote Peering Interconnections at IXPs. In Internet Measurement Conference (IMC).
Nur, A. Y., & Tozal, M. E. (2018). Cross-AS (X-AS) internet topology mapping. Computer Networks.
OASIS. (2019). MQTT. http://mqtt.org/.
One-Way Ping (OWAMP). (2019). http://software.internet2.edu/owamp/.
Orsini, C., King, A., Giordano, D., Giotsas, V., & Dainotti, A. (2016). BGPStream: a software framework for live and historical BGP data analysis. In Internet Measurement Conference (IMC). ACM.
Packet Clearing House. (2017). Routing archive. https://www.pch.net.
Packet Clearing House. (2018). MRT Routing Updates. https://www.pch.net/resources/Raw_Routing_Data/.
PacketFabric. (2019). Cloud Connectivity. https://www.packetfabric.com/packetcor#pricing.
Padhye, J., Firoiu, V., Towsley, D., & Kurose, J. (1998). Modeling TCP throughput: A simple model and its empirical validation. SIGCOMM CCR.
Padmanabhan, V. N., & Subramanian, L. (2001). An investigation of geographic mapping techniques for internet hosts. In SIGCOMM CCR.
Palmer, C. R., Siganos, G., Faloutsos, M., Faloutsos, C., & Gibbons, P. (2001). The connectivity and fault-tolerance of the internet topology. In Workshop on Network-Related Data Management (NRDM).
PeeringDB. (2017). Exchange Points List. https://peeringdb.com/.
PeerTube. (2019). Join PeerTube. https://joinpeertube.org/.
Plaven, G. (2017). Amazon keeps building data centers in Umatilla, Morrow counties. http://www.eastoregonian.com/eo/local-news/20170317/amazon-keeps-building-data-centers-in-umatilla-morrow-counties.
Pureport. (2019). Pricing - Pureport. https://www.pureport.com/pricing/.
Quan, L., Heidemann, J., & Pradkin, Y. (2013). Trinocular: Understanding internet reliability through adaptive probing. In SIGCOMM CCR.
Richter, P., Smaragdakis, G., Feldmann, A., Chatzis, N., Boettger, J., & Willinger, W. (2014). Peering at peerings: On the role of IXP route servers. In Internet Measurement Conference (IMC).
RIPE. (2018). Routing Information Service (RIS). https://www.ripe.net/analyse/internet-measurements/routing-information-service-ris.
RIPE. (2019). RIPE RIS.
RIPE NCC. (2016). RIPE Atlas.
Robusto, C. C. (1957). The cosine-haversine formula. The American Mathematical Monthly.
SamKnows. (2018). The internet measurement standard. https://www.samknows.com/.
Sanchez, M. A., Bustamante, F. E., Krishnamurthy, B., Willinger, W., Smaragdakis, G., & Erman, J. (2014). Inter-domain traffic estimation for the outsider. In Internet Measurement Conference (IMC).
Sánchez, M. A., Otto, J. S., Bischof, Z. S., Choffnes, D. R., Bustamante, F. E., Krishnamurthy, B., & Willinger, W. (2013). Dasu: Pushing experiments to the Internet's edge. In NSDI.
Scheitle, Q., Gasser, O., Sattler, P., & Carle, G. (2017). HLOC: Hints-based geolocation leveraging multiple measurement frameworks. In Network Traffic Measurement and Analysis Conference.
Schlinker, B., Kim, H., Cui, T., Katz-Bassett, E., Madhyastha, H. V., Cunha, I., . . . Zeng, H. (2017). Engineering egress with Edge Fabric: Steering oceans of content to the world. In SIGCOMM.
Schulman, A., & Spring, N. (2011). Pingin' in the rain. In Internet Measurement Conference (IMC).
Shavitt, Y., & Shir, E. (2005). DIMES: Let the internet measure itself. SIGCOMM CCR.
Sherwood, R., Bender, A., & Spring, N. (2008). DisCarte: A Disjunctive Internet Cartographer. In SIGCOMM CCR.
Siganos, G., & Faloutsos, M. (2004). Analyzing BGP policies: Methodology and tool. In INFOCOM.
Singla, A., Chandrasekaran, B., Godfrey, P., & Maggs, B. (2014). The internet at the speed of light. In Proceedings of Hot Topics in Networks.
Sodagar, I. (2011). The MPEG-DASH standard for multimedia streaming over the internet. In IEEE MultiMedia.
Spring, N., Dontcheva, M., Rodrig, M., & Wetherall, D. (2004). How to resolve IP aliases (Tech. Rep.). Univ. Michigan, UW CSE.
Spring, N., Mahajan, R., & Wetherall, D. (2002). Measuring ISP topologies with Rocketfuel. SIGCOMM CCR.
Sundaresan, S., Burnett, S., Feamster, N., & De Donato, W. (2014). BISmark: A Testbed for Deploying Measurements and Applications in Broadband Access Networks. In USENIX Annual Technical Conference.
Sundaresan, S., Feamster, N., & Teixeira, R. (2015). Measuring the Performance of User Traffic in Home Wireless Networks. In Passive and Active Measurement (PAM). ACM.
Tariq, M. M. B., Dhamdhere, A., Dovrolis, C., & Ammar, M. (2005). Poisson versus periodic path probing (or, does PASTA matter?). In Internet Measurement Conference (IMC).
TeamCymru. (2008). IP to ASN mapping. https://www.team-cymru.com/IP-ASN-mapping.html.
Tokusashi, Y., Matsutani, H., & Zilberman, N. (2018). Lake: An energy efficient, low latency, accelerated key-value store. arXiv preprint arXiv:1805.11344.
Torres, R., Finamore, A., Kim, J. R., Mellia, M., Munafò, M. M., & Rao, S. (2011). Dissecting video server selection strategies in the YouTube CDN. In International Conference on Distributed Computing Systems. IEEE.
Tozal, M. E., & Sarac, K. (2011). Palmtree: An IP alias resolution algorithm with linear probing complexity. Computer Communications.
Triukose, S., Wen, Z., & Rabinovich, M. (2011). Measuring a commercial content delivery network. In World Wide Web (WWW). ACM.
University of Oregon. (2018). University of Oregon Route Views project. http://www.routeviews.org/routeviews/.
WikiLeaks. (2018). Amazon Atlas. https://wikileaks.org/amazon-atlas/.
Williams, M. (2016). Amazon's central Ohio data centers now open. http://www.dispatch.com/content/stories/business/2016/10/18/amazon-data-centers-in-central-ohio-now-open.html.
WireGuard. (2019). WireGuard: fast, modern, secure VPN tunnel.
Wohlfart, F., Chatzis, N., Dabanoglu, C., Carle, G., & Willinger, W. (2018). Leveraging interconnections for performance: the serving infrastructure of a large CDN. In SIGCOMM.
Xia, J., & Gao, L. (2004). On the evaluation of AS relationship inferences [Internet reachability/traffic flow applications]. In GLOBECOM.
Yap, K.-K., Motiwala, M., Rahe, J., Padgett, S., Holliman, M., Baldus, G., . . . others (2017). Taking the edge off with Espresso: Scale, reliability and programmability for global internet peering. In SIGCOMM.
Yeganeh, B., Durairajan, R., Rejaie, R., & Willinger, W. (2019). How cloud traffic goes hiding: A study of Amazon's peering fabric. In Internet Measurement Conference (IMC).
Yeganeh, B., Rejaie, R., & Willinger, W. (2017). A view from the edge: A stub-AS perspective of traffic localization and its implications. In Network Traffic Measurement and Analysis Conference (TMA).
Zarchy, D., Dhamdhere, A., Dovrolis, C., & Schapira, M. (2018). Nash-peering: A new techno-economic framework for internet interconnections. In INFOCOM Computer Communications Workshops.
ZDNet. (2019). Top cloud providers 2019. https://tinyurl.com/y526vneg.
Zhang, M., Zhang, C., Pai, V. S., Peterson, L. L., & Wang, R. Y. (2004). PlanetSeer: Internet path failure monitoring and characterization in wide-area services. In OSDI.