Authors: Carlo Augusto Grazia, Martin Klapez, Maurizio Casoni
License CC-BY-4.0
Received September 7, 2021, accepted September 16, 2021, date of publication September 20, 2021, date of current version September 27, 2021. Digital Object Identifier 10.1109/ACCESS.2021.3113891

The New TCP Modules on the Block: A Performance Evaluation of TCP Pacing and TCP Small Queues

CARLO AUGUSTO GRAZIA (Member, IEEE), MARTIN KLAPEZ, AND MAURIZIO CASONI (Senior Member, IEEE)
Department of Engineering Enzo Ferrari, University of Modena and Reggio Emilia, 41125 Modena, Italy
Corresponding author: Carlo Augusto Grazia (carloaugusto.grazia@unimore.it)

ABSTRACT Google and the Bufferbloat community have designed several solutions to reduce Internet latency in recent years, involving different TCP-IP stack layers. One of these solutions is named TCP Small Queues (TSQ) and reduces a TCP flow's latency by controlling the number of packets that each TCP socket can enqueue in the sender node. It works in conjunction with TCP Pacing (TP), which affects the actual TSQ size as a function of the TCP rate. This paper analyzes TSQ and TP performance through real-system tests over different network bottlenecks, emphasizing Wi-Fi technologies, where their behavior strongly affects the Wi-Fi frame aggregation mechanism.

INDEX TERMS Bufferbloat, latency, TCP small queues.

I. INTRODUCTION
Internet latency issues have been put under a magnifying glass in recent years, leading big players like Google and the Bufferbloat community to design novel algorithms for several TCP-IP layers. To name one, the Bufferbloat community has developed the hybrid packet scheduler and queue manager Flow Queue Controlled Delay (FQ-CoDel) [1]. FQ-CoDel has quickly become the standard queueing discipline for many Linux-based end nodes and routers. At the same time, Google has designed a couple of remarkable algorithms, namely TCP Bottleneck Bandwidth and Round-trip propagation time (BBR) [2] and TCP Small Queues (TSQ). The former is a transport-layer solution, while the latter is a cross-layering solution spanning the TCP-IP stack. Unlike FQ-CoDel, which can be deployed on any node of the path, BBR and TSQ are designed exclusively for the end nodes.

To the best of our knowledge, there is a lack of scientific contributions related to TSQ and TP. In particular, we report a lack in the performance analysis of TSQ in conjunction with these new TCP-IP solutions involving all the stack layers. The TSQ module alone is covered in a few scientific contributions [3]–[6] involving wired networks, but without investigating the main purpose of the mechanism, i.e., to reduce latency without reducing the throughput. Concerning wireless environments, instead, a few other contributions [7]–[9], dealing with LTE and Wi-Fi technologies, report some insight on latency reduction by tuning the TSQ size. Unfortunately, these previous works deal with old Linux kernels, in which TSQ operated statically by imposing a fixed amount of data to be enqueued, instead of controlling the amount of data to be enqueued dynamically based on the current data delivery rate. Moreover, none of these cited works accommodates a broader analysis of TSQ and TP's interaction in real and up-to-date environments. Indeed, to properly investigate the TSQ impact on latency reduction, a complete system involving TCP BBR and FQ-CoDel must also be considered, together with the bottleneck position in the network path. The sole literature contributions combining TSQ and TP, to the best of our knowledge, are [10] and [11]: in the former, the authors analyzed the CPU impact of these solutions, discussing software bottlenecks more than network bottlenecks, while in the latter, the authors investigated the behavior of different TCP congestion controls over a hybrid bottleneck network involving only Wi-Fi 6 environments.
The contribution of this paper is (i) the description of TSQ and TP, contextualized in the latest Long-Term Support (LTS) version of the Linux kernel, 5.10-lts, together with the other involved TCP-IP solutions on both wired and wireless scenarios, and (ii) a real-test analysis of the TSQ and TP performance under different network technologies. The paper is structured as follows: Section II describes the latest TCP-IP Linux stack and Section III illustrates the testbed used to collect our data. Finally, Section IV shows the results and Section V concludes the paper.

II. TCP-IP LINUX STACK
FIGURE 1. TCP-IP Linux stack.

The current and up-to-date TCP-IP Linux stack is depicted in Figure 1. We report the three main blocks involved in a TCP flow transmission, with the whole TCP transport block on the left, the Queueing Layer corresponding to the TCP-IP networking layer in the middle, and the host-to-network Driver block on the right. The role of the TCP congestion control algorithm did not change recently, so the TCP socket is still calculating the congestion window (CWND) and dealing with ACK reception according to the algorithm used. The most significant change in recent years has been in the way packets are delivered by the TCP socket, now regulated by TSQ and TP, also shown in Figure 1. Once the TCP socket delivers the packets, they are enqueued in the lower layers. The Queueing Layer, depicted in the middle of Figure 1, deploys a standard FQ-CoDel algorithm, which is the default solution in recent kernels of many Linux distributions [1]. Once the scheduler delivers the packets, they move to the last block, where the driver firmware implements the last hardware queue before moving to the physical medium channel. Once a packet is physically transmitted, a completion signal is cross-passed to the TSQ algorithm.

A. TP AND TSQ
The most significant change experienced by the TCP-IP stack in recent years has been the introduction of TP and TSQ. The cooperative work of these two TCP submodules strongly impacts the way packets are delivered by the TCP socket, affecting the TCP RTT and the system latency. TP is controlled by two system variables, i.e., tcp_pacing_ss_ratio and tcp_pacing_ca_ratio, used in the slow-start and the congestion-avoidance phases, respectively, as reported in Algorithm 1. The TCP socket's final TCP-paced rate to deliver data is then adjusted with a pacing ratio that changes according to the TCP transmission phase. By default, tcp_pacing_ss_ratio is equal to 2 in the slow-start phase, and tcp_pacing_ca_ratio is equal to 1.2 in the congestion-avoidance phase. This means that the TCP flow doubles the rate in the slow-start phase and increases it by 20% in the congestion-avoidance phase. This mechanism allows probing for more bandwidth without forming excessive bursts of packets in the path's network queues.

Algorithm 1 TCP Pacing Rate
Input: TCP_SOCKET sk, int baseRTT
1: int rate = mss * sk→cwnd / baseRTT;
2: if sk→cwnd < sk→ssthresh / 2 then
3:   rate *= tcp_pacing_ss_ratio;  // Slow-start phase
4: else
5:   rate *= tcp_pacing_ca_ratio;  // Congestion-avoidance phase
6: end if
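As a purely illustrative companion to Algorithm 1, the following user-space C sketch reproduces the same rate adjustment; the structure fields and the floating-point ratios (2.0 and 1.2, the defaults described above) are simplifications and do not correspond to the exact kernel implementation.

```c
#include <stdio.h>
#include <stdint.h>

/* Simplified view of the socket state used by Algorithm 1 (illustrative only). */
struct tcp_socket {
    uint32_t cwnd;      /* congestion window, in segments    */
    uint32_t ssthresh;  /* slow-start threshold, in segments */
    uint32_t mss;       /* maximum segment size, in bytes    */
};

/* Pacing ratios as described in the text; the kernel stores them differently. */
static const double tcp_pacing_ss_ratio = 2.0;  /* slow-start phase           */
static const double tcp_pacing_ca_ratio = 1.2;  /* congestion-avoidance phase */

/* Paced delivery rate in bytes per second, mirroring Algorithm 1. */
static double tcp_pacing_rate(const struct tcp_socket *sk, double base_rtt_s)
{
    double rate = (double)sk->mss * sk->cwnd / base_rtt_s; /* one cwnd per RTT */
    if (sk->cwnd < sk->ssthresh / 2)
        rate *= tcp_pacing_ss_ratio;  /* double the rate while in slow start      */
    else
        rate *= tcp_pacing_ca_ratio;  /* 20% over-pacing in congestion avoidance  */
    return rate;
}

int main(void)
{
    struct tcp_socket sk = { .cwnd = 100, .ssthresh = 400, .mss = 1448 };
    /* 100 segments of 1448 B over a 20 ms base RTT: ~7.24 MB/s, doubled in slow start. */
    printf("paced rate: %.0f bytes/s\n", tcp_pacing_rate(&sk, 0.020));
    return 0;
}
```

With a congestion window of 100 segments of 1448 bytes and a 20 ms base RTT, the unpaced rate is about 7.24 MB/s; since cwnd is below half of ssthresh, the sketch doubles it, exactly as the slow-start ratio prescribes.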
On the other hand, the TCP paced rate is used to calculate, in conjunction with the TSQ mechanism, the number of packets that a TCP socket can enqueue in the sender stack. This quantity is a dynamic value described in Algorithm 2. According to the algorithm, the TSQ limit is always higher than a minimum amount of 2 packets and lower than a maximum amount of tcp_limit_output_bytes bytes (128 KB by default). The dynamic limit moves between these two bounds and is the amount of data that corresponds to a latency equal to 1 ms by default. Algorithm 2 clarifies this behavior: the dynamic amount of data that can be enqueued is calculated through sk→tcp_pacing_rate >> 10, i.e., a 10-bit shift of the current pacing rate, which corresponds to the amount of data transmitted in 1 ms at the current paced rate. This mechanism helps the sender congestion control mitigate the queueing delay occurring inside the node and accurately calculate RTTs. Concluding, the bit-shift quantity changes the latency introduced by TSQ, while the TP ratio changes the TSQ limit size.

Algorithm 2 TCP Small Queue
Input: TCP_SOCKET sk
1: int limit;
2: limit = max(2 * sk→pktsize, sk→tcp_pacing_rate >> 10);
3: limit = min(limit, tcp_limit_output_bytes);
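As a sketch of Algorithm 2 (again a simplification, not the kernel code), the following C fragment computes the dynamic limit; the right shift by 10 divides the pacing rate in bytes per second by 1024, which approximates the number of bytes sent in 1 ms.

```c
#include <stdio.h>
#include <stdint.h>

/* Upper clamp as described in the text (128 KB). */
#define TCP_LIMIT_OUTPUT_BYTES  (128 * 1024)

static inline uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }
static inline uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }

/* TSQ limit in bytes, mirroring Algorithm 2: at least two packets,
 * at most TCP_LIMIT_OUTPUT_BYTES, otherwise ~1 ms of data at the paced rate. */
static uint64_t tsq_limit(uint64_t pkt_size, uint64_t pacing_rate_Bps)
{
    uint64_t limit = max_u64(2 * pkt_size, pacing_rate_Bps >> 10);
    return min_u64(limit, TCP_LIMIT_OUTPUT_BYTES);
}

int main(void)
{
    /* Example: a flow paced at 96 Mbit/s with 1448-byte segments. */
    uint64_t rate = 96000000ULL / 8;  /* 12,000,000 bytes/s */
    printf("TSQ limit: %llu bytes\n",
           (unsigned long long)tsq_limit(1448, rate));
    return 0;
}
```

At 12 MB/s the budget is roughly 11.7 KB, i.e., about eight full-size segments, which illustrates why even a fast flow keeps only a handful of packets queued below the socket.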
It is important to notice that BBR operates with a customized TCP pacing mechanism used in its finite-state machine. BBR, indeed, is the sole congestion control that ignores the global TP module and does not react to changes of the global tcp_pacing_ss_ratio and tcp_pacing_ca_ratio variables. BBR deploys a specific finite-state machine, reported in Figure 2, where the pacing rate is computed as a function of the bottleneck bandwidth. BBR estimates the parameters of the finite-state machine in kernel space, and they cannot be modified by the user.

FIGURE 2. Linux TCP BBR block.

B. QUEUEING LAYER AND DRIVER
The default structure of FQ-CoDel, reported in Figure 1, works as follows: a separate software queue serves each TCP flow, each queue is managed by the CoDel algorithm to control the latency, and the queues are served in a round-robin fashion. The default CoDel threshold is set at 5 ms, which means that packets with a sojourn time greater than the threshold will be dropped at the dequeue stage. The queueing discipline in novel Linux systems is managed, by default, with the tc tool, which allows configuring the characteristics of the networking layer. The Queueing Layer and the Network Interface Card (NIC) Driver blocks are strongly coupled in their behavior, and Figure 1 represents a simple scenario in which a single hardware queue is present. The driver also implements the Byte Queue Limit (BQL) for all the hardware queues, which is the last algorithm to control the global latency of the system [12]. The BQL mechanism tries to store enough data to avoid starvation and, simultaneously, to avoid accumulating excessive data that would increase the latency. The BQL algorithm is not tested in our paper, and the drivers' default configurations are maintained. A considerable change imposed by the usage of a wireless Atheros NIC equipped with the ath9k driver, as is the case in our tests, is that it implements the FQ-CoDel mechanism directly in the firmware, disabling the queueing discipline layer when the driver is used [13]. Thus, it impacts the maximum aggregation size of the NIC due to the 5 ms limit imposed by FQ-CoDel.

III. TESTBED
FIGURE 3. Physical testbed.
TABLE 1. Bottlenecks configuration (the five configurations used in the results: L10M and L100M with a local HTB-shaped bottleneck at 10 and 100 Mbit/s, R10M and R100M with a remote HTB-shaped bottleneck at 10 and 100 Mbit/s, and LR1000M with no software limitation, i.e., full Gigabit on both links).

To validate the performance of TSQ and TP, we designed the testbed reported in Figure 3. Our testbed is composed of 3 nodes, resembling a classical Internet connection with Wi-Fi access for the Client C and a wired backhaul for the Server S. In the middle, between C and S, there is the Router R or Access Point AP, which interconnects the endpoints and implements a wired testbed or a wireless one, respectively. A Gigabit Ethernet supports the connection between S and R/AP. In contrast, the connections between R/AP and C are a Gigabit Ethernet and an IEEE 802.11n link, both configured with PCIe devices. These embed the BCM5761 chipset in the wired testbed and the AR9580 chipset in the wireless one. The bottleneck segment is software-defined to be local (the link between C and R) or remote (the link between S and R) in the wired testbed. The difference between local and remote bottlenecks resembles different possible laboratory and home connections, allowing us to focus on widely different possible real networks by only controlling the few testbed nodes. To deploy a configurable wired bottleneck, we used the tc Linux package and implemented a hierarchical token bucket (HTB) queueing discipline, coupled with the default FQ-CoDel, on the Router interfaces (a configuration of the kind sketched at the end of this section). It is important to notice that tc allows for traffic shaping through HTB, i.e., imposing a specific delivery rate for an interface, without compromising the behavior of the scheduling and AQM algorithms running in cascade to HTB. This is not true if the bottleneck is also modeled by imposing specific delays or packet losses through the netem queueing discipline, which must be used in mutual exclusion with other queueing disciplines such as FQ-CoDel. Anyway, this is not the case in our discussion, since we focus only on the bottleneck bandwidth characteristic. We defined 5 different bottleneck configurations, summarized in Table 1. In the wireless testbed, instead, the bottleneck is the physical wireless interface, reflecting a standard Wi-Fi home access network with a Gigabit Ethernet backhaul. All the nodes run an Arch Linux distribution with a 5.10-lts Linux kernel version (tests and scripts are available at netlab.unimore.it/sw/TSQ-NtwL.zip).
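A minimal sketch of this kind of HTB plus FQ-CoDel bottleneck setup is the following C helper; the device name, handle numbers, and the 100 Mbit/s rate are illustrative assumptions, while the full test scripts are available from the archive referenced above.

```c
/* Illustrative only: shape an interface with HTB and attach FQ-CoDel below it. */
#include <stdio.h>
#include <stdlib.h>

static void run(const char *cmd)
{
    printf("+ %s\n", cmd);          /* echo each command before executing it */
    if (system(cmd) != 0)
        fprintf(stderr, "command failed: %s\n", cmd);
}

int main(void)
{
    /* Root HTB qdisc with a single 100 Mbit/s class on the router interface. */
    run("tc qdisc replace dev eth0 root handle 1: htb default 10");
    run("tc class add dev eth0 parent 1: classid 1:10 htb rate 100mbit");
    /* FQ-CoDel attached below HTB, so shaping does not bypass the AQM. */
    run("tc qdisc add dev eth0 parent 1:10 handle 110: fq_codel");
    return 0;
}
```

Local versus remote bottlenecks then differ only in which router interface the shaping is attached to.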
IV. RESULTS
Our tests are performed using the Flent [14] tool, running four TCP uploads from C to S for 30 seconds with a concurrent ping flow. The core parameter changed between the tests is the TSQ size; indeed, we developed a Linux kernel patch that lets us change the standard TSQ dynamic size of 1 ms of data at the current rate to a value between 1 and 8 ms of data at the current rate. We also allowed the possibility to disable the TSQ mechanism entirely, naming this strategy NoTSQ in our results.
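A minimal sketch of such a knob, assuming a simple millisecond multiplier applied to the Algorithm 2 budget (an illustration of the idea, not the actual patch), is the following:

```c
#include <stdint.h>

/* Hypothetical illustration of the experimental knob described above: instead of
 * the fixed pacing_rate >> 10 (about 1 ms of data), scale the budget by tsq_ms
 * milliseconds. This is a sketch of the idea, not the patch used in the tests. */
uint64_t tsq_limit_ms(uint64_t pkt_size, uint64_t pacing_rate_Bps,
                      unsigned int tsq_ms, uint64_t limit_output_bytes)
{
    uint64_t budget = (pacing_rate_Bps >> 10) * tsq_ms;   /* ~tsq_ms ms of data */
    uint64_t limit  = budget > 2 * pkt_size ? budget : 2 * pkt_size;
    return limit < limit_output_bytes ? limit : limit_output_bytes;
}
```

In this sketch, tsq_ms values from 1 to 8 span the TSQ configurations evaluated below, while disabling the mechanism (NoTSQ) corresponds to not applying any limit at all.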
The TCP congestion control algorithms used in our tests are TCP CUBIC and TCP BBR, to evaluate the TSQ performance in the presence of a loss-based and a model-based variant, respectively. Moreover, we also evaluated two possible queueing disciplines, FQ-CoDel and PFIFO (named pfifo_fast in the tc package), the latter being a priority scheduler with three possible software queues. PFIFO is still the default queueing discipline in several Linux distributions and networking nodes, like the Raspberry Pi model 4, waiting for FQ-CoDel to replace it incrementally. We did not focus on the FQ queueing discipline, historically associated with BBR; this is because, before kernel version 4.17, BBR could not deploy a proper TP mechanism by itself and was recommended to be used in conjunction with FQ. The use of FQ indeed helps in terms of global TCP pacing, adding extra CPU usage for this task, as shown in [10]. Anyway, in our analysis, we focus on network performance, where throughput and latency need to be controlled through the default packet schedulers or AQM techniques offered by current Linux kernels. As a matter of fact, the usage of FQ is not common anymore since, from the 4.17 kernel, BBR effectively implements its own pacing system without needing FQ.

A. WIRED RESULTS
FIGURE 4. Throughput and RTT on different wired bottlenecks: TSQ vs. NoTSQ.
FIGURE 5. TCP RTT: Different wired bottlenecks with TCP BBR or CUBIC.

We first introduce the results obtained with the wired version of our testbed of Figure 3. All the wired tests have been performed using either the standard TSQ configuration or the NoTSQ one with the mechanism disabled. Figure 4 shows the throughput and the latency of a simple TCP CUBIC stream from C to S, with PFIFO used as the queueing discipline on the two bottlenecks tested: L100M and R100M. We used PFIFO instead of FQ-CoDel to highlight the sole impact of TSQ on the latency reduction. All our results are in candlestick format: the top and the bottom of the boxes represent the 90th and the 10th percentiles of the data, respectively, while the solid line inside the box represents the median value. One clear piece of evidence from Figure 4 is the remarkable impact of TSQ when the bottleneck is local, because the dominant latency contribution is the queueing delay caused by the sender node queues, which is limited by the TSQ mechanism. It is important to note that this remarkable latency drop is achieved while maintaining the throughput close to 96 Mbit/s, as in all the other configurations. On the other hand, when the bottleneck is remote, the presence of TSQ only marginally mitigates the end-to-end latency.

Considering that TSQ does not impact the throughput over a wired bottleneck, we now focus only on the TCP RTT performance of all the five wired bottlenecks tested, in Figure 5. Moreover, we also include the impact of a different TCP congestion control and a different queueing discipline, BBR and FQ-CoDel. Figure 5a is an extension of Figure 4 that also embraces the L10M, R10M, and LR1000M bottleneck configurations, focusing on the TCP RTT instead of the ICMP ping RTT. The introduction of TSQ reduces the TCP RTT by two orders of magnitude in L10M and L100M, where the bottleneck is imposed on the local network through the HTB filter. In LR1000M, instead, where the interfaces are not software-limited and the HTB filter is disabled everywhere, the TCP RTT reduction introduced by TSQ is of one order of magnitude, which is again a remarkable result. Moving from Figure 5a to Figure 5b, only the TCP congestion control changes, from CUBIC to BBR. In this case, the impact of TSQ is still observable, but smaller: 10 ms and 3 ms in the L10M and L100M configurations, respectively. Even in this case, removing the bottlenecks with the LR1000M configuration leads to a smaller TSQ impact of a couple of ms. Finally, Figure 5c reports the results of BBR in conjunction with FQ-CoDel, which are very similar to the effects of CUBIC with FQ-CoDel, not reported here. The differences with Figure 5b are the maximum latency in the R10M configuration and in L10M with NoTSQ, in which FQ-CoDel imposes a maximum queueing delay at the bottleneck that results in a global TCP RTT of 10 ms.

FIGURE 6. Throughput and RTT of TCP BBR and CUBIC on a remote bottleneck.
FIGURE 7. Throughput and RTT of TCP BBR and CUBIC on a local bottleneck.

To conclude the discussion on wired bottlenecks, we also present Figures 6 and 7, which include the same tests of Figure 4, adding BBR as a congestion control algorithm. Figure 6 reports the throughput and the RTT of CUBIC and BBR on a remote bottleneck operating at 100 Mbit/s with a PFIFO queueing discipline; even though the throughput performance is similar, the RTT is remarkably different, and the shape of the curves helps to understand the big difference between the two congestion controls. TCP CUBIC operates by waiting for the loss feedback from the network, filling the bottleneck queue up to the packet drops. Indeed, the shape of the RTT curve presents several peaks, corresponding to the maximum queueing delay when the bottleneck queue is full, and some minimum peaks, corresponding to the new starting congestion window of CUBIC in response to the loss. The behavior of BBR, instead, is different, due to the model-based nature of this congestion control. It is possible to notice the draining spikes happening every 10 s, which correspond to the probe_RTT phase of BBR reported in Figure 2. Since the bottleneck of Figure 6 is remote, the presence or absence of TSQ, moving from Figure 6a to Figure 6b, does not impact the results. The situation is different in Figure 7, where the bottleneck is local and packets accumulate in the sender's NIC. If the TSQ algorithm is not active (Figure 7a), the behavior of the system is logically equivalent to the case of a remote bottleneck, since there is no way to control the number of packets at the NIC, and both TCP CUBIC and TCP BBR operate like in Figure 6a.
If TSQ is active, instead, TCP CUBIC completely changes its behavior, because TSQ forbids TCP CUBIC from accumulating packets in the bottleneck queue (which is the local NIC's queue). The result is that TCP CUBIC, and any loss-based TCP variant in general, operates ruled by the TSQ completion interrupts, maintaining a minimal NIC queue usage like the BBR congestion control.

B. Wi-Fi RESULTS
FIGURE 8. Throughput and RTT on the ath9k wireless bottleneck.

The TSQ mechanism has been shown to break the frame aggregation logic of Wi-Fi technologies in [8], so we investigate the impact of different TSQ sizes in conjunction with different TP ratios and different base-RTT flows. We configured 3 different TP ratios by changing the congestion-avoidance pacing variable tcp_pacing_ca_ratio from the default value of 1.2 to 1.3 and 1.4, naming these configurations 2p, 3p, and 4p, respectively. Concerning the different RTT flows, instead, we changed the base RTT through the netem package at the AP, with three possible flow configurations at 1, 10, and 100 ms of base RTT.

Figure 8a shows the results of four TCP CUBIC uploads with our eight different TSQ configurations and the three possible pacing ratios, while Figure 8b shows the results of the same test with the standard pacing ratio and the three possible base RTTs. In this set of experiments, we selected only TCP CUBIC, since TCP BBR ignores the global TP variable changes. These two plots are presented together to highlight the relation between TP and RTT, which both affect the number of packets that the TCP socket can enqueue and, consequently, the throughput. Theoretically, from the combination of Algorithms 1 and 2, halving the RTT has the same effect as doubling the TP rate.
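This equivalence can be read directly from the two algorithms: ignoring the min/max clamps of Algorithm 2, the amount of data a socket may enqueue is the paced rate multiplied by a fixed 1 ms window, i.e.,

\[
\mathrm{limit} \;\approx\; \mathrm{pacing\_rate} \times 1\,\mathrm{ms} \;=\; \mathrm{ratio} \times \frac{\mathrm{cwnd} \times \mathrm{MSS}}{\mathrm{RTT}} \times 1\,\mathrm{ms},
\]

so, for a given congestion window, halving the RTT scales the enqueued budget (and hence the achievable frame aggregation) by the same factor of two as doubling the pacing ratio.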
The values reported in green in Figure 8 are almost the same, due to a slightly different base RTT of the first experiment, under 1 ms. In Figure 8a, it is possible to see that increasing the pacing rate (blue and red) has the effect of increasing the throughput on the Wi-Fi path and increasing the latency as well. This result is justified by Algorithm 2; indeed, the higher the TP rate, the larger the number of packets enqueued in the NIC, which allows for larger aggregates, increasing the throughput despite a latency increment. Once the initial TSQ value is greater than 4TSQ, increasing the TP ratio has the sole effect of increasing the latency, since the Wi-Fi bottleneck has already been saturated with the maximum available frame aggregation. In Figure 8b, on the other hand, it is possible to see that a higher RTT has the same effect as a pacing reduction. This effect can be observed moving from 1 ms of base RTT to 10 ms of base RTT, where the former configuration registers a higher throughput with respect to the latter. With a base RTT of 100 ms, instead, the delivery rate is too low, and the final throughput is not optimal even when relaxing the TSQ constraints. This effect is justified because the high RTT forces a low rate, considering Algorithm 1. Consequently, even relaxing the TSQ value in Algorithm 2 does not allow the rate to grow enough to form larger aggregates and discover the higher throughput available on the Wi-Fi channel.

V. CONCLUSION
In this paper, we showed that TSQ alone significantly impacts latency when the bottleneck is local. The reason lies in the nature of TSQ, which limits the amount of data that each socket can enqueue or, in other words, limits the bottleneck queue size when the bottleneck is the sender's NIC. The latency reduction effect is mitigated when algorithms like BBR and FQ-CoDel are deployed, but it is still observable. These results place the TSQ mechanism in a critical position when dealing with network performance. Network simulators will have to include this mechanism to maintain high-fidelity results compared with real systems. Nevertheless, TSQ has a different impact on a local Wi-Fi bottleneck with respect to a local wired bottleneck: the latency reduction is coupled with a non-optimal throughput if the limit imposed by TSQ is too strict. At the same time, TP and the base RTT impact the Wi-Fi performance because they modify the TSQ limit, and the higher the TP, or the smaller the base RTT, the higher the throughput will be.
REFERENCES
[1] T. Hoeiland-Joergensen, P. McKenney, D. Taht, J. Gettys, and E. Dumazet, FlowQueue-CoDel, IETF RFC 8290, Jan. 2018. [Online]. Available: https://tools.ietf.org/html/rfc8290
[2] N. Cardwell, Y. Cheng, C. S. Gunn, S. H. Yeganeh, and V. Jacobson, "BBR: Congestion-based congestion control," Commun. ACM, vol. 60, no. 2, pp. 58–66, 2017.
[3] A. Saeed, N. Dukkipati, V. Valancius, V. Lam, C. Contavalli, and A. Vahdat, "Carousel: Scalable traffic shaping at end hosts," in Proc. ACM SIGCOMM, 2017, pp. 404–417.
[4] B. Stephens, A. Singhvi, A. Akella, and M. Swift, "Titan: Fair packet scheduling for commodity multiqueue NICs," in Proc. USENIX Annu. Tech. Conf., 2017, pp. 431–444.
[5] B. Briscoe, A. Brunstrom, A. Petlund, D. Hayes, D. Ros, I.-J. Tsang, S. Gjessing, G. Fairhurst, C. Griwodz, and M. Welzl, "Reducing internet latency: A survey of techniques and their merits," IEEE Commun. Surveys Tuts., vol. 18, no. 3, pp. 2149–2196, 3rd Quart., 2016.
[6] Y. Zhao, A. Saeed, E. Zegura, and M. Ammar, "zD: A scalable zero-drop network stack at end hosts," in Proc. CoNEXT, 2019, pp. 220–232.
[7] Y. Guo, F. Qian, Q. Chen, Z. Morley Mao, and S. Sen, "Understanding on-device bufferbloat for cellular upload," in Proc. ACM SIGCOMM, 2016, pp. 303–317.
[8] C. A. Grazia, N. Patriciello, T. Hoiland-Jorgensen, M. Klapez, M. Casoni, and J. Mangues-Bafalluy, "Adapting TCP small queues for IEEE 802.11 networks," in Proc. IEEE 29th Annu. Int. Symp. Pers., Indoor Mobile Radio Commun. (PIMRC), Sep. 2018, pp. 1–6.
[9] C. A. Grazia, "IEEE 802.11n/AC wireless network efficiency under different TCP congestion controls," in Proc. Int. Conf. Wireless Mobile Comput., Netw. Commun. (WiMob), Oct. 2019, pp. 1–6.
[10] Y. Zhao, A. Saeed, M. Ammar, and E. Zegura, "Scouting the path to a million-client server," in Passive and Active Measurement. Cham, Switzerland: Springer, 2021, pp. 337–354.
[11] C. A. Grazia, "Future of TCP on Wi-Fi 6," IEEE Access, vol. 9, pp. 107929–107940, 2021.
[12] N. Mareev, D. Kachan, K. Karpov, D. Syzov, and E. Siemens, "Efficiency of BQL congestion control under high bandwidth-delay product network conditions," in Proc. Int. Conf. Appl. Innov. IT, 2019, vol. 7, no. 1, pp. 19–22.
[13] T. Høiland-Jørgensen, M. Kazior, D. Taht, P. Hurtig, and A. Brunstrom, "Ending the anomaly: Achieving low latency and airtime fairness in WiFi," in Proc. USENIX ATC, 2017, pp. 139–151.
[14] T. Hoeiland-Joergensen, C. A. Grazia, P. Hurtig, and A. Brunstrom, "Flent: The flexible network tester," in Proc. 11th EAI Int. Conf. Perform. Eval. Methodol. Tools (ValueTools), 2017, pp. 120–125.

CARLO AUGUSTO GRAZIA (Member, IEEE) received the Ph.D. degree from the Department of Engineering Enzo Ferrari (DIEF), University of Modena and Reggio Emilia (UNIMORE), in 2016. He is currently an Assistant Professor with UNIMORE, where he teaches the automotive connectivity course. He has been involved in the EU FP7 projects E-SPONDER and PPDR-TC. His research interests include computer networking, with an emphasis on wireless networks, queueing algorithms, and V2X.

MARTIN KLAPEZ received the Ph.D. degree from DIEF, UNIMORE, in 2017. He is currently a Postdoctoral Research Fellow with UNIMORE. He has collaborated with the Italian Nanoscience National Research Center S3, and he has been involved in the EU FP7 project PPDR-TC. His research interests include network softwarization, public safety networks, and safety-related V2X systems.

MAURIZIO CASONI (Senior Member, IEEE) received the M.S. (Hons.) and Ph.D. degrees in electrical engineering from the University of Bologna, Italy, in 1991 and 1995, respectively. In 1995, he was with the Computer Science Department, Washington University in St. Louis, MO, USA, as a Research Fellow. He is currently an Associate Professor of telecommunications with DIEF, UNIMORE, Italy. He has been responsible at UNIMORE for the EU FP7 projects E-SPONDER and PPDR-TC.