Authors: Carlo Augusto Grazia, Martin Klapez, Maurizio Casoni
License CC-BY-4.0
Received September 7, 2021, accepted September 16, 2021, date of publication September 20, 2021, date of current version September 27, 2021. Digital Object Identifier 10.1109/ACCESS.2021.3113891

The New TCP Modules on the Block: A Performance Evaluation of TCP Pacing and TCP Small Queues

CARLO AUGUSTO GRAZIA (Member, IEEE), MARTIN KLAPEZ, AND MAURIZIO CASONI (Senior Member, IEEE)
Department of Engineering Enzo Ferrari, University of Modena and Reggio Emilia, 41125 Modena, Italy
Corresponding author: Carlo Augusto Grazia (carloaugusto.grazia@unimore.it)

ABSTRACT Google and the Bufferbloat community have designed several solutions to reduce Internet latency in recent years, involving different TCP-IP stack layers. One of these solutions is named TCP Small Queues (TSQ) and reduces a TCP flow's latency by controlling the number of packets that each TCP socket can enqueue in the sender node. It works in conjunction with TCP Pacing (TP), which affects the actual TSQ size as a function of the TCP rate. This paper analyzes TSQ and TP performance through real-system tests over different network bottlenecks, emphasizing Wi-Fi technologies, where their behavior strongly affects the Wi-Fi frame aggregation mechanism.

INDEX TERMS Bufferbloat, latency, TCP small queues.

I. INTRODUCTION
Internet latency issues have been put under a magnifying glass in recent years, leading big players like Google and the Bufferbloat community to design novel algorithms for several TCP-IP layers. To name one, the Bufferbloat community has developed the hybrid packet scheduler and queue manager Flow Queue Controlled Delay (FQ-CoDel) [1]. FQ-CoDel has quickly become the standard queueing discipline for many Linux-based end nodes and routers. At the same time, Google has designed a couple of remarkable algorithms, namely TCP Bottleneck Bandwidth and Round-trip propagation time (BBR) [2] and TCP Small Queues (TSQ). The former is a transport-layer solution, while the latter is a cross-layering solution spanning the TCP-IP stack. Unlike FQ-CoDel, which can be deployed on any node of the path, BBR and TSQ are designed exclusively for the end nodes.

To the best of our knowledge, there is a lack of scientific contributions related to TSQ and TP. In particular, we report a lack in the performance analysis of TSQ in conjunction with these new TCP-IP solutions involving all the stack layers. The TSQ module alone is covered in a few scientific contributions [3]–[6] involving wired networks, but without investigating the main purpose of the mechanism, i.e., to reduce latency without reducing the throughput. Concerning wireless environments, instead, a few other contributions [7]–[9], dealing with LTE and Wi-Fi technologies, report some insight on latency reduction by tuning the TSQ size. Unfortunately, these previous works deal with old Linux kernels, in which TSQ operated statically by imposing a fixed amount of data to be enqueued, instead of controlling the amount of data to be enqueued dynamically based on the current data delivery rate. Moreover, none of these cited works accommodates a broader analysis of TSQ and TP's interaction in real and up-to-date environments. Indeed, to properly investigate the TSQ impact on latency reduction, a complete system involving TCP BBR and FQ-CoDel must also be considered, together with the bottleneck position in the network path. The sole literature contributions combining TSQ and TP, to the best of our knowledge, are [10] and [11]: in the former, the authors analyzed the CPU impact of these solutions, discussing software bottlenecks more than network bottlenecks, while in the latter, the authors investigated the behavior of different TCP congestion controls over a hybrid bottleneck network involving only Wi-Fi 6 environments.
The contribution of this paper is (i) the description of TSQ and TP, contextualized in the latest Long-Term Support (LTS) version of the Linux kernel, 5.10-lts, together with the other involved TCP-IP solutions on both wired and wireless scenarios, and (ii) a real-test analysis of the TSQ and TP performance under different network technologies. The paper is structured as follows: Section II describes the latest TCP-IP Linux stack and Section III illustrates the testbed used to collect our data. Finally, Section IV shows the results and Section V concludes the paper.

II. TCP-IP LINUX STACK
FIGURE 1. TCP-IP Linux stack.

The current and up-to-date TCP-IP Linux stack is depicted in Figure 1. We report the three main blocks involved in a TCP flow transmission, with the whole TCP transport block on the left, the Queueing Layer corresponding to the TCP-IP networking layer in the middle, and the host-to-network Driver block on the right. The role of the TCP congestion control algorithm did not change recently, so the TCP socket is still calculating the congestion window (CWND) and dealing with ACK reception according to the algorithm used. The most significant change in recent years has been in the way packets are delivered by the TCP socket, now regulated by TSQ and TP, also shown in Figure 1. Once the TCP socket delivers the packets, they are enqueued in the lower layers. The Queueing Layer, depicted in the middle of Figure 1, deploys a standard FQ-CoDel algorithm, which is the default solution in recent kernels of many Linux distributions [1]. Once the scheduler delivers the packets, they move to the last block, where the driver firmware implements the last hardware queue before moving to the physical medium channel. Once a packet is physically transmitted, a completion signal is cross-passed to the TSQ algorithm.

A. TP AND TSQ
The most significant change experienced by the TCP-IP stack in recent years has been the introduction of TP and TSQ. The cooperative work of these two TCP submodules strongly impacts the way packets are delivered by the TCP socket, affecting the TCP RTT and the system latency. TP is controlled by two system variables, i.e., tcp_pacing_ss_ratio and tcp_pacing_ca_ratio, used in the slow-start and the congestion-avoidance phases, respectively, as reported in Algorithm 1. The TCP socket's final TCP-paced rate to deliver data is then adjusted with a pacing ratio that changes according to the TCP transmission phase. By default, tcp_pacing_ss_ratio is equal to 2 in the slow-start phase, and tcp_pacing_ca_ratio is equal to 1.2 in the congestion-avoidance phase. This means that the TCP flow doubles the rate in the slow-start phase and increases it by 20% in the congestion-avoidance phase. This mechanism allows probing for more bandwidth without forming excessive bursts of packets in the path's network queues.

Algorithm 1 TCP Pacing Rate
Input: TCP_SOCKET sk, int baseRTT
1: int rate = mss * sk→cwnd / baseRTT;
2: if sk→cwnd < sk→ssthresh / 2 then
3:   rate *= tcp_pacing_ss_ratio;  // Slow-start phase
4: else
5:   rate *= tcp_pacing_ca_ratio;  // Congestion-avoidance phase
6: end if
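As a purely illustrative companion to Algorithm 1, the following user-space C sketch reproduces the same rate adjustment; the structure fields and the floating-point ratios (2.0 and 1.2, the defaults described above) are simplifications and do not correspond to the exact kernel implementation.

```c
#include <stdio.h>
#include <stdint.h>

/* Simplified view of the socket state used by Algorithm 1 (illustrative only). */
struct tcp_socket {
    uint32_t cwnd;      /* congestion window, in segments    */
    uint32_t ssthresh;  /* slow-start threshold, in segments */
    uint32_t mss;       /* maximum segment size, in bytes    */
};

/* Pacing ratios as described in the text; the kernel stores them differently. */
static const double tcp_pacing_ss_ratio = 2.0;  /* slow-start phase           */
static const double tcp_pacing_ca_ratio = 1.2;  /* congestion-avoidance phase */

/* Paced delivery rate in bytes per second, mirroring Algorithm 1. */
static double tcp_pacing_rate(const struct tcp_socket *sk, double base_rtt_s)
{
    double rate = (double)sk->mss * sk->cwnd / base_rtt_s; /* one cwnd per RTT */
    if (sk->cwnd < sk->ssthresh / 2)
        rate *= tcp_pacing_ss_ratio;  /* double the rate while in slow start      */
    else
        rate *= tcp_pacing_ca_ratio;  /* 20% over-pacing in congestion avoidance  */
    return rate;
}

int main(void)
{
    struct tcp_socket sk = { .cwnd = 100, .ssthresh = 400, .mss = 1448 };
    /* 100 segments of 1448 B over a 20 ms base RTT: ~7.24 MB/s, doubled in slow start. */
    printf("paced rate: %.0f bytes/s\n", tcp_pacing_rate(&sk, 0.020));
    return 0;
}
```

With a congestion window of 100 segments of 1448 bytes and a 20 ms base RTT, the unpaced rate is about 7.24 MB/s; since cwnd is below half of ssthresh, the sketch doubles it, exactly as the slow-start ratio prescribes.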
On the other hand, the TCP paced rate is used to calculate, in conjunction with the TSQ mechanism, the number of packets that a TCP socket can enqueue in the sender stack. This quantity is a dynamic value described in Algorithm 2. According to the algorithm, the TSQ limit is always higher than a minimum amount of 2 packets and lower than a maximum amount of tcp_limit_output_bytes bytes (128 KB by default). The dynamic limit moves between these two bounds and is the amount of data that corresponds to a latency equal to 1 ms by default. Algorithm 2 clarifies this behavior: the dynamic amount of data that can be enqueued is calculated through sk→tcp_pacing_rate >> 10, i.e., a 10-bit shift of the current pacing rate, which corresponds to the amount of data transmitted in 1 ms at the current paced rate. This mechanism helps the sender congestion control mitigate the queueing delay occurring inside the node and accurately calculate RTTs. Concluding, the bit-shift quantity changes the latency introduced by TSQ, while the TP ratio changes the TSQ limit size.

Algorithm 2 TCP Small Queue
Input: TCP_SOCKET sk
1: int limit;
2: limit = max(2 * sk→pktsize, sk→tcp_pacing_rate >> 10);
3: limit = min(limit, tcp_limit_output_bytes);
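As a sketch of Algorithm 2 (again a simplification, not the kernel code), the following C fragment computes the dynamic limit; the right shift by 10 divides the pacing rate in bytes per second by 1024, which approximates the number of bytes sent in 1 ms.

```c
#include <stdio.h>
#include <stdint.h>

/* Upper clamp as described in the text (128 KB). */
#define TCP_LIMIT_OUTPUT_BYTES  (128 * 1024)

static inline uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }
static inline uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }

/* TSQ limit in bytes, mirroring Algorithm 2: at least two packets,
 * at most TCP_LIMIT_OUTPUT_BYTES, otherwise ~1 ms of data at the paced rate. */
static uint64_t tsq_limit(uint64_t pkt_size, uint64_t pacing_rate_Bps)
{
    uint64_t limit = max_u64(2 * pkt_size, pacing_rate_Bps >> 10);
    return min_u64(limit, TCP_LIMIT_OUTPUT_BYTES);
}

int main(void)
{
    /* Example: a flow paced at 96 Mbit/s with 1448-byte segments. */
    uint64_t rate = 96000000ULL / 8;  /* 12,000,000 bytes/s */
    printf("TSQ limit: %llu bytes\n",
           (unsigned long long)tsq_limit(1448, rate));
    return 0;
}
```

At 12 MB/s the budget is roughly 11.7 KB, i.e., about eight full-size segments, which illustrates why even a fast flow keeps only a handful of packets queued below the socket.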
It is important to notice that BBR operates with a customized TCP pacing mechanism used in its finite-state machine. BBR, indeed, is the sole congestion control that ignores the global TP module and does not react to changes of the global tcp_pacing_ss_ratio and tcp_pacing_ca_ratio variables. BBR deploys a specific finite-state machine, reported in Figure 2, where the pacing rate is computed as a function of the bottleneck bandwidth. BBR estimates the parameters of the finite-state machine in kernel space, and they cannot be modified by the user.

FIGURE 2. Linux TCP BBR block.

B. QUEUEING LAYER AND DRIVER
The default structure of FQ-CoDel, reported in Figure 1, works as follows: a separate software queue serves each TCP flow, each queue is managed by the CoDel algorithm to control the latency, and the queues are served in a round-robin fashion. The default CoDel threshold is set at 5 ms, which means that packets with a sojourn time greater than the threshold will be dropped at the dequeue stage. The queueing discipline in novel Linux systems is managed, by default, with the tc tool, which allows configuring the characteristics of the networking layer. The Queueing Layer and the Network Interface Card (NIC) Driver blocks are strongly coupled in their behavior, and Figure 1 represents a simple scenario in which a single hardware queue is present. The driver also implements the Byte Queue Limit (BQL) for all the hardware queues, which is the last algorithm to control the global latency of the system [12]. The BQL mechanism tries to store enough data to avoid starvation and, simultaneously, to avoid accumulating excessive data that would increase the latency. The BQL algorithm is not tested in our paper, and the drivers' default configurations are maintained. A considerable change imposed by the usage of a wireless Atheros NIC equipped with the ath9k driver, as is the case in our tests, is that it implements the FQ-CoDel mechanism directly in the firmware, disabling the queueing discipline layer when the driver is used [13]. Thus, it impacts the maximum aggregation size of the NIC due to the 5 ms limit imposed by FQ-CoDel.

III. TESTBED
FIGURE 3. Physical testbed.
TABLE 1. Bottlenecks configuration (the five configurations used in the results: L10M and L100M with a local HTB-shaped bottleneck at 10 and 100 Mbit/s, R10M and R100M with a remote HTB-shaped bottleneck at 10 and 100 Mbit/s, and LR1000M with no software limitation, i.e., full Gigabit on both links).

To validate the performance of TSQ and TP, we designed the testbed reported in Figure 3. Our testbed is composed of 3 nodes, resembling a classical Internet connection with Wi-Fi access for the Client C and a wired backhaul for the Server S. In the middle, between C and S, there is the Router R or Access Point AP, which interconnects the endpoints and implements a wired testbed or a wireless one, respectively. A Gigabit Ethernet supports the connection between S and R/AP. In contrast, the connections between R/AP and C are a Gigabit Ethernet and an IEEE 802.11n link, both configured with PCIe devices. These embed the BCM5761 chipset in the wired testbed and the AR9580 chipset in the wireless one. The bottleneck segment is software-defined to be local (the link between C and R) or remote (the link between S and R) in the wired testbed. The difference between local and remote bottlenecks resembles different possible laboratory and home connections, allowing us to focus on widely different possible real networks by only controlling the few testbed nodes. To deploy a configurable wired bottleneck, we used the tc Linux package and implemented a hierarchical token bucket (HTB) queueing discipline, coupled with the default FQ-CoDel, on the Router interfaces (a configuration of the kind sketched at the end of this section). It is important to notice that tc allows for traffic shaping through HTB, i.e., imposing a specific delivery rate for an interface, without compromising the behavior of the scheduling and AQM algorithms running in cascade to HTB. This is not true if the bottleneck is also modeled by imposing specific delays or packet losses through the netem queueing discipline, which must be used in mutual exclusion with other queueing disciplines such as FQ-CoDel. Anyway, this is not the case in our discussion, since we focus only on the bottleneck bandwidth characteristic. We defined 5 different bottleneck configurations, summarized in Table 1. In the wireless testbed, instead, the bottleneck is the physical wireless interface, reflecting a standard Wi-Fi home access network with a Gigabit Ethernet backhaul. All the nodes run an Arch Linux distribution with a 5.10-lts Linux kernel version (tests and scripts are available at netlab.unimore.it/sw/TSQ-NtwL.zip).
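A minimal sketch of this kind of HTB plus FQ-CoDel bottleneck setup is the following C helper; the device name, handle numbers, and the 100 Mbit/s rate are illustrative assumptions, while the full test scripts are available from the archive referenced above.

```c
/* Illustrative only: shape an interface with HTB and attach FQ-CoDel below it. */
#include <stdio.h>
#include <stdlib.h>

static void run(const char *cmd)
{
    printf("+ %s\n", cmd);          /* echo each command before executing it */
    if (system(cmd) != 0)
        fprintf(stderr, "command failed: %s\n", cmd);
}

int main(void)
{
    /* Root HTB qdisc with a single 100 Mbit/s class on the router interface. */
    run("tc qdisc replace dev eth0 root handle 1: htb default 10");
    run("tc class add dev eth0 parent 1: classid 1:10 htb rate 100mbit");
    /* FQ-CoDel attached below HTB, so shaping does not bypass the AQM. */
    run("tc qdisc add dev eth0 parent 1:10 handle 110: fq_codel");
    return 0;
}
```

Local versus remote bottlenecks then differ only in which router interface the shaping is attached to.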
IV. RESULTS
Our tests are performed using the Flent [14] tool, running four TCP uploads from C to S for 30 seconds with a concurrent ping flow. The core parameter changed between the tests is the TSQ size; indeed, we developed a Linux kernel patch that lets us change the standard TSQ dynamic size of 1 ms of data at the current rate to a value between 1 and 8 ms of data at the current rate. We also allowed the possibility to disable the TSQ mechanism entirely, naming this strategy NoTSQ in our results.
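A minimal sketch of such a knob, assuming a simple millisecond multiplier applied to the Algorithm 2 budget (an illustration of the idea, not the actual patch), is the following:

```c
#include <stdint.h>

/* Hypothetical illustration of the experimental knob described above: instead of
 * the fixed pacing_rate >> 10 (about 1 ms of data), scale the budget by tsq_ms
 * milliseconds. This is a sketch of the idea, not the patch used in the tests. */
uint64_t tsq_limit_ms(uint64_t pkt_size, uint64_t pacing_rate_Bps,
                      unsigned int tsq_ms, uint64_t limit_output_bytes)
{
    uint64_t budget = (pacing_rate_Bps >> 10) * tsq_ms;   /* ~tsq_ms ms of data */
    uint64_t limit  = budget > 2 * pkt_size ? budget : 2 * pkt_size;
    return limit < limit_output_bytes ? limit : limit_output_bytes;
}
```

In this sketch, tsq_ms values from 1 to 8 span the TSQ configurations evaluated below, while disabling the mechanism (NoTSQ) corresponds to not applying any limit at all.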
The TCP congestion control algorithms used in our tests are TCP CUBIC and TCP BBR, to evaluate the TSQ performance in the presence of a loss-based and a model-based variant, respectively. Moreover, we also evaluated two possible queueing disciplines, FQ-CoDel and PFIFO (named pfifo_fast in the tc package), the latter being a priority scheduler with three possible software queues. PFIFO is still the default queueing discipline in several Linux distributions and networking nodes, like the Raspberry Pi model 4, waiting for FQ-CoDel to replace it incrementally. We did not focus on the FQ queueing discipline, historically associated with BBR; this is because, before kernel version 4.17, BBR could not deploy a proper TP mechanism by itself and was recommended to be used in conjunction with FQ. The use of FQ indeed helps in terms of global TCP pacing, adding extra CPU usage for this task, as shown in [10]. Anyway, in our analysis, we focus on network performance, where throughput and latency need to be controlled through the default packet schedulers or AQM techniques offered by current Linux kernels. As a matter of fact, the usage of FQ is not common anymore since, from the 4.17 kernel, BBR effectively implements its own pacing system without needing FQ.

A. WIRED RESULTS
FIGURE 4. Throughput and RTT on different wired bottlenecks: TSQ vs. NoTSQ.
FIGURE 5. TCP RTT: Different wired bottlenecks with TCP BBR or CUBIC.

We first introduce the results obtained with the wired version of our testbed of Figure 3. All the wired tests have been performed using either the standard TSQ configuration or the NoTSQ one with the mechanism disabled. Figure 4 shows the throughput and the latency of a simple TCP CUBIC stream from C to S, with PFIFO used as the queueing discipline on the two bottlenecks tested: L100M and R100M. We used PFIFO instead of FQ-CoDel to highlight the sole impact of TSQ on the latency reduction. All our results are in candlestick format: the top and the bottom of the boxes represent the 90th and the 10th percentiles of the data, respectively, while the solid line inside the box represents the median value. One clear piece of evidence from Figure 4 is the remarkable impact of TSQ when the bottleneck is local, because the dominant latency contribution is the queueing delay caused by the sender node queues, which is limited by the TSQ mechanism. It is important to note that this remarkable latency drop is achieved while maintaining the throughput close to 96 Mbit/s, as in all the other configurations. On the other hand, when the bottleneck is remote, the presence of TSQ only marginally mitigates the end-to-end latency.

Considering that TSQ does not impact the throughput over a wired bottleneck, we now focus only on the TCP RTT performance of all the five wired bottlenecks tested, in Figure 5. Moreover, we also include the impact of a different TCP congestion control and a different queueing discipline, BBR and FQ-CoDel. Figure 5a is an extension of Figure 4 that also embraces the L10M, R10M, and LR1000M bottleneck configurations, focusing on the TCP RTT instead of the ICMP ping RTT. The introduction of TSQ reduces the TCP RTT by two orders of magnitude in L10M and L100M, where the bottleneck is imposed on the local network through the HTB filter. In LR1000M, instead, where the interfaces are not software-limited and the HTB filter is disabled everywhere, the TCP RTT reduction introduced by TSQ is of one order of magnitude, which is again a remarkable result. Moving from Figure 5a to Figure 5b, only the TCP congestion control changes, from CUBIC to BBR. In this case, the impact of TSQ is still observable, but smaller: 10 ms and 3 ms in the L10M and L100M configurations, respectively. Even in this case, removing the bottlenecks with the LR1000M configuration leads to a smaller TSQ impact of a couple of ms. Finally, Figure 5c reports the results of BBR in conjunction with FQ-CoDel, which are very similar to the effects of CUBIC with FQ-CoDel, not reported here. The differences with Figure 5b are the maximum latency in the R10M configuration and in L10M with NoTSQ, in which FQ-CoDel imposes a maximum queueing delay at the bottleneck that results in a global TCP RTT of 10 ms.

FIGURE 6. Throughput and RTT of TCP BBR and CUBIC on a remote bottleneck.
FIGURE 7. Throughput and RTT of TCP BBR and CUBIC on a local bottleneck.

To conclude the discussion on wired bottlenecks, we also present Figures 6 and 7, which include the same tests of Figure 4, adding BBR as a congestion control algorithm. Figure 6 reports the throughput and the RTT of CUBIC and BBR on a remote bottleneck operating at 100 Mbit/s with a PFIFO queueing discipline; even though the throughput performance is similar, the RTT is remarkably different, and the shape of the curves helps to understand the big difference between the two congestion controls. TCP CUBIC operates by waiting for the loss feedback from the network, filling the bottleneck queue up to the packet drops. Indeed, the shape of the RTT curve presents several peaks, corresponding to the maximum queueing delay when the bottleneck queue is full, and some minimum peaks, corresponding to the new starting congestion window of CUBIC in response to the loss. The behavior of BBR, instead, is different, due to the model-based nature of this congestion control. It is possible to notice the draining spikes happening every 10 s, which correspond to the probe_RTT phase of BBR reported in Figure 2. Since the bottleneck of Figure 6 is remote, the presence or absence of TSQ, moving from Figure 6a to Figure 6b, does not impact the results. The situation is different in Figure 7, where the bottleneck is local and packets accumulate in the sender's NIC. If the TSQ algorithm is not active (Figure 7a), the behavior of the system is logically equivalent to the case of a remote bottleneck, since there is no way to control the number of packets at the NIC, and both TCP CUBIC and TCP BBR operate like in Figure 6a.
If TSQ is active, instead, TCP CUBIC completely changes its behavior, because TSQ forbids TCP CUBIC from accumulating packets in the bottleneck queue (which is the local NIC's queue). The result is that TCP CUBIC, and any loss-based TCP variant in general, operates ruled by the TSQ completion interrupts, maintaining a minimal NIC queue usage like the BBR congestion control.

B. Wi-Fi RESULTS
FIGURE 8. Throughput and RTT on the ath9k wireless bottleneck.

The TSQ mechanism has been shown to break the frame aggregation logic of Wi-Fi technologies in [8], so we investigate the impact of different TSQ sizes in conjunction with different TP ratios and different base-RTT flows. We configured 3 different TP ratios by changing the congestion-avoidance pacing variable tcp_pacing_ca_ratio from the default value of 1.2 to 1.3 and 1.4, naming these configurations 2p, 3p, and 4p, respectively. Concerning the different RTT flows, instead, we changed the base RTT through the netem package at the AP, with three possible flow configurations at 1, 10, and 100 ms of base RTT.

Figure 8a shows the results of four TCP CUBIC uploads with our eight different TSQ configurations and the three possible pacing ratios, while Figure 8b shows the results of the same test with the standard pacing ratio and the three possible base RTTs. In this set of experiments, we selected only TCP CUBIC, since TCP BBR ignores the global TP variable changes. These two plots are presented together to highlight the relation between TP and RTT, which both affect the number of packets that the TCP socket can enqueue and, consequently, the throughput. Theoretically, from the combination of Algorithms 1 and 2, halving the RTT has the same effect as doubling the TP rate.
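This equivalence can be read directly from the two algorithms: ignoring the min/max clamps of Algorithm 2, the amount of data a socket may enqueue is the paced rate multiplied by a fixed 1 ms window, i.e.,

\[
\mathrm{limit} \;\approx\; \mathrm{pacing\_rate} \times 1\,\mathrm{ms} \;=\; \mathrm{ratio} \times \frac{\mathrm{cwnd} \times \mathrm{MSS}}{\mathrm{RTT}} \times 1\,\mathrm{ms},
\]

so, for a given congestion window, halving the RTT scales the enqueued budget (and hence the achievable frame aggregation) by the same factor of two as doubling the pacing ratio.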
The values reported in green in Figure 8 are almost the same, due to a slightly different base RTT of the first experiment, under 1 ms. In Figure 8a, it is possible to see that increasing the pacing rate (blue and red) has the effect of increasing the throughput on the Wi-Fi path and increasing the latency as well. This result is justified by Algorithm 2; indeed, the higher the TP rate, the larger the number of packets enqueued in the NIC, which allows for larger aggregates, increasing the throughput despite a latency increment. Once the initial TSQ value is greater than 4TSQ, increasing the TP ratio has the sole effect of increasing the latency, since the Wi-Fi bottleneck has already been saturated with the maximum available frame aggregation. In Figure 8b, on the other hand, it is possible to see that a higher RTT has the same effect as a pacing reduction. This effect can be observed moving from 1 ms of base RTT to 10 ms of base RTT, where the former configuration registers a higher throughput with respect to the latter. With a base RTT of 100 ms, instead, the delivery rate is too low, and the final throughput is not optimal even when relaxing the TSQ constraints. This effect is justified because the high RTT forces a low rate, considering Algorithm 1. Consequently, even relaxing the TSQ value in Algorithm 2 does not allow the rate to grow enough to form larger aggregates and discover the higher throughput available on the Wi-Fi channel.

V. CONCLUSION
In this paper, we showed that TSQ alone significantly impacts latency when the bottleneck is local. The reason lies in the nature of TSQ, which limits the amount of data that each socket can enqueue or, in other words, limits the bottleneck queue size when the bottleneck is the sender's NIC. The latency reduction effect is mitigated when algorithms like BBR and FQ-CoDel are deployed, but it is still observable. These results place the TSQ mechanism in a critical position when dealing with network performance. Network simulators will have to include this mechanism to maintain high-fidelity results compared with real systems. Nevertheless, TSQ has a different impact on a local Wi-Fi bottleneck with respect to a local wired bottleneck: the latency reduction is coupled with a non-optimal throughput if the limit imposed by TSQ is too strict. At the same time, TP and the base RTT impact the Wi-Fi performance because they modify the TSQ limit, and the higher the TP, or the smaller the base RTT, the higher the throughput will be.
REFERENCES
[1] T. Hoeiland-Joergensen, P. McKenney, D. Taht, J. Gettys, and E. Dumazet, FlowQueue-CoDel, IETF RFC 8290, Jan. 2018. [Online]. Available: https://tools.ietf.org/html/rfc8290
[2] N. Cardwell, Y. Cheng, C. S. Gunn, S. H. Yeganeh, and V. Jacobson, "BBR: Congestion-based congestion control," Commun. ACM, vol. 60, no. 2, pp. 58–66, 2017.
[3] A. Saeed, N. Dukkipati, V. Valancius, V. Lam, C. Contavalli, and A. Vahdat, "Carousel: Scalable traffic shaping at end hosts," in Proc. ACM SIGCOMM, 2017, pp. 404–417.
[4] B. Stephens, A. Singhvi, A. Akella, and M. Swift, "Titan: Fair packet scheduling for commodity multiqueue NICs," in Proc. USENIX Annu. Tech. Conf., 2017, pp. 431–444.
[5] B. Briscoe, A. Brunstrom, A. Petlund, D. Hayes, D. Ros, I.-J. Tsang, S. Gjessing, G. Fairhurst, C. Griwodz, and M. Welzl, "Reducing internet latency: A survey of techniques and their merits," IEEE Commun. Surveys Tuts., vol. 18, no. 3, pp. 2149–2196, 3rd Quart., 2016.
[6] Y. Zhao, A. Saeed, E. Zegura, and M. Ammar, "zD: A scalable zero-drop network stack at end hosts," in Proc. CoNEXT, 2019, pp. 220–232.
[7] Y. Guo, F. Qian, Q. Chen, Z. Morley Mao, and S. Sen, "Understanding on-device bufferbloat for cellular upload," in Proc. ACM SIGCOMM, 2016, pp. 303–317.
[8] C. A. Grazia, N. Patriciello, T. Hoiland-Jorgensen, M. Klapez, M. Casoni, and J. Mangues-Bafalluy, "Adapting TCP small queues for IEEE 802.11 networks," in Proc. IEEE 29th Annu. Int. Symp. Pers., Indoor Mobile Radio Commun. (PIMRC), Sep. 2018, pp. 1–6.
[9] C. A. Grazia, "IEEE 802.11n/AC wireless network efficiency under different TCP congestion controls," in Proc. Int. Conf. Wireless Mobile Comput., Netw. Commun. (WiMob), Oct. 2019, pp. 1–6.
[10] Y. Zhao, A. Saeed, M. Ammar, and E. Zegura, "Scouting the path to a million-client server," in Passive and Active Measurement. Cham, Switzerland: Springer, 2021, pp. 337–354.
[11] C. A. Grazia, "Future of TCP on Wi-Fi 6," IEEE Access, vol. 9, pp. 107929–107940, 2021.
[12] N. Mareev, D. Kachan, K. Karpov, D. Syzov, and E. Siemens, "Efficiency of BQL congestion control under high bandwidth-delay product network conditions," in Proc. Int. Conf. Appl. Innov. IT, 2019, vol. 7, no. 1, pp. 19–22.
[13] T. Høiland-Jørgensen, M. Kazior, D. Taht, P. Hurtig, and A. Brunstrom, "Ending the anomaly: Achieving low latency and airtime fairness in WiFi," in Proc. USENIX ATC, 2017, pp. 139–151.
[14] T. Hoeiland-Joergensen, C. A. Grazia, P. Hurtig, and A. Brunstrom, "Flent: The flexible network tester," in Proc. 11th EAI Int. Conf. Perform. Eval. Methodol. Tools (ValueTools), 2017, pp. 120–125.

CARLO AUGUSTO GRAZIA (Member, IEEE) received the Ph.D. degree from the Department of Engineering Enzo Ferrari (DIEF), University of Modena and Reggio Emilia (UNIMORE), in 2016. He is currently an Assistant Professor with UNIMORE, where he teaches the automotive connectivity course. He has been involved in the EU FP7 projects E-SPONDER and PPDR-TC. His research interests include computer networking, with an emphasis on wireless networks, queueing algorithms, and V2X.

MARTIN KLAPEZ received the Ph.D. degree from DIEF, UNIMORE, in 2017. He is currently a Postdoctoral Research Fellow with UNIMORE. He has collaborated with the Italian Nanoscience National Research Center S3, and he has been involved in the EU FP7 project PPDR-TC. His research interests include network softwarization, public safety networks, and safety-related V2X systems.

MAURIZIO CASONI (Senior Member, IEEE) received the M.S. (Hons.) and Ph.D. degrees in electrical engineering from the University of Bologna, Italy, in 1991 and 1995, respectively. In 1995, he was with the Computer Science Department, Washington University in St. Louis, MO, USA, as a Research Fellow. He is currently an Associate Professor of telecommunications with DIEF, UNIMORE, Italy. He has been responsible at UNIMORE for the EU FP7 projects E-SPONDER and PPDR-TC.