
The New TCP Modules on the Block: A Performance Evaluation of TCP Pacing and TCP Small Queues

Authors: Carlo Augusto Grazia, Martin Klapez, Maurizio Casoni

License CC-BY-4.0

Received September 7, 2021, accepted September 16, 2021, date of publication September 20, 2021,
date of current version September 27, 2021.
Digital Object Identifier 10.1109/ACCESS.2021.3113891




The New TCP Modules on the Block:
A Performance Evaluation of TCP
Pacing and TCP Small Queues
CARLO AUGUSTO GRAZIA (Member, IEEE), MARTIN KLAPEZ,
AND MAURIZIO CASONI (Senior Member, IEEE)
Department of Engineering Enzo Ferrari, University of Modena and Reggio Emilia, 41125 Modena, Italy
Corresponding author: Carlo Augusto Grazia (carloaugusto.grazia@unimore.it)




  ABSTRACT Google and the Bufferbloat community have designed several solutions to reduce Internet latency in recent years, involving different TCP-IP stack layers. One of these solutions is named TCP Small Queues (TSQ) and reduces TCP flow latency by controlling the number of packets that each TCP socket can enqueue in the sender node. It works in conjunction with TCP Pacing (TP), which affects the actual TSQ size as a function of the TCP rate. This paper analyzes the performance of TSQ and TP through real-system tests over different network bottlenecks, with an emphasis on Wi-Fi technologies, where their behavior strongly affects the Wi-Fi frame aggregation mechanism.


  INDEX TERMS Bufferbloat, latency, TCP small queues.

I. INTRODUCTION
Internet latency issues have been put under a magnifying glass in recent years, pushing big players like Google and the Bufferbloat community to design novel algorithms for several TCP-IP layers. To name one, the Bufferbloat community has developed the hybrid packet scheduler and queue manager Flow Queue Controlled Delay (FQ-CoDel) [1]. FQ-CoDel has quickly become the standard queueing discipline for many Linux-based end nodes and routers. At the same time, Google has designed a couple of remarkable algorithms such as TCP Bottleneck Bandwidth and Round-trip propagation time (BBR) [2] and TCP Small Queues (TSQ). The former is a transport layer solution, and the latter is a cross-layering solution, considering the TCP-IP stack. Unlike FQ-CoDel, which can be deployed on any node of the path, BBR and TSQ are designed exclusively for the end nodes.

To the best of our knowledge, there is a lack of scientific contributions related to TSQ and TP. In particular, we report a lack in the performance analysis of TSQ in conjunction with these new TCP-IP solutions involving all the stack layers. The TSQ module alone is covered by a few scientific contributions [3]–[6] involving wired networks, but without investigating the main purpose of the mechanism, i.e., to reduce latency without reducing the throughput. Concerning wireless environments, instead, a few other scientific contributions [7]–[9], dealing with LTE and Wi-Fi technologies, report some insight on latency reduction obtained by tuning the TSQ size. Unfortunately, these previous works deal with old Linux kernels, in which TSQ operated statically by imposing a fixed amount of data to be enqueued, instead of controlling the amount of data to be enqueued dynamically based on the current data delivery rate. Moreover, none of these cited works accommodates a broader analysis of the interaction between TSQ and TP in real and up-to-date environments. Indeed, to properly investigate the TSQ impact on latency reduction, a complete system involving TCP BBR and FQ-CoDel must also be considered, together with the bottleneck position in the network path. The sole literature contributions combining TSQ and TP, to the best of our knowledge, are [10] and [11]: in the former, the authors analyzed the CPU impact of these solutions, discussing software bottlenecks more than network bottlenecks, while in the latter, the author investigated the behavior of different TCP congestion controls over a hybrid bottleneck network which involves only Wi-Fi 6 environments.

The associate editor coordinating the review of this manuscript and approving it for publication was Barbara Masini.

FIGURE 1. TCP-IP Linux stack.
The contribution of this paper is (i) the description of TSQ and TP, contextualized in the latest Long-Term Support (LTS) version of the Linux kernel, 5.10-lts, together with the other involved TCP-IP solutions in both wired and wireless scenarios, and (ii) a real-test analysis of the TSQ and TP performance under different network technologies. The paper is structured as follows: Section II describes the latest TCP-IP Linux stack and Section III illustrates the testbed used to collect our data. Finally, Section IV shows the results and Section V concludes the paper.

II. TCP-IP LINUX STACK
The current and up-to-date TCP-IP Linux stack is depicted in Figure 1. We report the three main blocks involved in a TCP flow transmission, with the whole TCP transport block on the left, the Queueing Layer corresponding to the TCP-IP networking layer in the middle, and the host-to-network Driver block on the right. The role of the TCP congestion control algorithm did not change recently, so the TCP socket is still calculating the congestion window (CWND) and dealing with ACK reception according to the algorithm used. The most significant change in recent years has been in the way packets are delivered by the TCP socket, now regulated by TSQ and TP, also shown in Figure 1. Once the TCP socket delivers the packets, they are enqueued in the lower layers. The Queueing Layer, depicted in the middle of Figure 1, deploys a standard FQ-CoDel algorithm, which is the default solution in recent kernels of many Linux distributions [1]. Once the scheduler delivers the packets, they move to the last block, where the driver firmware implements the last hardware queue before moving to the physical medium channel. Once a packet is physically transmitted, a completion signal is cross-passed to the TSQ algorithm.

A. TP AND TSQ
The most significant change experienced by the TCP-IP stack in recent years has been the introduction of TP and TSQ. The cooperative work of these two TCP submodules strongly impacts the way packets are delivered by the TCP socket, affecting the TCP RTT and the system latency. TP is controlled by two system variables, i.e., tcp_pacing_ss_ratio and tcp_pacing_ca_ratio, used in the slow start and congestion avoidance phases, respectively, as reported in Algorithm 1. The TCP socket's final paced rate to deliver data is then adjusted with a pacing ratio that changes according to the TCP transmission phase. By default, tcp_pacing_ss_ratio is equal to 2 in the slow-start phase, and tcp_pacing_ca_ratio is equal to 1.2 in the congestion avoidance phase. This means that the TCP flow doubles its rate in the slow-start phase and increases it by 20% in the congestion-avoidance phase. This mechanism allows probing for more bandwidth without forming excessive bursts of packets in the path's network queues.

Algorithm 1 TCP Pacing Rate
Input: TCP_SOCKET sk, int baseRTT;
 1: int rate = mss * sk→cwnd / baseRTT;
 2: if sk→cwnd < sk→ssthresh / 2 then
 3:    rate *= tcp_pacing_ss_ratio;     // SlowStart phase
 4: else
 5:    rate *= tcp_pacing_ca_ratio;     // Cong.Avoid. phase
 6: end if

On the other hand, the TCP paced rate is used to calculate, in conjunction with the TSQ mechanism, the number of packets that a TCP socket can enqueue in the sender stack. This quantity is a dynamic value described in Algorithm 2.

Algorithm 2 TCP Small Queue
Input: TCP_SOCKET sk;
 1: int limit;
 2: limit = max(2 * sk→pktsize, sk→tcp_pacing_rate >> 10);
 3: limit = min(limit, tcp_limit_output_bytes);

According to the algorithm, the TSQ limit is always higher than a minimum amount of 2 packets and lower than a maximum amount of tcp_limit_output_bytes bytes (128 KB by default). The dynamic limit moves between these two bounds and is the amount of data that corresponds to a latency equal to 1 ms by default. Algorithm 2 clarifies this behavior: the dynamic amount of data that can be enqueued is calculated




through sk→tcp_pacing_rate >> 10, which is a 10-bit right shift of the current pacing rate and corresponds to the amount of data transmitted in about 1 ms at the current paced rate. This mechanism helps the sender congestion control mitigate the queueing delay occurring inside the node and accurately calculate RTTs. Concluding, the bit-shift quantity changes the latency introduced by TSQ, while the TP ratio changes the TSQ limit size.
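To make the interplay between the two algorithms concrete, the short user-space Python sketch below reproduces the arithmetic of Algorithms 1 and 2. It is an illustration only, not the kernel implementation: the function names and the example values of mss, cwnd and baseRTT are ours, while the ratios and the 128 KB cap are the defaults reported above.

# Simplified, user-space model of Algorithms 1 and 2 (illustrative only).
# Values for mss, cwnd and base_rtt are arbitrary example inputs.
TCP_PACING_SS_RATIO = 2.0            # default: double the rate in slow start
TCP_PACING_CA_RATIO = 1.2            # default: +20% in congestion avoidance
TCP_LIMIT_OUTPUT_BYTES = 128 * 1024  # default TSQ upper bound (128 KB)

def pacing_rate(mss, cwnd, ssthresh, base_rtt):
    """Algorithm 1: paced rate in bytes/s from mss, cwnd and the base RTT (s)."""
    rate = mss * cwnd / base_rtt
    if cwnd < ssthresh / 2:
        return rate * TCP_PACING_SS_RATIO   # slow-start phase
    return rate * TCP_PACING_CA_RATIO       # congestion-avoidance phase

def tsq_limit(mss, rate):
    """Algorithm 2: bytes the socket may keep enqueued in the lower layers."""
    limit = max(2 * mss, int(rate) >> 10)   # >> 10 is roughly 1 ms of data
    return min(limit, TCP_LIMIT_OUTPUT_BYTES)

# Example: mss = 1448 B, cwnd = 100 segments, base RTT = 20 ms, past slow start.
rate = pacing_rate(1448, 100, 10, 0.020)    # about 8.7 MB/s after the 1.2 ratio
print(rate, tsq_limit(1448, rate))          # TSQ limit of roughly 8.5 KB

With these example numbers the socket is allowed to keep only a handful of full-size packets in the lower layers at any time, which is exactly the latency-limiting effect discussed above.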
It is important to notice that BBR operates with a customized TCP pacing mechanism used inside its finite-state machine. BBR, indeed, is the sole congestion control that ignores the global TP module and does not react to changes of the global tcp_pacing_ss_ratio and tcp_pacing_ca_ratio variables. BBR deploys a specific finite-state machine, reported in Figure 2, where the pacing rate is computed as a function of the bottleneck bandwidth. BBR estimates the parameters of the finite-state machine in kernel space, and they cannot be modified by the user.

FIGURE 2. Linux TCP BBR block.
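As a rough illustration of how that internal pacing follows the bandwidth estimate, the sketch below applies the phase-dependent gains described for BBR in [2] to the estimated bottleneck bandwidth. It is a simplified model under those published values, not the kernel code, and the function name and phase labels are our own.

# Simplified sketch of BBR's internal pacing, following the model in [2].
# btl_bw is BBR's bottleneck-bandwidth estimate (bytes/s); the gain depends
# on the finite-state-machine phase and is not user-configurable.
STARTUP_GAIN = 2.89                               # about 2/ln(2), used at startup
PROBE_BW_GAINS = [1.25, 0.75, 1, 1, 1, 1, 1, 1]   # cycled during ProbeBW

def bbr_pacing_rate(btl_bw, phase, cycle_index=0):
    if phase == "STARTUP":
        return STARTUP_GAIN * btl_bw              # discover the bandwidth quickly
    if phase == "DRAIN":
        return btl_bw / STARTUP_GAIN              # drain the queue built at startup
    if phase == "PROBE_BW":
        return PROBE_BW_GAINS[cycle_index % len(PROBE_BW_GAINS)] * btl_bw
    return btl_bw                                 # PROBE_RTT and steady operation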
B. QUEUEING LAYER AND DRIVER
The default structure of FQ-CoDel, reported in Figure 1, works as follows: a separate software queue serves each TCP flow, each queue is managed by the CoDel algorithm to control the latency, and the queues are served in a round-robin fashion. The default CoDel threshold is set at 5 ms, which means that packets with a sojourn time greater than the threshold will be dropped at the dequeue stage. The queueing discipline in novel Linux systems is managed, by default, with the tc tool, which allows configuring the characteristics of the networking layer. The Queueing Layer and the Network Interface Card (NIC) Driver blocks are strongly coupled in their behavior, and Figure 1 represents a simple scenario in which a single hardware queue is present. The driver also implements the Byte Queue Limit (BQL) for all the hardware queues, which is the last algorithm to control the global latency of the system [12]. The BQL mechanism tries to store enough data to avoid starvation and, simultaneously, to avoid accumulating excessive data that would increase the latency. The BQL algorithm is not tested in our paper, and the drivers' default configurations are maintained. A considerable change imposed by the usage of a wireless Atheros NIC equipped with the ath9k driver, as is the case in our tests, is that it implements the FQ-CoDel mechanism directly in the firmware, disabling the queueing discipline layer when the driver is used [13]. Thus, it impacts the maximum aggregation size of the NIC due to the 5 ms limit imposed by FQ-CoDel.

III. TESTBED
To validate the performance of TSQ and TP, we designed the testbed reported in Figure 3. Our testbed is composed of 3 nodes, resembling a classical Internet connection with Wi-Fi access for the Client C and a wired backhaul for the Server S. In the middle, between C and S, there is the Router R or Access Point AP, which interconnects the endpoints and implements a wired testbed or a wireless one, respectively. A Gigabit Ethernet supports the connection between S and R/AP. In contrast, the connections between R/AP and C are a Gigabit Ethernet and an IEEE 802.11n link, both configured with PCIe devices. These embed the BCM5761 chipset in the wired testbed and the AR9580 chipset in the wireless one. The bottleneck segment is software-defined to be local (the link between C and R) or remote (the link between S and R) in the wired testbed. The difference between local and remote bottlenecks resembles different possible laboratory and home connections, allowing us to focus on widely different possible real networks by only controlling the few testbed nodes. To deploy a configurable wired bottleneck, we used the tc Linux package and implemented a hierarchical token bucket (HTB) queueing discipline, coupled with the default FQ-CoDel, on the Router interfaces. It is important to notice that tc allows for traffic shaping through HTB, i.e., imposing a specific delivery rate for an interface, without compromising the behavior of the scheduling and AQM algorithms running in cascade to HTB. This is not true if the bottleneck is modeled by also imposing a specific delay or packet losses through the netem queueing discipline, which must be used in mutual exclusion with other queueing disciplines such as FQ-CoDel. Anyway, this is not the case in our discussion, since we focus only on the bottleneck bandwidth characteristic. We defined 5 possible different bottleneck configurations, summarized in Table 1. In the wireless testbed, instead, the bottleneck is the physical wireless interface, reflecting a standard Wi-Fi home access network with a Gigabit Ethernet backhaul. All the nodes run an Arch Linux distribution with a 5.10-lts Linux kernel version.¹

FIGURE 3. Physical testbed.
TABLE 1. Bottlenecks configuration.

¹ Tests and scripts are available at: netlab.unimore.it/sw/TSQ-NtwL.zip.
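A minimal sketch of how such a shaped bottleneck can be configured with tc is shown below. The interface name and the 100 Mbit/s rate are illustrative assumptions, not the exact commands of the testbed; the actual scripts are the ones published at the link in the footnote above.

# Illustrative bottleneck setup: HTB shaping with FQ-CoDel in cascade.
# The interface name and rate are assumptions, not the paper's exact scripts.
import subprocess

def run(cmd):
    subprocess.run(cmd.split(), check=True)

def shape(dev="eth1", rate="100mbit"):
    # Root HTB qdisc with a single class limited to the target rate.
    run(f"tc qdisc replace dev {dev} root handle 1: htb default 10")
    run(f"tc class add dev {dev} parent 1: classid 1:10 htb rate {rate} ceil {rate}")
    # FQ-CoDel attached below HTB, so shaping does not bypass the AQM.
    run(f"tc qdisc add dev {dev} parent 1:10 handle 10: fq_codel")

if __name__ == "__main__":
    shape()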






                 FIGURE 4. Throughput and RTT on different wired bottlenecks: TSQ vs. NoTSQ.




FIGURE 5. TCP RTT: Different wired bottlenecks with TCP BBR or CUBIC.




IV. RESULTS
Our tests are performed using the Flent [14] tool, running four TCP uploads from C to S for 30 seconds with a concurrent ping flow. The core parameter changed between each test is the TSQ size; indeed, we developed a Linux kernel patch¹ that allows us to change the standard TSQ dynamic size of 1 ms of data at the current rate to a value from 1 to 8 ms of data at the current rate. We also allowed the possibility to disable the TSQ mechanism entirely, naming this strategy NoTSQ in our results. The TCP congestion control algorithms used in our tests are TCP CUBIC and TCP BBR, to evaluate the TSQ performance in the presence of a loss-based and a delay-based variant, respectively. Moreover, we also evaluated two possible queueing disciplines, FQ-CoDel





and PFIFO,² which is a priority scheduler with three possible software queues. PFIFO is still the default queueing discipline in several Linux distributions and networking nodes, like the Raspberry Pi model 4, waiting for FQ-CoDel to replace it incrementally. We did not focus on the FQ queueing discipline, historically associated with BBR; this is because, before kernel version 4.17, BBR could not deploy a proper TP mechanism by itself and was recommended to be used in conjunction with FQ. The use of FQ indeed helps in terms of global TCP pacing, adding extra CPU usage for this task, as shown in [10]. Anyway, in our analysis, we focus on network performance, where throughput and latency need to be controlled through the default packet schedulers or AQM techniques offered by current Linux kernels. As a matter of fact, the usage of FQ is not common anymore since, from the 4.17 kernel, BBR effectively implements its own pacing system without needing FQ.

² The queueing discipline name in the tc package is pfifo_fast.

FIGURE 6. Throughput and RTT of TCP BBR and CUBIC on a remote bottleneck.
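For reference, a measurement of the kind described at the beginning of this section (four concurrent TCP uploads plus a latency probe for 30 seconds) can be launched with Flent roughly as follows. The host name, the test name, and the test parameter are our assumptions for illustration, not the precise commands used for the paper, which are in the published scripts.

# Illustrative Flent run: four TCP upload streams for 30 s toward the server.
# Host, test name, and parameters are assumptions, not the paper's scripts.
import subprocess

subprocess.run([
    "flent", "tcp_nup",                        # n-stream TCP upload test
    "-H", "server.example",                    # the Server S endpoint
    "-l", "30",                                # test length in seconds
    "--test-parameter", "upload_streams=4",    # four concurrent uploads
], check=True)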
A. WIRED RESULTS
We first introduce the results obtained with the wired version of our testbed of Figure 3. All the wired tests have been performed using either the standard TSQ configuration or the NoTSQ one with the mechanism disabled.

Figure 4 shows the throughput and the latency of a simple TCP CUBIC stream from C to S, with PFIFO used as the queueing discipline in the two possible bottlenecks tested: L100M and R100M. We used PFIFO instead of FQ-CoDel to highlight the sole impact of TSQ on the latency reduction. All our results are in candlestick format; the top and the bottom of the boxes represent the 90th and the 10th percentiles of the data, respectively, while the solid line inside the box represents the median value. One clear takeaway of Figure 4 is the remarkable impact of TSQ when the bottleneck is local, because the dominant latency contribution is the queueing delay caused by the sender node queues, which is limited by the TSQ mechanism. It is important to note that this remarkable latency drop is obtained while maintaining the throughput close to 96 Mbit/s, as in all the other configurations. On the other hand, when the bottleneck is remote, the presence of TSQ only marginally mitigates the end-to-end latency.

Considering that TSQ does not impact the throughput on a wired bottleneck, we now focus only on the TCP RTT performance of all the five wired bottlenecks tested in Figure 5. Moreover, we also include the impact of a different TCP congestion control, BBR, and a different queueing discipline, FQ-CoDel. Figure 5a is an extension of Figure 4 that also embraces the L10M, R10M and LR1000M bottleneck configurations, focusing on the TCP RTT instead of the





ICMP ping RTT. The introduction of TSQ reduces the TCP RTT by two orders of magnitude in L10M and L100M, where the bottleneck is imposed on the local network through the HTB filter. In the LR1000M configuration, instead, where the interfaces are not software-limited and the HTB filter is disabled everywhere, the TCP RTT reduction introduced by TSQ is of one order of magnitude, which is again a remarkable result.

Moving from Figure 5a to Figure 5b, only the TCP congestion control is changed, from CUBIC to BBR. In this case, the impact of TSQ is still observable but smaller, around 10 ms and 3 ms in the L10M and L100M configurations, respectively. Even in this case, removing the bottlenecks with the LR1000M configuration leads to a smaller TSQ impact of a couple of ms. Finally, Figure 5c reports the results of BBR in conjunction with FQ-CoDel, which are very similar to the effects of CUBIC with FQ-CoDel, not reported here. The differences with Figure 5b are the maximum latency in the R10M configuration and in the L10M with NoTSQ, in which FQ-CoDel imposes a maximum queueing delay at the bottleneck that results in a global TCP RTT of 10 ms.

FIGURE 7. Throughput and RTT of TCP BBR and CUBIC on a local bottleneck.

To conclude the discussion on wired bottlenecks, we also present Figures 6 and 7, which include the same tests of Figure 4, adding BBR as a congestion control algorithm. Figure 6 reports the throughput and the RTT of CUBIC and BBR on a remote bottleneck operating at 100 Mbit/s with a PFIFO queueing discipline; even though the throughput performance is similar, the RTT is remarkably different, and the shape of the curves helps to understand the big difference between the two congestion controls. TCP CUBIC operates by waiting for the loss feedback from the network, filling the bottleneck queue up to the packet drops. Indeed, the shape of the RTT curve presents several peaks corresponding to the maximum queueing delay when the bottleneck queue is full and some minimum peaks corresponding to the new starting congestion window of CUBIC in response to the loss. The behavior of BBR, instead, is different due to the model-based nature of the congestion control. It is possible to notice the draining spikes happening every 10 s, which correspond to the probe_RTT phase of BBR reported in Figure 2. Since the bottleneck of Figure 6 is remote, the presence or absence of TSQ, moving from Figure 6a to Figure 6b, does not impact the results. The situation is different in Figure 7, where the bottleneck is local and packets get accumulated in the sender's NIC. If the TSQ algorithm is not active (Figure 7a), the behavior of the system is logically equivalent to the case of a remote bottleneck, since there is no way to control the number of packets at the NIC, and both TCP CUBIC and TCP BBR operate like in Figure 6a. If TSQ is active, instead,





TCP CUBIC completely changes its behavior, because TSQ forbids TCP CUBIC to accumulate packets in the bottleneck queue (which is the local NIC's queue). The result is that TCP CUBIC, and any loss-based TCP variant in general, operates ruled by the TSQ interrupts, maintaining a minimum NIC queue usage like the BBR congestion control.

B. Wi-Fi RESULTS
The TSQ mechanism has been shown to break the frame aggregation logic of Wi-Fi technologies in [8], so we investigate the impact of different TSQ sizes in conjunction with different TP ratios and different base-RTT flows. We configured 3 different TP ratios by changing the congestion avoidance pacing variable tcp_pacing_ca_ratio from the default value of 1.2 to 1.3 and 1.4, naming these configurations 2p, 3p, and 4p, respectively. For what concerns the different RTT flows, instead, we changed the base RTT through the netem package at the AP, with three possible flow configurations at 1, 10, and 100 ms of base RTT.

FIGURE 8. Throughput and RTT on the ath9k wireless bottleneck.

Figure 8a shows the results of four TCP CUBIC uploads with our eight different TSQ configurations and the three possible pacing ratios, while Figure 8b shows the results of the same test with the standard pacing ratio and three possible base RTTs. In this set of experiments, we selected only TCP CUBIC, since TCP BBR ignores the global TP variable changes. These two plots are presented together to highlight the relation between TP and RTT, which both affect the number of packets that the TCP socket can enqueue and, consequently, the throughput. Theoretically, from the combination of Algorithms 1 and 2, halving the RTT has the same effect as doubling the TP rate.
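This equivalence follows directly from the two algorithms: the TSQ limit is roughly the pacing ratio times mss times cwnd divided by the RTT, scaled down to about 1 ms of data, so the ratio and the RTT enter the limit as a simple quotient. A small numerical check of this reasoning is sketched below; the mss and cwnd values are arbitrary examples.

# Numerical check: in the combined model of Algorithms 1 and 2, the TSQ limit
# scales as pacing_ratio / RTT, so halving the RTT and doubling the ratio
# produce the same limit. Example mss/cwnd values are arbitrary.
def limit(mss, cwnd, rtt, ratio):
    rate = ratio * mss * cwnd / rtt          # Algorithm 1 (bytes/s)
    return max(2 * mss, int(rate) >> 10)     # Algorithm 2, 128 KB cap omitted

base = limit(1448, 100, 0.020, 1.2)          # default ratio, 20 ms base RTT
print(base,
      limit(1448, 100, 0.010, 1.2),          # half the RTT   -> twice the limit
      limit(1448, 100, 0.020, 2.4))          # double the TP  -> twice the limit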





The values reported in green in Figure 8 are almost the same, due to the slightly different base RTT of the first experiment, which is under 1 ms. In Figure 8a, it is possible to see that increasing the pacing rate (blue and red) has the effect of increasing the throughput on the Wi-Fi path and increasing the latency as well. This result is justified by Algorithm 2; indeed, the higher the TP rate, the larger the number of packets enqueued in the NIC, which allows for larger aggregates, increasing the throughput despite a latency increment. Once the initial TSQ value is greater than 4TSQ, increasing the TP ratio has the sole effect of increasing the latency, since the Wi-Fi bottleneck has already been saturated with the maximum available frame aggregation.

In Figure 8b, on the other hand, it is possible to see that a higher RTT has the same effect as a pacing reduction. This effect can be observed moving from 1 ms of base RTT to 10 ms of base RTT, where the former configuration registers a higher throughput with respect to the latter. With a base RTT of 100 ms, instead, the delivery rate is too low, and the final throughput is not optimal even when relaxing the TSQ constraints. This effect is justified because the high RTT forces a low rate, considering Algorithm 1. Consequently, even relaxing the TSQ value in Algorithm 2 does not allow the rate to grow enough to form larger aggregates and discover the higher throughput available on the Wi-Fi channel.
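For completeness, the two knobs varied in this section can be set from user space roughly as follows. The wireless interface name is an assumption for illustration; the actual testbed scripts are the ones published with the paper.

# Illustrative configuration of the two knobs varied in the Wi-Fi experiments.
# Device name is an assumption; see the published scripts for the real setup.
import subprocess

def run(cmd):
    subprocess.run(cmd.split(), check=True)

# Congestion-avoidance pacing ratio, expressed in percent by the kernel:
# 120 is the 1.2 default ("2p"), 130 is "3p", 140 is "4p".
run("sysctl -w net.ipv4.tcp_pacing_ca_ratio=130")

# Base RTT added at the AP with netem (used alone, since netem replaces the
# root qdisc and is mutually exclusive with disciplines such as FQ-CoDel).
run("tc qdisc replace dev wlan0 root netem delay 10ms")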
V. CONCLUSION
In this paper, we showed that TSQ alone significantly impacts latency when the bottleneck is local. The reason lies in the nature of TSQ, which limits the amount of data that each socket can enqueue or, in other words, limits the bottleneck queue size when the bottleneck is the sender's NIC. The latency reduction effect is mitigated when algorithms like BBR and FQ-CoDel are deployed, but it is still observable. These results place the TSQ mechanism in a critical position when dealing with network performance. Network simulators will have to include this mechanism to maintain high-fidelity results compared with real systems. Nevertheless, TSQ has a different impact on a local Wi-Fi bottleneck with respect to a local wired bottleneck; the latency reduction is coupled with a non-optimal throughput if the limit imposed by TSQ is too strict. At the same time, TP and the base RTT also impact the Wi-Fi performance because they modify the TSQ limit: the higher the TP, or the smaller the base RTT, the higher the throughput will be.

REFERENCES
 [1] T. Hoeiland-Joergensen, P. McKenney, D. Taht, J. Gettys, and E. Dumazet. (Jan. 2018). FlowQueue-CoDel. [Online]. Available: https://tools.ietf.org/html/rfc8290
 [2] N. Cardwell, Y. Cheng, C. S. Gunn, S. H. Yeganeh, and V. Jacobson, "BBR: Congestion-based congestion control," Commun. ACM, vol. 60, no. 2, pp. 58–66, 2017.
 [3] A. Saeed, N. Dukkipati, V. Valancius, V. Lam, C. Contavalli, and A. Vahdat, "Carousel: Scalable traffic shaping at end hosts," in Proc. ACM SIGCOMM, 2017, pp. 404–417.
 [4] B. Stephens, A. Singhvi, A. Akella, and M. Swift, "Titan: Fair packet scheduling for commodity multiqueue NICs," in Proc. USENIX Annu. Tech. Conf., 2017, pp. 431–444.
 [5] B. Briscoe, A. Brunstrom, A. Petlund, D. Hayes, D. Ros, I.-J. Tsang, S. Gjessing, G. Fairhurst, C. Griwodz, and M. Welzl, "Reducing internet latency: A survey of techniques and their merits," IEEE Commun. Surveys Tuts., vol. 18, no. 3, pp. 2149–2196, 3rd Quart., 2016.
 [6] Y. Zhao, A. Saeed, E. Zegura, and M. Ammar, "zD: A scalable zero-drop network stack at end hosts," in Proc. CoNEXT, 2019, pp. 220–232.
 [7] Y. Guo, F. Qian, Q. Chen, Z. Morley Mao, and S. Sen, "Understanding on-device bufferbloat for cellular upload," in Proc. ACM SIGCOMM, 2016, pp. 303–317.
 [8] C. A. Grazia, N. Patriciello, T. Hoiland-Jorgensen, M. Klapez, M. Casoni, and J. Mangues-Bafalluy, "Adapting TCP small queues for IEEE 802.11 networks," in Proc. IEEE 29th Annu. Int. Symp. Pers., Indoor Mobile Radio Commun. (PIMRC), Sep. 2018, pp. 1–6.
 [9] C. A. Grazia, "IEEE 802.11n/AC wireless network efficiency under different TCP congestion controls," in Proc. Int. Conf. Wireless Mobile Comput., Netw. Commun. (WiMob), Oct. 2019, pp. 1–6.
[10] Y. Zhao, A. Saeed, M. Ammar, and E. Zegura, "Scouting the path to a million-client server," in Passive and Active Measurement. Cham, Switzerland: Springer, 2021, pp. 337–354.
[11] C. A. Grazia, "Future of TCP on Wi-Fi 6," IEEE Access, vol. 9, pp. 107929–107940, 2021.
[12] N. Mareev, D. Kachan, K. Karpov, D. Syzov, and E. Siemens, "Efficiency of BQL congestion control under high bandwidth-delay product network conditions," in Proc. Int. Conf. Appl. Innov. IT, 2019, vol. 7, no. 1, pp. 19–22.
[13] T. Høiland-Jørgensen, M. Kazior, D. Taht, P. Hurtig, and A. Brunstrom, "Ending the anomaly: Achieving low latency and airtime fairness in WiFi," in Proc. USENIX ATC, 2017, pp. 139–151.
[14] T. Hoeiland-Joergensen, C. A. Grazia, P. Hurtig, and A. Brunstrom, "Flent: The flexible network tester," in Proc. 11th EAI Int. Conf. Perform. Eval. Methodol. Tools (ValueTools), 2017, pp. 120–125.

CARLO AUGUSTO GRAZIA (Member, IEEE) received the Ph.D. degree from the Department of Engineering Enzo Ferrari (DIEF), University of Modena and Reggio Emilia (UNIMORE), in 2016. He is currently an Assistant Professor with UNIMORE, where he teaches the course on automotive connectivity. He has been involved in the EU FP7 Projects E-SPONDER and PPDR-TC. His research interests include computer networking, with an emphasis on wireless networks, queueing algorithms, and V2X.

MARTIN KLAPEZ received the Ph.D. degree from DIEF, UNIMORE, in 2017. He is currently a Postdoctoral Research Fellow with UNIMORE. He has collaborated with the Italian Nanoscience National Research Center S3 and has been involved in the EU FP7 Project PPDR-TC. His research interests include network softwarization, public safety networks, and safety-related V2X systems.

MAURIZIO CASONI (Senior Member, IEEE) received the M.S. (Hons.) and Ph.D. degrees in electrical engineering from the University of Bologna, Italy, in 1991 and 1995, respectively. In 1995, he was with the Computer Science Department, Washington University in St. Louis, MO, USA, as a Research Fellow. He is currently an Associate Professor of telecommunications with DIEF, UNIMORE, Italy. He has been responsible at UNIMORE for the EU FP7 Projects E-SPONDER and PPDR-TC.
