Authors: Avrim Blum, Eyal Even-Dar, Katrina Ligett
License: CC-BY-3.0
Theory of Computing, Volume 6 (2010), pp. 179–199
www.theoryofcomputing.org
Routing Without Regret: On Convergence to Nash Equilibria of Regret-Minimizing Algorithms in Routing Games∗

Avrim Blum†  Eyal Even-Dar  Katrina Ligett‡
Received: November 10, 2009; published: September 15, 2010.
Abstract: There has been substantial work developing simple, efficient regret-minimizing
algorithms for a wide class of repeated decision-making problems including online routing.
These are adaptive strategies an individual can use that give strong guarantees on perfor-
mance even in adversarially-changing environments. There has also been substantial work
on analyzing properties of Nash equilibria in routing games. In this paper, we consider the
question: if each player in a routing game uses a regret-minimizing strategy, will behavior
converge to a Nash equilibrium? In general games the answer to this question is known to
be no in a strong sense, but routing games have substantially more structure.
In this paper we show that in the Wardrop setting of multicommodity flow and infinitesimal agents, behavior will approach Nash equilibrium (on most days, the cost of the flow will be close to the cost of the cheapest paths possible given that flow) at a rate that depends polynomially on the players' regret bounds and the maximum slope of any latency function. We also show that Price of Anarchy results may be applied to these approximate equilibria, and we consider the finite-size (non-infinitesimal) load-balancing model of Azar (1998). Our nonatomic results also apply to the more general class of congestion games.

∗ A preliminary version of these results appeared in the Proceedings of the 25th Annual ACM Symposium on Principles of Distributed Computing, July 2006.
† Supported in part by the National Science Foundation under grants IIS-0121678, CCR-0122581, and CCF-0514922.
‡ This material is based upon work supported in part by an AT&T Labs Graduate Fellowship, an NSF Graduate Research Fellowship, and by the National Science Foundation under Grant #0937060 to the Computing Research Association for the CIFellows program.

ACM Classification: G.2.2, J.4
AMS Classification: 68Q32, 68T05, 68W40, 91A13, 91A20, 91A43
Key words and phrases: regret minimization, Nash equilibria, routing games

© 2010 Avrim Blum, Eyal Even-Dar, and Katrina Ligett
Licensed under a Creative Commons Attribution License. DOI: 10.4086/toc.2010.v006a008
1 Introduction
There has been substantial work in learning theory and game theory on regret-minimizing algorithms
for problems of repeated decision-making. Regret-minimizing algorithms have the property that in any
online, repeated setting, their average loss per time step approaches that of the best fixed strategy in
hindsight (or better) over time. Moreover, the convergence rates are quite good: in Hannan’s original
algorithm [23], the number of time steps needed to achieve a gap of ε with respect to the best fixed
strategy in hindsight—the “per time step regret”—is linear in the size of the game N. This was reduced
to logarithmic in N in more recent exponential-weighting algorithms for this problem [28, 8, 20] (also
known as the problem of “combining expert advice”). Most recently, a number of algorithms have been
developed for achieving such guarantees in a computationally efficient manner in many settings where
the number of possible actions N is exponential in the natural description-length of the problem [25, 38,
39].
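For illustration, the simplest of these exponential-weighting algorithms can be sketched in a few lines. The following is a minimal sketch of a "Hedge"-style update over N actions with losses in [0, 1]; the function name, the learning rate, and the loss stream are illustrative choices, not taken from any of the cited papers.

```python
import math

def hedge(num_actions, loss_stream, eta):
    """Exponential-weighting update: play each action with probability
    proportional to exp(-eta * cumulative loss).  Returns the algorithm's
    total expected loss and the total loss of the best fixed action."""
    weights = [1.0] * num_actions
    cumulative = [0.0] * num_actions  # per-action cumulative loss
    total_loss = 0.0
    for losses in loss_stream:        # each losses[i] is in [0, 1]
        z = sum(weights)
        probs = [w / z for w in weights]
        total_loss += sum(p * l for p, l in zip(probs, losses))
        for i, l in enumerate(losses):
            cumulative[i] += l
            weights[i] *= math.exp(-eta * l)
    return total_loss, min(cumulative)
```

Over T rounds, the gap (total expected loss minus the best fixed action's loss), divided by T, is the per-time-step regret; for a suitable choice of eta it vanishes at rate O(sqrt((log N)/T)), which is the logarithmic-in-N dependence discussed above.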
One specific setting where efficient regret-minimizing algorithms can be applied is online routing.
Given a graph G = (V, E) and two distinguished nodes vstart and vend , the game for an individual player
is defined as follows. At each time step t, the player’s algorithm chooses a path Pt from vstart to vend ,
and a set of edge costs {cte }e∈E is simultaneously determined. (Many regret-minimizing algorithms can
handle arbitrary adversarially chosen edge costs, but in our application, the edge costs are determined by
the other players’ routing decisions.) The edge costs are then revealed and the player pays the resulting
cost of the path it chose. Even though the number of possible paths can be exponential in the size of a
graph, regret-minimizing algorithms exist (e. g., [25, 39]) with computation costs and convergence rates
to the cost of the best fixed path in hindsight that are both polynomial in the size of the graph and in the
maximum edge cost. Moreover, a number of extensions [3, 29] have shown how these algorithms can
be applied even to the “bandit” setting where only the cost of edges actually traversed (or even just the
total cost of Pt ) is revealed to the algorithm at the end of each time step t.
Along a very different line of inquiry, there has also been much recent work on the Price of Anarchy
in games. Koutsoupias and Papadimitriou [27] defined the Price of Anarchy, which is the ratio of the
cost of an optimal global objective function to the cost of the worst Nash equilibrium. Many subsequent
results have studied the Price of Anarchy in a wide range of computational problems from job scheduling
to facility location to network creation games, and especially to problems of routing in the Wardrop
model such as that described above, where the cost of an edge is a function of the amount of traffic using
that edge, and the individual players are infinitesimal [10, 11, 27, 34, 15]. Such work implicitly assumes
that selfish individual behavior results in Nash equilibria.
Our Contribution We consider the following question: if all players in a routing game use regret-
minimizing algorithms to choose their paths each day, what can we say about the overall behavior of the
system? In particular, the regret-minimizing property—achieving performance close to that of the best
fixed action in hindsight (also called Hannan Consistency)—can be viewed as a natural condition of well-
reasoned self-interested behavior over time. That is, given that simple, computationally feasible regret-
minimizing algorithms exist for a repeated situation faced by an agent, one might posit that the agent’s
self-interested behavior would be regret-minimizing. Given that, we ask, if all players are adapting
their behavior in a regret-minimizing manner, can we say that the system as a whole will approach
equilibrium? Our main result is that in the Wardrop network routing setting of multicommodity flow
and infinitesimal agents, the flows will approach equilibrium in the sense that a 1 − ε fraction of the
daily flows will have the property that at most an ε fraction of the agents in them have more than an ε
incentive to deviate from their chosen path, where ε approaches 0 at a rate that depends polynomially
on the size of the graph, the regret-bounds of the algorithms, and the maximum slope of any latency
function.¹ Moreover, we show that the dependence on slope (the one new parameter) is necessary.
One further consequence, as we show in Section 5, is that if all players use regret-minimizing strate-
gies, there is no other strategy that would significantly improve any player’s cost on more than a small
fraction of time steps. In addition, we give stronger results for special cases such as the case of n par-
allel links and also consider the finite-size (atomic) load-balancing model of Azar [4]. Our results for
infinitesimal players also hold for a more general class of games called congestion games, although
computationally efficient regret-minimizing algorithms do not always exist for the most general of these
games.
One way our result can be viewed is as follows. Regret-minimizing algorithms are very compelling
from the point of view of individuals: if you use a regret-minimizing algorithm to drive to work each
day, you will get a good guarantee on your performance no matter what is causing congestion (other
drivers, road construction, or unpredictable events). But it would be a shame if, were everyone to use
such an algorithm, this produced globally unstable behavior. Our results imply that in the Wardrop
network routing model, so long as edge latencies have bounded slope, we can view equilibria as not just
a stable steady-state or the result of adaptive procedures specifically designed to find them, but in fact
as the natural result of individual selfishly adaptive behavior by agents that do not necessarily know (or
care) what policies other agents are using. Moreover, our results do not in fact require that users follow
strategies that are regret-minimizing over all possible cost scenarios, as long as their behavior satisfies
the regret-minimization property over the sequence of flows actually observed.
¹ A more traditional notion of approximate Nash equilibrium requires that no player will have more than ε incentive to
deviate from her strategy. However, one cannot hope to achieve such a guarantee using arbitrary regret-minimizing algorithms,
since such algorithms allow players to occasionally try bad paths, and in fact such experimentation is even necessary in bandit
settings. For the same reason, one cannot hope that all days will be approximate-Nash. Finally, our guarantee may make
one worry that some users could always do badly, falling in the ε minority on every day, but as we discuss in Section 5, the
regret-minimization property can be used in our setting to further show that no player experiences many days in which her
expected cost is much worse than the best path available on that day.
1.1 Regret and Nash equilibria
At first glance, a result of this form seems as if it should be obvious, given that a Nash equilibrium is precisely a set of strategies (pure or mixed) that are all regret-minimizing with respect to each other. Thus, if the learning algorithms settle at all, they will have to settle at a Nash equilibrium. In fact, for zero-sum
games, regret-minimizing algorithms when played against each other will approach a minimax optimal
solution [21]. However, it is known that even in small 2-player general-sum games, regret-minimizing
algorithms need not approach a Nash equilibrium and can instead cycle, achieving performance sub-
stantially worse than any Nash equilibrium for all players. Indeed simple examples are known where
standard algorithms will have this property with arbitrarily high probability [40].
1.2 Regret and correlated equilibria
It is known that certain algorithms such as that of Hart and Mas-Colell [24], as well as any algorithms
satisfying the stronger property of “no internal regret” [19], have the property that the empirical distribu-
tion of play approaches the set of correlated equilibria. On the positive side, such results are extremely
general, apply to nearly any game including routing, and do not require any bound on the slopes of edge
latencies. However, such results do not imply that the daily flows themselves (or even the time-average
flow) are at all close to equilibrium. It could well be that on each day, a substantial fraction of the
players experience latency substantially greater than the best path given the flow (and we give a specific
example of how this can happen when edge-latencies have unbounded slope in Section 2.4). In addition,
although Neyman [32] does show that the only correlated equilibrium in potential games with strictly
concave potential functions is the unique Nash equilibrium, there is no known efficient implementation
for internal regret minimization for routing problems.
1.3 Related work
Sandholm [36] considers convergence in potential games (which include routing games), and shows that
a very broad class of evolutionary dynamics is guaranteed to converge to Nash equilibrium.² Fischer and
Vöcking [17] consider a specific adaptive dynamics (a particular functional form in which flow might
naturally change over time) in the context of selfish routing and prove results about convergence of this
dynamics to an approximately stable configuration. In more recent work, they study the convergence
of a class of routing policies under a specific model of stale information [18]. Most recently, Fischer, Räcke, and Vöcking [16] gave a distributed procedure with especially good convergence properties.
The key difference between that work and ours is that those results consider specific adaptive strategies
designed to quickly approach equilibrium. In contrast, we are interested in showing convergence for
any algorithm satisfying the regret-minimization property. That is, even if the players are using many
different strategies, without necessarily knowing or caring about what strategies others are using, then
so long as all are regret-minimizing, we show they achieve convergence. In addition, because efficient
regret-minimizing algorithms exist even in the bandit setting where each agent gets feedback only about
its own actions [3, 29], our results can apply to scenarios in which agents adapt their behavior based on
only very limited information and there is no communication at all between different agents.
² Note that such dynamics do not include general regret-minimizing dynamics.
Convergence time to Nash equilibrium in load balancing has also been studied. Earlier work studied
convergence time using potential functions, with the limitation that only one player is allowed to move
in each time step; the convergence times derived depended on the appropriate potential functions of the
exact model [30, 12]. The work of Goldberg [22] studied a randomized model in which each user can
select a random delay over continuous time. This implies that only one user tries to reroute at each
specific time; therefore the setting was similar to that mentioned above. Even-Dar and Mansour [13]
considered a model where many users are allowed to move concurrently, and derived a logarithmic
convergence rate for users following a centrally-moderated greedy algorithm. Most recently, Berenbrink
et al. [6] showed weaker convergence results for a specific distributed protocol. To summarize, previous
work studied the convergence time to pure Nash equilibria in situations with a centralized mechanism
or specific protocol. In contrast, we present fast convergence results for approximate Nash equilibria
in a non-centralized setting, and our only assumption about the player strategies is that they are all
regret-minimizing.
Subsequent work Since the initial publication of these results, a number of publications have studied
related questions. Blum et al. [7] and Roughgarden [33] explore the outcomes of regret-minimizing
behavior in a variety of classes of games; they are able to show Price of Anarchy style bounds on the
social cost, but do not prove convergence results. Kleinberg et al. [26] study agents in atomic congestion
games employing a particular class of regret-minimization algorithms and show that in many cases, the
additional assumptions on the player algorithms allow convergence to pure Nash equilibria. Even-Dar et
al. [14] demonstrate convergence of general regret-minimizing algorithms to Nash equilibria in a general
class of games they call “socially-concave” games; our games do not fit in their model, because their
work considers a finite number of players.
Awerbuch et al. [2] show that a certain type of best response dynamics converges quickly to approxi-
mate Nash equilibria in congestion games. General regret-minimizing dynamics are much more complex
than the dynamics they study, and, we believe, are better motivated from an individual’s perspective in
potentially rapidly-changing environments.
2 Preliminaries
2.1 Nonatomic congestion games
Let E be a finite ground set of elements (we refer to them as edges). There are k player types 1, 2, . . . , k,
and each player type i has an associated set of feasible paths Pi , where Pi is a set of subsets of E.
Elements of Pi are called paths or strategies. For example, player type i might correspond to players
who want to travel from node ui to node vi in some underlying graph G, and Pi might be the set of all
ui-vi paths. The continuum $A_i$ of agents of type $i$ is represented by the interval $[0, a_i]$, endowed with Lebesgue measure. We restrict $\sum_{i=1}^{k} a_i = 1$, so there is a total of one unit of flow. Each edge $e \in E$ has an associated traffic-dependent, non-negative, continuous, non-decreasing latency function $\ell_e$. A nonatomic congestion game is defined by $(E, \ell, \mathcal{P}, A)$.
A flow specifies a path for each player: $f_i : A_i \to Q_i$, where $Q_i$ is the set of 0/1 vectors in $\mathbb{R}^{|\mathcal{P}_i|}$ with exactly one 1. We write $f = (\int_{A_1} f_1, \ldots, \int_{A_k} f_k)$, where by $\int_{A_i} f_i$ we mean $(\int_{A_i} (f_i)_1, \int_{A_i} (f_i)_2, \ldots, \int_{A_i} (f_i)_{|\mathcal{P}_i|})$.
For a path P in Pi , we write fP = ( fi )P to represent the weight that flow f places on path P. Thus,
∑P∈Pi fP = ai for all i, and fP is the measure of the set of players selecting path P. Each flow induces
a unique flow on edges such that the flow fe on an edge e has the property fe = ∑P:e∈P fP . The latency
of a path P given a flow f is `P ( f ) = ∑e∈P `e ( fe ), i. e., the sum of the latencies of the edges in the path,
given that flow. The cost incurred by a player is simply the latency of the path she plays.
We define $|E| = m$, and use $n$ to denote an upper bound on the number of edges in any path. We will assume all edge latency functions have range $[0,1]$, so the latency of a path is always between 0 and $n$. We use $L$ to denote $n$ times the maximum slope of any latency function. Let $f^1, f^2, \ldots, f^T$ denote a series of flows from time 1 up to time $T$. We use $\hat{f}$ to denote the time-average flow, i.e., $\hat{f}_e = \frac{1}{T}\sum_{t=1}^{T} f_e^t$.
Remark 2.1. Network routing (Wardrop) games are a special case of nonatomic congestion games,
where there is an underlying graph G and players of type i have a start node ui and a destination node vi ,
and Pi is the set of all ui -vi paths.
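The induced edge flows and path latencies defined above can be computed directly from a path-flow assignment. The following sketch does so on a hypothetical instance (the two-path network and its latency functions are made up for illustration, not taken from the paper).

```python
def edge_flows(path_flow):
    """Induced edge flow: f_e = sum of f_P over all paths P containing e."""
    fe = {}
    for path, weight in path_flow.items():
        for e in path:
            fe[e] = fe.get(e, 0.0) + weight
    return fe

def path_latency(path, fe, latency):
    """l_P(f) = sum over edges e in P of l_e(f_e)."""
    return sum(latency[e](fe.get(e, 0.0)) for e in path)

# Hypothetical instance: one player type routing one unit of flow over
# two paths that share edge 'c'.
latency = {'a': lambda x: x, 'b': lambda x: 0.5 * x, 'c': lambda x: x}
flow = {('a', 'c'): 0.4, ('b', 'c'): 0.6}
fe = edge_flows(flow)   # {'a': 0.4, 'c': 1.0, 'b': 0.6}
```

Note that the shared edge 'c' carries the full unit of flow, so the latency of path ('a', 'c') is ℓ_a(0.4) + ℓ_c(1.0) = 1.4 under these (made-up) latency functions.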
2.2 Equilibria and social cost
A flow f is at Wardrop equilibrium if no user would prefer to reroute her traffic, given the existing flow.
This is essentially a continuous (infinitesimal, nonatomic) analogue of the Nash equilibrium.
Definition 2.2. A flow f on game (E, `, P, A) is at equilibrium if and only if for every player type i, and
paths P1 , P2 ∈ Pi with fP1 > 0, `P1 ( f ) ≤ `P2 ( f ).
It is useful to note that in this domain, the flows at equilibrium are those for which all flow-carrying
paths for a particular player type have the same latency, and this latency is minimal among all paths
for players of that type. In addition, given our assumption that all latency functions are continuous and
non-decreasing, one can prove the existence of equilibria:
Proposition 2.3 (Schmeidler [37], generalization of Beckman et al. [5]). Every nonatomic congestion
game with continuous, non-decreasing latency functions admits a flow at equilibrium.
We define the social cost of a flow to be the average cost incurred by the players:
Definition 2.4. Define the cost C( f ) of a flow f to be C( f ) = ∑e∈E `e ( fe ) fe .
In addition, for any nonatomic congestion game, there is a unique equilibrium cost:
Proposition 2.5 (Milchtaich [31], generalization of Beckman et al. [5]). Distinct equilibria for a non-
atomic congestion game have equal social cost.
2.3 Regret-minimizing algorithms
Definition 2.6. Consider a series of flows f 1 , f 2 , . . . , f T and a user who has experienced latencies
c1 , c2 , . . . , cT over these flows. The per-time-step regret of the user is the difference between her av-
erage latency and the latency of the best fixed path in hindsight for players of her type i, that is,
$$\frac{1}{T}\sum_{t=1}^{T} c^t \;-\; \min_{P \in \mathcal{P}_i} \frac{1}{T}\sum_{t=1}^{T}\sum_{e \in P} \ell_e(f_e^t)\,.$$
The concept which we refer to here as regret is known more generally as external regret, to distinguish it
from other regret notions that compare with a different set of reference outcomes. An online algorithm
for selecting paths at each time step t is regret-minimizing (has the regret-minimizing property) if, for
any sequence of flows, the expected per-time step regret (over internal randomness in the algorithm)
goes to 0 as T goes to infinity.
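The quantity in Definition 2.6 can be checked mechanically from a user's cost sequence and the observed edge loads. The sketch below is illustrative only; the helper name and the toy two-link instance used with it are assumptions, not from the paper.

```python
def per_step_regret(costs, edge_loads, paths, latency):
    """External regret of one user: her average experienced latency minus
    the average latency of the best fixed path in hindsight.

    costs      -- the user's experienced latencies c^1, ..., c^T
    edge_loads -- one dict {edge: f_e^t} per time step
    paths      -- the feasible paths P_i for the user's type
    latency    -- {edge: latency function}
    """
    T = len(costs)
    avg_cost = sum(costs) / T
    best_fixed = min(
        sum(sum(latency[e](load[e]) for e in P) for load in edge_loads) / T
        for P in paths
    )
    return avg_cost - best_fixed
```

For example, on two parallel links with identity latencies and loads alternating between (0.3, 0.7) and (0.7, 0.3), a user who rode the congested link on both days (cost 0.7 each day) has per-time-step regret 0.7 − 0.5 = 0.2, since either fixed link averages 0.5 in hindsight.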
Here and in the rest of this paper, excluding Section 7, we consider infinitesimal users using a finite
number of different algorithms; in this setting, we can get rid of the expectation. In particular, if each
user is running a regret-minimizing algorithm, then the average regret over users also approaches 0.
Thus, since all players have bounded per-timestep cost, applying the strong law of large numbers, we
can make the following assumption:
Assumption 2.7. The series of flows $f^1, f^2, \ldots$ satisfies
$$\frac{1}{T}\sum_{t=1}^{T}\sum_{e \in E} \ell_e(f_e^t)\, f_e^t \;\le\; R(T) \;+\; \sum_{i=1}^{k} a_i \min_{P \in \mathcal{P}_i} \frac{1}{T}\sum_{t=1}^{T}\sum_{e \in P} \ell_e(f_e^t)$$
where R(T ) → 0 as T → ∞. The function R(T ) may depend on the size of the network and its maximum
possible latency. We then define Tε as the value of T required for a particular algorithm to get R(T ) ≤ ε
(this is well defined for any algorithm with non-increasing expected regret).
For example, for the case of a routing game consisting of only two nodes and m parallel edges, exponential-weighting algorithms [28, 8, 20] give Tε = O((log m)/ε²). For general graphs, an alternative algorithm and analysis of Kalai and Vempala yields Tε = O(mn(log n)/ε²) [25]. For general graphs where an agent can observe only its path cost, results of Abernethy et al. [1] achieve Tε = O(m³/ε²).
2.4 Approaching Nash equilibria
We now need to specify in what sense flow will be approaching a Nash equilibrium. The first notion one
might consider is the L1 distance from some true equilibrium flow. However, if some edges have nearly-
flat latency functions, it is possible for a flow to have regret near 0 and yet still be far in L1 distance from
a true equilibrium flow. A second natural notion would be to say that the flow f has the property that no
user has cost much more than the cheapest path given f . However, notice that the regret-minimization
property allows users to occasionally take long paths, so long as they perform well on average (and in
fact algorithms for the bandit problem will have exploration steps that do just that [3, 29]). So, one
cannot expect that on any time step all users are taking cheap paths.
Instead, we require that most users be taking a nearly-cheapest path given f . Specifically,
Definition 2.8. A flow $f$ is at average-ε-Nash equilibrium if the average cost under this flow is within ε of the minimum-cost paths under this flow, i.e., $C(f) - \sum_{i=1}^{k} a_i \min_{P \in \mathcal{P}_i} \sum_{e \in P} \ell_e(f_e) \le \varepsilon$.
Note that Definition 2.8 implies that at most a √ε fraction of traffic can have more than a √ε incentive to deviate from their path, and as a result is very similar to the definition of (ε, δ)-Nash equilibria in [16] (similarly, an r < ε fraction of traffic could have an ε/r incentive to deviate). We are also able to show that one can apply Price of Anarchy results to average-ε-Nash flows; we discuss this in Section 6.
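The gap in Definition 2.8 is directly computable for a concrete flow. Below is an illustrative sketch (the function name and the parallel-link instance used with it are hypothetical, not from the paper).

```python
def avg_nash_gap(path_flow, types, latency):
    """C(f) - sum_i a_i * min_{P in P_i} l_P(f).  A flow f is at
    average-eps-Nash equilibrium iff this gap is at most eps."""
    fe = {}
    for path, weight in path_flow.items():
        for e in path:
            fe[e] = fe.get(e, 0.0) + weight
    cost = sum(latency[e](v) * v for e, v in fe.items())  # C(f)
    best = sum(
        a_i * min(sum(latency[e](fe.get(e, 0.0)) for e in P) for P in paths)
        for a_i, paths in types  # types: list of (a_i, feasible paths)
    )
    return cost - best
```

On two parallel links with ℓ(x) = x and a single player type, the even split is the equilibrium (gap 0), while an 0.8/0.2 split has cost 0.68 against a best-path latency of 0.2, i.e., gap 0.48.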
We will begin by focusing on the time-average flow fˆ, showing that for regret-minimizing algo-
rithms, this flow is approaching equilibrium. That is, for a given Tε we will give bounds on the number
of time steps before fˆ is average-ε-Nash. After analyzing fˆ, we then extend our analysis to show that
in fact for most time steps t, the flow f t itself is average-ε-Nash. To achieve bounds of this form, which
we show in Section 5, we will however need to lose an additional factor polynomial in the size of the
graph. Again, we cannot hope to say that f t is average-ε-Nash for all (sufficiently large) time-steps
t, because regret-minimizing algorithms may occasionally take long paths, and an “adversarial” set of
such algorithms may occasionally all take long paths at the same time.
2.5 Dependence on slope
Our convergence rates will depend on the maximum slope s allowed for any latency function. For
example, consider the case of a routing game with two parallel links, where one edge has latency 0 up
to a load of 1/3 and then rises immediately to 1, and the other edge has latency 0 up to a load of 2/3
and then rises directly to 1. In this case the equilibrium cost is 0, and moreover for any flow $f'$ we have $\min_{P \in \mathcal{P}} \sum_{e \in P} \ell_e(f'_e) = 0$. Thus, the only way $f'$ can be average-ε-Nash is for it to actually have low cost, which means the flow must be precisely at a 1/3-2/3 split. If players use regret-minimizing
algorithms, traffic will instead oscillate, each edge having cost 1 on about half the days and each player
incurring cost 1 on not much more than half the days (and thus not having much regret). However, none
of the daily flows will be better than average-(1/3)-Nash, because on each day, the cost of the flow f is
at least 1/3.
This example demonstrates that some dependence on the shape of the latency functions is necessary.
However, it is possible that one could obtain stronger results using a different parameter than the slope.
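The oscillation described above is easy to simulate. The best-response dynamics below is only an illustrative caricature (it is not a regret-minimizing algorithm), but it exhibits the claimed behavior: every daily flow away from the exact 1/3-2/3 split costs at least 1/3, while the split itself costs 0. Exact rational arithmetic is used to avoid rounding at the thresholds.

```python
from fractions import Fraction as F

def step_latency(threshold):
    """Latency 0 up to the threshold load, then 1 (unbounded slope)."""
    return lambda load: 0 if load <= threshold else 1

l1, l2 = step_latency(F(1, 3)), step_latency(F(2, 3))

def daily_cost(x):
    """Cost of the flow placing x on edge 1 and 1 - x on edge 2."""
    return l1(x) * x + l2(1 - x) * (1 - x)

# Caricature of the oscillation: all traffic chases whichever edge was
# cheaper yesterday, so the flow flips between the two extremes.
flows, x = [], F(1)
for _ in range(10):
    flows.append(x)
    x = F(0) if x > F(1, 3) else F(1)
```

Here every daily flow has cost 1 (each day one edge is loaded past its threshold), illustrating why no daily flow in the oscillation is better than average-(1/3)-Nash even though the equilibrium cost is 0.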
3 Infinitesimal users: Linear latency functions
We begin as a warm-up with the easiest case, infinitesimal users and linear latency functions, which
simplifies many of the arguments. In particular, for linear latency functions, the latency of any edge
under the time-average flow $\hat{f}$ is guaranteed to be equal to the average latency of that edge over time, i.e., $\ell_e(\hat{f}_e) = \frac{1}{T}\sum_{t=1}^{T} \ell_e(f_e^t)$ for all $e$.
Theorem 3.1. Suppose the latency functions are linear. Then for flows satisfying the regret-minimizing
assumption (2.7), for $T \ge T_\varepsilon$, the time-average flow $\hat{f}$ is average-ε-Nash, i.e.,
$$C(\hat{f}) \;\le\; \varepsilon + \sum_i a_i \min_{P \in \mathcal{P}_i} \sum_{e \in P} \ell_e(\hat{f}_e)\,.$$
Proof. From the linearity of the latency functions, we have for all $e$, $\ell_e(\hat{f}_e) = \frac{1}{T}\sum_{t=1}^{T} \ell_e(f_e^t)$. Since $\ell_e(f_e^t)\, f_e^t$ is a convex function of the flow, this implies
$$\ell_e(\hat{f}_e)\, \hat{f}_e \;\le\; \frac{1}{T}\sum_{t=1}^{T} \ell_e(f_e^t)\, f_e^t\,.$$
Summing over all e, we have
\begin{align*}
C(\hat{f}) &\le \frac{1}{T}\sum_{t=1}^{T} C(f^t)\\
&\le \varepsilon + \sum_i a_i \min_{P \in \mathcal{P}_i} \frac{1}{T}\sum_{t=1}^{T}\sum_{e \in P} \ell_e(f_e^t) && \text{(by Assumption 2.7)}\\
&= \varepsilon + \sum_i a_i \min_{P \in \mathcal{P}_i} \sum_{e \in P} \ell_e(\hat{f}_e) && \text{(by linearity).}
\end{align*}
Corollary 3.2. Assume that all latency functions are linear. In network routing games, if all agents use, for example, the algorithm of Kalai and Vempala [25], the time-averaged flow converges to an average-ε-Nash equilibrium at Tε = O(mn(log n)/ε²). On networks consisting of two nodes and m parallel links, if all agents use optimized regret-minimizing algorithms (in particular, "combining expert advice"-style algorithms with each edge an expert [28, 8, 20]), the time-averaged flow converges to an average-ε-Nash equilibrium at Tε = O((log m)/ε²).
Note that we not only proved that the average flow approaches an average-ε-Nash equilibrium, but
by connecting the first and third steps of the proof, we also showed that the actual average cost incurred
by a user of type i is at most ε worse than the best path in Pi in the average flow.
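The two facts driving the proof of Theorem 3.1, that a linear latency function commutes with time-averaging and that x · ℓ_e(x) is convex, can be checked numerically. The latency coefficients and load sequence below are made up for illustration.

```python
def linear_latency(a, b):
    """l(x) = a * x + b with a, b >= 0 (non-negative, non-decreasing)."""
    return lambda x: a * x + b

l = linear_latency(2.0, 0.1)            # made-up coefficients
loads = [0.2, 0.8, 0.5, 0.1]            # hypothetical f_e^t over T = 4 steps
T = len(loads)
f_hat = sum(loads) / T                  # time-average load on this edge

# Linearity: the latency of the average load equals the average latency.
avg_latency = sum(l(x) for x in loads) / T

# Convexity of x * l(x): the time-average flow costs no more on this
# edge than the average of the daily edge costs.
avg_cost = sum(l(x) * x for x in loads) / T
```

With these numbers, l(f_hat) and avg_latency coincide exactly (both 0.9), while the cost of the average flow (0.36) sits below the average daily cost (0.51), as the convexity step requires.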
4 Infinitesimal users: General latency functions
The case of general latency functions is more complicated because the proof above used a convexity
assumption in its first step and a linearity (concavity) assumption in the third step; the same argument
does not apply to general latency functions. In the case of general functions, the additive term in our
bounds depends on the maximum slope of any latency function.
Theorem 4.1. Let $\varepsilon' = \varepsilon + 2\sqrt{\varepsilon L}$. (Recall, $L = sn$.) Then for flows satisfying the regret-minimizing assumption (2.7), for general functions with maximum slope $s$, for $T \ge T_\varepsilon$, the time-average flow $\hat{f}$ is average-$\varepsilon'$-Nash, that is,
$$\sum_{e \in E} \ell_e(\hat{f}_e)\, \hat{f}_e \;\le\; \varepsilon + 2\sqrt{\varepsilon L} + \sum_i a_i \min_{P \in \mathcal{P}_i} \sum_{e \in P} \ell_e(\hat{f}_e)\,.$$
Before giving the proof, we list several quantities we will need to relate:
$$\sum_{e \in E} \ell_e(\hat{f}_e)\, \hat{f}_e \qquad \text{(cost of } \hat{f}\text{)} \tag{4.1}$$
$$\frac{1}{T}\sum_{t=1}^{T}\sum_{e \in E} \ell_e(f_e^t)\, \hat{f}_e \qquad \text{(``cost of } \hat{f} \text{ in hindsight'')} \tag{4.2}$$
$$\frac{1}{T}\sum_{t=1}^{T}\sum_{e \in E} \ell_e(f_e^t)\, f_e^t \qquad \text{(average cost of flows up to time } T\text{)} \tag{4.3}$$
$$\sum_i a_i \min_{P \in \mathcal{P}_i} \frac{1}{T}\sum_{t=1}^{T}\sum_{e \in P} \ell_e(f_e^t) \qquad \text{(cost of best path in hindsight)} \tag{4.4}$$
$$\sum_i a_i \min_{P \in \mathcal{P}_i} \sum_{e \in P} \ell_e(\hat{f}_e) \qquad \text{(cost of best path given } \hat{f}\text{)} \tag{4.5}$$
Our goal in proving Theorem 4.1 is to show that quantity (4.1) is not too much greater than quantity
(4.5). We will prove this through a chain of inequalities in the following lemmas.
Lemma 4.2. For general latency functions,
$$\frac{1}{T}\sum_{t=1}^{T}\sum_{e \in E} \ell_e(f_e^t)\, \hat{f}_e \;\le\; \frac{1}{T}\sum_{t=1}^{T}\sum_{e \in E} \ell_e(f_e^t)\, f_e^t\,.$$
Proof. This is a direct application of Chebyshev’s sum inequality.
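A quick numeric sanity check of this Chebyshev step: per edge, since $\hat{f}_e$ is the mean of the $f_e^t$, the claim is that the time-average of ℓ(x_t) · x_t dominates the product of the separate averages whenever ℓ is non-decreasing. The load sequences and latency functions below are made up for illustration.

```python
def chebyshev_gap(loads, latency):
    """mean(l(x) * x) - mean(l(x)) * mean(x) over the load sequence;
    Chebyshev's sum inequality makes this nonnegative whenever the
    latency function is non-decreasing."""
    T = len(loads)
    mean_lx = sum(latency(x) * x for x in loads) / T
    mean_l = sum(latency(x) for x in loads) / T
    mean_x = sum(loads) / T
    return mean_lx - mean_l * mean_x
```

The gap is zero exactly when the loads (or latencies) are constant over time; Lemma 4.3 below quantifies this, showing that under the regret-minimizing assumption the total gap across edges must be small.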
Lemma 4.3. For general latency functions with maximum slope $s$, for flows satisfying the regret-minimizing assumption (2.7),
$$\min_{P \in \mathcal{P}_i} \frac{1}{T}\sum_{t=1}^{T}\sum_{e \in P} \ell_e(f_e^t) \;\le\; \min_{P \in \mathcal{P}_i} \sum_{e \in P} \ell_e(\hat{f}_e) + \sqrt{\varepsilon L}\,.$$
Proof. Since our latency functions are non-decreasing, we can apply Chebyshev's sum inequality to obtain, for every $e$,
$$\hat{f}_e\, \frac{1}{T}\sum_{t=1}^{T} \ell_e(f_e^t) \;\le\; \frac{1}{T}\sum_{t=1}^{T} \ell_e(f_e^t)\, f_e^t\,.$$
Now, notice that the right-hand side of the above inequality, summed over all edges, is precisely quantity (4.3). By the regret-minimization property, this is at most ε larger than the time-average cost of the best paths in hindsight, which in turn is clearly at most the time-average cost of $\hat{f}$. Therefore, we have:
\begin{align*}
\frac{1}{T}\sum_{t=1}^{T}\sum_{e \in E} \hat{f}_e\, \ell_e(f_e^t) &\le \frac{1}{T}\sum_{t=1}^{T}\sum_{e \in E} \ell_e(f_e^t)\, f_e^t\\
&\le \varepsilon + \sum_i a_i \min_{P \in \mathcal{P}_i} \frac{1}{T}\sum_{t=1}^{T}\sum_{e \in P} \ell_e(f_e^t)\\
&\le \varepsilon + \frac{1}{T}\sum_{t=1}^{T}\sum_{e \in E} \hat{f}_e\, \ell_e(f_e^t)\,.
\end{align*}
That is, we have "sandwiched" the flow-average latency between the time-average latency and the time-average latency plus ε. Define
$$\varepsilon_e \;=\; \frac{1}{T}\sum_{t=1}^{T} \ell_e(f_e^t)\, f_e^t \;-\; \hat{f}_e\, \frac{1}{T}\sum_{t=1}^{T} \ell_e(f_e^t)\,.$$
The sandwich above implies that
$$\sum_{e \in E} \varepsilon_e \;\le\; \varepsilon\,,$$
i.e., every edge's time-average cost must be close to its flow-average cost. We now use this fact, together with the assumption of bounded slope, to show that edge latencies cannot be varying wildly over time.
We can then use the fact that $\hat{f}_e = \frac{1}{T}\sum_{t=1}^{T} f_e^t$, and so $\frac{1}{T}\sum_{t=1}^{T} \ell_e(\hat{f}_e)(f_e^t - \hat{f}_e) = 0$, to rewrite this definition of $\varepsilon_e$ as
$$\varepsilon_e \;=\; \frac{1}{T}\sum_{t=1}^{T} \bigl(\ell_e(f_e^t) - \ell_e(\hat{f}_e)\bigr)\bigl(f_e^t - \hat{f}_e\bigr) \;\ge\; 0\,. \tag{4.6}$$
From the bound on the maximum slope of any latency function, we know that $|\ell_e(f_e^t) - \ell_e(\hat{f}_e)| \le s\,|f_e^t - \hat{f}_e|$ and thus, for all $e$,
$$\bigl|\ell_e(f_e^t) - \ell_e(\hat{f}_e)\bigr| \;=\; \sqrt{\bigl(\ell_e(f_e^t) - \ell_e(\hat{f}_e)\bigr)^2} \;\le\; \sqrt{s\,\bigl(\ell_e(f_e^t) - \ell_e(\hat{f}_e)\bigr)\bigl(f_e^t - \hat{f}_e\bigr)}\,.$$
We then get
$$\frac{1}{T}\sum_{t=1}^{T} \bigl|\ell_e(f_e^t) - \ell_e(\hat{f}_e)\bigr| \;\le\; \frac{\sqrt{s}}{T}\sum_{t=1}^{T} \sqrt{\bigl(\ell_e(f_e^t) - \ell_e(\hat{f}_e)\bigr)\bigl(f_e^t - \hat{f}_e\bigr)}\,.$$
Using equation (4.6) above and the fact that the square root is a concave function, an application of Jensen's inequality yields
$$\frac{1}{T}\sum_{t=1}^{T} \bigl|\ell_e(f_e^t) - \ell_e(\hat{f}_e)\bigr| \;\le\; \sqrt{s\,\varepsilon_e}\,. \tag{4.7}$$
Finally, let $P_i^*$ be the best path of type $i$ given $\hat{f}$. Summing equation (4.7) over the edges in $P_i^*$, and using an application of the Cauchy-Schwarz inequality to get $\sum_{e \in P_i^*} \sqrt{s\,\varepsilon_e} \le \sqrt{\varepsilon L}$, we have
$$\min_{P \in \mathcal{P}_i} \frac{1}{T}\sum_{t=1}^{T}\sum_{e \in P} \ell_e(f_e^t) \;\le\; \frac{1}{T}\sum_{t=1}^{T}\sum_{e \in P_i^*} \ell_e(f_e^t) \;\le\; \min_{P \in \mathcal{P}_i} \sum_{e \in P} \ell_e(\hat{f}_e) + \sqrt{\varepsilon L}\,,$$
as desired.
Lemma 4.4. For general latency functions with maximum slope $s$, for flows satisfying the regret-minimizing assumption (2.7),
$$\sum_{e \in E} \ell_e(\hat{f}_e)\, \hat{f}_e \;\le\; \frac{1}{T}\sum_{t=1}^{T}\sum_{e \in E} \ell_e(f_e^t)\, \hat{f}_e + \sqrt{\varepsilon L}\,.$$
Proof. Equation (4.7) above directly gives us
$$(4.1) \;\le\; \sum_{e \in E} \sqrt{s\,\varepsilon_e}\, \hat{f}_e + (4.2)\,.$$
An application of the Cauchy-Schwarz inequality then gives us
$$\Bigl(\sum_{e \in E} \sqrt{s\,\varepsilon_e}\, \hat{f}_e\Bigr)^2 \;\le\; \Bigl(\sum_{e \in E} s\,\varepsilon_e\Bigr)\Bigl(\sum_{e \in E} \hat{f}_e^2\Bigr)\,.$$
Since $\hat{f}_e \le 1$ for all $e$, this is at most $(\sum_{e \in E} s\,\varepsilon_e)\sum_{e \in E} \hat{f}_e$. Since $\sum_{e \in E} \hat{f}_e \le n$, this is at most $\varepsilon L$, and thus
$$\sum_{e \in E} \sqrt{s\,\varepsilon_e}\, \hat{f}_e \;\le\; \sqrt{\varepsilon L}\,,$$
which gives the desired result.
Given the above lemmas we now present the proof of Theorem 4.1.
Proof of Theorem 4.1. We can now piece together the following chain of inequalities:
$$\begin{aligned}
(4.1) &\le \sqrt{s\,\varepsilon\,n} + (4.2)\\
&\le \sqrt{\varepsilon L} + (4.3)\\
&\le \varepsilon + \sqrt{\varepsilon L} + (4.4)\\
&\le \varepsilon + 2\sqrt{\varepsilon L} + (4.5)\,.
\end{aligned}$$
The first line is from Lemma 4.4 and the second from Lemma 4.2. The third line is a consequence of the
regret-minimization property. Finally, the fourth line follows from Lemma 4.3.
Corollary 4.5. Let $\varepsilon' = \varepsilon + 2\sqrt{\varepsilon L}$. Assume that all latency functions have maximum slope s. In general
routing games, if all agents use, for example, the algorithm of Kalai and Vempala [25], the average flow
converges to an average-ε′-Nash equilibrium at
$$T_{\varepsilon} = O\!\left(\frac{mn\log n}{\varepsilon^{2}}\right) = O\!\left(\frac{mn^{3}s^{2}\log n}{\varepsilon'^{\,4}}\right).$$
On networks consisting of two nodes and m parallel links, if all agents use optimized regret-minimizing
algorithms (in particular "combining expert advice"-style algorithms [28, 8, 20]), the average flow
converges to an average-ε′-Nash equilibrium at
$$T_{\varepsilon} = O\!\left(\frac{\log m}{\varepsilon^{2}}\right) = O\!\left(\frac{n^{2}s^{2}\log m}{\varepsilon'^{\,4}}\right).$$
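As an illustration (ours, not from the paper), this convergence can be watched numerically. The sketch below models the entire infinitesimal population as a single flow vector updated by a multiplicative-weights rule, on a hypothetical two-link instance with latencies $\ell_1(x)=x$ and $\ell_2(x)=0.5+x$, whose Wardrop equilibrium is $(0.75,\,0.25)$; the time-average flow approaches that equilibrium.

```python
import math

def latency(e, x):
    # Hypothetical instance: two parallel links, l1(x) = x, l2(x) = 0.5 + x.
    # Latencies equalize at f = (0.75, 0.25), the Wardrop equilibrium.
    return x if e == 0 else 0.5 + x

T, eta = 2000, 0.05
w = [1.0, 1.0]            # multiplicative weights, one per link
avg = [0.0, 0.0]          # time-average flow
for t in range(T):
    total = w[0] + w[1]
    p = [w[0] / total, w[1] / total]          # today's flow split
    costs = [latency(e, p[e]) for e in range(2)]
    for e in range(2):
        avg[e] += p[e] / T
        w[e] *= math.exp(-eta * costs[e])     # Hedge update on today's latencies

print(avg)   # time-average flow, close to [0.75, 0.25]
```

This is only a sketch: it treats the whole population as one learner, whereas the theorem allows each infinitesimal agent its own regret-minimizing algorithm.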
Once again we remark that not only have we proved that the average flow approaches an average-ε′-Nash equilibrium, but as an intermediate step in our proof we showed that the actual average cost obtained
by the users is at most ε′ worse than the best path in the average flow.
5 Infinitesimal users: Bounds on most timesteps
Here we present results applicable to general graphs and general latency functions, showing that on most time
steps t, the flow $f^t$ will be at average-ε-Nash equilibrium.
Theorem 5.1. In routing games with general latency functions with maximum slope s, for flows satisfying the regret-minimizing assumption (2.7), for all but a $ms^{1/4}\varepsilon^{1/4}$ fraction of time steps up to time T
for $T \ge T_\varepsilon$, $f^t$ is an average-$(\varepsilon + 2\sqrt{\varepsilon L} + 2m^{3/4}s^{1/4}\varepsilon^{1/4})$-Nash flow. We can rewrite this as: for all but an
ε′ fraction of time steps up to T for $T \ge T_\varepsilon$, $f^t$ is an average-ε′-Nash flow for $\varepsilon = \Omega\bigl(\varepsilon'^{\,4}/(sm^{4} + s^{2}n^{2})\bigr)$.
Proof. Based on equation (4.6),
$$\sqrt{s\,\varepsilon_e} \;\ge\; \frac{1}{T}\sum_{t=1}^{T} \bigl|\ell_e(f_e^t) - \ell_e(\hat f_e)\bigr|$$
for all edges. Thus, for all edges, for all but an $s^{1/4}\varepsilon_e^{1/4}$ fraction of the time steps,
$$s^{1/4}\varepsilon_e^{1/4} \;\ge\; \bigl|\ell_e(f_e^t) - \ell_e(\hat f_e)\bigr|\,.$$
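The per-time-step claim is Markov's inequality applied to the nonnegative quantity $|\ell_e(f_e^t)-\ell_e(\hat f_e)|$ over a uniformly random time step (our annotation):

```latex
\frac{\bigl|\{\,t \le T : |\ell_e(f_e^t)-\ell_e(\hat f_e)| > s^{1/4}\varepsilon_e^{1/4}\,\}\bigr|}{T}
\;\le\; \frac{(1/T)\sum_{t=1}^{T}\bigl|\ell_e(f_e^t)-\ell_e(\hat f_e)\bigr|}{s^{1/4}\varepsilon_e^{1/4}}
\;\le\; \frac{\sqrt{s\,\varepsilon_e}}{s^{1/4}\varepsilon_e^{1/4}}
\;=\; s^{1/4}\varepsilon_e^{1/4}\,.
```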
Using a union bound over edges, this implies that on all but a $ms^{1/4}\varepsilon^{1/4}$ fraction of the time steps,
all edges have
$$s^{1/4}\varepsilon_e^{1/4} \;\ge\; \bigl|\ell_e(f_e^t) - \ell_e(\hat f_e)\bigr|\,.$$
From this, it follows directly that on most time steps, the cost of the best path given $f^t$ differs from the
cost of the best path given $\hat f$ by at most $n^{3/4}s^{1/4}\varepsilon^{1/4}$. Also on most time steps, the cost incurred by
flow $f^t$ differs from the cost incurred by flow $\hat f$ by at most $m^{3/4}s^{1/4}\varepsilon^{1/4}$. Thus since $\hat f$ is an average-$(\varepsilon + 2\sqrt{\varepsilon L})$-Nash equilibrium, $f^t$ is an average-$(\varepsilon + 2\sqrt{\varepsilon L} + 2m^{3/4}s^{1/4}\varepsilon^{1/4})$-Nash equilibrium on all but
a $ms^{1/4}\varepsilon^{1/4}$ fraction of time steps.
Corollary 5.2. In general routing games with general latency functions with maximum slope s, for all
but a $ms^{1/4}\varepsilon^{1/4}$ fraction of time steps up to time $T = T_\varepsilon$, the expected average cost $(1/T)\sum_{t=1}^{T} c^t$
incurred by any user is at most $\varepsilon + 2\sqrt{\varepsilon L} + m^{3/4}s^{1/4}\varepsilon^{1/4}$ worse than the cost of the best path on that
time step.
Proof. From the proof of Theorem 5.1 we see that on most days, the cost of the best path given the flow
for that day is within $m^{3/4}s^{1/4}\varepsilon^{1/4}$ of the cost of the best path given $\hat f$, which is at most $2\sqrt{\varepsilon L}$ worse than
the cost of the best path in hindsight. Combining this with the regret-minimization property achieved by
each user gives the desired result.
This demonstrates that regret-minimizing algorithms are a reasonable, stable response in a network
setting: if a player knows that all other players are using regret-minimizing algorithms, there is no
strategy that will significantly improve her expected cost on more than a small fraction of days. By
using a regret-minimizing algorithm, she gets the guarantee that on most time steps her expected cost is
within some epsilon of the cost of the best path given the flow for that day.
6 Regret minimization and the price of anarchy
In this section, we relate the costs incurred by regret-minimizing players in a congestion game to the cost
of the social optimum. We approach this problem in two ways: First, we give an argument paralleling
that of Roughgarden and Tardos [35] that directly relates the costs of regret-minimizing users to the cost
of the social optimum. In our second result in this section, we show that any average-ε-Nash equilibrium
in a congestion game is closely related to a true equilibrium in a related congestion game. This is an
interesting property of approximate equilibria in its own right, and it further allows us to apply Price of
Anarchy results for the congestion game to the regret-minimizing players in the original game.
We can directly characterize the costs incurred by regret-minimizing players by an argument that
parallels the Price of Anarchy proofs of Roughgarden and Tardos [35].
Definition 6.1. Let $\mathcal{L}$ be the set of cost functions used by a nonatomic congestion game, with all $\ell(\xi)\xi$
convex on $[0,\infty)$. For a nonzero cost function $\ell \in \mathcal{L}$, we define $\alpha(\ell)$ by
$$\alpha(\ell) \;=\; \sup_{n>0:\,\ell(n)>0} \bigl[\lambda\mu + (1-\lambda)\bigr]^{-1},$$
where $\ell^*(\xi) = \ell(\xi) + \xi\cdot\ell'(\xi)$ (the marginal social cost), $\lambda \in [0,1]$ satisfies $\ell^*(\lambda n) = \ell(n)$, and $\mu =
\ell(\lambda n)/\ell(n) \in [0,1]$. We define $\alpha(\mathcal{L})$ by
$$\alpha(\mathcal{L}) \;=\; \sup_{0\ne\ell\in\mathcal{L}} \alpha(\ell)\,.$$
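As a concrete check of this definition (our worked example), take the linear cost function $\ell(\xi) = \xi$:

```latex
\ell^*(\xi) = \ell(\xi) + \xi\,\ell'(\xi) = 2\xi, \qquad
\ell^*(\lambda n) = \ell(n) \;\Longrightarrow\; 2\lambda n = n \;\Longrightarrow\; \lambda = \tfrac{1}{2},
\qquad
\mu = \frac{\ell(n/2)}{\ell(n)} = \tfrac{1}{2},
```

so $\lambda\mu + (1-\lambda) = \tfrac14 + \tfrac12 = \tfrac34$ for every $n > 0$, giving $\alpha(\ell) = 4/3$. This matches the 4/3 Price of Anarchy bound for linear latencies invoked later in this section.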
Theorem 6.2. If Γ is a nonatomic congestion game with cost functions $\mathcal{L}$, with all $\ell(\xi)\xi$ convex on
$[0,\infty)$, then once the players satisfy the regret-minimizing assumption, the ratio of the cost incurred by the
regret-minimizing players to the cost of the globally optimal flow is at most $\alpha(\mathcal{L})$ (which is the Price of
Anarchy bound given by Roughgarden and Tardos [35]).
Proof. Let $f^*$ be an optimal action distribution and $f^1, \dots, f^T$ be a sequence of action distributions
obtained by regret-minimizing players. We can lower bound the optimum social cost using a linear
approximation of the function $\ell_e(\xi)\xi$ at the point $\lambda_e^t f_e^t$, where $\lambda_e^t \in [0,1]$ solves $\ell_e^*(\lambda_e^t f_e^t) = \ell_e(f_e^t)$:
$$\begin{aligned}
\ell_e(f_e^*)\,f_e^* &= \ell_e(\lambda_e^t f_e^t)\,\lambda_e^t f_e^t + \int_{\lambda_e^t f_e^t}^{f_e^*} \ell_e^*(x)\,dx\\
&\ge \ell_e(\lambda_e^t f_e^t)\,\lambda_e^t f_e^t + (f_e^* - \lambda_e^t f_e^t)\,\ell_e^*(\lambda_e^t f_e^t)\\
&= \ell_e(\lambda_e^t f_e^t)\,\lambda_e^t f_e^t + (f_e^* - \lambda_e^t f_e^t)\,\ell_e(f_e^t)
\end{aligned}$$
for all edges and time steps, and thus
$$C(f^*) \;\ge\; \frac{1}{T}\sum_{t=1}^{T}\sum_{e\in E} \bigl[\ell_e(\lambda_e^t f_e^t)\,\lambda_e^t f_e^t + (f_e^* - \lambda_e^t f_e^t)\,\ell_e(f_e^t)\bigr]\,.$$
We can rewrite this as
$$C(f^*) \;\ge\; \frac{1}{T}\sum_{t=1}^{T}\sum_{e\in E} \bigl[\mu_e^t\lambda_e^t f_e^t + (1-\lambda_e^t)f_e^t\bigr]\ell_e(f_e^t) \;+\; \frac{1}{T}\sum_{t=1}^{T}\sum_{e\in E} \bigl[f_e^* - f_e^t\bigr]\ell_e(f_e^t)\,,$$
where $\mu_e^t = \ell_e(\lambda_e^t f_e^t)/\ell_e(f_e^t)$; indeed, substituting $\ell_e(\lambda_e^t f_e^t) = \mu_e^t\,\ell_e(f_e^t)$ splits each term of the previous bound as
$$\ell_e(\lambda_e^t f_e^t)\,\lambda_e^t f_e^t + (f_e^* - \lambda_e^t f_e^t)\,\ell_e(f_e^t) \;=\; \bigl[\mu_e^t\lambda_e^t f_e^t + (1-\lambda_e^t)f_e^t\bigr]\ell_e(f_e^t) + (f_e^* - f_e^t)\,\ell_e(f_e^t)\,.$$
By the regret minimization property,
$$\frac{1}{T}\sum_{t=1}^{T}\sum_{e\in E} f_e^t\,\ell_e(f_e^t) \;\le\; \varepsilon + \sum_i a_i \min_{P\in\mathcal{P}_i} \frac{1}{T}\sum_{t=1}^{T}\sum_{e\in P} \ell_e(f_e^t)$$
and thus
$$\frac{1}{T}\sum_{t=1}^{T}\sum_{e\in E} f_e^t\,\ell_e(f_e^t) \;\le\; \varepsilon + \frac{1}{T}\sum_{t=1}^{T}\sum_{e\in E} f_e^*\,\ell_e(f_e^t)\,,$$
which gives us
$$C(f^*) + \varepsilon \;\ge\; \frac{1}{T}\sum_{t=1}^{T}\sum_{e\in E} \bigl[\mu_e^t\lambda_e^t f_e^t + (1-\lambda_e^t)f_e^t\bigr]\ell_e(f_e^t)\,.$$
By definition, $\mu_e^t\lambda_e^t + (1-\lambda_e^t) \ge 1/\alpha(\mathcal{L})$ for each e and t, so $\bigl[\mu_e^t\lambda_e^t f_e^t + (1-\lambda_e^t)f_e^t\bigr]\ell_e(f_e^t)$ and $\ell_e(f_e^t)\,f_e^t$
differ by at most a multiplicative $\alpha(\mathcal{L})$ factor for every e and t. This gives us
$$C(f^*) + \varepsilon \;\ge\; \frac{1}{\alpha(\mathcal{L})}\cdot\frac{1}{T}\sum_{t=1}^{T}\sum_{e\in E} \ell_e(f_e^t)\,f_e^t\,,$$
and the right-hand side is exactly the average cost incurred by the players divided by $\alpha(\mathcal{L})$,
as desired.
Subsequent work of Christodoulou et al. [9] also studies the Price of Anarchy of approximate equi-
libria in congestion games. Their definition of approximate equilibrium is stronger than ours (though
with a multiplicative rather than additive definition of the players’ incentive to deviate), but their results
apply only to linear congestion games.
We can also analyze the performance of regret-minimizing players indirectly, by gaining a better understanding of flows at average-ε-Nash equilibrium. In the proof of this theorem, we need to consider
costs of different flows on a few different congestion games; we will use the notation C( f on Γ) to
denote the cost C( f ) of a flow (or partial, non-unit-sized flow) f in the congestion game Γ.
Theorem 6.3. If f is an average-ε-Nash equilibrium flow for a nonatomic congestion game Γ, then
$$C(f \text{ on } \Gamma) \;\le\; \rho\,C(\mathrm{OPT}) + (L+1)\sqrt{\varepsilon} + n\sqrt{\varepsilon}\,,$$
where L = sn, OPT is the minimum cost flow, and ρ is the Price of Anarchy in a related congestion game
with the same class of latency functions as Γ but with additive offsets.
For example, Theorem 6.3 implies that for linear latency functions of slope less than or equal to one,
an average-ε-Nash flow f will have cost at most $(4/3)\bigl(C(\mathrm{OPT}) + \sqrt{\varepsilon}\,(n+1)\bigr) + n\sqrt{\varepsilon}$. Note that for regret-minimizing players, Theorem 6.2 improves this to $(4/3)\,C(\mathrm{OPT}) + \varepsilon$.
The proof idea for this theorem is as follows: For every nonatomic congestion game Γ and flow f at
average-ε-Nash equilibrium on Γ, there exists a nonatomic congestion game Γ0 that approximates Γ and
a flow f 0 that approximates f such that: (a) f 0 is an equilibrium flow on Γ0 , (b) the cost of f 0 on Γ0 is
close to the cost of f on Γ, and (c) the cost of the optimal flow on Γ0 is close to the cost of the optimal
flow on Γ. These approximations allow one to apply Price of Anarchy results from f 0 and Γ0 to f and Γ.
√
Proof. Note that since f is at average-ε-Nash
√ equilibrium on Γ, then at most a ε fraction of users are
experiencing costs more than ε worse than the cost of their best path given f . We can modify Γ to
Γ2 to embed the costs associated with these “meandering” users such that the costs experienced by the
remaining users do not change. For example, a latency function `(x) associated with an edge with a flow
of δ meandering users can be modified to√the function `0 (x) = `(x + δ ) for x ≤ 1 − δ and `0 (x) = `(1) for
x > 1 − δ . Call 0
√ the remaining η ≥ (1 − ε) flow of non-meandering users f , all of whom experience
√
costs at most ε worse than the cost of their best√path given f . Then C( f on Γ) ≤ C( f 0 on Γ2 ) + εn,
since the flow of the meandering users is at most ε and the cost of each path is at most n.
We now construct an alternate congestion game Γ0 (not necessarily a routing game, even if the
original game was a routing game) such that f 0 interpreted on Γ0 is at equilibrium. To do this, we create
a new edge for every allowable path for each commodity. We can now assign costs to these new “entry
edges” to cause the minimum cost of any available path for each commodity to be equal to the cost of
the worst flow-carrying path for that commodity in f′ on Γ₂. The maximum cost we need to assign to
any entry edge in order to achieve this is $\sqrt{\varepsilon}$, since we already removed all users paying more than $\sqrt{\varepsilon}$
plus the cost of the best path available to them. Thus $C(f' \text{ on } \Gamma_2) \le C(f' \text{ interpreted on } \Gamma')$, so we have
$$C(f \text{ on } \Gamma) \;\le\; C(f' \text{ interpreted on } \Gamma') + n\sqrt{\varepsilon}\,.$$
Define ρ to be the Price of Anarchy of the new congestion game Γ′ when played with up to one unit
of flow. Thus, defining $\mathrm{OPT}_\alpha(H)$ to be the min-cost flow of size α in game H, we have
$$C(f \text{ on } \Gamma) \;\le\; \rho\,C(\mathrm{OPT}_\eta(\Gamma')) + n\sqrt{\varepsilon}\,.$$
Since we added at most $\sqrt{\varepsilon}$ to the cost of any solution in going from Γ₂ to Γ′ and $\mathrm{OPT}_\eta(\Gamma_2)$ is the
min-cost flow of size η on Γ₂,
$$C(f \text{ on } \Gamma) \;\le\; \rho\,C(\mathrm{OPT}_\eta(\Gamma_2)) + \sqrt{\varepsilon} + n\sqrt{\varepsilon}\,.$$
We now must quantify the amount by which the cost of $\mathrm{OPT}_\eta$ on Γ₂ could exceed the cost of $\mathrm{OPT}_1$
on Γ. Since the cost of any edge in Γ₂ is at most $s\sqrt{\varepsilon}$ more than the cost of that edge in Γ, this gives
$$C(f \text{ on } \Gamma) \;\le\; \rho\,C(\mathrm{OPT}) + (sn+1)\sqrt{\varepsilon} + n\sqrt{\varepsilon}\,.$$
In particular, when all latency functions are linear, we can apply results of Roughgarden and Tardos
bounding the price of anarchy in a congestion game with linear latency functions by 4/3 [35].
7 Discrete users: Parallel paths
In contrast with the previous sections, we now consider discrete users, where we denote the ith user
weight as wi . Without loss of generality, we assume that the weights are normalized such that ∑ni=1 wi =
1. We limit ourselves in this section to the single-commodity version of the parallel paths routing game
model and to functions with latency equal to the load, that is, for a path e we have `e = fe . For each user
i, we let the latency excluding her own path e at time t be `e ( fet \ i) and her average latency on path e be
`e ( fˆe \ i) = (1/T ) ∑t=1
T
`e ( fet \ i), where fet \ i = fet if user i is not routing on path e and fet \ i = fet − wi
otherwise. We always exclude the ith player from the latency function, since the ith player always pays
for its weight.
Next we observe that at time t, there always exists a path with load at most the average load.
Observation 7.1. At any time step t, for every user i, there exists a path e such that
$$\ell_e(\hat f_e \setminus i) \;\le\; \frac{1-w_i}{m}\,.$$
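This follows by averaging (a one-line argument we spell out): since latency equals load and the total load excluding user i is $1 - w_i$ at every time step,

```latex
\min_{e}\,\ell_e(\hat f_e \setminus i)
\;\le\; \frac{1}{m}\sum_{e=1}^{m}\ell_e(\hat f_e \setminus i)
\;=\; \frac{1}{m}\sum_{e=1}^{m}\frac{1}{T}\sum_{t=1}^{T}\bigl(f_e^t \setminus i\bigr)
\;=\; \frac{1-w_i}{m}\,,
```

so the minimizing path satisfies the stated bound.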
The following theorem differs from other theorems in the paper in the sense that it is an expectation
result and holds for every user.
Theorem 7.2. Consider the parallel paths model, with latency functions such that the latency equals
the load. Assume that each discrete user i uses an optimized "combining expert advice"-style algorithm
with each edge an expert [28, 8, 20]. Then for all users, for all $T \ge O((\log m)/\varepsilon^2)$,
$$\frac{1}{T}\sum_{t=1}^{T} \mathbb{E}_{e\sim q^t}\bigl[\ell_e(f_e^t \setminus i)\bigr] \;\le\; \frac{1-w_i}{m} + \varepsilon\,,$$
where $q^t$ is the distribution over the m paths output by the best-expert algorithm at time t.
Proof. By Observation 7.1 we have that there exists a path with average cost at most $(1-w_i)/m$. Since
user i is using an optimized "combining expert advice"-style algorithm and the maximal latency is 1, we
have that
$$\frac{1}{T}\sum_{t=1}^{T} \mathbb{E}_{e\sim q^t}\bigl[\ell_e(f_e^t \setminus i)\bigr]
\;\le\; \min_{e\in E} \ell_e(\hat f_e \setminus i) + \sqrt{\frac{\log m}{T}}
\;\le\; \frac{1-w_i}{m} + \sqrt{\frac{\log m}{T}}
\;\le\; \frac{1-w_i}{m} + \varepsilon\,,$$
where the last inequality holds for $T \ge O((\log m)/\varepsilon^2)$.
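The external-regret guarantee used in this proof can be checked numerically (an illustrative sketch, not the paper's code): run Hedge with the standard tuning $\eta = \sqrt{8\ln m / T}$ over m paths with hypothetical i.i.d. latencies in [0, 1]; the average regret never exceeds $\sqrt{\ln m/(2T)} \le \sqrt{(\log m)/T}$, matching the middle inequality above.

```python
import math
import random

random.seed(0)
m, T = 10, 5000
eta = math.sqrt(8.0 * math.log(m) / T)  # standard Hedge tuning for costs in [0, 1]

w = [1.0] * m            # one weight per path ("expert")
cum = [0.0] * m          # cumulative cost of each fixed path
alg_cost = 0.0           # expected cost accumulated by the Hedge player
for t in range(T):
    total = sum(w)
    q = [wi / total for wi in w]                 # distribution over paths at time t
    costs = [random.random() for _ in range(m)]  # hypothetical latencies in [0, 1]
    alg_cost += sum(qi * ci for qi, ci in zip(q, costs))
    for e in range(m):
        cum[e] += costs[e]
        w[e] *= math.exp(-eta * costs[e])        # multiplicative-weights update

avg_regret = alg_cost / T - min(cum) / T
print(avg_regret <= math.sqrt(math.log(m) / (2 * T)))   # True: the regret bound holds
```

The bound is a worst-case theorem for Hedge, so the final check holds for any cost sequence, not just this random one.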
Consider an instance of this model where every user plays uniformly at random. The resulting flow
is clearly at equilibrium, and the expected latency for the ith player is (1 − wi )/m excluding its own
weight. We thus have shown that the expected latency experienced by each user i is at most ε worse than
this equilibrium latency.
8 Conclusions
In this paper, we consider the question: if each player in a routing game (or more general congestion
game) uses a regret-minimizing strategy, will behavior converge to an equilibrium, and under what
conditions and in what sense? Our main result is that in the setting of multicommodity flow and infinitesimal
agents, a 1 − ε fraction of the daily flows are at average-ε-Nash equilibrium for ε approaching 0 at a
rate that depends polynomially on the players' regret bounds and the maximum slope of any latency
function. Moreover, we show the dependence on slope is necessary.
Even for the case of reasonable (bounded) slopes, however, our bounds for general nonlinear
latencies are substantially worse than our bounds for the linear case. For instance, if agents are running
the Kalai-Vempala algorithm [25], we get a bound of $O(mn(\log n)/\varepsilon^2)$ on the number of time
steps needed for the time-average flow to reach an average-ε-Nash equilibrium in the linear case, but
$O(mn^3(\log n)/\varepsilon^4)$ for general latencies. We do not know whether the bounds in the general case can be
improved. In addition, our bounds on the daily flows lose additional polynomial factors which we suspect
are not tight.
We also show that Price of Anarchy results can be applied to regret-minimizing players in routing
games, that is, that existing results analyzing the quality of equilibria can also be applied to the results of
regret-minimizing behavior. Recent work [7] shows that in fact Price of Anarchy results can be extended
to cover regret-minimizing behavior in a wide variety of games, including many for which this behavior
may not approach equilibria and where equilibria may be hard to find.
References
[1] Jacob Abernethy, Elad Hazan, and Alexander Rakhlin: Competing in the dark: An efficient algorithm for bandit linear optimization. In Proc. 21st Ann. Conf. Learning Theory (COLT). Springer, 2008. http://colt2008.cs.helsinki.fi/papers/123-Abernethy.pdf. 185
[2] Baruch Awerbuch, Yossi Azar, Amir Epstein, Vahab Seyed Mirrokni, and Alexander Skopalik: Fast convergence to nearly optimal solutions in potential games. In Proc. 9th ACM Conf. Electronic Commerce (EC), pp. 264–273. ACM Press, 2008. [doi:10.1145/1386790.1386832]. 183
[3] Baruch Awerbuch and Robert D. Kleinberg: Adaptive routing with end-to-end feedback: Distributed learning and geometric approaches. In Proc. 36th STOC, pp. 45–53. ACM Press, 2004. [doi:10.1145/1007352.1007367]. 180, 182, 185
[4] Yossi Azar: Online Algorithms: The State of the Art, chapter "On-line load balancing", pp. 178–195. Springer, 1998. 181
[5] M. Beckmann, C. B. McGuire, and C. B. Winsten: Studies in the Economics of Transportation. Yale University Press, 1956. 184
[6] Petra Berenbrink, Tom Friedetzky, Leslie Ann Goldberg, Paul W. Goldberg, Zengjian Hu, and Russell Martin: Distributed selfish load balancing. SIAM J. Comput., 37(4):1163–1181, 2007. [doi:10.1137/060660345]. 183
[7] Avrim Blum, MohammadTaghi Hajiaghayi, Katrina Ligett, and Aaron Roth: Regret minimization and the price of total anarchy. In Proc. 40th STOC, pp. 373–382. ACM Press, 2008. [doi:10.1145/1374376.1374430]. 183, 196
[8] Nicolò Cesa-Bianchi, Yoav Freund, David Haussler, David P. Helmbold, Robert E. Schapire, and Manfred K. Warmuth: How to use expert advice. J. ACM, 44(3):427–485, 1997. [doi:10.1145/258128.258179]. 180, 185, 187, 190, 195
[9] George Christodoulou, Elias Koutsoupias, and Paul G. Spirakis: On the performance of approximate equilibria in congestion games. In Proc. 17th European Symp. Algorithms (ESA'09), pp. 251–262. Springer, 2009. [doi:10.1007/978-3-642-04128-0_22]. 193
[10] Artur Czumaj, Piotr Krysta, and Berthold Vöcking: Selfish traffic allocation for server farms. SIAM J. Comput., 39:1957–1987, 2010. [doi:10.1137/070693862]. 180
[11] Artur Czumaj and Berthold Vöcking: Tight bounds for worst-case equilibria. ACM Trans. Algorithms, 3(1):4, 2007. [doi:10.1145/1186810.1186814]. 180
[12] Eyal Even-Dar, Alex Kesselman, and Yishay Mansour: Convergence time to Nash equilibria. In Proc. 30th Intern. Colloq. Autom. Lang. Program. (ICALP), number 2719 in LNCS, pp. 193–193. Springer, 2003. [doi:10.1007/3-540-45061-0_41]. 183
[13] Eyal Even-Dar and Yishay Mansour: Fast convergence of selfish rerouting. In Proc. 16th Ann. ACM-SIAM Symp. Discrete Algorithms (SODA), pp. 772–781. ACM Press, 2005. [ACM:1070432.1070541]. 183
[14] Eyal Even-Dar, Yishay Mansour, and Uri Nadav: On the convergence of regret minimization dynamics in concave games. In Proc. 41st STOC, pp. 523–532. ACM Press, 2009. [doi:10.1145/1536414.1536486]. 183
[15] Alex Fabrikant, Ankur Luthra, Elitza Maneva, Christos H. Papadimitriou, and Scott Shenker: On a network creation game. In Proc. 22nd Symp. Principles of Distrib. Comput. (PODC), pp. 347–351. ACM Press, 2003. [doi:10.1145/872035.872088]. 180
[16] Simon Fischer, Harald Räcke, and Berthold Vöcking: Fast convergence to Wardrop equilibria by adaptive sampling methods. In Proc. 38th STOC, pp. 653–662. ACM Press, 2006. [doi:10.1145/1132516.1132608]. 182, 185
[17] Simon Fischer and Berthold Vöcking: On the evolution of selfish routing. In Proc. 12th European Symp. Algorithms (ESA), number 3221 in LNCS, pp. 323–334. Springer, 2004. [doi:10.1007/978-3-540-30140-0_30]. 182
[18] Simon Fischer and Berthold Vöcking: Adaptive routing with stale information. Theoretical Computer Science, 410(36):3357–3371, 2009. [doi:10.1016/j.tcs.2008.01.055]. 182
[19] Dean P. Foster and Rakesh V. Vohra: Calibrated learning and correlated equilibrium. Games Econom. Behav., 21(1–2):40–55, 1997. [doi:10.1006/game.1997.0595]. 182
[20] Yoav Freund and Robert E. Schapire: A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. System Sci., 55(1):119–139, 1997. [doi:10.1006/jcss.1997.1504]. 180, 185, 187, 190, 195
[21] Yoav Freund and Robert E. Schapire: Adaptive game playing using multiplicative weights. Games Econom. Behav., 29:79–103, 1999. [doi:10.1006/game.1999.0738]. 182
[22] Paul W. Goldberg: Bounds for the convergence rate of randomized local search in a multiplayer load-balancing game. In Proc. 23rd Ann. ACM Symp. Principles of Distrib. Comput. (PODC), pp. 131–140. ACM Press, 2004. [doi:10.1145/1011767.1011787]. 183
[23] J. F. Hannan: Approximation to Bayes risk in repeated play. In M. Dresher, A. W. Tucker, and P. Wolfe, editors, Contributions to the Theory of Games, volume III, pp. 97–139. Princeton University Press, 1957. 180
[24] Sergiu Hart and Andreu Mas-Colell: A simple adaptive procedure leading to correlated equilibrium. Econometrica, 68(5):1127–1150, 2000. [doi:10.1111/1468-0262.00153]. 182
[25] Adam Kalai and Santosh Vempala: Efficient algorithms for online decision problems. J. Comput. System Sci., 71(3):291–307, 2005. [doi:10.1016/j.jcss.2004.10.016]. 180, 185, 187, 190, 195
[26] Robert Kleinberg, Georgios Piliouras, and Éva Tardos: Multiplicative updates outperform generic no-regret learning in congestion games. In Proc. 41st STOC. ACM Press, 2009. [doi:10.1145/1536414.1536487]. 183
[27] Elias Koutsoupias and Christos Papadimitriou: Worst-case equilibria. Comput. Sci. Rev., 3(2):65–69, 2009. [doi:10.1016/j.cosrev.2009.04.003]. 180
[28] N. Littlestone and M. K. Warmuth: The weighted majority algorithm. Inform. and Comput., 108(2):212–261, 1994. [doi:10.1006/inco.1994.1009]. 180, 185, 187, 190, 195
[29] H. Brendan McMahan and Avrim Blum: Online geometric optimization in the bandit setting against an adaptive adversary. In Proc. 17th Ann. Conf. Learning Theory (COLT), number 3120 in LNCS, pp. 109–123. Springer, 2004. [Springer:38p3f4h61cgy7gg7]. 180, 182, 185
[30] Igal Milchtaich: Congestion games with player-specific payoff functions. Games Econom. Behav., 13(1):111–124, 1996. [doi:10.1006/game.1996.0027]. 183
[31] Igal Milchtaich: Generic uniqueness of equilibrium in large crowding games. Math. Oper. Res., 25(3):349–364, 2000. [doi:10.1287/moor.25.3.349.12220]. 184
[32] Abraham Neyman: Correlated equilibrium and potential games. Internat. J. Game Theory, 26(2):223–227, 1997. [doi:10.1007/BF01295851]. 182
[33] Tim Roughgarden: Intrinsic robustness of the price of anarchy. In Proc. 41st STOC, pp. 513–522. ACM Press, 2009. [doi:10.1145/1536414.1536485]. 183
[34] Tim Roughgarden and Éva Tardos: How bad is selfish routing? J. ACM, 49(2):236–259, 2002. [doi:10.1145/506147.506153]. 180
[35] Tim Roughgarden and Éva Tardos: Bounding the inefficiency of equilibria in nonatomic congestion games. Games Econom. Behav., 47(2):389–403, 2004. [doi:10.1016/j.geb.2003.06.004]. 191, 192, 194
[36] William H. Sandholm: Potential games with continuous player sets. J. Econom. Theory, 97(1):81–108, 2001. [doi:10.1006/jeth.2000.2696]. 182
[37] David Schmeidler: Equilibrium points of nonatomic games. J. Stat. Phys., 7(4):295–300, 1973. [doi:10.1007/BF01014905]. 184
[38] Eiji Takimoto and Manfred K. Warmuth: Path kernels and multiplicative updates. J. Mach. Learn. Res., 4:818, 2003. http://mlr.csail.mit.edu/papers/volume4/takimoto03a/takimoto03a.pdf. 180
[39] Martin Zinkevich: Online convex programming and generalized infinitesimal gradient ascent. In Proc. 20th Intern. Conf. Mach. Learn. (ICML), pp. 928–936. AAAI Press, 2003. 180
[40] Martin Zinkevich: Theoretical guarantees for algorithms in multi-agent settings. Technical Report CMU-CS-04-161, Carnegie Mellon University, 2004. 182
AUTHORS
Avrim Blum
Professor
Department of Computer Science
Carnegie Mellon University
Pittsburgh, PA
avrim cs cmu edu
http://www.cs.cmu.edu/~avrim
Eyal Even-Dar
Google Research
76 9th Avenue, New York, NY
evendar google com
http://sites.google.com/site/evendareyal/
Katrina Ligett
Postdoctoral Associate
Department of Computer Science
Cornell University
Ithaca, NY
katrina cs cornell edu
http://www.cs.cornell.edu/~katrina
ABOUT THE AUTHORS
AVRIM BLUM received his Ph. D. at MIT in 1991 under Ron Rivest. His main research
interests are machine learning theory, approximation algorithms, on-line algorithms,
algorithmic game theory, and mechanism design.
EYAL EVEN-DAR graduated from Tel Aviv University in 2005 under the supervision of
Yishay Mansour. After two years of postdoctoral research at the University of Penn-
sylvania under the supervision of Michael Kearns, he joined Google Research NY for
three years. His research interests include learning theory and algorithmic game theory.
KATRINA LIGETT graduated from Carnegie Mellon in 2009; her advisor was Avrim Blum.
Her interests include online learning, algorithmic game theory, and data privacy.