Authors: Nikhil Bansal, Bouke Cloostermans
License: CC-BY-3.0
Theory of Computing, Volume 12 (14), 2016, pp. 1–14
www.theoryofcomputing.org
Special Issue: APPROX-RANDOM 2015
Minimizing Maximum Flow-Time on
Related Machines
Nikhil Bansal∗ Bouke Cloostermans
Received November 27, 2015; Revised June 28, 2016; Published September 14, 2016
Abstract: We consider the online problem of minimizing the maximum flow-time on related
machines. This is a natural generalization of the extensively studied makespan minimization
problem to the setting where jobs arrive over time. Interestingly, natural algorithms such
as Greedy or Slow-fit that work for the simpler identical machines case or for makespan
minimization on related machines, are not O(1)-competitive. Our main result is a new
O(1)-competitive algorithm for the problem. Previously, O(1)-competitive algorithms were
known only with resource augmentation, and in fact no O(1)-approximation was known
even in the offline model.
ACM Classification: F.2.2
AMS Classification: 68W25
Key words and phrases: related machines scheduling, maximum flow-time minimization, online
algorithm, approximation algorithm
1 Introduction
Scheduling a set of jobs on machines to optimize some quality of service measure is one of the most
well studied problems in computer science. A very natural measure of service received by a job is the
flow-time, defined as the amount of time the job spends in the system. In particular, if a job j arriving at
A preliminary version of this paper appeared in the Proceedings of the 18th Internat. Workshop on Approximation
Algorithms for Combinatorial Optimization Problems (APPROX’15), 2015, pp. 85–95 [7].
∗ Supported by NWO grant 639.022.211 and ERC consolidator grant 617951.
© 2016 Nikhil Bansal and Bouke Cloostermans
Licensed under a Creative Commons Attribution License (CC-BY). DOI: 10.4086/toc.2016.v012a014
time r j completes its processing at time C j , then its flow-time Fj is defined as C j − r j , i. e., its completion
time minus its arrival time. Over the last few years, several variants of flow-time related problems have
received a lot of attention: on single and multiple machines, in online or offline setting, with or without
resource augmentation, in weighted or unweighted setting and so on. Also different objectives have been
studied, such as total flow-time, ℓ_p norms of flow-time, and stretch. We refer the reader to [16, 13, 12, 4]
for a survey of some of these results.
In this paper we focus on the objective of minimizing the maximum flow-time. This is desirable
when we want to guarantee that each job has a small delay. Maximum flow-time is also a very natural
generalization of the minimum makespan or the load-balancing problem, that has been studied extensively
(see, e. g., [5, 9, 1] for a survey). In particular, if all jobs have identical release times, then the maximum
flow-time value is precisely equal to the makespan. Minimizing the maximum flow-time is also related
to deadline scheduling problems. In particular, the maximum flow-time is at most D if and only if each
job j completes by r j + D. Moreover, note that arbitrary deadlines d j can be modeled by considering the
weighted version of maximum flow-time.1
Known results for maximum flow-time. For a single machine, it is easy to see that First In First Out (FIFO) is an optimal (online) algorithm for minimizing the maximum flow-time. For multiple identical machines, Mastrolilli [15] showed that the Greedy algorithm, which dispatches each incoming job to the least loaded machine, is (3 − 2/m)-competitive, where m is the number of machines. He also showed that this bound is tight for Greedy. If jobs can be preempted and migrated (moved from one machine to another), [2] gave a 2-competitive algorithm.
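To make the objective concrete: on a single machine FIFO's maximum flow-time is easy to compute directly from the definition F_j = C_j − r_j. The following sketch is our own illustration (the function name and interface are not from the paper):

```python
def max_flow_time_fifo(jobs, speed=1.0):
    """Maximum flow-time of FIFO on a single machine.

    jobs: iterable of (r_j, p_j) pairs, processed in release order.
    """
    t = 0.0            # completion time of the last job so far
    worst = 0.0
    for r, p in sorted(jobs):
        t = max(t, r) + p / speed   # completion time C_j
        worst = max(worst, t - r)   # flow-time F_j = C_j - r_j
    return worst
```

For example, two size-2 jobs released at time 0 give maximum flow-time 4.0: the second job completes at time 4 after waiting behind the first.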
A systematic investigation of the problem for various machine models was initiated recently by Anand
et al. [3]. In the related machines model each machine i has speed si , and processing job j on machine
i takes pi j = p j /si units of time, where p j is the size of job j. In the more general unrelated machines
model, pi j can be completely arbitrary.
Among other results, Anand et al. [3] gave a (1 + ε)-speed O(1/ε)-competitive algorithm for the
unrelated machine case, for any given ε > 0. Here the online algorithm can process 1 + ε units of
volume per time step, but is compared to an offline optimum that does not have this extra resource
augmentation [14, 16]. They also showed that in the unrelated setting any online algorithm without
resource augmentation must be Ω(m) competitive. For the weighted maximum flow-time objective, they
gave a (1 + ε)-speed, O(1/ε^3)-competitive algorithm for the related machines setting, and showed that
no O(1)-speed, O(1)-competitive algorithm exists in the unrelated setting.
A natural question that remains is the complexity of the problem for related machines. Is there an
O(1)-competitive algorithm for the related machines setting, without using resource augmentation or
migration?
This question is particularly intriguing as it is not at all clear what the right algorithm should be [2]. In fact, no O(1)-approximation is known even in the offline model. One issue is that the natural Slow-fit algorithm (described in the following section), which is O(1)-competitive for makespan minimization (even when the jobs are temporary and have unknown durations [6]), is not O(1)-competitive for maximum flow-time (Lemma 2.1 below). The algorithm of [3] for weighted maximum flow-time with resource
1 In deadline scheduling however, the deadlines are typically considered fixed and the focus is on maximizing the throughput,
that is, maximizing the number of deadlines met.
augmentation is also a variant of Slow-fit. Recently, [8] obtained an O(log n)-approximation for
minimizing maximum flow-time on unrelated machines, where n is the number of machines. However
their techniques do not seem to give anything better for the related machines setting either.
Our main result is the following.
Theorem 1.1. There is a 13.5-competitive algorithm, Double-fit, for minimizing maximum flow-time on related machines.
This also gives the first O(1)-approximation for the offline problem. We also show that no such result
is possible in the weighted case (without resource augmentation), and give an Ω(W ) lower bound on the
competitive ratio where W is the maximum to minimum weight ratio.
High-level approach. There are two competing trade-offs while scheduling on related machines. On
one hand the algorithm should keep as many machines busy as possible, otherwise load might accumulate
and delay future jobs. We could end up with a large backlog that is impossible to get rid of without
resource augmentation. On the other hand, the algorithm should keep fast machines empty for processing
large jobs that might arrive later. In particular, fast machines are a scarce resource that should not be
wasted on processing small jobs unnecessarily. It is instructive to consider the lower bounds in Section 2, where both Slow-fit and Greedy are shown to perform badly due to these opposite reasons.
To get around this, we design an algorithm that combines the good properties of both Slow-fit and Greedy. In particular, the algorithm uses a two-phase strategy while assigning jobs to machines at each step. First, the jobs are spread out to ensure that machines are busy as much as possible. Once machines are saturated, the algorithm shifts into a Slow-fit mode, which ensures that small jobs do not unnecessarily go on fast machines.
The key difficulty in the analysis is to control how the two phases interact with each other. To do this,
we maintain two invariants that capture the dynamics of the algorithm, and control how much the online
algorithm’s load on a subset of machines deviates from the offline algorithm’s load on those machines.
The main part of the argument is to show inductively that these invariants are maintained over time.
Notation and formal problem description. There are m machines indexed by non-decreasing order
of speeds s1 ≤ s2 ≤ · · · ≤ sm . The processing requirement of job j is p j , and it requires time p j /si on
machine i. We will call p j the size of j, and p j /si its load on machine i. Jobs arrive online over time
and p j is known immediately upon its release time r j . The goal is to find a schedule that minimizes the
maximum flow-time, and we assume that a job cannot be migrated from one machine to another. We use
Opt to denote some fixed optimum offline schedule, and also to denote the value of this solution.
2 Lower bounds on Slow-fit and Greedy

Slow-fit. Algorithm Slow-fit takes as input a threshold F_opt (the current guess of the optimum), and dispatches every incoming job to the slowest possible machine while keeping the load below F_opt. If an incoming job cannot be feasibly scheduled on any machine, the algorithm fails and the threshold is doubled.
Lemma 2.1. Slow-fit has a competitive ratio of Ω(m).
Proof. We describe an instance where the threshold Fopt keeps doubling until Fopt > m even though
Opt = 2.
There are m identical machines (but we arbitrarily order them from slow to fast). Next, we assume
that Fopt ≥ 2, which can be achieved by giving 2m unit-size jobs initially at t = 0.
The instance consists of two alternating phases: a buildup phase, and an overflow phase. In the
buildup phase we build up load on slow machines to Fopt − 1. In the overflow phase we release a burst of
jobs, causing Fopt to double. This happens until Fopt exceeds m.
We start with the buildup phase. At each time step t ≥ 2, m unit-size jobs arrive. Slow-fit will keep machine m empty until all machines 1, . . . , m − 1 have load F_opt. This means at most m − 1 jobs get processed per time step until some time t_0 at which all machines 1, . . . , m − 1 have load F_opt. Once these machines
have high load we move to the overflow phase. At time t0 + 1, 2m unit-size jobs arrive. As there is at
most m − 1 + Fopt total space available below Fopt , these jobs cannot be scheduled feasibly, causing Fopt
to double. No jobs arrive at time t0 + 2 (which will allow Opt to clear its machines). At time t0 + 3
we continue with the next buildup phase. This process continues until Fopt ≥ m + 1, so that there is
m − 1 + Fopt ≥ 2m space available for the overflow phase.
We claim that Opt has a maximum flow-time of 2. This value can be achieved by distributing the incoming jobs equally over all machines (i. e., Opt runs Greedy). During the buildup phase every machine
receives exactly one job per time unit so that all jobs finish after unit time. During the overflow phase
every machine receives exactly 2 jobs so that when the next buildup phase starts all machines are empty.
Intuitively, Slow-fit unnecessarily builds up load on slow machines while keeping the fast machines empty, and cannot recover if there is a small burst of jobs.
Greedy. When a job j arrives, Greedy dispatches j to the machine that minimizes the flow-time of j (assuming FIFO order). Ties are broken arbitrarily. The following bound is well known [11], but we sketch it here for completeness. The idea is that Greedy puts too many small jobs on fast machines, which causes problems when large jobs arrive.
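The dispatch rule can be sketched as follows (again our own illustration of the rule, not code from the paper; machines are ordered slow to fast as in the rest of the paper):

```python
def greedy(p, loads, speeds):
    """Dispatch a job of size p to the machine minimizing its flow-time
    under FIFO: the machine's current load plus the job's processing
    time there.  Ties are broken by machine index, one of the
    arbitrary tie-breaking rules the algorithm allows."""
    j = min(range(len(speeds)), key=lambda j: loads[j] + p / speeds[j])
    loads[j] += p / speeds[j]
    return j
```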
Lemma 2.2. Greedy has a competitive ratio of Ω(log m).
Proof. Consider an instance with k groups of machines where group G_i contains 2^{2k−2i} machines of speed 2^i. Thus, the total processing power of group G_i is equal to S_i = 2^{2k−i}. The processing power of groups G_i, . . . , G_k combined is equal to

P_i = ∑_{i'=i}^{k} 2^{2k−i'} = 2^{2k−i+1} − 2^k < 2S_i .
k sets of jobs arrive, all at time 0, but in increasing order of size. For all i = 1, . . . , k, the set J_i contains P_i/2^i jobs of size 2^i. We claim that Greedy will distribute exactly 2^{i'−i} jobs from J_i to each machine in G_{i'} for i ≤ i' ≤ k. This will cause the load on all these machines to increase by exactly 1.
We use an inductive argument. Just before the jobs from set J_i are distributed, for all i ≤ i' ≤ k, each machine in group G_{i'} has a load of exactly i − 1. For all 1 ≤ i' ≤ i − 1, each machine in group G_{i'} has a load of exactly i'. Greedy does not use groups G_1, . . . , G_{i−1} to process jobs from J_i, since distributing a
job from J_i to a machine in group G_{i'} (i' < i) would result in a flow-time of i' + 2^{i−i'} > i. As J_i contains jobs with a total size of P_i, Greedy will assign a load of 1 to each machine in G_i, . . . , G_k.
After scheduling the k sets of jobs, the single machine in G_k has received a load of k. However, the optimum can schedule the i-th batch of jobs on the group-i machines, incurring a maximum load below 2 (i. e., it does Slow-fit with threshold 2).
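The construction above can be checked by direct simulation. The sketch below (our own verification code, not part of the paper) instantiates the instance for k = 3 and breaks Greedy's ties toward the fastest machine, which is a legal choice since ties are arbitrary:

```python
# Lemma 2.2 instance for k = 3, with Greedy's ties broken
# toward the fastest machine (listed first).
k = 3
machines = []                                 # [speed, load] pairs
for i in range(k, 0, -1):
    for _ in range(2 ** (2 * k - 2 * i)):     # group G_i: 2^(2k-2i) machines
        machines.append([2 ** i, 0.0])        # ... of speed 2^i

def greedy_dispatch(p):
    # machine minimizing the job's flow-time; min() keeps the
    # first (fastest) machine on ties
    m = min(machines, key=lambda mach: mach[1] + p / mach[0])
    m[1] += p / m[0]

for i in range(1, k + 1):                     # batches J_1, ..., J_k
    total = 2 ** (2 * k - i + 1) - 2 ** k     # P_i, total size of batch i
    for _ in range(total // 2 ** i):          # P_i / 2^i jobs of size 2^i
        greedy_dispatch(2 ** i)

print(machines[0][1])   # load on the single fastest machine: 3.0 = k
```

Every other machine ends with load at most 2, matching the claim that Opt (running Slow-fit with threshold 2) stays at load below 2 while Greedy drives the fastest machine to load k.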
3 The algorithm Double-fit

We describe our algorithm, denoted by Double-fit hereafter. Double-fit takes as input a parameter F_opt, which is supposed to be our estimate of Opt. By a slight variation on the doubling trick that loses an additional factor of 1.5 (see Section 3.4), we will assume henceforth that F_opt ∈ [Opt, 1.5 Opt).
We divide time into intervals I_k of size 3F_opt as I_k = [3(k − 1)F_opt, 3kF_opt). We refer to time 3kF_opt as the k-th epoch. For each k = 1, 2, . . . , Double-fit batches the jobs that arrive during I_k and schedules them at epoch k using the algorithm in Figure 1. We use [i : m] to denote the machines i, . . . , m. If the total remaining size of jobs on machine i is w(i) at time t, we say that it has load w(i)/s_i.
1. Let J denote the set of jobs arriving during I_k.
2. Partition the jobs in J into classes J_1, . . . , J_m, where each job j is in the class J_i with the smallest index i such that p_j ≤ s_i · F_opt.
3. For i = m, m − 1, . . . , 1:
4.     Consider the jobs j in J_i in arbitrary order and assign them as follows.
5.     (Saturation Phase) If some machine in [i : m] is loaded below 3F_opt,
6.         dispatch j to the slowest such machine.
7.     (Slow-fit Phase) Else dispatch j to the slowest machine in [i : m]
8.         such that its load stays below 6F_opt.
9.         If no such machine exists, return FAIL.
Figure 1: Algorithm Double-fit for epoch k.
Description. First, Double-fit classifies the jobs arriving during I_k depending on the slowest machine on which they have load no larger than F_opt. Note that as F_opt ≥ Opt, if job j is put in class J_i, then Opt cannot schedule job j on any machine with index smaller than i either.
Double-fit considers jobs from classes J_m down to J_1 (this ordering will be used crucially). Each class is scheduled in two phases. In the saturation phase, when scheduling a job j from class i, it checks if there is some machine in [i : m] with load less than 3F_opt. If so, j is dispatched to the slowest such machine. If no such machine exists, the algorithm enters the Slow-fit phase (for class J_i), and performs Slow-fit for class J_i on machines [i : m] with threshold 6F_opt.
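The two phases can be read off Figure 1 directly. The following sketch is our own 0-indexed rendering of that pseudocode for a single epoch's batch (the interface is ours; the paper specifies only the dispatch rule):

```python
def double_fit_epoch(jobs, speeds, loads, F):
    """Dispatch one epoch's batch of job sizes.

    speeds is sorted non-decreasingly; loads[i] is machine i's current
    load; F is the guess F_opt.  Assumes every job fits on the fastest
    machine (p <= speeds[-1] * F, which holds when F >= Opt).
    Returns False on FAIL, upon which the guess is raised and retried.
    """
    m = len(speeds)
    classes = [[] for _ in range(m)]
    for p in jobs:
        # class = slowest machine on which the job has load at most F
        c = next(i for i in range(m) if p <= speeds[i] * F)
        classes[c].append(p)
    for c in range(m - 1, -1, -1):            # classes J_m down to J_1
        for p in classes[c]:
            # saturation phase: slowest machine in [c:m] loaded below 3F
            cand = [i for i in range(c, m) if loads[i] < 3 * F]
            if not cand:
                # Slow-fit phase: slowest machine staying below 6F
                cand = [i for i in range(c, m)
                        if loads[i] + p / speeds[i] < 6 * F]
                if not cand:
                    return False              # FAIL
            i = cand[0]
            loads[i] += p / speeds[i]
    return True
```

Note that a job placed in the saturation phase lands on a machine with load below 3F and adds at most F to it (by the class definition), which is why the analysis below can assume saturation-phase machines stay below 4F_opt.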
3.1 Analysis
Our goal in this section is to show the following result.
Theorem 3.1. If F_opt ≥ Opt, then Double-fit never fails.

This directly implies Theorem 1.1 as follows. Each job spends at most 3F_opt time waiting to be assigned, and at most 6F_opt time on its designated machine; thus the flow-time of any job is at most 9F_opt. As F_opt ≤ 1.5 Opt by the doubling trick, this implies a competitive ratio of 13.5.
For the purpose of analysis, it will be convenient to consider a restricted Opt that also batches jobs and dispatches the jobs arriving during I_k at epoch k. Note that such a restricted algorithm has objective at most 3F_opt + Opt ≤ 4F_opt (as we can take the original schedule and delay every job by 3F_opt). To prove Theorem 3.1, we will in fact prove the following stronger result: Double-fit never fails on any instance where the restricted Opt has value at most 4F_opt.
The invariants. Fix an epoch k. Let A_i(k) and B_i(k) denote the total remaining size of jobs on machines [i : m] in Double-fit's schedule just before and just after all the jobs from interval I_k are dispatched, respectively. Similarly, let A_i^opt(k) be the total size of jobs remaining on machines [i : m] in Opt's schedule before dispatching the jobs which arrived during interval I_k. We define B_i^opt(k) slightly more carefully. Let L_i(k) be the load on machine i in Opt's schedule after scheduling jobs at epoch k. Then we define B_i^opt(k) as

B_i^opt(k) = ∑_{i'=i}^{m} max{ L_{i'}(k), 3F_opt · s_{i'} } .
We will show that the following two invariants hold at each epoch k:

A_i(k) ≤ A_i^opt(k) + F_opt ∑_{i'=i}^{m} s_{i'} ,   (3.1)

B_i(k) ≤ B_i^opt(k) + F_opt ∑_{i'=i}^{m} s_{i'} .   (3.2)

Roughly speaking, invariants (3.1) and (3.2) show that the load on any suffix of Double-fit's machines stays close to Opt's load on those machines, both before and after the jobs are dispatched at epoch k. We will prove that (3.1) and (3.2) hold by a careful induction over i and k.
Before we prove these invariants, let us first see why they imply Theorem 3.1.
Proof of Theorem 3.1. Consider a fixed epoch k. As the (restricted) Opt has maximum flow-time at most 4F_opt, for each i it must hold that B_i^opt(k) ≤ 4F_opt ∑_{i'=i}^{m} s_{i'}. Thus by (3.2) it follows that B_i(k) ≤ 5F_opt ∑_{i'=i}^{m} s_{i'} for each i. Choosing i = m, this implies that Double-fit never loads machine m above 5F_opt and thus never fails (as machine m always has room for an additional job).
Proving the invariants. The strategy for proving that (3.1) and (3.2) hold at all epochs k will be to
show the following two lemmas.
Lemma 3.2. If at epoch k, (3.1) holds for all machines, then (3.2) also holds for all machines.
The next step will be to relate the conditions at epochs k and k + 1.
Lemma 3.3. If at epoch k, (3.2) holds for all machines, then (3.1) also holds for all machines at epoch
k + 1.
As (3.1) trivially holds for k = 0 (as A_i(0) = A_i^opt(0) = 0 for all i), applying Lemma 3.2 and Lemma 3.3 alternately implies that (3.1) and (3.2) hold for all k.
3.2 Proof of Lemma 3.2
We first show that Double-fit is conservative in scheduling small jobs on fast machines.

Lemma 3.4. Let i_1 < i_2. If some job j of class i_1 is dispatched by Double-fit to machine i_2 during the saturation phase (i. e., using threshold 3F_opt), then all jobs of class i for i_1 < i ≤ i_2 are also dispatched during the saturation phase.
Proof. Consider the state of Double-fit's machines just before j was dispatched. As j is dispatched to machine i_2 during the saturation phase, the load on i_2 must be below 3F_opt at that point. As jobs of class i for i_1 < i ≤ i_2 were considered before class-i_1 jobs, the load on i_2 was also below 3F_opt after scheduling the class-i jobs, and thus Double-fit must have never switched to the Slow-fit phase while considering class i.
Next we define the notion of separated machines, which will play a crucial role in the analysis.
Definition 3.5. Machines i_1 and i_2 (i_1 < i_2) are separated at epoch k if Double-fit dispatched no jobs from classes [1 : i_1] onto machines [i_2 : m] at epoch k.
The following lemma shows that if two consecutive machines are separated, it is easy to relate epochs
k and k + 1.
Lemma 3.6. If machines i − 1 and i are separated at epoch k, then (3.1) implies (3.2) for machine i.
Moreover this trivially holds for machine i = 1.
Proof. As machines i − 1 and i are separated at epoch k, no jobs from classes [1 : i − 1] were dispatched to machines [i : m] at epoch k. Thus

B_i(k) = A_i(k) + ∑_{i'=i}^{m} |J_{i'}| ,   (3.3)

where |J_{i'}| represents the total size of all jobs in J_{i'}.
As jobs from J_{i'} cannot be scheduled on machines [1 : i − 1] in an optimal schedule, we also obtain

B_i^opt(k) ≥ A_i^opt(k) + ∑_{i'=i}^{m} |J_{i'}| .   (3.4)

This implies that

B_i(k) = A_i(k) + ∑_{i'=i}^{m} |J_{i'}| ≤ A_i(k) + B_i^opt(k) − A_i^opt(k) ≤ B_i^opt(k) + F_opt ∑_{i'=i}^{m} s_{i'} ,   (3.5)
where the last step follows by our assumption that (3.1) holds for (i, k).
Finally for i = 1, we observe that both (3.3) and (3.4) hold with equality, and hence the result holds
trivially.
We now have all the tools we need to prove Lemma 3.2.
Proof of Lemma 3.2. We use induction over i, from larger to smaller i. In particular, to prove that (3.2) holds for some pair (i, k), we assume that (3.1) holds for all (i', k) and that (3.2) holds for all (i', k) with i' > i. As the base case, note that this is vacuously true for i = m + 1 (as all relevant quantities are 0).
We consider three cases depending on how Double-fit assigns jobs from classes [1 : i − 1] to machines [i : m].
1. No jobs from classes [1 : i − 1] were dispatched to machines [i : m]. In this case, machines i − 1 and i are separated and (3.2) follows from Lemma 3.6.
2. Jobs from classes [1 : i − 1] are only dispatched to machines [i : m] during the saturation phase. Let i_max ≥ i denote the smallest index such that machines i − 1 and i_max + 1 are separated (if no such machine exists, set i_max = m). By the inductive hypothesis, we can assume that (3.2) holds for i_max + 1. In the case where i_max = m, this holds vacuously. As jobs from classes [1 : i − 1] are assigned to [i : m] (and hence to i_max) during the saturation phase, Lemma 3.4 implies that all jobs in classes [i : i_max] were also dispatched during the saturation phase, which implies that all machines [i : i_max] are loaded below 4F_opt. This gives us the following.
B_i(k) ≤ 4F_opt ∑_{i'=i}^{i_max} s_{i'} + B_{i_max+1}(k)
       ≤ 4F_opt ∑_{i'=i}^{i_max} s_{i'} + B_{i_max+1}^opt(k) + F_opt ∑_{i'=i_max+1}^{m} s_{i'}
       = 3F_opt ∑_{i'=i}^{i_max} s_{i'} + B_{i_max+1}^opt(k) + F_opt ∑_{i'=i}^{m} s_{i'}
       ≤ B_i^opt(k) + F_opt ∑_{i'=i}^{m} s_{i'} ,

where the second inequality follows from the inductive hypothesis for machine i_max + 1.
3. Some job j from classes [1 : i − 1] was dispatched to machines [i : m] during the Slow-fit phase (using threshold 6F_opt). We assume that i > 1, otherwise the result follows from case 1. Let i_min < i denote the largest index such that machines [i_min : i − 1] have load more than 5F_opt and machine i_min − 1 has load at most 5F_opt; if no such machine exists, set i_min = 1. Note that i_min is well defined, as i > 1 and machine i − 1 must have load more than 5F_opt: job j from classes [1 : i − 1] was assigned to a machine in [i : m] during the Slow-fit phase.
Claim 3.7. Either machines i_min − 1 and i_min are separated, or i_min = 1.
Proof. This is trivially true if i_min = 1.
If i_min > 1, suppose that some job j' from classes [1 : i_min − 1] was dispatched to machines [i_min : m]. Now j' cannot have been dispatched during the Slow-fit phase, as this would imply that the load on i_min − 1 was more than 5F_opt, which contradicts the choice of i_min.
So all jobs in classes [1 : i_min − 1] that were assigned to [i_min : m] must have been assigned during the saturation phase. Let i' ≥ i_min denote the largest index where such a job is assigned. By Lemma 3.4, all machines [i_min : i'] were assigned load during the saturation phase and must have load at most 4F_opt. This contradicts that i_min has load more than 5F_opt.
By Lemma 3.6 applied to i_min, we get that (3.2) holds for machine i_min and thus

B_{i_min}(k) ≤ B_{i_min}^opt(k) + F_opt ∑_{i'=i_min}^{m} s_{i'} .   (3.6)

Furthermore, by the choice of i_min all the machines in [i_min : i − 1] are loaded above 5F_opt. This implies that

B_i(k) ≤ B_{i_min}(k) − 5F_opt ∑_{i'=i_min}^{i−1} s_{i'} .   (3.7)

As every machine is loaded below 4F_opt in an optimal schedule, we also have

B_{i_min}^opt(k) ≤ B_i^opt(k) + 4F_opt ∑_{i'=i_min}^{i−1} s_{i'} .   (3.8)

Combining (3.6) and (3.7), we obtain that

B_i(k) ≤ B_{i_min}^opt(k) + F_opt ∑_{i'=i_min}^{m} s_{i'} − 5F_opt ∑_{i'=i_min}^{i−1} s_{i'}
       ≤ B_i^opt(k) + F_opt ∑_{i'=i_min}^{m} s_{i'} − F_opt ∑_{i'=i_min}^{i−1} s_{i'}      (by (3.8))
       = B_i^opt(k) + F_opt ∑_{i'=i}^{m} s_{i'} ,

which implies that (3.2) holds for i.
3.3 Proof of Lemma 3.3
We now prove Lemma 3.3, which is relatively easier.
Proof of Lemma 3.3. We will apply induction over i (in decreasing order of machines). Consider epoch k. We assume that (3.2) holds for all i' at epoch k, and that (3.1) holds for all i' > i at epoch k + 1. For the base case of i = m + 1 the lemma follows trivially, since all the relevant quantities are 0.
Consider some machine i. We consider two cases depending on the load of machine i after the jobs
were dispatched at epoch k.
1. Machine i has load at most 4F_opt after epoch k, i. e., B_i(k) − B_{i+1}(k) ≤ 4F_opt · s_i. At epoch k + 1, before the jobs arriving during interval I_{k+1} are dispatched, the load of machine i will be at most F_opt. Thus we have that

A_i(k + 1) ≤ A_{i+1}(k + 1) + F_opt · s_i
           ≤ A_{i+1}^opt(k + 1) + F_opt ∑_{i'=i+1}^{m} s_{i'} + F_opt · s_i
           ≤ A_i^opt(k + 1) + F_opt ∑_{i'=i}^{m} s_{i'} .

Here the second inequality follows by the inductive hypothesis for machine i + 1, and the third inequality follows as A_i^opt(k + 1) is non-decreasing as i decreases.
2. Machine i is loaded above 4F_opt after epoch k, i. e., B_i(k) − B_{i+1}(k) > 4F_opt · s_i. In this case, some job j must have been dispatched to machine i during the Slow-fit phase. This only happens if j could not be dispatched during the saturation phase. In particular, this implies that all the machines [i : m] (which is surely a subset of the machines where j could have been scheduled) were loaded above 3F_opt. So the total size of jobs on the machines [i : m] decreases by exactly 3F_opt ∑_{i'=i}^{m} s_{i'} during interval I_{k+1}.
Thus we have that

A_i(k + 1) = B_i(k) − 3F_opt ∑_{i'=i}^{m} s_{i'} .   (3.9)

Similarly, as Opt can complete at most 3F_opt ∑_{i'=i}^{m} s_{i'} on machines [i : m] during this interval, we have

A_i^opt(k + 1) = B_i^opt(k) − 3F_opt ∑_{i'=i}^{m} s_{i'} .   (3.10)
As (3.2) holds for each i at epoch k, we obtain that

A_i(k + 1) ≤ B_i(k) − 3F_opt ∑_{i'=i}^{m} s_{i'}
           ≤ B_i^opt(k) + F_opt ∑_{i'=i}^{m} s_{i'} − 3F_opt ∑_{i'=i}^{m} s_{i'}      (by (3.2))
           = A_i^opt(k + 1) + F_opt ∑_{i'=i}^{m} s_{i'} ,

and hence (3.1) holds for i at epoch k + 1, which completes the proof.
3.4 Removing the assumption of knowledge of Opt
We describe a variant of the standard doubling trick where we increase the online estimate of Opt by only a factor of 1.5 at each step.
Consider some epoch k where the algorithm first fails with the current guess of F_opt. It must be that (3.2) does not hold. In particular, (3.1) holds at epoch k as (3.2) holds at k − 1, so Lemma 3.2 implies that F_opt < Opt. We then abort epoch k, and do not schedule any jobs. Instead, we set F'_opt = 1.5 F_opt and redefine the new epoch to be the time 3(k − 1)F_opt + 3F'_opt. Note that 4.5 F_opt time passes between these epochs, so at the next epoch the load on all machines in the schedule of Double-fit will be at most 6F_opt − 3F'_opt = 1.5 F_opt = F'_opt. This implies that for all i

A_i(k) ≤ F'_opt ∑_{i'=i}^{m} s_{i'} ≤ A_i^opt(k) + F'_opt ∑_{i'=i}^{m} s_{i'} .

The crucial point is that (3.1) holds for all machines i at this new epoch irrespective of the workload of the new restricted Opt (with parameter F'_opt). Thus, (3.2) holds if F'_opt ≥ Opt, and Double-fit proceeds as normal.
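The resulting retry loop can be sketched as follows (the interface is hypothetical: `try_epoch` stands for one run of Double-fit's epoch dispatch on the carried-over batch, which is re-attempted at the redefined epoch):

```python
def dispatch_with_guesses(try_epoch, batch, F):
    """Raise the guess by a factor 1.5 each time the epoch dispatch
    fails, as described in Section 3.4, and return the guess that
    finally succeeded."""
    while not try_epoch(batch, F):
        F *= 1.5          # F'_opt = 1.5 * F_opt; re-run the aborted epoch
    return F
```

Since the guess grows geometrically, the final guess overshoots the smallest feasible one by at most a factor 1.5, which is exactly the extra factor paid in Theorem 1.1.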
4 Other lower bounds
We also show simple (but strong) lower bounds for weighted maximum flow-time and maximum stretch.
Lemma 4.1. Any algorithm for minimizing maximum weighted flow-time on identical machines must
have a competitive ratio of Ω(W ), where W is the ratio between the largest and smallest weight.
Proof. Consider the following instance on 2 machines. At time t = 0 we receive 2 jobs of size w with weight 1. Now, any algorithm has three options: (i) it commits to assigning both jobs to the same machine by time w/2, (ii) it commits to assigning the jobs to different machines by time w/2, or (iii) by time w/2 it has not yet committed at least one of the jobs to a machine, and thus has not started processing that job.
In all three cases, we show that the algorithm will end up trailing by at least Ω(w) volume behind an
optimal schedule. In option (i), at least w volume remains at time t = w, while Opt assigned one job to
each machine and thus has no jobs left. In option (ii), another 3w/2-sized job with weight 1 arrives at
time w/2, so that by time 2w one of the algorithm’s machines has load of at least w/2 (or the entire job
still needs to be processed). Opt initially distributed both jobs to the same machine so it can distribute
this job to its empty machine and has no jobs left by time 2w. In option (iii), Opt distributes the jobs to
different machines so that by time w/2 we are trailing w/2 volume.
Once we trail Ω(w) volume behind the optimum, at every unit time step we receive 2 unit-size jobs of weight w. If the trailing jobs are ever to be finished, a delay of at least w/2 is incurred on the weight-w jobs, implying an objective value of Ω(w^2). Opt, on the other hand, has value O(w).
If we remove weights in the above instance, this example also directly implies an Ω(S) lower bound
on the competitive ratio for maximum stretch [3] where S is the ratio between the size of the largest and
the smallest job.
We remark that a weaker lower bound of Ω(W^{0.4}) for maximum weighted flow-time also follows from [10], using the analogy between delay factor and weighted maximum flow-time described in [3].
Concluding Remarks
Note that our algorithm Double-fit is not immediate-dispatch, i. e., it does not dispatch a job to a
machine immediately upon arrival. We are unable to extend the ideas here to obtain an O(1)-competitive
immediate-dispatch algorithm, and it is not clear to us whether such an algorithm exists. Given that in
the unrelated setting, there can be no O(1)-speed, O(1)-competitive immediate-dispatch algorithm [3]
(while there is a (1 + ε)-speed, O(1/ε)-competitive algorithm), it would be quite interesting to resolve
this question.
Acknowledgments
The authors would like to thank Shuguang Li for noticing an error in a previous version of this paper.
References
[1] Susanne Albers: Online scheduling. In Introduction to Scheduling (Yves Robert and Frédéric Vivien, eds.), chapter 3, pp. 51–73. Chapman and Hall/CRC Press, 2010. Version available on author's website. [doi:10.1201/9781420072747-c3]
[2] Christoph Ambühl and Monaldo Mastrolilli: On-line scheduling to minimize max flow time: an optimal preemptive algorithm. Oper. Res. Lett., 33(6):597–602, 2005. [doi:10.1016/j.orl.2004.10.006]
[3] S. Anand, Karl Bringmann, Tobias Friedrich, Naveen Garg, and Amit Kumar: Minimizing maximum (weighted) flow-time on related and unrelated machines. In Proc. 40th Internat. Colloq. on Automata, Languages and Programming (ICALP'13), pp. 13–24. Springer, 2013. [doi:10.1007/978-3-642-39206-1_2]
[4] S. Anand, Naveen Garg, and Amit Kumar: Resource augmentation for weighted flow-time explained by dual fitting. In Proc. 23rd Ann. ACM-SIAM Symp. on Discrete Algorithms (SODA'12), pp. 1228–1241. ACM Press, 2012. ACM DL.
[5] Yossi Azar: On-line load balancing. In Online Algorithms, volume 1442 of LNCS, pp. 178–195. Springer, 1998. [doi:10.1007/BFb0029569]
[6] Yossi Azar, Bala Kalyanasundaram, Serge A. Plotkin, Kirk Pruhs, and Orli Waarts: On-line load balancing of temporary tasks. J. Algorithms, 22(1):93–110, 1997. Preliminary version in WADS'93. [doi:10.1006/jagm.1995.0799]
[7] Nikhil Bansal and Bouke Cloostermans: Minimizing maximum flow-time on related machines. In Proc. 18th Internat. Workshop on Approximation Algorithms for Combinatorial Optimization Problems (APPROX'15), pp. 85–95, 2015. [doi:10.4230/LIPIcs.APPROX-RANDOM.2015.85]
[8] Nikhil Bansal and Janardhan Kulkarni: Minimizing flow-time on unrelated machines. In Proc. 47th STOC, pp. 851–860. ACM Press, 2015. [doi:10.1145/2746539.2746601, arXiv:1401.7284]
[9] Niv Buchbinder and Joseph Naor: The design of competitive online algorithms via a primal-dual approach. Foundations and Trends in Theoretical Computer Science, 3(2–3):93–263, 2009. Preliminary version in FOCS'06. [doi:10.1561/0400000024]
[10] Chandra Chekuri, Sungjin Im, and Benjamin Moseley: Online scheduling to minimize maximum response time and maximum delay factor. Theory of Computing, 8(7):165–195, 2012. Preliminary version in ESA'09. [doi:10.4086/toc.2012.v008a007, arXiv:0906.2048]
[11] Yookun Cho and Sartaj Sahni: Bounds for list schedules on uniform processors. SIAM J. Comput., 9(1):91–103, 1980. [doi:10.1137/0209007]
[12] Naveen Garg: Minimizing average flow-time. In Efficient Algorithms, Essays Dedicated to Kurt Mehlhorn on the Occasion of His 60th Birthday, pp. 187–198, 2009. Preliminary version in FOCS'07. [doi:10.1007/978-3-642-03456-5_13]
[13] Sungjin Im, Benjamin Moseley, and Kirk Pruhs: A tutorial on amortized local competitiveness in online scheduling. ACM SIGACT News, 42(2):83–97, 2011. [doi:10.1145/1998037.1998058]
[14] Bala Kalyanasundaram and Kirk Pruhs: Speed is as powerful as clairvoyance. J. ACM, 47(4):617–643, 2000. Preliminary version in FOCS'95. [doi:10.1145/347476.347479]
[15] Monaldo Mastrolilli: Scheduling to minimize max flow time: Off-line and on-line algorithms. Internat. J. Foundat. Comput. Sci., 15(2):385–401, 2004. Preliminary version in FCT'03. [doi:10.1142/S0129054104002480]
[16] Kirk Pruhs, Jiří Sgall, and Eric Torng: Online scheduling. In Handbook of Scheduling: Algorithms, Models, and Performance Analysis (Joseph Y.-T. Leung, ed.), chapter 15. CRC Press, 2004. Available at CRC Press.
AUTHORS
Nikhil Bansal
Professor
Eindhoven University of Technology
n bansal tue nl
http://www.win.tue.nl/~nikhil/
Bouke Cloostermans
Ph. D. student
Eindhoven University of Technology
b cloostermans tue nl
http://www.tue.nl/en/university/departments/mathematics-and-computer-science/
the-department/staff/
ABOUT THE AUTHORS
Nikhil Bansal is a Professor in the Department of Mathematics and Computer Science at Eindhoven University of Technology. He received his Bachelor's degree in Computer Science from IIT Mumbai (1999), and obtained his Ph. D. from Carnegie Mellon University in 2003 under the guidance of Avrim Blum. He is broadly interested in theoretical computer science with a focus on the design and analysis of algorithms.
Bouke Cloostermans received his Master's degree from Eindhoven University of Technology in 2014. He is currently a Ph. D. student at the same institution under the supervision of Nikhil Bansal.