How Fast Do Algorithms Improve?

By YASH SHERRY
MIT Computer Science & Artificial Intelligence Laboratory, Cambridge, MA 02139 USA

NEIL C. THOMPSON
MIT Computer Science & Artificial Intelligence Laboratory, Cambridge, MA 02139 USA
MIT Initiative on the Digital Economy, Cambridge, MA 02142 USA

Date of publication September 20, 2021; date of current version November 1, 2021.
Digital Object Identifier 10.1109/JPROC.2021.3107219
This article has supplementary downloadable material available at https://doi.org/10.1109/JPROC.2021.3107219, provided by the authors.
This work is licensed under a Creative Commons Attribution 4.0 License (CC-BY-4.0). For more information, see https://creativecommons.org/licenses/by/4.0/
Algorithms determine which calculations computers use to solve problems and are one of the central pillars of computer science. As algorithms improve, they enable scientists to tackle larger problems and explore new domains and new scientific techniques [1], [2]. Bold claims have been made about the pace of algorithmic progress. For example, the President's Council of Advisors on Science and Technology (PCAST), a body of senior scientists who advise the U.S. President, wrote in 2010 that "in many areas, performance gains due to improvements in algorithms have vastly exceeded even the dramatic performance gains due to increased processor speed" [3]. However, this conclusion was based on data from progress in linear solvers [4], which is just a single example. With no guarantee that linear solvers are representative of algorithms in general, it is unclear how broadly conclusions such as PCAST's should be interpreted. Is progress faster in most algorithms? Just some? How much on average?

A variety of research has quantified progress for particular algorithms, including for maximum flow [5], Boolean satisfiability and factoring [6], and (many times) for linear solvers [4], [6], [7]. Others in academia [6], [8]–[10] and the private sector [11], [12] have looked at progress on benchmarks, such as computer chess ratings or weather prediction, that are not strictly comparable to algorithms since they lack either mathematically defined problem statements or verifiably optimal answers. Thus, despite substantial interest in the question, existing research provides only a limited, fragmentary view of algorithm progress.

In this article, we provide the first comprehensive analysis of algorithm progress ever assembled. This allows us to look systematically at when algorithms were discovered, how they have improved, and how the scale of these improvements compares to other sources of innovation. Analyzing data from 57 textbooks and more than 1137 research papers reveals enormous variation. Around half of all algorithm families experience little or no improvement. At the other extreme, 14% experience transformative improvements, radically changing how and where they can be used. Overall, we find that, for moderate-sized problems, 30%–43% of algorithm families had improvements comparable to or greater than those that users experienced from Moore's law and other hardware advances. Thus, this article presents the first systematic, quantitative evidence that algorithms are one of the most important sources of improvement in computing.
I. RESULTS

In the following analysis, we focus on exact algorithms with exact solutions. That is, cases where a problem statement can be met exactly (e.g., find the shortest path between two nodes on a graph) and where there is a guarantee that the optimal solution will be found (e.g., that the shortest path has been identified).

We categorize algorithms into algorithm families, by which we mean that they solve the same underlying problem. For example, Merge Sort and Bubble Sort are two of the 18 algorithms in the "comparison sorting" family. In theory, an infinite number of such families could be created, for example, by subdividing existing domains so that special cases can be addressed separately (e.g., matrix multiplication with a sparseness guarantee). In general, we exclude such special cases from our analysis since they do not represent an asymptotic improvement for the full problem.¹

¹ The only exception that we make to this rule is if the authors of our source textbooks deemed these special cases important enough to discuss separately, e.g., the distinction between comparison and noncomparison sorting. In these cases, we consider each as its own algorithm family.

To focus on consequential algorithms, we limit our consideration to those families where the authors of a textbook, one of the 57 that we examined, considered that family important enough to discuss. Based on these inclusion criteria, there are 113 algorithm families. On average, there are eight algorithms per family. We consider an algorithm to be an improvement if it reduces the worst-case asymptotic time complexity of its algorithm family. Based on this criterion, there are 276 initial algorithms and subsequent improvements, an average of 1.44 improvements after the initial algorithm in each algorithm family.

A. Creating New Algorithms

Fig. 1 summarizes algorithm discovery and improvement over time. Fig. 1(a) shows the timing for when the first algorithm in each family appeared, often as a brute-force implementation (straightforward, but computationally inefficient), and Fig. 1(b) shows the share of algorithms in each decade where asymptotic time complexity improved. For example, in the 1970s, 23 new algorithm families were discovered, and 34% of all the previously discovered algorithm families were improved upon. In later decades, these rates of discovery and improvement fell, indicating a slowdown in progress on these types of algorithms. It is unclear exactly what caused this. One possibility is that some algorithms were already theoretically optimal, so further progress was impossible. Another possibility is that there are decreasing marginal returns to algorithmic innovation [5] because the easy-to-catch innovations have already been "fished-out" [13] and what remains is more difficult to find or provides smaller gains. The increased importance of approximate algorithms may also be an explanation if approximate algorithms have drawn away researcher attention (although the causality could also run in the opposite direction, with slower algorithmic progress pushing researchers into approximation) [14].

Fig. 1(c) and (d), respectively, show the distribution of "time complexity classes" for algorithms when they were first discovered and the probabilities that algorithms in one class transition into another because of an algorithmic improvement. Time complexity classes, as defined in algorithm theory, categorize algorithms by the number of operations that they require (typically expressed as a function of input size) [15]. For example, a time complexity of O(n^2) indicates that, as the size of the input n grows, there exists a function Cn^2 (for some value of C) that upper bounds the number of operations required.² Asymptotic time is a useful shorthand for discussing algorithms because, for a sufficiently large value of n, an algorithm with a higher asymptotic complexity will always require more steps to run. Later in this article, we show that, in general, little information is lost by our simplification to using asymptotic complexity.

² For example, the number of operations needed to alphabetically sort a list of 1000 filenames in a computer directory might be 0.5(n^2 + n), where n is the number of filenames. For simplicity, algorithm designers typically drop the leading constant and any smaller terms to write this as O(n^2).

Fig. 1(c) shows that, at discovery, 31% of algorithm families belong to the exponential complexity category (denoted n! | c^n), meaning that they take at least exponentially more operations as input size grows. For these algorithms, including the famous "Traveling Salesman" problem, the amount of computation grows so fast that it is often infeasible (even on a modern computer) to compute problems of size n = 100. Another 50% of algorithm families begin with polynomial time that is quadratic or higher, while 19% have asymptotic complexities of n log n or better.

Fig. 1(d) shows that there is considerable movement of algorithms between complexity classes as algorithm designers find more efficient ways of implementing them. For example, on average from 1940 to 2019, algorithms with complexity O(n^2) transitioned to complexity O(n) with a probability of 0.5% per year, as calculated using (3). Of particular note in Fig. 1(d) are the transitions from factorial or exponential time (n! | c^n) to polynomial times. These improvements can have profound effects, making algorithms that were previously infeasible for any significant-sized problem possible for large datasets.
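To make the scale of these complexity categories concrete, the short Python sketch below (our own illustration, not part of the article's analysis; the listed classes are just representative stand-ins) tabulates approximate operation counts at a few input sizes. It shows why exponential-time families become infeasible around n = 100 while polynomial-time families remain tractable.

```python
import math

# Representative time complexity classes and their operation-count functions.
# These are illustrative stand-ins, not the article's dataset.
complexity_classes = {
    "n log n": lambda n: n * math.log2(n),
    "n^2":     lambda n: n ** 2,
    "n^3":     lambda n: n ** 3,
    "2^n":     lambda n: 2 ** n,   # stands in for the exponential category (n! | c^n)
}

for n in (10, 30, 100):
    print(f"n = {n}")
    for name, ops in complexity_classes.items():
        # At n = 100, 2^n is about 1.3e30 operations: far beyond any computer,
        # even at billions of operations per second, which is why an
        # exponential-to-polynomial transition is so consequential.
        print(f"  {name:>7}: {ops(n):.3e} operations")
```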
Fig. 1. Algorithm discovery and improvement. (a) Number of new algorithm families discovered each decade. (b) Share of known algorithm families improved each decade. (c) Asymptotic time complexity class of algorithm families at first discovery. (d) Average yearly probability that an algorithm in one time complexity class transitions to another (average family complexity improvement). In (c) and (d), ">n^3" includes time complexities that are superpolynomial but subexponential.

One algorithm family that has undergone transformative improvement is generating optimal binary search trees. Naively, this problem takes exponential time but, in 1971, Knuth [16] introduced a dynamic programming solution that uses the properties of weighted edges to bring the time complexity to cubic. Hu and Tucker [17], in the same year, improved on this performance with a quasi-linear [O(n log n)] time solution using minimum weighted path length, which remains the best asymptotic time complexity achieved for this family.³

³ There are faster algorithms for solving this problem, for example, Levcopoulos's linear solution [18] in 1989 and another linear solution by Klawe and Mumey [19], but these are approximate and do not guarantee that they will deliver the exact right answer and, thus, are not included in this analysis.
B. Measuring Algorithm Improvement

Over time, the performance of an algorithm family improves as new algorithms are discovered that solve the same problem with fewer operations. To measure progress, we focus on discoveries that improve asymptotic complexity, for example, moving from O(n^2) to O(n log n), or from O(n^2.9) to O(n^2.8).

Fig. 2(a) shows the progress over time for four different algorithm families, each shown in one color. In each case, performance is normalized to 1 for the first algorithm in that family. Whenever an algorithm is discovered with better asymptotic complexity, it is represented by a vertical step up. Inspired by Leiserson et al. [5], the height of each step is calculated using (1), representing the number of problems that the new algorithm could solve in the same amount of time as the first algorithm took to solve a single problem (in this case, for a problem of size n = 1 million).⁴ For example, Grenander's algorithm for the maximum subarray problem, which is used in genetics (and elsewhere), is an improvement of one million× over the brute-force algorithm.

⁴ For this analysis, we assume that the leading constants are not changing from one algorithm to another. We test this hypothesis later in this article.

To provide a reference point for the magnitude of these rates, the figure also shows the SPECInt benchmark progress time series compiled in [20], which encapsulates the effects that Moore's law, Dennard scaling, and other hardware improvements had on chip performance. Throughout this article, we use this measure as the hardware progress baseline. Fig. 2(a) shows that, for problem sizes of n = 1 million, some algorithms, such as maximum subarray, have improved much more rapidly than hardware/Moore's law, while others, such as self-balancing tree creation, have not. The orders of magnitude of variation shown in just these four of our 113 families make it clear why overall algorithm improvement estimates based on small numbers of case studies are unlikely to be representative of the field as a whole.

An important contrast between algorithm and hardware improvement comes in the predictability of improvements. While Moore's law led to hardware improvements happening smoothly over time, Fig. 2 shows that algorithms experience large but infrequent improvements (as discussed in more detail in [5]).

The asymptotic performance of an algorithm is a function of input size for the problem. As the input grows, so does the scale of improvement from moving from one complexity class to the next. For example, for a problem with n = 4, an algorithmic change from O(n) to O(log n) only represents an improvement of 2 (= 4/2), whereas, for n = 16, it is an improvement of 4 (= 16/4). That is, algorithmic improvement is more valuable for larger data. Fig. 2(b) demonstrates this effect for the "nearest-neighbor search" family, showing that improvement size varies from 15× to ≈4 million× as the input size grows from 10^2 to 10^8.
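To illustrate how these step heights scale, the following sketch (our own; the O(n) to O(log n) change is the one named in the text, and the idealized operation-count functions are our assumption) computes the ratio of old to new operation counts at several input sizes, reproducing the 2× and 4× figures for n = 4 and n = 16 and showing how the gain keeps growing with n.

```python
import math

def improvement(ops_old, ops_new, n):
    """Ratio of old to new operation counts at input size n
    (cf. the height-of-step calculation described in the text)."""
    return ops_old(n) / ops_new(n)

# O(n) -> O(log n), as in the text's example.
linear = lambda n: n
logarithmic = lambda n: math.log2(n)

for n in (4, 16, 1_000_000):
    factor = improvement(linear, logarithmic, n)
    print(f"n = {n:>9,}: improvement factor = {factor:,.1f}")
# prints 2.0, 4.0, and roughly 50,000 for the three sizes
```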
While Fig. 2 shows the impact of algorithmic improvement for four algorithm families, Fig. 3 extends this analysis to 110 families.⁵ Instead of showing the historical plot of improvement for each family, Fig. 3 presents the average annualized improvement rate for problem sizes of one thousand, one million, and one billion and contrasts them with the average improvement rate in hardware as measured by the SPECInt benchmark [20].

⁵ Three of the 113 families are excluded from this analysis because the functional forms of their improvements are not comparable.

As these graphs show, there are two large clusters of algorithm families and then some intermediate values. The first cluster, representing just under half the families, shows little to no improvement even for large problem sizes. These algorithm families may be ones that have received little attention, ones that have already achieved the mathematically optimal implementations (and, thus, are unable to further improve), ones that remain intractable for problems of this size, or something else. In any case, these problems have experienced little algorithmic speedup, and thus, improvements, perhaps from hardware or approximate/heuristic approaches, would be the most important sources of progress for these algorithms.

The second cluster of algorithms, consisting of 14% of the families, has yearly improvement rates greater than 1000% per year. These are algorithms that benefited from an exponential speed-up, for example, when the initial algorithm had exponential time complexity, but later improvements made the problem solvable in polynomial time.⁶ As this high improvement rate makes clear, early implementations of these algorithms would have been impossibly slow for even moderate-size problems, but the algorithmic improvement has made larger data feasible. For these families, algorithm improvement has far outstripped improvements in computer hardware.

⁶ One example of this is the matrix chain multiplication algorithm family.

Fig. 3 also shows how large an effect problem size has on the improvement rate, calculated using (2). In particular, for n = 1 thousand, only 18% of families had improvement rates faster than hardware, whereas 82% had slower rates. However, for n = 1 million and n = 1 billion, 30% and 43% improved faster than hardware. Correspondingly, the median algorithm family improved 6% per year for n = 1 thousand but 15% per year for n = 1 million and 28% per year for n = 1 billion. At a problem size of n = 1.06 trillion, the median algorithm improved faster than hardware performance.
Fig. 2. Relative performance improvement for algorithm families, as calculated using changes in asymptotic time complexity. The comparison line is the SPECInt benchmark performance [20]. (a) Historical improvements for four algorithm families compared with the first algorithm in that family (n = 1 million). (b) Sensitivity of algorithm improvement measures to input size (n) for the "nearest-neighbor search" algorithm family. To ease comparison of improvement rates over time, in (b) we align the starting periods for the algorithm family and the hardware benchmark.

Our results quantify two important lessons about how algorithm improvement affects computer science. First, when an algorithm family transitions from exponential to polynomial complexity, it transforms the tractability of that problem in a way that no amount of hardware improvement can. Second, as problems increase to billions or trillions of data points, algorithmic improvement becomes substantially more important than hardware improvement/Moore's law in terms of average yearly improvement rate. These findings suggest that algorithmic improvement has been particularly important in areas such as data analytics and machine learning, which have large datasets.
Fig. 3. Distribution of average yearly improvement rates for 110 algorithm families, as calculated based on asymptotic time complexity, for problems of size: (a) n = 1 thousand, (b) n = 1 million, and (c) n = 1 billion. The hardware improvement line shows the average yearly growth rate in SPECInt benchmark performance from 1978 to 2017, as assembled by Hennessy and Patterson [20].
Fig. 4. Evaluation of the importance of leading constants in algorithm performance improvement. Two measures of the performance improvement for algorithm families (first versus last algorithm in each family) for n = 1 million. Algorithmic steps include leading constants in the analysis, whereas asymptotic performance drops them.

C. Algorithmic Step Analysis

Throughout this article, we have approximated the number of steps that an algorithm needs to perform by looking at its asymptotic complexity, which drops any leading constants or smaller-order terms, for example, simplifying 0.5(n^2 + n) to n^2. For any reasonable problem size, simplifying to the highest order term is likely to be a good approximation. However, dropping the leading constant may be worrisome if complexity class improvements come with inflation in the size of the leading constant. One particularly important example of this is the 1990 Coppersmith–Winograd algorithm and its successors, which, to the best of our knowledge, have no actual implementations because "the huge constants involved in the complexity of fast matrix multiplication usually make these algorithms impractical" [21]. If inflation of leading constants is typical, it would mean that our results overestimate the scale of algorithm improvement. On the other hand, if leading constants neither increase nor decrease on average, then it is safe to analyze algorithms without them since they will, on average, cancel out when ratios of algorithms are taken.

To estimate the fidelity of our asymptotic complexity approximation, we reanalyze algorithmic improvement including the leading constants (and call this latter construct the "algorithmic steps" of that algorithm). Since only 11% of the papers in our database directly report the number of algorithmic steps that their algorithms require, whenever possible we manually reconstruct the number of steps based on the pseudocode descriptions in the original papers. For example, Counting Sort [14] has an asymptotic time complexity of O(n), but the pseudocode has four linear for-loops, yielding 4n algorithmic steps in total. Using this method, we are able to reconstruct the number of algorithmic steps needed for the first and last algorithms in 65% of our algorithm families. Fig. 4 shows the comparison between algorithmic step improvement and asymptotic complexity improvement. In each case, we show the net effect across improvements in the family by taking the ratio of the performances of the first and final (kth) algorithms in the family, (steps_1)/(steps_k).

Fig. 4 shows that, for the cases where the data are available, the size of the improvement in the number of algorithmic steps and in asymptotic performance is nearly identical.⁷ Thus, for the majority of algorithms, there is virtually no systematic inflation of leading constants. We cannot assume that this necessarily extrapolates to unmeasured algorithms since higher complexity may lead to both higher leading constants and a lower likelihood of quantifying them (e.g., matrix multiplication). However, this analysis reveals that these are the exception, rather than the rule. Thus, asymptotic complexity is an excellent approximation for understanding how most algorithms improve.

⁷ Cases where algorithms transitioned from exponential to polynomial time (7%) cannot be shown on this graph because the change is too large. However, an analysis of the change in their leading constants shows that, on average, there is a 28% leading constant deflation. That said, this effect is so small compared to the asymptotic gains that it has no effect on our other estimates.
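As a concrete illustration of this kind of step counting, here is a minimal Counting Sort in Python (our own rendering with an explicit counter, not the pseudocode the article reconstructs from [14]): each of its four passes is linear, so the tally comes out at roughly 4n steps even though the asymptotic complexity is simply O(n).

```python
def counting_sort(values, max_value):
    """Stable counting sort for integers in [0, max_value], with a step counter."""
    steps = 0
    counts = [0] * (max_value + 1)

    for v in values:                      # loop 1: histogram the keys
        counts[v] += 1
        steps += 1

    for i in range(1, max_value + 1):     # loop 2: prefix sums give final positions
        counts[i] += counts[i - 1]
        steps += 1

    output = [0] * len(values)
    for v in reversed(values):            # loop 3: place each item stably
        counts[v] -= 1
        output[counts[v]] = v
        steps += 1

    for i, v in enumerate(output):        # loop 4: copy the result back
        values[i] = v
        steps += 1

    return values, steps


data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
result, steps = counting_sort(data, max_value=9)
print(result)                              # [1, 1, 2, 3, 3, 4, 5, 5, 6, 9]
print(steps, "steps for n =", len(data))   # about 4n when max_value is on the order of n
```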
II. CONCLUSION

Our results provide the first systematic review of progress in algorithms, one of the pillars underpinning computing in science and society more broadly.
We find enormous heterogeneity in algorithmic progress, with nearly half of algorithm families experiencing virtually no progress, while 14% experienced improvements orders of magnitude larger than hardware improvement (including Moore's law). Overall, we find that algorithmic progress for the median algorithm family increased substantially, but by less than Moore's law for moderate-sized problems and by more than Moore's law for big data problems. Collectively, our results highlight algorithms as an important, and previously largely undocumented, source of computing improvement.

III. METHODS

A full list of the algorithm textbooks, course syllabi, and reference papers used in our analysis can be found in the Extended Data and Supplementary Material, as can a list of the algorithms in each family.

A. Algorithms and Algorithm Families

To generate a list of algorithms and their groupings into algorithm families, we use course syllabi, textbooks, and research papers. We gather a list of major subdomains of computer science by analyzing the coursework from 20 top computer science university programs, as measured by the QS World Rankings in 2018 [22]. We then shortlist those with maximum overlap amongst the syllabi, yielding the following 11 algorithm subdomains: combinatorics, statistics/machine learning, cryptography, numerical analysis, databases, operating systems, computer networks, robotics, signal processing, computer graphics/image processing, and bioinformatics.

From each of these subdomains, we analyze algorithm textbooks: one textbook from each subdomain for each decade since the 1960s. This totals 57 textbooks because not all fields have textbooks in early decades, e.g., bioinformatics. Textbooks were chosen based on being cited frequently in algorithm research papers, on Wikipedia pages, or in other textbooks. For textbooks from recent years, where such citation breadcrumbs are too scarce, we also use reviews on Amazon, Google, and other sources to identify the most-used textbooks.

From each of the 57 textbooks, we used those authors' categorization of problems into chapters, subheadings, and book index divisions to determine which algorithm families were important to the field (e.g., "comparison sorting") and which algorithms corresponded to each family (e.g., "Quicksort" in comparison sorting). We also searched academic journals, online course material, Wikipedia, and published theses to find other algorithm improvements for the families identified by the textbooks. To distinguish between serious algorithm problems and pedagogical exercises intended to help students think algorithmically (e.g., the Tower of Hanoi problem), we limit our analysis to families where at least one research paper was written that directly addresses the family's problem statement.

In our analysis, we focus on exact algorithms with exact solutions. That is, cases where a problem statement can be met exactly (e.g., find the shortest path between two nodes on a graph), and there is a guarantee that an optimal solution will be found (e.g., that the shortest path has been identified). This "exact algorithm, exact solution" criterion also excludes, amongst others, algorithms whose solutions are imprecise even in theory (e.g., detect parts of an image that might be edges) and algorithms with precise definitions but where proposed answers are approximate. We also exclude quantum algorithms from our analysis since such hardware is not yet available.

We assess that an algorithm has improved if the work that needs to be done to complete it is reduced asymptotically. This means, for example, that a parallel implementation of an algorithm that spreads the same amount of work across multiple processors or allows it to run on a GPU would not count toward our definition. Similarly, an algorithm that reduced the amount of memory required, without changing the total amount of work, would not be included in this analysis. Finally, we focus on worst-case time complexity because it does not require assumptions about the distribution of inputs and it is the most widely reported outcome in algorithm improvement papers.

B. Historical Improvements

We calculate historical improvement rates by examining the initial algorithm in each family and all subsequent algorithms that improve time complexity. For example, as discussed in [23], the "Maximum Subarray in 1D" problem was first proposed in 1977 by Ulf Grenander with a brute-force solution of O(n^3) but was improved twice in the next two years: first to O(n^2) by Grenander and then to O(n log n) by Shamos using a divide-and-conquer strategy. In 1982, Kadane came up with an O(n) algorithm, and later that year, Gries [24] devised another linear algorithm using Dijkstra's standard strategy. There were no further complexity improvements after 1982. Of all these algorithms, only Gries' is excluded from our improvement list since it did not have a better time complexity than Kadane's.
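To make this progression concrete, here is a minimal sketch (our own code, not the authors') contrasting an O(n^3) brute-force maximum-subarray scan of the kind Grenander started from with Kadane's O(n) algorithm; both assume a nonempty list and return the maximum sum of any contiguous subarray.

```python
def max_subarray_bruteforce(xs):
    """O(n^3): try every (start, end) pair and re-sum the elements each time."""
    best = xs[0]
    n = len(xs)
    for i in range(n):
        for j in range(i, n):
            best = max(best, sum(xs[i:j + 1]))   # the inner sum is the third factor of n
    return best


def max_subarray_kadane(xs):
    """O(n): keep the best sum of a subarray ending at the current position."""
    best = current = xs[0]
    for x in xs[1:]:
        current = max(x, current + x)   # either extend the current run or restart at x
        best = max(best, current)
    return best


data = [13, -3, -25, 20, -3, -16, -23, 18, 20, -7, 12, -5, -22, 15, -4, 7]
assert max_subarray_bruteforce(data) == max_subarray_kadane(data) == 43
```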
When computing the number of operations needed asymptotically, we drop leading constants and smaller-order terms. Hence, an algorithm with time complexity 0.5(n^2 + n) is approximated as n^2. As we show in Fig. 4, this is an excellent approximation to the improvement in the actual number of algorithmic steps for the vast majority of algorithms.

To be consistent in our notation, we convert the time complexity of algorithms on matrices, which is typically parameterized in terms of the dimensions of the matrix, to a function of the input size. For example, this means that the standard form for writing the time complexity of naive matrix multiplication goes from N^3, where N × N is the dimension of the matrix, to n^1.5, where n is the size of the input and n = N × N.

In the presentation of our results in Fig. 1(d), we "round up," meaning that results between complexity classes round up to the next highest category. For example, an algorithm that scales as n^2.7 lies between n^2 and n^3 and so gets rounded up to n^3.⁸ Algorithms with subexponential but superpolynomial time complexity are included in the >n^3 category.

⁸ For ambiguous cases, we round based on the values for an input size of n = 1,000,000.

C. Calculating Improvement Rates and Transition Values

In general, we calculate the improvement from one algorithm, i, to another, j, as

    Improvement_{i→j} = Operations_i(n) / Operations_j(n)                        (1)

where n is the problem size and the number of operations is calculated using either the asymptotic complexity or the algorithmic step technique. One challenge of this calculation is that some improvements would be mathematically impressive but not realizable. For example, an improvement from 2^(2n) to 2^n, when n = 1 billion, is an improvement ratio of 2^(1,000,000,000). However, the improved algorithm, even after such an astronomical improvement, remains completely beyond the ability of any computer to actually calculate. In such cases, we deem the "effective" performance improvement to be zero since the problem went from "too large to do" to "too large to do." In practice, this means that we deem all algorithm families that transition from one factorial/exponential implementation to another to have no effective improvement for Figs. 3 and 4.

We calculate the average per-year percentage improvement rate over t years as

    YearlyImprovement_{i→j} = (Operations_i(n) / Operations_j(n))^(1/t) − 1      (2)

We only consider years since 1940 to be those where an algorithm was eligible for improvement. This avoids biasing very early algorithms, e.g., those discovered in Greek times, toward an average improvement rate of zero.
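A minimal sketch of these two calculations in Python (our own; the complexity functions and the five-year span, read off the 1977-1982 maximum-subarray history in Section III-B, are illustrative assumptions rather than the article's computed values):

```python
def improvement(ops_old, ops_new, n):
    # Equation (1): how many more same-sized problems the newer algorithm
    # can solve with the operations the older one needed for a single problem.
    return ops_old(n) / ops_new(n)

def yearly_improvement(ops_old, ops_new, n, years):
    # Equation (2): annualized improvement rate over `years` eligible years.
    return improvement(ops_old, ops_new, n) ** (1.0 / years) - 1.0

# Maximum subarray: O(n^3) brute force (1977) -> O(n) Kadane (1982).
brute_force = lambda n: n ** 3
kadane = lambda n: n

n = 1_000_000
print(f"total improvement at n = 1 million: {improvement(brute_force, kadane, n):.1e}x")
print(f"average yearly rate over 5 years:   {yearly_improvement(brute_force, kadane, n, 5):.0%}")
# about 1e12x in total, i.e., roughly 25,000% per year when annualized over 5 years
```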
Both of these measures are intensive, rather than extensive, in that they measure how many more problems (of the same size) could be solved with a given number of operations. Another potential measure would be to look at the increase in the problem size that could be achieved, i.e., (n_new / n_old), but this requires assumptions about the computing power being used, which would introduce significant extra complexity into our analysis without compensatory benefits.

Algorithms With Multiple Parameters: While many algorithms have only a single input parameter, others have multiple. For example, graph algorithms can depend on the number of vertices, V, and the number of edges, E, and, thus, have input size V + E. For these algorithms, increasing the input size could have various effects on the number of operations needed, depending on how much of the increase is assigned to each variable. To avoid this ambiguity, we look to research papers that have analyzed problems of this type as an indication of the ratios of these parameters that are of interest to the community. For example, if an average paper considers graphs with E = 5V, then we assume this ratio for both our base case and any scaling of input sizes. In general, we source such ratios as the geometric mean of at least three studies. In a few cases, such as Convex Hull, we have to make assumptions to calculate the improvement because newer algorithms scale differently owing to output sensitivity and, thus, cannot be evaluated from the input parameters alone. In three instances, the functional forms of the early and later algorithms are so incomparable that we do not attempt to calculate rates of improvement and instead omit them.

D. Transition Probabilities

The transition probability from one complexity class to another is calculated by counting the number of transitions that did occur and dividing by the number of transitions that could have occurred. Specifically, the probability of an algorithm transitioning from class a to class b is given as

    prob(a→b) = (1/T) Σ_{t∈T} ||a→b||_t / (||a||_{t−1} + Σ_{c∈C} ||c→a||_t)      (3)

where t is a year from the set of possible years T and c is a time complexity class from C, which includes the null set (i.e., a new algorithm family).

For example, say we are interested in the number of transitions from cubic to quadratic, i.e., a = n^3 and b = n^2. For each year, we calculate the fraction of transitions that did occur, which is just the number of algorithm families that did improve from n^3 to n^2 (i.e., ||a→b||_t) divided by all the transitions that could have occurred. The latter includes three terms: all the algorithm families that were in n^3 in the previous year (i.e., ||a||_{t−1}), all the algorithm families that moved into n^3 from another class c in that year (i.e., Σ_{c∈C} ||c→a||_t), and any new algorithm families that are created and begin in n^3 (i.e., ||Ø→a||_t). These latter two are included because a family can make multiple transitions within a single year, and thus, "newly arrived" algorithms are also eligible for asymptotic improvement in that year. Averaging across all the years in the sample provides the average transition probability from n^3 to n^2, which is 1.55%.
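The sketch below (our own, with toy numbers rather than the article's dataset) shows the bookkeeping behind (3) for a single pair of classes: for each year, count the families that moved from a to b and divide by everything that was already in, moved into, or was newly created in class a that year, then average over years.

```python
def transition_probability(a, b, families_in_a_prev, transitions, years):
    """Average yearly probability of moving from class `a` to class `b`, per (3).

    families_in_a_prev[t] : number of families in class `a` at the end of year t-1
    transitions[t]        : list of (from_class, to_class) moves in year t,
                            with from_class None for newly created families
    """
    total = 0.0
    for t in years:
        moved_a_to_b = sum(1 for src, dst in transitions[t] if src == a and dst == b)
        arrived_in_a = sum(1 for src, dst in transitions[t] if dst == a)  # includes new families
        eligible = families_in_a_prev[t] + arrived_in_a
        if eligible:
            total += moved_a_to_b / eligible
    return total / len(years)


# Toy example: three years of hypothetical history for classes "n^3" and "n^2".
years = [2001, 2002, 2003]
families_in_a_prev = {2001: 10, 2002: 11, 2003: 10}
transitions = {
    2001: [(None, "n^3")],     # one new family born into n^3
    2002: [("n^3", "n^2")],    # one family improves from n^3 to n^2
    2003: [],
}
print(f"{transition_probability('n^3', 'n^2', families_in_a_prev, transitions, years):.3f}")
# (0/11 + 1/11 + 0/10) / 3 ~= 0.030, i.e., about a 3% average yearly probability
```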
E. Deriving the Number of Algorithmic Steps

In general, we use pseudocode from the original papers to derive the number of algorithmic steps needed for an algorithm when the authors have not done so. When that is not available, we also use pseudocode from textbooks or other sources. Analogously to the asymptotic complexity calculations, we drop smaller-order terms and their constants because they have a diminishing impact as the size of the problem increases.

Acknowledgment

The authors would like to acknowledge generous funding from the Tides Foundation and the MIT Initiative on the Digital Economy. They would also like to thank Charles Leiserson, the MIT Supertech Group, and Julian Shun for valuable input.

Author Contributions Statement

Neil C. Thompson conceived the project and directed the data gathering and analysis. Yash Sherry gathered the algorithm data and performed the data analysis. Both authors wrote this article.

Data Availability and Code Availability

Data are being made public through the online resource for the algorithms community at algorithm-wiki.org, which will launch at the time of this article's publication. Code will be available at https://github.com/canuckneil/IEEE_algorithm_paper.

Additional Information

Requests for materials should be addressed to Neil C. Thompson (neil_t@mit.edu).

REFERENCES
[1] Division of Computing and Communication Foundations (CCF): Algorithmic Foundations (AF), Nat. Sci. Found., Alexandria, VA, USA. [Online]. Available: https://www.nsf.gov/funding/pgm_summ.jsp?pims_id=503299
[2] Division of Physics Computational and Data-Enabled Science and Engineering (CDS&E), Nat. Sci. Found., Alexandria, VA, USA. [Online]. Available: https://www.nsf.gov/funding/pgm_summ.jsp?pims_id=504813
[3] J. P. Holdren, E. Lander, and H. Varmus, "President's council of advisors on science and technology," Dec. 2010. [Online]. Available: https://obamawhitehouse.archives.gov/sites/default/files/microsites/ostp/pcast-nitrd-report-2010.pdf
[4] R. Bixby, "Solving real-world linear programs: A decade and more of progress," Oper. Res., vol. 50, no. 1, pp. 3–15, 2002, doi: 10.1287/opre.50.1.3.17780.
[5] C. E. Leiserson et al., "There's plenty of room at the top: What will drive computer performance after Moore's law?" Science, vol. 368, no. 6495, Jun. 2020, Art. no. eaam9744.
[6] K. Grace, "Algorithm progress in six domains," Mach. Intell. Res. Inst., Berkeley, CA, USA, Tech. Rep., 2013.
[7] D. E. Womble, "Is there a Moore's law for algorithms," Sandia Nat. Laboratories, Albuquerque, NM, USA, Tech. Rep., 2004. [Online]. Available: https://www.lanl.gov/conferences/salishan/salishan2004/womble.pdf
[8] N. Thompson, S. Ge, and G. Filipe, "The importance of (exponentially more) computing," Tech. Rep., 2020.
[9] D. Hernandez and T. Brown, "A.I. and efficiency," OpenAI, San Francisco, CA, USA, Tech. Rep. arXiv:2005.04305, 2020.
[10] N. Thompson, K. Greenewald, and K. Lee, "The computation limits of deep learning," 2020, arXiv:2007.05558. [Online]. Available: https://arxiv.org/abs/2007.05558
[11] M. Manohara, A. Moorthy, J. D. Cock, and A. Aaron, Netflix Optimized Encodes. Los Gatos, CA, USA: Netflix Tech Blog, 2018. [Online]. Available: https://netflixtechblog.com/optimized-shot-based-encodes-now-streaming-4b9464204830
[12] S. Ismail, Why Algorithms Are the Future of Business Success. Austin, TX, USA: Growth Inst. [Online]. Available: https://blog.growthinstitute.com/exo/algorithms
[13] S. S. Kortum, "Research, patenting, and technological change," Econometrica, vol. 65, no. 6, p. 1389, Nov. 1997.
[14] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, 3rd ed. Cambridge, MA, USA: MIT Press, 2009.
[15] J. Bentley, Programming Pearls, 2nd ed. New York, NY, USA: Association for Computing Machinery, 2006.
[16] D. E. Knuth, "Optimum binary search trees," Acta Inform., vol. 1, no. 1, pp. 14–25, 1971.
[17] T. C. Hu and A. C. Tucker, "Optimal computer search trees and variable-length alphabetical codes," SIAM J. Appl. Math., vol. 21, no. 4, pp. 514–532, 1971.
[18] C. Levcopoulos, A. Lingas, and J.-R. Sack, "Heuristics for optimum binary search trees and minimum weight triangulation problems," Theor. Comput. Sci., vol. 66, no. 2, pp. 181–203, Aug. 1989.
[19] M. Klawe and B. Mumey, "Upper and lower bounds on constructing alphabetic binary trees," SIAM J. Discrete Math., vol. 8, no. 4, pp. 638–651, Nov. 1995.
[20] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, 5th ed. San Mateo, CA, USA: Morgan Kaufmann, 2019.
[21] F. Le Gall, "Faster algorithms for rectangular matrix multiplication," Commun. ACM, vol. 62, no. 2, pp. 48–60, 2012, doi: 10.1145/3282307.
[22] TopUniversities QS World Rankings, Quacquarelli Symonds Ltd., London, U.K., 2018.
[23] J. Bentley, "Programming pearls: Algorithm design techniques," Commun. ACM, vol. 27, no. 9, pp. 865–873, 1984, doi: 10.1145/358234.381162.
[24] D. Gries, "A note on a standard strategy for developing loop invariants and loops," Sci. Comput. Program., vol. 2, no. 3, pp. 207–214, 1982, doi: 10.1016/0167-6423(83)90015-1.