The Data-Oriented Design Process for Game Development

Authors Jessica D. Bayliss

License CC-BY-4.0


                  Jessica D. Bayliss, Rochester Institute of Technology & Unity Technologies

                  Data-oriented design is a growing software development
                  process for games that has not been well studied in academia.
                  It seeks to subtract complicated design methods from
                  problem solving and leverage the simplicity of what computer
                  architecture is designed to do: input, transform, and output data.

Data-oriented design (DOD) grew when game developers needed to use modern hardware architectures for performant games, and existing software processes did not meet their needs. The DOD process reduces software to a basic goal of computer architecture: to input, transform, and output data.

To properly explain DOD, we first define DOD and compare it to similar processes. A history of how DOD evolved is introduced, and core concepts in the DOD process are further discussed through several relevant examples. The Unity Technologies Data-Oriented Tech Stack (DOTS) is brought up as a canonical use of DOD, and the conclusion mentions the use of DOD outside of game development as well as its future.

When discussing software processes, it is important to consider that we are likely biased when solving problems using a computer. Current research suggests that we overlook subtractive changes in problem-solving in comparison with additive changes [1]. For example, when given a Lego block bridge that has a one-block difference between the left and right sides, most people under a time constraint will choose to add a block to one side rather than remove a block from the other side. It makes sense that this bias would also impact how we make decisions in developing software. “Feature creep” is a known potential issue, and most software is developed within time constraints. This bias can lead to bloated and slow software, incompatible with soft real-time systems, such as games, which are required to consistently
The emphasis on data as a design driver allows DOD to reduce unnecessary complexity and emphasizes that transforming data well means that one must understand characteristics of the data as well as the whole supply chain of development (for example, hardware and compilers) that implements data transformations. Historically, viewing data as core to the software development process is not a new concept. For example, data flow programming was conceived in the 1960s and concentrates on the flow of data through software algorithms, primarily for parallel computation [2], [3]. The concept of data flow is related to DOD, but, in data flow, the emphasis is not truly on knowledge of data but on the flow of data from one algorithm to another.

DOD is an imperative design process due to its emphasis on program state changes; however, it is different from similar-sounding design processes, such as data-driven design, which is also related to data flow but allows the input data to control the state of the program, sometimes even at a computer architecture level [4]. In software design, it is commonly used in games to increase flexibility. As an example, the data for a game level may contain information about special effects and door state changes that are read in and executed by a data-driven game program.

DOD does not emphasize data control or data flow in a program, only that the program be defined in terms of data input, transformations, and output. The core that ties all of the patterns in DOD together is that programs only input, transform, and output data. All elements and patterns involved with DOD may be understood through this focus.

DOD asks detailed questions about the data and uses answers to design software. Examples include asking about

   ›› type
   ›› distribution
   ›› count
   ›› storage
   ›› accuracy.

In modeling, knowledge of the data can change the way programs are created. As an example, it becomes possible to consider which program system to create next from how often that system is run and how much data it transforms. Within a game context, if a character spends most of his or her time walking around the game world, one could foresee that walking is very important and should be a high priority in development.

Deep knowledge of data and the problem being solved leads to concrete solutions but does not preclude flexibility in software. Flexibility is part of the design process and should be planned out and carefully thought through rather than just “thrown in” as part of a “generic” solution. Solving for a problem that needs a flexible solution is a different concrete problem than solving for nonflexible cases.

In the search to understand data transformations (especially when they work poorly), the whole supply chain for software development, including both hardware and tools, is considered. As an example, hardware is considered primarily because it provides the physical means to transform data. A DOD proponent does not seek to know all hardware specifics but to specifically understand how the details of the hardware impact the constraints that the software needs to meet. For this reason, the most common hardware considerations in DOD are cache performance and multicore processing. Both of these hardware elements greatly impact the performance of the input, transformation, and output of data.

The core of DOD is not about optimization or making fast programs through hardware consideration; it is about organizing programs around a deep knowledge of data and its transformation. A slow program can be based around modeling data and ignoring the supply chain view that DOD proponents use. If performance is not important for the application, then the knowledge of data can still be used to create software. However, it cannot be overstated that engineering software well requires understanding transformations. Hardware knowledge is required primarily because models of software and compilers are abstractions, and those abstractions are leaky and inconsistent [5].

THE HISTORY OF DOD
The movement toward data orientation in game development occurred
after the PlayStation (PS) 3 game console was released in the mid-2000s. The game console used the Cell hardware architecture, which contains a PowerPC core and eight synergistic processing elements, of which game developers traditionally used six. This forced game developers to make the move from a single-threaded view of software development to a more parallel way of thinking about game execution to push the boundaries of performance for games on the platform. At the same time, large-scale (AAA) game development was growing in complexity, with an emphasis on more content and realistic graphics.

Within that environment, data parallelism and throughput were very important. As practices coalesced around techniques used in game development, the term DOD was created and first mentioned in an article in Game Developer magazine in 2009 [6].

In 2017, Unity Technologies, known best for the Unity Real-Time Development Platform used by many game developers, hired DOD proponents Mike Acton and Andreas Fredriksson from Insomniac Games to “democratize data-oriented programming” and to realize a philosophical vision with the tagline of “performance by default” [7]. The result has been the introduction of DOTS, which is a canonical use of DOD techniques.

To date, many blogs and talks have discussed DOD since the original article, but very little has been studied in academia regarding the process. Richard Fabian, a practitioner from industry, published a book on DOD in 2018, although it existed for several years in draft form online [8].

In 2019, a master’s thesis was published by Per-Morten Straume at the Norwegian University of Science and Technology that investigated DOD [9]. Straume interviewed several game industry proponents for DOD in the thesis and concluded that, while they differed in their characterizations of DOD, the core characteristics of DOD were to focus on solving specific problems rather than generic ones, the consideration of all kinds of data, making decisions based on data, and an emphasis on performance in a wide sense.

Both Fabian and Straume discuss DOD elements without fully tying those elements together to form a design practice that can be used to create software. The overarching theme that ties all of the elements together is that software exists to input, transform, and output data.

DOD

The light switch problem
The light switch problem addresses the action of turning on a light and exemplifies how to think of data representation. In DOD, questions about both the problem and the data used for the problem need to be asked before attempting a solution. For example:

   ›› Is the light meant to be variable in intensity or just on and off?
   ›› What hardware is available to help with turning the light on/off?
   ›› What data are necessary for turning the light on/off?

If the light is meant to be a simple light that can be turned on and off, then it really only requires a single bit of data to represent on (one) and off (zero). An abstract solution can easily obscure necessary data for the problem solution.

Figure 1 contains an image of a smart-home device that can control multiple lights through verbal commands, but, for a user who just wants to turn a light on/off, it is not readily apparent how to control a single light unless one has already memorized the name of the light and the process for controlling it. It also does not work when the house has Internet problems. This represents an overcomplicated solution for the initial problem that requires extra data for what is really a single-bit operation.

FIGURE 1. A smart-home device that is capable of turning multiple lights on/off as well as controlling blinds through verbal commands.

This is similar to what happens when software is constructed that tries to abstract away from the core problem it tries to solve. A complicated list of many other problems may be solved, but the initial reason for the software to exist may no longer be apparent or easy to use. Another example of this in physical hardware would be the PS4 game console power button. A lot of work was obviously put into making sure that the
game controller can turn the PS4 on/off, but people not using the game controller may have to look up an image to see where the power on/off switch is located, as it is hidden under a decorative panel on the front of the PS4.

Given the complexity of most software problems that need much more than a single bit of data, it is important not to allow extra complexity into the solution, as that extra complexity commonly creates more complicated solutions that need to be fully tested/debugged. Additional levels of complexity also equal additional requirements for testing and validation.

One bit that represents the on/off behavior of the light is necessary to turn on the light. Subtracting all of the extra data available yields an interface to the light bit that could look something like Figure 2(a). This particular solution is very similar to the solutions that DOD proponents seek in that it represents the data and the transformation of those data well.

There are some extra considerations involved with that solution outside of the initial questions asked, though. For instance, the wall plate acts as a safety measure and covers the wires on the back side of the light switch so that people cannot accidentally touch them. It is normal for extra considerations to come up in the design and implementation of a solution, but each consideration should be carefully thought about before being added to the solution.

In Figure 2(a), the solution assumes the user knows that up is the on position and down is the off position, as the state change is not labeled. While this is true in some countries, in others, the opposite is true.

This hinders the usability of the switch, as the light switch on the left does not contain all necessary data for a user to know how to turn it on and off. A final solution to the light problem may look something like Figure 2(b), where ON is displayed, and the user is given all of the information necessary to turn on the light.

FIGURE 2. A switch (a) that minimizes the data necessary for turning on a light and (b) with slightly more data that allows a user to reason about which direction of the light switch is the on position.

The Autofarmers simulation problem
The Autofarmers simulation is shown in Figure 3 and consists of farmers breaking rocks, tilling soil, planting crops, and selling those crops to a store for money. The full simulation is too complex for presentation, but it is useful to show how to model software from a data perspective.

FIGURE 3. The Autofarmers simulation, where robotic farmers break rocks, till soil, plant procedurally generated crops, and take those crops to the market.

How does one begin to view this in a DOD way, and how does using DOD alter the software development of the program? Ignoring the 3D models in the program (which each have their own data), one potential set of main simulation data includes

   ›› position (x, y)
   ›› speed
   ›› direction
   ›› target
   ›› state
   ›› scale.

These are the data necessary to perform the simulation part of the program. Rather than considering each “thing” in the game as its own object with separate activities, writing out data information allows operations across the game to be batched where possible and functionality to be shared by multiple sets of data.

As an example, everything in the game has a position and can be placed at the same time. Only plants and farmers move. (Plants are carried to the store by farmers.) This allows for the same movement functionality to be used on farmers and plants. DOD looks at common data as well as operations and allows for the batching of those data with those operations.

In viewing the simulation in a DOD manner, planning would also look at which transformations are made the most often. As an example, farmers in the simulation spend most of their time walking to different places. Hence, a DOD-based solution would seek to understand the exact data necessary
for the input to movement (for example, the position, speed, direction, and target) and construct the movement transformation early in writing the software for the simulation.

This would allow for experiments to be made that measure pathing/movement and potential solutions for any problems found. Using a poor solution for pathing/movement means that the simulation may run poorly due to the amount of time spent moving in the simulation, and this is a priority problem for software development due to the frequency of the data transformation in the program.

Using minimal data for a concrete solution. Most data used in the implementation of Autofarmers are 2D since the simulation is 2D, and the height is a constant. Developing a 2D simulation is a different problem than creating a 3D one, in terms of both data and algorithms. This is not a case where three dimensions should be solved for “just in case” since the algorithms and data are different for each one. On average, 3D computations are more expensive than 2D ones, as well.

DOD SOFTWARE DESIGN
A key element of design is data, so one designs data before code and views the data early and often. The hardware and compiler information are used along with any other tools that impact data transformations to understand the transformation and how to best make that transformation.

An image transformation problem
Say that the problem is to transform image data from color to gray scale. For a DOD solution, one must first start with the data, which should be designed before the code. For this problem, the input data are in bytes that range from 0 to 255. Each individual byte is in a quadruplet that represents the red, green, blue, and alpha values in the color image. The image being transformed will be in an array of 1,024 × 1,024, meaning that there are 1,048,576 pixels (each with a red–green–blue–alpha quadruplet) to be transformed.

One possible transformation from color to gray scale mathematically is the byte average of the red, green, and blue color information. The average for each pixel’s color information will replace each pixel’s red, green, and blue information, while the value of alpha will remain the same. The output is an array of size 1,024 × 1,024.

The minimal design for this program consists of the input data (an array with all of the data elements), the gray scale transformation, and the output of the 1,024 × 1,024 transformed data. Here is a potential partial code transformation for the problem that will yield a solution:

    // Assumes width is the row length in bytes (4 bytes per pixel: R, G, B, A).
    for (int x = 0; x < width; x += 4){
        for (int y = 0; y < height; y++){
            byte avg = (byte)((image[y, x] +
                image[y, x + 1] +
                image[y, x + 2]) / 3);
            result[y, x] = avg;
            result[y, x + 1] = avg;
            result[y, x + 2] = avg;
            result[y, x + 3] = image[y, x + 3];   // alpha is copied unchanged
        }
    }

The potential transformation takes the data, converts those data to gray scale through averaging the color information, and puts those data into an output array. Is this the best potential solution? It is unknown, as there is information about the problem and transformation that has not been stated!

DOD solutions require as much knowledge about the problem and transformations as possible. For example, is it required that the solution not overwrite the original image? If not, then two arrays are not necessary in the potential solution. What is the hardware, and what language is being used for the solution?

If the hardware is a modern PC architecture, then cache usage should be considered, along with multiprocessor capabilities. Both of these things are parts of hardware that can greatly impact the engineering of data transformations.

Organizing data for good cache usage. One of the main bottlenecks for performance on modern PCs is the bus and data transfer from storage to the processor. The concept of caching away data is fairly simple. If somebody is working late one night, and they think ahead of time that a snack might be good for an energy boost in the early hours of the morning, then they will grab the snack (assuming it doesn’t need refrigeration; don’t try this with ice cream) and put it near their desk. That way, they will not have to get up and go to the kitchen later. The food has been cached near the desk.

Computers also think ahead and precache data for the processor. That way, the processor can more quickly and easily move those data into and out of registers for processing. The time cost for an L1 cache (the closest cache to the processor) access is less than 1 ns, whereas random-access memory access off the chip is on the order of 100 ns. If the wrong data are brought into the cache because they are poorly organized in memory, then the penalty time to read the correct data is much larger than if the data were able to be prefetched for use. Additionally, the extra time cost for access to data farther
away from the processor also turns into extra power usage, as moving data costs more in terms of power than processing those data.

Structurally, a common pattern for organizing data for good cache usage is to employ Structures of Arrays (SoA) to organize arrays of homogeneous data so that they can easily be read and used in programs. A common phrase in DOD says that “where there is one, there are many,” as processing data in a batch can have multiple benefits, including good cache usage. This differs from some traditional approaches, where all data based around a single concept are organized in a class based around that concept, and instances of those classes are put into arrays and methods called on each individual instance. Figure 4 shows how the organization differs in SoA when compared to Arrays of Structures (AoS).

The image transformation problem and cache usage. Organizing data for good cache usage appears to have already been done for the potential solution in the image problem. However, in the C# language (the language chosen for the solution), data are in a row-major format, meaning that elements are laid out next to each other in rows, rather than column-major format (for example, Fortran), where columns are next to each other in memory. Taking this information into account means that, to properly lay out the image data for transformation, the two for loops in the piece of code should be swapped (image height should come first, with the inner loop on image width) for better cache utilization.

Better cache utilization is only needed for the solution because the requirement specifies that the data count is enough that cache utilization will make a difference in performance. It is possible that DOD can be used to accomplish goals other than performance, such as memory or power utilization considerations. For this application, if the image was a 16 × 16 image, then it would not matter which way the transformation was done because there are only 256 total pixels to transform, and they can fit within the cache on modern processors.

Multiprocessor usage. Multiprocessor performance has become more important as games have evolved. The DOD view of programs as data input, transformation, and output is helpful when designing for parallel processing. The need for synchronization is a large bottleneck to performance for games.

Since race conditions only happen when data can be changed or written to, knowledge of the read and write status for all data and when transformations happen in a program helps to avoid race conditions. The SoA format can be used to help batch jobs since parallelization needs a certain number of elements to process before parallel processing becomes advantageous.

The image transformation problem and multiprocessor usage. For the image transformation problem, using multiple processors likely will not help to solve the problem, as it is only a single small image that is being transformed, and the overhead for setting up parallel processing can be more than the transformation of a single image. In the case of a problem that required transforming a set of images, parallel processing could be very useful and save significant time in processing.

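The SoA pattern and the shared movement functionality described for Autofarmers can be sketched in C# as follows. This is a minimal sketch under stated assumptions, not code from the simulation: the type names Movers and MovementSystem are hypothetical, direction is assumed to be stored as a 2D unit vector, and only the position, speed, and direction data from the simulation data list are modeled.

```csharp
// Hypothetical SoA storage for everything that moves (farmers and carried plants).
// Each array is homogeneous, and index i refers to the same mover in every array.
public sealed class Movers
{
    public readonly float[] X, Y;        // position
    public readonly float[] DirX, DirY;  // direction (assumed to be a unit vector)
    public readonly float[] Speed;       // distance units per second

    public int Count => X.Length;

    public Movers(int count)
    {
        X = new float[count];
        Y = new float[count];
        DirX = new float[count];
        DirY = new float[count];
        Speed = new float[count];
    }
}

public static class MovementSystem
{
    // One transformation applied to the whole batch: reads direction and speed,
    // writes position. No per-object method call is involved.
    public static void Update(Movers m, float deltaSeconds)
    {
        for (int i = 0; i < m.Count; i++)
        {
            m.X[i] += m.DirX[i] * m.Speed[i] * deltaSeconds;
            m.Y[i] += m.DirY[i] * m.Speed[i] * deltaSeconds;
        }
    }
}
```

An AoS version would instead keep a class per farmer with its own update method called on each instance. In the SoA form, the read and write sets of the transformation are explicit (direction and speed are read, position is written), which is the kind of knowledge that helps when deciding whether the batch can be split across worker threads.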
[Figure 4 diagram: under “Structures of Arrays,” a Type1[ ] array feeds System1 and a Type2[ ] array feeds System2; under “Arrays of Structures,” each array element (A[0], A[1], A[2]) groups Type1, Type2, System1, and System2.]

FIGURE 4. Structures of Arrays (SoA), where data are laid out in homogeneous arrays fed into systems for data transformation, and Arrays of Structures (AoS), where data are generally laid out in instances of classes, and systems are called on single pieces of data.

View data early and often. Even though the first potential solution is not a good one when further information is seen regarding the problem and data, proposing the first solution allows us to reason about the data and improve on how it is input, transformed, and output. Many problems exist that do not have fully specified requirements. Examples in game development abound since it is a creative field, and game designs change frequently to “find the fun.”

Viewing the data early and often aids in approximate solutions for incomplete data knowledge. As an
example, DOD proponents commonly write small, experimental programs (or add debugging statements to existing programs) that seek to determine unknown knowledge, such as frequency or range. This knowledge can help refine a solution. In this way, progressive approximations can be made. While incomplete data can lead to solutions that must be iterated on, flexibility in an application is part of the design process and should be carefully considered separately from the experiments done for measuring data.

THE UNITY DOTS
Unity’s DOTS is a large-scale industry example of DOD and consists of an experimental set of packages in Unity that were introduced after they were announced in 2017.4 It is currently the most publicly available example of DOD in an industry product since the entity component system (ECS) source code is viewable online. The core components of DOTS are

   ›› the burst compiler
   ›› an ECS
   ›› a job system
   ›› testing and debugging support tools.

Burst compiler
The burst compiler is an excellent example of understanding and using the whole software development supply chain to create better solutions. It is an LLVM-based compiler technology that optimizes C# code for Unity’s job system. It exists to allow for better overall game performance when using DOTS.

ECS
Unity’s ECS implementation works closely with the job system. Entities are handles for a key chain of specific data components associated with the entity, components are struct data (in C#, a struct is a value type) for input/output, and systems are the data transformations necessary for solving problems.
   If some of this information reminds people of databases, it is true that DOD can be compared to how data are organized in databases, and DOD-based frameworks commonly have the concept of a program-based query. An ECS allows for queries on components and commonly uses them as filters for jobs.

Job system
The job system exists to make data parallelism in C# easier. Jobs accept blittable (simple data types, such as float and int) structures, transform the data in those structures, and output results. The job system contains several different constructs that range from job-based parallel for loops that capture outside variables with a lambda expression to structures that have their own execute function.
   Job inputs can be tagged as read only to allow for better knowledge regarding how the data are being input, transformed, and output. Jobs expect to obtain data laid out in an SoA manner, and options exist to determine various worker thread settings.

Testing and debugging support tools
One important part of viewing data early and often is to have support for adequately accessing the data. While profiler support of the job system in Unity is not one of the main selling features of DOTS, it supports DOD development efforts deeply. The profiler specifically profiles per frame and shows overall job system utilization as well as which jobs run at what time within the frame, and it shows this across worker threads.
   An entity debugger allows runtime information to be displayed. Information about both components and systems is displayed, and data can be filtered to obtain information about a specific type of component. With these tools, it is easier to create small experiments to discover information about data during the runtime.

DOD concepts have been presented within the context of the game development field. DOD does not just exist within the game development field, and there are indications that it has relevance for other fields in software development. As an example, in 2014, CppCon: The C++ Conference invited DOD proponent Mike Acton to give a keynote speech to the larger C++ community,10 and common DOD design patterns, such as using SoA rather than AoS, can aid any application that processes a lot of data programmatically.
   In terms of philosophy, some of the concepts of DOD are at odds with object-oriented design (OOD), although it depends on which definition of OOD is used, and it is highly dependent on the problem being solved. A discussion of the many OOD definitions is outside the scope of this article, but, since DOD requires that the data be considered first and foremost for software development, simulated objects representing the problem space are unlikely to be used. The philosophy of DOD does not state that objects representing the problem space cannot be used if they happen to well represent the data for input, transformations, and output.
   DOD promotes solving concrete problems as opposed to generic ones, which

