.. include:: ../roles.incl ************************* Checkpoint/Restart Design ************************* .. toctree:: ============ Requirements ============ Three code functional requirements of I/O in Cello are: 1. writing data dumps for subsequent reading by external analysis/visualization applications 2. writing checkpoint files, and 3. reading checkpoint files to restart a previously run simulation (While writing image files such as "png" files is also included in the I/O component of Cello, here we focus on HDF5 files containing actual block data.) Additionally, writing and reading disk files must be scalable to the largest simulations runnable on the largest HPC platforms available, which necessarily include the largest parallel file systems available. This scalable I/O approach has been implemented for checkpoint/restart, and will be adapted for use with data dumps in the near future. ======== Approach ======== The approach used includes determining a block ordering to aid mapping blocks to files, what data are written to the files, and how file I/O is parallelized. -------- Ordering -------- The approach involves a generalization of the previous ``MethodOutput`` method, but enables load-balancing of data between disk files through the use of block `orderings` to define how blocks are mapped to files. Currently, the ordering used in ``MethodOutput``, which is implicit and embedded in the code, is based on a regular partitioning of root-level blocks together with their descendents. The updated implementation factors out this ordering into an ``Ordering`` class, provides a Morton space-filling curve ordering, and allows enables defining other orderings, such as Hilbert curves ------------ File content ------------ The content of the data files must be augmented to include all state data required to recreate a previously saved AMR block array on restart. Some information such as block connectivity are generated as blocks are inserted into the mesh hierarchy. Other information such as method or solver parameters are not stored, but are taken from the parameter file. This allows for "tweaking" of parameters on restart, for example to adjust refinement criteria or solver convergence criteria. ------------ Control flow ------------ Control flow is handled by separate ``IoWriter`` or ``IoReader`` chare arrays, where each element is associated with a single HDF5 file. Advantages over previous approaches are better load-balancing of I/O operations, and decoupling of I/O operations from the Block chare array. For Enzo-E checkpoint/restart data in particular, ``IoEnzoReader`` and ``IoEnzoWriter`` chare arrays are used. ====== Design ====== Components of the new I/O approach include 1. Control management * ``control_restart.cpp`` - ``Main::r_restart_enter()`` - ``Main::p_restart_done()`` - ``Main::restart_exit()`` 2. New Classes * ``EnzoMethodCheck`` * ``IoEnzoReader`` - ``IoEnzoReader::IoEnzoReader()`` * ``IoEnzoWriter`` - ``IoEnzoWriter::IoEnzoWriter()`` * ``IoReader`` - ``IoReader::IoReader()`` * ``IoWriter`` - ``IoWriter::IoWriter()`` * ``MethodOrderMorton`` ------------------ Output: checkpoint ------------------ .. image:: io-output.png :width: 800 -------------- Input: restart -------------- The UML sequence diagram below shows how the ``Simulation`` group, IoReader chare array, and Block chare array interoperate to read data from a checkpoint directory. Time runs vertically starting from the top, and the three Charm++ group/arrays are arranged into three columns. Code for restart is found in the ``enzo_control_restart.cpp`` file. .. image:: io-read.png startup ------- Restart begins in the "startup" phase, with the unique root block for the (0,0,0) octree in the array-of-octrees calling the ``Simulation`` entry method ``p_restart_enter()``. The ``p_restart_enter()`` entry method reads the number of restart files from the top-level `file-list` file, initializes synchronization counters, and creates the ``IoEnzoReader`` chare array, one element for each file. The ``IoEnzoReader`` constructors calld the ``p_io_reader_created()`` entry method in the root ``Simulation`` object to notify it that they've been created. ``p_io_reader_created`` counts the number of calls, and after it has received the last ``IoEnzoReader`` notification, it distributes the ``proxy_io_enzo_reader`` array proxy to all other ``Simulation`` objects by calling ``p_set_io_reader()``. ``p_set_io_reader()`` stores the incoming proxy, then calls the ``r_restart_start()`` barrier across ``Simulation`` objects, which is used to guarantee that all proxy elements will have been initialized before any are accessed in subsequent phases. level 0 ------- In the level-0 (root-level) phase, the root ``Simulation`` object reads the file names from the `file-list` file, and calls the ``p_init_root()`` entry method in all ``IoEnzoReader`` objects, sending the checkpoint directory and file names. The ``p_init_root()`` entry method opens the `block-data` (HDF5) file and reads global attributes. It also opens and reads tho `block-list` (text) file, reading in the list of blocks and organizing them by mesh refinement level. It reads in each block data, saving data in blocks levels greater than 0, and sending data to level-0 blocks. Note level-0 blocks exist at the beginning of restart, but no blocks in levels higher than 0 do. Data are packed and sent to blocks in levels <= 0 using the ``EnzoBlock::p_restart_set_data()`` entry method. The ``EnzoBlock::p_restart_set_data()`` method unpacks the data into the Block, then notifies the associated ``IoEnzoReader`` file object that data has been received using the ``p_block_ready`` entry method. ``IoEnzoReader::p_block_ready()`` counts the number of block-reday acknowledgements, and after the last one calls ``Simulation::p_restart_next_level()`` to process the next refinement level blocks. level k ------- The level-k phase for k=1 to L is more complicated than level-0 because the level k > 0 blocks must be created first. Assuming blocks up through level k-1 have been created, the root ``Simulation`` object calls ``IoEnzoReader::p_create_level(k)`` for each ``IoEnzoReader``. In ``p_create_level()``, synchronization counters are initialized for counting the k-level blocks, and then each block in the list of level-k blocks is processed. To reuse code from the adapt phase, level-k blocks are created by refining the `parent` block, via a ``p_restart_refine()`` entry method. In ``p_restart_refine()``, the parent level k-1 block creates a new child block, inserts the new block in its own child list, and recategorizes as a non-leaf. In the ``EnzoBlock`` constructor, the newly created block checks if it's in a restart phase, and if so sends an acknowledgement to the associated ``IoEnzoReader`` object using the ``p_block_created()`` entry method. In ``p_block_created`` the ``IoEnzoReader`` object counts the number of acknowledgements from newly-created level-k blocks, and after it receives the last one it calls ``p_restart_level_created()`` on the root-level ``Simulation`` object. After this, the rest of the level-k phase mirrors that of the level-0 phase. cleanup ------- In the cleanup section, after all blocks up to the maximum level have been created and initialized, the ``p_restart_next_level()`` entry method calls the Charm++ call ``doneInserting()`` on the block chare array, then calls ``p_restart_done()`` on all the blocks, which completes the restart phase. ------- Classes ------- EnzoMethodInput =========== Data format =========== Data for a given checkpoint dump are stored in a single checkpoint directory, specified in the user's parameter file using the ``Method:check:dir`` parameter. The number of data files in the directory is specified using the ``Method:check:num_files`` parameter. A rule-of-thumb is to use the same number of files as (physical) nodes in the simulation. Data files are named ``block_data-`` `x` ``.h5``, where 0 <= x < ``num_files``. The format of data files is given in the next section. Each data file has an associated `block-list` text file named ``block_data-`` `x` ``.block_list``. The block-list file contains a list of all block names in the associated data file, together with each block's mesh refinement level. There is one block listed per line, and the block name and level are separated by a space. A ``check.file_list`` text file is also included, which includes the number of data files, and a list of the file prefixes ``block_data-`` `x`. Note all blocks are included in the files, not just leaf-blocks, and including blocks in "negative" refinement levels. ------------------ Data file contents ------------------ The HDF5 data files are used to store all block state data, as well as some global data. Simulation attributes --------------------- Metadata for the simulation are stored in the top-level "/" group. These include the following: * `cycle`: Cycle of the simulation dump. * `dt`: Current global time-step. * `time`: Current time in code units. * `rank`: Dimensionality of the problem. * `lower`: Lower extents of the simulation domain. * `upper`: Upper extents of the simulation domain. * `max_level`: Maximum refinement level. Block attributes ---------------- Block attributes and data are stored in HDF5 groups with the same name as the block, e.g. "B00:0_00:0_00:0". Block attribute data include the following: * `cycle`: Cycle of this block. * `dt`: Current block time-step. * `time`: Current time of this block. * `lower`: Lower extents of the block. * `upper`: Upper extents of the block. * `index`: Index of the block, specified using three 32-bit integers. * `adapt_buffer`: Encoding of the block's neighbor configuration. * `num_field_data`: currently unused. * `array`: Indices identifying the octree containing the block in the "array-of-octrees". * `enzo_CellWidth`: Corresponds to the EnzoBlock ``CellWidth`` parameter. * `enzo_GridDimension`: Corresponds to the EnzoBlock ``GridDimension`` parameter. * `enzo_GridEndIndex`: Corresponds to the EnzoBlock ``GridEndIndex`` parameter. * `enzo_GridLeftEdge`: Corresponds to the EnzoBlock ``GridLeftEdge`` parameter. * `enzo_GridStartIndex`: Corresponds to the EnzoBlock ``GridStartIndex`` parameter. * `enzo_dt`: Corresponds to the EnzoBlock ``dt`` parameter. * `enzo_redshift`: Corresponds to the EnzoBlock ``redshift`` parameter. Block data ---------- Block data are stored as HDF5 datasets. Fields are currently stored as arrays of size ``(mx,my,mz)``, where ``mx``, ``my``, and ``mz`` are the dimensions of the field data `including` ghost data. (Note that future checkpoint versions may only include non-ghost data to reduce disk space.) Dataset names are field names with ``"field_`` prepended, for example ``"field_density"``. Particles are stored as one-dimensional HDF5 datasets, one dataset per attribute per particle type. Datasets are named using ``"particle"`` + `particle-type` + `particle attribute`, delimited by underscores. For example, ``"particle_dark_vx"`` for the x-velocity particle attribute ``"vx"`` values of the ``"dark"`` type particles in the block. The length of the arrays equals the number of that type of particle in the block.