Extend sizeof
Extend sizeof¶
When Dask needs to compute the size of an object in bytes, e.g. to determine which objects to spill to disk, it uses the dask.sizeof.sizeof
registration mechanism. Users who need to define a sizeof
implementation for their own objects can use sizeof.register
:
>>> import numpy as np
>>> from dask.sizeof import sizeof
>>> @sizeof.register(np.ndarray)
>>> def sizeof_numpy_like(array):
... return array.nbytes
This code can be executed in order to register the implementation with Dask by placing it in one of the library’s modules e.g. __init__.py
. However, this introduces a maintenance burden on the developers of these libraries, and must be manually imported on all workers in the event that these libraries do not accept the patch.
Therefore, Dask also exposes an entrypoint under the group dask.sizeof
to enable third-party libraries to develop and maintain these sizeof
implementations.
For a fictitious library numpy_sizeof_dask.py
, the necessary setup.cfg
configuration would be as follows:
[options.entry_points]
dask.sizeof =
numpy = numpy_sizeof_dask:sizeof_plugin
whilst numpy_sizeof_dask.py
would contain
>>> import numpy as np
>>> def sizeof_plugin(sizeof):
... @sizeof.register(np.ndarray)
... def sizeof_numpy_like(array):
... return array.nbytes
Upon the first import of dask.sizeof, Dask calls the entrypoint (sizeof_plugin
) with the dask.sizeof.sizeof
object, which can then be used to register a sizeof implementation.