dask.array.reduction

dask.array.reduction¶

dask.array.reduction(x, chunk, aggregate, axis=None, keepdims=False, dtype=None, split_every=None, combine=None, name=None, out=None, concatenate=True, output_size=1, meta=None, weights=None)[source]¶

General version of reductions

Parameters

x: Array

Data being reduced along one or more axes

chunk: callable(x_chunk, [weights_chunk=None], axis, keepdims)

First function to be executed when resolving the dask graph. This function is applied in parallel to all original chunks of x. See below for function parameters.

combine: callable(x_chunk, axis, keepdims), optional

Function used for intermediate recursive aggregation (see split_every below). If omitted, it defaults to aggregate. If the reduction can be performed in less than 3 steps, it will not be invoked at all.

aggregate: callable(x_chunk, axis, keepdims)

Last function to be executed when resolving the dask graph, producing the final output. It is always invoked, even when the reduced Array counts a single chunk along the reduced axes.

axis: int or sequence of ints, optional

Axis or axes to aggregate upon. If omitted, aggregate along all axes.

keepdims: boolean, optional

Whether the reduction function should preserve the reduced axes, leaving them at size output_size, or remove them.

dtype: np.dtype

data type of output. This argument was previously optional, but leaving as None will now raise an exception.

split_every: int >= 2 or dict(axis: int), optional

Determines the depth of the recursive aggregation. If set to or more than the number of input chunks, the aggregation will be performed in two steps, one chunk function per input chunk and a single aggregate function at the end. If set to less than that, an intermediate combine function will be used, so that any one combine or aggregate function has no more than split_every inputs. The depth of the aggregation graph will be \(log_{split_every}(input chunks along reduced axes)\). Setting to a low value can reduce cache size and network transfers, at the cost of more CPU and a larger dask graph.

Omit to let dask heuristically decide a good default. A default can also be set globally with the split_every key in dask.config.

name: str, optional

Prefix of the keys of the intermediate and output nodes. If omitted it defaults to the function names.

out: Array, optional

Another dask array whose contents will be replaced. Omit to create a new one. Note that, unlike in numpy, this setting gives no performance benefits whatsoever, but can still be useful if one needs to preserve the references to a previously existing Array.

concatenate: bool, optional

If True (the default), the outputs of the chunk/combine functions are concatenated into a single np.array before being passed to the combine/aggregate functions. If False, the input of combine and aggregate will be either a list of the raw outputs of the previous step or a single output, and the function will have to concatenate it itself. It can be useful to set this to False if the chunk and/or combine steps do not produce np.arrays.

output_size: int >= 1, optional

Size of the output of the aggregate function along the reduced axes. Ignored if keepdims is False.

weightsarray_like, optional

Weights to be used in the reduction of x. Will be automatically broadcast to the shape of x, and so must have a compatible shape. For instance, if x has shape (3, 4) then acceptable shapes for weights are (3, 4), (4,), (3, 1), (1, 1), (1), and ().

Returns

dask array
Function Parameters
x_chunk: numpy.ndarray: Individual input chunk. For chunk functions, it is one of the original chunks of x. For combine and aggregate functions, it’s the concatenation of the outputs produced by the previous chunk or combine functions. If concatenate=False, it’s a list of the raw outputs from the previous functions.
weights_chunk: numpy.ndarray, optional: Only applicable to the chunk function. Weights, with the same shape as x_chunk, to be applied during the reduction of the individual input chunk. If weights have not been provided then the function may omit this parameter. When weights_chunk is included then it must occur immediately after the x_chunk parameter, and must also have a default value for cases when weights are not provided.
axis: tuple: Normalized list of axes to reduce upon, e.g. (0, ) Scalar, negative, and None axes have been normalized away. Note that some numpy reduction functions cannot reduce along multiple axes at once and strictly require an int in input. Such functions have to be wrapped to cope.
keepdims: bool: Whether the reduction function should preserve the reduced axes or remove them.

dask.array.rechunk

dask.array.register_chunk_type