dask.bag.Bag.map_partitions

dask.bag.Bag.map_partitions

Bag.map_partitions(func, *args, **kwargs)[source]

Apply a function to every partition across one or more bags.

Note that all Bag arguments must be partitioned identically.

Parameters
funccallable

The function to be called on every partition. This function should expect an Iterator or Iterable for every partition and should return an Iterator or Iterable in return.

*args, **kwargsBag, Item, Delayed, or object

Arguments and keyword arguments to pass to func. Partitions from this bag will be the first argument, and these will be passed after.

Examples

>>> import dask.bag as db
>>> b = db.from_sequence(range(1, 101), npartitions=10)
>>> def div(nums, den=1):
...     return [num / den for num in nums]

Using a python object:

>>> hi = b.max().compute()
>>> hi
100
>>> b.map_partitions(div, den=hi).take(5)
(0.01, 0.02, 0.03, 0.04, 0.05)

Using an Item:

>>> b.map_partitions(div, den=b.max()).take(5)
(0.01, 0.02, 0.03, 0.04, 0.05)

Note that while both versions give the same output, the second forms a single graph, and then computes everything at once, and in some cases may be more efficient.