dask.bag.read_avro
- dask.bag.read_avro(urlpath, blocksize=100000000, storage_options=None, compression=None)
Read a set of Avro files.
Use this with arbitrarily nested Avro schemas. See the fastavro documentation for details of its capabilities: https://github.com/fastavro/fastavro
- Parameters
- urlpath: string or list
Absolute or relative filepath, URL (may include protocols like s3://), or globstring pointing to data.
- blocksize: int or None
Size of chunks in bytes. If None, there will be no chunking and each file will become one partition.
- storage_options: dict or None
Passed to the backend file system.
- compression: str or None
Compression format of the target(s), such as ‘gzip’. Should only be used with blocksize=None.
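To make the blocksize arithmetic concrete, here is a sketch using hypothetical helpers (chunk_ranges and count_partitions are not part of dask). It assumes each file is cut into consecutive byte ranges of at most blocksize bytes and that each range becomes one partition; the real reader also aligns ranges to Avro sync markers, which this sketch ignores.

```python
def chunk_ranges(file_size, blocksize):
    """Hypothetical helper: split one file into (offset, length) byte ranges.

    Each range covers at most ``blocksize`` bytes and stands for one
    partition. Alignment to Avro sync markers is deliberately ignored.
    """
    if blocksize is None:
        # No chunking: the whole file is a single partition.
        return [(0, file_size)]
    return [
        (offset, min(blocksize, file_size - offset))
        for offset in range(0, file_size, blocksize)
    ]


def count_partitions(file_sizes, blocksize):
    """Hypothetical helper: total number of partitions across several files."""
    return sum(len(chunk_ranges(size, blocksize)) for size in file_sizes)
```

For example, with the default blocksize of 100,000,000 bytes, a 250,000,000-byte file and a 50,000,000-byte file give 3 + 1 = 4 partitions under this model, versus 2 with blocksize=None.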