dask.dataframe.read_hdf
dask.dataframe.read_hdf(pattern, key, start=0, stop=None, columns=None, chunksize=1000000, sorted_index=False, lock=True, mode='r')
Read HDF files into a Dask DataFrame.

This function is like pandas.read_hdf, except it can read from a single large file, from multiple files, or from multiple keys in the same file.

Parameters
- pattern : string, pathlib.Path, or list
  File pattern (string), pathlib.Path, buffer to read from, or list of file paths. Can contain wildcards.
- key : group identifier in the store. Can contain wildcards.
- start : integer, optional
  Row number to start at (defaults to 0).
- stop : integer, optional
  Row number to stop at (defaults to None, the last row).
- columns : list of columns, optional
  A list of columns that, if not None, limits the returned columns (default is None).
- chunksize : positive integer, optional
  Maximal number of rows per partition (default is 1000000).
- sorted_index : boolean, optional
  Whether the input HDF files have a sorted index (default is False).
- lock : boolean, optional
  Whether to use a lock to prevent concurrency issues (default is True).
- mode : {‘a’, ‘r’, ‘r+’}, default ‘r’
  Mode to use when opening file(s).
  - ‘r’
    Read-only; no data can be modified.
  - ‘a’
    Append; an existing file is opened for reading and writing, and if the file does not exist it is created.
  - ‘r+’
    Similar to ‘a’, but the file must already exist.
Returns
- dask.DataFrame
Examples
Load single file
>>> dd.read_hdf('myfile.1.hdf5', '/x')
Load multiple files
>>> dd.read_hdf('myfile.*.hdf5', '/x')
>>> dd.read_hdf(['myfile.1.hdf5', 'myfile.2.hdf5'], '/x')
Load multiple datasets
>>> dd.read_hdf('myfile.1.hdf5', '/*')
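The one-liners above assume the files already exist. Below is a minimal end-to-end sketch, assuming pandas with PyTables is installed; the file names and the '/x' key are placeholders. The stores are written with format='table' because table-format stores support the chunked reads that read_hdf performs (fixed-format stores generally do not).
>>> import pandas as pd
>>> import dask.dataframe as dd
>>> df = pd.DataFrame({'a': range(100), 'b': range(100, 200)})
>>> df.iloc[:50].to_hdf('data.1.hdf5', key='/x', format='table')  # first half
>>> df.iloc[50:].to_hdf('data.2.hdf5', key='/x', format='table')  # second half
>>> ddf = dd.read_hdf('data.*.hdf5', '/x', chunksize=25)
>>> ddf.npartitions  # 50 rows per file, 25 rows per chunk -> 2 partitions each
4
>>> ddf['a'].sum().compute()
4950
If the files were written with an index that is already sorted across files, passing sorted_index=True lets Dask record the partition divisions, which can speed up later index-based operations.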