dask.dataframe.DataFrame.sort_values
DataFrame.sort_values(by: str | list[str], npartitions: Optional[Union[int, Literal['auto']]] = None, ascending: bool = True, na_position: Union[Literal['first'], Literal['last']] = 'last', sort_function: collections.abc.Callable[[pandas.core.frame.DataFrame], pandas.core.frame.DataFrame] | None = None, sort_function_kwargs: collections.abc.Mapping[str, Any] | None = None, **kwargs) -> dask.dataframe.core.DataFrame
Sort the dataset by one or more columns.
Sorting a parallel dataset requires expensive shuffles and is generally not recommended. See set_index for implementation details.
Parameters
- by: str or list[str]
Column(s) to sort by.
- npartitions: int, None, or ‘auto’
The ideal number of output partitions. If None, use the same number as the input. If ‘auto’, choose the number based on memory use.
- ascending: bool, optional
Sort ascending vs. descending. Defaults to True.
- na_position: {‘last’, ‘first’}, optional
Puts NaNs at the beginning if ‘first’ and at the end if ‘last’. Defaults to ‘last’.
- sort_function: function, optional
Sorting function to use when sorting underlying partitions. If None, defaults to M.sort_values (the partition library’s implementation of sort_values).
- sort_function_kwargs: dict, optional
Additional keyword arguments to pass to the partition sorting function. By default, by, ascending, and na_position are provided.
Examples
>>> df2 = df.sort_values('x')