dask.dataframe.Index.drop_duplicates
- Index.drop_duplicates(split_every=None, split_out=1, shuffle=None, **kwargs)
Return Index with duplicate values removed.
This docstring was copied from pandas.core.indexes.base.Index.drop_duplicates.
Some inconsistencies with the Dask version may exist.
- Known inconsistencies:
keep=False will raise a NotImplementedError.
- Parameters
- keep : {‘first’, ‘last’, False}, default ‘first’ (Not supported in Dask)
‘first’ : Drop duplicates except for the first occurrence.
‘last’ : Drop duplicates except for the last occurrence.
False : Drop all duplicates.
- Returns
- Index
See also
Series.drop_duplicates
Equivalent method on Series.
DataFrame.drop_duplicates
Equivalent method on DataFrame.
Index.duplicated
Related method on Index, indicating duplicate Index values.
Examples
Generate a pandas.Index with duplicate values.
>>> idx = pd.Index(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'])
The keep parameter controls which duplicate values are removed. The value ‘first’ keeps the first occurrence for each set of duplicated entries. The default value of keep is ‘first’.
>>> idx.drop_duplicates(keep='first')
Index(['lama', 'cow', 'beetle', 'hippo'], dtype='object')
The value ‘last’ keeps the last occurrence for each set of duplicated entries.
>>> idx.drop_duplicates(keep='last')
Index(['cow', 'beetle', 'lama', 'hippo'], dtype='object')
The value False discards all sets of duplicated entries.
>>> idx.drop_duplicates(keep=False)
Index(['cow', 'beetle', 'hippo'], dtype='object')