dask.dataframe.Series


class dask.dataframe.Series(dsk, name, meta, divisions)

Parallel Pandas Series

Do not use this class directly. Instead use functions like dd.read_csv, dd.read_parquet, or dd.from_pandas.

Parameters

dsk: dict: The dask graph to compute this Series
name: str: The key prefix that specifies which keys in the dask graph comprise this particular Series
meta: pandas.Series: An empty pandas.Series with names, dtypes, and index matching the expected output.
divisions: tuple of index values: Values along which we partition our blocks on the index

See also

dask.dataframe.DataFrame

__init__(dsk, name, meta, divisions)

Methods

`__init__`(dsk, name, meta, divisions)
`abs`()	Return a Series/DataFrame with absolute numeric value of each element.
`add`(other[, level, fill_value, axis])	Return Addition of series and other, element-wise (binary operator add).
`add_prefix`(prefix)	Prefix labels with string prefix.
`add_suffix`(suffix)	Suffix labels with string suffix.
`align`(other[, join, axis, fill_value])	Align two objects on their axes with the specified join method.
`all`([axis, skipna, split_every, out])	Return whether all elements are True, potentially over an axis.
`any`([axis, skipna, split_every, out])	Return whether any element is True, potentially over an axis.
`apply`(func[, convert_dtype, meta, args])	Parallel version of pandas.Series.apply
`astype`(dtype)	Cast a pandas object to a specified dtype `dtype`.
`autocorr`([lag, split_every])	Compute the lag-N autocorrelation.
`between`(left, right[, inclusive])	Return boolean Series equivalent to left <= series <= right.
`bfill`([axis, limit])	Fill NA/NaN values by using the next valid observation to fill the gap.
`clear_divisions`()	Forget division information
`clip`([lower, upper, out, axis])	Trim values at input threshold(s).
`combine`(other, func[, fill_value])	Combine the Series with a Series or scalar according to func.
`combine_first`(other)	Update null elements with value in the same location in 'other'.
`compute`(**kwargs)	Compute this dask collection
`compute_current_divisions`([col])	Compute the current divisions of the DataFrame.
`copy`([deep])	Make a copy of the dataframe
`corr`(other[, method, min_periods, split_every])	Compute correlation with other Series, excluding missing values.
`count`([split_every])	Return number of non-NA/null observations in the Series.
`cov`(other[, min_periods, split_every])	Compute covariance with Series, excluding missing values.
`cummax`([axis, skipna, out])	Return cumulative maximum over a DataFrame or Series axis.
`cummin`([axis, skipna, out])	Return cumulative minimum over a DataFrame or Series axis.
`cumprod`([axis, skipna, dtype, out])	Return cumulative product over a DataFrame or Series axis.
`cumsum`([axis, skipna, dtype, out])	Return cumulative sum over a DataFrame or Series axis.
`describe`([split_every, percentiles, ...])	Generate descriptive statistics.
`diff`([periods, axis])	First discrete difference of element.
`div`(other[, level, fill_value, axis])	Return Floating division of series and other, element-wise (binary operator truediv).
`divide`(other[, level, fill_value, axis])	Return Floating division of series and other, element-wise (binary operator truediv).
`dot`(other[, meta])	Compute the dot product between the Series and the columns of other.
`drop_duplicates`([subset, split_every, ...])	Return DataFrame with duplicate rows removed.
`dropna`()	Return a new Series with missing values removed.
`enforce_runtime_divisions`()	Enforce the current divisions at runtime
`eq`(other[, level, fill_value, axis])	Return Equal to of series and other, element-wise (binary operator eq).
`explode`()	Transform each element of a list-like to a row.
`ffill`([axis, limit])	Fill NA/NaN values by propagating the last valid observation to next valid.
`fillna`([value, method, limit, axis])	Fill NA/NaN values using the specified method.
`first`(offset)	Select initial periods of time series data based on a date offset.
`floordiv`(other[, level, fill_value, axis])	Return Integer division of series and other, element-wise (binary operator floordiv).
`ge`(other[, level, fill_value, axis])	Return Greater than or equal to of series and other, element-wise (binary operator ge).
`get_partition`(n)	Get a dask DataFrame/Series representing the nth partition.
`groupby`([by, group_keys, sort, observed, dropna])	Group Series using a mapper or by a Series of columns.
`gt`(other[, level, fill_value, axis])	Return Greater than of series and other, element-wise (binary operator gt).
`head`([n, npartitions, compute])	First n rows of the dataset
`idxmax`([axis, skipna, split_every, numeric_only])	Return index of first occurrence of maximum over requested axis.
`idxmin`([axis, skipna, split_every, numeric_only])	Return index of first occurrence of minimum over requested axis.
`isin`(values)	Whether elements in Series are contained in values.
`isna`()	Detect missing values.
`isnull`()	DataFrame.isnull is an alias for DataFrame.isna.
`kurtosis`([axis, fisher, bias, nan_policy, ...])	Return unbiased kurtosis over requested axis.
`last`(offset)	Select final periods of time series data based on a date offset.
`le`(other[, level, fill_value, axis])	Return Less than or equal to of series and other, element-wise (binary operator le).
`lt`(other[, level, fill_value, axis])	Return Less than of series and other, element-wise (binary operator lt).
`map`(arg[, na_action, meta])	Map values of Series according to an input mapping or function.
`map_overlap`(func, before, after, *args, **kwargs)	Apply a function to each partition, sharing rows with adjacent partitions.
`map_partitions`(func, *args, **kwargs)	Apply Python function on each DataFrame partition.
`mask`(cond[, other])	Replace values where the condition is True.
`max`([axis, skipna, split_every, out, ...])	Return the maximum of the values over the requested axis.
`mean`([axis, skipna, split_every, dtype, ...])	Return the mean of the values over the requested axis.
`median`([method])	Return the median of the values over the requested axis.
`median_approximate`([method])	Return the approximate median of the values over the requested axis.
`memory_usage`([index, deep])	Return the memory usage of the Series.
`memory_usage_per_partition`([index, deep])	Return the memory usage of each partition
`min`([axis, skipna, split_every, out, ...])	Return the minimum of the values over the requested axis.
`mod`(other[, level, fill_value, axis])	Return Modulo of series and other, element-wise (binary operator mod).
`mode`([dropna, split_every])	Return the mode(s) of the Series.
`mul`(other[, level, fill_value, axis])	Return Multiplication of series and other, element-wise (binary operator mul).
`ne`(other[, level, fill_value, axis])	Return Not equal to of series and other, element-wise (binary operator ne).
`nlargest`([n, split_every])	Return the largest n elements.
`notnull`()	DataFrame.notnull is an alias for DataFrame.notna.
`nsmallest`([n, split_every])	Return the smallest n elements.
`nunique`([split_every, dropna])	Return number of unique elements in the object.
`nunique_approx`([split_every])	Approximate number of unique rows.
`persist`(**kwargs)	Persist this dask collection into memory
`pipe`(func, *args, **kwargs)	Apply chainable functions that expect Series or DataFrames.
`pow`(other[, level, fill_value, axis])	Return Exponential power of series and other, element-wise (binary operator pow).
`prod`([axis, skipna, split_every, dtype, ...])	Return the product of the values over the requested axis.
`product`([axis, skipna, split_every, dtype, ...])	Return the product of the values over the requested axis.
`quantile`([q, method])	Approximate quantiles of Series
`radd`(other[, level, fill_value, axis])	Return Addition of series and other, element-wise (binary operator radd).
`random_split`(frac[, random_state, shuffle])	Pseudorandomly split dataframe into different pieces row-wise
`rdiv`(other[, level, fill_value, axis])	Return Floating division of series and other, element-wise (binary operator rtruediv).
`reduction`(chunk[, aggregate, combine, meta, ...])	Generic row-wise reductions.
`rename`([index, inplace, sorted_index])	Alter Series index labels or name
`repartition`([divisions, npartitions, ...])	Repartition dataframe along new divisions
`replace`([to_replace, value, regex])	Replace values given in to_replace with value.
`resample`(rule[, closed, label])	Resample time-series data.
`reset_index`([drop])	Reset the index to the default index.
`rfloordiv`(other[, level, fill_value, axis])	Return Integer division of series and other, element-wise (binary operator rfloordiv).
`rmod`(other[, level, fill_value, axis])	Return Modulo of series and other, element-wise (binary operator rmod).
`rmul`(other[, level, fill_value, axis])	Return Multiplication of series and other, element-wise (binary operator rmul).
`rolling`(window[, min_periods, center, ...])	Provides rolling transformations.
`round`([decimals])	Round each value in a Series to the given number of decimals.
`rpow`(other[, level, fill_value, axis])	Return Exponential power of series and other, element-wise (binary operator rpow).
`rsub`(other[, level, fill_value, axis])	Return Subtraction of series and other, element-wise (binary operator rsub).
`rtruediv`(other[, level, fill_value, axis])	Return Floating division of series and other, element-wise (binary operator rtruediv).
`sample`([n, frac, replace, random_state])	Random sample of items
`sem`([axis, skipna, ddof, split_every, ...])	Return unbiased standard error of the mean over requested axis.
`shift`([periods, freq, axis])	Shift index by desired number of periods with an optional time freq.
`shuffle`(on[, npartitions, max_branch, ...])	Rearrange DataFrame into new partitions
`skew`([axis, bias, nan_policy, out, numeric_only])	Return unbiased skew over requested axis.
`squeeze`()	Squeeze 1 dimensional axis objects into scalars.
`std`([axis, skipna, ddof, split_every, ...])	Return sample standard deviation over requested axis.
`sub`(other[, level, fill_value, axis])	Return Subtraction of series and other, element-wise (binary operator sub).
`sum`([axis, skipna, split_every, dtype, out, ...])	Return the sum of the values over the requested axis.
`tail`([n, compute])	Last n rows of the dataset
`to_backend`([backend])	Move to a new DataFrame backend
`to_bag`([index, format])	Create a Dask Bag from a Series
`to_csv`(filename, **kwargs)	Store Dask DataFrame to CSV files
`to_dask_array`([lengths, meta])	Convert a dask DataFrame to a dask array.
`to_delayed`([optimize_graph])	Convert into a list of `dask.delayed` objects, one per partition.
`to_frame`([name])	Convert Series to DataFrame.
`to_hdf`(path_or_buf, key[, mode, append])	Store Dask Dataframe to Hierarchical Data Format (HDF) files
`to_json`(filename, *args, **kwargs)	See dd.to_json docstring for more information
`to_sql`(name, uri[, schema, if_exists, ...])	See dd.to_sql docstring for more information
`to_string`([max_rows])	Render a string representation of the Series.
`to_timestamp`([freq, how, axis])	Cast to DatetimeIndex of Timestamps, at beginning of period.
`truediv`(other[, level, fill_value, axis])	Return Floating division of series and other, element-wise (binary operator truediv).
`unique`([split_every, split_out])	Return Series of unique values in the object.
`value_counts`([sort, ascending, dropna, ...])	Return a Series containing counts of unique values.
`var`([axis, skipna, ddof, split_every, ...])	Return unbiased variance over requested axis.
`view`(dtype)	Create a new view of the Series.
`visualize`([filename, format, optimize_graph])	Render the computation of this object's task graph using graphviz.
`where`(cond[, other])	Replace values where the condition is False.

Attributes

`attrs`	Dictionary of global attributes of this dataset.
`axes`
`divisions`	Tuple of `npartitions + 1` values, in ascending order, marking the lower/upper bounds of each partition's index.
`dtype`	Return data type
`index`	Return dask Index instance
`is_monotonic_decreasing`	Return boolean if values in the object are monotonically decreasing.
`is_monotonic_increasing`	Return boolean if values in the object are monotonically increasing.
`known_divisions`	Whether divisions are already known
`loc`	Purely label-location based indexer for selection by label.
`name`
`nbytes`	Number of bytes
`ndim`	Return dimensionality
`npartitions`	Return number of partitions
`partitions`	Slice dataframe by partitions
`shape`	Return a tuple representing the dimensionality of a Series.
`size`	Size of the Series or DataFrame as a Delayed object.
`values`	Return a dask.array of the values of this dataframe
