dask.dataframe.Series
class dask.dataframe.Series(dsk, name, meta, divisions)
Parallel Pandas Series
Do not use this class directly. Instead, use functions like
dd.read_csv, dd.read_parquet, or dd.from_pandas.

Parameters
- dsk: dict
The dask graph to compute this Series
- _name: str
The key prefix that specifies which keys in the dask comprise this particular Series
- meta: pandas.Series
An empty pandas.Series with names, dtypes, and index matching the expected output.
- divisions: tuple of index values
Values along which we partition our blocks on the index
Methods
__init__(dsk, name, meta, divisions)
abs(): Return a Series/DataFrame with absolute numeric value of each element.
add(other[, level, fill_value, axis]): Return Addition of series and other, element-wise (binary operator add).
add_prefix(prefix): Prefix labels with string prefix.
add_suffix(suffix): Suffix labels with string suffix.
align(other[, join, axis, fill_value]): Align two objects on their axes with the specified join method.
all([axis, skipna, split_every, out]): Return whether all elements are True, potentially over an axis.
any([axis, skipna, split_every, out]): Return whether any element is True, potentially over an axis.
apply(func[, convert_dtype, meta, args]): Parallel version of pandas.Series.apply
astype(dtype): Cast a pandas object to a specified dtype.
autocorr([lag, split_every]): Compute the lag-N autocorrelation.
between(left, right[, inclusive]): Return boolean Series equivalent to left <= series <= right.
bfill([axis, limit]): Fill NA/NaN values by using the next valid observation to fill the gap.
clear_divisions(): Forget division information
clip([lower, upper, out, axis]): Trim values at input threshold(s).
combine(other, func[, fill_value]): Combine the Series with a Series or scalar according to func.
combine_first(other): Update null elements with value in the same location in 'other'.
compute(**kwargs): Compute this dask collection
compute_current_divisions([col]): Compute the current divisions of the DataFrame.
copy([deep]): Make a copy of the dataframe
corr(other[, method, min_periods, split_every]): Compute correlation with other Series, excluding missing values.
count([split_every]): Return number of non-NA/null observations in the Series.
cov(other[, min_periods, split_every]): Compute covariance with Series, excluding missing values.
cummax([axis, skipna, out]): Return cumulative maximum over a DataFrame or Series axis.
cummin([axis, skipna, out]): Return cumulative minimum over a DataFrame or Series axis.
cumprod([axis, skipna, dtype, out]): Return cumulative product over a DataFrame or Series axis.
cumsum([axis, skipna, dtype, out]): Return cumulative sum over a DataFrame or Series axis.
describe([split_every, percentiles, ...]): Generate descriptive statistics.
diff([periods, axis]): First discrete difference of element.
div(other[, level, fill_value, axis]): Return Floating division of series and other, element-wise (binary operator truediv).
divide(other[, level, fill_value, axis]): Return Floating division of series and other, element-wise (binary operator truediv).
dot(other[, meta]): Compute the dot product between the Series and the columns of other.
drop_duplicates([subset, split_every, ...]): Return DataFrame with duplicate rows removed.
dropna(): Return a new Series with missing values removed.
enforce_runtime_divisions(): Enforce the current divisions at runtime
eq(other[, level, fill_value, axis]): Return Equal to of series and other, element-wise (binary operator eq).
explode(): Transform each element of a list-like to a row.
ffill([axis, limit]): Fill NA/NaN values by propagating the last valid observation to next valid.
fillna([value, method, limit, axis]): Fill NA/NaN values using the specified method.
first(offset): Select initial periods of time series data based on a date offset.
floordiv(other[, level, fill_value, axis]): Return Integer division of series and other, element-wise (binary operator floordiv).
ge(other[, level, fill_value, axis]): Return Greater than or equal to of series and other, element-wise (binary operator ge).
get_partition(n): Get a dask DataFrame/Series representing the nth partition.
groupby([by, group_keys, sort, observed, dropna]): Group Series using a mapper or by a Series of columns.
gt(other[, level, fill_value, axis]): Return Greater than of series and other, element-wise (binary operator gt).
head([n, npartitions, compute]): First n rows of the dataset
idxmax([axis, skipna, split_every, numeric_only]): Return index of first occurrence of maximum over requested axis.
idxmin([axis, skipna, split_every, numeric_only]): Return index of first occurrence of minimum over requested axis.
isin(values): Whether elements in Series are contained in values.
isna(): Detect missing values.
isnull(): DataFrame.isnull is an alias for DataFrame.isna.
kurtosis([axis, fisher, bias, nan_policy, ...]): Return unbiased kurtosis over requested axis.
last(offset): Select final periods of time series data based on a date offset.
le(other[, level, fill_value, axis]): Return Less than or equal to of series and other, element-wise (binary operator le).
lt(other[, level, fill_value, axis]): Return Less than of series and other, element-wise (binary operator lt).
map(arg[, na_action, meta]): Map values of Series according to an input mapping or function.
map_overlap(func, before, after, *args, **kwargs): Apply a function to each partition, sharing rows with adjacent partitions.
map_partitions(func, *args, **kwargs): Apply Python function on each DataFrame partition.
mask(cond[, other]): Replace values where the condition is True.
max([axis, skipna, split_every, out, ...]): Return the maximum of the values over the requested axis.
mean([axis, skipna, split_every, dtype, ...]): Return the mean of the values over the requested axis.
median([method]): Return the median of the values over the requested axis.
median_approximate([method]): Return the approximate median of the values over the requested axis.
memory_usage([index, deep]): Return the memory usage of the Series.
memory_usage_per_partition([index, deep]): Return the memory usage of each partition
min([axis, skipna, split_every, out, ...]): Return the minimum of the values over the requested axis.
mod(other[, level, fill_value, axis]): Return Modulo of series and other, element-wise (binary operator mod).
mode([dropna, split_every]): Return the mode(s) of the Series.
mul(other[, level, fill_value, axis]): Return Multiplication of series and other, element-wise (binary operator mul).
ne(other[, level, fill_value, axis]): Return Not equal to of series and other, element-wise (binary operator ne).
nlargest([n, split_every]): Return the largest n elements.
notnull(): DataFrame.notnull is an alias for DataFrame.notna.
nsmallest([n, split_every]): Return the smallest n elements.
nunique([split_every, dropna]): Return number of unique elements in the object.
nunique_approx([split_every]): Approximate number of unique rows.
persist(**kwargs): Persist this dask collection into memory
pipe(func, *args, **kwargs): Apply chainable functions that expect Series or DataFrames.
pow(other[, level, fill_value, axis]): Return Exponential power of series and other, element-wise (binary operator pow).
prod([axis, skipna, split_every, dtype, ...]): Return the product of the values over the requested axis.
product([axis, skipna, split_every, dtype, ...]): Return the product of the values over the requested axis.
quantile([q, method]): Approximate quantiles of Series
radd(other[, level, fill_value, axis]): Return Addition of series and other, element-wise (binary operator radd).
random_split(frac[, random_state, shuffle]): Pseudorandomly split dataframe into different pieces row-wise
rdiv(other[, level, fill_value, axis]): Return Floating division of series and other, element-wise (binary operator rtruediv).
reduction(chunk[, aggregate, combine, meta, ...]): Generic row-wise reductions.
rename([index, inplace, sorted_index]): Alter Series index labels or name
repartition([divisions, npartitions, ...]): Repartition dataframe along new divisions
replace([to_replace, value, regex]): Replace values given in to_replace with value.
resample(rule[, closed, label]): Resample time-series data.
reset_index([drop]): Reset the index to the default index.
rfloordiv(other[, level, fill_value, axis]): Return Integer division of series and other, element-wise (binary operator rfloordiv).
rmod(other[, level, fill_value, axis]): Return Modulo of series and other, element-wise (binary operator rmod).
rmul(other[, level, fill_value, axis]): Return Multiplication of series and other, element-wise (binary operator rmul).
rolling(window[, min_periods, center, ...]): Provides rolling transformations.
round([decimals]): Round each value in a Series to the given number of decimals.
rpow(other[, level, fill_value, axis]): Return Exponential power of series and other, element-wise (binary operator rpow).
rsub(other[, level, fill_value, axis]): Return Subtraction of series and other, element-wise (binary operator rsub).
rtruediv(other[, level, fill_value, axis]): Return Floating division of series and other, element-wise (binary operator rtruediv).
sample([n, frac, replace, random_state]): Random sample of items
sem([axis, skipna, ddof, split_every, ...]): Return unbiased standard error of the mean over requested axis.
shift([periods, freq, axis]): Shift index by desired number of periods with an optional time freq.
shuffle(on[, npartitions, max_branch, ...]): Rearrange DataFrame into new partitions
skew([axis, bias, nan_policy, out, numeric_only]): Return unbiased skew over requested axis.
squeeze(): Squeeze 1 dimensional axis objects into scalars.
std([axis, skipna, ddof, split_every, ...]): Return sample standard deviation over requested axis.
sub(other[, level, fill_value, axis]): Return Subtraction of series and other, element-wise (binary operator sub).
sum([axis, skipna, split_every, dtype, out, ...]): Return the sum of the values over the requested axis.
tail([n, compute]): Last n rows of the dataset
to_backend([backend]): Move to a new DataFrame backend
to_bag([index, format]): Create a Dask Bag from a Series
to_csv(filename, **kwargs): Store Dask DataFrame to CSV files
to_dask_array([lengths, meta]): Convert a dask DataFrame to a dask array.
to_delayed([optimize_graph]): Convert into a list of dask.delayed objects, one per partition.
to_frame([name]): Convert Series to DataFrame.
to_hdf(path_or_buf, key[, mode, append]): Store Dask Dataframe to Hierarchical Data Format (HDF) files
to_json(filename, *args, **kwargs): See dd.to_json docstring for more information
to_sql(name, uri[, schema, if_exists, ...]): See dd.to_sql docstring for more information
to_string([max_rows]): Render a string representation of the Series.
to_timestamp([freq, how, axis]): Cast to DatetimeIndex of Timestamps, at beginning of period.
truediv(other[, level, fill_value, axis]): Return Floating division of series and other, element-wise (binary operator truediv).
unique([split_every, split_out]): Return Series of unique values in the object.
value_counts([sort, ascending, dropna, ...]): Return a Series containing counts of unique values.
var([axis, skipna, ddof, split_every, ...]): Return unbiased variance over requested axis.
view(dtype): Create a new view of the Series.
visualize([filename, format, optimize_graph]): Render the computation of this object's task graph using graphviz.
where(cond[, other]): Replace values where the condition is False.
Attributes
attrs: Dictionary of global attributes of this dataset.
axes
divisions: Tuple of npartitions + 1 values, in ascending order, marking the lower/upper bounds of each partition's index.
dtype: Return data type
index: Return dask Index instance
is_monotonic_decreasing: Return boolean if values in the object are monotonically decreasing.
is_monotonic_increasing: Return boolean if values in the object are monotonically increasing.
known_divisions: Whether divisions are already known
loc: Purely label-location based indexer for selection by label.
name
nbytes: Number of bytes
ndim: Return dimensionality
npartitions: Return number of partitions
partitions: Slice dataframe by partitions
shape: Return a tuple representing the dimensionality of a Series.
size: Size of the Series or DataFrame as a Delayed object.
values: Return a dask.array of the values of this dataframe