dask.dataframe.Series

class dask.dataframe.Series(dsk, name, meta, divisions)
Parallel Pandas Series
Do not use this class directly. Instead, use functions like dd.read_csv, dd.read_parquet, or dd.from_pandas.

Parameters
    dsk : dict
        The dask graph to compute this Series.
    _name : str
        The key prefix that specifies which keys in the dask graph comprise this particular Series.
    meta : pandas.Series
        An empty pandas.Series with names, dtypes, and index matching the expected output.
    divisions : tuple of index values
        Values along which we partition our blocks on the index.
__init__(dsk, name, meta, divisions)
Methods
__init__(dsk, name, meta, divisions)
abs()
    Return a Series/DataFrame with absolute numeric value of each element.
add(other[, level, fill_value, axis])
    Return Addition of series and other, element-wise (binary operator add).
add_prefix(prefix)
    Prefix labels with string prefix.
add_suffix(suffix)
    Suffix labels with string suffix.
align(other[, join, axis, fill_value])
    Align two objects on their axes with the specified join method.
all([axis, skipna, split_every, out])
    Return whether all elements are True, potentially over an axis.
any([axis, skipna, split_every, out])
    Return whether any element is True, potentially over an axis.
apply(func[, convert_dtype, meta, args])
    Parallel version of pandas.Series.apply.
astype(dtype)
    Cast a pandas object to a specified dtype dtype.
autocorr([lag, split_every])
    Compute the lag-N autocorrelation.
between(left, right[, inclusive])
    Return boolean Series equivalent to left <= series <= right.
bfill([axis, limit])
    Fill NA/NaN values by using the next valid observation to fill the gap.
clear_divisions()
    Forget division information.
clip([lower, upper, out, axis])
    Trim values at input threshold(s).
combine(other, func[, fill_value])
    Combine the Series with a Series or scalar according to func.
combine_first(other)
    Update null elements with value in the same location in 'other'.
compute(**kwargs)
    Compute this dask collection.
compute_current_divisions([col])
    Compute the current divisions of the DataFrame.
copy([deep])
    Make a copy of the dataframe.
corr(other[, method, min_periods, split_every])
    Compute correlation with other Series, excluding missing values.
count([split_every])
    Return number of non-NA/null observations in the Series.
cov(other[, min_periods, split_every])
    Compute covariance with Series, excluding missing values.
cummax([axis, skipna, out])
    Return cumulative maximum over a DataFrame or Series axis.
cummin([axis, skipna, out])
    Return cumulative minimum over a DataFrame or Series axis.
cumprod([axis, skipna, dtype, out])
    Return cumulative product over a DataFrame or Series axis.
cumsum([axis, skipna, dtype, out])
    Return cumulative sum over a DataFrame or Series axis.
describe([split_every, percentiles, ...])
    Generate descriptive statistics.
diff([periods, axis])
    First discrete difference of element.
div(other[, level, fill_value, axis])
    Return Floating division of series and other, element-wise (binary operator truediv).
divide(other[, level, fill_value, axis])
    Return Floating division of series and other, element-wise (binary operator truediv).
dot(other[, meta])
    Compute the dot product between the Series and the columns of other.
drop_duplicates([subset, split_every, ...])
    Return DataFrame with duplicate rows removed.
dropna()
    Return a new Series with missing values removed.
enforce_runtime_divisions()
    Enforce the current divisions at runtime.
eq(other[, level, fill_value, axis])
    Return Equal to of series and other, element-wise (binary operator eq).
explode()
    Transform each element of a list-like to a row.
ffill([axis, limit])
    Fill NA/NaN values by propagating the last valid observation to next valid.
fillna([value, method, limit, axis])
    Fill NA/NaN values using the specified method.
first(offset)
    Select initial periods of time series data based on a date offset.
floordiv(other[, level, fill_value, axis])
    Return Integer division of series and other, element-wise (binary operator floordiv).
ge(other[, level, fill_value, axis])
    Return Greater than or equal to of series and other, element-wise (binary operator ge).
get_partition(n)
    Get a dask DataFrame/Series representing the nth partition.
groupby([by, group_keys, sort, observed, dropna])
    Group Series using a mapper or by a Series of columns.
gt(other[, level, fill_value, axis])
    Return Greater than of series and other, element-wise (binary operator gt).
head([n, npartitions, compute])
    First n rows of the dataset.
idxmax([axis, skipna, split_every, numeric_only])
    Return index of first occurrence of maximum over requested axis.
idxmin([axis, skipna, split_every, numeric_only])
    Return index of first occurrence of minimum over requested axis.
isin(values)
    Whether elements in Series are contained in values.
isna()
    Detect missing values.
isnull()
    DataFrame.isnull is an alias for DataFrame.isna.
kurtosis([axis, fisher, bias, nan_policy, ...])
    Return unbiased kurtosis over requested axis.
last(offset)
    Select final periods of time series data based on a date offset.
le(other[, level, fill_value, axis])
    Return Less than or equal to of series and other, element-wise (binary operator le).
lt(other[, level, fill_value, axis])
    Return Less than of series and other, element-wise (binary operator lt).
map(arg[, na_action, meta])
    Map values of Series according to an input mapping or function.
map_overlap(func, before, after, *args, **kwargs)
    Apply a function to each partition, sharing rows with adjacent partitions.
map_partitions(func, *args, **kwargs)
    Apply Python function on each DataFrame partition.
mask(cond[, other])
    Replace values where the condition is True.
max([axis, skipna, split_every, out, ...])
    Return the maximum of the values over the requested axis.
mean([axis, skipna, split_every, dtype, ...])
    Return the mean of the values over the requested axis.
median([method])
    Return the median of the values over the requested axis.
median_approximate([method])
    Return the approximate median of the values over the requested axis.
memory_usage([index, deep])
    Return the memory usage of the Series.
memory_usage_per_partition([index, deep])
    Return the memory usage of each partition.
min([axis, skipna, split_every, out, ...])
    Return the minimum of the values over the requested axis.
mod(other[, level, fill_value, axis])
    Return Modulo of series and other, element-wise (binary operator mod).
mode([dropna, split_every])
    Return the mode(s) of the Series.
mul(other[, level, fill_value, axis])
    Return Multiplication of series and other, element-wise (binary operator mul).
ne(other[, level, fill_value, axis])
    Return Not equal to of series and other, element-wise (binary operator ne).
nlargest([n, split_every])
    Return the largest n elements.
notnull()
    DataFrame.notnull is an alias for DataFrame.notna.
nsmallest([n, split_every])
    Return the smallest n elements.
nunique([split_every, dropna])
    Return number of unique elements in the object.
nunique_approx([split_every])
    Approximate number of unique rows.
persist(**kwargs)
    Persist this dask collection into memory.
pipe(func, *args, **kwargs)
    Apply chainable functions that expect Series or DataFrames.
pow(other[, level, fill_value, axis])
    Return Exponential power of series and other, element-wise (binary operator pow).
prod([axis, skipna, split_every, dtype, ...])
    Return the product of the values over the requested axis.
product([axis, skipna, split_every, dtype, ...])
    Return the product of the values over the requested axis.
quantile([q, method])
    Approximate quantiles of Series.
radd(other[, level, fill_value, axis])
    Return Addition of series and other, element-wise (binary operator radd).
random_split(frac[, random_state, shuffle])
    Pseudorandomly split dataframe into different pieces row-wise.
rdiv(other[, level, fill_value, axis])
    Return Floating division of series and other, element-wise (binary operator rtruediv).
reduction(chunk[, aggregate, combine, meta, ...])
    Generic row-wise reductions.
rename([index, inplace, sorted_index])
    Alter Series index labels or name.
repartition([divisions, npartitions, ...])
    Repartition dataframe along new divisions.
replace([to_replace, value, regex])
    Replace values given in to_replace with value.
resample(rule[, closed, label])
    Resample time-series data.
reset_index([drop])
    Reset the index to the default index.
rfloordiv(other[, level, fill_value, axis])
    Return Integer division of series and other, element-wise (binary operator rfloordiv).
rmod(other[, level, fill_value, axis])
    Return Modulo of series and other, element-wise (binary operator rmod).
rmul(other[, level, fill_value, axis])
    Return Multiplication of series and other, element-wise (binary operator rmul).
rolling(window[, min_periods, center, ...])
    Provides rolling transformations.
round([decimals])
    Round each value in a Series to the given number of decimals.
rpow(other[, level, fill_value, axis])
    Return Exponential power of series and other, element-wise (binary operator rpow).
rsub(other[, level, fill_value, axis])
    Return Subtraction of series and other, element-wise (binary operator rsub).
rtruediv(other[, level, fill_value, axis])
    Return Floating division of series and other, element-wise (binary operator rtruediv).
sample([n, frac, replace, random_state])
    Random sample of items.
sem([axis, skipna, ddof, split_every, ...])
    Return unbiased standard error of the mean over requested axis.
shift([periods, freq, axis])
    Shift index by desired number of periods with an optional time freq.
shuffle(on[, npartitions, max_branch, ...])
    Rearrange DataFrame into new partitions.
skew([axis, bias, nan_policy, out, numeric_only])
    Return unbiased skew over requested axis.
squeeze()
    Squeeze 1 dimensional axis objects into scalars.
std([axis, skipna, ddof, split_every, ...])
    Return sample standard deviation over requested axis.
sub(other[, level, fill_value, axis])
    Return Subtraction of series and other, element-wise (binary operator sub).
sum([axis, skipna, split_every, dtype, out, ...])
    Return the sum of the values over the requested axis.
tail([n, compute])
    Last n rows of the dataset.
to_backend([backend])
    Move to a new DataFrame backend.
to_bag([index, format])
    Create a Dask Bag from a Series.
to_csv(filename, **kwargs)
    Store Dask DataFrame to CSV files.
to_dask_array([lengths, meta])
    Convert a dask DataFrame to a dask array.
to_delayed([optimize_graph])
    Convert into a list of dask.delayed objects, one per partition.
to_frame([name])
    Convert Series to DataFrame.
to_hdf(path_or_buf, key[, mode, append])
    Store Dask Dataframe to Hierarchical Data Format (HDF) files.
to_json(filename, *args, **kwargs)
    See dd.to_json docstring for more information.
to_sql(name, uri[, schema, if_exists, ...])
    See dd.to_sql docstring for more information.
to_string([max_rows])
    Render a string representation of the Series.
to_timestamp([freq, how, axis])
    Cast to DatetimeIndex of Timestamps, at beginning of period.
truediv(other[, level, fill_value, axis])
    Return Floating division of series and other, element-wise (binary operator truediv).
unique([split_every, split_out])
    Return Series of unique values in the object.
value_counts([sort, ascending, dropna, ...])
    Return a Series containing counts of unique values.
var([axis, skipna, ddof, split_every, ...])
    Return unbiased variance over requested axis.
view(dtype)
    Create a new view of the Series.
visualize([filename, format, optimize_graph])
    Render the computation of this object's task graph using graphviz.
where(cond[, other])
    Replace values where the condition is False.
Attributes
attrs
    Dictionary of global attributes of this dataset.
axes
divisions
    Tuple of npartitions + 1 values, in ascending order, marking the lower/upper bounds of each partition's index.
dtype
    Return data type.
index
    Return dask Index instance.
is_monotonic_decreasing
    Return boolean if values in the object are monotonically decreasing.
is_monotonic_increasing
    Return boolean if values in the object are monotonically increasing.
known_divisions
    Whether divisions are already known.
loc
    Purely label-location based indexer for selection by label.
name
nbytes
    Number of bytes.
ndim
    Return dimensionality.
npartitions
    Return number of partitions.
partitions
    Slice dataframe by partitions.
shape
    Return a tuple representing the dimensionality of a Series.
size
    Size of the Series or DataFrame as a Delayed object.
values
    Return a dask.array of the values of this dataframe.