BaseSeriesEstimator
The BaseSeriesEstimator class is a base class for estimators that take a single series (both univariate and multivariate) as input rather than a collection of time series (see BaseCollectionEstimator). This notebook describes the major design issues to keep in mind if using any class that inherits from BaseSeriesEstimator. These are:
- BaseSeriesTransformer for single series transformations
- BaseSegmenter for segmentation
- BaseAnomalyDetector for anomaly detection
To use any algorithm that extends the base estimator, all you need to understand is the meaning of the axis parameter and the capability tags. BaseSeriesEstimator handles the preprocessing required before data is used in methods such as fit and predict. Inheriting base classes do this by calling the protected method _preprocess_series. The key steps to note are:
1. Input data should be an np.ndarray, a pd.Series or a pd.DataFrame.
2. The axis function parameter describes the direction of time. If axis==0 then each column is a time series and each row is a time point, i.e. the shape of the input data is (n_timepoints, n_channels). axis==1 indicates the time series are in rows, i.e. the shape of the data is (n_channels, n_timepoints). It is important to set this correctly or check the default used, otherwise your data may be processed incorrectly.
3. The axis class attribute of an estimator controls how the input data is interpreted in methods such as fit, predict and transform. Input passed with a different axis argument is transposed to match it, as the examples below show. The input data will be converted into the type required by the estimator, as determined by the tag X_inner_type, which should be a string, either "np.ndarray" or "pd.DataFrame". 1D input data is converted to 2D in the BaseSeriesEstimator, with the layout determined by the estimator axis attribute.
4. If the estimator can only work with univariate time series (capability:multivariate set to False), then the input data must be 1D, or 2D with the channel axis of size 1.
5. If the estimator can only work with multivariate time series (capability:univariate set to False), then the input data must be 2D, with the channel axis of size greater than 1. pd.Series input is not supported in this case.
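The axis handling in steps 2 and 3 can be sketched with NumPy alone. The helper below, to_inner_layout, is hypothetical and is not aeon's implementation; it only mirrors the behaviour described above, assuming the estimator's own layout is given by an inner_axis argument.

```python
import numpy as np


def to_inner_layout(X, axis, inner_axis=0):
    """Hypothetical helper (not part of aeon): return X as a 2D array
    laid out the way an estimator with class axis `inner_axis` expects.

    axis=0 means X has shape (n_timepoints, n_channels),
    axis=1 means X has shape (n_channels, n_timepoints).
    """
    X = np.asarray(X)
    if X.ndim == 1:
        # A 1D series is univariate: add a channel dimension of size 1.
        # The axis argument is irrelevant here, only inner_axis matters.
        return X.reshape(-1, 1) if inner_axis == 0 else X.reshape(1, -1)
    if X.ndim != 2:
        raise ValueError("expected 1D or 2D input")
    # Transpose only when the caller's layout differs from the estimator's.
    return X if axis == inner_axis else X.T


by_rows = np.zeros((3, 100))  # three channels stored in rows, so axis=1
print(to_inner_layout(by_rows, axis=1).shape)  # (100, 3)
print(to_inner_layout(np.zeros(100), axis=1).shape)  # (100, 1)
```

The point of the sketch is that the axis argument describes the data you pass in, while the estimator's axis attribute describes the layout it works with internally.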
We demonstrate this with calls to private methods. This is purely to aid understanding and should not be used in practice.
[14]:
import numpy as np
import pandas as pd
from aeon.base import BaseSeriesEstimator
# We use the abstract base class for example purposes, regular classes will not
# have a class axis parameter.
bs = BaseSeriesEstimator(axis=0)
Univariate examples
[15]:
# By default, "capability:multivariate" is False, axis is 0 and X_inner_type is
# np.ndarray
# With this config, the output should always be an np.ndarray shape (100, 1)
d1 = np.random.random(size=(100))
d2 = bs._preprocess_series(d1, axis=0, store_metadata=True)
print(
    f"1. Input type = {type(d1)}, input shape = {d1.shape}, "
    f"output type = {type(d2)}, output shape = {d2.shape}"
)
1. Input type = <class 'numpy.ndarray'>, input shape = (100,), output type = <class 'numpy.ndarray'>, output shape = (100, 1)
[16]:
# The axis parameter will not change the output shape of 1D inputs such as pd.Series
# or univariate np.ndarray
d1 = np.random.random(size=(100))
d2 = bs._preprocess_series(d1, axis=1, store_metadata=True)
print(
    f"2. Input type = {type(d1)}, input shape = {d1.shape}, "
    f"output type = {type(d2)}, output shape = {d2.shape}"
)
2. Input type = <class 'numpy.ndarray'>, input shape = (100,), output type = <class 'numpy.ndarray'>, output shape = (100, 1)
[17]:
# A 2D array with the channel axis of size 1 will produce the same result
d1 = np.random.random(size=(100, 1))
d2 = bs._preprocess_series(d1, axis=0, store_metadata=True)
print(
    f"3. Input type = {type(d1)}, input shape = {d1.shape}, "
    f"output type = {type(d2)}, output shape = {d2.shape}"
)
3. Input type = <class 'numpy.ndarray'>, input shape = (100, 1), output type = <class 'numpy.ndarray'>, output shape = (100, 1)
[18]:
# The shape used can be swapped, but the axis parameter must be set correctly
d1 = np.random.random(size=(1, 100))
d2 = bs._preprocess_series(d1, axis=1, store_metadata=True)
print(
    f"4. Input type = {type(d1)}, input shape = {d1.shape}, "
    f"output type = {type(d2)}, output shape = {d2.shape}"
)
4. Input type = <class 'numpy.ndarray'>, input shape = (1, 100), output type = <class 'numpy.ndarray'>, output shape = (100, 1)
[19]:
# Other types will be converted to X_inner_type
d1 = pd.Series(np.random.random(size=(100)))
d2 = bs._preprocess_series(d1, axis=0, store_metadata=True)
print(
    f"5. Input type = {type(d1)}, input shape = {d1.shape}, "
    f"output type = {type(d2)}, output shape = {d2.shape}"
)
5. Input type = <class 'pandas.core.series.Series'>, input shape = (100,), output type = <class 'numpy.ndarray'>, output shape = (100, 1)
[20]:
d1 = pd.DataFrame(np.random.random(size=(100, 1)))
d2 = bs._preprocess_series(d1, axis=0, store_metadata=True)
print(
    f"6. Input type = {type(d1)}, input shape = {d1.shape}, "
    f"output type = {type(d2)}, output shape = {d2.shape}"
)
6. Input type = <class 'pandas.core.frame.DataFrame'>, input shape = (100, 1), output type = <class 'numpy.ndarray'>, output shape = (100, 1)
[21]:
# Passing a multivariate array will raise an error if capability:multivariate is False
d1 = np.random.random(size=(100, 5))
try:
    bs._preprocess_series(d1, axis=0, store_metadata=True)
except ValueError as e:
    print(f"7. {e}")
7. Multivariate data not supported by BaseSeriesEstimator
Multivariate examples
[22]:
# The capability:multivariate tag must be set to True to work with multivariate series
# If the estimator does not have this tag, then the implementation cannot handle the
# input
bs = bs.set_tags(**{"capability:multivariate": True})
# Both of these can be True at the same time, but for example's sake we disable
# univariate
bs = bs.set_tags(**{"capability:univariate": False})
[23]:
# axis 0 means each column is a time series
d1 = np.random.random(size=(100, 5))
d2 = bs._preprocess_series(d1, axis=0, store_metadata=True)
print(
    f"1. Input type = {type(d1)}, input shape = {d1.shape}, "
    f"output type = {type(d2)}, output shape = {d2.shape}"
)
n_channels = bs.metadata_["n_channels"]
print(f"n_channels: {n_channels}")
1. Input type = <class 'numpy.ndarray'>, input shape = (100, 5), output type = <class 'numpy.ndarray'>, output shape = (100, 5)
n_channels: 5
[24]:
# axis 1 means each row is a time series. If the axis is set incorrectly, the
# output shape will be wrong
d1 = np.random.random(size=(100, 5))
d2 = bs._preprocess_series(d1, axis=1, store_metadata=True)
print(
    f"2. Input type = {type(d1)}, input shape = {d1.shape}, "
    f"output type = {type(d2)}, output shape = {d2.shape}"
)
n_channels = bs.metadata_["n_channels"]
print(f"n_channels: {n_channels}")
2. Input type = <class 'numpy.ndarray'>, input shape = (100, 5), output type = <class 'numpy.ndarray'>, output shape = (5, 100)
n_channels: 100
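Because a wrong axis does not raise an error, it can be worth checking what a given axis value implies about your data before calling an estimator. The sketch below uses a hypothetical helper, series_dims, which is not part of aeon; it simply applies the shape convention described earlier to a 2D array.

```python
import numpy as np


def series_dims(X, axis):
    """Hypothetical helper (not part of aeon): return
    (n_timepoints, n_channels) for a 2D array under the axis convention.

    axis=0 reads the shape as (n_timepoints, n_channels),
    axis=1 reads it as (n_channels, n_timepoints).
    """
    X = np.asarray(X)
    if X.ndim != 2:
        raise ValueError("expected a 2D array")
    return X.shape[axis], X.shape[1 - axis]


X = np.random.random(size=(100, 5))  # intended: 100 time points, 5 channels
print(series_dims(X, axis=0))  # (100, 5)
print(series_dims(X, axis=1))  # (5, 100): read as 100 channels of length 5
```

If the reported number of channels is much larger than the number of time points, the axis is probably set the wrong way round for your data.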
[25]:
# Conversions work similarly to univariate series, but there is more emphasis on
# correctly setting the axis parameter
d1 = pd.DataFrame(np.random.random(size=(100, 5)))
d2 = bs._preprocess_series(d1, axis=0, store_metadata=True)
print(
    f"3. Input type = {type(d1)}, input shape = {d1.shape}, "
    f"output type = {type(d2)}, output shape = {d2.shape}"
)
n_channels = bs.metadata_["n_channels"]
print(f"n_channels: {n_channels}")
3. Input type = <class 'pandas.core.frame.DataFrame'>, input shape = (100, 5), output type = <class 'numpy.ndarray'>, output shape = (100, 5)
n_channels: 5
[26]:
# Passing a univariate array will raise an error if capability:univariate is False
d1 = pd.Series(np.random.random(size=(100,)))
try:
    d2 = bs._preprocess_series(d1, axis=0, store_metadata=True)
except ValueError as e:
    print(f"4. {e}")
4. Univariate data not supported by BaseSeriesEstimator
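The two capability checks seen in the examples (steps 4 and 5 of the list at the top) can be summarised in a short NumPy sketch. The function check_capabilities and its univariate and multivariate arguments are hypothetical stand-ins for the capability:univariate and capability:multivariate tags; this is not aeon's code, and the error messages are illustrative only.

```python
import numpy as np


def check_capabilities(X, axis, univariate=True, multivariate=False):
    """Hypothetical helper (not part of aeon): count the channels of X
    under the axis convention and reject unsupported input."""
    X = np.asarray(X)
    if X.ndim == 1:
        n_channels = 1  # 1D input is always a single channel
    elif X.ndim == 2:
        # axis=0: shape is (n_timepoints, n_channels), channels on axis 1
        # axis=1: shape is (n_channels, n_timepoints), channels on axis 0
        n_channels = X.shape[1 - axis]
    else:
        raise ValueError("expected 1D or 2D input")
    if n_channels > 1 and not multivariate:
        raise ValueError("Multivariate data not supported")
    if n_channels == 1 and not univariate:
        raise ValueError("Univariate data not supported")
    return n_channels


print(check_capabilities(np.zeros((1, 100)), axis=1))  # 1
try:
    check_capabilities(np.zeros((100, 5)), axis=0)
except ValueError as e:
    print(e)  # Multivariate data not supported
```

Note that the outcome depends on the axis argument: the same (100, 1) array counts as one channel with axis=0 and as 100 channels with axis=1.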
If implementing a new estimator that extends BaseSeriesEstimator, set the axis to the layout you want to work with by passing it to the BaseSeriesEstimator constructor. If your estimator can handle multivariate series, set the capability:multivariate tag to True. Set the X_inner_type tag if you wish to use a datatype other than np.ndarray.