Interfacing with the Pandas Package
The astropy.timeseries package is not the only package to provide
functionality related to time series. Another notable package is pandas, which provides a pandas.DataFrame
class. The main benefits of astropy.timeseries in the context of astronomical
research are the following:
- The time column is a Time object that supports very high precision representation of times, and makes it easy to convert between different time scales and formats (e.g., ISO 8601 timestamps, Julian Dates, and so on).
- The data columns can include Quantity objects with units.
- The BinnedTimeSeries class includes variable-width time bins.
- There are built-in readers for common time series file formats, as well as the ability to define custom readers/writers.
Nevertheless, there are cases where using pandas DataFrame
objects might make sense, so we provide methods to convert to/from
DataFrame objects.
Example
Consider a concise example starting from a DataFrame:
>>> import pandas
>>> import numpy as np
>>> df = pandas.DataFrame()
>>> df['a'] = [1, 2, 3]
>>> times = np.array(['2015-07-04', '2015-07-05', '2015-07-06'], dtype=np.datetime64)
>>> df.set_index(pandas.DatetimeIndex(times), inplace=True)
>>> df
            a
2015-07-04  1
2015-07-05  2
2015-07-06  3
We can convert this to an astropy TimeSeries using
from_pandas():
>>> from astropy.timeseries import TimeSeries
>>> ts = TimeSeries.from_pandas(df)
>>> ts
<TimeSeries length=3>
             time               a
             Time             int64
----------------------------- -----
2015-07-04T00:00:00.000000000     1
2015-07-05T00:00:00.000000000     2
2015-07-06T00:00:00.000000000     3
The reverse conversion, from a TimeSeries to a DataFrame, is done with
to_pandas():
>>> ts['b'] = [1.2, 3.4, 5.4]
>>> df_new = ts.to_pandas()
>>> df_new
            a    b
time
2015-07-04  1  1.2
2015-07-05  2  3.4
2015-07-06  3  5.4
Missing values in the time column are supported and are correctly converted to a pandas NaT object:
>>> ts.time[2] = np.nan
>>> ts
<TimeSeries length=3>
             time               a      b
             Time             int64 float64
----------------------------- ----- -------
2015-07-04T00:00:00.000000000     1     1.2
2015-07-05T00:00:00.000000000     2     3.4
                          ———     3     5.4
>>> df_missing = ts.to_pandas()
>>> df_missing
            a    b
time
2015-07-04  1  1.2
2015-07-05  2  3.4
NaT         3  5.4