anndata.AnnData
- class anndata.AnnData(X=None, obs=None, var=None, uns=None, obsm=None, varm=None, layers=None, raw=None, dtype='float32', shape=None, filename=None, filemode=None, asview=False, *, obsp=None, varp=None, oidx=None, vidx=None)
An annotated data matrix.
AnnData stores a data matrix X together with annotations of observations obs (obsm, obsp), variables var (varm, varp), and unstructured annotations uns.

An AnnData object adata can be sliced like a DataFrame, for instance adata_subset = adata[:, list_of_variable_names]. AnnData's basic structure is similar to R's ExpressionSet [Huber15]. If a backing file is set through .filename (an .h5ad-formatted HDF5 file), the data remains on disk but is automatically loaded into memory when needed. See this blog post for more details.

Parameters
- X : Union[ndarray, spmatrix, DataFrame, None] (default: None)
  A #observations × #variables data matrix. A view of the data is used if the data type matches; otherwise, a copy is made.
- obs : Union[DataFrame, Mapping[str, Iterable[Any]], None] (default: None)
  Key-indexed one-dimensional observations annotation of length #observations.
- var : Union[DataFrame, Mapping[str, Iterable[Any]], None] (default: None)
  Key-indexed one-dimensional variables annotation of length #variables.
- uns : Optional[Mapping[str, Any]] (default: None)
  Key-indexed unstructured annotation.
- obsm : Union[ndarray, Mapping[str, Sequence[Any]], None] (default: None)
  Key-indexed multi-dimensional observations annotation of length #observations. If passing an ndarray, it needs to have a structured datatype.
- varm : Union[ndarray, Mapping[str, Sequence[Any]], None] (default: None)
  Key-indexed multi-dimensional variables annotation of length #variables. If passing an ndarray, it needs to have a structured datatype.
- layers : Optional[Mapping[str, Union[ndarray, spmatrix]]] (default: None)
  Key-indexed multi-dimensional arrays aligned to dimensions of X.
- dtype : Union[dtype, str] (default: 'float32')
  Data type used for storage.
- shape : Optional[Tuple[int, int]] (default: None)
  Shape tuple (#observations, #variables). Can only be provided if X is None.
- filename : Optional[PathLike] (default: None)
  Name of backing file. See h5py.File.
- filemode : Optional[Literal['r', 'r+']] (default: None)
  Open mode of backing file. See h5py.File.
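The parameter descriptions above imply shape rules that link X to the other arguments. obs and obsm follow the observations axis. var and varm follow the variables axis. Layers match the full shape of X. shape is accepted only when X is None. Below is a minimal sketch of those rules. It assumes NumPy arrays for the matrices and plain dicts of sequences for obs and var. check_aligned is a hypothetical helper written for this illustration and is not part of the anndata API. The example keys (batch, gene_symbol, X_pca, counts) are made up, and the real constructor performs its own validation and conversion.

```python
import numpy as np


def check_aligned(X=None, obs=None, var=None, obsm=None, varm=None,
                  layers=None, shape=None):
    """Hypothetical helper (not anndata API): verify the documented shape
    rules and return (n_obs, n_vars)."""
    if X is not None and shape is not None:
        raise ValueError("shape can only be provided if X is None")
    if X is not None:
        n_obs, n_vars = np.shape(X)
    elif shape is not None:
        n_obs, n_vars = shape
    else:
        raise ValueError("this sketch needs either X or shape")

    # One-dimensional annotations: every column has one entry per obs/var.
    for axis, table, n in (("obs", obs, n_obs), ("var", var, n_vars)):
        for key, column in (table or {}).items():
            if len(column) != n:
                raise ValueError(f"{axis}[{key!r}] has length {len(column)}, expected {n}")

    # Multi-dimensional annotations: only the first axis must match.
    for axis, arrays, n in (("obsm", obsm, n_obs), ("varm", varm, n_vars)):
        for key, arr in (arrays or {}).items():
            if np.shape(arr)[0] != n:
                raise ValueError(f"{axis}[{key!r}] has {np.shape(arr)[0]} rows, expected {n}")

    # Layers: same two-dimensional shape as X.
    for key, arr in (layers or {}).items():
        if np.shape(arr) != (n_obs, n_vars):
            raise ValueError(f"layers[{key!r}] has shape {np.shape(arr)}, expected {(n_obs, n_vars)}")

    return n_obs, n_vars


# Usage: 3 observations x 2 variables.
X = np.zeros((3, 2), dtype=np.float32)
print(check_aligned(
    X=X,
    obs={"batch": ["b1", "b1", "b2"]},
    var={"gene_symbol": ["A", "B"]},
    obsm={"X_pca": np.ones((3, 5))},
    layers={"counts": np.zeros((3, 2))},
))  # (3, 2)
```

Only lengths and leading dimensions are compared here. Checking index labels or dtypes is deliberately left out to keep the sketch short.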
See also
read_h5ad, read_csv, read_excel, read_hdf, read_loom, read_zarr, read_mtx, read_text, read_umi_tools
Notes
AnnData stores observations (samples) of variables/features in the rows of a matrix. This is the convention of the modern classics of statistics [Hastie09] and machine learning [Murphy12], the convention of dataframes both in R and Python, and the convention of the established statistics and machine learning packages in Python (statsmodels, scikit-learn).

Single-dimensional annotations of the observations and variables are stored in the obs and var attributes as DataFrames. This is intended for metrics calculated over their axes. Multi-dimensional annotations are stored in obsm and varm, which are aligned to the object's observation and variable dimensions, respectively. Square matrices representing graphs are stored in obsp and varp, with both of their own dimensions aligned to their associated axis. Additional measurements across both observations and variables are stored in layers.

Indexing into an AnnData object can be performed by relative position with numeric indices (like pandas' iloc) or by labels (like loc). To avoid ambiguity with numeric indexing into observations or variables, the constructor converts the indexes of the AnnData object to strings.

Subsetting an AnnData object by indexing into it also subsets its elements according to the dimensions they are aligned to. This means an operation like adata[list_of_obs, :] will also subset obs, obsm, and layers.

Subsetting an AnnData object returns a view into the original object, so very little additional memory is used upon subsetting. This is achieved lazily: the constituent arrays are subset only when accessed. Copying a view generates an equivalent "real" AnnData object. Attempting to modify a view (at any attribute except X) is handled in a copy-on-modify manner: the view is converted in place into a real AnnData object that holds its own copy of the data. Here's an example:
batch1 = adata[adata.obs["batch"] == "batch1", :]
batch1.obs["value"] = 0  # This makes batch1 a "real" AnnData object
At the end of this snippet, adata was not modified, and batch1 is its own AnnData object with its own data.

Similar to Bioconductor's ExpressionSet and scipy.sparse matrices, subsetting an AnnData object retains the dimensionality of its constituent arrays. Therefore, unlike the classes exposed by pandas, numpy, and xarray, there is no concept of a one-dimensional AnnData object. AnnData objects always have two inherent dimensions, obs and var. Maintaining the dimensionality of the AnnData object also allows for consistent handling of scipy.sparse matrices and numpy arrays.

Attributes
Transpose whole object.
Change to backing mode by setting the filename of a .h5ad file.
True if object is a view of another AnnData object, False otherwise.
True if object is backed on disk, False otherwise.
Dictionary-like object with values of the same dimensions as X.
Number of observations.
Number of variables/features.
One-dimensional annotation of observations (pd.DataFrame).
Names of observations (alias for .obs.index).
Multi-dimensional annotation of observations (mutable structured ndarray).
Pairwise annotation of observations, a mutable mapping with array-like values.
Unstructured annotation (ordered dictionary).
One-dimensional annotation of variables/features (pd.DataFrame).
Names of variables (alias for .var.index).
Multi-dimensional annotation of variables/features (mutable structured ndarray).
Pairwise annotation of variables/features, a mutable mapping with array-like values.
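Several of the attributes above, such as whether the object is a view and its numbers of observations and variables, tie back to the view semantics described in the Notes. The toy model below sketches three of those rules. First, subsetting returns a lazy view that reads its parent's data on access. Second, changing an annotation on a view first converts the view in place into an independent object (copy-on-modify). Third, indexing with a single integer still yields a two-dimensional result. ToyAnnData, its is_view property, and set_obs are hypothetical names for this illustration. obs is a plain dict of lists instead of a DataFrame. Real AnnData triggers copy-on-modify through ordinary assignment such as batch1.obs["value"] = 0, not through a setter method.

```python
import numpy as np


class ToyAnnData:
    """Hypothetical toy model of AnnData's view behavior (not the real class)."""

    def __init__(self, X, obs=None, *, parent=None, oidx=None, vidx=None):
        self._X = X
        self._obs = obs if obs is not None else {}
        self._parent, self._oidx, self._vidx = parent, oidx, vidx

    @property
    def is_view(self):
        return self._parent is not None

    @property
    def X(self):
        if self.is_view:
            # Lazy subsetting: the parent's matrix is indexed only on access.
            return self._parent.X[np.ix_(self._oidx, self._vidx)]
        return self._X

    @property
    def obs(self):
        if self.is_view:
            return {k: [col[i] for i in self._oidx] for k, col in self._parent.obs.items()}
        return self._obs

    @property
    def shape(self):
        return self.X.shape

    def __getitem__(self, index):
        oidx, vidx = index
        n_obs, n_vars = self.shape
        # Turn ints, slices, masks, and lists into 1-D position arrays, so a
        # single integer keeps its axis and the result stays two-dimensional.
        oidx = np.atleast_1d(np.arange(n_obs)[oidx])
        vidx = np.atleast_1d(np.arange(n_vars)[vidx])
        return ToyAnnData(None, parent=self, oidx=oidx, vidx=vidx)

    def set_obs(self, key, values):
        if self.is_view:
            # Copy-on-modify: materialize this view's data, then detach it
            # from the parent so the parent is never changed.
            self._X, self._obs = self.X.copy(), self.obs
            self._parent = self._oidx = self._vidx = None
        self._obs[key] = list(values)


adata = ToyAnnData(np.arange(6.0).reshape(3, 2), obs={"batch": ["b1", "b2", "b1"]})
batch1 = adata[np.array(adata.obs["batch"]) == "b1", :]
print(batch1.is_view, batch1.shape)            # True (2, 2)
batch1.set_obs("value", [0, 0])
print(batch1.is_view, "value" in adata.obs)    # False False
print(adata[0, :].shape, adata.X[0].shape)     # (1, 2) (2,)
```

The last line contrasts the two behaviors. Plain NumPy integer indexing drops an axis, giving shape (2,), while the toy view keeps both axes, giving shape (1, 2). This mirrors the point in the Notes that there is no one-dimensional AnnData object.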
Methods
chunk_X([select, replace]): Return a chunk of the data matrix X with random or specified indices.
chunked_X([chunk_size]): Return an iterator over the rows of the data matrix X.
concatenate(*adatas[, join, batch_key, …]): Concatenate along the observations axis.
copy([filename]): Full copy, optionally on disk.
obs_keys(): List keys of observation annotation obs.
obs_names_make_unique([join]): Makes the index unique by appending a number string to each duplicate index element: '1', '2', etc.
obs_vector(k, *[, layer]): Convenience function for returning a one-dimensional ndarray of values from X, layers[layer], or obs.
List keys of observation annotation obsm.
rename_categories(key, categories)
strings_to_categoricals([df]): Transform string annotations to categoricals.
to_df([layer]): Generate shallow DataFrame.
Load backed AnnData object into memory.
Transpose whole object.
uns_keys(): List keys of unstructured annotation.
var_keys(): List keys of variable annotation var.
var_names_make_unique([join]): Makes the index unique by appending a number string to each duplicate index element: '1', '2', etc.
var_vector(k, *[, layer]): Convenience function for returning a one-dimensional ndarray of values from X, layers[layer], or var.
List keys of variable annotation varm.
write([filename, compression, …]): Write .h5ad-formatted HDF5 file.
write_csvs(dirname[, skip_data, sep]): Write annotation to .csv files.
write_h5ad([filename, compression, …]): Write .h5ad-formatted HDF5 file.
write_loom(filename[, write_obsm_varm]): Write .loom-formatted HDF5 file.
write_zarr(store[, chunks]): Write a hierarchical Zarr array store.
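The obs_names_make_unique and var_names_make_unique entries above describe appending a number string to each duplicate index element. Below is a minimal plain-Python sketch of that rule. make_unique is a hypothetical helper, not the anndata implementation. The default join string '-' is an assumption, since the listing only shows that a join argument exists. In this sketch, a generated name that would clash with an existing name is simply skipped in favor of the next number.

```python
from collections import defaultdict


def make_unique(names, join="-"):
    """Hypothetical sketch: keep the first occurrence of each name and give
    later duplicates a suffix join + '1', join + '2', ..."""
    taken = set(names)              # reserve every original name up front
    next_suffix = defaultdict(int)  # last suffix tried for each base name
    seen = set()
    result = []
    for name in names:
        if name not in seen:
            seen.add(name)
            result.append(name)
            continue
        # Duplicate: try name-1, name-2, ... until the label is unused.
        while True:
            next_suffix[name] += 1
            candidate = f"{name}{join}{next_suffix[name]}"
            if candidate not in taken:
                break
        taken.add(candidate)
        result.append(candidate)
    return result


print(make_unique(["a", "a", "b", "a"]))  # ['a', 'a-1', 'b', 'a-2']
print(make_unique(["a", "a", "a-1"]))     # ['a', 'a-2', 'a-1']
```

Reserving all original names before generating any suffix ensures that a generated label never duplicates a name that appears later in the index.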