anndata.experimental.AnnCollection

class anndata.experimental.AnnCollection(adatas, join_obs='inner', join_obsm=None, join_vars=None, label=None, keys=None, index_unique=None, convert=None, harmonize_dtypes=True, indices_strict=True)

Lazily concatenate AnnData objects along the obs axis.

This class doesn’t copy data from underlying AnnData objects, but lazily subsets using a joint index of observations and variables. It also allows on-the-fly application of prespecified converters to .obs attributes of the AnnData objects.

Subsetting of this object returns an AnnCollectionView, which provides views of .obs, .obsm, .layers, .X from the underlying AnnData objects.

Parameters

adatas : Sequence[AnnData] | {str: AnnData}: The objects to be lazily concatenated. If a Mapping is passed, its keys are used for the keys argument and its values are concatenated.
join_obs : {‘inner’, ‘outer’} | None (default: 'inner'): If “inner” is specified, all .obs attributes from adatas will be inner joined and copied to this object. If “outer” is specified, all .obs attributes from adatas will be outer joined and copied to this object. For “inner” and “outer”, subset objects will access .obs of this object, not the original .obs attributes of adatas. If None, nothing is copied to this object’s .obs; a subset object will directly access .obs attributes of adatas (with proper reindexing and dtype conversions). For None, the inner join rule is used to select columns of .obs of adatas.
join_obsm : {‘inner’} | None (default: None): If “inner” is specified, all .obsm attributes from adatas will be inner joined and copied to this object. Subset objects will access .obsm of this object, not the original .obsm attributes of adatas. If None, nothing is copied to this object’s .obsm; a subset object will directly access .obsm attributes of adatas (with proper reindexing and dtype conversions). For both options the inner join rule for the underlying .obsm attributes is used.
join_vars : {‘inner’} | None (default: None): Specify how to join adatas along the var axis. If None, assumes all adatas have the same variables. If “inner”, the intersection of all variables in adatas will be used.
label : str | None (default: None): Column in .obs to place batch information in. If it’s None, no column is added.
keys : Sequence[str] | None (default: None): Names for each object being added. These values are used for column values for label or appended to the index if index_unique is not None. Defaults to incrementing integer labels.
index_unique : str | None (default: None): Whether to make the index unique by using the keys. If provided, this is the delimiter in “{orig_idx}{index_unique}{key}”. When None, the original indices are kept.
convert : Callable | {str: Callable} | {str: {str: Callable}} | None (default: None): You can pass a function or a Mapping of functions which will be applied to the values of attributes (.obs, .obsm, .layers, .X) or to specific keys of these attributes in the subset object. Specify an attribute and a key (if needed) as keys of the passed Mapping and a function to be applied as a value.
harmonize_dtypes : bool (default: True): If True, all retrieved arrays from subset objects will have the same dtype.
indices_strict : bool (default: True): If True, arrays from the subset objects will always follow the order of the indices in the selection used to subset. This parameter can be set to False if the order in the returned arrays is not important, for example, when using them for stochastic gradient descent; in that case subsetting can be somewhat faster.
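
The naming rule described for keys and index_unique can be illustrated with a small pure-Python sketch. It only mimics the documented pattern “{orig_idx}{index_unique}{key}” and the documented default of incrementing integer keys; the helper name `combined_obs_names` and the cell names are hypothetical, and this is not how anndata implements it internally.

```python
from typing import List, Optional, Sequence


def combined_obs_names(
    obs_names_per_object: Sequence[Sequence[str]],
    keys: Optional[Sequence[str]] = None,
    index_unique: Optional[str] = None,
) -> List[str]:
    """Illustrative only: apply the documented naming rule to plain lists."""
    if keys is None:
        # Documented default: incrementing integer labels.
        keys = [str(i) for i in range(len(obs_names_per_object))]
    names = []
    for key, obs_names in zip(keys, obs_names_per_object):
        for orig_idx in obs_names:
            if index_unique is None:
                # When index_unique is None, the original indices are kept.
                names.append(orig_idx)
            else:
                names.append(f"{orig_idx}{index_unique}{key}")
    return names


# Two hypothetical objects that share the observation name "cell0".
per_object = [["cell0", "cell1"], ["cell0"]]
print(combined_obs_names(per_object, keys=["a", "b"], index_unique="-"))
# ['cell0-a', 'cell1-a', 'cell0-b']
print(combined_obs_names(per_object))
# ['cell0', 'cell1', 'cell0']
```

Without index_unique the duplicated name `cell0` survives, which is why a delimiter is useful when the source objects reuse observation names.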

Examples

>>> from anndata.experimental import AnnCollection
>>> from scanpy.datasets import pbmc68k_reduced, pbmc3k_processed
>>> adata1, adata2 = pbmc68k_reduced(), pbmc3k_processed()
>>> adata1.shape
(700, 765)
>>> adata2.shape
(2638, 1838)
>>> dc = AnnCollection([adata1, adata2], join_vars='inner')
>>> dc
AnnCollection object with n_obs × n_vars = 3338 × 208
  constructed from 2 AnnData objects
    view of obsm: 'X_pca', 'X_umap'
    obs: 'n_genes', 'percent_mito', 'n_counts', 'louvain'
>>> batch = dc[100:200] # AnnCollectionView
>>> batch
AnnCollectionView object with n_obs × n_vars = 100 × 208
    obsm: 'X_pca', 'X_umap'
    obs: 'n_genes', 'percent_mito', 'n_counts', 'louvain'
>>> batch.X.shape
(100, 208)
>>> len(batch.obs['louvain'])
100
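
In the example above, the two datasets have 765 and 1838 variables, and join_vars='inner' leaves the 208 they share. The idea of an inner join over variable names can be sketched in plain Python as follows. The helper `inner_join_vars` and the gene names are hypothetical, and keeping the order of the first object is an assumption of this sketch, not a documented guarantee.

```python
from typing import List, Sequence


def inner_join_vars(var_names_per_object: Sequence[Sequence[str]]) -> List[str]:
    """Illustrative only: keep the variable names present in every object."""
    shared = set(var_names_per_object[0])
    for names in var_names_per_object[1:]:
        shared &= set(names)
    # Assumption of this sketch: preserve the order of the first object.
    return [name for name in var_names_per_object[0] if name in shared]


# Hypothetical gene names for two objects.
vars_a = ["CD3D", "CD79A", "LYZ", "NKG7"]
vars_b = ["LYZ", "MS4A1", "CD3D"]
print(inner_join_vars([vars_a, vars_b]))
# ['CD3D', 'LYZ']
```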

Attributes

`attrs_keys`	Dict of all accessible attributes and their keys.
`convert`	On the fly converters for keys of attributes and data matrix.
`has_backed`	`True` if `adatas` have backed AnnData objects, `False` otherwise.
`n_obs`	Number of observations.
`n_vars`	Number of variables/features.
`obs`	One-dimensional annotation of observations.
`obsm`	Multi-dimensional annotation of observations.
`shape`	Shape of the lazily concatenated data matrix.
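
The three shapes accepted by `convert` (a single function, a mapping from attribute to function, and a mapping from attribute to key to function) can be pictured with the following pure-Python sketch. The helper `resolve_converter` is hypothetical and only reflects one reading of the parameter description; it is not anndata's internal lookup logic.

```python
from typing import Any, Callable, Optional


def resolve_converter(
    convert: Any, attr: str, key: Optional[str] = None
) -> Optional[Callable]:
    """Illustrative only: pick the function that applies to attr (and key)."""
    if convert is None:
        return None
    if callable(convert):
        # A single function: assumed here to apply everywhere.
        return convert
    entry = convert.get(attr)
    if entry is None or callable(entry):
        # Either nothing is registered, or one function covers the attribute.
        return entry
    # Nested mapping: one function per key of the attribute.
    return entry.get(key)


convert = {
    "X": lambda values: [float(v) for v in values],
    "obs": {"louvain": lambda labels: [label.upper() for label in labels]},
}

to_float = resolve_converter(convert, "X")
to_upper = resolve_converter(convert, "obs", "louvain")
print(to_float([1, 2]))
# [1.0, 2.0]
print(to_upper(["b cells"]))
# ['B CELLS']
print(resolve_converter(convert, "obs", "n_genes"))
# None
```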

Methods

`iterate_axis`(batch_size[, axis, shuffle, …])	Iterate the lazy object over an axis.
`lazy_attr`(attr[, key])	Get a subsettable key from an attribute (array-like) or an attribute.
`to_adata`()	Convert this AnnCollection object to an AnnData object.
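
As a rough picture of what iterating over an axis in batches involves, the sketch below splits the positions along one axis into batches of batch_size, optionally shuffled. The function `iterate_index_batches` and its `seed` argument are hypothetical and exist only for this illustration; the actual `iterate_axis` method operates on the lazy object itself and may differ in what it yields.

```python
import random
from typing import Iterator, List, Optional


def iterate_index_batches(
    n: int, batch_size: int, shuffle: bool = False, seed: Optional[int] = None
) -> Iterator[List[int]]:
    """Illustrative only: yield batches of positions along one axis."""
    indices = list(range(n))
    if shuffle:
        # The seed is a hypothetical convenience for reproducible shuffling.
        random.Random(seed).shuffle(indices)
    for start in range(0, n, batch_size):
        yield indices[start:start + batch_size]


batches = list(iterate_index_batches(10, 4))
print(batches)
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]

# With shuffling, every position still appears exactly once.
shuffled = [i for batch in iterate_index_batches(10, 4, shuffle=True, seed=0) for i in batch]
print(sorted(shuffled) == list(range(10)))
# True
```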