anndata - Annotated data#
anndata is a Python package for handling annotated data matrices in memory and on disk, positioned between pandas and xarray. anndata offers a broad range of computationally efficient features including, among others, sparse data support, lazy operations, and a PyTorch interface.
Discuss development on GitHub.
Read the documentation.
Ask questions on the scverse Discourse.
Install via
pip install anndata
or
conda install anndata -c conda-forge
See Scanpy’s documentation for usage related to single cell data. anndata was initially built for Scanpy.
anndata is part of the scverse project (website, governance) and is fiscally sponsored by NumFOCUS. Please consider making a tax-deductible donation to help the project pay for developer time, professional services, travel, workshops, and a variety of other needs.
Citation#
If you use anndata in your work, please cite the anndata pre-print as follows:
anndata: Annotated data
Isaac Virshup, Sergei Rybakov, Fabian J. Theis, Philipp Angerer, F. Alexander Wolf
bioRxiv 2021 Dec 19. doi: 10.1101/2021.12.16.473007.
You can cite the scverse publication as follows:
The scverse project provides a computational ecosystem for single-cell omics data analysis
Isaac Virshup, Danila Bredikhin, Lukas Heumos, Giovanni Palla, Gregor Sturm, Adam Gayoso, Ilia Kats, Mikaela Koutrouli, Scverse Community, Bonnie Berger, Dana Pe’er, Aviv Regev, Sarah A. Teichmann, Francesca Finotello, F. Alexander Wolf, Nir Yosef, Oliver Stegle & Fabian J. Theis
Nat Biotechnol. 2023 Apr 10. doi: 10.1038/s41587-023-01733-8.
News#
Muon paper published 2022-02-02#
Muon has been published in Genome Biology [^cite_bredikhin22].
Muon is a framework for multimodal data built on top of AnnData.
COVID-19 datasets distributed as h5ad 2020-04-01#
In a joint initiative, the Wellcome Sanger Institute, the Human Cell Atlas, and the CZI distribute datasets related to COVID-19 via anndata’s h5ad files: covid19cellatlas.org.
Latest additions#
Version 0.9#
0.9.2 (unreleased)#
Bugfix
Views of awkward.Arrays now work with awkward>=2.3 #1040 @ivirshup
Fix ufuncs of views like adata.X[:10].cov(axis=0) returning views #1043 @flying-sheep
Fix instantiating AnnData where .X is a DataFrame with an integer valued index #1002 @flying-sheep
Fix read_zarr() when used on zarr.Group #1057 @ivirshup
0.9.1 2023-04-11#
Bugfix
0.9.0 2023-04-11#
Features
Added experimental support for dask arrays #813 @syelman @rahulbshrestha
obsm, varm and uns can now hold AwkwardArrays #647 @giovp, @grst, @ivirshup
Added experimental functions anndata.experimental.read_dispatched() and anndata.experimental.write_dispatched(), which allow customizing IO with a callback #873 @ilan-gold @ivirshup
Better error messages during IO #734 @flying-sheep, @ivirshup
Unordered categorical columns are no longer cast to object during anndata.concat() #763 @ivirshup
Documentation
New tutorials for experimental features
File format description now includes a more formal specification #882 @ivirshup
Interoperability: new page on interoperability with other packages #831 @ivirshup
Expanded the docstring documentation for the backed argument of anndata.read_h5ad() #812 @jeskowagner
Documented how to use alternative compression methods for the h5ad file format, see AnnData.write_h5ad() #857 @nigeil
Breaking changes
The AnnData dtype argument no longer defaults to float32 #854 @ivirshup
The previously deprecated force_dense argument of AnnData.write_h5ad() has been removed #855 @ivirshup
Previously deprecated behaviour around storing adjacency matrices in uns has been removed #866 @ivirshup
Other updates
Deprecations
AnnData.concatenate() is now deprecated in favour of anndata.concat() #845 @ivirshup
Bug fixes
Fixed order dependent outer concatenation bug #904 @ivirshup, reported by @szalata
Fixed bug in renaming categories #790 @ivirshup, reported by @perrin-isir
Fixed IO bug when keys in uns ended in _categories #806 @ivirshup, reported by @Hrovatin
Fixed raw.to_adata not populating obs aligned values when raw was assigned through the setter #939 @ivirshup
Version 0.8#
0.8.1 (unreleased)#
Bug fixes
Fix warning from rename_categories #790 I Virshup
Remove backwards compat checks for categories in uns when we can tell the file is new enough #790 I Virshup
Categorical arrays are now created with a Python bool instead of a numpy.bool_ #856
Documentation
0.8.0 14th March, 2022#
IO Specification
Warning
The on disk format of AnnData objects has been updated with this release.
Previous releases of anndata will not be able to read all files written by this version.
For discussion of possible future solutions to this issue, see #698
Internal handling of IO has been overhauled.
This should make it much easier to support new datatypes, use partial access, and use AnnData internally in other formats.
Each element should be tagged with an encoding_type and encoding_version. See updated docs on the file format.
Support for nullable integer and boolean data arrays. More data types to come!
Experimental support for low level access to the IO API via read_elem() and write_elem()
Features
Added PyTorch dataloader AnnLoader and lazy concatenation object AnnCollection. See the tutorials #416 S Rybakov
Compatibility with h5ad files written from Julia #569 I Kats
Many logging messages that should have been warnings are now warnings #650 I Virshup
Significantly more efficient anndata.read_umi_tools() #661 I Virshup
Fixed deepcopy of a copy of a view retaining sparse matrix view mixin type #670 M Klein
In many cases X can now be None #463 R Cannoodt #677 I Virshup. Remaining work is documented in #467.
Removed hard xlrd dependency I Virshup
obs and var dataframes are no longer copied by default on AnnData instantiation #371 I Virshup
Bug fixes
Fixed issue where .copy was creating sparse matrix views when copying #670 michalk8
Fixed issue where the .X matrix read in from zarr would always have float32 values #701 I Virshup
Raw.to_adata now includes obsp in the output #404 G Eraslan
Dependencies
xlrd dropped as a hard dependency
Now requires h5py v3.0.0 or newer