datalad create-sibling-ria(1) | General Commands Manual | datalad create-sibling-ria(1) |
datalad create-sibling-ria - creates a sibling to a dataset in a RIA store
datalad create-sibling-ria [-h] -s NAME [-d DATASET] [--storage-name NAME] [--alias ALIAS] [--post-update-hook] [--shared {false|true|umask|group|all|world|everybody|0xxx}] [--group GROUP] [--storage-sibling MODE] [--existing MODE] [--new-store-ok] [--trust-level TRUST-LEVEL] [-r] [-R LEVELS] [--no-storage-sibling] [--push-url ria+<ssh|file>://<host>[/path]] [--version] ria+<ssh|file|https>://<host>[/path]
Communication with a dataset in a RIA store is implemented via two siblings: a regular Git remote (repository sibling) and a git-annex special remote for data transfer (storage sibling), with the former having a publication dependency on the latter. By default, the name of the storage sibling is derived from the repository sibling's name by appending "-storage".
The store's base path must either not exist, be an empty directory, or be a valid RIA store.
RIA URL format
~~~~~~~~~~~~~~
Interactions with new or existing RIA stores require RIA URLs to identify the store or specific datasets inside of it.
The general structure of a RIA URL pointing to a store takes the form ria+ssh://[user@]hostname:/absolute/path/to/ria-store or ria+file:///absolute/path/to/ria-store.
The general structure of a RIA URL pointing to a dataset in a store (for example for cloning) takes a similar form, but appends either the dataset's UUID or a ~ symbol followed by the dataset's alias name. In addition, specific version identifiers can be appended to the URL with an additional @ symbol.
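As a minimal sketch of how these URL pieces fit together, the following helper assembles a dataset URL from a store URL. It assumes that the dataset part is attached to the store URL with a `#` separator, which the text above does not spell out. The function name `ria_dataset_url` and the example host, path, alias, and version are hypothetical.

```python
from typing import Optional


def ria_dataset_url(store_url: str,
                    dataset_id: Optional[str] = None,
                    alias: Optional[str] = None,
                    version: Optional[str] = None) -> str:
    """Build a RIA URL for a dataset in a store (illustrative helper).

    Assumption: the dataset part is attached to the store URL with '#'.
    """
    if (dataset_id is None) == (alias is None):
        raise ValueError("give exactly one of dataset_id or alias")
    # An alias is marked with a leading '~'; a UUID is used as-is.
    target = dataset_id if dataset_id is not None else "~" + alias
    url = f"{store_url}#{target}"
    # An optional version identifier follows an '@' symbol.
    if version is not None:
        url += "@" + version
    return url


store = "ria+ssh://user@example.com:/data/ria-store"  # hypothetical store
print(ria_dataset_url(store, dataset_id="12468afe-59ec-11ea-93d7-f0d5bf7b5561"))
print(ria_dataset_url(store, alias="mydata", version="v1.0"))
```

Requiring exactly one of `dataset_id` or `alias` mirrors the "either ... or" wording above and is a design choice of this sketch.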
RIA store layout
~~~~~~~~~~~~~~~~
A RIA store is a directory tree with a dedicated subdirectory for each dataset in the store. The subdirectory name is constructed from the DataLad dataset ID, e.g. '124/68afe-59ec-11ea-93d7-f0d5bf7b5561', where the first three characters of the ID are used for an intermediate subdirectory in order to mitigate file system limitations for stores containing a large number of datasets.
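The path construction described above can be sketched as follows. The function name `dataset_subdir` is hypothetical; the example ID is the one from the text, written without the directory separator.

```python
from pathlib import PurePosixPath


def dataset_subdir(dataset_id: str) -> PurePosixPath:
    """Return the store-relative directory for a dataset ID (illustrative)."""
    if len(dataset_id) <= 3:
        raise ValueError("dataset ID too short to split")
    # The first three characters form the intermediate directory,
    # the remainder names the dataset directory itself.
    return PurePosixPath(dataset_id[:3]) / dataset_id[3:]


print(dataset_subdir("12468afe-59ec-11ea-93d7-f0d5bf7b5561"))
# prints: 124/68afe-59ec-11ea-93d7-f0d5bf7b5561
```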
By default, a dataset in a RIA store consists of two components: A Git repository (for all dataset contents stored in Git) and a storage sibling (for dataset content stored in git-annex).
It is possible to selectively disable either component using ``--storage-sibling only`` (no Git repository) or ``--storage-sibling off`` (no storage sibling). If neither component is disabled, a dataset's subdirectory layout in a RIA store contains a standard bare Git repository and an 'annex/' subdirectory inside of it. The latter holds a git-annex object store and comprises the storage sibling. Disabling the standard Git remote ('storage-sibling=only') will result in not having the bare Git repository; disabling the storage sibling ('storage-sibling=off') will result in not having the 'annex/' subdirectory.
Optionally, there can be a further subdirectory 'archives' with (compressed) 7z archives of annex objects. The storage remote is able to pull annex objects from these archives if it cannot find them in the regular annex object store. This feature can be useful for storing large collections of rarely changing data on systems that limit the number of files that can be stored.
Each dataset directory also contains a 'ria-layout-version' file that identifies the data organization (as, for example, described above).
Lastly, there is a global 'ria-layout-version' file at the store's base path that identifies where dataset subdirectories themselves are located. At present, this file must contain a single line stating the version (currently "1"). This line MUST end with a newline character.
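DataLad normally creates this file itself. Purely to illustrate the format constraint (a single version line terminated by a newline), here is a minimal sketch with hypothetical helper names `write_store_layout_version` and `is_valid_layout_file`. The check covers only the line structure, not which version values are accepted.

```python
import tempfile
from pathlib import Path


def write_store_layout_version(store_root: Path, version: str = "1") -> Path:
    """Write the global 'ria-layout-version' file (illustrative)."""
    store_root.mkdir(parents=True, exist_ok=True)
    version_file = store_root / "ria-layout-version"
    # A single line, terminated by a newline character.
    version_file.write_text(version + "\n")
    return version_file


def is_valid_layout_file(version_file: Path) -> bool:
    """Check for exactly one non-empty, newline-terminated line."""
    content = version_file.read_text()
    return (len(content) > 1
            and content.endswith("\n")
            and content.count("\n") == 1)


with tempfile.TemporaryDirectory() as tmp:
    layout_file = write_store_layout_version(Path(tmp) / "ria-store")
    print(is_valid_layout_file(layout_file))
```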
It is possible to define an alias for an individual dataset in a store by placing a symlink to the dataset location into an 'alias/' directory in the root of the store. This enables dataset access via URLs that give a ~ symbol followed by the alias name instead of the dataset ID.
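A minimal sketch of creating such an alias by hand, run here against a throwaway directory. The helper name `add_alias` and the alias `mydata` are hypothetical, and the use of a relative symlink target is an assumption of this sketch; the text above only requires a symlink to the dataset location.

```python
import os
import tempfile
from pathlib import Path


def add_alias(store_root: Path, dataset_id: str, alias: str) -> Path:
    """Create 'alias/<alias>' pointing at the dataset directory (illustrative)."""
    alias_dir = store_root / "alias"
    alias_dir.mkdir(parents=True, exist_ok=True)
    link = alias_dir / alias
    # Relative target (an assumption): from 'alias/' up to the store root,
    # then into the '<first three chars>/<rest>' dataset directory.
    target = Path("..") / dataset_id[:3] / dataset_id[3:]
    os.symlink(target, link)
    return link


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp)
    ds_id = "12468afe-59ec-11ea-93d7-f0d5bf7b5561"
    ds_dir = store / ds_id[:3] / ds_id[3:]
    ds_dir.mkdir(parents=True)
    link = add_alias(store, ds_id, "mydata")
    # The alias resolves to the dataset directory.
    print(link.resolve() == ds_dir.resolve())
```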
Compared to standard git-annex object stores, the 'annex/' subdirectories used as storage siblings follow a different directory layout scheme ('dirhashmixed' instead of 'dirhashlower'). This is mostly a technical detail, but it also serves to remind git-annex power users to refrain from running git-annex commands directly in-store, as doing so can cause severe damage due to the layout difference. Interactions should be handled via the ORA special remote instead.
Error logging
~~~~~~~~~~~~~
To enable error logging at the remote end, append a pipe symbol and an "l" to the version number in ria-layout-version (like so: '1|l').
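The edit to the version line can be sketched as a small string transformation. The helper name `enable_error_logging` is hypothetical. Treating everything after the pipe as a set of single-character flags, and keeping the trailing newline, are assumptions of this sketch.

```python
def enable_error_logging(version_line: str) -> str:
    """Return the layout-version line with the 'l' flag appended (illustrative)."""
    line = version_line.rstrip("\n")
    # Split "<version>|<flags>"; flags is empty if there is no pipe yet.
    version, _, flags = line.partition("|")
    if "l" not in flags:
        flags += "l"
    # Keep the line newline-terminated.
    return f"{version}|{flags}\n"


print(repr(enable_error_logging("1\n")))
# prints: '1|l\n'
```

Applying the function to a line that already carries the flag leaves it unchanged, so it is safe to run more than once.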
Error logging will create files in an "error_log" directory whenever the git-annex special remote (storage sibling) raises an exception, storing the Python traceback of it. The logfiles are named according to a scheme that identifies which dataset the issue occurred with. Because logging can potentially leak personal data (for example, local file paths), it can be disabled client-side by setting the configuration variable "annex.ora-remote.<storage-sibling-name>.ignore-remote-config".
datalad is developed by The DataLad Team and Contributors <team@datalad.org>.
2023-01-25 | datalad create-sibling-ria 0.18.1 |