| datalad run(1) | General Commands Manual | datalad run(1) |
datalad run - run an arbitrary shell command and record its impact on a dataset.
datalad run [-h] [-d DATASET] [-i PATH] [-o PATH] [--expand {inputs|outputs|both}] [--assume-ready {inputs|outputs|both}] [--explicit] [-m MESSAGE] [--sidecar {yes|no}] [--dry-run {basic|command}] [-J NJOBS] [--version] ...
It is recommended to craft the command such that it can run in the root directory of the dataset that the command will be recorded in. However, as long as the command is executed somewhere underneath the dataset root, the exact location will be recorded relative to the dataset root.
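Recording the location relative to the dataset root amounts to expressing the working directory as a path below that root. Below is a minimal Python sketch of that computation using `pathlib`. The helper name `recorded_pwd` and the example paths are hypothetical, and this is not DataLad's own code.

```python
from pathlib import PurePosixPath


def recorded_pwd(dataset_root: str, cwd: str) -> str:
    """Hypothetical helper: return cwd relative to dataset_root.

    PurePath.relative_to() raises ValueError if cwd is not underneath
    dataset_root, matching the requirement that the command run below the root.
    """
    rel = PurePosixPath(cwd).relative_to(PurePosixPath(dataset_root))
    # A result of "." means the command ran in the dataset root itself.
    return str(rel)


print(recorded_pwd("/data/myds", "/data/myds"))           # .
print(recorded_pwd("/data/myds", "/data/myds/code/sub"))  # code/sub
```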
If the executed command did not alter the dataset in any way, no record of the command execution is made.
If the given command errors, a CommandError exception with the same exit code will be raised, and no modifications will be saved. By default, the command is not executed at all if an error occurs while preparing its inputs or outputs. This default "stop" behavior can be overridden with the --on-failure ... option.
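The exit-code contract can be illustrated with Python's `subprocess` module. In this sketch, `CommandError` is a hypothetical stand-in class, not DataLad's exception, and `run_and_check` is a made-up helper. It only shows a failure being raised with the command's own exit code instead of anything being saved.

```python
import subprocess


class CommandError(RuntimeError):
    """Hypothetical stand-in (not DataLad's class) carrying the exit code."""

    def __init__(self, cmd: str, code: int):
        super().__init__(f"command {cmd!r} failed with exit code {code}")
        self.code = code


def run_and_check(cmd: str) -> None:
    """Run cmd in a shell; raise CommandError with the same exit code on failure."""
    proc = subprocess.run(cmd, shell=True)
    if proc.returncode != 0:
        # Nothing is saved here; the failure propagates to the caller instead.
        raise CommandError(cmd, proc.returncode)


try:
    run_and_check("exit 3")
except CommandError as exc:
    print(exc.code)  # 3
```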
In the presence of subdatasets, the full dataset hierarchy will be checked for unsaved changes prior to command execution, and changes in any dataset will be saved after execution. Any modification of subdatasets is also saved in their respective superdatasets to capture a comprehensive record of the entire dataset hierarchy state. The associated provenance record is duplicated in each modified (sub)dataset, although it is only fully interpretable and re-executable in the top-level superdataset. For this reason, the provenance record contains the dataset ID of that superdataset.
A few placeholders are supported in the command via Python format specification. "{pwd}" will be replaced with the full path of the current working directory. "{dspath}" will be replaced with the full path of the dataset that run is invoked on. "{tmpdir}" will be replaced with the full path of a temporary directory. "{inputs}" and "{outputs}" represent the values specified by --input and --output. If multiple values are specified, they will be joined by a space. The values keep their order from the command line, with any globs expanded in alphabetical order (like bash). Individual values can be accessed with an integer index (e.g., "{inputs[0]}").
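Since the placeholders follow Python format specification, their behavior can be modelled with `str.format`. The sketch below assumes a hypothetical `SpaceJoined` list type whose string form joins its items with spaces. With it, "{inputs}" expands to all values while "{inputs[0]}" picks the first. All paths are made up, and this is not DataLad's own formatter.

```python
class SpaceJoined(list):
    """Hypothetical list type: str() joins the items with single spaces."""

    def __str__(self) -> str:
        return " ".join(str(item) for item in self)


# Assumption: glob matches arrive sorted alphabetically, as described above.
inputs = SpaceJoined(sorted(["data/b.csv", "data/a.csv"]))
outputs = SpaceJoined(["result.txt"])

cmd = "cd {dspath} && python code/proc.py {inputs} --first {inputs[0]} -o {outputs}"
formatted = cmd.format(inputs=inputs, outputs=outputs, dspath="/home/me/ds")
print(formatted)
# cd /home/me/ds && python code/proc.py data/a.csv data/b.csv --first data/a.csv -o result.txt
```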
Note that the representation of the inputs or outputs in the formatted command string depends on whether the command is given as a list of arguments or as a string (quotes surrounding the command). The concatenated list of inputs or outputs will be surrounded by quotes when the command is given as a list but not when it is given as a string. This means that the string form is required if you need to pass each input as a separate argument to a preceding script (i.e., write the command as "./script {inputs}", quotes included). The string form should also be used if the input or output paths contain spaces or other characters that need to be escaped.
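The quoting difference can be reproduced with Python's `shlex` module (Python 3.8 or later for `shlex.join`). This assumes that a list-form command is turned into a command line by shell-quoting each argument, which is what wraps "{inputs}" in quotes. The script name and paths are hypothetical.

```python
import shlex

# Hypothetical inputs, already joined by spaces as described above.
inputs = "data/a.csv data/b.csv"

# String form: the value is inserted verbatim, so the shell later sees
# two separate arguments.
string_cmd = "./script {inputs}".format(inputs=inputs)

# List form, modelled by shell-quoting each argument before substitution
# (assumption). shlex.quote() wraps "{inputs}" in single quotes because
# braces are not shell-safe characters.
template = shlex.join(["./script", "{inputs}"])
list_cmd = template.format(inputs=inputs)

print(template)                 # ./script '{inputs}'
print(string_cmd)               # ./script data/a.csv data/b.csv
print(list_cmd)                 # ./script 'data/a.csv data/b.csv'
print(shlex.split(string_cmd))  # ['./script', 'data/a.csv', 'data/b.csv']
print(shlex.split(list_cmd))    # ['./script', 'data/a.csv data/b.csv']
```

Splitting the two results shows why the string form is needed to hand each input to the script as its own argument.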
To escape a brace character, double it (i.e., "{{" or "}}").
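Doubled braces matter for commands that contain literal braces themselves, such as an awk program. With Python's `str.format`, which the placeholder syntax follows, a doubled brace yields a single one. The file names here are hypothetical.

```python
# "{{" and "}}" become literal braces; {inputs} and {outputs} are substituted.
cmd = "awk '{{print $1}}' {inputs} > {outputs}"
result = cmd.format(inputs="data.tsv", outputs="col1.txt")
print(result)  # awk '{print $1}' data.tsv > col1.txt
```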
Custom placeholders can be added as configuration variables under "datalad.run.substitutions". As an example:
Add a placeholder "name" with the value "joe"::
% datalad configuration --scope branch set datalad.run.substitutions.name=joe
% datalad save -m "Configure name placeholder" .datalad/config
Access the new placeholder in a command::
% datalad run "echo my name is {name} >me"
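Conceptually, each configuration key under "datalad.run.substitutions." contributes one placeholder named after the rest of the key. Below is a minimal Python sketch of that lookup, with the configuration given as a plain dictionary. `substitutions` and `expand_placeholders` are hypothetical helpers. The rule that built-in placeholders take precedence over custom ones is an assumption of the sketch.

```python
PREFIX = "datalad.run.substitutions."


def substitutions(config: dict) -> dict:
    """Map each key under PREFIX to a placeholder named after the key's remainder."""
    return {
        key[len(PREFIX):]: value
        for key, value in config.items()
        if key.startswith(PREFIX)
    }


def expand_placeholders(command: str, config: dict, **builtins: str) -> str:
    """Hypothetical helper: fill custom and built-in placeholders."""
    values = substitutions(config)
    # Assumption: a built-in placeholder wins over a custom one of the same name.
    values.update(builtins)
    return command.format(**values)


# Hypothetical configuration, as the two commands above would leave it.
config = {"datalad.run.substitutions.name": "joe", "user.name": "Joe"}
print(expand_placeholders("echo my name is {name} >me", config))
# echo my name is joe >me
```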
Run an executable script and record the impact on a dataset::
% datalad run -m 'run my script' 'code/script.sh'
Run a command and specify a directory as a dependency for the run. The contents of the dependency will be retrieved prior to running the script::
% datalad run -m 'run my script' -i 'data/*' 'code/script.sh'
Run an executable script and specify output files of the script to be unlocked prior to running the script::
% datalad run -m 'run my script' -i 'data/*' -o 'output_dir/*' 'code/script.sh'
Specify multiple inputs and outputs::
% datalad run -m 'run my script' -i 'data/*' -i 'datafile.txt' -o 'output_dir/*' -o 'outfile.txt' 'code/script.sh'
Use ** to match any file at any directory depth recursively. A single * does not descend into matched directories::
% datalad run -m 'run my script' -i 'data/**/*.dat' -o 'output_dir/**' 'code/script.sh'
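Python's `glob` module with `recursive=True` follows the same `*` versus `**` convention, so it can illustrate the difference. The assumption is that this mirrors how --input and --output globs are matched. The directory layout is made up, and the helper `match` is hypothetical.

```python
import glob
import os
import tempfile


def match(root: str, pattern: str) -> list:
    """Hypothetical helper: expand pattern below root, return sorted relative paths."""
    hits = glob.glob(os.path.join(root, pattern), recursive=True)
    return sorted(os.path.relpath(hit, root).replace(os.sep, "/") for hit in hits)


with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: one file in data/, one in a subdirectory of it.
    os.makedirs(os.path.join(root, "data", "sub"))
    for rel in ("data/a.dat", "data/sub/b.dat"):
        open(os.path.join(root, *rel.split("/")), "w").close()

    star = match(root, "data/*")           # matches the directory, not its files
    star_dat = match(root, "data/*.dat")
    globstar = match(root, "data/**/*.dat")  # ** also matches zero directories

print(star)      # ['data/a.dat', 'data/sub']
print(star_dat)  # ['data/a.dat']
print(globstar)  # ['data/a.dat', 'data/sub/b.dat']
```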
datalad is developed by The DataLad Team and Contributors <team@datalad.org>.
| 2025-06-15 | datalad run 1.1.5 |