TAKTUK(1) | TakTuk Deployment Engine | TAKTUK(1) |
TakTuk - a tool for large scale remote execution deployment
taktuk [-behinsvzMP] [-c connector] [-d limit] [-f filename] [-g duration] [-l login] [-m hostname [-[ args ... -]]] [-o stream=template] [-t timeout] [-u limit] [-w size] [-B parameter=expression] [-C separators] [-E character] [-F filename] [-G hostname [-[ args ... -]]] [-L hostname] [-I interpreter] [-O separators] [-R stream=filename] [-S files] [-T command] [-V path ] [-W scheme] [commands ... ]
TakTuk is a tool for broadcasting the remote execution of one or more commands to a set of one or more distant machines. TakTuk combines local parallelization (using concurrent deployment processes) and work distribution (using an adaptive work-stealing algorithm) to achieve both scalability and efficiency.
TakTuk is especially suited to interactive tasks involving several distant machines and parallel remote executions. This is the case for cluster administration and parallel program debugging.
TakTuk also provides a basic communication layer to programs it executes. This communication layer uses the communication infrastructure set up by TakTuk during its deployment. It is available for both the Perl and the C languages and is described in TakTuk(3) and taktukcomm(3) respectively.
Caution: TakTuk parses options in the order given on the command line. This means that TakTuk is not POSIX compliant regarding option order. This is important because some options change the behavior of the options that follow them (and only those, e.g. -l applies to the following -m options). The default settings of TakTuk can be obtained by using the "--print-defaults" option. The following options are given by category in alphabetical order.
0 - TakTuk is ready
1 - TakTuk is numbered
2 - TakTuk terminated
3 - connection failed
4 - connection initialized
5 - connection lost
6 - command started
7 - command failed
8 - command terminated
9 - numbering update failed
10 - pipe input started
11 - pipe input failed
12 - pipe input terminated
13 - file reception started
14 - file reception failed
15 - file reception terminated
16 - file send failed
17 - Invalid target
18 - No target
19 - Message delivered
20 - Invalid destination
21 - Destination not available anymore
22 - Wait complete
23 - Wait reduce complete
The function event_msg($) can be used in the template to translate this code into a string that describes the event. Relevant fields include $host, $position, $rank, $count and others listed below.
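For scripts that post-process TakTuk output outside of Perl, the code list above can be mirrored in a small lookup table. The following Python sketch is only an illustration: the names EVENT_MESSAGES and describe_event are hypothetical, and the strings are copied from the list above rather than taken from the implementation of event_msg.

```python
# Event codes and descriptions, copied from the list in this manual page.
# This is NOT TakTuk's event_msg; it is a hypothetical stand-alone mirror.
EVENT_MESSAGES = {
    0: "TakTuk is ready",
    1: "TakTuk is numbered",
    2: "TakTuk terminated",
    3: "connection failed",
    4: "connection initialized",
    5: "connection lost",
    6: "command started",
    7: "command failed",
    8: "command terminated",
    9: "numbering update failed",
    10: "pipe input started",
    11: "pipe input failed",
    12: "pipe input terminated",
    13: "file reception started",
    14: "file reception failed",
    15: "file reception terminated",
    16: "file send failed",
    17: "Invalid target",
    18: "No target",
    19: "Message delivered",
    20: "Invalid destination",
    21: "Destination not available anymore",
    22: "Wait complete",
    23: "Wait reduce complete",
}


def describe_event(code):
    """Return the description of an event code, or a fallback for unknown codes."""
    return EVENT_MESSAGES.get(int(code), f"unknown event {code}")


if __name__ == "__main__":
    print(describe_event(3))  # prints "connection failed"
```

Accepting the code as a string as well as an integer is a convenience for callers that read the code from a text stream.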
A template is a Perl expression that should evaluate to the string eventually displayed. Within a specification, some variables might be used depending on the concerned stream:
In the end, the specification is evaluated for each line of the concerned stream and the result is printed on the root node. Note that the newline has to be added explicitly as "\n" in the template if needed.
WARNING: take care with your specification: if the Perl syntax is not correct, lots of awful compilation error messages will be displayed and TakTuk execution will fail.
WARNING: use this option only before any remote node specification ("-m" or "-f") otherwise you might get serious synchronization issues in TakTuk. Using TakTuk point-to-point communication along with this option will fail and produce TakTuk warnings.
Nevertheless, if you use TakTuk to transfer large files, because of I/O bandwidth disparities in various parts of a system, TakTuk memory use might grow too large and performance can be severely degraded when the system starts swapping. In such situations, limiting the size of the internal cache will keep TakTuk in main memory and preserve the performance.
These options are not useful for most users. They are used either internally by TakTuk itself or for development purposes.
After the options parsing, TakTuk expects some commands either on the remainder of the command line (batch mode) or on the standard input (interactive mode). These commands are actions to be performed by TakTuk using the logical network infrastructure set up during the deployment. By default, commands might be separated by ; or newlines. For all the commands, any unambiguous prefix can be used instead of their full name. In interactive mode, TakTuk has support for "readline" (history, command line editing) if installed on your system.
When TakTuk commands accept arguments, they should be enclosed in matching delimiters (indicated by * below). In other words, * might be replaced either by any non alphanumeric character or by a pair of matching braces, brackets or parentheses. These delimiters must be separated from their content (using the options separator). If the argument contains a closing delimiter preceded by a separator, then it is probably a good idea to escape it (see -E option) or to protect the whole arguments string if given on the command line.
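The delimiter rule can be illustrated with a short Python sketch, for instance for a script that generates TakTuk command lines. It only encodes the rule as stated in the paragraph above; the helper names closing_delimiter and wrap_argument are hypothetical, a single space is assumed as separator, and escaping (the -E option) is not handled.

```python
# Opening delimiters that have a distinct closing counterpart.
MATCHING = {"(": ")", "[": "]", "{": "}"}


def closing_delimiter(opening):
    """Return the delimiter that closes `opening`.

    Braces, brackets and parentheses close with their counterpart; any other
    non alphanumeric character closes with itself.
    """
    if len(opening) != 1 or opening.isalnum() or opening.isspace():
        raise ValueError(f"not usable as a delimiter: {opening!r}")
    return MATCHING.get(opening, opening)


def wrap_argument(argument, opening="[", separator=" "):
    """Enclose `argument` in delimiters, separated from their content."""
    return separator.join([opening, argument, closing_delimiter(opening)])


if __name__ == "__main__":
    print("broadcast exec", wrap_argument("hostname"))
    print("broadcast exec", wrap_argument("hostname", "-"))
```

The sketch does not check whether the argument itself contains the closing delimiter preceded by a separator; in that case another delimiter or the escape character should be used, as explained above.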
TakTuk understands the following commands:
WARNING: this command is not atomic. If you manage somehow to initiate an input file command from two different TakTuk instances, data will probably be interleaved. In this case you should synchronize the two instances. This is not required when spreading files only from the root node.
WARNING: this command is not atomic. If you manage somehow to initiate a message file command from two different TakTuk instances, data will probably be interleaved. In this case you should synchronize the two instances. This is not required when spreading files only from the root node.
WARNING: new nodes added to TakTuk network using this command are not numbered. Further use of network renumbering or update is necessary to get TakTuk logical numbering.
The TakTuk command "exec" accepts optional parameters. These parameters are used to specify a target id for the command, or to attach actions triggered by timeouts to command execution. An "exec" command accepts any number of parameters. These parameters are interpreted from left to right using the following syntax:
Caution: this value overwrites any target id that could have been automatically assigned by TakTuk. Therefore, it is not recommended to mix the use of explicitly assigned target ids with the use of automatically assigned target ids.
Notice that each timeout can have any number of attached callbacks. They will be processed in the order they are given as parameters.
To change a default setting, use the variable TAKTUK_NAME where NAME is the name of the corresponding long option in upper case with dashes replaced by underscores. For options taking a complex value (such as "--debug"), just add an underscore and the field you want to change, in upper case, at the end of the name. Using "taktuk --print-defaults" will give you examples of names used to change default settings. Note that defining in the environment a default setting not used by TakTuk has no effect.
You can also change some TakTuk default settings locally without propagating the change in the deployment tree. To do this, use the variable TAKTUK_MY_NAME where NAME is defined as above. As before, these local settings are overridden by propagated settings and command line options.
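The naming rule can be written down as a small helper, for example when a wrapper script has to prepare the environment before launching TakTuk. This Python sketch only applies the textual rule given above; the function name setting_variable is hypothetical, "--some-option" is a made-up option name, and whether a given variable is actually used by TakTuk should be checked against the output of "taktuk --print-defaults".

```python
def setting_variable(long_option, field=None, local=False):
    """Build the environment variable name for a TakTuk default setting.

    long_option: long option name, with or without leading dashes.
    field:       optional field name for options taking a complex value.
    local:       if True, build the non-propagated TAKTUK_MY_ variant.
    """
    # Upper case, dashes replaced by underscores.
    name = long_option.lstrip("-").replace("-", "_").upper()
    if field is not None:
        # Complex values: append an underscore and the field in upper case.
        name += "_" + field.upper()
    prefix = "TAKTUK_MY_" if local else "TAKTUK_"
    return prefix + name


if __name__ == "__main__":
    print(setting_variable("--debug", field="default"))
    print(setting_variable("--some-option", local=True))  # hypothetical option
```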
Hostnames given to TakTuk might be simple machine names or complex host list specifications. In its general form, a hostname is made of a host set and an optional exclusion set separated by a slash. Each of those sets is a comma separated list of host templates. Each of these templates is made of constant parts (characters outside brackets) and optional range parts (characters inside brackets). Each range part is a comma separated list of intervals or single values. Each interval is made of two single values separated by a dash. This is true for all hostnames given to TakTuk (both with -m or -f options).
In other words, the following expressions are valid host
specifications:
node1
node[19]
node[1-3]
node[1-3],otherhost/node2
node[1-3,5]part[a-b]/node[3-5]parta,node1partb
they respectively expand to:
node1
node19
node1 node2 node3
node1 node3 otherhost
node1parta node2parta node2partb node3partb node5partb
Notice that these lists of values are not regular expressions ("node[19]" is "node19" and not "node1, node2, ...., node9"). Intervals are implemented using the Perl magical auto-increment feature, thus you can use alphanumeric values as interval bounds (see the Perl documentation of the ++ operator for the limitations of this auto-increment).
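To check what a host specification expands to before handing it to TakTuk, the rules above can be reproduced in a few lines. The following Python sketch is an independent illustration, not TakTuk's Perl implementation: the function names are hypothetical, and intervals are simplified to non-negative integers and single characters instead of the full Perl auto-increment behavior.

```python
import itertools
import re


def split_top_level(text, separator):
    """Split on `separator`, ignoring separators found inside [brackets]."""
    parts, current, depth = [], [], 0
    for char in text:
        if char == "[":
            depth += 1
        elif char == "]":
            depth -= 1
        if char == separator and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(char)
    parts.append("".join(current))
    return parts


def expand_interval(item):
    """Expand 'a-b' into its values; a single value expands to itself."""
    if "-" not in item:
        return [item]
    low, high = item.split("-", 1)
    if low.isdigit() and high.isdigit():
        return [str(n) for n in range(int(low), int(high) + 1)]
    if len(low) == 1 and len(high) == 1:
        return [chr(c) for c in range(ord(low), ord(high) + 1)]
    # Simplification: other alphanumeric bounds are not handled here.
    raise ValueError(f"unsupported interval in this sketch: {item!r}")


def expand_template(template):
    """Expand one host template such as 'node[1-3,5]part[a-b]'."""
    # re.split with a capturing group alternates constant and range parts.
    pieces = re.split(r"\[([^\]]*)\]", template)
    choices = []
    for index, piece in enumerate(pieces):
        if index % 2 == 0:
            choices.append([piece])  # constant part
        else:
            values = []
            for item in piece.split(","):
                values.extend(expand_interval(item))
            choices.append(values)  # range part
    return ["".join(combination) for combination in itertools.product(*choices)]


def expand_set(text):
    """Expand a comma separated list of host templates."""
    hosts = []
    for template in split_top_level(text, ","):
        if template:
            hosts.extend(expand_template(template))
    return hosts


def expand_hosts(specification):
    """Expand 'host set/exclusion set' into the resulting list of hosts."""
    host_part, _, excluded_part = specification.partition("/")
    excluded = set(expand_set(excluded_part))
    return [host for host in expand_set(host_part) if host not in excluded]


if __name__ == "__main__":
    print(" ".join(expand_hosts("node[1-3],otherhost/node2")))
```

Run on the five specifications listed above, this sketch yields the same expansions as those given in this section.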
The TakTuk command line and the "TakTuk::send" routine accept a set specification as destination host(s). A set specification is made of interval specifications separated by slashes. An interval specification is either made of a single number, two numbers separated by a dash or a single number followed by a plus symbol (this last case matches the interval that goes from the number to the highest numbered TakTuk destination). Of course the two numbers specifying an interval must be given in increasing order.
The remote peers included in a set specification are all the peers whose logical number belongs to at least one interval of the set. Here are some examples of set specifications:
1         the peer numbered 1
2-7       the peers numbered 2,3,4,5,6 and 7
2-4/1/10  the peers numbered 1,2,3,4 and 10
3+        the peers from 3 to the highest numbered
5+/1      the peers from 5 to the highest numbered and the peer 1
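The meaning of a set specification can be made explicit with a small expansion routine. The Python sketch below follows the rules stated above; it is not part of TakTuk, the name expand_set_spec is hypothetical, and the highest logical number has to be supplied by the caller because only a running TakTuk instance knows it.

```python
def expand_set_spec(specification, highest):
    """Return the sorted logical numbers selected by a set specification.

    specification: intervals separated by slashes, e.g. '2-4/1/10' or '5+/1'.
    highest:       highest logical number in the deployment (caller supplied).
    """
    members = set()
    for interval in specification.split("/"):
        if interval.endswith("+"):
            # 'N+' goes from N to the highest numbered destination.
            low, high = int(interval[:-1]), highest
        elif "-" in interval:
            low_text, high_text = interval.split("-", 1)
            low, high = int(low_text), int(high_text)
            if low > high:
                raise ValueError(f"bounds not in increasing order: {interval!r}")
        else:
            low = high = int(interval)
        members.update(range(low, high + 1))
    return sorted(members)


if __name__ == "__main__":
    print(expand_set_spec("2-4/1/10", highest=10))  # prints [1, 2, 3, 4, 10]
    print(expand_set_spec("5+/1", highest=7))       # prints [1, 5, 6, 7]
```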
The target number is a number assigned by TakTuk to all processes it executes (successfully started or not using "exec" or "taktuk_perl" commands). By default, this number starts from 0 and goes to the total number of processes that have been executed since TakTuk launch minus one. Target processes of a "TakTuk::send" or a TakTuk command can be expressed with the same syntax as in the case of sets.
Furthermore, TakTuk understands several special targets. The special target "all" targets all processes: the command is applied to all executing local processes (message or input data are duplicated and sent to all of them). This is the default for the "input" and "kill" commands. The special target "any" targets the first eligible process. In the case of a message, this is the first process that issues a "TakTuk::recv" and that is not already the target of another message. This is the default for the "message" command. Finally, the special target "output" targets the output stream "message" rather than a process.
The following examples illustrate the basic use of TakTuk on a few machines and the use of developer options. Notice that TakTuk is designed to scale to many more peers than the number involved in these examples.
taktuk -s -m toto.nowhere.com broadcast exec [ hostname ]
In this example, "-s" asks TakTuk to propagate its own code on remote hosts. It can be removed by installing the "taktuk" executable on "toto.nowhere.com". In the following we will assume that TakTuk is installed on all the remote hosts.
The "-m toto.nowhere.com" describes the set of remote hosts to be contacted by TakTuk and "broadcast exec [ hostname ]" is a command that will be executed by the TakTuk interpreter.
This example can be written in many other ways. In interactive mode, the same execution might become:
taktuk -m toto.nowhere.com
here TakTuk is blocked waiting for commands from stdin. Thus, we just have to type:
broadcast exec { hostname }
Ctrl-D
here you can notice that parameters to the "exec" TakTuk command (as all commands parameters) can be enclosed in any reasonable pair of delimiters. We might also write the list of hosts involved in the command in a file "machine" that contains:
toto.nowhere.com
and the TakTuk command becomes:
taktuk -f machine broadcast exec - hostname -
We could also use another file "options" that contains:
-f machine
and use it as the options line given to TakTuk:
taktuk -F options broadcast exec \( hostname \)
Finally, everything could be stored in a last file "command_line" that contains:
-f machine broadcast exec = hostname =
and the following command achieves the same result:
taktuk -F command_line
All of these variants have the same effect: they execute "hostname" on "toto.nowhere.com" and the output of the program is forwarded to the localhost. In this case:
toto.nowhere.com: hostname: somepid: output > toto.nowhere.com
taktuk -m localhost broadcast exec [ 'if [ $RANDOM -gt 10000 ];then echo greater;else echo lower;fi' ]
In this example, quotes are necessary to prevent the shell from interpreting the "$" and ";" characters and to prevent the closing bracket of the "if" test from being considered as closing the "exec" command arguments. In this case the variable is interpolated only on the remote hosts. The same example can also be expressed using shortcuts and interactive mode:
taktuk -m localhost -E%
then type:
b e [ if [ $RANDOM -gt 10000 %];then echo greater;else echo lower;fi ]
Ctrl-D
Notice the closing bracket used in the test, which should not be interpreted as the closing bracket for "exec" arguments. In such a case, a simpler solution is probably to use another kind of delimiter:
taktuk -m localhost
and then:
b e { if [ $RANDOM -gt 10000 ];then echo
greater;else echo lower;fi }
Ctrl-D
Usually, if you want to be safe, you can quote all command parameters. Nevertheless, notice that parameters should not be quoted in interactive mode as input lines are not interpreted by the shell.
taktuk -m localhost broadcast exec timeout 2 [ sleep 10 ]
the callback executed when a timeout occurs can also be something else than a TERM signal. This can be another signal (KILL for instance):
taktuk -m localhost broadcast exec timeout 2 kill 9 [ sleep 10 ]
or any valid TakTuk command:
taktuk -m localhost broadcast exec timeout 2 action broadcast exec [ echo hello ] [ sleep 10 ]
or even several timeouts and several callbacks:
taktuk -m localhost b e t 2 a e [ echo hello ] k 30 t 10 k 9 [ sleep 5 ]
in this last example, the command "sleep 5" is executed by TakTuk. After 2 seconds, the first timeout is triggered: it executes the command "echo hello" and sends signal 30 (USR1 on some systems, signal numbers being platform dependent) to the first command ("sleep 5"). The second timeout is set to 10 seconds. Thus, it will never occur, as the "sleep 5" command terminates before its expiration.
Notice that it is usually a bad idea to use too large a window as it results in too much local load and a bad distribution of work (something like 10 is often sufficient).
You can also force TakTuk to use more specific topologies. For instance, to execute "echo $$" using a flat-tree as deployment topology, just disable work-stealing in TakTuk:
taktuk -d -1 -m host1 -m host2 -m host3 broadcast exec [ 'echo $$' ]
and to use a chain-like topology, either encode the topology in arguments structure:
taktuk -m host1 -[ -m host2 -[ -m host3 -] -] broadcast exec [ 'echo $$' ]
or limit the arity of the dynamic tree to 1:
taktuk -d 1 -m host1 -m host2 -m host3 broadcast exec [ 'echo $$' ]
Finally, the default will use a dynamically constructed topology:
taktuk -d 0 -m host1 -m host2 -m host3 broadcast exec [ 'echo $$' ]
taktuk -b -m node1.cluster1 -m node2.cluster1 -m node3.cluster1 -m node4.cluster1 -e -b -m node1.cluster2 -m node2.cluster2 -m node3.cluster2 -m node4.cluster2 -e broadcast exec [ hostname ]
This command has the effect of deploying TakTuk on two clusters (cluster 1 and 2) made of four nodes (node 1 to 4), preventing deployed nodes of one cluster from being used to deploy nodes of the other cluster. Finally, once the deployment is complete, it executes the command "hostname" on all these nodes.
./taktuk -m host1 -[ exec [ hostname ] -] -m host2 -[ exec [ id ] -] -m host3 -[ exec [ 'echo $TAKTUK_RANK; ls' ] -] quit
but this could also be given using set specifications (in this case logical numbers are used for hosts):
./taktuk -m host1 -m host3 -m host8 1 exec [ hostname ], 2 exec [ id ], 3 exec [ 'echo $TAKTUK_RANK; ls' ]
or in interactive mode:
./taktuk -m host1 -m host3 -m host8
1 exec [ hostname ]
2 exec [ id ]
3 exec [ echo $TAKTUK_RANK; ls ]
Ctrl-D
Nevertheless keep in mind that in general these logical numbers do not match the position of hosts on the command line.
taktuk -s -m host1 -m host2 -m host3
broadcast exec [ perl -- - ]
broadcast input file [ essai.pl ]
broadcast input close
Ctrl-D
copying a file named "message.txt" to the "/tmp" directory of each remote host is thus as easy as:
taktuk -s -m host1 -m host2 -m host3
broadcast put [ message.txt ] [ /tmp ]
Ctrl-D
but the older method still works (and does almost the same as the previous command):
taktuk -s -m host1 -m host2 -m host3
broadcast exec [ cat - >/tmp/message.txt ]
broadcast input file [ message.txt ]
broadcast input close
Ctrl-D
although more care is required about shell interpretation when typing everything directly on the command line:
taktuk -s -m host1 -m host2 -m host3 broadcast exec [ 'cat - >/tmp/message.txt' ]\;broadcast input file [ message.txt ]
notice in this latter command that the "input close" is not necessary as TakTuk closes inputs of all spawned commands when quitting.
the "get" command also makes possible something that was previously very difficult in TakTuk: collecting files. The following command gets the file "/tmp/message.txt" from each remote host and copies it locally to "message-number.txt" where "number" is the logical rank of the source node:
taktuk -s -m host1 -m host2 -m host3
broadcast get [ /tmp/message.txt ] [ message-$rank.txt ]
Ctrl-D
finally, it seems important to mention that "put/get" commands can copy directories and keep file permissions unchanged.
my $rank = TakTuk::get('rank');
my $count = TakTuk::get('count');

if ($rank == 1) {
    print "I'm process 1\n";
    # Only send if there is a second process to receive the message
    if ($count > 1) {
        TakTuk::send(to=>2, body=>"Hello world");
    }
} elsif ($rank == 2) {
    print "I'm process 2\n";
    # recv blocks until a message arrives
    my ($to, $from, $message) = TakTuk::recv();
    print "Process $to received $message from $from\n";
}
then the execution of the following command:
taktuk -m localhost -m localhost broadcast taktuk_perl [ - ]\;broadcast input file [ communication.pl ]
would produce an output similar to:
Astaroth.local: taktuk_perl: 3523: output > I'm process 2
Astaroth.local: taktuk_perl: 3523: output > Process 2 received Hello world from 1
Astaroth.local: taktuk_perl: 3523: status > 0
Astaroth.local: taktuk_perl: 3524: output > I'm process 1
Astaroth.local: taktuk_perl: 3524: status > 0
if the file "communication.pl" was placed in the login directory of the user, this could also have been executed by the simpler:
taktuk -m localhost -m localhost broadcast taktuk_perl [ communication.pl ]
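The sample output above follows a "host: command: pid: stream > text" layout. When such output has to be consumed by another program, a parser along the following lines may help. This Python sketch assumes that layout, which is inferred from the sample output only and no longer holds if the -o option redefines the template; the name parse_output_line and the field names are hypothetical.

```python
import re

# Layout inferred from the sample output: "host: command: pid: stream > text".
LINE_PATTERN = re.compile(
    r"^(?P<host>[^:]+): (?P<command>.*?): (?P<pid>\d+): (?P<stream>\w+) > (?P<text>.*)$"
)


def parse_output_line(line):
    """Split one line of TakTuk output into fields, or return None if it does not match."""
    match = LINE_PATTERN.match(line.rstrip("\n"))
    if match is None:
        return None
    fields = match.groupdict()
    fields["pid"] = int(fields["pid"])
    return fields


if __name__ == "__main__":
    sample = "Astaroth.local: taktuk_perl: 3523: status > 0"
    print(parse_output_line(sample))
```

Returning None for lines that do not match lets the caller decide how to treat output produced under a different template.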
taktuk -o status -m host1 -m host2 broadcast exec [ 'echo $TAKTUK_RANK' ]
or removing the prompt before each line of output from commands:
taktuk -o output='"$line\n"' -m host1 -m host2 broadcast exec [ 'echo $TAKTUK_RANK' ]
or even changing the prompt to make it display only the stream type:
taktuk -o default='"$type > $line\n"' -m host1 -m host2 broadcast exec [ 'echo $TAKTUK_RANK' ]
and it is also possible to redirect the status to file descriptor 2 only for the second host:
taktuk -m host1 -R status=2 -m host2 broadcast exec [ 'echo $TAKTUK_RANK' ]
and so on...
By default the debugging level of packages is set to 2 (everything is printed out except "debug" messages). It might be changed for each package using the -D option. For instance the following code executes "true" on "toto.nowhere.com" and prints out every bit of internal messaging:
taktuk -D default=1 -m toto.nowhere.com broadcast exec [ true ]
but one could have executed the same command keeping only messages from the "scheduler" package:
taktuk -D scheduler=1 -m toto.nowhere.com broadcast exec [ true ]
or ensuring an execution free of any warning or error messages:
taktuk -D default=4 -m toto.nowhere.com broadcast exec [ true ]
taktuk -r
Notice that in this mode the behavior of TakTuk can seem very cryptic. This is not intended for ordinary users.
The development of TakTuk is still in progress, so there are probably a number of bugs. For now, the following characteristics (some of them are not really bugs) have been identified:
You might also want to have a look at:
http://taktuk.gforge.inria.fr/Bugs.txt
where all the temporary bugs are listed version by version.
The original concept of TakTuk has been proposed by Cyrille Martin in his PhD thesis. People involved in this work include Jacques Briat, Olivier Richard, Thierry Gautier and Guillaume Huard.
The author of version 3 (the Perl version) and current maintainer of the package is Guillaume Huard.
TakTuk is provided under the terms of the GNU General Public License version 2 or later.
2022-07-28 | perl v5.34.0 |