fspl(1) | fspl Manual | fspl(1) |
fspl - sequential, distributed job queue processing
fspl [ OPTIONS ] COMMAND [ command_options ]
fspl is the CLI part of the Filespooler (https://www.complete.org/filespooler) package.
fspl is a Unix-style tool that facilitates local or remote command execution, complete with stdin capture, with easy integration with various tools. Here's a brief Filespooler feature list:
Filespooler consists of a command-line tool (fspl) for interacting with queues and a Rust library that fspl uses. The main.rs for fspl is only a few lines long.
This manual is the reference for fspl. The Filespooler homepage, <https://www.complete.org/filespooler/>, contains many examples, instructions on how to integrate with everything from file syncers to encryption tools, and so forth. Please refer to it for further information.
The basic idea is this:
The key way to ensure the ordered processing of the job queue is with a sequence number. This is a 64-bit unsigned integer. It is stored in a seqfile on both the sending and the receiving side. On the sending side, the seqfile is standalone; there is only an accompanying .lock file for it. On the receiving side, the seqfile and its accompanying lock file live within the queue directory.
When the seqfile is referenced on the sending side, it will be created and initialized with the value 1 if it does not already exist. On the receiving side, it is created as part of fspl queue-init.
In either case, the seqfile consists of one newline-terminated line, containing the next number to process. On the sending side, this is used by fspl prepare as the sequence number for the next generated packet. On the receiving side, it is used by fspl queue-process to determine which job to process next (unless changed by --order-by).
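The seqfile format described above is simple enough to sketch in a few lines of shell. The helper below is purely illustrative: next_seq is a hypothetical name, this is not how fspl itself is implemented, and it takes no lock, so it should only be pointed at a scratch file. Note also that shell arithmetic is typically signed, whereas the real sequence number is a 64-bit unsigned integer.

```shell
# Illustration of the seqfile format only; this is not fspl's code and it
# takes no lock, so do not run it against a seqfile that fspl is using.

# next_seq FILE: print the number stored in FILE, creating FILE with the
# value 1 first if it does not exist, then store the number plus one.
next_seq() (
    seqfile=$1
    [ -e "$seqfile" ] || printf '1\n' > "$seqfile"
    IFS= read -r n < "$seqfile" || exit 1
    printf '%s\n' "$((n + 1))" > "$seqfile"
    printf '%s\n' "$n"
)
```

Calling next_seq twice on a fresh path prints 1 and then 2, leaving a single newline-terminated line containing 3 in the file.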
The queue has this general layout:
queuedir/ Top-level queue directory
nextseq Sequence file
nextseq.lock Lock file
jobs/ Job files stored here
When passing the --queuedir to one of the fspl queue- commands, you give it the path to the top-level queuedir as shown here.
You are free to create additional directories within the queuedir so long as they don't use one of the names listed above. This can be helpful for receiving queue contents in certain situations.
You can specify --append-only to fspl queue-init, which will cause the nextseq and nextseq.lock files to be omitted. This has the effect of making the queue write-only. This can be useful if you are synchronizing the jobs subdirectory between machines, but still want to be able to use fspl queue-write to add jobs to that folder. It will prevent fspl queue-process from running. You can still inspect an append-only queue with commands like fspl queue-ls and fspl queue-info.
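As a rough illustration of the layout, a script could tell the two kinds of queue apart by looking for the files named above. This is only a heuristic sketch; queue_kind is a hypothetical helper, and fspl performs its own checks.

```shell
# queue_kind DIR: print "normal", "append-only", or "not-a-queue",
# judging only by the files that the queue layout describes.
queue_kind() {
    if [ ! -d "$1/jobs" ]; then
        echo not-a-queue
    elif [ -f "$1/nextseq" ]; then
        echo normal
    else
        # jobs/ exists but there is no nextseq: treat it as append-only
        echo append-only
    fi
}
```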
Job files live within queuedir/jobs. They all must follow this naming pattern:
fspl-*.fspl
This pattern is specifically designed to facilitate safe injection of job files into the queue by other tools. Many other tools prepend or append a temporary string to a filename to signify that it has not yet been fully transferred. The Filespooler assumption is that once a file appears in jobs/ with a name matching this pattern, then it has been fully transferred and can be processed at any time.
So long as the filename begins with fspl- and ends with .fspl, you are free to put whatever string you like in the middle. The only other requirement, of course, is that each job must have a unique filename within the directory. To simplify things, you can pipe a job file to fspl queue-write and let that command take care of naming. Or, you can generate a random (or non-random) string yourself in a shell script.
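If you generate names yourself, a quick check against the pattern can catch mistakes before a file lands in jobs/. The following is a minimal sketch; is_job_filename is a hypothetical helper, not part of fspl.

```shell
# is_job_filename NAME: succeed if NAME is a bare filename matching
# the fspl-*.fspl pattern, fail otherwise.
is_job_filename() {
    case $1 in
        */*) return 1 ;;            # a path, not a bare filename
        fspl-*.fspl) return 0 ;;    # begins with fspl- and ends with .fspl
        *) return 1 ;;
    esac
}
```

Names carrying a temporary prefix or suffix, such as fspl-abc.fspl.tmp, fail the check, which is the property that makes the pattern safe for tools that rename a file once the transfer completes.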
The job file itself consists of a small binary header, which is CRC32-checked. This header is normally less than 100 bytes and the length of it is encoded within the file. Following the header, if --input was given to fspl prepare, whatever was piped to prepare is included as the "payload". This will be piped to the executor command when run by fspl queue-process or fspl stdin-process. The payload is not validated by CRC or length by Filespooler, since this is assumed to be the role of the transport layer. The website contains examples of using GPG or other tools to ensure integrity.
There are three types of job files:
To expand slightly on the discussion above about adding files to the queue:
A common way to do this if your transport tool doesn't use a nice temporary name is to transport the file to an adjacent directory, and then use mv(1) or, better, make a hard link with ln(1) to get the file into the jobs/ directory. Note that in both cases, you must take care that you are not crossing a filesystem boundary; on some platforms such as Linux, mv will fall back to copying instead of renaming if you cross the boundary, and then the assumptions about completeness are violated.
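The hard-link approach might look like the sketch below. It assumes a staging directory on the same filesystem as the queue (for instance a hypothetical incoming/ directory created inside the queuedir) and a staged file that is already named fspl-*.fspl; inject_job is an illustrative name, not an fspl command.

```shell
# inject_job STAGED QUEUEDIR: publish a fully received file into
# QUEUEDIR/jobs. STAGED must be on the same filesystem as QUEUEDIR
# and must already be named fspl-*.fspl.
inject_job() (
    staged=$1
    jobs=$2/jobs
    name=${staged##*/}
    # ln makes the job appear under its final name in one step. Unlike
    # mv, it fails rather than copying if the two paths are on different
    # filesystems, and it also fails if the name is already taken.
    ln -- "$staged" "$jobs/$name" || exit 1
    rm -f -- "$staged"
)
```

Because ln refuses to cross a filesystem boundary, a misconfigured staging directory shows up as an error instead of a partially copied job.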
Job files are, by default, stored exactly as laid out above. However, in many cases, it may be desirable to store them "encoded" - compressed or encrypted. In this case, the output from fspl prepare can be piped through, say, gzip and the resulting packet can still be stored in jobs/ by fspl queue-write or any related tool.
Now, however, we arrive at the question: how can Filespooler process a queue containing files that have been compressed, encrypted, or so forth?
Every fspl queue command takes an optional --decoder (or -d) parameter, which is a command string that will be executed by the shell. This decoder command will receive the entire job file (not just the payload) piped to it on stdin, and is expected to write the decoded file to stdout.
The fspl stdin- counterparts of the queue commands do not accept a decoder parameter, since it is assumed you would do the decoding in the pipeline on the way to the stdin command.
For instance:
date | fspl prepare -s ~/state -i - | gzip | fspl queue-write -q ~/queue
fspl queue-ls -q ~/queue -d zcat
ID                   creation timestamp          filename
48                   2022-05-07T21:07:02-05:00   fspl-48aa52ad-c65c-478a-9d37-123d4bebcb30.fspl
Normally, fspl ignores files that fail to decode the header. If you omit the --decoder, it may just look like your queue is empty. (Using --log-level=debug will illuminate what is happening.)
As mentioned, Filespooler is designed to be used as a distributed, asynchronous, ordered command queue. The homepage contains many more examples. Here is one simple example of using ssh as a transport to get commands to a remote queue:
tar -cpf - /usr/local | fspl prepare -s ~/state -i - | ssh remote fspl queue-write -q ~/queue
fspl is a Rust program. If you don't already have Rust installed, it can be easily installed from <https://www.rust-lang.org/>.
Once Rust is installed, Filespooler can be installed with this command:
cargo install filespooler
From a checked-out source tree, it can be built by running cargo build --release. The executable will then be placed in target/release/fspl.
You can also obtain pre-built binaries for x86_64 Linux from <https://salsa.debian.org/jgoerzen/filespooler/-/releases> .
fspl prepare will save certain environment variables to the packet, which will be set later at process time. fspl {queue,stdin}-process will set a number of useful environment variables in the execution environment. fspl {queue,stdin}-info will show the environment that will be passed to the commands. See each of these for further discussion.
In general, the commands exit with 0 on success and nonzero on failure. The concept of success and failure can be complicated in some situations; see the discussion of the process command.
These situations explicitly cause a nonzero (error) exit code:
These situations explicitly terminate with success (0):
Next to every seqfile on both the sender and within the queue on the recipient is a file named seqfile.lock. An exclusive lock is held on this file during the following conditions:
fspl will exit with an error code if it cannot obtain the lock when it needs it.
These are situations that explicitly do NOT obtain a lock:
Note that if the queue is being actively processed while a queue-ls is in process, a race condition is possible if a file disappears between the readdir() call and the time the file is opened for reading, which could potentially cause queue-ls to fail. queue-ls intentionally does not attempt to acquire the lock, however, because it would always fail while the queue is being processed in that case, preventing one from being able to list the queue at all while long-running jobs are in process.
Note that fspl queue-write does not need to obtain a lock. The fspl stdin- series of commands also do not obtain a lock.
Taken together, this means that any given queue is intended to be processed sequentially, not in parallel. However, if parallel processing is desired, it is trivial to iterate over the jobs and use fspl stdin-process in whatever custom manner you would like. Also, since queues are so lightweight, there is no problem with creating thousands of them.
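One way such a custom loop could look is sketched here. process_jobs_parallel is a hypothetical helper; it deliberately ignores sequence order, does not collect exit statuses, and leaves the job files in place, so cleanup and error handling are up to your script. It also assumes the job files are stored unencoded.

```shell
# process_jobs_parallel QUEUEDIR CMD [ARGS...]: run CMD once per job
# file, with the job file on stdin, all at the same time.
process_jobs_parallel() (
    jobs=$1/jobs
    shift
    for f in "$jobs"/fspl-*.fspl; do
        [ -e "$f" ] || continue      # the glob matched nothing
        "$@" < "$f" &
    done
    wait                             # block until every job has finished
)
```

It would be invoked along the lines of process_jobs_parallel ~/queue fspl stdin-process COMMAND, where COMMAND is whatever you would otherwise have passed to fspl queue-process.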
These options may be specified for any command, and must be given before the command on the command line.
Every subcommand accepts --help to display a brief summary of options, invoked as: fspl SUBCOMMAND --help .
Generates a packet (job file data) and writes it to stdout. This file can be piped to other programs (particularly fspl queue-write) or saved directly to disk.
Usage:
fspl prepare [ OPTIONS ] -s FILE [ -- PARAMS... ]
In addition to these options, any environment variable beginning with FSPL_SET_ will be saved in the packet and will be set in the execution environment at processing time.
These commands create a non-command packet, one which is either considered to always fail or to always succeed (nop). These two commands take only one option, which is required:
Prints the sequence number that will be used by the next prepare command.
Usage:
fspl prepare-get-next -s FILE
Changes the sequence number that will be used by the next prepare command.
Usage:
fspl prepare-set-next -s FILE ID
These two commands display information about a given packet. This information is printed to stdout in a style that is similar to how the shell sets environment variables. In fact, it shows precisely the environment variables that will be set by a corresponding process command.
stdin-info expects the packet to be piped in to stdin; queue-info will find it in the given queue.
This command will not attempt to read the payload of the file; it will only read the header. (Note that this is not a guarantee that some layer of the system may not try to read a few KB past the header, merely a note that running this command will not try to read all of a 1TB packet.)
Usage:
fspl queue-info [ OPTIONS ] -q DIR -j ID
fspl stdin-info
Options (valid for queue-info only):
Example:
fspl queue-info -q /tmp/queue -j 45 -d zcat
FSPL_SEQ=45
FSPL_CTIME_SECS=1651970311
FSPL_CTIME_NANOS=425412511
FSPL_CTIME_RFC3339_UTC=2022-05-08T00:38:31Z
FSPL_CTIME_RFC3339_LOCAL=2022-05-07T19:38:31-05:00
FSPL_JOB_FILENAME=fspl-29342606-02a0-438c-81f2-efdfb80afbe9.fspl
FSPL_JOB_QUEUEDIR=/tmp/bar
FSPL_JOB_FULLPATH=/tmp/bar/jobs/fspl-29342606-02a0-438c-81f2-efdfb80afbe9.fspl
FSPL_PARAM_1=hithere
FSPL_SET_FOO=bar
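Since the info output is a series of NAME=VALUE assignments, a script can pick out a single field with very little code. The sketch below assumes one variable per line and plain unquoted values, as in the example above; how fspl renders values containing spaces or special characters is not covered here. info_get is a hypothetical helper.

```shell
# info_get NAME: read NAME=VALUE lines on stdin and print the value
# of the first line whose name is exactly NAME. Fails if none matches.
info_get() {
    while IFS= read -r line; do
        case $line in
            "$1"=*) printf '%s\n' "${line#*=}"; return 0 ;;
        esac
    done
    return 1
}
```

For instance, piping the output of fspl stdin-info or fspl queue-info into info_get FSPL_SEQ would print just the sequence number.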
Some notes on these variables:
These two commands extract the payload (if any) from the given packet. This is written to stdout. No header or other information is written to stdout.
stdin-payload expects the packet to be piped in to stdin; queue-payload will find it in the given queue.
The payload will be piped to the command started by the process commands. The payload will be zero bytes long if -i was not passed to fspl prepare, or if an empty payload was given to it.
Usage:
fspl queue-payload [ OPTIONS ] -q DIR -j ID
fspl stdin-payload
Options (valid for queue-payload only):
Process packet(s). stdin-process will process exactly one packet on stdin. queue-process will process zero or more packets, depending on the content of the queue and options given.
Usage:
fspl queue-process [ OPTIONS ] -q DIR COMMAND [ -- PARAMS... ]
fspl stdin-process [ OPTIONS ] COMMAND [ -- PARAMS... ]
Common options:
Options valid only for queue-process:
The environment is set as described above. Note that since no queue directory or filename is relevant with the stdin-process flavor, those variables are unset under stdin-process.
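A command run by the process commands can therefore be written to work under both flavors by supplying a fallback for the variables that may be unset. The following is a minimal sketch of such a handler; handle_job and the "stdin" label are illustrative choices, while FSPL_SEQ and FSPL_JOB_FILENAME are the variables shown by the info commands.

```shell
# handle_job: example executor body. Reads the payload on stdin and
# reports its size together with the job's sequence number and origin.
handle_job() {
    # FSPL_JOB_FILENAME is unset under stdin-process, so use a label.
    src=${FSPL_JOB_FILENAME:-stdin}
    bytes=$(wc -c | tr -d ' ')
    printf 'job %s from %s: %s payload bytes\n' "${FSPL_SEQ:-?}" "$src" "$bytes"
}
```

In practice the body of this function would live in a script of its own, and that script would be given as the COMMAND to fspl queue-process or fspl stdin-process.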
To skip a failing job at the head of the queue, you can use fspl queue-set-next, or alternatively, fspl queue-process --on-error Delete --maxjobs 1 to cause it to be deleted. You would probably not wish to combine this with timestamp ordering.
Changes the sequence number that will be used by the next fspl queue-process command.
Usage:
fspl queue-set-next -q DIR ID
Receives a packet on stdin and writes it to the queue. This command does not bother to decode, process, or validate the packet in any way. It simply writes it to the queue safely, using a temporary filename until completely written, at which point it is renamed to a fspl-*.fspl file with a random middle part.
Usage:
fspl queue-write -q DIR
Creates the queue directory and the needed files and subdirectories within it.
Usage:
fspl queue-init -q DIR
Generates a filename matching the fspl-*.fspl pattern, which will be valid for a job file in a Filespooler queue. This is often useful when generating a filename that will be used by a tool other than fspl queue-write.
Usage:
fspl gen-filename
Example:
fspl gen-filename
fspl-b3bd6e63-f62c-49ee-8c46-6677069d2c58.fspl
Generates a random UUID and prints it to stdout. This is generated using the same algorithm as fspl queue-write uses. It can be used in scripts for making your own unique filenames.
Usage:
fspl gen-uuid
Example:
fspl gen-uuid
2896c849-37c5-4a6d-8b90-0cf63e3e9daa
Displays the copyright and license information for fspl.
John Goerzen <jgoerzen@complete.org>
<https://www.complete.org/filespooler/>
Copyright (C) 2022 John Goerzen <jgoerzen@complete.org>
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.
John Goerzen.
May 2022 | John Goerzen |