XZ(1) | XZ Utils | XZ(1) |
xz, unxz, xzcat, lzma, unlzma, lzcat - Compress or decompress .xz and .lzma files
xz [option...] [file...]
unxz is equivalent to xz --decompress.
xzcat is equivalent to xz --decompress --stdout.
lzma is equivalent to xz --format=lzma.
unlzma is equivalent to xz --format=lzma --decompress.
lzcat is equivalent to xz --format=lzma --decompress
--stdout.
When writing scripts that need to decompress files, it is recommended to always use the name xz with appropriate arguments (xz -d or xz -dc) instead of the names unxz and xzcat.
xz is a general-purpose data compression tool with command line syntax similar to gzip(1) and bzip2(1). The native file format is the .xz format, but the legacy .lzma format used by LZMA Utils and raw compressed streams with no container format headers are also supported. In addition, decompression of the .lz format used by lzip is supported.
xz compresses or decompresses each file according to the selected operation mode. If no files are given or file is -, xz reads from standard input and writes the processed data to standard output. xz will refuse (display an error and skip the file) to write compressed data to standard output if it is a terminal. Similarly, xz will refuse to read compressed data from standard input if it is a terminal.
Unless --stdout is specified, files other than - are written to a new file whose name is derived from the source file name:
If the target file already exists, an error is displayed and the file is skipped.
Unless writing to standard output, xz will display a warning and skip the file if any of the following applies:
After successfully compressing or decompressing the file, xz copies the owner, group, permissions, access time, and modification time from the source file to the target file. If copying the group fails, the permissions are modified so that the target file doesn't become accessible to users who didn't have permission to access the source file. xz doesn't support copying other metadata like access control lists or extended attributes yet.
Once the target file has been successfully closed, the source file is removed unless --keep was specified. The source file is never removed if the output is written to standard output or if an error occurs.
Sending SIGINFO or SIGUSR1 to the xz process makes it print progress information to standard error. This has only limited use since when standard error is a terminal, using --verbose will display an automatically updating progress indicator.
The memory usage of xz varies from a few hundred kilobytes to several gigabytes depending on the compression settings. The settings used when compressing a file determine the memory requirements of the decompressor. Typically the decompressor needs 5 % to 20 % of the amount of memory that the compressor needed when creating the file. For example, decompressing a file created with xz -9 currently requires 65 MiB of memory. Still, it is possible to have .xz files that require several gigabytes of memory to decompress.
Especially users of older systems may find the possibility of very large memory usage annoying. To prevent uncomfortable surprises, xz has a built-in memory usage limiter, which is disabled by default. While some operating systems provide ways to limit the memory usage of processes, relying on it wasn't deemed to be flexible enough (for example, using ulimit(1) to limit virtual memory tends to cripple mmap(2)).
The memory usage limiter can be enabled with the command line option --memlimit=limit. Often it is more convenient to enable the limiter by default by setting the environment variable XZ_DEFAULTS, for example, XZ_DEFAULTS=--memlimit=150MiB. It is possible to set the limits separately for compression and decompression by using --memlimit-compress=limit and --memlimit-decompress=limit. Using these two options outside XZ_DEFAULTS is rarely useful because a single run of xz cannot do both compression and decompression and --memlimit=limit (or -M limit) is shorter to type on the command line.
If the specified memory usage limit is exceeded when decompressing, xz will display an error and decompressing the file will fail. If the limit is exceeded when compressing, xz will try to scale the settings down so that the limit is no longer exceeded (except when using --format=raw or --no-adjust). This way the operation won't fail unless the limit is very small. The scaling of the settings is done in steps that don't match the compression level presets, for example, if the limit is only slightly less than the amount required for xz -9, the settings will be scaled down only a little, not all the way down to xz -8.
It is possible to concatenate .xz files as is. xz will decompress such files as if they were a single .xz file.
It is possible to insert padding between the concatenated parts or after the last part. The padding must consist of null bytes and the size of the padding must be a multiple of four bytes. This can be useful, for example, if the .xz file is stored on a medium that measures file sizes in 512-byte blocks.
Concatenation and padding are not allowed with .lzma files or raw streams.
In most places where an integer argument is expected, an optional suffix is supported to easily indicate large integers. There must be no space between the integer and the suffix.
The special value max can be used to indicate the maximum integer value supported by the option.
If multiple operation mode options are given, the last one takes effect.
Preset | DictSize | CompCPU | CompMem | DecMem |
-0 | 256 KiB | 0 | 3 MiB | 1 MiB |
-1 | 1 MiB | 1 | 9 MiB | 2 MiB |
-2 | 2 MiB | 2 | 17 MiB | 3 MiB |
-3 | 4 MiB | 3 | 32 MiB | 5 MiB |
-4 | 4 MiB | 4 | 48 MiB | 5 MiB |
-5 | 8 MiB | 5 | 94 MiB | 9 MiB |
-6 | 8 MiB | 6 | 94 MiB | 9 MiB |
-7 | 16 MiB | 6 | 186 MiB | 17 MiB |
-8 | 32 MiB | 6 | 370 MiB | 33 MiB |
-9 | 64 MiB | 6 | 674 MiB | 65 MiB |
Preset | DictSize | CompCPU | CompMem | DecMem |
-0e | 256 KiB | 8 | 4 MiB | 1 MiB |
-1e | 1 MiB | 8 | 13 MiB | 2 MiB |
-2e | 2 MiB | 8 | 25 MiB | 3 MiB |
-3e | 4 MiB | 7 | 48 MiB | 5 MiB |
-4e | 4 MiB | 8 | 48 MiB | 5 MiB |
-5e | 8 MiB | 7 | 94 MiB | 9 MiB |
-6e | 8 MiB | 8 | 94 MiB | 9 MiB |
-7e | 16 MiB | 8 | 186 MiB | 17 MiB |
-8e | 32 MiB | 8 | 370 MiB | 33 MiB |
-9e | 64 MiB | 8 | 674 MiB | 65 MiB |
A custom filter chain allows specifying the compression settings in detail instead of relying on the settings associated to the presets. When a custom filter chain is specified, preset options (-0 ... -9 and --extreme) earlier on the command line are forgotten. If a preset option is specified after one or more custom filter chain options, the new preset takes effect and the custom filter chain options specified earlier are forgotten.
A filter chain is comparable to piping on the command line. When compressing, the uncompressed input goes to the first filter, whose output goes to the next filter (if any). The output of the last filter gets written to the compressed file. The maximum number of filters in the chain is four, but typically a filter chain has only one or two filters.
Many filters have limitations on where they can be in the filter chain: some filters can work only as the last filter in the chain, some only as a non-last filter, and some work in any position in the chain. Depending on the filter, this limitation is either inherent to the filter design or exists to prevent security issues.
A custom filter chain is specified by using one or more filter options in the order they are wanted in the filter chain. That is, the order of filter options is significant! When decoding raw streams (--format=raw), the filter chain is specified in the same order as it was specified when compressing.
Filters take filter-specific options as a comma-separated list. Extra commas in options are ignored. Every option has a default value, so you need to specify only those you want to change.
To see the whole filter chain and options, use xz -vv (that is, use --verbose twice). This works also for viewing the filter chain options used by presets.
Filter | Alignment | Notes |
x86 | 1 | 32-bit or 64-bit x86 |
ARM | 4 | |
ARM-Thumb | 2 | |
ARM64 | 4 | 4096-byte alignment is best |
PowerPC | 4 | Big endian only |
IA-64 | 16 | Itanium |
SPARC | 4 |
The robot mode is activated with the --robot option. It makes the output of xz easier to parse by other programs. Currently --robot is supported only together with --version, --info-memory, and --list. It will be supported for compression and decompression in the future.
xz --robot --version will print the version number of xz and liblzma in the following format:
XZ_VERSION=XYYYZZZS
LIBLZMA_VERSION=XYYYZZZS
XYYYZZZS are the same on both lines if xz and liblzma are from the same XZ Utils release.
Examples: 4.999.9beta is 49990091 and 5.0.0 is 50000002.
xz --robot --info-memory prints a single line with three tab-separated columns:
In the future, the output of xz --robot --info-memory may have more columns, but never more than a single line.
xz --robot --list uses tab-separated output. The first column of every line has a string that indicates the type of the information found on that line:
The columns of the file lines:
The columns of the stream lines:
The columns of the block lines:
If --verbose was specified twice, additional columns are included on the block lines. These are not displayed with a single --verbose, because getting this information requires many seeks and can thus be slow:
The columns of the summary lines:
Since xz 5.1.2alpha:
The columns of the totals line:
If --verbose was specified twice, additional columns are included on the totals line:
Since xz 5.1.2alpha:
Future versions may add new line types and new columns can be added to the existing line types, but the existing columns won't be changed.
Notices (not warnings or errors) printed on standard error don't affect the exit status.
xz parses space-separated lists of options from the environment variables XZ_DEFAULTS and XZ_OPT, in this order, before parsing the options from the command line. Note that only options are parsed from the environment variables; all non-options are silently ignored. Parsing is done with getopt_long(3) which is used also for the command line arguments.
XZ_OPT=-2v tar caf foo.tar.xz foo
XZ_OPT=${XZ_OPT-"-7e"} export XZ_OPT
The command line syntax of xz is practically a superset of lzma, unlzma, and lzcat as found from LZMA Utils 4.32.x. In most cases, it is possible to replace LZMA Utils with XZ Utils without breaking existing scripts. There are some incompatibilities though, which may sometimes cause problems.
The numbering of the compression level presets is not identical in xz and LZMA Utils. The most important difference is how dictionary sizes are mapped to different presets. Dictionary size is roughly equal to the decompressor memory usage.
Level | xz | LZMA Utils |
-0 | 256 KiB | N/A |
-1 | 1 MiB | 64 KiB |
-2 | 2 MiB | 1 MiB |
-3 | 4 MiB | 512 KiB |
-4 | 4 MiB | 1 MiB |
-5 | 8 MiB | 2 MiB |
-6 | 8 MiB | 4 MiB |
-7 | 16 MiB | 8 MiB |
-8 | 32 MiB | 16 MiB |
-9 | 64 MiB | 32 MiB |
The dictionary size differences affect the compressor memory usage too, but there are some other differences between LZMA Utils and XZ Utils, which make the difference even bigger:
Level | xz | LZMA Utils 4.32.x |
-0 | 3 MiB | N/A |
-1 | 9 MiB | 2 MiB |
-2 | 17 MiB | 12 MiB |
-3 | 32 MiB | 12 MiB |
-4 | 48 MiB | 16 MiB |
-5 | 94 MiB | 26 MiB |
-6 | 94 MiB | 45 MiB |
-7 | 186 MiB | 83 MiB |
-8 | 370 MiB | 159 MiB |
-9 | 674 MiB | 311 MiB |
The default preset level in LZMA Utils is -7 while in XZ Utils it is -6, so both use an 8 MiB dictionary by default.
The uncompressed size of the file can be stored in the .lzma header. LZMA Utils does that when compressing regular files. The alternative is to mark that uncompressed size is unknown and use end-of-payload marker to indicate where the decompressor should stop. LZMA Utils uses this method when uncompressed size isn't known, which is the case, for example, in pipes.
xz supports decompressing .lzma files with or without end-of-payload marker, but all .lzma files created by xz will use end-of-payload marker and have uncompressed size marked as unknown in the .lzma header. This may be a problem in some uncommon situations. For example, a .lzma decompressor in an embedded device might work only with files that have known uncompressed size. If you hit this problem, you need to use LZMA Utils or LZMA SDK to create .lzma files with known uncompressed size.
The .lzma format allows lc values up to 8, and lp values up to 4. LZMA Utils can decompress files with any lc and lp, but always creates files with lc=3 and lp=0. Creating files with other lc and lp is possible with xz and with LZMA SDK.
The implementation of the LZMA1 filter in liblzma requires that the sum of lc and lp must not exceed 4. Thus, .lzma files, which exceed this limitation, cannot be decompressed with xz.
LZMA Utils creates only .lzma files which have a dictionary size of 2^n (a power of 2) but accepts files with any dictionary size. liblzma accepts only .lzma files which have a dictionary size of 2^n or 2^n + 2^(n-1). This is to decrease false positives when detecting .lzma files.
These limitations shouldn't be a problem in practice, since practically all .lzma files have been compressed with settings that liblzma will accept.
When decompressing, LZMA Utils silently ignore everything after the first .lzma stream. In most situations, this is a bug. This also means that LZMA Utils don't support decompressing concatenated .lzma files.
If there is data left after the first .lzma stream, xz considers the file to be corrupt unless --single-stream was used. This may break obscure scripts which have assumed that trailing garbage is ignored.
The exact compressed output produced from the same uncompressed input file may vary between XZ Utils versions even if compression options are identical. This is because the encoder can be improved (faster or better compression) without affecting the file format. The output can vary even between different builds of the same XZ Utils version, if different build options are used.
The above means that once --rsyncable has been implemented, the resulting files won't necessarily be rsyncable unless both old and new files have been compressed with the same xz version. This problem can be fixed if a part of the encoder implementation is frozen to keep rsyncable output stable across xz versions.
Embedded .xz decompressor implementations like XZ Embedded don't necessarily support files created with integrity check types other than none and crc32. Since the default is --check=crc64, you must use --check=none or --check=crc32 when creating files for embedded systems.
Outside embedded systems, all .xz format decompressors support all the check types, or at least are able to decompress the file without verifying the integrity check if the particular check is not supported.
XZ Embedded supports BCJ filters, but only with the default start offset.
Compress the file foo into foo.xz using the default compression level (-6), and remove foo if compression is successful:
xz foo
Decompress bar.xz into bar and don't remove bar.xz even if decompression is successful:
xz -dk bar.xz
Create baz.tar.xz with the preset -4e (-4 --extreme), which is slower than the default -6, but needs less memory for compression and decompression (48 MiB and 5 MiB, respectively):
tar cf - baz | xz -4e > baz.tar.xz
A mix of compressed and uncompressed files can be decompressed to standard output with a single command:
xz -dcf a.txt b.txt.xz c.txt d.txt.lzma > abcd.txt
On GNU and *BSD, find(1) and xargs(1) can be used to parallelize compression of many files:
find . -type f \! -name '*.xz' -print0 \
| xargs -0r -P4 -n16 xz -T1
The -P option to xargs(1) sets the number of parallel xz processes. The best value for the -n option depends on how many files there are to be compressed. If there are only a couple of files, the value should probably be 1; with tens of thousands of files, 100 or even more may be appropriate to reduce the number of xz processes that xargs(1) will eventually create.
The option -T1 for xz is there to force it to single-threaded mode, because xargs(1) is used to control the amount of parallelization.
Calculate how many bytes have been saved in total after compressing multiple files:
xz --robot --list *.xz | awk '/^totals/{print $5-$4}'
A script may want to know that it is using new enough xz. The following sh(1) script checks that the version number of the xz tool is at least 5.0.0. This method is compatible with old beta versions, which didn't support the --robot option:
if ! eval "$(xz --robot --version 2> /dev/null)" ||
[ "$XZ_VERSION" -lt 50000002 ]; then
echo "Your xz is too old." fi unset XZ_VERSION LIBLZMA_VERSION
Set a memory usage limit for decompression using XZ_OPT, but if a limit has already been set, don't increase it:
NEWLIM=$((123 << 20)) # 123 MiB OLDLIM=$(xz --robot --info-memory | cut -f3) if [ $OLDLIM -eq 0 -o $OLDLIM -gt $NEWLIM ]; then
XZ_OPT="$XZ_OPT --memlimit-decompress=$NEWLIM"
export XZ_OPT fi
The simplest use for custom filter chains is customizing a LZMA2 preset. This can be useful, because the presets cover only a subset of the potentially useful combinations of compression settings.
The CompCPU columns of the tables from the descriptions of the options -0 ... -9 and --extreme are useful when customizing LZMA2 presets. Here are the relevant parts collected from those two tables:
Preset | CompCPU |
-0 | 0 |
-1 | 1 |
-2 | 2 |
-3 | 3 |
-4 | 4 |
-5 | 5 |
-6 | 6 |
-5e | 7 |
-6e | 8 |
If you know that a file requires somewhat big dictionary (for example, 32 MiB) to compress well, but you want to compress it quicker than xz -8 would do, a preset with a low CompCPU value (for example, 1) can be modified to use a bigger dictionary:
xz --lzma2=preset=1,dict=32MiB foo.tar
With certain files, the above command may be faster than xz -6 while compressing significantly better. However, it must be emphasized that only some files benefit from a big dictionary while keeping the CompCPU value low. The most obvious situation, where a big dictionary can help a lot, is an archive containing very similar files of at least a few megabytes each. The dictionary size has to be significantly bigger than any individual file to allow LZMA2 to take full advantage of the similarities between consecutive files.
If very high compressor and decompressor memory usage is fine, and the file being compressed is at least several hundred megabytes, it may be useful to use an even bigger dictionary than the 64 MiB that xz -9 would use:
xz -vv --lzma2=dict=192MiB big_foo.tar
Using -vv (--verbose --verbose) like in the above example can be useful to see the memory requirements of the compressor and decompressor. Remember that using a dictionary bigger than the size of the uncompressed file is waste of memory, so the above command isn't useful for small files.
Sometimes the compression time doesn't matter, but the decompressor memory usage has to be kept low, for example, to make it possible to decompress the file on an embedded system. The following command uses -6e (-6 --extreme) as a base and sets the dictionary to only 64 KiB. The resulting file can be decompressed with XZ Embedded (that's why there is --check=crc32) using about 100 KiB of memory.
xz --check=crc32 --lzma2=preset=6e,dict=64KiB foo
If you want to squeeze out as many bytes as possible, adjusting the number of literal context bits (lc) and number of position bits (pb) can sometimes help. Adjusting the number of literal position bits (lp) might help too, but usually lc and pb are more important. For example, a source code archive contains mostly US-ASCII text, so something like the following might give slightly (like 0.1 %) smaller file than xz -6e (try also without lc=4):
xz --lzma2=preset=6e,pb=0,lc=4 source_code.tar
Using another filter together with LZMA2 can improve compression with certain file types. For example, to compress a x86-32 or x86-64 shared library using the x86 BCJ filter:
xz --x86 --lzma2 libfoo.so
Note that the order of the filter options is significant. If --x86 is specified after --lzma2, xz will give an error, because there cannot be any filter after LZMA2, and also because the x86 BCJ filter cannot be used as the last filter in the chain.
The Delta filter together with LZMA2 can give good results with bitmap images. It should usually beat PNG, which has a few more advanced filters than simple delta but uses Deflate for the actual compression.
The image has to be saved in uncompressed format, for example, as uncompressed TIFF. The distance parameter of the Delta filter is set to match the number of bytes per pixel in the image. For example, 24-bit RGB bitmap needs dist=3, and it is also good to pass pb=0 to LZMA2 to accommodate the three-byte alignment:
xz --delta=dist=3 --lzma2=pb=0 foo.tiff
If multiple images have been put into a single archive (for example, .tar), the Delta filter will work on that too as long as all images have the same number of bytes per pixel.
xzdec(1), xzdiff(1), xzgrep(1), xzless(1), xzmore(1), gzip(1), bzip2(1), 7z(1)
XZ Utils: <https://tukaani.org/xz/>
XZ Embedded: <https://tukaani.org/xz/embedded.html>
LZMA SDK: <http://7-zip.org/sdk.html>
2022-12-01 | Tukaani |