DIRCONV(1) General Commands Manual DIRCONV(1)

dirconv - locate and transcode mixed-encoding file names

dirconv [-078dFhnpruvw] [-f charset] [-x regex] [path ...]

The dirconv utility recursively scans the specified path(s) and classifies files and directories according to whether their names are pure 7-bit ASCII, non-ASCII but valid UTF-8, double-UTF-8 (WTF-8), or none of these.

Names in the last category are assumed to be Latin-1 unless a different encoding is specified with the -f option.

By default, the dirconv utility then prints the names that are neither pure 7-bit ASCII nor valid UTF-8.
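The classification described above can be sketched in Python. This is only an illustration of the four categories, not the utility's actual implementation: the function name classify and the returned labels are hypothetical, and the double-UTF-8 test shown here (undo one layer through the assumed 8-bit charset and check whether the result is again valid UTF-8) is an assumption about how such a check could be done.

```python
def classify(name: bytes, charset: str = "latin-1") -> str:
    """Return 'ascii', 'utf8', 'wtf8', or 'other' for a raw file name.

    Hypothetical helper; the labels and the heuristic are illustrative.
    """
    # Pure 7-bit ASCII: no byte has the high bit set.
    if all(b < 0x80 for b in name):
        return "ascii"
    # Not decodable as UTF-8: assumed to be in the 8-bit charset.
    try:
        text = name.decode("utf-8")
    except UnicodeDecodeError:
        return "other"
    # Valid UTF-8. Undo one layer through the assumed charset; if the
    # result is valid UTF-8 again, the name looks double-encoded.
    try:
        text.encode(charset).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return "utf8"
    return "wtf8"


# "bl\u00e5b\u00e6r" is the Norwegian word for blueberry, used as a sample.
SAMPLE = "bl\u00e5b\u00e6r"
```

With this sketch, SAMPLE encoded as UTF-8 is classified as "utf8", SAMPLE encoded as Latin-1 as "other", and SAMPLE encoded as UTF-8, misread as Latin-1, and encoded as UTF-8 again as "wtf8".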

The following options are available:

-0  Print a NUL character rather than a newline after each path. This option has no effect if the -n option was also specified.
-7  Select names that are pure 7-bit ASCII.
-8  Select names that contain non-ASCII characters but are not valid UTF-8. This is the default unless the -7, -u, and/or -w options are specified.
-d  Show debugging information. This option can be specified multiple times to increase the level of detail.
-F  In conjunction with the -r option, force renaming a file when the target already exists.
-f charset  Specify the assumed character set for non-ASCII, non-UTF-8 names. The default is “iso8859-1”.
-h  Print a usage message and exit.
-n  In conjunction with the -r option, show what would have happened, but do not actually rename any files.
-p  Print the selected names.
-r  Attempt to convert the selected names to UTF-8 and rename the files and directories.
-u  Select names which contain non-ASCII characters and are valid UTF-8 but not WTF-8.
-v  Print the source revision number and exit.
-w  Select names which seem to be WTF-8-encoded.
-x regex  Do not inspect files and directories whose unconverted names match the specified POSIX extended regular expression.

iconv(1), regex(3).

The dirconv utility and this manual page were written by Dag-Erling Smørgrav ⟨des@des.no⟩ for the University of Oslo.

The dirconv utility works by attempting to decode each name as if it were a sequence of UTF-8 characters. It is possible, but highly unlikely, that a random string of characters in a non-UTF single-byte encoding would look like a valid UTF-8 sequence.
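The following Python sketch illustrates why such a false positive is possible but rare. The helper decodes_as_utf8 is a hypothetical name, and the sample strings were chosen for illustration only.

```python
def decodes_as_utf8(raw: bytes) -> bool:
    """Return True if raw is a well-formed UTF-8 byte sequence."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Typical Latin-1 name: a lone 0xE5 or 0xE6 byte followed by an ASCII
# letter is not a valid UTF-8 sequence.
typical = "bl\u00e5b\u00e6r".encode("latin-1")

# Contrived Latin-1 name: U+00C3 followed by U+00A9 is the byte pair
# C3 A9, which is also the UTF-8 encoding of U+00E9, so this name would
# be mistaken for UTF-8.
contrived = "\u00c3\u00a9".encode("latin-1")
```

Here decodes_as_utf8(typical) is false and decodes_as_utf8(contrived) is true. The second case requires a high-bit lead byte to be followed by exactly the right continuation bytes, which ordinary text in a single-byte encoding seldom produces.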

Reliable detection of WTF-8 is only possible if the original 8-bit encoding is known.
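A small Python sketch shows why the original 8-bit encoding matters. The function names are hypothetical, and cp1252 is used only as an example of an encoding that differs from Latin-1 in the 0x80 to 0x9F range; the man page itself does not mention it.

```python
from typing import Optional


def double_encode(text: str, charset: str) -> bytes:
    """Simulate double encoding: UTF-8 bytes misread as charset, re-encoded."""
    return text.encode("utf-8").decode(charset).encode("utf-8")


def undo_double_encoding(raw: bytes, charset: str) -> Optional[str]:
    """Try to reverse double encoding; return None if charset does not fit."""
    try:
        return raw.decode("utf-8").encode(charset).decode("utf-8")
    except (UnicodeDecodeError, UnicodeEncodeError):
        return None


# "\u20acuro" starts with the euro sign, whose UTF-8 form contains the
# byte 0x82. cp1252 maps 0x82 to U+201A, which Latin-1 cannot represent.
mangled = double_encode("\u20acuro", "cp1252")
```

Undoing the damage with the charset that caused it recovers the original text, while assuming Latin-1 for the same bytes fails, so the name would not be recognized as double-encoded.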

The exclusion filter is applied before name conversion. Character classes are unlikely to work as expected on unconverted names.
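Because the -x pattern is matched against unconverted names, it effectively sees raw bytes rather than characters. The Python sketch below is an analogy only: Python's re module is not a POSIX extended regular expression engine, \w stands in for a POSIX character class, and the helper name excluded is hypothetical.

```python
import re


def excluded(raw_name: bytes, pattern: bytes) -> bool:
    """Return True if the byte pattern matches the unconverted name."""
    return re.search(pattern, raw_name) is not None


# Sample name as it might appear on disk in Latin-1.
raw = "bl\u00e5b\u00e6r".encode("latin-1")

# A literal ASCII pattern behaves the same on raw bytes.
literal_hit = excluded(b"notes.bak", rb"\.bak$")

# A word-character class on bytes only covers ASCII, so the bytes
# 0xE5 and 0xE6 are not treated as letters and the match fails.
class_hit_on_bytes = excluded(raw, rb"^\w+$")

# The same class on the decoded text does match.
class_hit_on_text = re.search(r"^\w+$", raw.decode("latin-1")) is not None
```

Patterns built from literal ASCII text, such as file extensions, are therefore the safer choice for exclusions.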

November 18, 2014