FINDIMAGEDUPES(1p) | User Contributed Perl Documentation | FINDIMAGEDUPES(1p) |
findimagedupes - Finds visually similar or duplicate images
findimagedupes [option ...] [--] [ - | [file ...] ]
Options:
    -f, --fingerprints=FILE
    -c, --collection=FILE
    -M, --merge=FILE
    -p, --program=PROGRAM
    -P, --prune
    -s, --script=FILE
    -a, --add
    -i, --include=TEXT
    -r, --rescan
    -I, --include-file=FILE
    -n, --no-compare
    -q, --quiet
    -t, --threshold=AMOUNT
    -v, --verbosity=LIST
    -0, --null
    -h, --help
    -R, --recurse
    --man
With no options, compares the specified files and neither uses nor updates any fingerprint database.
Directories of images may be specified instead of individual files; sub-directories of these are not searched unless --recurse is used.
Without -0, the list is specified one file per line, such as produced by find(1) with its "-print" option.
With -0, the list is expected to be null-delimited, such as produced by find(1) with its "-print0" option.
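The difference between the two list formats can be sketched as follows. This is an illustrative sketch only, not the program's own code; the function name parse_file_list is hypothetical, and dropping empty entries (such as the one after a trailing delimiter) is an assumption.

```python
import os


def parse_file_list(data: bytes, null_delimited: bool = False) -> list:
    """Split a raw file list into individual file names.

    null_delimited=False mimics input from find(1) with -print,
    null_delimited=True mimics input from find(1) with -print0.
    """
    separator = b"\0" if null_delimited else b"\n"
    # Assumption: empty entries (e.g. after a trailing delimiter) are ignored.
    return [os.fsdecode(name) for name in data.split(separator) if name]
```

With newline-delimited input, a file name that itself contains a newline is split into two bogus entries; the null-delimited form avoids this, which is the usual reason to prefer "-print0" together with -0.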
Matches are also sought in any fingerprint databases specified.
The program attempts to produce well-formed collections. In particular, it will print a warning and exclude any file whose name contains a newline or a double quote. (In this situation, gqview(1) seems to create a .gqv collection file that it silently fails to read back in properly.)
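A minimal sketch of that exclusion rule, assuming the check is a plain substring test on the file name. The function name collection_safe and the warning text are hypothetical and do not come from findimagedupes.

```python
import sys


def collection_safe(names):
    """Return the names that can be written to a collection, warning about the rest."""
    kept = []
    for name in names:
        # Names containing a newline or a double quote are left out.
        if "\n" in name or '"' in name:
            print(f"warning: excluding {name!r} from collection", file=sys.stderr)
            continue
        kept.append(name)
    return kept
```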
May be abbreviated as --fp or --db.
This option may be given multiple times when --merge is used. (Note: FILE could contain commas, so multiple databases may not be specified as a single comma-delimited list.)
It is executed after any code included using --include-file.
May be given multiple times. Code will be concatenated.
It is executed before any code included using --include.
By default, image existence is not checked unless there is a conflict. To force removal of defunct data, use --prune as well.
A list of image files is not required if this option is used. However, if a list is provided, fingerprint data for the files will be copied or (re)generated as appropriate.
When --merge is used, the original fingerprint databases are not modified, even if --prune is used.
See also: --rescan
PROGRAM must be the full path to an existing executable file. For more flexibility, see the --include and --include-file options.
See also: --script
Databases specified by --fingerprints are only modified if --merge is not used.
Usually, progress, warning and error messages are printed on stderr. If this option is given, warnings are not displayed. If it is given twice or more, errors are not displayed either.
Information requested with --verbosity is still displayed.
If used with --add, only the fingerprints of files specified on the command line are (re)generated.
Implies --prune.
When used without --program, two skeletal shell functions are generated: "VIEW" simply echo(1)s its arguments; the empty function "END" runs after file processing is finished.
To display to terminal (or feed into a pipe), use "-" as FILE.
If --script is not given, the script is still created in memory and is executed immediately. So, with the default VIEW and END functions, lines containing sets of duplicates are displayed. See: EXAMPLES
See also: --include, --include-file
A fractional part may be given, but it is only accurate to 100/256 (0.390625) for a percentage and is meaningless for "bits". The default is "90%" ("25b") if not specified.
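As a rough illustration of how the two units relate, the sketch below converts a threshold string to a count of bits. It assumes a 256-bit fingerprint (16x16 at 1bpp), that a percentage expresses similarity, and that the conversion truncates; these assumptions were chosen because they reproduce the documented equivalence of "90%" and "25b". The function name threshold_to_bits is hypothetical and is not part of findimagedupes.

```python
FINGERPRINT_BITS = 256  # assumed: 16x16 pixels at 1 bit per pixel


def threshold_to_bits(amount: str) -> int:
    """Convert a threshold such as "90%" or "25b" to a number of differing bits."""
    if amount.endswith("b"):
        # A fractional part carries no meaning for bits, so it is discarded.
        return int(float(amount[:-1]))
    if amount.endswith("%"):
        similarity = float(amount[:-1])
        # Assumption: truncate, so that "90%" (25.6 bits) becomes 25 bits.
        return int(FINGERPRINT_BITS * (100.0 - similarity) / 100.0)
    raise ValueError("threshold must end in '%' or 'b'")


# One bit of the fingerprint corresponds to this many percentage points.
PERCENT_PER_BIT = 100 / FINGERPRINT_BITS
```

Under these assumptions, any two percentages that differ by less than 100/256 may map to the same number of bits, which is why a finer fractional part has no effect.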
Alternatively, --verbosity may be given multiple times, and accumulates. Note that this may not be sensible. For example, to be useful, md5 output probably should not be merged with fingerprint data.
findimagedupes compares a list of files for visual similarity.
To calculate an image fingerprint:
    1) Read image.
    2) Resample to 160x160 to standardize size.
    3) Grayscale by reducing saturation.
    4) Blur a lot to get rid of noise.
    5) Normalize to spread out intensity as much as possible.
    6) Equalize to make image as contrasty as possible.
    7) Resample again down to 16x16.
    8) Reduce to 1bpp.
    9) The fingerprint is this raw image data.
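The last three steps can be sketched without an image library. The sketch below is a simplification and not the program's implementation: it starts from an already grayscaled square image given as a list of rows of 0-255 values, uses block averaging in place of GraphicsMagick's resampling, and uses a fixed cut-off of 128 for the reduction to 1bpp. The function name fingerprint_from_gray and both of those choices are assumptions made for illustration.

```python
def fingerprint_from_gray(pixels, size=16):
    """Reduce a square grayscale image (values 0-255) to a size*size 1bpp fingerprint."""
    side = len(pixels)
    if side == 0 or side % size or any(len(row) != side for row in pixels):
        raise ValueError("expected a square image whose side is a multiple of size")
    block = side // size
    bits = 0
    for block_y in range(size):
        for block_x in range(size):
            # Step 7 (simplified): average each block down to a single pixel.
            total = sum(
                pixels[y][x]
                for y in range(block_y * block, (block_y + 1) * block)
                for x in range(block_x * block, (block_x + 1) * block)
            )
            mean = total / (block * block)
            # Step 8 (simplified): one bit per pixel, assumed cut-off at mid-gray.
            bits = (bits << 1) | (1 if mean >= 128 else 0)
    # Step 9: the packed bits are the fingerprint (32 bytes for 16x16).
    return bits.to_bytes(size * size // 8, "big")
```

For a 160x160 input this yields 32 bytes, i.e. 256 bits, one per pixel of the 16x16 image.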
To compare two images for similarity:
    1) Take fingerprint pairs and xor them.
    2) Compute the percentage of 1 bits in the result.
    3) If percentage exceeds threshold, declare files to be similar.
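The xor-and-count comparison can be sketched as below. The sketch assumes fingerprints are equal-length byte strings and that the threshold limits the number of differing bits, which is the reading suggested by the default of "90%" being given as equivalent to "25b". The function names and the treatment of the boundary case (exactly 25 differing bits counts as similar) are assumptions, not taken from the program.

```python
def differing_bits(fp_a: bytes, fp_b: bytes) -> int:
    """Count the bits that differ between two fingerprints."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    # XOR leaves a 1 bit wherever the two fingerprints disagree.
    xored = int.from_bytes(fp_a, "big") ^ int.from_bytes(fp_b, "big")
    return bin(xored).count("1")


def are_similar(fp_a: bytes, fp_b: bytes, max_differing_bits: int = 25) -> bool:
    """Assumed rule: similar when no more than max_differing_bits bits differ."""
    return differing_bits(fp_a, fp_b) <= max_differing_bits
```

For 256-bit fingerprints, 25 differing bits correspond to about 9.8% of the bits, so the two images agree on a little over 90% of the fingerprint.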
Any other return values indicate an internal error of some sort.
To be written.
To be written.
There is a memory leak somewhere.
Killing the program may corrupt the fingerprint database(s).
Changing version of GraphicsMagick invalidates fingerprint databases.
Directory recursion is deliberately not implemented: Composing a file-list and using it with "-" is a more flexible approach.
Repetitions are culled before comparisons take place, so a command line like "findimagedupes a.jpg a.jpg" will not produce a match.
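The effect of culling can be sketched as follows, assuming repetitions are detected by comparing the file names exactly as given (whether the program also resolves different paths to the same file is not stated here). The function names are hypothetical.

```python
from itertools import combinations


def cull_repetitions(names):
    """Drop repeated names while keeping the first occurrence and the order."""
    return list(dict.fromkeys(names))


def candidate_pairs(names):
    """Pairs of distinct files that would be compared after culling."""
    return list(combinations(cull_repetitions(names), 2))
```

With the input ["a.jpg", "a.jpg"], only one name survives culling, so there is no pair left to compare and nothing can match.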
The program needs a lot of memory. This is probably not an issue unless your machine has less than 128MB of free RAM and you try to compare more than a hundred thousand files at once. The program will run quite slowly with that many files anyway: about eight hours initially to generate fingerprints and another ten minutes to do the actual comparing.
gqview - GTK based multiformat image viewer
gthumb - an image viewer and browser for GNOME
Jonathan H N Chin <code@jhnc.org>
Copyright 2006-2008 by Jonathan H N Chin <code@jhnc.org>.
This program is free software; you may redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
This code has been written from scratch. However, it owes its existence to findimagedupes by Rob Kudla and uses the same duplicate-detection algorithm.
2019-01-12 | perl v5.28.1 |