unpaper [options] (input patterns output patterns | input files output files)
unpaper is a post-processing tool for scanned sheets of paper,
especially for book pages that have been scanned from previously created
photocopies. The main purpose is to make scanned book pages more readable
on screen after conversion to PDF. Additionally, unpaper might be useful to
enhance the quality of scanned pages before performing optical character
recognition (OCR).
unpaper tries to clean scanned images by removing dark edges that
appeared through scanning or copying on areas outside the actual page
content (e.g. dark areas between the left-hand and the right-hand side
of a double-sided book-page scan). The program also tries to detect
misaligned centering and rotation of pages and automatically straightens
each page by rotating it to the correct angle. This process is called
"deskewing". Note that the automatic processing will sometimes
fail. It is always a good idea to check the results of unpaper manually
and adjust the parameter settings according to the requirements of the
input. Each processing step can also be disabled individually for each
sheet.
Input and output files can be in either .pbm, .pgm
or .ppm format, thus generally in .pnm format, as also used by
the Linux scanning tools scanimage and scanadf. Conversion to
PDF can be achieved, for example, with the Linux tools pgm2tiff, tiffcp
and tiff2pdf.
Input and output files need to be specified either by using
patterns or as an ordered list of input and output files. If patterns are
used, such as %04d, the placeholder is replaced by the input or output
sheet number before the file is opened for input or output.
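The placeholder uses printf-style integer formatting. As a rough illustration only (this uses Python's `%` operator rather than unpaper's own code, and the file name is a made-up example), a pattern such as scan%04d.pgm would expand as follows:

```python
# Illustration of printf-style pattern expansion. The pattern and the
# numbers are hypothetical examples, not taken from unpaper itself.
def expand(pattern: str, number: int) -> str:
    """Replace the %d-style placeholder in `pattern` with `number`."""
    return pattern % number


for sheet in (1, 2, 10):
    print(expand("scan%04d.pgm", sheet))
# prints scan0001.pgm, scan0002.pgm and scan0010.pgm, one per line
```

The 04 in %04d pads the number with zeros to four digits, which keeps file names in sorted order.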
If you're not using patterns, then the program expects one or two
input files depending on what is passed as --input-pages and one or
two output files depending on what is passed as --output-pages, in
order.
Missing output file names are fatal and will stop processing;
missing initial input file names are fatal, and so is any missing input file
if a range of sheets is defined through --sheet or
--end-sheet.
unpaper accepts files in PNM format, which means they might
be in .pbm, .pgm, .ppm or .pnm format, which is
what is produced by Linux command line scanning tools such as
scanimage and scanadf.
- -end sheet ; --end-sheet sheet
- Number of last sheet to process in multi-sheet mode. -1 indicates
processing until no more input file with the corresponding page number is
available (default: -1)
- -# sheet-range ; --sheet sheet-range
- Optionally specifies which sheets to process in the range between
start-sheet and end-sheet.
- --post-mirror { v | h | v,h }
- Mirror the image, after any other processing except possible
post-rotation. Either v (for vertical mirroring), h (for
horizontal mirroring) or v,h (for both) can be specified.
- --pre-shift h, v
- Shift the image before further processing. Values for h (horizontal shift)
and v (vertical shift) can be either positive or negative.
- --post-shift h, v
- Shift the image after other processing. Values for h (horizontal shift)
and v (vertical shift) can be either positive or negative.
- --pre-wipe left, top, right, bottom
- Manually wipe out an area before further processing. Any pixel in a wiped
area will be set to white. Multiple areas to be wiped may be specified by
multiple occurrences of this option.
- --post-wipe left, top, right, bottom
- Manually wipe out an area after processing. Any pixel in a wiped area will
be set to white. Multiple areas to be wiped may be specified by multiple
occurrences of this option.
- --pre-mask x1, y1, x2, y2
- Specify masks to apply before any other processing. Any pixel outside a
mask will be set to white, unless another mask includes this pixel.
Only pixels inside a mask will remain. Multiple masks may be
specified. No deskewing will be applied to the masks specified by
--pre-mask.
- -s { width, height | size-name } ; --size { width, height | size-name }
- Change the sheet size before other processing is applied. Content on the
sheet gets zoomed to fit the new size, but the aspect ratio is
preserved. If the sheet's aspect ratio changes, the zoomed
content is centered on the sheet instead.
Possible values for size-name are: a5, a4,
a3, letter, legal. All size names can also be
applied in rotated landscape orientation: use a4-landscape,
letter-landscape etc.
- --post-zoom factor
- Change the sheet size according to the given factor after processing is
done.
- -bn { v | h | v,h } ; --blackfilter-scan-direction { v | h | v,h }
- Directions in which to search for solidly black areas. Either v
(for vertical searching), h (for horizontal searching) or
v,h (for both) can be specified. The blackfilter works by moving a
virtual bar across each page. The darkness inside the virtual bar is
determined, and if it exceeds blackfilter-scan-threshold, the black
pixels in the area are filled. During filling, the blackness of each pixel
is determined by black-threshold. The bar is then moved by
blackfilter-scan-step in the scanning direction. Once a page border
is encountered, the bar is moved down (horizontal scan) or right (vertical
scan) by its blackfilter-scan-size.
- -bi intensity ; --blackfilter-intensity intensity
- Intensity with which to delete black areas. This deletes pixels around the
virtual scan bar. Larger values will leave fewer noise pixels around former
black areas, but may delete page content. (default: 20)
- -li ratio ; --blurfilter-intensity ratio
- Relative intensity with which to delete tiny clusters of pixels. Any
blurred area which contains at most the ratio of dark pixels will be
cleared. (default: 0.01)
- -p x, y; --mask-scan-point x, y
- Manually set starting point for mask-detection. Multiple
--mask-scan-point options may be specified to detect multiple
masks.
- -m x1, y1, x2, y2; --mask x1, y1, x2, y2
- Manually add a mask, in addition to masks automatically detected around
the --mask-scan-point coordinates (unless --no-mask-scan is
specified).
Any pixel outside a mask will be set to white, unless another
mask covers this pixel.
- -mn { v | h | v,h }; --mask-scan-direction { v | h | v,h }
- Directions in which to search for mask borders, starting from
--mask-scan-point coordinates. Either v (for vertical scanning),
h (for horizontal scanning) or v,h (for both) can be
specified. (default: h, as v may cut text paragraphs on
single-page sheets)
- -mm w, h; --mask-scan-minimum w, h
- Minimum allowed size of an auto-detected mask. Masks detected below this
size will be ignored and set to the size specified by --mask-scan-maximum.
(default: 100,100)
- -mM w, h; --mask-scan-maximum w, h
- Maximum allowed size of an auto-detected mask. Masks detected above this
size will be shrunk to the maximum value, each direction individually.
(default: sheet size, or page size derived from --layout
option)
- -mc color; --mask-color color
- Color value with which to wipe out pixels not covered by any mask. May be
useful for testing in order to visualize the effect of masking. (Note that
an RGB value is expected: R*65536 + G*256 + B.)
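The packed value can be worked out by hand or with a few lines of code. The following is a minimal sketch of the formula quoted above; the helper name mask_color is hypothetical, and the 0..255 range per channel is an assumption implied by the multipliers 65536 and 256:

```python
# Sketch of the R*65536 + G*256 + B packing described for --mask-color.
# `mask_color` is a hypothetical helper; 8-bit channels are assumed.
def mask_color(r: int, g: int, b: int) -> int:
    """Pack three 8-bit channel values into one integer."""
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError("each channel must be in the range 0..255")
    return r * 65536 + g * 256 + b


print(mask_color(255, 0, 0))      # 16711680 (pure red)
print(mask_color(255, 255, 255))  # 16777215 (white)
```

Under these assumptions, passing 16711680 as the color value would make masked-out regions show up in red, which is easy to spot during testing.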
- -dn { left | top | right | bottom },...; --deskew-scan-direction { left | top | right | bottom },...
- Edges from which to scan for rotation. Each edge of a mask can be used to
detect the mask's rotation. If multiple edges are specified, the average
value will be used, unless the statistical deviation exceeds
--deskew-scan-deviation. Use left for scanning from the left
edge, top for scanning from the top edge, right for scanning
from the right edge, bottom for scanning from the bottom. Multiple
directions can be separated by commas. (default: left,right)
- -b threshold; --black-threshold threshold
- Brightness ratio below which a pixel is considered black (non-gray). This
is used by the gray-filter and the blackfilter. This value is also used
when converting a grayscale image to black-and-white mode (default:
0.33)
- -ip { 1 | 2 }; --input-pages { 1 | 2 }
- If 2 is specified, read two input images instead of one and
internally combine them to a doubled-layout sheet before further
processing. Before internally combining, --pre-rotation is
optionally applied individually to both input images as the very first
processing step.
- -op { 1 | 2 }; --output-pages { 1 | 2 }
- If 2 is specified, write two output images instead of one, as a
result of splitting a doubled-layout sheet after processing. After
splitting the sheet, --post-rotation is optionally applied
individually to both output images as the very last processing step.
- -S { width, height | size-name }; --sheet-size { width, height | size-name }
- Force a fixed sheet size. Usually, the sheet size is determined by the input
image size (if input-pages=1), or by the double size of the first
page in a two-page input set (if input-pages=2). If the input image
is smaller than the size specified here, it will appear centered and
surrounded with a white border on the sheet. If the input image is bigger,
it will be centered and the edges will be cropped. This option may also be
helpful to get uniformly sized output images if the input image sizes
differ. Standard size-names like a4-landscape, letter, etc.
may be used (see --size). (default: as in input file)
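The centering and cropping behaviour can be pictured along a single axis. This is only a sketch under assumptions: the function name placement is hypothetical, and the integer rounding shown here is not confirmed by the text above, so unpaper's actual pixel positions may differ by one.

```python
# Sketch of centering an image on a forced sheet size along one axis.
# `placement` is a hypothetical helper; rounding behaviour is assumed.
def placement(sheet: int, image: int) -> tuple:
    """Return (border_before_image, pixels_cropped_per_side) in pixels."""
    if image <= sheet:
        # Smaller image: centered, with a border on each side.
        return (sheet - image) // 2, 0
    # Larger image: centered, with the overhanging edges cropped.
    return 0, (image - sheet) // 2


print(placement(2480, 2000))  # (240, 0)
print(placement(2000, 2480))  # (0, 240)
```

Applying the same calculation to width and height separately gives the position of the image on the sheet.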
- --sheet-background { black | white }
- Sets a color with which the sheet is filled before any image is loaded and
placed onto it. This can be useful when the sheet size and the image size
differ.
- --no-mask-scan sheet-range
- Disables mask-detection. Masks explicitly set by --mask will still
have effect. Individual sheet indices can be specified.
- --no-mask-center sheet-range
- Disables auto-centering of each mask. Auto-centering is performed by
default if the --layout option has been set. Individual sheet
indices can be specified.
- --no-wipe sheet-range
- Disables explicit wipe-areas. This means the effect of parameter
--wipe can be disabled individually per sheet.
- --no-border sheet-range
- Disables explicitly set borders. This means the effect of parameter
--border can be disabled individually per sheet.
- --no-border-align sheet-range
- Disables aligning of the area detected by border-scanning (see
--border-align). Individual sheet indices can be specified.
- -n sheet-range; --no-processing sheet-range
- Do not perform any processing on a sheet except pre/post rotating and
mirroring, and file-depth conversions on saving. This option has the same
effect as setting all --no-xxx options together. Individual sheet
indices can be specified.
- --no-multi-pages
- Disable multi-page processing even if the input filename contains a
% (usually indicating the start of a placeholder for the page
counter).
- --dpi dpi
- Dots per inch used for conversion of measured size values, e.g.
21cm,27.9cm. Note that this parameter should occur before
any size value with a measurement suffix is specified. (default:
300)
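To get a feeling for the resulting pixel sizes, the conversion can be estimated as centimetres divided by 2.54 (centimetres per inch), times the dpi value. The sketch below uses a hypothetical helper cm_to_pixels; how unpaper itself rounds the result is not stated here, so the rounding is an assumption.

```python
# Estimate of how a size given in cm maps to pixels at a given dpi.
# `cm_to_pixels` is a hypothetical helper; unpaper's rounding may differ.
CM_PER_INCH = 2.54


def cm_to_pixels(cm: float, dpi: int = 300) -> float:
    """Convert a length in centimetres to a (fractional) pixel count."""
    return cm / CM_PER_INCH * dpi


width = cm_to_pixels(21.0)
height = cm_to_pixels(27.9)
print(round(width), round(height))  # 2480 3295
```

At the default of 300 dpi, a size of 21cm,27.9cm therefore corresponds to roughly 2480 by 3295 pixels.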
- -t { pbm | pgm | ppm }; --type { pbm | pgm | ppm }
- Output file type (and bit depth). If not specified, the one with the same,
or closest, pixel format as the original input files will be used.
- pbm
- Portable Bit Map, monochrome raw image.
- pgm
- Portable Grayscale Map, 8-bit per pixel grayscale raw image.
- ppm
- Portable Pixel Map, 24-bit per pixel RGB raw image.
- -T ; --test-only
- Do not write any output. May be useful in combination with
--verbose to get information about the input.
- -si nr; --start-input nr
- Set the first page number to substitute for '%d' in input filenames. Every
time the input file sequence is repeated, this number gets increased by 1.
(default: (startsheet-1)*inputpages+1)
- -so nr; --start-output nr
- Set the first page number to substitute for '%d' in output filenames.
Every time the output file sequence is repeated, this number gets
increased by 1. (default: (startsheet-1)*outputpages+1)
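The default formulas for --start-input and --start-output have the same shape, so a single helper can illustrate both. The function name default_start is hypothetical; the arithmetic is the formula given in the defaults above.

```python
# Sketch of the documented default: (startsheet - 1) * pages + 1.
# `default_start` is a hypothetical helper name.
def default_start(start_sheet: int, pages_per_sheet: int) -> int:
    """First page number used when processing begins at `start_sheet`."""
    return (start_sheet - 1) * pages_per_sheet + 1


print(default_start(1, 2))  # 1: the first sheet starts with page 1
print(default_start(3, 2))  # 5: sheets 1 and 2 account for pages 1 to 4
print(default_start(3, 1))  # 3: one page per sheet, so page equals sheet
```

With two pages per sheet, starting at sheet 3 therefore begins with page number 5 unless the start value is set explicitly.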
- --insert-blank nr [,nr...]
- Use blank input instead of an input file from the input file sequence at
the specified index-positions. The input file sequence will be interrupted
temporarily and will continue with the next input file afterwards. This
can be useful to insert blank content into a sequence of input
images.
- --replace-blank nr [,nr...]
- Like --insert-blank, but the input images at the specified index
positions get replaced with blank content and thus will be ignored.
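The difference between the two options is easiest to see on a short sequence. The sketch below is a model of the description above, not unpaper's implementation: the function names and file names are hypothetical, and it assumes that index positions are counted from 1.

```python
# Model of --insert-blank versus --replace-blank on a list of inputs.
# Assumption: index positions are 1-based. All names are hypothetical.
BLANK = "<blank>"


def insert_blank(files, positions):
    """Put a blank at each position; every input file is still used."""
    positions = set(positions)
    queue = list(files)
    result = []
    index = 1
    while queue:
        # A blank takes the slot; the pending file moves to the next one.
        result.append(BLANK if index in positions else queue.pop(0))
        index += 1
    return result


def replace_blank(files, positions):
    """Put a blank at each position; the file at that position is ignored."""
    positions = set(positions)
    return [BLANK if i in positions else name
            for i, name in enumerate(files, start=1)]


pages = ["a.pgm", "b.pgm", "c.pgm"]
print(insert_blank(pages, [2]))   # ['a.pgm', '<blank>', 'b.pgm', 'c.pgm']
print(replace_blank(pages, [2]))  # ['a.pgm', '<blank>', 'c.pgm']
```

In this model, inserting lengthens the sequence by one, while replacing keeps its length and drops b.pgm.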
- --overwrite
- Allow overwriting existing files. Otherwise the program terminates with an
error if an output file to be written already exists.
- -vv
- Even more verbose output, show parameter settings before processing.
2022, The unpaper Authors