pdf2djvu - creates DjVu files from PDF files
pdf2djvu [{-o | --output} output-djvu-file] [option...] pdf-file...
pdf2djvu {-i | --indirect} index-djvu-file [option...] pdf-file...
pdf2djvu {--version | --help | -h}
This program creates a DjVu file from one or more Portable
Document Format files.
pdf2djvu accepts the following options:
-o, --output=output-djvu-file
Generate a bundled multi-page document. Write the file
into output-djvu-file instead of standard output.
-i, --indirect=index-djvu-file
Generate an indirect multi-page document. Use
index-djvu-file as the index file name; put the component files into
the same directory. The directory must exist and be writable.
--page-id-template=template
Specifies the naming scheme for page identifiers. Consult
the “TEMPLATE LANGUAGE” section for the template language
description.
The default template is “p{page:04*}.djvu”.
For portability reasons, page identifiers:
• must consist only of lowercase ASCII letters, digits, _, +, - and dot,
• cannot start with a +, - or a dot,
• cannot contain two consecutive dots,
• must end with the .djvu or the .djv extension.
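The portability rules above can be sketched as a small check. This is only an illustration, not pdf2djvu's own validation code; the function name is_portable_page_id is hypothetical.

```python
import re

# Allowed characters: lowercase ASCII letters, digits, _, +, - and dot.
_ALLOWED = re.compile(r"[a-z0-9_+.\-]+")


def is_portable_page_id(page_id: str) -> bool:
    """Return True if page_id satisfies the four portability rules.

    Hypothetical helper for illustration only.
    """
    if not _ALLOWED.fullmatch(page_id):
        return False  # empty, or contains a disallowed character
    if page_id[0] in "+-.":
        return False  # must not start with +, - or a dot
    if ".." in page_id:
        return False  # must not contain two consecutive dots
    return page_id.endswith((".djvu", ".djv"))


print(is_portable_page_id("p0001.djvu"))  # True
print(is_portable_page_id("Page1.djvu"))  # False (uppercase letter)
```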
--page-id-prefix=prefix
Equivalent to
“--page-id-template=prefix{page:04*}.djvu”.
--page-title-template=template
Specifies the template for page titles. Consult the
“TEMPLATE LANGUAGE” section for the template language
description.
The default template is “{label}”.
--no-page-titles
Don't set page titles.
-d, --dpi=resolution
Sets the output resolution to resolution dots per inch. The default
is 300 dpi. The allowed range is: 72 ≤ resolution ≤ 6000.
--media-box
Use MediaBox to determine page size. CropBox is used by
default.
--page-size=widthxheight
Sets the preferred page size to width × height pixels. The actual
page size may be altered in order to preserve the aspect ratio and to
respect DjVu limitations on resolution. (This option takes
precedence over -d/--dpi.)
--guess-dpi
Try to guess native resolution by inspecting embedded
images. Use with care.
--bg-slices=n+...+n, --bg-slices=n,...,n
Specifies the encoding quality of the IW44 background
layer. This option is similar to the -slice option of c44; consult the
c44(1) manual page for details. The default is 72+11+10+10.
--bg-subsample=n
Specifies the background subsampling ratio. The default
is 3. Valid values are integers between 1 and 12, inclusive.
--fg-colors=default
Try to preserve all the foreground layer colors. This is
the default.
--fg-colors=web
Reduce foreground layer colors to the web palette (216
colors). This option is not recommended.
--fg-colors=n
Use GraphicsMagick to reduce the number of distinct colors in
the foreground layer to n. Valid values are integers between 1 and
4080. This option is not recommended.
--fg-colors=black
Discard any color information from the foreground
layer.
--monochrome
Render pages as monochrome bitmaps. With this option,
--bg-... and --fg-... options are
not respected.
--loss-level=n
Specifies the aggressiveness of the lossy compression.
The default is 0 (lossless). Valid values are integers between 0 and 200,
inclusive. This option is similar to the -losslevel option of cjb2; consult the
cjb2(1) manual page for details. This option can be used only if the
--monochrome option is also enabled.
--lossy
Synonym for --loss-level=100.
--anti-alias
Enable font and vector anti-aliasing. This option is not
recommended.
--no-metadata
Don't extract the metadata.
By default:
• The following entries of the document information
dictionary are extracted: Title, Author, Subject, Creator, Producer,
CreationDate, ModDate. Timestamps are formatted according to RFC
3339[1], with date and time components separated by a single space.
• The XMP metadata is extracted (or created) and
updated accordingly.
Note
If multiple input documents are specified, only metadata of the first one is
taken into account.
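As an illustration of the timestamp layout described above (RFC 3339, with a single space instead of "T" between date and time), here is a minimal sketch. It is not pdf2djvu's code; the helper name format_timestamp is hypothetical, and the exact form pdf2djvu uses for the time zone offset is not stated here, so the offset shown is simply what Python produces.

```python
from datetime import datetime, timedelta, timezone


def format_timestamp(moment: datetime) -> str:
    """Format a datetime in RFC 3339 style, using a space as the separator.

    Hypothetical helper for illustration only.
    """
    return moment.isoformat(sep=" ", timespec="seconds")


# An arbitrary example timestamp with a +01:00 offset.
example = datetime(2020, 1, 2, 3, 4, 5, tzinfo=timezone(timedelta(hours=1)))
print(format_timestamp(example))  # 2020-01-02 03:04:05+01:00
```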
--verbatim-metadata
Keep the original metadata intact.
--no-outline
Don't extract the document outline.
--hyperlinks=border-avis
Make hyperlink borders always visible.
By default, a hyperlink border is visible only when the mouse is
over the hyperlink.
--hyperlinks=#RRGGBB
Force the specified border color for hyperlinks.
--no-hyperlinks, --hyperlinks=none
Don't extract hyperlinks.
--no-text
Don't extract the text.
--words
Extract the text. Record the location of every word. This
is the default.
--lines
Extract the text. Record the location of every line,
rather than every word.
--crop-text
Extract no text outside the page boundary.
--no-nfkc
Do not apply NFKC[2] normalization to the text, except for characters
from the Alphabetic Presentation Forms block[3] (U+FB00–U+FB4F), which
are normalized unconditionally.
The default is to apply NFKC normalization to all characters.
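The difference between the two modes can be sketched with Python's unicodedata module. This only approximates the behavior described above and is not pdf2djvu's implementation; the function name normalize_text and its nfkc parameter are hypothetical, and normalizing character by character in the restricted mode is a simplifying assumption.

```python
import unicodedata


def normalize_text(text: str, nfkc: bool = True) -> str:
    """Apply NFKC to the whole text, or only to U+FB00..U+FB4F characters.

    nfkc=True models the default; nfkc=False models --no-nfkc.
    Hypothetical helper for illustration only.
    """
    if nfkc:
        return unicodedata.normalize("NFKC", text)
    return "".join(
        # Alphabetic Presentation Forms are normalized unconditionally.
        unicodedata.normalize("NFKC", ch) if "\ufb00" <= ch <= "\ufb4f" else ch
        for ch in text
    )


sample = "\ufb01x\u00b2"  # "fi" ligature, letter x, superscript two
print(normalize_text(sample))              # fix2
print(normalize_text(sample, nfkc=False))  # fix² (ligature expanded, superscript kept)
```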
--filter-text=command-line
Filter the text through the command-line. The provided filter must
preserve whitespace, control characters and decimal digits.
This option implies --no-nfkc.
-p, --pages=page-range
Specifies pages to convert.
page-range is a
comma-separated list of sub-ranges. Each sub-range is either a single page
(e.g. 17) or a contiguous range of pages (e.g. 37-42). Duplicate
page numbers are not allowed. Pages are numbered from 1.
The default is to convert all pages.
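The page-range syntax described above can be sketched as a small parser. This is an illustration only, not the parser pdf2djvu uses; the name parse_page_range is hypothetical, and rejecting a reversed sub-range such as 42-37 is an assumption not stated in this manual.

```python
from __future__ import annotations


def parse_page_range(spec: str) -> list[int]:
    """Expand a page-range such as "17,37-42" into a list of page numbers.

    Hypothetical helper for illustration only.
    """
    pages: list[int] = []
    seen: set[int] = set()
    for part in spec.split(","):
        first, dash, last = part.partition("-")
        low = int(first)                  # raises ValueError on bad input
        high = int(last) if dash else low
        if low < 1:
            raise ValueError("pages are numbered from 1")
        if high < low:
            # Assumption: a reversed sub-range is treated as an error.
            raise ValueError(f"invalid sub-range: {part!r}")
        for page in range(low, high + 1):
            if page in seen:
                raise ValueError(f"duplicate page number: {page}")
            seen.add(page)
            pages.append(page)
    return pages


print(parse_page_range("17,37-42"))  # [17, 37, 38, 39, 40, 41, 42]
```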
-j, --jobs=n
Use n threads to perform conversion. The default
is to use one thread.
-j0, --jobs=0
Determine automatically how many threads to use to
perform conversion.
-v, --verbose
Display more informational messages while converting the
file.
-q, --quiet
Don't display informational messages while converting the
file.
--version
Output version information and exit.
-h, --help
Display help and exit.
The following environment variables affect pdf2djvu on
Unix systems:
OMP_*
Details of runtime behavior with respect to parallelism
can be controlled by several environment variables. Please refer to the
OpenMP API specification[4] for details.
TMPDIR
pdf2djvu makes heavy use of temporary files. It
will store them in a directory specified by this variable. The default is
/tmp.
The template language is roughly modeled on the Python string
formatting syntax[5].
A template is a piece of text which contains fields, surrounded by
curly braces {}. Fields are replaced with appropriately formatted values
when the template is evaluated. Moreover, {{ is replaced with a single { and
}} is replaced with a single }.
Each field consists of a variable name, optionally followed by a
shift, optionally followed by a format specification.
The shift is a signed (i.e. starting with a + or - character)
integer.
The format specification consists of a colon, followed by a width
specification.
The width specification is a decimal integer defining the minimum
field width. If not specified, then the field width will be determined by
the content. Preceding the width specification with a zero (0) character
enables zero-padding.
The width specification is optionally followed by an asterisk (*)
character, which increases the minimum field width to the width of the
longest possible content of the variable.
dpage
Page number in the DjVu document.
page, spage
Page number in the PDF document.
label
Page label (logical page number) in the PDF document.
This variable is available only for page titles.
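To make the template rules concrete, here is a minimal sketch of an evaluator for the syntax described above. It is not pdf2djvu's implementation. The function evaluate and its parameters values and max_values are hypothetical, and several details are assumptions: values are right-aligned, the shift applies only to integer variables, the asterisk takes the shift into account, and text that does not form a valid field is left unchanged.

```python
import re

# A token is "{{", "}}", or a field: {name[shift][:[0][width][*]]}
_TOKEN = re.compile(
    r"\{\{|\}\}"
    r"|\{(?P<name>[a-z]+)(?P<shift>[+-]\d+)?"
    r"(?::(?P<zero>0)?(?P<width>\d+)?(?P<star>\*)?)?\}"
)


def evaluate(template: str, values: dict, max_values: dict) -> str:
    """Evaluate a template.

    values maps variable names to their current values; max_values maps
    integer variable names to their largest possible value (needed for *).
    Hypothetical helper for illustration only.
    """

    def replace(match: re.Match) -> str:
        token = match.group(0)
        if token == "{{":
            return "{"
        if token == "}}":
            return "}"
        name = match.group("name")
        value = values[name]
        width = int(match.group("width") or 0)
        if isinstance(value, int):
            shift = int(match.group("shift") or 0)
            value += shift
            if match.group("star"):
                # Widen the field to fit the longest possible content.
                width = max(width, len(str(max_values[name] + shift)))
        text = str(value)
        if match.group("zero"):
            return text.zfill(width)
        return text.rjust(width)  # assumption: right-aligned

    return _TOKEN.sub(replace, template)


# The default page identifier template, for page 7 of a 120-page document.
print(evaluate("p{page:04*}.djvu", {"page": 7}, {"page": 120}))  # p0007.djvu
```

With a document of 12345 pages, the same template would produce p00007.djvu in this sketch, because the asterisk widens the field to five digits.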
Unless the --monochrome option is on, pdf2djvu uses the
following naive layer separation algorithm:
1. For each page, do the following:
   1. Rasterize the page into a pixmap, in the usual manner.
   2. Rasterize the page into another pixmap, omitting the
      following page elements:
      • text,
      • 1 bit-per-pixel raster images,
      • vector elements (except fills of large areas).
   3. Compare both pixmaps, pixel by pixel:
      1. If their colors match, classify the pixel as a part of
         the background layer.
      2. Otherwise, classify the pixel as a part of the
         foreground layer.
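The comparison step of this algorithm can be sketched as follows. This is an illustration only, not pdf2djvu's code: the pixmaps are modeled as plain nested lists of RGB tuples, and the function name separate_layers is hypothetical. Rasterization itself is outside the scope of the sketch.

```python
def separate_layers(full, background_only):
    """Return a mask with True for foreground pixels, False for background.

    full is the page rasterized in the usual manner; background_only is the
    page rasterized without text, 1 bit-per-pixel images and most vector
    elements. Both are lists of rows of (r, g, b) tuples of equal size.
    Hypothetical helper for illustration only.
    """
    return [
        # A pixel whose color differs between the two pixmaps is foreground.
        [pixel != bg_pixel for pixel, bg_pixel in zip(row, bg_row)]
        for row, bg_row in zip(full, background_only)
    ]


WHITE, BLACK = (255, 255, 255), (0, 0, 0)
full = [[BLACK, WHITE], [WHITE, WHITE]]             # one black text pixel
background_only = [[WHITE, WHITE], [WHITE, WHITE]]  # text omitted
print(separate_layers(full, background_only))  # [[True, False], [False, False]]
```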
If you find a bug in pdf2djvu, please report it at the issue
tracker[6] or to the mailing list[7].
1. RFC 3339
   https://www.ietf.org/rfc/rfc3339
2. NFKC
   https://unicode.org/reports/tr15/
3. Alphabetic Presentation Forms block
   https://unicode.org/charts/PDF/UFB00.pdf
4. OpenMP API specification
   https://www.openmp.org/specifications/
5. Python string formatting syntax
   https://docs.python.org/2/library/string.html#format-string-syntax
6. the issue tracker
   https://github.com/jwilk/pdf2djvu/issues
7. the mailing list
   https://groups.io/g/pdf2djvu