djvused - Multi-purpose DjVu document editor.
djvused [options] djvufile
Program djvused is a powerful command line tool for
manipulating multi-page documents, creating or editing annotation chunks,
creating or editing hidden text layers, pre-computing thumbnail images, and
more. The program first reads the DjVu document djvufile and executes
a number of djvused commands.
Djvused commands can be read from a specific file (when option
-f is specified), read from the command line (when option -e
is specified), or read from the standard input (the default).
- -v
- Cause djvused to print a command line prompt before reading
commands and a brief message describing how each command was executed.
This option is very useful for debugging djvused scripts and also for
interactively entering djvused commands on the standard input.
- -f scriptfile
- Cause djvused to read commands from file scriptfile.
- -e command
- Cause djvused to execute the commands specified by the option
argument command. It is advisable to surround the djvused commands
with single quotes in order to prevent unwanted shell expansion.
- -s
- Cause djvused to save the file djvufile after executing the
specified commands. This is similar to executing command save
immediately before terminating the program.
- -u
- Cause djvused to print hidden text and annotations as UTF-8 instead
of encoding non-ASCII characters with octal escape sequences for maximal
portability. This option is convenient for manually editing or viewing the
djvused output. This option also causes the emission of a UTF-8 BOM under
Windows.
- -n
- Cause djvused to disregard save commands. This is useful for
debugging djvused scripts without overwriting files on your disk.
There are many ways to use program djvused. The following
examples illustrate some common uses of this program.
Command size outputs the width and height of the selected
pages using an HTML-friendly syntax. For instance, the
following command prints the size of page 3 of document
myfile.djvu.
-
- djvused myfile.djvu -e 'select 3; size'
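The output of command size is easy to post-process in a shell script. The following sketch assumes, without confirmation from this man page, that each printed line has the form width=W height=H. The sample line and the variable names are made up for illustration.

```shell
# Sample line standing in for the output of:
#   djvused myfile.djvu -e 'select 3; size'
# The width=... height=... format is an assumption, not taken from this page.
line='width=2550 height=3300'

width= height=
for field in $line; do            # split the line on blanks
  case "$field" in
    width=*)  width=${field#width=} ;;
    height=*) height=${field#height=} ;;
  esac
done
echo "${width}x${height} pixels"
```

Fields that are not recognized, such as an orientation field, are simply skipped by the case statement.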
Command print-pure-txt outputs the text associated with a
page or a document. For instance, the following shell command outputs the
text for the entire document. Lines and pages are delimited by the usual
control characters.
-
- djvused myfile.djvu -e 'print-pure-txt'
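Since pages are delimited by form feed characters, standard text tools can locate the page boundaries in the captured text. The following sketch works on a made-up sample file instead of real djvused output. The file name sample.txt is arbitrary.

```shell
# Made-up stand-in for the output of:
#   djvused myfile.djvu -e 'print-pure-txt'
printf 'Page one, line 1\nPage one, line 2\n\fPage two, line 1\n' > sample.txt

# Count the form feed characters, that is, the page delimiters in the text.
breaks=$(tr -cd '\f' < sample.txt | wc -c | tr -d ' ')
echo "form feeds: $breaks"
```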
Command print-txt produces a more extensive output
describing the structure and the location of the text components. The syntax
of this output is described later in this man page. For instance, the
following shell command outputs extended text information for page 3
of document myfile.djvu.
-
- djvused myfile.djvu -e 'select 3; print-txt'
Annotation data can be extracted using command print-ant.
The syntax of the annotation data is described later in this man page. For
instance, the following shell command outputs the annotation data for the
first page of document myfile.djvu.
-
- djvused myfile.djvu -e 'select 1; print-ant'
Command print-ant only prints the annotations stored in the
selected component file. Command print-merged-ant also retrieves
annotations from all the component files referenced by the current page
(using INCL chunks) and prints the merged information.
Three commands, output-txt, output-ant, and
output-all, produce djvused scripts. For instance, the following
shell command produces a djvused script, myfile.dsed, that recreates
all the text and annotation data in document myfile.djvu.
-
- djvused myfile.djvu -e 'output-all' > myfile.dsed
Script myfile.dsed is a text file that can be easily
edited. The following shell command then recreates the text and annotation
information in file myfile.djvu.
-
- djvused myfile.djvu -f myfile.dsed -s
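Because the script is plain text, the editing step between these two commands can be automated. The following sketch corrects a recurring OCR error with sed. The script contents are a made-up stand-in for what output-all might print, and the file names sample.dsed and fixed.dsed are arbitrary.

```shell
# Illustrative stand-in for a script produced by output-all (contents made up).
cat > sample.dsed <<'EOF'
select 1
set-txt
(page 0 0 2550 3300
 (line 300 3000 900 3060
  (word 300 3000 420 3060 "teh")
  (word 460 3000 900 3060 "document")))
.
EOF

# Correct a recurring OCR error in the word strings, writing a new script.
sed 's/"teh")/"the")/' sample.dsed > fixed.dsed

# The corrected script could then be applied with:
#   djvused myfile.djvu -f fixed.dsed -s
```

Matching the closing parenthesis together with the quoted string restricts the substitution to complete text strings and leaves coordinates untouched.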
Both commands save-page and save-page-with create a
DjVu file representing the selected component file of a document. The
following shell command, for instance, creates a file p05.djvu
containing page 5 of document myfile.djvu.
-
- djvused myfile.djvu -e 'select 5; save-page p05.djvu'
Each page of a document might import data from another component
file using the so-called inclusion ( INCL ) chunks. Command
save-page then produces a file with unresolved references to imported
data. Such a file should then be made part of a multi-page document
containing the required data in other component files. On the other hand,
command save-page-with copies all the imported data into the output
file. This file is directly usable. Yet collecting several such files into a
multi-page document might lead to useless data replication.
Command set-thumbnails constructs thumbnails that can be
later displayed by DjVu viewers. The following shell command, for instance,
computes thumbnails of size 64x64 pixels for all pages of file
myfile.djvu.
-
- djvused myfile.djvu -e 'set-thumbnails 64' -s
Command lines might contain zero, one, or more djvused commands
and an optional comment. Multiple djvused commands must be separated by a
semicolon character ';'. Comments are introduced by the '#' character and
extend until the end of the command line.
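As an illustration of these rules, the following sketch writes a small script file that mixes a comment line, several commands on one line, and a trailing comment. The file name check.dsed is arbitrary.

```shell
# Write a small djvused script; the file name check.dsed is arbitrary.
cat > check.dsed <<'EOF'
# Report the page count, then the size of page 1.
n; select 1; size    # three commands on one line
ls                   # a single command on its own line
EOF

# Try the script without modifying the document (-n disregards save
# commands, -v reports how each command was executed):
#   djvused myfile.djvu -f check.dsed -n -v
```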
Multi-page DjVu documents are composed of a number of component
files. Most component files describe a specific page of a document. Some
component files contain information shared by several pages such as shared
image data, shared annotations or thumbnails. Many djvused commands operate
on selected component files. All component files are initially selected. The
following commands are useful for changing the selection.
- n
- Print the total number of pages in the document.
- ls
- List all component files in the document. Each line contains an optional
page number, a letter describing the component file type, the size of the
component file, and the identifier of the component file. Component file type
letters P, I, A, and T respectively stand for
page data, shared image data, shared annotation data, and thumbnail data.
Page numbers are only listed for component files containing page data.
When it is set, the optional page title (see command set-page-title
below) is displayed after the component file identifier.
- select
[fileid]
- Select the component file identified by argument fileid. Argument
fileid must be either a page number or a component file identifier.
The select command selects all component files when the argument
fileid is omitted.
- select-shared-ant
- Select a component file containing shared annotations. Only one such
component file is supported by the current DjVu software. This component
file usually contains annotations pertaining to the whole document as
opposed to specific pages. An error message is displayed if there is no
such component file.
- create-shared-ant
- Create and select a component file containing shared annotations. This
command only selects the shared annotation component file if such a
component file already exists. Otherwise it creates a new shared
annotation component file and makes sure that it is imported by all pages
in the document.
- showsel
- Shows the currently selected component files with the same format as
command ls.
- print-pure-txt
- Print the text stored in the hidden text layer of the selected pages. A
similar capability is offered by program djvutxt. Structural
information is sometimes represented by control characters. Text from
different pages is delimited by form feed characters ("\f").
Lines are delimited by newline characters ("\n"). Columns,
regions, and paragraphs are sometimes delimited by vertical tab
("\013"), group separators ("\035") and unit
separators ("\037") respectively.
- print-txt
- Prints extensive hidden text information for the selected pages. This
information describes the structure of the text on the document page and
locates the structural elements in the page image. The syntax of this
output is described later in this man page.
- remove-txt
- Remove the hidden text information from the selected component files. For
instance, executing commands select and remove-txt removes
all hidden text information from the DjVu document.
- set-txt
[djvusedtxtfile]
- Insert hidden text information into the selected pages. The optional
argument djvusedtxtfile names a file containing the hidden text
information. This file must contain data similar to what is produced by
command print-txt. When the optional argument is omitted, the
program reads the hidden text information from the djvused script until
reaching an end-of-file or a line containing a single period.
- output-txt
- Prints a djvused script that reconstructs the hidden text information for
the selected pages. This script can later be edited and executed by
invoking program djvused with option -f.
- print-ant
- Prints the annotations of the selected component file. The annotation data
is represented using a simple syntax described later in this
document.
- print-merged-ant
- Merge the annotations stored in the selected component files with the
annotations imported from other component files such as the shared
annotation component file. The annotation data is represented using a
simple syntax described later in this document.
- remove-ant
- Remove the annotation information from the selected component files. For
instance, executing commands select and remove-ant removes
all annotation information from the DjVu document.
- set-ant
[djvusedantfile]
- Insert annotations into the selected component file. The optional argument
djvusedantfile names a file containing the annotation data. This
file must contain data similar to what is produced by command
print-ant. When the optional argument is omitted, the program reads
the annotation data from the djvused script itself until reaching an
end-of-file or a line containing a single period.
- output-ant
- Print a djvused script that reconstructs the annotation information for
the selected pages. This script can later be edited and executed by
invoking program djvused with option -f.
- print-meta
- Print the metadata part of the annotations for the selected component
file. This command displays a subset of the information printed by command
print-ant using a different syntax. Metadata are organized as
key-value pairs. Each printed line contains the key name such as
author, title, etc., followed by a tab character
("\t") and a double-quoted string representing the
UTF-8 encoded metadata value.
- remove-meta
- Remove the metadata part of the annotations of the selected component
files.
- set-meta
[djvusedmetafile]
- Set the metadata part of the annotations of the selected component file.
The remaining part of the annotations is left unchanged. The optional
argument djvusedmetafile names a file containing the metadata. This
file must contain data similar to what is produced by command
print-meta. When the optional argument is omitted, the program
reads the metadata from the djvused script itself until reaching an
end-of-file or a line containing a single period.
- print-xmp
- Print the XMP metadata string contained in the annotation chunk of the
selected component file. This command displays in fact a subset of the
information printed by command print-ant.
- remove-xmp
- Removes the XMP tag from the annotation chunk of the selected component
file.
- set-xmp
[xmpfile]
- Set the XMP metadata part of the annotations of the selected component
file. The remaining part of the annotations is left unchanged. The
optional argument xmpfile names a file containing the XMP metadata
in a format similar to that produced by command print-xmp. When the
optional argument is omitted, the program reads the XMP annotation data
from the djvused script itself until reaching an end-of-file or a line
containing a single period.
- output-all
- Print a djvused script that reconstructs both the hidden text and the
annotation information for the selected pages. This script can later be
edited and executed by invoking program djvused with option
-f.
- print-outline
- Print the outline of the document. Nothing is printed if the document
contains no outline.
- remove-outline
- Removes the outline from the document.
- set-outline
[djvusedoutlinefile]
- Insert outline information into the document. The optional argument
djvusedoutlinefile names a file containing the outline information.
This file must contain data similar to what is produced by command
print-outline. When the optional argument is omitted, the program
reads the outline information from the djvused script until reaching
an end-of-file or a line containing a single period.
- set-thumbnails
sz
- Compute thumbnails of size szxsz pixels and insert them into
the document. DjVu viewers can later display these thumbnails very
efficiently without need to download the data for each page. Typical
thumbnail sizes range from 48 to 128 pixels.
- remove-thumbnails
- Remove the pre-computed thumbnails from the DjVu document. New thumbnails
can then be computed using command set-thumbnails.
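Several of the above commands read their data from the script itself when the file argument is omitted. The following sketch builds such a script for command set-meta, using the key, tab, quoted value layout described for print-meta. The file name meta.dsed and the metadata values are made up for illustration.

```shell
# Build a djvused script that sets two metadata entries on page 1.
# Each data line is: key, tab, double-quoted value (as printed by print-meta).
{
  printf 'select 1\n'
  printf 'set-meta\n'
  printf 'author\t"%s"\n' 'Jane Doe'
  printf 'title\t"%s"\n'  'Sample document'
  printf '.\n'                      # a single period ends the inline data
} > meta.dsed

# Apply the script and save the document with:
#   djvused myfile.djvu -f meta.dsed -s
```

Using printf with an explicit \t avoids relying on an editor to preserve literal tab characters.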
The above commands only modify the memory image of the DjVu
document. The following commands provide means to save the modified data
into the file system.
- save
- Save the modified DjVu document back into the input file djvufile
specified by the arguments of the program djvused. Nothing is done
if the DjVu file was not modified. Passing option -s to program
djvused is equivalent to executing command save before
exiting the program.
- save-bundled
filename
- Save the current DjVu document as a bundled multi-page DjVu document named
filename. A similar capability is offered by program
djvmcvt.
- save-indirect
filename
- Save the current DjVu document as an indirect multi-page DjVu document.
The index file of the indirect document will be named filename. All
other files composing the indirect document will be saved into the same
directory as the index file. A similar capability is offered by program
djvmcvt.
- save-page
filename
- Save the selected component file into DjVu file filename. The
selected component file might import data from another component file
using the so-called inclusion ( INCL ) chunks. This command
then produces a file with unresolved references to imported data. Such a
file should then be made part of a multi-page document containing the
required data in other component files.
- save-page-with
filename
- Save the selected component file into DjVu file filename. All data
imported from other component files is copied into the output file as
well. This command always produces a usable DjVu file. On the other hand,
collecting several such files into a multi-page document might lead to
useless data replication.
- help
- Display a help message listing all commands supported by
djvused.
- dump
- Display the EA IFF 85 structure of the document or of the
selected component file. A similar capability is offered by program
djvudump.
- size
- Display the width and the height of the selected pages. The dimensions of
each page are displayed using a syntax suitable for direct insertion into
the <EMBED...></EMBED> tags. This command also
displays the default page orientation when it is different from zero.
- set-rotation
[+-]rot
- Changes the default orientation of the selected pages. The orientation is
expressed as an integer in range 0..3 representing a number of 90 degree
counter-clockwise rotations. When the argument is preceded by a sign
+ or -, argument rot counts how many additional 90
degree counter-clockwise rotations should be applied to the page.
Otherwise, argument rot represents the desired absolute page
orientation. Only DjVu pages can be rotated. Pages represented as a raw
IW44 image cannot be rotated.
- set-dpi
dpi
- Sets the resolution of the page image in dots per inch. Argument
dpi should be in range 25..6000.
- set-page-title
title
- Sets a page title for the selected page. When page titles are available,
recent versions of the DjVuLibre viewers display these page titles instead
of page numbers and also accept them in page selection options. Command
ls can be used to see both the page titles and page identifiers. To
unset a page title, simply make it equal to the page identifier.
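The relation between the relative and absolute arguments of command set-rotation can be checked with a little arithmetic. The helper below is a hypothetical illustration and not part of djvused. It assumes that relative rotations wrap around modulo 4, which is suggested by the range 0..3 but not stated explicitly in this man page.

```shell
# Hypothetical helper: orientation reached from a current orientation (0..3)
# after applying a signed number of additional 90 degree counter-clockwise
# rotations, assuming the result wraps around modulo 4.
rotated() {
  echo $(( ( ($1 + $2) % 4 + 4 ) % 4 ))
}

rotated 3 +1    # models 'set-rotation +1' on a page with orientation 3
rotated 0 -1    # models 'set-rotation -1' on a page with orientation 0
```

The extra "+ 4" keeps the result non-negative, because the shell remainder operator can return negative values.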
Djvused uses a simple parenthesized syntax to represent both
annotations and hidden text.
- This syntax is the native syntax used by DjVu for storing annotations.
Program djvused simply compresses the annotation data using the
bzz(1) algorithm.
- This syntax differs from the native syntax used by DjVu for storing the
hidden text. Program djvused performs the translations between the
compact binary representation used by DjVu and the easily modifiable
parenthesized syntax.
Djvused files are ASCII text files. The legal
characters in djvused files are the printable ASCII
characters and the space, tab, cr, and nl characters. Using other characters
has undefined results.
Djvused files are composed of a sequence of expressions separated
by blank characters (space, tab, cr, or nl). There are four kinds of
expressions, namely integers, symbols, strings and lists.
- Integers:
- Integer numbers are represented by one or more digits, with the usual
interpretation.
- Symbols:
- Symbols, or identifiers, are sequences of printable ASCII characters
representing a name or a keyword. Acceptable characters are the
alpha-numeric characters, the underscore "_", the minus
character "-", and the hash character "#". Names
should not begin with a digit or a minus character.
- Strings:
- Strings denote an arbitrary sequence of bytes, usually interpreted as a
sequence of UTF-8 encoded characters. Strings in djvused
files are similar to strings in the C language. They are surrounded by
double quote characters. Certain sequences of characters starting with a
backslash ("\") have a special meaning. A backslash followed by
"a", "b", "t", "n", "v", "f", "r", "\", or a double quote character
stands for the ASCII character BEL(007), BS(010), HT(011), LF(012),
VT(013), FF(014), CR(015), BACKSLASH(134), or DOUBLEQUOTE(042)
respectively, where the codes are given in octal. A backslash
followed by one to three digits stands for the byte whose octal code is
expressed by the digits. All other backslash sequences are illegal. All
non-printable ASCII characters must be escaped.
- Lists:
- Lists are sequences of expressions separated by blanks and surrounded by
parentheses. All expressions types are acceptable within a list, including
sub-lists.
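When djvused files are generated by a script, text values must be turned into valid strings first. The helper below is a minimal sketch with a hypothetical name. It only escapes backslashes and double quotes, and does not produce the octal escapes required for non-printable or non-ASCII bytes.

```shell
# Hypothetical helper: wrap a text in double quotes, escaping backslashes and
# double quotes. Non-printable and non-ASCII bytes are NOT handled here; they
# would additionally need octal escapes such as \013.
dsed_quote() {
  printf '"%s"\n' "$(printf '%s' "$1" | sed 's/[\\"]/\\&/g')"
}

dsed_quote 'He said "hi"'
dsed_quote 'C:\temp'
```

Escaping the backslash matters in the second call, since an unescaped "\t" would otherwise be read as a tab character.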
The building blocks of the hidden text syntax are lists
representing each structural component of the hidden text. Structural
components have the following form:
-
- (type xmin ymin xmax ymax ... )
The symbol type must be one of page, column,
region, para, line, word, or char, listed
here by decreasing order of importance. The integers xmin,
ymin, xmax, and ymax represent the coordinates of a
rectangle indicating the position of the structural component in the page.
Coordinates are measured in pixels and have their origin at the bottom left
corner of the page. The remaining expressions in the list are either a single
string representing the encoded text associated with this structural
component, or a sequence of structural components with a lesser type.
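Many tools report bounding boxes with the origin at the top left corner of the page. The helper below is a hypothetical sketch that converts such a box, given as x, top y, width, and height, into the xmin ymin xmax ymax form with a bottom left origin. Whether xmax and ymax should be treated as inclusive or exclusive is not specified here, so the sketch simply adds the width and the height.

```shell
# Hypothetical helper: convert a top-left-origin box (x, y_top, w, h) on a
# page of height page_h into bottom-left-origin "xmin ymin xmax ymax".
to_djvu_box() {
  x=$1 y_top=$2 w=$3 h=$4 page_h=$5
  echo "$x $((page_h - y_top - h)) $((x + w)) $((page_h - y_top))"
}

# A 600x60 box located 240 pixels below the top of a 3300 pixel high page.
to_djvu_box 300 240 600 60 3300
```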
The hidden text for each page is simply represented by a single
structural element of type page. Various levels of structural
information are acceptable. For instance, the page level component might
only specify a page level string, or might only provide a list of lines, or
might provide a full hierarchy down to the individual characters.
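The following sketch writes a script that sets hidden text at two different levels of detail: a single page level string for page 1, and a mix of line and word components for page 2. The coordinates, the text, and the file name text.dsed are made up for illustration.

```shell
# Write a djvused script defining hidden text for two pages.
cat > text.dsed <<'EOF'
select 1
set-txt
(page 0 0 2550 3300 "Title page\n")
.
select 2
set-txt
(page 0 0 2550 3300
 (line 300 3000 900 3060 "First line")
 (line 300 2900 1200 2960
  (word 300 2900 700 2960 "Second")
  (word 740 2900 1200 2960 "line")))
.
EOF

# Apply the script and save the document with:
#   djvused myfile.djvu -f text.dsed -s
```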
The outline syntax is a single list of the form
-
- (bookmarks ...)
The first element of the list is symbol bookmarks. The
subsequent elements are lists representing the toplevel outline entries.
Each outline entry is represented by a list with the following form:
-
- (title url ... )
The string title is the title of the outline entry. The
destination string url can be either an arbitrary percent encoded
URL, or composed of the hash character ("#")
followed by a page name or number, or composed of the question mark
character ("?") followed by cgi-style arguments interpreted by the
djvu viewer. The remaining expressions in the list describe subentries of
this outline entry.
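The following sketch writes a script that installs a two level outline using page number destinations. The titles, the page numbers, and the file name outline.dsed are made up for illustration.

```shell
# Write a djvused script defining an outline with one nested entry.
cat > outline.dsed <<'EOF'
set-outline
(bookmarks
 ("Chapter 1" "#1"
  ("Section 1.1" "#3"))
 ("Chapter 2" "#10"))
.
EOF

# Apply the script and save the document with:
#   djvused myfile.djvu -f outline.dsed -s
```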
Annotations are represented by a sequence of annotation
expressions. The following annotation expressions are recognized:
- (background color)
- Specify the color of the viewer area surrounding the DjVu image. Colors
are represented with the X11 hexadecimal syntax #RRGGBB. For
instance, #000000 is black and #FFFFFF is white.
- (zoom zoomvalue)
- Specify the initial zoom factor of the image. Argument zoomvalue
can be one of stretch, one2one, width, page,
or composed of the letter d followed by a number in range 1 to 999
representing a zoom factor (such as in d300 or d150 for
instance.)
- (mode modevalue)
- Specify the initial display mode of the image. Argument modevalue
is one of color, bw, fore, or back.
- (align horzalign vertalign)
- Specify how the image should be aligned on the viewer surface. By default
the image is located in the center. Argument horzalign can be one
of left, center, or right. Argument vertalign
can be one of top, center, or bottom.
- (maparea url comment area
...)
- Define a hyper-link for the specified destination.
Argument url can have one of the following forms:
-
- href
(url href target)
where href is a string representing the destination and
target is a string representing the target frame for the hyper-link,
as defined by the HTML anchor tag <A>.
The destination string href can be either an arbitrary percent
encoded URL, or composed of the hash character
("#") followed by a page name or number, or composed of the
question mark character ("?") followed by cgi-style arguments
interpreted by the djvu viewer. Page numbers may be prefixed with an
optional sign to represent a page displacement. For instance the strings
"#-1" and "#+1" can be used to access the
previous page and the next page.
Argument comment is a string that might be displayed by the
viewer when the user moves the mouse over the hyper-link.
Argument area defines the shape and the location of the
hyperlink. The following forms are recognized:
-
- (rect xmin ymin width height)
(oval xmin ymin width height)
(poly x0 y0 x1 y1 ... )
(text xmin ymin width height)
(line x0 y0 x1 y1)
All parameters are numbers representing coordinates. Coordinates
are measured in pixels and have their origin at the bottom left corner of
the page.
The remaining expressions in the maparea list represent the
visual effect associated with the hyper-link.
A first set of options defines how borders are drawn for
rect, oval, poly, or text hyperlink
areas.
-
- (none)
(xor)
(border color)
(shadow_in [thickness])
(shadow_out [thickness])
(shadow_ein [thickness])
(shadow_eout [thickness])
where parameter color has syntax #RRGGBB as
described above, and parameter thickness is an integer in range 1 to 32. The
last four border options are only supported for rect hyperlink areas.
Although the border mode defaults to (xor), it is wise to always
specify the border mode. Border options do not apply to line
areas.
When a border option is specified, the border becomes visible when
the user moves the mouse over the hyperlink. The border may be made always
visible by using the following option:
-
- (border_avis)
The following two options may be used with rect hyperlink
areas. The complete area will be highlighted using the specified color at
the specified opacity (0-100, default 50). Some viewers (e.g.,
djview4) support opacities in range 0-200 with 200 representing a
fully opaque color.
-
- (hilite color)
(opacity op)
This is often used with an empty URL for simply
emphasizing a specific segment of an image.
The following three options may be used with line areas to specify
an optional ending arrow, the line width and color. The default is a black
line with width 1 and without arrow.
-
- (arrow)
(width w)
(lineclr color)
Finally the following three options can be used with text areas.
The default background color is transparent. The default text color is
black. The pushpin option indicates that the text is symbolized by a
small pushpin icon. Clicking the icon reveals the text.
-
- (backclr bkcolor)
(textclr txtcolor)
(pushpin)
- (metadata ... (key value) ... )
- Define metadata entries. Each entry is identified by a symbol key
representing the nature of the meta data entry. The string value
represents the value associated with the corresponding key. Two sets of
keys are noteworthy: keys borrowed from the BibTeX bibliography system,
and keys borrowed from the PDF DocInfo metadata. BibTeX keys are always
expressed in lowercase, such as year, booktitle,
editor, author, etc. DocInfo keys start with an uppercase
letter, such as Title, Author, Subject,
Creator, Producer, Trapped, CreationDate, and
ModDate. The values associated with the last two keys should be
dates expressed according to RFC 3339.
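The following sketch combines several of these annotation expressions into a script for command set-ant. It defines viewer settings, a hyper-link to the next page with an always visible border, a highlighted area with an empty URL, and two metadata entries. All coordinates, colors, and values, as well as the file name ant.dsed, are made up for illustration.

```shell
# Write a djvused script defining annotations for page 1.
cat > ant.dsed <<'EOF'
select 1
set-ant
(background #FFFFFF)
(zoom width)
(mode color)
(maparea "#+1" "Go to the next page"
  (rect 100 100 400 80) (border #0000FF) (border_avis))
(maparea "" "Highlighted passage"
  (rect 300 2900 900 160) (none) (hilite #FFFF00) (opacity 40))
(metadata (title "Sample document") (year "2024"))
.
EOF

# Check the script first, then apply it and save the document:
#   djvused myfile.djvu -f ant.dsed -n -v
#   djvused myfile.djvu -f ant.dsed -s
```

Both maparea expressions state their border mode explicitly, following the advice given above.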
The current version of program djvused only supports
selecting one component file or all component files. There is no way to
select only a few component files.
This program was initially written by Léon Bottou
<leonb@users.sourceforge.net> and was improved by Yann Le Cun
<profshadoko@users.sourceforge.net>, Florin Nicsa, Bill Riemers
<docbill@sourceforge.net> and many others.