DJVUXML(1) DjVuLibre XML Tools DJVUXML(1)
NAME
djvutoxml, djvuxmlparser - DjVuLibre XML Tools.
SYNOPSIS
djvutoxml [options] inputdjvufile [outputxmlfile]
djvuxmlparser [ -o djvufile ] inputxmlfile
DESCRIPTION
The DjVuLibre XML Tools provide for editing the metadata, hyperlinks,
and hidden text associated with DjVu files. Unlike djvused(1), the
DjVuLibre XML Tools rely on XML and can take advantage of XML editors
and verifiers.
DJVUTOXML
Program djvutoxml creates an XML file outputxmlfile containing a
reference to the original DjVu document inputdjvufile as well as tags
describing the metadata, hyperlinks, and hidden text associated with
the DjVu file.
The following options are supported:
--page pagenum Select a page in a multi-page document. Without this option,
djvutoxml outputs the XML corresponding to all pages of the
document.
--with-text Specifies that the HIDDENTEXT element for each page should be
included in the output. If specified without the --with-anno flag, then
the --without-anno flag is implied. If none of the --with-text,
--without-text, --with-anno, or --without-anno flags is specified, then
the --with-text and --with-anno flags are implied.
--without-text Specifies that the HIDDENTEXT element for each page should
not be included in the output. If specified without the --without-anno
flag, then the --with-anno flag is implied.
--with-anno Specifies that the area MAP element for each page should be
included in the output. If specified without the --with-text flag, then
the --without-text flag is implied.
--without-anno Specifies that the area MAP element for each page should
not be included in the output. If specified without the --without-text
flag, then the --with-text flag is implied.
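The implication rules for the four flags can be summarized as a small decision function. The following Python sketch is only a model of the rules as stated above, not the tool's actual code; the function name resolve_output_flags is hypothetical, and rejecting contradictory flags (for example --with-text together with --without-text) is an assumption, since that case is not described here.

```python
from typing import Optional, Set, Tuple


def resolve_output_flags(flags: Set[str]) -> Tuple[bool, bool]:
    """Return (include_text, include_anno) for a set of djvutoxml flags."""

    def explicit(name: str) -> Optional[bool]:
        want = f"--with-{name}" in flags
        skip = f"--without-{name}" in flags
        if want and skip:
            # Assumption: contradictory flags are treated as an error here.
            raise ValueError(f"both --with-{name} and --without-{name} given")
        if want:
            return True
        if skip:
            return False
        return None  # flag family not mentioned on the command line

    text, anno = explicit("text"), explicit("anno")
    if text is None and anno is None:
        return True, True   # no flags at all: both elements are output
    if text is None:
        text = not anno     # e.g. --with-anno alone implies --without-text
    if anno is None:
        anno = not text     # e.g. --with-text alone implies --without-anno
    return text, anno
```

Under this model, passing only --with-text yields hidden text without annotations, while passing no flag yields both.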
DJVUXMLPARSER
Files produced by djvutoxml can then be modified using either a text
editor or an XML editor. Program djvuxmlparser parses the XML file
inputxmlfile in order to modify the metadata of the corresponding
DjVu file.
-o djvufile By default, the target DjVu file is the file referenced by
the OBJECT element of the XML file. This option provides the means to
override the filename specified in the OBJECT element.
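An export, edit, and re-import cycle with the two programs can be scripted. The Python sketch below only assembles the command lines shown in the SYNOPSIS; the helper names and any file names passed to them are hypothetical, and round_trip assumes both programs are installed and on the PATH.

```python
import subprocess
from typing import List, Optional


def djvutoxml_argv(djvu: str, xml: Optional[str] = None,
                   page: Optional[str] = None) -> List[str]:
    """Build: djvutoxml [--page pagenum] inputdjvufile [outputxmlfile]"""
    argv = ["djvutoxml"]
    if page is not None:
        argv += ["--page", str(page)]
    argv.append(djvu)
    if xml is not None:
        argv.append(xml)
    return argv


def djvuxmlparser_argv(xml: str, target: Optional[str] = None) -> List[str]:
    """Build: djvuxmlparser [-o djvufile] inputxmlfile"""
    argv = ["djvuxmlparser"]
    if target is not None:
        argv += ["-o", target]  # override the file named in OBJECT
    argv.append(xml)
    return argv


def round_trip(djvu: str, xml: str, target: Optional[str] = None) -> None:
    """Export to XML, leave room for an edit step, then apply the XML."""
    subprocess.run(djvutoxml_argv(djvu, xml), check=True)
    # ... modify the XML file here with a text or XML editor ...
    subprocess.run(djvuxmlparser_argv(xml, target), check=True)
```

Keeping the argument lists separate from the subprocess calls makes it possible to inspect or log the exact commands before running them.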
DJVUXML DOCUMENT TYPE DEFINITION
The document type definition (DTD) file
/usr/share/djvu/pubtext/DjVuXML-s.dtd defines the input and output of
the DjVu XML tools. The DjVuXML-s DTD is a simplification of the HTML
DTD (http://www.w3c.org/TR/1998/REC-html40-19980424/sgml/dtd.html) with
a few new DjVu-specific attributes added. Each of the specified pages
of a DjVu document is represented as an OBJECT element within the BODY
element of the XML file. Each OBJECT element may contain multiple PARAM
elements to specify attributes such as page name, resolution, and gamma
factor. Each OBJECT element may also contain one HIDDENTEXT element to
specify the hidden text (usually generated with an OCR engine) within
the DjVu page. In addition, each OBJECT element may reference a single
area MAP element, which contains multiple AREA elements to represent
all the hyperlink and highlight areas within the DjVu document.
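The element layout described above can be inspected with any XML parser. The following Python sketch walks a hand-written sample; the sample is an assumption made for illustration, not real djvutoxml output. In particular, the root element name and the attribute names (borrowed from HTML 4) should be checked against the DTD.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified sample (assumption, not real tool output).
SAMPLE = """<DjVuXML>
<HEAD>sample.djvu</HEAD>
<BODY>
<OBJECT data="sample.djvu" usemap="p0001.djvu">
<PARAM name="PAGE" value="p0001.djvu" />
<PARAM name="DPI" value="300" />
<HIDDENTEXT><LINE><WORD>Hello</WORD></LINE></HIDDENTEXT>
</OBJECT>
<MAP name="p0001.djvu"><AREA shape="rect" coords="0,0,10,10" href="#2" /></MAP>
</BODY>
</DjVuXML>"""


def summarize(xml_text: str):
    """Return (per-page summaries, total number of AREA elements)."""
    body = ET.fromstring(xml_text).find("BODY")
    pages = [
        {
            "params": len(obj.findall("PARAM")),
            "has_hidden_text": obj.find("HIDDENTEXT") is not None,
        }
        for obj in body.iter("OBJECT")  # one OBJECT per page
    ]
    areas = sum(len(m.findall("AREA")) for m in body.iter("MAP"))
    return pages, areas


pages, areas = summarize(SAMPLE)
print(pages, areas)
```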
PARAM Elements
Legal PARAM elements of a DjVu OBJECT include, but are not limited to,
PAGE for specifying the page name, GAMMA for specifying the gamma
correction factor (normally 2.2), and DPI for specifying the page
resolution.
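Reading these parameters back from an OBJECT element is straightforward. The Python sketch below assumes that each PARAM carries name and value attributes as in HTML 4; the fragment is hand-written, and falling back to a gamma of 2.2 when GAMMA is absent is an assumption based on the "normally 2.2" remark rather than documented behavior.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment (assumption); attribute names follow HTML 4 PARAM.
OBJECT_XML = """<OBJECT data="sample.djvu">
<PARAM name="PAGE" value="p0001.djvu" />
<PARAM name="GAMMA" value="2.2" />
<PARAM name="DPI" value="300" />
</OBJECT>"""

KNOWN = {"PAGE", "GAMMA", "DPI"}


def page_params(obj: ET.Element) -> dict:
    """Collect the PARAM children of one OBJECT into typed fields."""
    params = {p.get("name"): p.get("value") for p in obj.findall("PARAM")}
    return {
        "page": params.get("PAGE"),
        # Assumption: use 2.2 when no GAMMA parameter is present.
        "gamma": float(params.get("GAMMA", "2.2")),
        "dpi": int(params["DPI"]) if "DPI" in params else None,
        # The list of legal names is open-ended, so keep the rest as-is.
        "other": {k: v for k, v in params.items() if k not in KNOWN},
    }


info = page_params(ET.fromstring(OBJECT_XML))
print(info)
```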
HIDDENTEXT Elements
The HIDDENTEXT element consists of nested PAGECOLUMNS, REGION,
PARAGRAPH, LINE, and WORD elements. The most deeply nested element
specified should give the bounding coordinates of the element in
top-down orientation. The body of the most deeply nested element
should contain the text. Most DjVu documents use either LINE or WORD
as the lowest-level element, but any element is legal as the
lowest-level element. A white space is always added between WORD
elements and a line feed is always added between LINE elements. Since
languages such as Japanese do not use spaces between words, it is
quite common for Asian OCR engines to use WORD elements for individual
characters instead.
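The separator rules (a space between WORD elements, a line feed between LINE elements) can be illustrated by rebuilding plain text from a HIDDENTEXT tree. The Python sketch below uses a hand-written fragment with the coordinate attributes omitted; joining elements above LINE with a line feed as well is an assumption, since only the WORD and LINE separators are specified here.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment (assumption); bounding coordinates are omitted.
HIDDEN = """<HIDDENTEXT><PARAGRAPH>
<LINE><WORD>Hello</WORD><WORD>DjVu</WORD></LINE>
<LINE><WORD>second</WORD><WORD>line</WORD></LINE>
</PARAGRAPH></HIDDENTEXT>"""


def hidden_text(elem: ET.Element) -> str:
    """Rebuild plain text from a HIDDENTEXT subtree."""
    children = list(elem)
    if not children:
        # The most deeply nested element carries the text in its body.
        return (elem.text or "").strip()
    parts = [hidden_text(child) for child in children]
    # A space separates WORD siblings; a line feed separates LINE siblings
    # (and, by assumption, any higher-level siblings).
    sep = " " if all(child.tag == "WORD" for child in children) else "\n"
    return sep.join(part for part in parts if part)


text = hidden_text(ET.fromstring(HIDDEN))
print(text)
```

The same function also handles documents whose lowest-level element is LINE, because a childless element is always treated as a text leaf.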
MAP Elements
The body of a MAP element consists of AREA elements. In addition to
the attributes listed in
http://www.w3.org/TR/1998/REC-html40-19980424/struct/objects.html#edef-AREA,
the attributes bordertype, bordercolor, border, and highlight have
been added to specify border type, border color, border width, and
highlight color, respectively. Legal values for each of these
attributes are listed in the DjVuXML-s DTD. In addition, the shape
oval has been added to the list of legal shapes. An oval uses a
rectangular bounding box.
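Because an oval is described by a rectangular bounding box, its geometry follows from four coordinates. The Python sketch below assumes that the coords attribute of an oval AREA uses the same "left,top,right,bottom" order as an HTML 4 rect and that the oval is the ellipse inscribed in that box; both points are assumptions to verify against the DTD and the viewer's behavior.

```python
from typing import Tuple


def oval_from_coords(coords: str) -> Tuple[float, float, float, float]:
    """Return (center_x, center_y, radius_x, radius_y) of the oval."""
    left, top, right, bottom = (int(v) for v in coords.split(","))
    center_x, center_y = (left + right) / 2, (top + bottom) / 2
    radius_x, radius_y = abs(right - left) / 2, abs(bottom - top) / 2
    return center_x, center_y, radius_x, radius_y


def oval_contains(coords: str, x: float, y: float) -> bool:
    """Test whether point (x, y) lies inside the inscribed ellipse."""
    cx, cy, rx, ry = oval_from_coords(coords)
    if rx == 0 or ry == 0:
        return False  # degenerate box: no interior
    return ((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 <= 1.0


print(oval_from_coords("10,20,110,60"))
print(oval_contains("10,20,110,60", 60, 40))  # center of the box
print(oval_contains("10,20,110,60", 10, 20))  # corner of the box
```

The corner of the bounding box lies outside the ellipse, which is why a hit test for an oval area differs from one for a rect with the same coordinates.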
BUGS
Perhaps it would have been better to use CSS2 style sheets with
standard HTML elements instead of defining the HIDDENTEXT element.
CREDITS
The DjVu XML tools and DTD were written by Bill C. Riemers
<docbill@sourceforge.net> and Fred Crary.
SEE ALSO
djvu(1), djvused(1), and utf8(7).
DjVuLibre XML Tools 11/15/2002 DJVUXML(1)