In this tutorial, you will learn to use the ODFDOM Toolkit API to take an XML file that describes movies and turn it into an OpenDocument text file. Here is a partial screenshot of what the result will look like:
The program is written to be run from the command line; you invoke it with a command like this:
The program starts out by declaring variables that you need to accomplish the
task. First, you need a variable to hold the name of the input file, a
Document for the parsed XML, and an
to allow you to easily access the information in the document.
You will need quite a few variables for the output OpenDocument file. In order to understand these, you need to know how an OpenDocument file is structured. It’s actually a .zip format file that contains a content.xml file that holds the main content (in this case, the text), and a styles.xml file that holds some of the presentation information. (There are other files in the .zip, but they don’t concern us in this tutorial.)
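To make the package structure concrete, here is a minimal sketch that uses only the JDK’s java.util.zip classes. It builds a tiny stand-in package in memory (a real .odt file holds more entries and real XML) and lists the entry names. The class and method names are made up for this illustration and are not part of ODFDOM.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class OdfPackagePeek {
    /** Builds a stand-in package with placeholder content for the two files discussed above. */
    static byte[] buildSamplePackage() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : new String[] {"content.xml", "styles.xml"}) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write("<placeholder/>".getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Returns the entry names found in zip-format data, in stored order. */
    static List<String> listEntries(byte[] zipBytes) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                names.add(e.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(listEntries(buildSamplePackage()));
    }
}
```

Pointing listEntries() at the bytes of a real OpenDocument file should show these two names among the other entries.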
The presentation information in the styles.xml file consists of the named styles such as “Heading 1,” “Default,” and the other style names that appear in the word processor’s drop-down menu.
The content.xml file for a word processing document has
all of the content as a child of an
<office:text> element. The
content.xml file also contains some presentation information:
the automatic styles. These are styles that are automatically
created when you click the bold or italic icons in the word processor.
Here, then, are the variables needed to process the output file.
Notice the naming conventions:
OdfOfficeStyles is the
class that represents the
This having been done, here is the
which creates an application and runs it via the
This has nothing to do with ODF; it’s just standard opening and parsing of an XML file, but it’s here for the sake of completeness:
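As a rough, self-contained sketch of those standard JAXP steps, the following parses a small inline string rather than a file named on the command line. The <movies> and <title> element names, the sample data, and the class and method names are assumptions made for illustration; only <movie> comes from this tutorial.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ParseInputSketch {
    // Hypothetical sample input; the real movie file may be laid out differently.
    static final String SAMPLE =
        "<movies><movie><title>First</title></movie>"
        + "<movie><title>Second</title></movie></movies>";

    /** Parses XML text into a DOM Document. */
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Cannot parse input", e);
        }
    }

    /** Evaluates an XPath expression against the document and returns its string value. */
    static String evaluate(Document doc, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("Bad XPath expression: " + expression, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println(evaluate(doc, "count(//movie)"));
        System.out.println(evaluate(doc, "//movie[2]/title"));
    }
}
```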
The setupOutputDocument() method starts by calling
newTextDocument() to create an ODF text
document from a template that is built into the library. Once you
have the document, the method gets the Document Object Model (a
Document) for the content.xml and
setupOutputDocument() then retrieves the
automatic styles in content.xml and the named styles in
styles.xml (or creates them if they don’t exist yet).
It finishes by retrieving the
from the content DOM. All of the headings and paragraphs that make up the
document’s content will be children of this element.
The templates included in the ODFDOM toolkit have content in them; a
newly-created text document has a paragraph that contains no text. The
cleanOutDocument() method gets rid of this paragraph
by repeatedly removing the first child of the
node until there are no more.
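The remove-the-first-child loop can be sketched with the plain org.w3c.dom interfaces. This is an illustration of the technique on a stand-in element built with the JDK’s own DOM implementation, not the tutorial’s own listing; the class and method names are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class CleanOutSketch {
    static final String OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0";
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    /** Removes every child of the given node, one first child at a time. */
    static void removeAllChildren(Node parent) {
        Node child;
        while ((child = parent.getFirstChild()) != null) {
            parent.removeChild(child);
        }
    }

    /** Builds a stand-in <office:text> element holding one empty paragraph. */
    static Element buildSampleBody() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element body = doc.createElementNS(OFFICE_NS, "office:text");
            doc.appendChild(body);
            body.appendChild(doc.createElementNS(TEXT_NS, "text:p"));
            return body;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element body = buildSampleBody();
        System.out.println("children before: " + body.getChildNodes().getLength());
        removeAllChildren(body);
        System.out.println("children after: " + body.getChildNodes().getLength());
    }
}
```

Re-reading getFirstChild() on every pass avoids iterating over a live NodeList whose length changes as nodes are removed.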
You create a style with an
OdfStyleStyle object. This
object has a
name property and a
property. (For details, see the
Each style belongs to a family. The family tells
what kind of element this style is applied to. Styles for paragraphs
or headings belong to the
Paragraph family; styles
for inline text belong to the
(For all the family names,
Within the style object are the style properties.
These properties come in
property sets, and a style can have properties from more
than one set. In the output
document, the heading that reads “The Cast”
uses properties from the
ParagraphProperties set to specify
its margins. It uses properties from the
TextProperties set to specify that it is italic.
(For all the property set names, see details
We add named styles to the styles.xml file in the
addOfficeStyles() method. It
starts off by retrieving the
default paragraph style and setting it to 10 point.
The italicized line in the preceding code does what we want, but only
for documents that have Western fonts. For documents that might
contain Asian or complex fonts (such as those for Hindi or Arabic),
you would also like to set the
FontSizeComplex. Similarly, when setting
FontWeight (for bold) or
italic), you will probably also want to set the
Since setting font weight, style, and size is a frequent occurrence, and since you really do want your documents to be international-friendly, the italicized line is replaced with this code:
This is a call to one of the following three utility routines to make your life easier:
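A library-free sketch of the effect such a routine has on the resulting XML may help. It assumes the style ends up with a <style:text-properties> child and writes the Western, Asian, and complex font-size attributes defined by the OpenDocument format directly through the DOM. The class and method names are hypothetical, and the code does not use the ODFDOM property API.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class FontSizeSketch {
    static final String STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0";
    static final String FO_NS = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0";

    /** Sets the same font size for Western, Asian, and complex scripts. */
    static void setFontSize(Element textProperties, String size) {
        textProperties.setAttributeNS(FO_NS, "fo:font-size", size);
        textProperties.setAttributeNS(STYLE_NS, "style:font-size-asian", size);
        textProperties.setAttributeNS(STYLE_NS, "style:font-size-complex", size);
    }

    /** Creates a stand-alone <style:text-properties> element to experiment with. */
    static Element newTextProperties() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element props = doc.createElementNS(STYLE_NS, "style:text-properties");
            doc.appendChild(props);
            return props;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element props = newTextProperties();
        setFontSize(props, "10pt");
        System.out.println(props.getAttributeNS(STYLE_NS, "font-size-complex"));
    }
}
```

Routines for weight and style would follow the same three-attribute pattern.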
The addOfficeStyles() method adds several different styles
to the styles.xml DOM. There are separate styles for a
movie heading, cast heading, synopsis paragraph, and an entry (paragraph)
in the cast member list. This last style will also need to specify
a tab stop with dots as a leader to separate the actor’s real name
from the name of the character she portrays.
Finally, the method creates an inline style for
the rating stars; they need to be slightly smaller than the movie title.
I won’t present all of the method here lest your eyes glaze over. Instead, here is the code for setting up the style for paragraphs in the synopsis; the other styles use similar code.
The sequence you should follow is:
When you create the style using
newStyle(), it is
automatically added to the list of styles.
Note that the code that follows
does not set the top and bottom margins, so they default to zero.
The one style that differs from the rest is the cast paragraph style, with its tab stop. The hierarchy of elements in the resulting XML is:
In this instance, you must explicitly create the
OdfParagraphProperties object. You didn’t have to
do this when using
style.setProperty(...), because that method
automatically creates the
<style:paragraph-properties> element for you.
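To show the nesting without depending on ODFDOM, here is a sketch that builds a paragraph style with one dotted-leader tab stop using the plain DOM API. The element and attribute names follow the OpenDocument format as I understand it; the style name, the tab position, and the class and method names are assumptions for illustration only.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class TabStopSketch {
    static final String STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0";

    /** Builds style:style > style:paragraph-properties > style:tab-stops > style:tab-stop. */
    static Element buildCastStyle(String styleName, String tabPosition) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element style = doc.createElementNS(STYLE_NS, "style:style");
        style.setAttributeNS(STYLE_NS, "style:name", styleName);
        style.setAttributeNS(STYLE_NS, "style:family", "paragraph");

        Element paragraphProps = doc.createElementNS(STYLE_NS, "style:paragraph-properties");
        Element tabStops = doc.createElementNS(STYLE_NS, "style:tab-stops");
        Element tabStop = doc.createElementNS(STYLE_NS, "style:tab-stop");
        tabStop.setAttributeNS(STYLE_NS, "style:position", tabPosition);
        tabStop.setAttributeNS(STYLE_NS, "style:leader-style", "dotted");
        tabStop.setAttributeNS(STYLE_NS, "style:leader-text", ".");

        // Assemble the hierarchy from the outside in.
        doc.appendChild(style);
        style.appendChild(paragraphProps);
        paragraphProps.appendChild(tabStops);
        tabStops.appendChild(tabStop);
        return style;
    }

    public static void main(String[] args) {
        Element style = buildCastStyle("Cast_Para", "4.5in"); // hypothetical name and position
        Element tabStop = (Element) style.getFirstChild().getFirstChild().getFirstChild();
        System.out.println(tabStop.getNodeName());
    }
}
```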
Setting up all your styles is the most tedious part of the process of creating an OpenDocument file; adding content is relatively simple.
Processing the input consists of grabbing all the
<movie> elements, extracting the relevant sub-elements,
and adding the appropriate ODFDOM objects to the output document. The
following method has a
catch block to catch
exceptions thrown by the subsidiary methods.
The processTitle() method adds the movie’s title and
star rating; the stars are in an
OdfSpan object, since their
style requires a smaller font size. Instead of using
getElementsByTagName(), this method uses XPath to extract
The general sequence for adding content to the output document is to
create the appropriate ODF object and use
addStyledContent() to add the content (the second parameter
to the method) with the style specified as the first parameter.
Similar code adds the synopsis for each movie; the method needs a loop
to handle all the
<para> elements in the
<synopsis> element. The paragraphs are retrieved
with an XPath expression that returns a
Rather than create a separate
variable for the
<para>’s text content,
processSynopsis() sets the style on the paragraph when
it creates it. The method also extracts the
text all at once with the expression
shown in the italicized code.
And this is just more of the same. After adding a heading
for the cast, an XPath expression retrieves all the
<actor> nodes. Then a
for loop processes
each one, again using XPath to get the actor’s name and role.
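The node-set-then-loop pattern can be sketched with the JDK’s javax.xml.xpath package alone. The <cast>, <name>, and <role> element names, the sample data, and the class and method names are assumptions made for illustration; only <movie> and <actor> are taken from this tutorial. The sketch collects plain strings rather than adding ODF paragraphs.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CastXPathSketch {
    // Hypothetical input: the second actor has no role listed.
    static final String SAMPLE =
        "<movie><cast>"
        + "<actor><name>A. Person</name><role>Hero</role></actor>"
        + "<actor><name>B. Person</name></actor>"
        + "</cast></movie>";

    /** Returns one description per <actor>, adding the role only when it is present. */
    static List<String> castLines(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList actors = (NodeList) xpath.evaluate("//actor", doc, XPathConstants.NODESET);
            List<String> lines = new ArrayList<>();
            for (int i = 0; i < actors.getLength(); i++) {
                Node actor = actors.item(i);
                // Relative expressions are evaluated with the actor node as context.
                String name = xpath.evaluate("name", actor);
                String role = xpath.evaluate("role", actor); // empty string if absent
                lines.add(role.isEmpty() ? name : name + " as " + role);
            }
            return lines;
        } catch (Exception e) {
            throw new IllegalStateException("Cannot process cast", e);
        }
    }

    public static void main(String[] args) {
        castLines(SAMPLE).forEach(System.out::println);
    }
}
```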
If the actor’s role is mentioned, then
must add a tab to separate the name and role.
You can’t just
put a \t character into the output; ODF treats tabs and
newlines as if they were just a blank (it “normalizes” them).
The addContentWhitespace() method, used in the
following code, will output a
<text:tab> or <text:line-break> element when
it encounters a tab or newline in the content.
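In ODF markup a tab is written as a <text:tab> element and a line break as a <text:line-break> element. The following is a simplified plain-DOM sketch of that kind of conversion, offered only to show the idea; it is not the library’s implementation, it ignores runs of spaces, and the class and method names are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class WhitespaceSketch {
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    /** Appends content to parent, turning each tab or newline into an ODF element. */
    static void appendWithWhitespace(Element parent, String content) {
        Document doc = parent.getOwnerDocument();
        StringBuilder run = new StringBuilder();
        for (char c : content.toCharArray()) {
            if (c == '\t' || c == '\n') {
                // Flush the ordinary text collected so far.
                if (run.length() > 0) {
                    parent.appendChild(doc.createTextNode(run.toString()));
                    run.setLength(0);
                }
                String name = (c == '\t') ? "text:tab" : "text:line-break";
                parent.appendChild(doc.createElementNS(TEXT_NS, name));
            } else {
                run.append(c);
            }
        }
        if (run.length() > 0) {
            parent.appendChild(doc.createTextNode(run.toString()));
        }
    }

    /** Creates an empty stand-in <text:p> element. */
    static Element newParagraph() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element p = doc.createElementNS(TEXT_NS, "text:p");
            doc.appendChild(p);
            return p;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element p = newParagraph();
        appendWithWhitespace(p, "Actor Name\tCharacter");
        System.out.println(p.getChildNodes().item(1).getNodeName());
    }
}
```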
This is the easiest part of the program: only one line of actual code, surrounded by error handling.
You may download the src directory for this program. This directory comes from a NetBeans project.