What is WordMLToFO Stylesheet?
WordML is formally called "Wordprocessor Markup Language". Until now,
the main native file formats of Microsoft Word have been the binary
format (.doc extension) and Rich Text Format (.rtf extension).
WordML is an XML file format that is described as fully compatible with
these native file formats. In addition, WordML has the following
features:
- Unlike the binary format, the specification of WordML is published
- You can embed user schema tags in WordML, which is impossible in RTF or the binary format
- XML applications can easily access WordML files using XSLT
You can build XML applications based on WordML features:
- By applying an XSLT stylesheet to user XML documents, you can easily
generate WordML files. You can view or print the generated WordML files
using Word 2003
- By applying an XSLT stylesheet to WordML files, you can generate XML files.
This is very useful for converting Word documents stored in a corporate
information system to XML documents, with WordML as the intermediate
format
- The Office 2003 XML Reference Schemas include a document titled "Overview
of WordprocessingML" that briefly explains the WordML structure with examples.
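The first use case above, generating WordML from a user XML document, can be sketched as follows. This is a minimal sketch rather than an XSLT stylesheet: Python's standard library has no XSLT processor, so the tree is built directly with ElementTree. The input vocabulary (memo, para) and the function name user_xml_to_wordml are hypothetical; the namespace URI and the mso-application processing instruction are the ones used by Word 2003 WordML files.

```python
import xml.etree.ElementTree as ET

# Namespace of Word 2003 WordML documents.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
ET.register_namespace("w", W_NS)


def w(tag: str) -> str:
    """Return the namespace-qualified name of a WordML element."""
    return f"{{{W_NS}}}{tag}"


def user_xml_to_wordml(user_xml: str) -> str:
    """Turn each <para> of a hypothetical user document into a w:p paragraph."""
    source = ET.fromstring(user_xml)
    doc = ET.Element(w("wordDocument"))
    body = ET.SubElement(doc, w("body"))
    for para in source.iter("para"):
        p = ET.SubElement(body, w("p"))   # paragraph
        r = ET.SubElement(p, w("r"))      # text run
        t = ET.SubElement(r, w("t"))      # text
        t.text = para.text or ""
    xml = ET.tostring(doc, encoding="unicode")
    # The processing instruction lets Windows open the file with Word.
    return (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
        '<?mso-application progid="Word.Document"?>\n' + xml
    )


if __name__ == "__main__":
    print(user_xml_to_wordml("<memo><para>Hello</para><para>World</para></memo>"))
```

In a real application the same mapping would normally be written as XSLT templates, one per user element.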
WordMLToFO Features
- Element mapping - The WordMLToFO stylesheet maps WordML elements
to XSL-FO elements in the following way
- Style expansion - A Word document contains many styles, which are
applied to paragraphs, text runs, or tables; the content is finally
formatted according to the result of applying those styles. Styles come
in three kinds: table styles, paragraph styles, and character styles. In
contrast, XSL-FO has no concept of styles. Every formatting property must
be written into the FO file as the final result of applying the styles.
As a result, the WordMLToFO stylesheet must apply the following styles
and then output the final result to the FO file
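The idea of style expansion can be illustrated with a small sketch. It is not the stylesheet's actual algorithm: the style table, the property names, and the functions resolve_style and expand are hypothetical, and the precedence order (earlier styles first, then later styles, then direct formatting) is an assumption made for illustration.

```python
def resolve_style(style_id, styles):
    """Flatten one style by following its based_on chain."""
    chain = []
    seen = set()
    while style_id is not None:
        if style_id in seen:
            raise ValueError(f"cyclic style inheritance at {style_id!r}")
        seen.add(style_id)
        style = styles[style_id]
        chain.append(style)
        style_id = style.get("based_on")
    props = {}
    for style in reversed(chain):  # base style first, derived style overrides
        props.update(style["props"])
    return props


def expand(direct_props, style_ids, styles):
    """Merge styles in the given order, then apply direct formatting last."""
    result = {}
    for style_id in style_ids:
        if style_id is not None:
            result.update(resolve_style(style_id, styles))
    result.update(direct_props)
    return result


# Hypothetical style table: a paragraph style chain and a character style.
STYLES = {
    "Normal": {"based_on": None,
               "props": {"font-size": "10pt", "font-family": "Times New Roman"}},
    "Heading1": {"based_on": "Normal",
                 "props": {"font-size": "16pt", "font-weight": "bold"}},
    "Emphasis": {"based_on": None, "props": {"font-style": "italic"}},
}

if __name__ == "__main__":
    fo_props = expand({"color": "#FF0000"}, ["Heading1", "Emphasis"], STYLES)
    attrs = " ".join(f'{k}="{v}"' for k, v in sorted(fo_props.items()))
    # Every property ends up as an explicit attribute on the FO element.
    print(f"<fo:inline {attrs}>...</fo:inline>")
```

The point of the sketch is that nothing style-like survives in the output: each FO element carries the fully resolved set of properties.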
- Image file generation - Because an XML file cannot contain binary
image data, WordML stores image data as a Base64-encoded string.
The following is a sample. The w:binData element contains the
image data portion
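Decoding the w:binData content back into image bytes can be sketched as below. This is an illustration only, not the Java library the stylesheet uses. The fragment in SAMPLE is hypothetical (its payload is just the six bytes of a GIF signature, not a complete image), and the function name extract_images is an assumption.

```python
import base64
import xml.etree.ElementTree as ET

W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"

# Hypothetical WordML fragment; "R0lGODlh" is Base64 for the bytes b"GIF89a".
SAMPLE = (
    f'<w:pict xmlns:w="{W_NS}">'
    '<w:binData w:name="wordml://01000001.gif">R0lGODlh</w:binData>'
    "</w:pict>"
)


def extract_images(wordml: str) -> dict:
    """Map the file name of each w:binData element to its decoded bytes."""
    root = ET.fromstring(wordml)
    images = {}
    for node in root.iter(f"{{{W_NS}}}binData"):
        name = node.get(f"{{{W_NS}}}name", "")
        file_name = name.rsplit("/", 1)[-1]  # drop the "wordml://" prefix
        # Base64 text may be wrapped over several lines; remove whitespace.
        payload = "".join((node.text or "").split())
        images[file_name] = base64.b64decode(payload)
    return images


if __name__ == "__main__":
    for file_name, data in extract_images(SAMPLE).items():
        print(file_name, len(data), "bytes")
```

Writing each entry to disk with open(file_name, "wb") would then give an image file that an FO document can reference.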
- Line space calculation - Microsoft Word has a complex line
layout specification, including fixed line spacing, at-least line spacing,
and auto line spacing. The WordMLToFO stylesheet computes line spacing so
that the formatting result matches Word's as closely as possible. However,
the formatting model differs between Word and XSL-FO, so the same formatting
result cannot be obtained in all cases
NOTE: The WordMLToFO stylesheet uses an external Java library for image
file processing and line spacing calculation. If you use MSXML as the
XSLT processor, the stylesheet cannot call Java libraries, so you cannot
use these functions
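The three line spacing rules named above can be approximated in XSL-FO roughly as in the following sketch. It is not the calculation performed by the stylesheet's Java library. It assumes that the w:spacing element carries w:line and w:line-rule attributes, that w:line is in 1/240 of a line for "auto" and in twips (1/20 pt) for "exact" and "at-least", and that the choice of line-stacking-strategy shown here is an acceptable approximation. The function name fo_line_height is hypothetical.

```python
TWIPS_PER_POINT = 20        # "exact" and "at-least" values are in twips
AUTO_UNITS_PER_LINE = 240   # "auto" values are in 1/240 of a single line


def fo_line_height(line: int, rule: str = "auto") -> dict:
    """Return approximate XSL-FO properties for a Word line spacing setting."""
    if line <= 0:
        raise ValueError("line must be positive")
    if rule == "auto":
        # A unitless multiplier. Word multiplies the font's natural line
        # height while XSL-FO multiplies the font size, so results differ.
        return {"line-height": f"{line / AUTO_UNITS_PER_LINE:g}"}
    points = f"{line / TWIPS_PER_POINT:g}pt"
    if rule == "exact":
        # Fixed spacing: do not let tall inline content enlarge the line.
        return {"line-height": points, "line-stacking-strategy": "font-height"}
    if rule == "at-least":
        # Minimum spacing: the line may still grow to fit tall content.
        return {"line-height": points, "line-stacking-strategy": "max-height"}
    raise ValueError(f"unknown line rule: {rule!r}")


if __name__ == "__main__":
    for line, rule in [(360, "auto"), (240, "exact"), (280, "at-least")]:
        print(rule, fo_line_height(line, rule))
```

The comment in the "auto" branch shows one concrete place where the two formatting models diverge, which is why identical output cannot be guaranteed.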