Version: 1.0
Author: Adrian Kosmaczewski
Date: Sunday, October 19, 2003
Programming Languages: C# (.NET 1.1), XSL-FO
Tools: Visual Studio.NET 2003
Platforms: Windows
Download: orugasystem.zip
Licence: See the “Kosmaczewski Licence.txt” and “Infozone Licence.txt” files.
The OrugaSystem project first came to mind as an HTML <--> RTF converter. The idea came to me after reading the following article on CodeProject: “An XML based (we)blog with RSS feed” by Johan Danforth.
I liked the idea of using the RichTextBox for writing HTML content, and I found Johan’s way of turning RTF into HTML really nice (loop over the text you’ve written, set a couple of boolean variables when you find bold or italics, and voilà!). But I wondered (without knowing anything about RTF): what about sending the contents of the RichTextBox to the server and letting it do the transformation to HTML? There are several solutions for that: buy an RTF generator component (I didn’t want that), or create one. But for the latter, you need to understand RTF syntax. And that’s not the easiest part…
It all finally started the other day, when I found the book “RTF Pocket Guide”, from O’Reilly, and in just a couple of pages I had the information I needed to start reading and writing RTF files. On the web I found two very useful open-source XSL stylesheets that transform XSL-FO into RTF and into HTML.
Finally, following a friend’s advice, I created this system with the following design goals:
- It uses XSL-FO as an intermediary format;
- It can be extended to support other formats, both for input and output.
The name “Oruga” means “worm” in Spanish, and it is meant to remind us of the transformation of the worm into a butterfly… poetic, isn’t it? :)
The idea is the following. The Crisalida class (“Cocoon” in Spanish) handles IParser instances and ISerializer instances. The IParser reads a file in some format (at the moment, there are IParsers for RTF, HTML and XSL-FO) and the ISerializer writes it to a generic stream (the same formats, RTF, HTML and XSL-FO, are handled both for input and output). The system can write the output data to a network socket, to a file or directly to memory. More IParsers and ISerializers can be added in the future without difficulty.
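A minimal sketch of how these three pieces might fit together. Only the type names Crisalida, IParser and ISerializer come from the text above; the member signatures, the PlainTextParser and XslFoSerializer classes, and the use of XmlDocument as a stand-in for the XSL-FO document are assumptions made for illustration, written in present-day C# rather than taken from the OrugaSystem sources. The XSL-FO produced is deliberately incomplete (no page layout).

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical member signatures; only the type names come from the article.
public interface IParser
{
    // Reads one input format and returns an XSL-FO document.
    XmlDocument Parse(TextReader input);
}

public interface ISerializer
{
    // Writes an XSL-FO document to any stream (file, socket, memory).
    void Serialize(XmlDocument fo, Stream output);
}

// Pairs one parser with one serializer and runs the conversion.
public class Crisalida
{
    private readonly IParser parser;
    private readonly ISerializer serializer;

    public Crisalida(IParser parser, ISerializer serializer)
    {
        this.parser = parser;
        this.serializer = serializer;
    }

    public void Transform(TextReader input, Stream output)
    {
        serializer.Serialize(parser.Parse(input), output);
    }
}

// Toy parser (not part of OrugaSystem): every input line becomes an fo:block.
public class PlainTextParser : IParser
{
    private const string FoNs = "http://www.w3.org/1999/XSL/Format";

    public XmlDocument Parse(TextReader input)
    {
        XmlDocument doc = new XmlDocument();
        XmlElement root = doc.CreateElement("fo", "root", FoNs);
        doc.AppendChild(root);
        string line;
        while ((line = input.ReadLine()) != null)
        {
            XmlElement block = doc.CreateElement("fo", "block", FoNs);
            block.InnerText = line;
            root.AppendChild(block);
        }
        return doc;
    }
}

// Toy serializer: writes the XSL-FO document out unchanged as XML.
public class XslFoSerializer : ISerializer
{
    public void Serialize(XmlDocument fo, Stream output)
    {
        XmlTextWriter writer = new XmlTextWriter(output, new UTF8Encoding(false));
        fo.Save(writer);
        writer.Flush();
    }
}

public static class Demo
{
    // Runs the whole pipeline in memory and returns the result as a string.
    public static string Convert(string text)
    {
        MemoryStream memory = new MemoryStream();
        new Crisalida(new PlainTextParser(), new XslFoSerializer())
            .Transform(new StringReader(text), memory);
        return Encoding.UTF8.GetString(memory.ToArray());
    }

    public static void Main()
    {
        Console.WriteLine(Convert("Hello\nWorld"));
    }
}
```

Because Transform only sees a Stream, the same call would work with a FileStream or a NetworkStream in place of the MemoryStream; supporting another format would mean adding one more IParser or ISerializer implementation.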
The IParser classes use an XxxTextReader class (Xxx being Rtf, Html, etc.) that provides functionality similar to that of the System.Xml.XmlTextReader class: a fast, forward-only reader that scans the supplied stream of data for tokens. The IParser class uses that information to give instructions to an XslFoBuilder class, which carries the whole burden of creating the XslFoDocument instance that represents the data in XSL-FO format.
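To make the "forward-only reader" idea concrete, here is a rough sketch of what an RtfTextReader could look like, modelled on the Read() loop of XmlTextReader. The RtfTokenType enum, the members of the class and the Demo helper are assumptions, not the actual OrugaSystem code, and the tokenizer is intentionally simplified: it ignores negative parameters and hex escapes such as \'e9.

```csharp
using System;
using System.IO;
using System.Text;

public enum RtfTokenType { None, GroupStart, GroupEnd, ControlWord, Text }

// Hypothetical forward-only RTF tokenizer; each Read() call yields one token.
public class RtfTextReader
{
    private readonly TextReader input;
    private RtfTokenType tokenType = RtfTokenType.None;
    private string tokenValue = "";

    public RtfTextReader(TextReader input) { this.input = input; }

    public RtfTokenType TokenType { get { return tokenType; } }
    public string Value { get { return tokenValue; } }

    // Advances to the next token; returns false at the end of the input.
    public bool Read()
    {
        // Raw line breaks carry no meaning in RTF, so skip them.
        while (input.Peek() == '\r' || input.Peek() == '\n') input.Read();

        int c = input.Read();
        switch (c)
        {
            case -1: Set(RtfTokenType.None, ""); return false;
            case '{': Set(RtfTokenType.GroupStart, "{"); return true;
            case '}': Set(RtfTokenType.GroupEnd, "}"); return true;
            case '\\': ReadControl(); return true;
            default: ReadText((char)c); return true;
        }
    }

    private void ReadControl()
    {
        StringBuilder word = new StringBuilder();
        while (IsLetter(input.Peek())) word.Append((char)input.Read());
        if (word.Length == 0)
        {
            // Control symbol: \{ \} and \\ are escaped literal characters.
            int symbol = input.Read();
            bool literal = symbol == '{' || symbol == '}' || symbol == '\\';
            Set(literal ? RtfTokenType.Text : RtfTokenType.ControlWord,
                symbol < 0 ? "" : ((char)symbol).ToString());
            return;
        }
        // Optional numeric parameter, e.g. the 1 in \rtf1 (negatives not handled).
        while (IsDigit(input.Peek())) word.Append((char)input.Read());
        // A single space after a control word is a delimiter, not text.
        if (input.Peek() == ' ') input.Read();
        Set(RtfTokenType.ControlWord, word.ToString());
    }

    private void ReadText(char first)
    {
        StringBuilder text = new StringBuilder();
        text.Append(first);
        while (IsTextChar(input.Peek())) text.Append((char)input.Read());
        Set(RtfTokenType.Text, text.ToString());
    }

    private void Set(RtfTokenType type, string value)
    {
        tokenType = type;
        tokenValue = value;
    }

    private static bool IsLetter(int c)
    {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    private static bool IsDigit(int c) { return c >= '0' && c <= '9'; }

    private static bool IsTextChar(int c)
    {
        return c >= 0 && c != '{' && c != '}' && c != '\\' && c != '\r' && c != '\n';
    }
}

public static class Demo
{
    // Lists the tokens of an RTF fragment as "Type[value]" entries.
    public static string Describe(string rtf)
    {
        RtfTextReader reader = new RtfTextReader(new StringReader(rtf));
        StringBuilder result = new StringBuilder();
        while (reader.Read())
            result.Append(reader.TokenType).Append('[').Append(reader.Value).Append("] ");
        return result.ToString().TrimEnd();
    }

    public static void Main()
    {
        Console.WriteLine(Describe(@"{\rtf1\ansi Hello {\b world}\par}"));
    }
}
```

A parser built on such a reader would loop over Read(), react to control words such as b or par, and pass the text tokens on to the builder.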
In its first stage, the system must be able to handle basic RTF files (in ANSI encoding, or Windows Codepage 1252), as well as simple HTML files, with only the paragraph information taken into account. The system must be able to create the proper XSL-FO representation (using “fo:block” and “fo:inline” XML elements) and to build the other formats from it.
Finally, I used the NDoc code documentation generator for .NET to post-process the XML documentation generated by the C# compiler.
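As an illustration of that representation, the sketch below shows how a paragraph with one bold run could be expressed with “fo:block” and “fo:inline”. The XslFoBuilder members shown here are hypothetical (the text names the class but not its API), and the output leaves out the page layout elements that a complete XSL-FO document requires.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical builder API: a parser would call these methods as it reads tokens.
public class XslFoBuilder
{
    private const string FoNs = "http://www.w3.org/1999/XSL/Format";
    private readonly StringWriter buffer = new StringWriter();
    private readonly XmlTextWriter writer;

    public XslFoBuilder()
    {
        writer = new XmlTextWriter(buffer);
        writer.WriteStartElement("fo", "root", FoNs);
    }

    // Opens a paragraph.
    public void StartBlock()
    {
        writer.WriteStartElement("fo", "block", FoNs);
    }

    // Opens a run of text with character formatting.
    public void StartInline(bool bold, bool italic)
    {
        writer.WriteStartElement("fo", "inline", FoNs);
        if (bold) writer.WriteAttributeString("font-weight", "bold");
        if (italic) writer.WriteAttributeString("font-style", "italic");
    }

    public void Text(string text)
    {
        writer.WriteString(text);
    }

    // Closes the most recently opened block or inline.
    public void End()
    {
        writer.WriteEndElement();
    }

    // Closes the root element; the caller must have closed everything else.
    public string Finish()
    {
        writer.WriteEndElement();
        writer.Flush();
        return buffer.ToString();
    }
}

public static class Demo
{
    // What a parser might emit for the RTF fragment "Hello {\b world}\par".
    public static string Build()
    {
        XslFoBuilder builder = new XslFoBuilder();
        builder.StartBlock();
        builder.Text("Hello ");
        builder.StartInline(true, false);
        builder.Text("world");
        builder.End(); // fo:inline
        builder.End(); // fo:block
        return builder.Finish();
    }

    public static void Main()
    {
        Console.WriteLine(Build());
    }
}
```

The same calls could equally be driven by an HTML parser on meeting a p element containing a b element, which is the point of using XSL-FO as the common intermediate format.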
Acknowledgements and References
- “An XML based (we)blog with RSS feed” by Johan Danforth: A very nice idea of how to control blog posting from a Windows Forms client!
- RTF Pocket Guide, by Sean M. Burke: The great little book that helped me understand RTF at last…
- Rich Text Format (RTF) Specification, version 1.6: Useful and comprehensive, but definitely too complicated for beginners…
- Using XSL FO with XEP 3.0: Excellent tutorial to learn basic XSL-FO concepts.
- What is XSL-FO?: Good starting point for the XSL-FO specification.
- Browsing XSL FO: From here you can get the XSL developed by the RenderX team.
- Yahoo! Groups: xml-doc - Applying XML to technical documentation: From here I got the reference for the fop2rtf.xsl.
- Re: xsl stylesheet for transforming html to xsl:fo: From here I got the reference for the xhtml2fo stylesheet.
- SourceForge Project: NDoc: Code Documentation Generator for .NET.
- SourceForge Project: html2fo: Converter from HTML to XSL-FO.