Word, WordPad and RTF

In yet another stunning victory for Microsoft's cross-compatibility, their RTF system is incompatible between their own products. Let me begin by explaining that I have a Delphi app that uses a mail merge system to merge database text into a document. Naturally, you can do this a lot of different ways but, for my particular instance, I need to merge from a non-ODBC compliant database and then automatically E-mail the resulting document to the correct person. This is a management tool we use fairly heavily.

In the past, I've always just used raw HTML formatting because it's handy and relatively standard. The particular request I've been working on is to format the resulting merge so that it can be printed in a "one summary page per item" format. HTML can do this with a style sheet like so:


<STYLE TYPE="text/css">
     P.breakhere {page-break-before: always}
</STYLE>

and use it via:


<p class="breakhere">

Thanks to HTML Goodies for the information.
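To make the idea concrete, here is a minimal sketch of how the merged summaries could be wrapped so that each one starts on its own printed page. It is written in Python rather than Delphi, and the function name build_summary_html is purely illustrative; it is not taken from the actual merge app.

```python
from html import escape

# The style block from above, reused verbatim.
STYLE = (
    '<STYLE TYPE="text/css">\n'
    "     P.breakhere {page-break-before: always}\n"
    "</STYLE>"
)

def build_summary_html(items):
    """Return one HTML document with a page break before every item but the first."""
    parts = ["<html><head>", STYLE, "</head><body>"]
    for index, text in enumerate(items):
        # The first summary starts the document, so it needs no break before it.
        css = ' class="breakhere"' if index > 0 else ""
        parts.append("<p%s>%s</p>" % (css, escape(text)))
    parts.append("</body></html>")
    return "\n".join(parts)

if __name__ == "__main__":
    print(build_summary_html(["Summary for item 1", "Summary for item 2"]))
```

Whether the break is honored still depends entirely on whatever ends up printing the page.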

Now, the downside to this is that HTML print formatting appears to be considered optional by... well... everyone. I initially sent the HTML to my Gmail account. It displayed fine but without the page breaks. I would expect this, considering Gmail has its own HTML enclosure. I would not expect printing to ignore all of the HTML formatting, but it does. I suppose this makes some sense in light of the header information they put at the top of the output; the process of placing the header presumably strips the rest of the print formatting. Keep in mind this is the equivalent of an HTML E-mail, not an HTML attachment. When I switched to an HTML attachment, Gmail dutifully opened the attachment and FF3 printed the document just fine.

On to Outlook. One would think that Outlook should honor HTML formatting since, unlike Gmail, it doesn't actually use any HTML for display or printing. This is not so. HTML prints from Outlook's normal reading pane without the page breaks. Thinking I could force the issue, I double-clicked the message in Outlook (which of course opens it in Word) and voila - no difference. This is turning into a major hassle. Additionally, because Outlook is so clever, it won't really let you send an HTML attachment. Instead, it displays it in-line, with the same printing problems as an HTML E-mail.

On to the solution...

I had to select some format other than HTML, but it must be 7-bit text-based because of the way my Delphi merge app works. That pretty much leaves me with PDF or RTF as easy standards that everyone can open. I settled on RTF because it's easy to create an RTF file in OpenOffice, save it out and hack it apart with TSE Pro. My merge codes are just enclosed in << >> (à la MS Word), so a name would be <<name>>. Anyway, this all works, although I don't really recommend reading RTF as a form of entertainment. Many trial-and-error attempts later, I had a document that works fine in WordPad and OpenOffice. It just won't work in Word.
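For anyone who wants to try the same approach, the merge step can be sketched roughly as follows. This is Python, not the Delphi code the app actually uses, and the names rtf_escape and merge_rtf are made up for illustration. It assumes each merge code survives intact in the saved RTF (word processors sometimes split a run of text across formatting groups, which is part of why the template needs hand editing), and it escapes the merged values so that the output stays 7-bit.

```python
import re

# Merge codes look like <<name>>; only word characters are allowed in the name.
MERGE_CODE = re.compile(r"<<(\w+)>>")

def rtf_escape(text):
    """Escape a plain string so it can be dropped into 7-bit RTF."""
    out = []
    for ch in text:
        if ch in "\\{}":
            # Backslash and braces are RTF syntax and must be escaped.
            out.append("\\" + ch)
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # \uN takes a signed 16-bit decimal per UTF-16 code unit;
            # the "?" is the fallback for readers without Unicode support.
            data = ch.encode("utf-16-le")
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "little")
                if unit > 32767:
                    unit -= 65536
                out.append("\\u%d?" % unit)
    return "".join(out)

def merge_rtf(template, fields):
    """Replace each <<code>> with its escaped value; unknown codes are left alone."""
    def fill(match):
        name = match.group(1)
        if name not in fields:
            return match.group(0)
        return rtf_escape(str(fields[name]))
    return MERGE_CODE.sub(fill, template)

if __name__ == "__main__":
    print(merge_rtf(r"{\rtf1\ansi Dear <<name>>,\par}", {"name": "José {Jr.}"}))
```

Leaving unknown codes untouched is a deliberate choice in this sketch: a stray <<code>> in the output is easier to spot than a silently blank field.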

This is not to say that Word doesn't open the document. It does; it just ignores the page breaks. I tried the same RTF file in Word 2007 and Word 2003 with the same results. Even more annoying, if I open the file in WordPad and then save it back out as RTF, Word still won't show the page breaks. After reading the RTF documentation, I was left with the impression that \page should always generate a new page. It turns out that Word won't accept a bare \page the way the documentation, WordPad and OpenOffice do. It HAS to be "\par \page \par " and please don't forget the trailing space. In Word, this works fine. In WordPad and OO, it generates a leading blank line. In my case, I can call this close enough since the summary pages are so short anyway. If I were pressed for space, though, it would be an issue that would probably force me to PDF.
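As a rough illustration of the workaround, the sketch below joins already-merged summary bodies with the separator that Word accepted in my document. It is Python rather than Delphi, the names join_pages and build_document are hypothetical, and the bare {\rtf1\ansi ...} wrapper is a simplification; a real template saved from OpenOffice carries font and color tables as well.

```python
# The separator Word honored in my case, including the trailing space.
WORD_SAFE_BREAK = "\\par \\page \\par "

def join_pages(pages, page_break=WORD_SAFE_BREAK):
    """Join RTF body fragments, one per summary page, with a page break between them."""
    return page_break.join(pages)

def build_document(pages):
    """Wrap the joined pages in a bare-bones RTF header (a simplification)."""
    return "{\\rtf1\\ansi " + join_pages(pages) + "}"

if __name__ == "__main__":
    print(build_document(["Summary for item 1", "Summary for item 2"]))
```

Passing page_break="\\page " instead gives the plain form, which makes it easy to compare how each word processor treats the two variants.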

In another quirky little thing, when I created my file in OO and saved it out as RTF, my file size was 38k. If I open the file in WordPad and save it out, it shrinks to 13k. If I open the original in Word 2007 and save it out, it grows to 87k. How can the size of the same RTF document vary that much between products from the same company? Even more annoying, why aren't WordPad and Word interoperable using RTF?

Comments

Unknown said…

This reminds me of another DOC-RTF we-cannot-even-implement-our-own-file-formats moment from MS. Back when MS made Win Word 97 as a follow-up to version 6, they updated the file format. Word 97 had an option to save as a Word 6 Doc, but what you saved was not the Word 6 Doc format; it was RTF hidden behind the extension .doc, so only if you looked inside the raw file would you realise what was wrong. If your document had any graphics in it, the version 6 "Doc" saved by Word 97 would be something like 20 times bigger than if the same document had been saved with the actual Word 6. The problem was later corrected in a service release.

November 3, 2008 at 4:36 PM

Chris said…

Not to dismiss your overall moral, but just "\page " works for me. So, and most minimally, the following opens fine in Word:

{\rtf1\pard
Please insert a page break here:\page Thank you.\par
}

November 4, 2008 at 4:59 AM

Marshall Fryman said…

CCR -

A simple document does indeed appear to work. I didn't try a very basic document. In the particular case I was fighting, I have a table on page 1 and a table on page 2, both with formatting for various cells being outlined or not. It may be something in the table interpretation or even the formatting that Word doesn't like. At any rate, thanks for clarifying that at least sometimes it works as advertised. Thanks for keeping me honest.

Rif -

I haven't thought about the old versions of Word in a while. We used to have problems with users opening their documents from floppy. Word 6 would always save its temp files to the same location. Eventually, the floppy would be full, Word would crash and you'd lose all of your work. Oh, the good old days. :)

m

November 4, 2008 at 8:22 AM

Mike said…

Thanks Marshall, I hit this exact issue with page breaks today and I'm glad you took the trouble of documenting it. Wrapping in an (unnecessary) \par makes it work perfectly.

October 27, 2011 at 1:31 AM

Ruminated Rumblings
