Microsoft Office: Action Figures Sold Separately*

February 20, 2008  (Jeffrey Kabbe)

Do you remember that toy you wanted for Christmas when you were a kid? I’m talking about the toy that you begged your parents for and put on your Christmas list twice. The TV commercial made the toy look totally awesome (or even “rad”). This was the toy that would make all your childhood dreams come true! How would you feel if you had gotten that toy and then, frankly, it sucked?

That’s probably how many people feel this week. Microsoft released documentation for its binary Office file formats – Word, Excel, and PowerPoint. People have been clamoring for it for years, and it’s finally here. Now that the wait is over, many will no doubt wonder what the fuss was all about.

Microsoft’s newest Office file formats, introduced in Office 2007, are already documented. Microsoft has been working hard to get these new formats, called Office Open XML (OOXML), standardized by international standards bodies and adopted by users. The OOXML file formats are enabled by default in Office 2007, causing many people to use the new formats without even realizing it. More than once I have had to return a “docx” file – created with Word 2007 – to the sender and ask for a “doc” file instead. Microsoft created a converter for translating “docx” files into “rtf” files that can be read by older versions of Microsoft Word. The converter is inconvenient to use, though, and the results are sometimes quite different from the original.

The reality is that most people – particularly businesses – haven’t upgraded to Microsoft Office 2007. Indeed, even businesses that have upgraded probably have years’ worth of documents that were created with older versions of Office. These older versions of Microsoft Office are where all the action is. Open Office, in particular, seems targeted at users of Office XP and Office 2003. One sticking point that seems to keep many people from switching to Open Office (and its very attractive price tag) is true Microsoft Office compatibility. Support for Office files is only partial, and often a document created in Microsoft Word will not display properly in Open Office. Other contenders – AbiWord, Nisus Writer (Express and Pro), and Pages – share the same fate.

Unless an application can guarantee 100% Microsoft Office compatibility, the argument goes, it won’t be able to win many converts. The explanation for the lack of compatibility usually involves a complaint about Microsoft’s closed Office file formats. The file formats introduced in Office 2007 differ from the earlier formats in one critical way: the Office 2007 formats store their content as readable XML text (packaged in a ZIP archive); the old formats are binary files. After years of programmers asking, pleading, hoping, wishing, and dreaming, Microsoft has finally revealed the meaning of all the bits in those binary files. And, now that it has, nothing is likely to change.

The problem is that the specifications are hopelessly complex. The specification for the current Microsoft Word binary file format (used by Word 97, 2000, XP, 2003, and 2007) is a 210-page PDF document. What many had thought was a deliberate attempt to obfuscate the formats and hinder the competition may turn out to be just a series of sensible design choices that reflect the long histories of the applications in Microsoft Office. Joel Spolsky has a great article on the subject that I highly recommend. Joel is a software developer and writes primarily for a technical crowd, but most of the article should be accessible to non-techies.

Only time will tell if the binary Office file format specs end up being useful to the programming community. I think we may see some incremental improvements to existing Office file format support, but I doubt we will see anything revolutionary.

* No, I am not talking about Source Fource.
