a 'mooh' point

clearly an IBM drone

Document translation sucks (When Rob is right, he's right)

It is very seldom that I read one of Rob's posts and think "That is just so true" - but yesterday was one of those occasions. I was reading through his latest post about the load times of different documents in a couple of applications, and I couldn't help but smile when I got to the part where Rob made some observations about possible reasons for the poor load times of ODF files in Microsoft Office 2003:

What is a file filter? It is like 1/2 of a translator. Instead of translating from one disk format to another disk format, it simply loads the disk format and maps it into an application-specific memory model that the application logic can operate directly on. This is far more efficient than translation. This is the untold truth that the layperson does not know. But this is how everyone does it. That is how we support formats in SmartSuite. That is how OpenOffice does it. And that is how MS Office does it for the file formats they care about. In fact, that is the way that Novell is now doing it now, since they discovered that the Microsoft approach is doomed to performance hell.


I have been trying to pitch my idea of "document format channels" for some time now. The basic idea is not to do translations between formats but to support the feature sets of both formats in the major applications.

I remember participating in the interop work for the Danish Government in fall 2007. While we were trying to say something clever about the disappointing translation results we saw, we heard rumours that Novell was skipping the XSLT translation of ODF to OOXML (and vice versa) and instead extending the internal object model of its edition of OpenOffice.org. This was where the idea was born.

The idea was to round-trip documents in the format in which they were born and not to attempt translation (also, how the hell do you translate e.g. a digital signature between an ODF file and an OOXML file?). What triggered the "vision" was that 1) the formats are not fully compatible and 2) translation sucks. In every interop session I have attended and in every piece of interop work I have participated in, there has been one crystal-clear conclusion:

When you translate, you lose information.
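
To make that conclusion concrete, here is a minimal sketch in Python. It is a toy model, not how any real office suite or converter works: the two feature sets, the feature names, and the helper functions are all made up for illustration and are not taken from ODF or OOXML. It compares a round trip through translation with a round trip through an application whose in-memory model supports the features of both formats.

```python
# Toy feature sets. The names are illustrative assumptions only;
# they are not real ODF or OOXML features.
FORMAT_A_FEATURES = {"text", "bold", "signature_a"}
FORMAT_B_FEATURES = {"text", "bold", "signature_b"}


def translate(doc, target_features):
    """Translate a document by keeping only what the target format can express."""
    return {key: value for key, value in doc.items() if key in target_features}


def load_native(doc):
    """Load into a hypothetical in-memory model that supports both feature sets."""
    return dict(doc)


def save_native(model, born_as_features):
    """Save the model back in the format the document was born in."""
    return {key: value for key, value in model.items() if key in born_as_features}


original = {"text": "Hello", "bold": True, "signature_a": "abc123"}

# Round trip via translation: format A -> format B -> format A.
via_translation = translate(
    translate(original, FORMAT_B_FEATURES), FORMAT_A_FEATURES
)

# Round trip via native support: load and save in the original format.
via_native = save_native(load_native(original), FORMAT_A_FEATURES)

lost = sorted(set(original) - set(via_translation))
print("lost in translation:", lost)
print("native round trip intact:", via_native == original)
```

In this toy setup the signature never makes it back from the translated round trip, because the intermediate format has no way to express it, while the native round trip returns the document unchanged.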

Essentially, translation is a poor man's document consumption, because if you lose information when translating - why would you do it? As Rob so correctly points out, when Microsoft chooses to use translators to enable "support" for ODF in their Microsoft Office suites, it's really another way of saying: "We don't really care about ODF". The same thing naturally goes for OpenOffice.org (and spin-offs). When they insist on implementing just import filters for OOXML and use translators to do so, they are saying exactly the same: "We don't really care about OOXML". In both cases what they are communicating to their users is really

We don't care that you lose information - you'll just have to settle for half of the correct solution

It's the same message I hear when some of my colleagues come to me and say: "Jesper, I finished the piece of code you wanted me to do". Sometimes I am blessed with conversations like:

Colleague: I finished the code piece
Jesper: Cool - does it work all right?
Colleague: Eh well, it compiles just fine ...

Is that good enough?

(and with this friendly post, I can only hope "someone" will accept the LinkedIn-invitation I sent in February just before the BRM in Geneva ... or maybe I should try Diigo instead?)
