a 'mooh' point

clearly an IBM drone

Document translation sucks (When Rob is right, he's right)

It is very seldom that I read one of Rob's posts and think "That is just so true" - but yesterday was one of those occasions. I was reading through his latest post about the load times of different documents in a couple of applications, and I couldn't help but smile when I got to the part where Rob made some observations about possible reasons for the poor load times of ODF files in Microsoft Office 2003:

What is a file filter? It is like 1/2 of a translator. Instead of translating from one disk format to another disk format, it simply loads the disk format and maps it into an application-specific memory model that the application logic can operate directly on. This is far more efficient than translation. This is the untold truth that the layperson does not know. But this is how everyone does it. That is how we support formats in SmartSuite. That is how OpenOffice does it. And that is how MS Office does it for the file formats they care about. In fact, that is the way that Novell is now doing it now, since they discovered that the Microsoft approach is doomed to performance hell.
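
To make the distinction in the quote a bit more concrete, here is a minimal sketch in Python. Both "formats" and all function names are invented for illustration; real filters and translators are of course far more involved. The only point is the shape of the two pipelines: the translator route rewrites one disk format into another and then still has to load the result, while the filter route maps the disk format straight into the in-memory model.

```python
# Toy illustration only: both "formats" and all function names are invented.

def parse_a(text: str) -> dict:
    """Filter for hypothetical format A ("key=value" lines): disk format -> memory model."""
    return dict(line.split("=", 1) for line in text.splitlines() if line)


def parse_b(text: str) -> dict:
    """Filter for hypothetical format B ("key: value" lines), the application's native format."""
    return dict(line.split(": ", 1) for line in text.splitlines() if line)


def translate_a_to_b(text: str) -> str:
    """Translator: disk format A -> disk format B, never touching the memory model."""
    return "\n".join(line.replace("=", ": ", 1) for line in text.splitlines() if line)


doc_a = "title=Report\nauthor=Rob"

# Translator route: two passes over the document (A -> B text, then B -> model).
via_translator = parse_b(translate_a_to_b(doc_a))

# Filter route: one pass (A -> model).
via_filter = parse_a(doc_a)
```

In this trivial case both routes end up with the same model; the translator simply does more work to get there. With real formats the intermediate step is also where information tends to get dropped.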


I have been trying to pitch my idea of "document format channels" for some time now. The basic idea is not to do translations between formats but to support the feature sets of both formats in the major applications.
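
A rough sketch of what I mean, in Python. The feature names and both functions are hypothetical and only model the idea, not any real application: translation can keep only what the target format is able to express, while a "channel" keeps the document in its native format and relies on an application model that covers the feature sets of both formats.

```python
# Hypothetical feature sets; the names are invented for illustration.
FEATURES_A = {"text", "tables", "signature_a"}
FEATURES_B = {"text", "tables", "macros_b"}


def translate(doc: dict, target_features: set) -> dict:
    """Translation keeps only what the target format can express."""
    return {k: v for k, v in doc.items() if k in target_features}


def edit_in_channel(doc: dict, key: str, value: str) -> dict:
    """'Channel' editing: the application model covers both feature sets,
    so the document stays in its native format and nothing is dropped."""
    if key not in FEATURES_A | FEATURES_B:
        raise ValueError(f"unknown feature: {key}")
    updated = dict(doc)
    updated[key] = value
    return updated


doc_a = {"text": "Hello", "tables": "1 table", "signature_a": "signed by Jesper"}

# Round trip through format B and back: features B cannot express are gone.
round_tripped = translate(translate(doc_a, FEATURES_B), FEATURES_A)
lost = set(doc_a) - set(round_tripped)

# Editing in the native channel keeps every feature of the original.
channelled = edit_in_channel(doc_a, "text", "Hello again")
```

Here `lost` ends up containing the format-A-only feature, whereas `channelled` still carries every key of the original document.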

I remember when I participated in the interop work for the Danish Government in fall 2007. While we were trying to say something clever about the disappointing translation results we saw, we heard the rumours of Novell skipping the XSLT translation of ODF to OOXML (and vice versa) and instead extending the internal object model of Novell's edition of OpenOffice.org. This was where the idea was born.

The idea was to round-trip documents in the format they were born and not to attempt translation (also, how the hell do you translate e.g. a digital signature between an ODF-file and an OOXML-file?).  What triggered the "vision" was that 1) the formats are not fully compatible and 2) translation sucks. In every interop-session I have attended and in every piece of interop-work I have participated in, there has been one, crystal clear conclusion:

When you translate, you lose information.

Essentially, translation is a poor man's document consumption, because if you lose information when translating - why would you do it? As Rob so correctly points out - when Microsoft chooses to use translators to enable "support" for ODF in their Microsoft Office suites, it's really another way of saying: "We don't really care about ODF". The same thing naturally goes for OpenOffice.org (and spin-offs). When they insist on implementing just import filters for OOXML and use translators to do so - they are saying exactly the same: "We don't really care about OOXML". In both cases what they are communicating to their users is really:

We don't care that you lose information - you'll just have to settle for half of the correct solution

It's the same message I hear when some of my colleagues come to me and say: "Jesper, I finished the piece of code you wanted me to do". Sometimes I am blessed with conversations like:

Colleague: I finished the code piece
Jesper: Cool - does it work all right?
Colleague: Eh well, it compiles just fine ...

Is that good enough?

(and with this friendly post, I can only hope "someone" will accept the LinkedIn-invitation I sent in February just before the BRM in Geneva ... or maybe I should try Diigo instead?)

:-)

Comments (12) -

I think it's important that all vendors support the major document formats available. Perhaps it's about time Microsoft finally woke up and provided "choice" to its customers by supporting ODF in its products. They already have native support for "inferior" file formats like HTML, RTF, CSV and TXT. Why not ODF too?

yk.

Luc Bollen

@Jesper:

"The same thing naturally goes for OpenOffice.org (and spin-offs). When they insist of implementing just import filters for OOXML and use translators to do so - they are saying exactly the same"
To my knowledge, OpenOffice.org uses an import filter for OOXML (as you mentioned), *not* a translator.

"When Microsoft chooses to use translators to enable "support" for ODF in their Microsoft Office suites, it's really another way of saying: "We don't really care about ODF"."
In fact, Microsoft is not even using ODF translators in MS Office: they simply fund a stand-alone translator project.  My interpretation is not that MS says "We don't care about ODF"; they are really saying "Go to hell, ODF".

Microsoft has claimed for a year that they want an ISO stamp for OOXML "to offer choice of format to the users", but their acts clearly tell the true story: they really want to *prevent* choice of format.

If OpenOffice.org is able in a quite short timeframe to have an import filter for OOXML (while claiming that users don't need a second ISO format), how come that MS has not been able in more than 2 years since ISO acceptance of ODF, to include an import filter in MS Office (while claiming that they want to provide a choice of ISO formats) ?

Luc,

> To my knowledge, OpenOffice.org uses an import filter for OOXML (as you mentioned), *not* a translator.

The difference is subtle. The point I am trying to make is that we need full support for the formats - regardless of the import mechanism. Using a translator is a bad idea since it is error-prone and information gets lost. Using a filter but without adequate enhancement of the internal object model of the consuming application is almost just as bad - you just arrive at a crappy document a bit sooner because of the faster load time.

:-)

> If OpenOffice.org is able in a quite short timeframe to have an import filter for OOXML (while claiming that users don't need a second ISO format), how come that MS has not been able in more than 2 years since ISO acceptance of ODF, to include an import filter in MS Office (while claiming that they want to provide a choice of ISO formats) ?

Please - have you tested the OOXML-import feature of OOo? It doesn't even come close to being usable. To talk about "in a quite short timeframe to have an import filter for OOXML" is rather pointless. I believe that the OOo project itself has labeled the import filter as "experimental".

Luc Bollen

@Jesper: "have you tested the OOXML-import feature of OOo? It doesn't even come close to being usable."

Of course, the quality of the OOo filter is low:
- OOo3 is still beta,
- the OOXML specification itself is still quite "experimental".  Two and a half months after the BRM, Microsoft, Ecma and ISO are still busy reshuffling the chapters and including more than 1000 changes in it (have you heard anything about a release date?).
- in addition, OOXML was definitely not designed to ensure that MS competitors will be able to easily write good import/export filters.

The point I was making is that OOo is already busy providing native support for OOXML, even before the spec is available.  While MS has still not *announced* any native ODF support 3 years after the 1.0 spec has been published by OASIS and 2 years after it has been approved by ISO.

This being said, I agree with the main point of your post: document translation sucks, and Microsoft knows this perfectly.

Interesting.

NitPick: Uh, how do you translate a digital signature on an ODF file onto a modified ODF file?  I get your point, since the signature can't survive a translation into any other format because that changes the signed material, just as it can't survive editing of the same one (the whole point of digital signatures).

By the way, at the Oslo SC34 meeting there was a resolution that suggested DIS29500 final version was in the hands of JTC1, and SC34 wanted copies as soon as possible in support of the activities that were initiated on behalf of 29500.  Do you have any information on what the hold-up is?

Hal,

I've actually asked the secretary of SC34 for this file. This is the response so far:

> The final text prepared by the project editor has been submitted to ITTF.
> However, I was asked by ITTF not to distribute it to the SC 34 members
> since the availability of the slightly different version from the
> actually published version would be confusing.
>
> ITTF informed me that the proof will be available in a few weeks and I
> am intending to post the proof (after confirmation by the project editor)
> on the SC 34 website as soon as it becomes available.

I'm not sure why there would be more than one version at this stage in time.

yk.

orcmid,

> NitPick: Uh, how do you translate a digital signature on an ODF file onto a modified ODF file? I get your point, since the signature can't survive a translation into any other format because that changes the signed material, just as it can't survive editing of the same one (the whole point of digital signatures).

(I assume you mean "digital signature on an OOXML file onto a modified ODF file" or vice versa, right?)

The problem with translation and digital signatures is that the verification chain gets destroyed when translating a document. In quite a few government usages of documents, the change-log or "editorial process" of a round-tripped document is bound by legal requirements to document at each step who changed what in the process. When translating the document from ODF to OOXML, the digital signature becomes moot for the other participants in the process.

About DIS29500: I have no solid information. As Yoon-Kit mentions, it is also my impression that IS29500 sits with ITTF, but I have no "proof" of that.

I meant digital signature on any file to the same or another format.  The only thing you can do to preserve a digital signature is copy a file; no alteration, of format or content, preserves the signature.  The only solution in all cases is to re-sign the material.  Likewise, in editing a signed ODF document, only the last signature survives.
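
A minimal sketch of that point in Python. It uses an HMAC from the standard library as a stand-in for a real digital signature (actual document signatures use public-key cryptography), and the key and document bytes are made up. The failure mode is the same: a byte-for-byte copy verifies, anything else does not.

```python
import hashlib
import hmac

# HMAC stands in for a real digital signature here; the key is hypothetical.
KEY = b"hypothetical-signing-key"


def sign(data: bytes) -> str:
    """Compute a keyed digest over the exact bytes of the document."""
    return hmac.new(KEY, data, hashlib.sha256).hexdigest()


def verify(data: bytes, signature: str) -> bool:
    """Check that the signature matches these exact bytes."""
    return hmac.compare_digest(sign(data), signature)


original = b"<doc>Approved budget: 100</doc>"
signature = sign(original)

exact_copy = bytes(original)                        # plain copy, no alteration
edited = original.replace(b"100", b"1000")          # content changed
reformatted = b"<document><p>Approved budget: 100</p></document>"  # same content, other format

copy_ok = verify(exact_copy, signature)
edited_ok = verify(edited, signature)
reformatted_ok = verify(reformatted, signature)
```

Only `copy_ok` is true. The edited and the reformatted versions both fail, even though the reformatted one says exactly the same thing.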

So the chain of provenance and auditability depends on being able to preserve the old version and somehow provide an attestation about the new version (1) being based on it, and (2) being signed by whoever did it and also any cosigners of the attestation.  
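
One way to picture such an attestation, as a sketch only: a small signed record that names the digest of the old version, the digest of the new version, and who is vouching for the step. The record layout, field names and key below are all assumptions for illustration, and HMAC again stands in for a real public-key signature; neither ODF nor OOXML defines anything like this.

```python
import hashlib
import hmac
import json


def digest(data: bytes) -> str:
    """SHA-256 digest of a document version, as hex."""
    return hashlib.sha256(data).hexdigest()


def attest(old: bytes, new: bytes, signer: str, key: bytes) -> dict:
    """Build a signed record stating that `new` was derived from `old`."""
    claim = {"based_on": digest(old), "result": digest(new), "signer": signer}
    payload = json.dumps(claim, sort_keys=True).encode()
    # HMAC is a stand-in for a real public-key signature.
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"claim": claim, "signature": signature}


def check(att: dict, old: bytes, new: bytes, key: bytes) -> bool:
    """Verify the record itself and that it refers to these two versions."""
    claim = att["claim"]
    payload = json.dumps(claim, sort_keys=True).encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return (
        hmac.compare_digest(expected, att["signature"])
        and claim["based_on"] == digest(old)
        and claim["result"] == digest(new)
    )


key = b"hypothetical-converter-key"
signed_original = b"original document bytes"
converted = b"converted document bytes"

attestation = attest(signed_original, converted, "archive converter", key)
valid = check(attestation, signed_original, converted, key)
tampered = check(attestation, signed_original, converted + b"!", key)
```

The old version has to be kept alongside the record, since the attestation only proves something if the original (with its own signature) can still be produced and checked.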

I agree that any conversion of the format creates this requirement when authenticity matters.  This will also happen in archives and when document formats are "upgraded" as necessary to be usable with future or different software.  (Even making a PDF in a government archive, for example, or making a translation into another language, etc.)

With OPC, OOXML makes it a little easier to establish this sort of thing and even carry attestations as extra material, but in truth neither format seems to provide specific assistance for the preservation of a chain of authenticity in a collaborative work or a reformatted/translated work.

All I am saying is that even with a channel model, as I understand it, where the input and output are in the same format, any editing invalidates any pre-existing digital signature.

If you mean that this means the original in its original format is the only one that should be considered authentic, I agree.  This is a read-only case and probably not one for which we need either translation or filtering just to accomplish viewing.  Maybe providing viewers is for different scenarios and use cases?

Interesting topic.  Thanks for bringing it up.

Document translation sucks indeed, and this was one of the major reasons for supporting OOXML, as 99% of existing documents can be translated to OOXML with near perfection.
Creating ODF without having backwards compatibility with existing documents in mind was a major mistake, as it is not what professional organizations want. For organizations, migrating to a new XML format will be a major hassle. You would not want extra problems in translating documents to add to that hassle.

> I'm not sure why there would be more than one version at this stage in time.
The ITTF might be doing some editorial cleanup on the project editor's work? Making it more ISO styled?
SC34 has the project editor's version, which then of course might differ slightly from the official one. That could be annoying and confusing, especially if the paging or numbering changes.

Luc Bollen

Jesper,

It seems that Microsoft is finally listening to its customers (indeed, I still buy and use a lot of Microsoft products):
"The point I was making is that OOo is already busy providing native support for OOXML, even before the spec is available. While MS has still not *announced* any native ODF support 3 years after the 1.0 spec has been published by OASIS and 2 years after it has been approved by ISO."

After a very long waiting time, things start to move: "Today, Microsoft announced that [...] Office 2007 Service Pack 2 will add native support for OpenDocument Format (ODF) 1.1"
www.sdtimes.com/.../article.aspx?ArticleID=32228
And this news is supported by statements from Doug Mahugh and Jason Matusow!

Thank you for this wise move, Microsoft.

Let's hope now that the quality of the ODF support will be great (better than the quality of the current translators), and that Microsoft participation in the OASIS ODF working group will be in good faith, and not with the purpose of blocking everything.  I'm quite optimistic about this, as Microsoft desperately needs to restore the reputation lost in the OOXML affair, and a "tricky move" here will have a devastating effect for them.

In a Japanese government project, I used a very complicated
spreadsheet.  It was created by Excel.  OpenOffice was able
to read the file reasonably fast, and save an ODF file in
a reasonable amount of time.  But OpenOffice took about 20
minutes to load the ODF file!!  So, I gave up the use of OpenOffice.
I do not know what causes the problem.  Does the support of
ODF in OpenOffice need translation?
