a 'mooh' point

clearly an IBM drone

What is conformance, really?

The OOXML/ODF blogosphere has been in a frenzy the last couple of weeks after a couple of posts by yours truly and Alex Brown were picked up by Rob Weir. I don't want to get into the technical details here - you should catch up on the conversations taking place in the comment sections of their respective blogs.

But I do want to talk a bit about conformance - because conformance should be much more than schema validation. To get a clear perspective, we need to look at how conformance is described in the two specifications.

ODF 1.0 (IS 26300):

Conformance is described in section 1.5 

Documents that conform to the OpenDocument specification MAY contain elements and attributes not specified within the OpenDocument schema. Such elements and attributes must not be part of a namespace that is defined within this specification and are called foreign elements and attributes.

So this means that the only requirement for a document to have an "ODF-conformant" sticker slapped on it is that it validates against the ODF schema. If the document contains elements or attributes not defined in ODF 1.0, they must live in their own namespaces. This is actually all there is to say about conformance of individual documents in ODF 1.0.

The section further describes conformance requirements for consuming and producing applications:

Conforming applications either MUST read documents that are valid against the OpenDocument schema if all foreign elements and attributes are removed before validation takes place, or MUST write documents that are valid against the OpenDocument schema if all foreign elements and attributes are removed before validation takes place.

So this section describes requirements for how foreign elements are handled when reading and writing ODF documents.
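To make the "remove foreign elements and attributes before validation" step a bit more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: `KNOWN_NAMESPACES` is a tiny, hypothetical subset of the namespaces the schema knows about, `strip_foreign` is a name I made up, and the sketch simply drops anything outside that set. The actual schema validation that would follow is not shown.

```python
import xml.etree.ElementTree as ET

# Assumption: a small illustrative subset of the namespaces used by the
# ODF schema. A real check would need the complete list.
KNOWN_NAMESPACES = {
    "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}


def namespace_of(qname):
    """Return the namespace URI of an ElementTree '{uri}local' name ('' if none)."""
    return qname[1:].split("}", 1)[0] if qname.startswith("{") else ""


def strip_foreign(element, known=KNOWN_NAMESPACES):
    """Remove foreign attributes and descendant elements in place.

    Simplification: a removed element takes its content and trailing
    text with it, and the root element itself is not checked.
    """
    for name in list(element.attrib):
        if namespace_of(name) not in known:
            del element.attrib[name]
    for child in list(element):
        if namespace_of(child.tag) not in known:
            element.remove(child)
        else:
            strip_foreign(child, known)
    return element
```

Whatever is left after `strip_foreign` is what would then be handed to a schema validator.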

OOXML 1.0 (IS 29500):

The conformance clauses for OOXML were (drastically) changed at the BRM. Conformance in OOXML is described in more detail, and most notably it contains conformance clauses for the OOXML package itself, the so-called "OPC package".

As with ODF, an OOXML 1.0 document is conformant if it adheres to the schema described in the standard.

More specifically it says in Part 1 section 2.4

Document conformance is purely syntactic; it involves only Items 1 and 2 in §2.3 above.

  • A conforming document shall conform to the schema (Item 1 above) and any additional syntax constraints (Item 2).

Now, this is already more difficult to "put down on paper" than the ODF equivalent, because "Item 1" and "Item 2" are described in Part 1 section 2.3 as

  1. Schemas and an associated validation procedure for validating document syntax against those schemas. (The validation procedure includes un-zipping, locating files, processing the extensibility elements and attributes, and XML Schema validation.)
  2. Additional syntax constraints in written form, wherever these constraints cannot feasibly be expressed in the schema language.
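The first steps of the validation procedure in Item 1 (un-zipping and locating files) can be sketched in Python as shown here. This is a rough illustration, not the procedure from the standard: `locate_parts` is a hypothetical helper name, it only reads the package's `[Content_Types].xml` to map each part to its declared content type, and it does not process extensibility elements or run XML Schema validation.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the OPC content-types part.
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"


def locate_parts(package):
    """Map each part name in an OPC package to its declared content type.

    `package` is a file path or a file-like object. Parts with no
    matching Default or Override entry map to None.
    """
    with zipfile.ZipFile(package) as zf:
        types = ET.fromstring(zf.read("[Content_Types].xml"))
        defaults = {
            d.get("Extension").lower(): d.get("ContentType")
            for d in types.findall(f"{{{CT_NS}}}Default")
        }
        overrides = {
            o.get("PartName"): o.get("ContentType")
            for o in types.findall(f"{{{CT_NS}}}Override")
        }
        parts = {}
        for name in zf.namelist():
            if name == "[Content_Types].xml" or name.endswith("/"):
                continue
            part_name = "/" + name
            extension = name.rsplit(".", 1)[-1].lower() if "." in name else ""
            # An Override entry wins over the per-extension Default.
            parts[part_name] = overrides.get(part_name, defaults.get(extension))
        return parts
```

Each located part would then be validated against the schema that matches its content type, which is the step this sketch leaves out.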


As a side note, Item 2 above was the exact reason Stéphane Rodriguez's example with the broken Calculation Chain was actually a non-conforming OOXML document, but that's a completely different story.

Moreover OOXML describes a few "conformance classes", specifically "Wordprocessing", "Spreadsheet" and "Presentation"-classes. The intent here is to be able to claim conformance to parts of the OOXML-spec.

And just as ODF contains requirements for applications, so does OOXML. But it takes conformance a bit further. Since there is an "Item 1" and an "Item 2" above, there is also an "Item 3". This was modified at the BRM and now says:

3. Descriptions of element semantics. The semantics of an element refers to its intended interpretation by a human being. 

In section 2.5 of Part 1 it now says:

Application conformance incorporates both syntax and semantics; it involves items 1, 2 and 3 in §2.3 above.

So a conforming application also has to abide by the semantics the specification gives to elements and attributes. In layman's terms this could be described as "A conforming application has to treat content faithfully with respect to the specification of it". So it basically tells applications not to make up their own interpretation of the elements they encounter as they traverse the XML tree.

Now, I know that this is just a crude introduction to conformance of ODF- and OOXML-documents, but I think it is important to get the ball rolling and to give everyone a feeling of the complexity of the concept.

Thoughts, anyone? 

Conformance of ODF-documents

Ever since the now infamous article by Alex Brown, the blogosphere has been filled with interpretations of the (really not so surprising) result: that the OOXML document containing the original ECMA-376 spec does not conform to IS 29500.

The (really not so surprising) conclusions have been "Office 2007 does not even produce valid OOXML", followed closely by statements like "This shows that Microsoft Office 2007 should not be allowed since it does not produce valid OOXML".

Hmmm ... ok.

As some of you might remember, I participated in some lab tests with OOXML/ODF interop in Fall 2007. Basically I sat in a small room with guys from IBM, Microsoft, Novell and some guys from the Danish National IT- and Telecom Agency sifting through documents, converting them and examining the resulting XML generated. The documents we worked on were supplied by different parts of the Danish public sector. They were basically told to use some of their existing documents as basis for the parts of the tests they participated in. So these documents were real-world-documents.

One of the things we tested was whether the documents were in compliance with their respective specs. The original OOXML documents we tested were all compliant with the ECMA-376 spec ... but it was a different case with the ODF documents. So the other day I tried to validate all the original ODF documents that had been sent in to us.

The results are illustrated in the table below:

| File name | Generator | Conclusion |
|---|---|---|
| DFFE_Afgået svar til Jane Doe.odt | OpenOffice.org/2.3 | not valid |
| DFFE_SJ_(1) - 15-06-2007 Foreløbig Høring om forslag.odt | OpenOffice.org/2.0 | valid |
| GRIBSKOV_bek-281(BS).odt | OpenOffice.org/2.0 | valid |
| GRIBSKOV_Standardbrev ifm ITST pilotprojekt.odt | OpenOffice.org/2.2 | valid |
| GRIBSKOV_Udkast til Forslag til Lokalplan.odt | OpenOffice.org/2.1 | not valid |
| ITST standardbrev ODT.odt | OpenOffice.org/2.0 | valid |
| ITST Testdokument ODT.odt | OpenOffice.org/2.2 | not valid |
| RM Kursusmateriale.odt | OpenOffice.org/2.0 | not valid |
| RM Standardbrev 2s.odt | OpenOffice.org/2.3 | not valid |

The table lists the file name of the original document, the application that generated it (taken from the META file in the ODF package) and whether the document passed validation.
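For anyone who wants to pull the generator out of their own documents, a minimal Python sketch follows. It assumes the package has a `meta.xml` containing a `meta:generator` element, and `read_generator` is just a name I picked for the illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the ODF metadata elements.
META_NS = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"


def read_generator(odf_file):
    """Return the meta:generator string of an ODF package, or None if absent.

    `odf_file` is a file path or a file-like object.
    """
    with zipfile.ZipFile(odf_file) as zf:
        root = ET.fromstring(zf.read("meta.xml"))
    node = root.find(f".//{{{META_NS}}}generator")
    return node.text if node is not None else None
```

The real generator string is usually longer than the short "OpenOffice.org/2.x" form shown in the table, so expect to trim it.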

Overall conclusion of this was:

| Application | Creates consistently valid ODF? |
|---|---|
| OpenOffice.org/2.0 | No |
| OpenOffice.org/2.1 | No |
| OpenOffice.org/2 | No |
| OpenOffice.org/2.3 | No |
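The per-application conclusion follows mechanically from the per-file results: an application only counts as consistent if every document it generated passed. A small Python sketch of that aggregation, using the nine results listed earlier (the names `RESULTS` and `consistently_valid` are mine, purely for illustration):

```python
from collections import defaultdict

# (file name, generator, passed validation) for the nine test documents.
RESULTS = [
    ("DFFE_Afgået svar til Jane Doe.odt", "OpenOffice.org/2.3", False),
    ("DFFE_SJ_(1) - 15-06-2007 Foreløbig Høring om forslag.odt", "OpenOffice.org/2.0", True),
    ("GRIBSKOV_bek-281(BS).odt", "OpenOffice.org/2.0", True),
    ("GRIBSKOV_Standardbrev ifm ITST pilotprojekt.odt", "OpenOffice.org/2.2", True),
    ("GRIBSKOV_Udkast til Forslag til Lokalplan.odt", "OpenOffice.org/2.1", False),
    ("ITST standardbrev ODT.odt", "OpenOffice.org/2.0", True),
    ("ITST Testdokument ODT.odt", "OpenOffice.org/2.2", False),
    ("RM Kursusmateriale.odt", "OpenOffice.org/2.0", False),
    ("RM Standardbrev 2s.odt", "OpenOffice.org/2.3", False),
]


def consistently_valid(results):
    """Return {generator: True only if every one of its documents was valid}."""
    outcomes = defaultdict(list)
    for _file_name, generator, is_valid in results:
        outcomes[generator].append(is_valid)
    return {gen: all(flags) for gen, flags in sorted(outcomes.items())}


if __name__ == "__main__":
    for generator, ok in consistently_valid(RESULTS).items():
        print(f"{generator}: {'yes' if ok else 'no'}")
```

With these nine documents, every OpenOffice.org version in the test produced at least one invalid file.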

So should we demand that OOo not be used at all? Of course not, but we should keep the pressure on the OOo-team to fix their code ... just as we should with Microsoft and Microsoft Office.