a 'mooh' point

clearly an IBM drone

Versioning of OOXML (thank you for all the fish)

One of the most pressing matters we had to deal with in Okinawa was a question raised by quite a few people including members of the national body of Switzerland as well as hAl on the blogs of Alex Brown, Doug Mahugh and yours truly:

How can you tell if a document is generated using the original set of schemas or the new (improved) ones?

The truth is: you can’t.

Well, at least not at the moment. You can get a hint from sniffing at various parts of the document, but there is no definitive way to do it. We all agreed that we had to come up with a solution, and we discussed (at length in session as well as during breaks, dinners and sight-seeing) what to do.

Roughly speaking, there are a few ways we could do it, including

  • Changing the namespace-name of the schemas
  • Expand the conformance attribute to indicate version of OOXML
  • Adding an optional version attribute to the root elements of the documents (WordpressingML, SpreadsheetML and PresentationML) defaulting to the original edition of  ECMA-376.

Version attribute

Let me start with the last option, since it is the easiest one to explain and understand.

ODF has a “version”-attribute in the root element of ODF-documents. It is defined in the urn:oasis:names:tc:opendocument:xmlns:office:1.0-namespace, so when creating e.g. an ODF spreadsheet using OOo 3, you will see the following xml-fragment:

[code:xml]<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
  xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" (...)
  office:version="1.2">
</office:document-content>[/code]

The above would tell you to use version 1.2 of the ODF-spec – currently being drafted by OASIS.

We could do a similar thing with OOXML, that is, having an optional version-attribute with the version number of the applied flavor of OOXML. This approach would have some clear advantages. First and foremost it would allow all the existing applications supporting OOXML to do absolutely nothing to their existing code base to continue to be able to read and process OOXML-files in ECMA-376 1st Ed format. It would also enable them to use any existing schema-validation of content and all existing files in ECMA-376 would still be perfectly valid.

Expanding the conformance attribute

Another thing to do would be to expand the new conformance attribute. At the BRM in Geneva a new conformance attribute was added to the root elements to display to which version of OOXML the document conforms. You will perhaps recognize this XML-fragment
[code:xml]<w:document conformance=”strict”>
</w:document>[/code]
We could also use this attribute and add version information to it. A way to do it would be
[code:xml]<w:document conformance=”transitional-1.0”>
</w:document>[/code]
for the ECMA-376 1st Ed and something else for any subsequent versions.

Fixing or solving?

The problem with the two alternatives mentioned above is that they provide an immediate fix, but they are in no way panaceas for the issue of versioning. In Geneva we split up OOXML into 4 distinct parts and tried the best we could to make sure, that they were “islands” within themselves. So in the original submission’s Part 2 dealing with OPC, there were dependencies to WordPressingML (AFAIK) and these were removed. The result is that you can now refer to ISO/IEC 29500-2 should you in your implementation need a packaging format where OPC suits your needs. The basic idea was exactly this; to provide a way for other standards to be able to “plug in” to OOXML and reuse specific parts of it.

The two fixes described above provide a fix for the problem with versioning of “the document stuff”; text documents, spreadsheets and presentations – but they do nothing for Part 2 and Part 3 (under the assumption that Part 4 will not change). The trouble is - this is not only a theoretical problem. ECMA TC46 working with XPS (Xml Paper Specification) has based the package format for XPS on OPC. But it is difficult for them to refer to ISO/IEC 29500-2 OPC since it is not possible to distinguish the namespace name from its predecessor ECMA-376 1st Ed. So unless we figure out a solution, they will have to refer to ECMA-376 1st Ed (and it was my impression that they’d prefer to refer to ISO OPC instead).

This is kind of annoying or maybe even embarrassing. We (the ISO process) chose to split up OOXML to allow reuse – but the first time someone knocks on our door and wishes to do exactly that – we (unless we find a solution to this problem) will have to say: “Well, we didn’t actually mean it”.

Change the namespace-name

An entirely different approach would be to change the namespace name(s) of IS29500. The original names where along the lines of

http://schemas.openxmlformats.org/package/2006/content-types
http://schemas.openxmlformats.org/package/2006/relationships
http://schemas.openxmlformats.org/spreadsheetml/2006/main
(…)

So an alternative solution would be to change the values of the namespace name. The names above could be changed to

http://schemas.openxmlformats.org/package/IS29500-2008/content-types
http://schemas.openxmlformats.org/package/ IS29500-2008/relationships
http://schemas.openxmlformats.org/spreadsheetml/IS29500-2008/main

(I would have liked to use colon as seperator between the ISO project number and year, but according to http://www.w3.org/TR/REC-xml/#sec-common-syn, it seems colons are not allowed in namespace names.)

What would be the consequence of this?

The up-side

Basically, changing the namespace name would solve the problem with distinguishing between ECMA-376 1st Ed and IS29500:2008. It would be trivial to distinguish content based on either standard and it would apply to all parts of the specification. Actually, it would apply to all schemas in the specification, so it would enable someone to create a document based on ECMA-376 OPC, IS29500 WordpressingML and ECMA-376 DrawingML (even though this is permitted in the current version of OOXML). It would also give us the chance to have a fresh start with IS29500:2008 and give us a clean slate for our further work.

The down-side

Changing the namespace is sadly not a silver bullet – unfortunately the free lunch comes with nausea as well. The trouble is – by changing the namespace, applications that support ECMA-376 will break if they try to load documents based on IS29500 since the namespace will be foreign to them.

The question is, though: shouldn’t they?

The purpose of XML namespaces are to identify the vocalulary of the elements of an XML-fragment. So the real question could be: are we talking about a new vocabulary when going from ECMA-376 to IS29500:2008? Are the changes from the BRM so drastic that we wouldn’t expect applications supporting ECMA-376 to be able to load documents conforming to IS29500?

Well, it was of importance to ECMA and most of the delegates at the BRM to ensure that whatever we did to change the specification did not render existing nonconformant. We succeeded quite well in doing just this,  so one could argue that the changes were not that big. However, this just concerns the transitional schemas. If you remember, the changes in schema structure were quite big. We divided one big chunk of schemas into two categories, “strict” and “transitional” and I would indeed argue that we changed the vocabulary by doing just that. We changed it from defining a vocabulary with a complete mess of legacy-stuff and new stuff into two separate piles with one “going-forward-vocabulary” and one “going-backwards-vocabulary”. Isn’t that big enough to change the namespace name?

Do it right the first time

At the WG4-meeting I was actually advocating for a simple addition of a version attribute and solve the bigger namespace problem at a later time for a revision of OOXML, but the more I think about it, the more I am convinced this is the wrong way. We are in a position right now where there are no applications out there supporting the full set of IS29500. Not changing the namespace name will not make the problem go away – it will just postpone the issue, and if we wait, the problem will become increasingly bigger as applications will surface with support for IS29500. The problem will be even bigger if you have a long list of supporting applications and not – as now – none a single one.

The more I think about it, the more I am sure the right way to do it is

  1. Add a new version attribute to the root elements defaulting to “1.0” which would be ECMA-376 1st Ed. IS29500:2008 would have version “1.1”.
  2. Change the namespace name for IS29500 in a matter as outlined above.

Vendors in the process of implementing IS29500 will then have to add some code to their application to support this.

But – I am in no way sure I have covered all angles. Am I missing something here?

Smile