a 'mooh' point

clearly an IBM drone

Markup compatibility and extensibility (MCE)

Part 3 of ISO/IEC 29500 is the fun part and if you haven’t read it yet, you really should do so – especially if you are thinking about implementing an IS29500-document consumer. Part 3 basically consists of two distinct areas – one that deals with compatibility and one that deals with extensibility. The first area is the target of this post.

To any markup consumer and producer of a format not cast in stone it is important to be able to ensure compatibility both forwards and backwards as the format changes over time. This is where the “compatibility-thingy” comes into play.

The compatibility features of OOXML enable markup producers to target different versions of applications supporting different versions of the specification or different features altogether. The tools to do this are called “Alternate Content Elements” (ACE) and “Compatibility-rule attributes”, and Part 3 is supposedly an exact remake of how compatibility and extensibility are handled in the binary Microsoft Office files.

The latter tool enables markup producers to “force” other markup producers to preserve specific content, even if it is not known to them, and to tell markup consumers which parts of the document can safely be ignored. It can even instruct markup consumers to fail if they don’t understand some parts of the markup. If this sounds kind of “SOAP-ish” to you, the attribute name “MustUnderstand”, which enables just this, should sound even more familiar.
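To make the compatibility-rule attributes a bit more concrete, here is a minimal Python sketch of how a consumer might honour two of them, assuming they are called `mc:Ignorable` and `mc:MustUnderstand` and each holds a whitespace-separated list of namespace prefixes. The `ext` namespace, the `load` function and the `MustUnderstandError` class are made up for illustration, and the sketch only looks at the root element and ignores prefix scoping, which a real consumer could not get away with.

```python
import io
import xml.etree.ElementTree as ET

MC_NS = "http://schemas.openxmlformats.org/markup-compatibility/2006"
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
EXT_NS = "http://example.com/hypothetical/extension"  # made up for this sketch


class MustUnderstandError(Exception):
    """Raised when a document demands a namespace the consumer does not know."""


def load(xml_text, understood):
    """Parse a document and apply two compatibility rules found on the root."""
    prefixes, root = {}, None
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefixes[payload[0]] = payload[1]  # prefix -> namespace URI
        elif root is None:
            root = payload  # the first "start" event is the root element

    def resolve(attribute):
        # Turn the prefix list of an mc:* attribute into a set of namespace URIs.
        names = root.get(f"{{{MC_NS}}}{attribute}", "").split()
        return {prefixes.get(name) for name in names}

    # MustUnderstand: refuse the document if any listed namespace is unknown.
    unknown = resolve("MustUnderstand") - set(understood)
    if unknown:
        raise MustUnderstandError(sorted(str(uri) for uri in unknown))

    # Ignorable: silently drop elements from listed namespaces we do not know.
    droppable = resolve("Ignorable") - set(understood)
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag.startswith("{") and child.tag[1:].split("}")[0] in droppable:
                parent.remove(child)
    return root


DOC = f"""<w:document xmlns:w="{W_NS}" xmlns:mc="{MC_NS}" xmlns:ext="{EXT_NS}"
    mc:Ignorable="ext">
  <w:body><w:p/><ext:note>optional</ext:note></w:body>
</w:document>"""

# A consumer that only knows WordprocessingML drops the ignorable element...
root = load(DOC, understood={W_NS, MC_NS})
remaining = [child.tag for child in root.find(f"{{{W_NS}}}body")]

# ...but must give up if the same namespace is marked MustUnderstand.
try:
    load(DOC.replace("mc:Ignorable", "mc:MustUnderstand"), understood={W_NS, MC_NS})
    failed = False
except MustUnderstandError:
    failed = True
```

The same document therefore either loses its unknown content quietly or is rejected outright, depending on which of the two attributes the producer chose.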

The first tool can be thought of as sort of “a switch statement for markup”. It allows a markup producer to serve alternate versions of markup to target alternate feature sets of different applications. The diverging markup would be listed as different “alternate content blocks” or “ACBs”, and it is essentially an intelligent way for a markup producer to tell a consumer that “if you don’t understand this bit, use this instead”.

An interesting use case would be to use ACE to improve interoperability when making text documents with mathematical content. It has long been an open secret that interoperability with OOXML was improving day by day – but not with mathematical content. Mathematical content in OOXML (or “OMML”) has for some reason not been a top priority with implementers of OOXML, so interoperability has been really, really bad.

Now, wouldn’t it be cool if there was some way for markup producers to serve MathML as well as OMML to consuming applications? Let’s face it – most of the competition to Microsoft Office 2007 is from applications supporting ODF, and they all (to a varying degree) support MathML. So a “safe assumption” would be that “if I create an OOXML text document with OMML and send it to a different application, it probably understands MathML much better than OMML”. Wouldn’t it be cool, if you could actually do this?

Well, to the rescue comes ACE.

ACE enables exactly this use case. ACE is based on qualified elements and attributes, so as long as you can distinguish between the qualified names of the content you are dealing with, ACE is your friend.
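As a small illustration of what “distinguishing between qualified names” means in practice, here is a hedged Python sketch. It only shows that a namespace-aware parser sees the namespace URI and not the prefix; the `qualified_name` helper and the `foo` prefix are made up for this example.

```python
import xml.etree.ElementTree as ET

OMML = '<m:oMath xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"/>'
MATHML = '<math:math xmlns:math="http://www.w3.org/1998/Math/MathML"/>'
# Same MathML element, but written with a different (arbitrary) prefix.
RENAMED = '<foo:math xmlns:foo="http://www.w3.org/1998/Math/MathML"/>'


def qualified_name(xml_text):
    """Return (namespace URI, local name) of the root element."""
    element = ET.fromstring(xml_text)
    # ElementTree stores qualified names as "{namespace-uri}local-name".
    namespace, _, local = element.tag[1:].partition("}")
    return namespace, local


omml_name = qualified_name(OMML)
mathml_name = qualified_name(MATHML)
renamed_name = qualified_name(RENAMED)
```

Here `mathml_name` and `renamed_name` are identical while `omml_name` differs, which is the only distinction a consumer needs in order to tell the two kinds of math apart.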

So let’s see how this would work out.

Take a look at this equation: a = b/c

 

In Office Math ML (OMML) this is represented as:

[code:xml]<m:oMath>
  <m:r>
    <m:t>a=</m:t>
  </m:r>
  <m:f>
    <m:num>
      <m:r>
        <m:t>b</m:t>
      </m:r>
    </m:num>
    <m:den>
      <m:r>
        <m:t>c</m:t>
      </m:r>
    </m:den>
  </m:f>
</m:oMath>
[/code]

In MathML the formula is represented as:

[code:xml]<math:math>
  <math:mrow>
    <math:mi>a</math:mi>
    <math:mo>=</math:mo>
    <math:mfrac>
      <math:mi>b</math:mi>
      <math:mi>c</math:mi>
    </math:mfrac>
  </math:mrow>
</math:math>
[/code]

(both examples have been slightly shortened)

So how would one specify both these ways of writing mathematical content? Well, it could look like this:

[code:xml]<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document
  xmlns:omml="http://schemas.openxmlformats.org/officeDocument/2006/math"
  xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
  xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006">
  <w:body>
    <w:p>
      <omml:oMathPara>
        <mc:AlternateContent xmlns:mathml="http://www.w3.org/1998/Math/MathML">
          <mc:Choice Requires="mathml">
            <mathml:math >
              <mathml:mrow>
                <mathml:mi>a</mathml:mi>
                <mathml:mo >=</mathml:mo>
                <mathml:mfrac>
                  <mathml:mi>b</mathml:mi>
                  <mathml:mi>c</mathml:mi>
                </mathml:mfrac>
              </mathml:mrow>
            </mathml:math>
          </mc:Choice>
          <mc:Choice Requires="omml">
            <omml:oMath>
              <omml:r>
                <omml:t>a=</omml:t>
              </omml:r>
              <omml:f>
                <omml:num>
                  <omml:r>
                    <omml:t>b</omml:t>
                  </omml:r>
                </omml:num>
                <omml:den>
                  <omml:r>
                    <omml:t>c</omml:t>
                  </omml:r>
                </omml:den>
              </omml:f>
            </omml:oMath>
          </mc:Choice>
          <mc:Fallback>
            <!-- do whatever -->
          </mc:Fallback>
        </mc:AlternateContent>
      </omml:oMathPara>
    </w:p>
  </w:body>
</w:document>[/code]

So you simply add the compatibility namespace to the file and add the "AlternateContent" element. This element includes a list of "choices" and possibly a fallback choice. The choices are evaluated in the order in which they appear, and the first one the consumer understands is used.
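The selection logic on the consumer side can be sketched in a few lines of Python. This is only an illustration under some assumptions: the `parse_with_prefixes` and `select_branch` helpers are made up, the `Requires` attribute is taken to be a whitespace-separated list of namespace prefixes, and prefix declarations are collected document-wide instead of being scoped properly.

```python
import io
import xml.etree.ElementTree as ET

MC_NS = "http://schemas.openxmlformats.org/markup-compatibility/2006"
OMML_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def parse_with_prefixes(xml_text):
    """Parse XML and collect prefix -> namespace URI declarations.

    Simplification: prefixes are gathered for the whole document, ignoring scope.
    """
    prefixes, root = {}, None
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefixes[payload[0]] = payload[1]
        elif root is None:
            root = payload  # the first "start" event is the root element
    return root, prefixes


def select_branch(alternate_content, prefixes, understood):
    """Return the first Choice whose required namespaces are all understood.

    Falls back to the Fallback element (or None if there is none).
    """
    for choice in alternate_content.findall(f"{{{MC_NS}}}Choice"):
        required = choice.get("Requires", "").split()
        if required and all(prefixes.get(p) in understood for p in required):
            return choice
    return alternate_content.find(f"{{{MC_NS}}}Fallback")


SAMPLE = f"""<mc:AlternateContent xmlns:mc="{MC_NS}"
    xmlns:omml="{OMML_NS}" xmlns:mathml="{MATHML_NS}">
  <mc:Choice Requires="mathml"><mathml:math/></mc:Choice>
  <mc:Choice Requires="omml"><omml:oMath/></mc:Choice>
  <mc:Fallback/>
</mc:AlternateContent>"""

root, prefixes = parse_with_prefixes(SAMPLE)

# Three imaginary consumers with different feature sets.
knows_both = select_branch(root, prefixes, {MATHML_NS, OMML_NS})
knows_omml = select_branch(root, prefixes, {OMML_NS})
knows_none = select_branch(root, prefixes, set())
```

A consumer that understands both namespaces ends up with the MathML choice simply because it is listed first, so the producer controls the preference by ordering the choices.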

And the benefit? Well, you can now have your cake and eat it too. If the consuming application supports it, it will display the equation based on the mathml-fragment – otherwise it will use OMML.

This is immensely interesting and applies to all sorts of places and use cases – heck, you can even use it to take advantage of some of the new stuff in the strict schemas of IS29500 while keeping intelligent compatibility with existing applications that only support ECMA-376 1st Ed. Imagine the ECMA-376 way of doing dates in spreadsheets and now add the possibility of using some of the new functionality added at the BRM, without the risk of breaking applications or losing data.

… that is if we change the namespace of the strict schemas, of course.

:-)

Comments (5) -

Hi Jesper,

I've just re-read Part 3, and I don't see why this isn't all possible even if we keep the same namespace for the strict schemas... what am I missing?

Ah - is your point that if we change the namespace, we can put some of the syntax from strict documents into transitional documents using MCE, and still maintain compatibility with existing transitional applications? Yes - true - but I'm not sure how much advantage there is to doing this. I don't see "using a nicer syntax for dates" as being worth the cost of the change to the strict namespace - in terms of the barrier it puts between implementation of the transitional spec and implementation of the strict spec (particularly for those non-office-suite applications that most likely already implement the strict spec just as well as they do the transitional).

Hmmm - is your thought that we'd have documents with "islands of strict" in a sea of transitional, and that the "islands of strict" would gradually expand as more document producers/consumers preferred the use of strict, to the point where the changeover would be simple? Interesting idea.

On the data loss issue - since the transitional schemas allow the strict features too, we also have the data loss issue for transitional documents that use strict features, when opened in ECMA 376 office applications. I think this is far more serious than the data loss issue for strict documents opened in ECMA 376 applications, because it's far easier for a document producer to create "semi-strict" documents unwittingly.

The best way I can see of dealing with this is to create a conformance profile consisting of "transitional-only" schemas - i.e. the schemas for the transitional features only without any of the strict-only features. Since the schemas are automatically generated, it may be straightforward to produce these - I'm not sure. Is it Tristan who's been maintaining the current schemas and the subsetting code?

Inigo

Inigo,

If you don't change the namespace, then consuming apps written in the world of ECMA 376-1 / Office 2007 will just happily go ahead and read documents without a problem. When they find bad data, like ISO dates in SML, they are not equipped to deal with it.

IMHO all namespaces must be changed when there are breaking changes, in the absence of a pre-defined mechanism that implementors would have had access to in order to define a strategy to deal with versioning issues.  They did not have that mechanism.

Cut once and cut deep.  If we try and make all the unpleasant changes in one fell swoop, we avoid the death by a thousand cuts issue that is really intolerable for implementors.

Since there does not appear to be any implementation that creates instances in this flavour of OOXML, now is the time to try and avoid any poisoning of the well, leading to everyone just sticking to the safe haven of the ECMA376-1 level.

Gareth

Hi Gareth,

Yes, I understand the argument... what I'm saying is that changing the strict namespace doesn't solve the data loss problem, since the bigger data loss problem is the transitional-with-strict-features opened in ECMA 376 applications.

Inigo

Inigo,

Maybe I wasn't clear, I was proposing that the namespace changes for transitional too, since breaking changes have occurred since ECMA 376-1.

Anything to avoid apps written for ECMA 376-1 blindly opening 'ISO' Open XML files.

Gareth

Can any of this be used on a website though, and particularly through WordPress with a plugin, if needed? I'm a wedding DJ in San Diego and I'm interested in displaying some calculations online to show my clients diagrams displaying the unhappiness of many who wish they had spent more on their entertainment for their reception, but as of yet I haven't found a good method to display math formulas in a WordPress page. Brides typically skimp on the entertainment budget and I am hoping a legit mathematical formula can show them that it is worth it to pay more for a great DJ.
