a 'mooh' point

clearly an IBM drone

Markup compatibility and extensibility (MCE)

Part 3 of ISO/IEC 29500 is the fun part and if you haven’t read it yet, you really should do so – especially if you are thinking about implementing an IS29500-document consumer. Part 3 basically consists of two distinct areas – one that deals with compatibility and one that deals with extensibility. The first area is the target of this post.

For any consumer or producer of a format that is not cast in stone, it is important to be able to ensure both forward and backward compatibility as the format changes over time. This is where the “compatibility-thingy” comes into play.

The compatibility features of OOXML enable markup producers to target different versions of applications supporting different versions of the specification, or different features altogether. The tools to do this are called “Alternate Content Elements” (ACE) and “Compatibility-rule attributes”, and Part 3 is supposedly an exact remake of how compatibility and extensibility are handled in the binary Microsoft Office files.

The latter tool enables markup producers to “force” other markup producers to preserve specific content – even if it is not known to them – as well as to tell markup consumers which parts of the document can safely be ignored. It can even instruct a markup consumer to fail if it doesn’t understand some parts of the markup. If this sounds kind of “SOAP-ish” to you, the name of the attribute that enables just this, “MustUnderstand”, should sound even more familiar.
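To make the “MustUnderstand” idea a bit more concrete, here is a minimal Python sketch of the check a consumer might run before touching the rest of the markup. It is an illustration only, not how any real application does it: the function names, the `ext` prefix and its `http://example.com/...` namespace are all made up for the example, and it assumes that no prefix is re-declared with a different URI further down in the document.

```python
import io
import xml.etree.ElementTree as ET

MC_NS = "http://schemas.openxmlformats.org/markup-compatibility/2006"

# Hypothetical document: the "ext" namespace is invented for this example.
SAMPLE = """<w:document
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    xmlns:ext="http://example.com/hypothetical-extension"
    mc:MustUnderstand="ext">
  <w:body>
    <ext:note>a consumer has to know what this means</ext:note>
  </w:body>
</w:document>"""


def collect_prefixes(data):
    """Map each declared prefix to its namespace URI.

    Assumption: prefixes are never re-declared, so one flat map is enough.
    """
    prefixes = {}
    for _event, (prefix, uri) in ET.iterparse(io.BytesIO(data), events=("start-ns",)):
        prefixes[prefix] = uri
    return prefixes


def check_must_understand(xml_text, understood):
    """Raise ValueError if a namespace flagged by mc:MustUnderstand is unknown.

    `understood` is the set of namespace URIs this (imaginary) consumer supports.
    """
    data = xml_text.encode("utf-8")
    prefixes = collect_prefixes(data)
    for element in ET.fromstring(data).iter():
        # The attribute value is a whitespace-separated list of prefixes.
        for prefix in element.get(f"{{{MC_NS}}}MustUnderstand", "").split():
            uri = prefixes.get(prefix)
            if uri not in understood:
                raise ValueError(f"namespace {uri!r} (prefix {prefix!r}) is not understood")
```

A consumer that only knows the WordprocessingML namespace would get a `ValueError` for `SAMPLE`, while one that also lists the made-up extension namespace in `understood` would pass the check and carry on.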

The first tool can be thought of as sort of “a switch statement for markup”. It allows a markup producer to serve alternate versions of markup to target alternate feature-sets of different applications. The diverging markup would be listed as different “alternate content blocks” or “ACBs”, and it is essentially an intelligent way for a markup producer to tell a consumer that “if you don’t understand this bit, use this instead”.

An interesting use case would be to use ACE to improve interoperability when making text documents with mathematical content. It has long been an open secret that interoperability with OOXML was improving day by day – but not for mathematical content. Mathematical content in OOXML (or “OMML”) has for some reason not been a top priority with implementers of OOXML, so interoperability has been really, really bad.

Now, wouldn’t it be cool if there was some way for markup producers to serve MathML as well as OMML to consuming applications? Let’s face it – most of the competition to Microsoft Office 2007 is from applications supporting ODF, and they all (to varying degrees) support MathML. So a “safe assumption” would be that “if I create an OOXML text document with OMML and send it to a different application, it probably understands MathML much better than OMML”. Wouldn’t it be cool if you could actually do this?

Well, to the rescue comes ACE.

ACE enables exactly this use case. ACE is based on qualified elements and attributes, so as long as you can distinguish between the qualified names of the content you are dealing with, ACE is your friend.

So let’s see how this would work out.

Take a look at this equation: a = b/c

In Office Math ML (OMML) this is represented as:

[code:xml]<m:oMath>
  <m:r>
    <m:t>a=</m:t>
  </m:r>
  <m:f>
    <m:num>
      <m:r>
        <m:t>b</m:t>
      </m:r>
    </m:num>
    <m:den>
      <m:r>
        <m:t>c</m:t>
      </m:r>
    </m:den>
  </m:f>
</m:oMath>
[/code]

In MathML the formula is represented as:

[code:xml]<math:math>
  <math:mrow>
    <math:mi>a</math:mi>
    <math:mo>=</math:mo>
    <math:mfrac>
      <math:mi>b</math:mi>
      <math:mi>c</math:mi>
    </math:mfrac>
  </math:mrow>
</math:math>
[/code]

(both examples have been slightly shortened)

So how would one specify both these ways of writing mathematical content? Well, it could look like this:

[code:xml]<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document
  xmlns:omml="http://schemas.openxmlformats.org/officeDocument/2006/math"
  xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
  xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006">
  <w:body>
    <w:p>
      <omml:oMathPara>
        <mc:AlternateContent xmlns:mathml="http://www.w3.org/1998/Math/MathML">
          <mc:Choice Requires="mathml">
            <mathml:math >
              <mathml:mrow>
                <mathml:mi>a</mathml:mi>
                <mathml:mo >=</mathml:mo>
                <mathml:mfrac>
                  <mathml:mi>b</mathml:mi>
                  <mathml:mi>c</mathml:mi>
                </mathml:mfrac>
              </mathml:mrow>
            </mathml:math>
          </mc:Choice>
          <mc:Choice Requires="omml">
            <omml:oMath>
              <omml:r>
                <omml:t>a=</omml:t>
              </omml:r>
              <omml:f>
                <omml:num>
                  <omml:r>
                    <omml:t>b</omml:t>
                  </omml:r>
                </omml:num>
                <omml:den>
                  <omml:r>
                    <omml:t>c</omml:t>
                  </omml:r>
                </omml:den>
              </omml:f>
            </omml:oMath>
          </mc:Choice>
          <mc:Fallback>
            <!-- do whatever -->
          </mc:Fallback>
        </mc:AlternateContent>
      </omml:oMathPara>
    </w:p>
  </w:body>
</w:document>[/code]

So you simply add the compatibility namespace to the file and add the "AlternateContent" element. This element includes a list of "choices" and possibly a fallback choice. The choices are evaluated in the order in which they appear, and the consumer uses the first one whose required namespaces it understands.
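Seen from the consumer's side, the “switch statement” could be sketched roughly like this in Python. Again, this is only an illustration under a few assumptions: `select_alternate` is a made-up helper, it only looks at the first "AlternateContent" element it finds, it assumes prefixes are never re-declared, and it ignores the rest of the Part 3 processing rules.

```python
import io
import xml.etree.ElementTree as ET

MC_NS = "http://schemas.openxmlformats.org/markup-compatibility/2006"

# Condensed version of the document above, with the equations left empty.
SAMPLE = """<w:p
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    xmlns:omml="http://schemas.openxmlformats.org/officeDocument/2006/math"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    xmlns:mathml="http://www.w3.org/1998/Math/MathML">
  <mc:AlternateContent>
    <mc:Choice Requires="mathml"><mathml:math/></mc:Choice>
    <mc:Choice Requires="omml"><omml:oMath/></mc:Choice>
    <mc:Fallback/>
  </mc:AlternateContent>
</w:p>"""


def select_alternate(xml_text, understood):
    """Return the mc:Choice (or mc:Fallback) element a consumer would use.

    `understood` is the set of namespace URIs the consumer supports.
    Returns None if there is no AlternateContent, or no usable branch.
    """
    data = xml_text.encode("utf-8")

    # Requires lists prefixes, so collect the prefix -> URI declarations first.
    prefixes = {}
    for _event, (prefix, uri) in ET.iterparse(io.BytesIO(data), events=("start-ns",)):
        prefixes[prefix] = uri

    root = ET.fromstring(data)
    alternate = next(root.iter(f"{{{MC_NS}}}AlternateContent"), None)
    if alternate is None:
        return None

    # Walk the choices in document order and take the first one that fits.
    for choice in alternate.findall(f"{{{MC_NS}}}Choice"):
        required = [prefixes.get(p) for p in choice.get("Requires", "").split()]
        if required and all(uri in understood for uri in required):
            return choice

    # No choice matched: fall back, if a fallback was provided.
    return alternate.find(f"{{{MC_NS}}}Fallback")
```

A consumer that passes both the MathML and the OMML namespace in `understood` gets the MathML choice because it is listed first; one that only knows OMML gets the second choice, and one that knows neither ends up with the fallback.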

And the benefit? Well, you can now have your cake and eat it too. If the consuming application supports MathML, it will display the equation based on the MathML fragment – otherwise it will use OMML.

This is immensely interesting and applies to all sorts of places and use cases – heck, you can even use it to take advantage of some of the new stuff in the strict schemas of IS29500 while keeping intelligent compatibility with existing applications that only support ECMA-376 1st Ed. Imagine the ECMA-376 way of doing dates in spreadsheets, and now add the possibility of using some of the new functionality added at the BRM – without the risk of breaking applications or losing data.

… that is if we change the namespace of the strict schemas, of course.

:-)