a 'mooh' point

clearly an IBM drone

Do your math - OOXML and OMML (Updated 2008-02-12)

As I promised in my latest article about ODF and MathML, I have worked a bit with the ECMA equivalents of ODF and MathML: OOXML and OMML (Office Math ML).

A bit of introduction is probably a good idea:

In OOXML, mathematical content is structured using an internal markup language, Office Math ML, or OMML for short. OMML is closely tied to the structure of WordprocessingML and the look-and-feel is very similar. In contrast to the ODF way, OMML is usually inserted inline in the WordprocessingML, whereas in ODF the math is kept in a separate part of the package.

Ok - now that that is done with - let's get on with the good stuff!

As in my previous article, I'll work with the same base equation, cos(π/4) = √2/2.



Now, as I wrote in the other article, learning MathML is like learning a new (programming) language, and I can tell you, it is no different with OMML. MathML arranges the mathematical elements by position, whereas OMML arranges them by their explicit meaning. So a fraction is created in MathML as (simplified):

<math:mfrac>
  <math:mi>π</math:mi>
  <math:mn>4</math:mn>
</math:mfrac>

and in OMML it is created as (simplified):

<m:f>
  <m:num>
    <m:r>π</m:r>
  </m:num>
  <m:den>
    <m:r>4</m:r>
  </m:den>
</m:f>

So when dealing with MathML and e.g. fractions, we look at a fraction as "something at the top and something at the bottom". When dealing with OMML, we deal with "numerators" and "denominators". It is rather clear to me that any skills learned in MathML are not directly applicable to OMML - and vice versa. It took me about the same amount of time to "get" MathML as it did to "get" OMML. In both cases, I had not worked with the specific ML before. It has taken me about a day to research and write each article.
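To make the "position versus meaning" difference concrete, here is a minimal Python sketch that maps the simplified MathML fraction to an OMML one. The function name mfrac_to_omml is made up for this illustration, and it only handles leaf children such as <mi> and <mn>; it is not how Office does the conversion.

```python
import xml.etree.ElementTree as ET

MML = "http://www.w3.org/1998/Math/MathML"
OMML = "http://schemas.openxmlformats.org/officeDocument/2006/math"


def mfrac_to_omml(mfrac):
    """Map a MathML <mfrac> with two leaf children to an OMML <m:f>.

    Hypothetical helper for illustration only.
    """
    # MathML is positional: the first child is on top, the second below.
    top, bottom = list(mfrac)
    fraction = ET.Element(f"{{{OMML}}}f")
    # OMML names the two roles explicitly and wraps the text in a run.
    for role, child in (("num", top), ("den", bottom)):
        part = ET.SubElement(fraction, f"{{{OMML}}}{role}")
        run = ET.SubElement(part, f"{{{OMML}}}r")
        ET.SubElement(run, f"{{{OMML}}}t").text = (child.text or "").strip()
    return fraction


source = f'<mfrac xmlns="{MML}"><mi>π</mi><mn>4</mn></mfrac>'
fraction = mfrac_to_omml(ET.fromstring(source))
print(ET.tostring(fraction, encoding="unicode"))
```

The only "knowledge" in the sketch is which position becomes which named role, and that is exactly the mental shift between the two languages.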

Anyway - back to the plot.

As always, I work with my friend, "the minimal OOXML-file". It is an OOXML-file stripped of all the junk and cut down to the bare minimum - not even a single unused namespace declaration is left behind. You can see the minimal file here: Minimal OOXML.docx (1,16 kb).

My task was a two-step task. Since OOXML is rather new, there is not that much information about OMML out there, so as a first step I created a sample equation using Word 2007 to get a feeling for what it's all about. Then I found Part 4 of the OOXML-spec, located section 7 and started to put the OMML together. The OMML I ended up with was this:

<m:oMathPara>
  <m:oMath>
    <m:r>
      <w:rPr>
        <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
      </w:rPr>
      <m:t>cos</m:t>
    </m:r>
    <m:f>
      <m:num>
        <m:r>
          <w:rPr>
            <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
          </w:rPr>
          <m:t>π</m:t>
        </m:r>
      </m:num>
      <m:den>
        <m:r>
          <m:t>4</m:t>
        </m:r>
      </m:den>
    </m:f>
    <m:r>
      <m:t>=</m:t>
    </m:r>
    <m:f>
      <m:num>
        <m:rad>
          <m:radPr>
          </m:radPr>
          <m:deg/>
          <m:e>
            <m:r>
              <m:t>2</m:t>
            </m:r>
          </m:e>
        </m:rad>
      </m:num>
      <m:den>
        <m:r>
          <m:t>2</m:t>
        </m:r>
      </m:den>
    </m:f>
  </m:oMath>
</m:oMathPara>

I bet you are now thinking what I was thinking: what the f***? That's a lot of markup! Well, the reason why there is so much markup is that each piece of text/data in the equation is encapsulated in a "run"-element that enables additional styling. If all this additional markup including other property-markup is removed, the result is this:

<m:oMathPara>
  <m:oMath>
    cos
    <m:f>
      <m:num>π</m:num>
      <m:den>4</m:den>
    </m:f>
    =
    <m:f>
      <m:num>
        <m:rad>
          <m:e>2</m:e>
        </m:rad>
      </m:num>
      <m:den>2</m:den>
    </m:f>
  </m:oMath>
</m:oMathPara>

Ain't that purdy?
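If you want to produce that kind of stripped-down view yourself, the following Python sketch is one way to do it. It collapses every run to its text and skips property containers. The helper names math_text and outline are hypothetical, and treating "every element whose name ends in Pr" as a property container is my own heuristic, not something taken from the spec.

```python
import xml.etree.ElementTree as ET

M = "http://schemas.openxmlformats.org/officeDocument/2006/math"
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def math_text(element):
    """Join the text of every <m:t> below element, in document order."""
    return "".join(t.text or "" for t in element.iter(f"{{{M}}}t"))


def outline(element, depth=0):
    """Return indented lines naming the structural elements only."""
    name = element.tag.rpartition("}")[2]  # drop the "{namespace}" prefix
    if name == "r":
        # A run collapses to the text it carries.
        return ["  " * depth + math_text(element)]
    if name.endswith("Pr"):
        # Heuristic (assumption): names ending in "Pr" hold properties only.
        return []
    lines = ["  " * depth + name]
    for child in element:
        lines += outline(child, depth + 1)
    return lines


sample = (
    f'<m:oMath xmlns:m="{M}" xmlns:w="{W}"><m:f>'
    '<m:num><m:r><w:rPr/><m:t>π</m:t></m:r></m:num>'
    '<m:den><m:r><m:t>4</m:t></m:r></m:den>'
    '</m:f></m:oMath>'
)
lines = outline(ET.fromstring(sample))
print("\n".join(lines))
```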

The OOXML-file with the equation is available here: minimal ooxml with math.docx (1,25 kb). It displays like this in Microsoft Office 2007:

Why not just use MathML?

Before I go into the details with converting from MathML to OMML, I think it is appropriate to pause and look at how MathML and OMML differ from each other. As I noted above there is quite a lot of "overhead" in OMML with everything being encapsulated in "runs". But there is a reason for this. The overhead enables us to do a couple of things that we cannot do with MathML.

Everything fits

You can put virtually everything into an OMML-formula that you can put into a normal WordprocessingML-fragment. As Murray Sargent puts it:

Word needs to allow users to embed arbitrary span-level material (basically anything you can put into a Word paragraph) in math zones and MathML is geared toward allowing only math in math zones. A subsidiary consideration is the desire to have an XML that corresponds closely to the internal format, aiding performance and offering readily achievable robustness. Since both MathML and OMML are XMLs, XSLTs can (and have) been created to convert one into the other. So it seems you can have your cake and eat it too. Thank you XML!

MathML allows some styling of the individual text fragments in the equations, but that's basically it.

WordprocessingML look-and-feel is preserved

To me it is really nice to work with markup for equations that is similar to the markup surrounding it. If I were to use MathML inline instead of OMML, the markup would be completely different from the markup around it. You can say that using MathML enables you to reuse any MathML-skills you might have in advance. Similarly you can say that using OMML for equations enables you to reuse the skills you have from working with WordprocessingML. It's kind of a "give-and-take" situation.

Revision-control (change-tracking) is possible

Having the overhead enables change-tracking at the same granular level as with your regular text. You can track changes in your equations on a character-by-character basis. In Word 2007 it looks like this when I make a modification to the equation (multiply the second fraction by "2" and remove the cosine-function from the first fraction).

 

 

The markup enabling this is here (for removing the cosine function, where "w:del" means "delete"):

<w:del w:id="0" w:author=" Jesper Lund Stocholm" w:date="2008-01-30T10:41:00Z">
  <m:r>
    <w:rPr>
      <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
    </w:rPr>
    <m:t>cos</m:t>
  </m:r>
</w:del>
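Because the deletion wraps ordinary math runs, a consumer can pick tracked changes out of an equation with plain XML tooling. Here is a minimal Python sketch, assuming the w:del-around-m:r shape shown above. The function name deleted_math_text and the surrounding sample document are my own, for illustration only.

```python
import xml.etree.ElementTree as ET

M = "http://schemas.openxmlformats.org/officeDocument/2006/math"
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def deleted_math_text(root):
    """Yield (author, text) for each tracked deletion that wraps math runs."""
    for deletion in root.iter(f"{{{W}}}del"):
        text = "".join(t.text or "" for t in deletion.iter(f"{{{M}}}t"))
        if text:  # skip deletions that contain no math text
            yield deletion.get(f"{{{W}}}author"), text


# Illustrative fragment modelled on the markup above.
sample = f"""
<m:oMath xmlns:m="{M}" xmlns:w="{W}">
  <w:del w:id="0" w:author="Jesper Lund Stocholm" w:date="2008-01-30T10:41:00Z">
    <m:r><m:t>cos</m:t></m:r>
  </w:del>
  <m:r><m:t>=</m:t></m:r>
</m:oMath>
""".strip()

changes = list(deleted_math_text(ET.fromstring(sample)))
print(changes)
```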

Change-tracking at this granularity is not at all possible when using MathML out-of-the-box. You cannot merge the MathML with other markup like this, and if you use MathML as it is done in ODF (i.e. not "inline") it is simply impossible (at least as far as I can see). MathML in ODF is treated as an external object, which means that it is encapsulated in an OpenDocument Draw frame. The markup for one of the files I used in the other article is like this:

<text:p text:style-name="Standard">
 <draw:frame
   draw:style-name="fr1"
   draw:name="Objekt1"
   text:anchor-type="as-char"
   svg:width="2.418cm"
   svg:height="1.034cm"
   draw:z-index="0"
 >
  <draw:object
    xlink:href="./MathML"
    xlink:type="simple"
    xlink:show="embed"
    xlink:actuate="onLoad"
  />
  <draw:image
    xlink:href="./ObjectReplacements/MathML"
    xlink:type="simple"
    xlink:show="embed"
    xlink:actuate="onLoad"
  />
 </draw:frame>
</text:p>

If I wanted to change some text like "Display equation below" to "Disrply equation below" (add an 'r' and delete an 'a') in ODT, it would look something like this:

<text:p>
  Dis<text:change-start text:change-id="ct102825880"/>
  r<text:change-end text:change-id="ct102825880"/>
  pl<text:change text:change-id="ct102844952"/>
  y equation below
</text:p>

So registration of the changes is - as with OOXML - merged into the text being modified. I think you could mark the whole equation as "modified" in ODF by putting a <text:change-start>-element around the complete <draw:object>-element, but I am not sure it would work. Also, OpenOffice.org doesn't seem to register changes to MathML-zones at all. Using OpenOffice.org it looks like this

 

(I changed the denominator of the first fraction to "54") 

 

I cannot say that there are (or are not) other areas where MathML just doesn't cut it - these were just a couple of those that I have experienced myself. I do believe, though, that the examples above warrant the simple question:

Why the hell did OASIS ODF TC decide to use MathML in the first place?

Interoperability

Interoperability is clearly what the young kids want these days - so let's see what we can do with mathematical content. MathML and OMML are clearly two different markup languages, but is it possible to convert between them? Fortunately it is. Microsoft Office 2007 allows copy/paste of MathML into OMML-equations and it can even export OMML to MathML. Luckily for us the logic around this is not embedded into some fancy place in Microsoft Office 2007 - it is done using simple XSLT-transformations. They have made the stylesheets OMML2MML.xsl and MML2OMML.xsl, and if you apply these to either your OMML or MathML, it is translated to the other. Just for the fun of it I tried to convert the OMML-version of the equation to MathML. All I did was to find the OMML2MML.XSL and insert a single line (the xml-stylesheet processing instruction) in the XML-file document.xml:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?xml-stylesheet type="text/xsl" href="OMML2MML.XSL"?>
<w:document
  xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"
  xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
  >
  <w:body>
    <w:p>
      <m:oMathPara>
        <m:oMath>
          <m:r>
            <w:rPr>
              <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
            </w:rPr>
            <m:t>cos</m:t>
          </m:r>
...

(and then I processed the file using my favorite XSLT-translator)
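Before document.xml can be handed to an XSLT processor it has to be pulled out of the .docx package, which is an ordinary ZIP archive. Here is a minimal Python sketch of that step. It assumes the main part lives at word/document.xml (the conventional location; strictly speaking the package relationships decide), and the function name extract_math is hypothetical. It does not run the stylesheet itself.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

M = "http://schemas.openxmlformats.org/officeDocument/2006/math"


def extract_math(docx_bytes, part="word/document.xml"):
    """Return every <m:oMath> element found in the given package part.

    The default part name is an assumption about the package layout.
    """
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        root = ET.fromstring(package.read(part))
    return list(root.iter(f"{{{M}}}oMath"))
```

Usage would be along the lines of extract_math(open("some.docx", "rb").read()), where some.docx stands in for whatever file you are inspecting.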

I'm sure - if you are a "technical" person - you have found yourself using/writing some code and just before you press "Compile" or "Run" you think: "This is sooo not gonna work". This was one of those situations for me - but you know what, it actually worked on the first try. The MathML generated is this

<?xml version="1.0" encoding="utf-8"?>
<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML">
  <mml:mi mathvariant="italic">cos</mml:mi>
  <mml:mfrac>
    <mml:mrow>
      <mml:mi>π</mml:mi>
    </mml:mrow>
    <mml:mrow>
      <mml:mn>4</mml:mn>
    </mml:mrow>
  </mml:mfrac>
  <mml:mo>=</mml:mo>
  <mml:mfrac>
    <mml:mrow>
      <mml:mroot>
        <mml:mrow>
          <mml:mn>2</mml:mn>
        </mml:mrow>
        <mml:mrow />
      </mml:mroot>
    </mml:mrow>
    <mml:mrow>
      <mml:mn>2</mml:mn>
    </mml:mrow>
  </mml:mfrac>
</mml:math>

... and it validates as well (using Amaya, after changing the XML-file from UTF-16 to UTF-8)

Et voilà

Now, wouldn't it be cool if the MathML generated from the OMML could be used in an ODT-document? You know what ... it can! I took the MathML above, opened one of the documents I made for the ODF/MathML-article and inserted it into the MathML-zone of the ODF-package. The file is available here: minimal-mathml-omml-inject.odt (1,31 kb).
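I did the injection by hand, but it can be scripted, since an ODF-package is a ZIP archive too. The Python sketch below swaps the content of one member and copies everything else unchanged. The function name replace_member is hypothetical, and the member name in the usage line (MathML/content.xml) is an assumption based on the "./MathML" reference in the draw:object markup shown earlier; check the name in your own package.

```python
import io
import zipfile


def replace_member(package_bytes, member, new_data):
    """Return a copy of a ZIP package with one member's content replaced."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as src, \
            zipfile.ZipFile(out, "w") as dst:
        for info in src.infolist():
            if info.filename == member:
                data = new_data
            else:
                data = src.read(info.filename)
            # Reusing the original entry keeps its order and compression
            # method, so "mimetype" stays first and uncompressed.
            dst.writestr(info, data)
    return out.getvalue()
```

A call would look like replace_member(odt_bytes, "MathML/content.xml", mathml_bytes). The sketch does not touch the manifest, so it only suits cases like this one, where an existing member is replaced and none are added.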

The result of opening the file using OpenOffice.org:

In the words of Murray Sargent, I guess you can have your cake and eat it too after all.

:-)

Update:

When writing my post about where to get help for ODF-development I suddenly remembered that I missed a part of this article: "The quirks". Because - naturally there are quirks with using OMML with Microsoft Office 2007 ... just as there were with MathML and OpenOffice.org.

Now, if you take another look at the OMML/XML-fragment I created, there were two parts I really couldn't figure out a way to remove:

<m:oMathPara>
  <m:oMath>
    <m:r>
      <w:rPr>
        <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
      </w:rPr>
      <m:t>cos</m:t>
    </m:r>
    <m:f>
      <m:num>
        <m:r>
          <w:rPr>
            <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
          </w:rPr>
          <m:t>π</m:t>
        </m:r>
      </m:num>

Now, the <w:rPr>-elements should have absolutely nothing to do with the content of the <m:t>-element - or more correctly, the visibility of the text in the <m:t>-element should not depend on the existence of a <w:rPr>-element. But if the two <w:rPr>-sections are omitted, the "cos"-text as well as the π-sign are not displayed. I really have no idea why this is so, so if you do, please let me know. Maybe one of the Microsoft Office 2007-Math guys could step in here?