Do your math - ODF and MathML

by jlundstocholm 25. January 2008 19:56

When I studied at DTU (Technical University of Denmark) I basically lived in the Department of Mathematics. I did my bachelor project there and I did my thesis there. I think it would be fair to say that math is really in my blood (or was).

Of course - in those days we wrote our equations in LaTeX (not the suit) and I remember how we laughed diabolically at our fellow students who did their papers in e.g. Microsoft Word and had to use the really, really annoying "Equation Editor" (shudder). I remember how we also laughed at the students who did pictures and graphs in e.g. Adobe Photoshop or Visio (before it was acquired by Microsoft, afaik), coz everybody knew that it had to be done using xFig ... the program with the worst possible UI ever ... at least in those days.

For the purpose of these articles (an article about Microsoft Office 2007 and OMML will follow shortly) I dug into my thesis and looked at how math was displayed using LaTeX. I created a "reference equation" to use when trying to display some math in either ODF or OOXML. The test equation I made was this:

\begin{equation}
    \cos\Big(\frac{\pi}{4}\Big) = \Big(\frac{\sqrt{2}}{2}\Big)
\end{equation}

For those of you not speaking LaTeX fluently - you should consult the "Not so short introduction to LaTeX" chapter 3 - or simply behold the equation below:

  

In ODF mathematical notations are done using MathML (section 12.5) - a W3C-standard for displaying mathematical content. The mathematical content is embedded in the ODF-package as an object and as far as I can see, it is not possible to use MathML inline in the content of the paragraphs of the document itself. I have earlier talked about ODF being vague and this is imo one of the places where some clarity could help.
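Since an ODF document is a ZIP package, the place where the formula lives can be looked up with a few lines of Python. This is only a sketch: the demo package is built in memory, and the entry name "Object 1/content.xml" is an assumption about how an application might name the embedded object, not something read from the files discussed in this article.

```python
import io
import zipfile


def list_package(package_bytes):
    """Return the names of all entries in an ODF package (a ZIP archive)."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return package.namelist()


def build_demo_package():
    """Build a tiny stand-in package in memory (hypothetical layout)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        package.writestr("content.xml", "document text goes here")
        # Assumed name for the sub-document that holds the MathML object.
        package.writestr("Object 1/content.xml", "MathML goes here")
    return buffer.getvalue()


names = list_package(build_demo_package())
# Entries outside the main content.xml are candidates for embedded objects.
formula_parts = [name for name in names if name.startswith("Object ")]
print(formula_parts)
```

To inspect a real file, read its bytes from disk and pass them to list_package instead of the demo package.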

But - learning MathML is like learning a new language ... it doesn't really make sense in the beginning. So I started to poke around a bit on the W3C-website in search of some tools or tutorials that would help me figure out what MathML is all about. I eventually found a W3C tool called Amaya. It's a MathML/SVG-tool developed by W3C and I used this tool to create the MathML for the base equation above. In Amaya it looks like this:

 

 

The interesting part, of course, is the MathML created by Amaya. The MathML (slightly modified, but validated) is listed below:

<?xml version="1.0" encoding="utf-8" ?>
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow>
    <mtext>cos</mtext>
    <mo>(</mo>
    <mfrac>
      <mi>&pi;</mi>
      <mn>4</mn>
    </mfrac>
    <mo>)</mo>
    <mi>=</mi>
    <mo>(</mo>
    <mfrac>
      <msqrt>
        <mn>2</mn>
      </msqrt>
      <mn>2</mn>
    </mfrac>
    <mo>)</mo>
  </mrow>
</math>

If you look at the XML, it is pretty easy to identify the different parts of the equation.
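To make the structure explicit, the element tree can be printed as an indented outline. The sketch below uses Python's standard xml.etree.ElementTree. One assumption to note: the named entity &pi; is only defined by the MathML DTD, so the sketch writes the character as the numeric reference &#x3C0; to keep the snippet parseable on its own. The helper name outline is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The Amaya markup from above, with &pi; written as a numeric reference.
AMAYA_MATHML = """<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow>
    <mtext>cos</mtext>
    <mo>(</mo>
    <mfrac>
      <mi>&#x3C0;</mi>
      <mn>4</mn>
    </mfrac>
    <mo>)</mo>
    <mi>=</mi>
    <mo>(</mo>
    <mfrac>
      <msqrt>
        <mn>2</mn>
      </msqrt>
      <mn>2</mn>
    </mfrac>
    <mo>)</mo>
  </mrow>
</math>"""


def outline(element, depth=0):
    """Return one indented line per element: local name plus its text, if any."""
    local_name = element.tag.split("}", 1)[-1]  # drop the "{namespace}" part
    text = (element.text or "").strip()
    lines = ["  " * depth + local_name + (": " + text if text else "")]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


lines = outline(ET.fromstring(AMAYA_MATHML))
print("\n".join(lines))
```

Each fraction shows up as an mfrac node with exactly two children, numerator first and denominator second.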

So - in theory I should be able to put this into an ODF-document and it would be displayed when opening the document using OpenOffice.org - the reference implementation of ODF. 

Let's see :-)

Step 1

Create an ODF-document using OpenOffice.org with a mathematical formula embedded.

Now, this was the easy part. I cannot figure out how to insert a regular "Pi"-sign in the formula, but the formula looks just fine. The file is available here: math.odt (9,72 kb). It looks like this:

 


 

Step 2

Clean the file of all the disturbing crap that the application puts in by default

This was a bit more tricky, since somehow it seems that the mathematical formula can only be contained in a file called "content.xml" - otherwise OpenOffice.org simply shuts down. Also, I have removed all meta-data, styling, extra namespace-declarations, embedded thumbnails and the graphical representation of the formula. The cut-down ODT-file is available here: math-minimal.odt (1,43 kb). The visual representation is exactly like that of the original file.

Step 3

Inspect the MathML in the MathML-file created by the application

The MathML created by OpenOffice.org looks like this: 

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE math:math PUBLIC "-//OpenOffice.org//DTD Modified W3C MathML 1.01//EN" "math.dtd">
<math:math xmlns:math="http://www.w3.org/1998/Math/MathML">
  <math:semantics>
    <math:mrow>
      <math:mi>cos</math:mi>
      <math:mrow>
        <math:mfenced math:open="" math:close="">
          <math:mfrac>
            <math:mi math:fontstyle="italic">pi</math:mi>
            <math:mn>4</math:mn>
          </math:mfrac>
        </math:mfenced>
        <math:mo math:stretchy="false">=</math:mo>
        <math:mfenced math:open="" math:close="">
          <math:mfrac>
            <math:msqrt>
              <math:mn>2</math:mn>
            </math:msqrt>
            <math:mn>2</math:mn>
          </math:mfrac>
        </math:mfenced>
      </math:mrow>
    </math:mrow>
    <math:annotation math:encoding="StarMath 5.0">cos left ( pi over 4 right )  = left (sqrt{2}  over 2  right )</math:annotation>
  </math:semantics>
</math:math>

There are a couple of things to note about this.

Firstly, I don't understand the DOCTYPE declaration:

<!DOCTYPE math:math PUBLIC "-//OpenOffice.org//DTD Modified W3C MathML 1.01//EN" "math.dtd">

The doctype should not matter at all - and why they chose to use a "DTD Modified W3C MathML 1.01" is beyond me. I'm not saying it's an error - I just don't get it. Enlighten me, please.

Secondly, the MathML created looks different from the MathML created by Amaya. However - just as the same paragraph can be presented in all sorts of ways using HTML, and the same equation can be written in different ways (e.g. sin^2(x) + cos^2(x) = 1 is basically the same as a^2 + b^2 = c^2), the same equation can be created in an endless myriad of ways using MathML.

Thirdly, there are two distinct ways in which the OOo MathML differs from the MathML of Amaya. Notice how it uses the <mfenced>-element to make the parentheses instead of <mo>(</mo> and <mo>)</mo>. There is really no difference - however I tend to think that using the <mfenced>-element is slightly more sophisticated than the <mo>-element, but it's just a personal belief. Also, look at the usage of the <semantics> and <annotation>-elements. This is actually really cool. The <semantics>-element is used to provide "meaning" to the MathML-markup, and the content of the <annotation>-element maps the MathML markup to the corresponding expression in another notation, here StarMath 5.0. Also, OpenOffice.org allows you to type in the annotation directly, thereby enabling some of the ease of writing LaTeX directly by hand.
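The annotation can be pulled out of the markup programmatically. The following is a minimal sketch with Python's standard xml.etree.ElementTree, using a shortened copy of the OpenOffice.org output above (the presentation part is reduced to a single element for brevity). The helper name read_annotations is hypothetical. Note that the encoding attribute is namespace-prefixed in the OOo file, so it has to be looked up by its expanded name.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Shortened copy of the OOo markup; only the annotation is kept in full.
OOO_MATHML = """<math:math xmlns:math="http://www.w3.org/1998/Math/MathML">
  <math:semantics>
    <math:mrow>
      <math:mi>cos</math:mi>
    </math:mrow>
    <math:annotation math:encoding="StarMath 5.0">cos left ( pi over 4 right )  = left (sqrt{2}  over 2  right )</math:annotation>
  </math:semantics>
</math:math>"""


def read_annotations(mathml_text):
    """Map each annotation's encoding to its text content."""
    root = ET.fromstring(mathml_text)
    found = {}
    for annotation in root.iter("{%s}annotation" % MATHML_NS):
        # OOo writes math:encoding; plain MathML would use an unprefixed attribute.
        encoding = annotation.get("{%s}encoding" % MATHML_NS) or annotation.get("encoding")
        found[encoding] = annotation.text
    return found


annotations = read_annotations(OOO_MATHML)
print(annotations["StarMath 5.0"])
```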

Step 4

Validate the MathML-file using W3C-validator or Amaya 

The picture below shows the content.xml loaded and displayed in Amaya. The green dot in the bottom right corner indicates that the MathML is valid. I have also made a test with embedding the MathML in an HTML-document and validated it against the W3C-validator, and the result is the same.

 

 

Super!

Step 5

Insert the MathML created by Amaya into the ODT-file and open the file using OpenOffice.org

Now, I have previously created the formula using Amaya and I just have to inject it into the ODT-file. I did and the file is available here: mathml-minimal-error.odt (1,23 kb). The result is, however, not as I expected:

 


 

Ok - but as you might have noticed, all elements in the OOo MathML-file were namespace-prefixed, so maybe this will do the trick. I tried this as well but with the same result. File is available here: mathml-minimal-nsprefix-error.odt (1,24 kb).
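From an XML point of view the prefix should make no difference: a namespace-aware parser sees the same expanded element names whether the namespace is bound as the default or to the prefix "math". The sketch below checks this with Python's standard xml.etree.ElementTree on the right-hand fraction of the equation; the helper name expanded_names is made up for the illustration.

```python
import xml.etree.ElementTree as ET

# The same fragment written with a default namespace ...
DEFAULT_NS_FORM = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    "<mfrac><msqrt><mn>2</mn></msqrt><mn>2</mn></mfrac>"
    "</math>"
)

# ... and with the "math" prefix that OpenOffice.org uses.
PREFIXED_FORM = (
    '<math:math xmlns:math="http://www.w3.org/1998/Math/MathML">'
    "<math:mfrac><math:msqrt><math:mn>2</math:mn></math:msqrt>"
    "<math:mn>2</math:mn></math:mfrac>"
    "</math:math>"
)


def expanded_names(xml_text):
    """Return the namespace-expanded tag of every element, in document order."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


same = expanded_names(DEFAULT_NS_FORM) == expanded_names(PREFIXED_FORM)
print(same)
```

So if an application treats the two forms differently, the difference comes from the application and not from XML itself.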

Final step

Figure out what the hell is wrong 

I finally figured out what is wrong with the way OpenOffice.org handles MathML-content. It turns out that if I took the Amaya MathML (without ns-prefix) and inserted the MathML into the original content.xml-file but preserved the DOCTYPE-declaration, it works almost as expected. File is available here: mathml-minimal-inject-succes.odt (1,30 kb).
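The injection itself can be scripted. The sketch below is one possible way to do it, not the exact procedure used for the file above: it keeps the XML declaration and DOCTYPE of the original content.xml, swaps in a new root element, and writes a fresh package because ZIP entries cannot be edited in place. The function names and the tiny stand-in documents are assumptions made for the illustration.

```python
import io
import re
import zipfile

# Optional XML declaration followed by an optional DOCTYPE (no internal subset).
PROLOG = re.compile(r"\s*(?:<\?xml[^>]*\?>)?\s*(?:<!DOCTYPE[^>]*>)?")


def keep_prolog_swap_body(original_xml, new_xml):
    """Combine the prolog of original_xml with the root element of new_xml."""
    prolog = PROLOG.match(original_xml).group(0)
    new_body = new_xml[PROLOG.match(new_xml).end():]
    return prolog + "\n" + new_body.lstrip()


def replace_entry(package_bytes, entry_name, new_text):
    """Return a copy of the package with one entry replaced by new_text."""
    target = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as source, \
            zipfile.ZipFile(target, "w") as result:
        for info in source.infolist():
            if info.filename == entry_name:
                data = new_text.encode("utf-8")
            else:
                data = source.read(info.filename)
            result.writestr(info.filename, data, compress_type=info.compress_type)
    return target.getvalue()


# Stand-in documents, shortened for the example.
ORIGINAL = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE math:math PUBLIC "-//OpenOffice.org//DTD Modified W3C MathML 1.01//EN" "math.dtd">\n'
    '<math:math xmlns:math="http://www.w3.org/1998/Math/MathML"><math:mi>x</math:mi></math:math>'
)
AMAYA = (
    '<?xml version="1.0" encoding="utf-8" ?>\n'
    '<math xmlns="http://www.w3.org/1998/Math/MathML"><mtext>cos</mtext></math>'
)

demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as package:
    package.writestr("content.xml", ORIGINAL)

merged = keep_prolog_swap_body(ORIGINAL, AMAYA)
patched = replace_entry(demo.getvalue(), "content.xml", merged)
with zipfile.ZipFile(io.BytesIO(patched)) as package:
    injected = package.read("content.xml").decode("utf-8")
print(injected)
```

Note that the preserved DOCTYPE still names math:math as the root while the injected root is an unprefixed math element, which matches the combination described above.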



Well, some errors are introduced. The π-character is not displayed and the equation is displayed in bold. The equal-sign has disappeared as well.

Just for the fun of it I took the MathML-file generated by OpenOffice.org and removed the <semantics>-element as well as the <annotation>-element. File is available here: mathml-minimal-inject-no-semantics.odt (1,35 kb). The result when opening it in OpenOffice.org is .. well ... sad:



I have absolutely no idea of why it displays it like this. Removing the <semantics>-element and <annotation>-element should have no effect on the visual representation of the equation.
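For reference, this kind of unwrapping can be reproduced mechanically. The sketch below, using Python's standard xml.etree.ElementTree, lifts the presentation markup out of the <semantics>-element and drops the <annotation>-element. It is an illustration of the transformation on a shortened stand-in document, not the exact edit made to the file above, and the function name strip_semantics is hypothetical.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Shortened stand-in for the OOo-generated markup.
OOO_MATHML = (
    '<math:math xmlns:math="http://www.w3.org/1998/Math/MathML">'
    "<math:semantics>"
    "<math:mrow><math:mi>cos</math:mi></math:mrow>"
    '<math:annotation math:encoding="StarMath 5.0">cos</math:annotation>'
    "</math:semantics>"
    "</math:math>"
)


def strip_semantics(mathml_text):
    """Remove the semantics wrapper and its annotation, keeping the rest."""
    ET.register_namespace("math", MATHML_NS)  # keep the "math" prefix on output
    root = ET.fromstring(mathml_text)
    semantics = root.find("{%s}semantics" % MATHML_NS)
    if semantics is not None:
        position = list(root).index(semantics)
        root.remove(semantics)
        kept = [child for child in semantics
                if child.tag != "{%s}annotation" % MATHML_NS]
        # Re-attach the presentation markup where the wrapper used to be.
        for offset, child in enumerate(kept):
            root.insert(position + offset, child)
    return ET.tostring(root, encoding="unicode")


stripped = strip_semantics(OOO_MATHML)
remaining = [e.tag.split("}", 1)[-1] for e in ET.fromstring(stripped).iter()]
print(remaining)
```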

Conclusion?

Well, I don't really know what to conclude. Most of the things I have shown above are imo due to errors in the implementation of OpenOffice.org, where MathML is clearly not implemented sufficiently correctly. It seems that there are some unwritten rules to how MathML is supposed to be used when working with it in OpenOffice.org, but they seem rather unclear and weird to me.

But how OpenOffice.org behaves is really not important to me - some implementations of ODF are better than others, and maybe other implementations do a better job at displaying MathML. The point should be how the specification says it should be used. Luckily the ODF-spec only talks about how MathML is used in a single place - section 12.5 Mathematical Content. It says that "Mathematical content is represented by MathML 2.0 (see [MathML])". The RelaxNG-snippet provided also tells us that you can put everything into a "math area", <math:math>:

<?xml version="1.0" encoding="UTF-8" ?>
<define name="math-math">
    <element name="math:math">
        <ref name="mathMarkup" />
    </element>
</define>
<!-- To avoid inclusion of the complete MathML schema, anything -->
<!-- is allowed within a math:math top-level element -->
<define name="mathMarkup">
    <zeroOrMore>
        <choice>
            <attribute>
                <anyName />
            </attribute>
            <text />
            <element>
                <anyName />
                <ref name="mathMarkup" />
            </element>
        </choice>
    </zeroOrMore>
</define>

So basically, all bets are off. I can only begin to wonder how other implementations of ODF use MathML.

And a small appetizer:

As soon as I get the time for it, I'll write an article like this one about Office 2007 and OMML. I will investigate how to mark up mathematical content using OMML and I will also try to use the XSL-files provided by Microsoft in Office 2007 to create XSLT-translations of my base equation from OMML to MathML and vice versa.

... stay tuned ... :-)


Comments

1/26/2008 7:52:27 AM #

Anon

Thanks for writing this article.  I was aware that the ODF spec is not clear
about the use of MathML, but was not aware of the details you described.

Anon |

1/28/2008 4:28:26 AM #

jlundstocholm

Anon,

Thanks for your reply - but please note that the vagueness you refer to is not in ODF but in the implementation of ODF in OpenOffice.org . I cannot find any other, better way to use an existing standard than how it is done with ODF and MathML. It simply says: "Use MathML for mathematical content". Imo, the cut doesn't get any cleaner than this.

However - what my article shows is that, as the saying goes "There is no such thing as a free lunch", interoperability doesn't come cheap, and you cannot simply have "interoperability per reference". Simply referencing other existing standards doesn't buy you interoperability. It might provide a way to reuse existing code or even reuse existing competencies, but interoperability is not an added bonus - not by a long shot.

... and you can quote me on this one ;-)

jlundstocholm |

1/28/2008 9:44:35 PM #

Peder

"I cannot figure out how to insert a regular 'Pi'-sign in the formula"

Use %pi for a pi sign.

Peder Sweden |

1/28/2008 9:45:40 PM #

Peder

"I cannot figure out how to insert a regular 'Pi'-sign in the formula"

...and I cannot read... sigh.

Peder Sweden |

1/28/2008 9:47:01 PM #

Peder

"I cannot figure out how to insert a regular 'Pi'-sign in the formula"

Or type. Sorry.

Peder Sweden |

1/28/2008 10:19:06 PM #

jlundstocholm

Peder,

Yes - and as I say in the article, I have used this when creating the MathML using Amaya. This way a Pi-sign is inserted into the MathML. The problem with this is that OpenOffice.org doesn't display this MathML-entity [&pi;] at all - it is simply gone.

And when I said I couldn't figure out how to insert a Pi-sign (maybe it wasn't clear enough), I was referring to inserting it using the GUI of OpenOffice.org. Even when I insert the Pi-sign directly into the annotation-window as

cos left ( &pi; over 4 right )  = left (sqrt{2}  over 2  right )

... it still screws up the formula.

jlundstocholm Denmark |

1/29/2008 6:23:46 PM #

Peder

The OOo way of creating that formula (through the GUI) is:
cos left ( %pi over 4 right ) = left (sqrt{2} over 2 right )

Apparently OOo decided to change &pi; to %pi, along with all other
math symbols like %omega and %theta.

Peder Sweden |

1/29/2008 6:41:49 PM #

jlundstocholm

Peder,

Thanks for clarifying this. It displays the Pi-sign just fine now. Do you know if it is possible to add a Pi-sign using the GUI of OpenOffice.org and not through the annotation-bar? I have looked for an "Insert symbol" or something similar, but I cannot find it anywhere. :-)

jlundstocholm Denmark |

1/29/2008 11:33:09 PM #

Peder

I had actually typed a long "No you can't" answer when I checked out StarOffice7 and found this:

When you're in formula mode, at least in OxygenOffice 2.3.1 (Win), you have a Sigma icon in the top menu (beneath the "Tools" menu).
They seem to have gone out of their way to obscure it.

In StarOffice7 it was a bit more visible, on the side bar.

There's an old tutorial over at documentation.openoffice.org/.../MathObjects.pdf btw.

Peder Sweden |

1/29/2008 11:55:58 PM #

jlundstocholm

Peder,

Tack så mycket ("thank you very much"),

I just checked in OOo and it works the same way in this program (the Sigma-sign). :-)

jlundstocholm Denmark |
