a 'mooh' point

clearly an IBM drone

Object-embedding in OOXML with Microsoft Office 2007

(updated 2008-04-14, added links to external resources) 

Now that the ISO-vote and approval of OOXML is done with, it is time to continue the coverage of implementing OOXML as well as ODF – this time about OOXML, Microsoft Office 2007 and embedded objects.

As I have previously said, there are always quirks when it comes to implementations of any standard in large applications. I have covered a few of these already regarding mathematical content [0], [1] and it is no different with regards to object embedding. I should say that a source of inspiration to this article was Stepháne Rodrigues’ article about binary Parts of an OOXML-file (OPC-package).

Now, embedding objects in an OOXML-file is pretty straight-forward: Simply add the object somewhere in the package and make a reference to the location and specify what kind of file you are embedding. This is very similar to how it is done in ODF.

(note: the specific schema-fragments defining how to do this were dealt with and changed at the BRM, so I will not include these until the final version of IS 29500 is released. I will update this article according to the revised spec).

As I have noted earlier, interoperability happens at application-level, so it is worth pondering a bit on how the specification is implemented in the major implementations of it. So let’s see how Microsoft Office acts when embedding objects.

What I did was this: 

I used Microsoft Office 2007, created a text-document and I embedded an object in it – in this case an OpenOffice.org Calc Spreadsheet. The spreadsheet is also inspired by one of Stepháne Rodrigues’ articles, the infamous “OOXML is defective by design”.

 

The object is inserted and displayed in the document. When activating the object, I can edit it as if it was in OOo Calc itself. Actually it is OOo Calc itself. It is invoked using OLE and as a side-note it shows a cool thing about OLE – or similar other object linking techniques. Microsoft Office 2007 does not know anything about OpenOffice.org, yet it is still able to invoke the application and edit the embedded object.

 

Ok – now let’s look at the OOXML-file created. In the file document.xml the following fragment is located:


The <v:shape>-element is part of the nasty VML-dependency that luckily was dealt with at the BRM. This will be replaced by DrawingML in the final IS 29500. The <o:OLEObject>-element specifies the type of the embedded object (“opendocument.CalcDocument.1”) and the location of it (“rId5”). There is really nothing platform dependent here in the OOXML-markup.What is more interesting, though, is looking at the Calc-object after it is embedded. By navigating through the relationship-model of the OPC-package, the embedded object is located.

 

One might think that this file was simply the Calc-file renamed, but sadly this is not so. This file is actually the Calc-file wrapped in an OLE2 Compound file (“CF”). The CF-file is basically a stream wrapper which allows a number of streams to be persisted in a file as well as information about these streams. Using one of the many CF-viewers you can get the data of the wrapped file itself as well as the persisted information of it, here “com.sun.star.comp.Calc.SpreadsheetDocument _   Embedded Object _   opendocument.CalcDocument.1”.

 

 

Technically this is really not a big deal – there are well-known ways to manipulate these files on all platforms and most programming languages and extracting the required data should really be a no-brainer. OpenOffice.org is licensed under LGPL, so you can use the source-code from this to figure out how to do it on the platforms supported by OpenOffice.org. It is also pretty evident why Microsoft Office 2007 works this way. Microsoft Office 2007 is the latest incarnation of the Microsoft Office Suite – a suite that has depended on this file format since at least 1999 … and of course on OLE itself as well. So if you want to implement a document consumer, this is simply something to be aware of when consuming OOXML-files.

From the perspective of a developer, however, this is really annoying. I would definitely opt for Microsoft Office 2007 embedding the objects simply as the objects they are – and not wrapping them in a CF-wrapper. This is how it is done in OpenOffice.org. Granted, this suite does other weir(d) things like renaming the files and not being entirely clear how to embed all object types, but the objects are embedded as they are (unless they are OpenDocument objects). This is a benefit to me as a developer when examining OOXML-files, because I can simply extract the object in question from the document package and verify the file.

So this might be the first new post-vote change-modification to IS 29500:

 

When embedding objects an application shall not modify or wrap the embedded object in any way before embedding it in the package. When a document consumer encounters an embedded object, this shall not be converted to another object type without knowledge-based confirmation by the user.

 

This (or similar woring in standard-lingo) would prevent Microsoft Office in wrapping objects on CF-wrappers, but it would also prevent applications like OpenOffice.org on SUSE to convert embedded Excel-objects to Calc-spreadsheets. FYI, this kills interop too.

A final request: Microsoft, please, as you must already be implementing the changes from the BRM for Office 2007, would you be so kind to make this change to the application as well? It should really be a no-brainer, and if there should be any requirements in your code for the CF-files, feel free to load the objects, wrap them in an in-memory CF-file and take it from there.

Smile

What's up with OLE?

A few weeks back I made an article about how Microsoft Office 2007 dealt with password-protection of an OPC-package, since this feature is not a part of the OOXML-specification. The answer I found was that Microsoft Office 2007 persists the password-protected file as a OLE2 Compound File ... more commonly known as a "OLE-file". I also concluded that using OLE2 Compound Files is not a problem - and certainy not an issue regarding OOXML.

Now - the whole topic around OLE has been at the front row of the worldwide debates regarding OOXML. My personal opinion is that the people jumping up and down screaming about problems with OLE ... really haven't understood what OLE is.

So let me start by making a small recap' of what it is really all about.

... there is OLE and then there is OLE 

First of all:

there is "OLE" and then there is ... "OLE"

... or put in another way:

there is the "OLE-technology" and then there is the "OLE-file"

or in a third, more correct, way:

there is the "OLE application technology"  and then there are "Compound Files".

The foremore mentioned is the technology that - on the Windows platform - enables a program to use the UI of another program ... without launching the entire application itself. I mostly use this when editing MS Visio-documents in Word but other usages of this is using an Excel spreadsheet in an MS Word application. The OLE-technology itself is a tool on the Windows-platform that all applications can - and do - use to enable "utilizing other applications in their own applications". It is here important to understand, that there is (today) nothing really revolutionary about OLE. Another similar technology on the Windows-platform is DDE and on the Linux-platform it could be KParts and Bonobo. These technologies simply enable one program to communicate with another (simply put).

But what about these OLE-files?

Well, Compound Files are actually not dependant of OLE-technology. Or put in another way: you don't need OLE-technology to read and use the contents of a Compound File. Compound Files are just files. A Compound File is a collection of persisted streams - actually much like a ZIP-archive. Most commonly it is used because it brings the ability to "utilize a file system within a file". Of course you will need to know how to use the contents of the file, be it created by OpenOffice, Corel Draw, Adobe Acrobat or any application that might store its files using Compound Files. But this is seperate from being able to read and write to the contents of a Compound File.

Ok - I will not bother you any more with this. You should check out the original article about OLE and also look into the specification of the binary formats for Microsoft Office95 - Office2007, avilable from Microsoft. It is actually quite interesting. Just remember that OLE-technology and Compound Files are not the same thing.

And now for something completely different (kindof)

In the lab-tests I have been part of for the Danish Government (National IT and Telecom Agency) we have tested OLE-interoperability. It is important since it is quite normal to embed e.g. a spreadsheet file in a Text-processing file. So it is important that the contents of the file is actually usable when receiving it and opening using another application or on another platform.In this setup we only tested Compound File interop and not interop between OOXML and ODF.

What we did was this:

We created a ODF-file using OpenOffice where we embedded a Excel-spreadsheet (binary .DOC-file) (on the Windows-platform)

We sent this file to a number of different platforms and applications

  • Windows XP using OpenOffice.org 2.3 DA
  • Windows XP using OpenOffice Novell Edition
  • Linux using OpenOffice Novell Edition
  • Linux (SLED) using IBM Lotus Notes 8

We tried to open the file and documented what happened.

#
Setup  What happened? 
1
Windows XP using OpenOffice.org 2.3 DA OpenOffice.org opened the document and correctly displayed the contents of the spreadsheet. It was possible to edit the spreadsheet and save it back into the ODF-container
2
Windows XP using OpenOffice Novell Edition OpenOffice Novell Edition opened the document and correctly displayed the contents of the spreadsheet. It was possible to activate the spreadsheet but only in "read-only"-mode
3 Linux using OpenOffice Novell Edition OpenOffice Novell Edition opened the document and correctly displayed the contents of the spreadsheet. It was possible to activate the spreadsheet but only in "read-only"-mode
4
Linux (SLED) using IBM Lotus Notes 8 Lotus Notes 8 opened the document and correctly displayed the contents of the spreadsheet. When activating the spreadsheet the user was prompted to convert the spreadsheet. When accepting this it became editable and when saving it back into the ODF-container, the spreadsheet was persisted as an Open Document Spreadsheet.


So what we saw was basically 3 different approaches to handling the embedded object. In general the Excel-object (Compound file) itself was not a problem - regardless of application and platform. All combinations had no problems with opening the file and displaying the contents - even on platforms without OLE-technology present. The difference was in the applications and their handling of the object. OpenOffice.org presented the approach that most people would expect: it allowed editing the embedded object and saving it back into the container. OpenOffice Novell Edition allowed activating the embedded object but not saving it back into the container and Lotus 8 took the approach of converting the Excel-object to an Open Document Spreadsheet.

A conclusion?

Well, we took great care not to conclude much - that was not for us to do, we merely provided the technical background for post-lab conclusions. However - the pattern emerging from the description above was similar to a pattern we saw a lot. The problems were not in incompatibility between the formats but instead in how the applications and converters dealt with the formats. We also saw no indications that any of the formats were tied to a specific platform. There were no problems with roundtripping - or to put more clearly: the problem we saw when round-tripping documents were not caused by incompatibilities between the platforms (e.g. Linux and Windows) but between different behaviour in the applications implemented on either platform.

So is this good or bad news? Well, as always, truth lies in the eyes of the beholder ... but I think it is good news.