It seems about time somebody wrote a bit about how Microsoft has chosen to implement ISO/IEC 29500:2008, aka OOXML. As you might know, Microsoft claims that Microsoft Office 2010 will implement “29500” in the Transitional (T) sense. As far as my tests have shown (and they are in no way a complete and thorough application test), there are no signs that this is not in fact true. So by launch day of Microsoft Office 2010 we (as in “the world”) will have at least one big implementation of 29500.

However – the devil, as always, lies in the details. So the question is not *if* they have implemented 29500 – the question is *how*.

This will be the first post in a series looking at the details (from a format perspective) of how Microsoft has chosen to implement 29500.

[Note: The markup for documents conforming to the Transitional conformance class (T) and for those conforming to the Strict conformance class (S) is virtually identical – for simple documents.]

I will maintain a list of items for Microsoft to consider as I go along. It is available at this permanent location.

I will start by looking at Excel. Some of the most controversial parts of 29500 were focused on how Excel handles e.g. dates, VML, document protection etc., so this seems like a reasonable place to start. Also, the markup of SpreadsheetML is much easier to read than the markup of WordprocessingML or PresentationML (methinks), so it should be the right place to kick this off.

Stay tuned for the first article looking at how the Excel team of Microsoft Office 2010 has dealt with ISO/IEC 29500.


I know it has been a couple of weeks, but I just wanted to share the current developments with you.

On September 7th (in Danish), the Danish mirror committee to ISO/IEC JTC1 SC34 met at Danish Standards in Charlottenlund. On the agenda was, amongst other things, the processing of documents under ballot. The documents relevant to WG4 were these:

As the expert appointed by Danish Standards to WG4, I have been working hard with the other experts in WG4 on these papers, and at each meeting in Denmark I have given the mirror committee an overview of the current work. The members of the Danish committee have access to the same set of papers that I have, so we have primarily been discussing the more controversial ones - like the usage of ISO 8601 dates in transitional files, reintroducing the values on/off to ST_OnOff in the transitional schemas and changing the namespace name for strict files. A couple of times Danish committee members have requested information on more "trivial stuff", and we have then discussed this.

At the meeting of September 7th, I gave a quick, cursory overview of the tougher parts of COR1 and AMD1, and no comments were presented. We talked a bit about the general principles of the work in WG4, but that was basically it.

After this, Denmark (Danish Standards) approved the document sets for COR1 and AMD1.

Obviously I think this is great news and the chairman of the Danish committee expressed his appreciation of the work put into creating these files.

Get the information straight from the horse's mouth on the DCA website. If you don't speak Danish, Google will do a rough translation for you.

I'll update this article shortly ...

... oh ... and I almost forgot ... they suggested using OOXML as well.

Norway has mandated the use of PDF and/or ODF as document exchange formats. The baseline reference list of approved standards and formats has been released in a "version 2.0" edition where, amongst other things, ODF has been approved in edition 1.1. An excerpt of the text is:

3.2.2 Document standards for exchange via email attachments

When exchanging documents as email attachments from the public sector to the outside world (citizens and businesses), the following standards shall be used: PDF 1.4 – 1.6, PDF 1.7 (ISO 32000-1) or PDF/A (ISO 19005-1) is the mandatory format when exchanging documents intended for reading. ODF 1.1 (OASIS Standard 1 February 2007) is mandatory and shall be used when exchanging documents intended for editing by the recipient after being sent from a public authority. Because of limited adoption, it is recommended, as a temporary measure, to attach one or more additional formats to ensure general accessibility. In such cases, the email shall clearly state that the attachments consist of the same document made available in several formats.

It is important to be aware that publishing documents in newer versions of PDF may mean that a reader supporting an older version cannot read the whole document.

When receiving finished documents by email from citizens/businesses, the public sector should, as a minimum, be able to handle the following standards:

- PDF, all versions
- PNG (Portable Network Graphics, ISO/IEC 15948:2003)
- JPEG (Joint Photographic Experts Group, ISO/IEC 10918-1)
- ODF, all versions

For both finished documents and documents for further processing, the public sector should also be able to receive all other formats in widespread use within the relevant field that do not impose an unreasonably large conversion burden on the public authority. Which formats can concretely be expected will differ between sectors and will change over time.

The document format OOXML was published by ISO on 18 November 2008. It has been decided that it will remain under observation.

I think it is great news for ODF that governments around the world are upgrading their procurement requirements to take advantage of the latest edition of ODF.

My translation of (parts of) the above is:

When exchanging documents as attachments in email from the public sector to users (citizens and corporations), the following standards must be used: PDF 1.4-1.6, PDF 1.7 (ISO 32000-1) or PDF/A (ISO 19005-1) are mandatory and must be used when exchanging documents designed for reading (only, ed). ODF 1.1 (OASIS Standard 1 February 2007) is mandatory and must be used when exchanging documents for editing purposes. Due to the limited market penetration (of ODF, ed), it is however recommended (temporarily) to attach additional document formats when sending data from a public institution.

(...)

OOXML was made public by ISO on November 18th 2008. It is still under observation.

Traditionally, for every meeting we have in WG4, some conspiracy theory is born on how much money the delegates received from Microsoft, how many sports cars we each got from Microsoft or how we each had a Microsoft employee sitting on our laps dictating what we should say.

So, I thought I'd beat the usual nut jobs to it and present the attendance list myself. The minutes from the meeting will be available soon, but who wants to wait for exciting news like this?

The attendance list was this:

| Name | Affiliation | Employer/sponsor |
|---|---|---|
| Pia Lange | Host | Dansk Standard |
| Makoto Murata | WG4 Convener | International University of Japan |
| Sam Oh | SC34 Chair | Sungkyunkwan University |
| Keld Simonsen | NO HoD | DKUUG? |
| Dave Welsh | US HoD | Microsoft |
| Mario Wendt | DE HoD | Microsoft |
| Klaus-Peter Eckert | DE | Fraunhofer Fokus |
| Jesper Lund Stocholm | DK HoD | CIBER |
| Rex Jaeschke | ECMA HoD, project editor | Consultant |
| Doug Mahugh | ECMA | Microsoft |
| Shawn Villaron | ECMA | Microsoft |
| Kimmo Bergious | FI HoD | Microsoft |
| Alex Brown | GB HoD | Griffin Brown Digital Publishing Ltd. |
| Gareth Horton | GB | Datawatch |
| Jaeho Lee | KR HoD | University of Seoul |
| Jung-Jin Yang | KR | The Catholic University of Korea |

So out of a total of ~~14~~16 people attending ... ~~5~~6 people were in some way affiliated with Microsoft and/or ECMA. What the hell, throw the Microsoft shill Alex Brown into the pot as well - that'll make it a total of ~~6~~7 people.

I don't know what to say ... I'm shocked.

Update: I have been notified that I missed two people on the list, Dave Welsh and Keld Simonsen. The list and numbers have been updated accordingly.

At this very moment we are discussing re-introducing the values on/off to the simple type ST_OnOff in the **transitional part** of OOXML.

Background: Some countries (including Denmark and the UK) argued during the DIS29500 process that the enumeration values "on" and "off" of the simple type ST_OnOff were inappropriate, since they extended the W3C XML Schema data type xsd:boolean. So at the BRM, these values were removed from the simple type ST_OnOff.
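To make the difference concrete, here is a simplified sketch in W3C XML Schema. Alternative A restricts ST_OnOff to xsd:boolean, whose lexical space is "true", "false", "1" and "0". Alternative B is one possible way of letting "on" and "off" back in for the transitional schemas: a union of xsd:boolean and a small enumeration. This is an illustration, not the normative schema text. The helper type name ST_OnOffExtra is hypothetical, and the xsd and w prefixes are assumed to be bound to the XML Schema namespace and the schema's target namespace respectively.

[code:xml]<!-- Alternative A: only xsd:boolean values (true, false, 1, 0) are valid -->
<xsd:simpleType name="ST_OnOff">
  <xsd:restriction base="xsd:boolean"/>
</xsd:simpleType>

<!-- Alternative B (replaces A; both cannot live in the same schema):
     xsd:boolean values plus the enumeration values "on" and "off".
     ST_OnOffExtra is a hypothetical helper name used for illustration. -->
<xsd:simpleType name="ST_OnOffExtra">
  <xsd:restriction base="xsd:string">
    <xsd:enumeration value="on"/>
    <xsd:enumeration value="off"/>
  </xsd:restriction>
</xsd:simpleType>
<xsd:simpleType name="ST_OnOff">
  <xsd:union memberTypes="xsd:boolean w:ST_OnOffExtra"/>
</xsd:simpleType>

<!-- In a WordprocessingML document: valid under both A and B -->
<w:b w:val="true"/>
<!-- In a WordprocessingML document: valid under B only -->
<w:b w:val="on"/>
[/code]

The second instance is the kind of markup found in existing documents, which is why removing "on" and "off" had such a large effect on conformance.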

Now, that's all fine and dandy - the only problem was that it made (according to a Microsoft estimate) 90% of all existing documents (and existing applications) non-conformant. Alex Brown demonstrated this in his article "OOXML and Office 2007 Conformance: a smoke test". Further, it went directly against the scope of IS29500:2008, which was to "represent faithfully the existing corpus of word-processing documents, spreadsheets and presentations that have been produced by Microsoft Office applications (from Microsoft Office 97 to Microsoft Office 2008, inclusive)".

So we have been discussing this quite a bit, because re-introducing the values on/off would effectively reverse a BRM decision ... in other words ... politically, it is a bit of a hot potato.

You might argue that this is a prime example of how Microsoft controls SC34/WG4 and how we simply align everything to what Microsoft or Microsoft Office does - but unless you consistently opt for the sensational news, that position doesn't make very much sense. This really has nothing to do with aligning IS29500 with Microsoft Office; it has to do with aligning IS29500 with its scope.

Now, do note that we in WG4 cannot make decisions to alter IS29500 - this is the prerogative of the national bodies in SC34 or JTC1, so all we are doing is suggesting to the NBs that we think it is a good idea to reintroduce the two values.

So ... everyone on the "who's who" list of OOXML maintenance is in Copenhagen, eagerly working our way through a zillion defect reports and proposals for IS29500. The pace varies from hour to hour, but almost all of it is quite interesting (cough!).

We have quite a busy schedule in front of us for these three days in Copenhagen. The number of DRs has climbed above 250. As you can see on the statistics page of WG4, we have successfully closed about 138 of them (throughout the last few weeks), and we are working our way through the rest.

The topics for this week revolve around mundane tasks such as sorting out editorial defects, discussing technical comments and figuring out which items to put in the AMD bucket and which to put in the COR bucket. It's all about the glamour and fancy lifestyle here.

We are certainly living in interesting times ... and I am sure we'll get a lot done in these three days.

PS: Ooh ... and we are gonna burn a witch on Tuesday evening.

Way back when I was a math major at university, we were taught about "operations on sets". A *set* could simply be "the natural numbers", which could be defined as all non-negative integers (the positive integers together with the number 0). An *operation* on this set could be addition of numbers, multiplication of numbers and so forth. Operations can have a lot of characteristics, e.g. "commutative" or "associative", and the related notion "transitive" applies to relations. An "associative" *operator* means that you can *group the operands* any way you want, and a "commutative" operator means that you can change the *order* of the operands. Confused? Well, it's not that complex when you think about it. The mathematical operator "addition" is associative since (1+2) + 3 = 6 and 1 + (2+3) = 6. The operator "divide" is __not__ associative since (1/2) / 3 = 1/6 whereas 1 / (2/3) = 3/2. Addition is also commutative, since you can change the order of the numbers being added together. This is evident since 1+2+3 = 6 and 3+2+1 = 6. By contrast, "subtraction" is not commutative since 1-2-3 = -4 whereas 3-2-1 = 0.

The transitive characteristic is a bit different from this, and the "everyday equivalent" would be when we *infer* something. So think of transitivity as a mathematical formulation of what we do when we *infer*.

The relations "is greater than" and "is equal to" are both transitive. Basically, a relation (such as "is greater than") being transitive means that if A > B and B > C, then A > C.
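For reference, here are the three properties written out, with ∘ standing for an operation such as + and R for a relation such as >. The first two are statements about operations on all elements a, b, c of the set; the third is a statement about a relation on all elements A, B, C:

$$
\begin{aligned}
&\text{associative:} && (a \circ b) \circ c = a \circ (b \circ c)\\
&\text{commutative:} && a \circ b = b \circ a\\
&\text{transitive:} && (A \mathrel{R} B \,\land\, B \mathrel{R} C) \;\Rightarrow\; A \mathrel{R} C
\end{aligned}
$$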

The latter popped into my mind the other day when I was pondering over interoperability between implementations of document formats.

Ever since Rob's ingenious article "Update on ~~OpenOffice.org Calc~~ ODF interoperability", I haven't been able to get it out of my head.


This article will have two topics - one about extending OOXML using the built-in extension mechanisms and one about extending OOXML itself.

As I have written earlier, OOXML has a (fun) part containing mechanisms for extending OOXML with vendor- or domain-specific extensions. That part is "Part 3 - Markup Compatibility and Extensibility". The part describes different techniques for extending OOXML; most interesting are probably the sections about "Markup Compatibility Attributes and Elements", which describe ways to extend OOXML while retaining compatibility with, e.g., earlier or current versions of the specification.
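As a minimal sketch of the most basic of these mechanisms, a producer can declare a vendor namespace as ignorable with the mc:Ignorable attribute, so that a consumer that does not know the namespace skips the extension markup instead of rejecting the file. The namespace URI urn:example:vendor-ext and the attribute ext:reviewState are hypothetical and only there for illustration.

[code:xml]<w:document
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    xmlns:ext="urn:example:vendor-ext"
    mc:Ignorable="ext">
  <w:body>
    <!-- A consumer that does not understand the (hypothetical) "ext"
         namespace ignores this attribute and reads the paragraph as usual -->
    <w:p ext:reviewState="draft">
      <w:r>
        <w:t>Hello, world</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:document>[/code]

Note that mc:Ignorable takes a whitespace-separated list of namespace prefixes, not namespace URIs.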

So if you were a vendor wanting to add something to the spec - but couldn't wait for the slow ISO pace or simply needed the competitive edge of not revealing anything about future software releases to your competitors ... what could you do?

The first thing you should do is to decide if you want your new stuff to eventually make it into the spec. If you don't want that - you're done already.

Assuming you want it in the spec, here are a couple of hints as to how you might approach it:

- Document your extensions thoroughly
- Present these extensions to SC34/WG4 with a justification of how and why you want them in the spec
- Work with us to polish the nitty-gritty details that you overlooked
- Make sure there are no legal or technical barriers to your competitors implementing these new features
- Wait for the stuff to eventually be included in IS29500

So the real target of this is - if you haven't already guessed it - Microsoft. So to be even more specific, here's a little list of things to do for Microsoft - in case they want to extend IS29500:

You will probably have some additions to IS29500 in your implementation of Office 14. Assuming that you will at some point like these to be added to IS29500, this is what you should do:

- Document your extensions thoroughly. Remember, the quality of the documentation will be under the same scrutiny as the text of DIS29500 so please do it right the first time.
- Add the documentation of your extensions to your "Implementer's notes" on the DII-website.
- Present these extensions to SC34/WG4 with a justification of how and why you want them in the spec.
- Work with us to polish the nitty-gritty details that you overlooked.
- Include the extensions and the documentation for it in your OSP.
- Wait for the stuff to eventually be included in IS29500.

Remember, the minute the first public beta of Office 14 hits the web, the documentation of the extensions as well as inclusion in OSP should be finished. Not a month later, not a week later - __on day one!__

There has been a lot of talk lately about how IS29500 will be extended in the future. Specifically, how - and where - will new additions be included? IS29500 consists of two schema sets - a strict set and a transitional set. Currently the strict set is created from the transitional set, so strict is in fact a proper subset of the transitional set.

However – there is no guarantee that this will always be so.

My gut feeling is that transitional should be preserved as the "reflection" of the existing Microsoft Office documents (until March 2008) - in other words, in line with the scope of IS29500. I think that any new stuff should be added to the strict schema set only. The term "transitional" clearly implies this. As I recall the feeling in Geneva at the BRM, the idea behind the transitional set was that eventually it would no longer be needed and hence would be removed from the standard at some point in the future. If we continue to add new features to the transitional set, we will never get to the point where we can honor the sentiment behind this particular issue.

... now at the moment, we haven't decided anything yet ... so right now anything goes.

But what are your thoughts?

Part 3 of ISO/IEC 29500 is the fun part and if you haven’t read it yet, you really should do so – especially if you are thinking about implementing an IS29500-document consumer. Part 3 basically consists of two distinct areas – one that deals with compatibility and one that deals with extensibility. The first area is the target of this post.

For any producer or consumer of a format that is not cast in stone, it is important to be able to ensure compatibility, both forwards and backwards, as the format changes over time. This is where the “compatibility-thingy” comes into play.

The compatibility features of OOXML enable markup producers to target different versions of applications supporting different versions of the specification, or different features altogether. The tools to do this are called “Alternate Content Elements” (ACE) and “Compatibility-rule attributes”, and Part 3 is supposedly an exact remake of how compatibility and extensibility is handled in the binary Microsoft Office files.

The latter tool enables markup producers to “force” other markup producers to preserve specific content – even if it is not known to them – as well as to instruct markup consumers which parts of the document can safely be ignored. It can even instruct markup consumers to fail if they don’t understand some parts of the markup. If this sounds kind of “SOAP-ish” to you, the attribute name “MustUnderstand”, which enables just this, should sound even more familiar.
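As a sketch of how these compatibility-rule attributes might be combined on a single element: the v2 and v3 prefixes, their urn:example namespace URIs and the v2:wrapper and v2:note elements are all hypothetical. The mc attribute names (Ignorable, ProcessContent, PreserveElements, MustUnderstand) follow Part 3, but the exact processing rules should be checked there before relying on this.

[code:xml]<w:document
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    xmlns:v2="urn:example:format-v2"
    xmlns:v3="urn:example:format-v3"
    mc:Ignorable="v2"
    mc:ProcessContent="v2:wrapper"
    mc:PreserveElements="v2:note"
    mc:MustUnderstand="v3">
  <!-- MustUnderstand: a consumer that does not understand the (hypothetical)
       v3 namespace is required to reject the document instead of guessing -->
  <w:body>
    <!-- v2 is ignorable and v2:wrapper is listed in ProcessContent:
         an unaware consumer drops the wrapper but still processes its children -->
    <v2:wrapper>
      <w:p>
        <w:r>
          <w:t>Still visible to older consumers</w:t>
        </w:r>
      </w:p>
    </v2:wrapper>
    <!-- v2:note is listed in PreserveElements: a consumer that edits and saves
         the file is asked to write this element back even though it ignores it -->
    <v2:note>Vendor-specific data</v2:note>
  </w:body>
</w:document>[/code]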

The first tool can be thought of as sort of “a switch statement for markup”. It allows a markup producer to serve alternate versions of markup to target alternate feature sets of different applications. The diverging markup would be listed as different “alternate content blocks” or “ACBs”, and it is essentially an intelligent way for a markup producer to tell a consumer that “if you don’t understand this bit, use this instead”.

An interesting use case would be to use ACE to improve interoperability when making text documents with mathematical content. It has long been an open secret that interoperability with OOXML was improving day by day – but not for mathematical content. Mathematical content in OOXML (or “OMML”) has for some reason not been a top priority for implementers of OOXML, so interoperability has been really, really bad.

Now, wouldn’t it be cool if there was some way for markup producers to serve MathML as well as OMML to consuming applications? Let’s face it – most of the competition to Microsoft Office 2007 is from applications supporting ODF, and they all (to a varying degree) support MathML. So a “safe assumption” would be that “if I create an OOXML text document with OMML and send it to a different application, it probably understands MathML much better than OMML”. Wouldn’t it be cool, if you could actually do this?

Well, to the rescue comes ACE.

ACE enables exactly this use case. ACE is based on qualified elements and attributes, so as long as you can distinguish between the qualified names of the content you are dealing with, ACE is your friend.

So let’s see how this would work out.

Take a look at this equation: a = b/c

In Office Math ML (OMML) this is represented as:

[code:xml]<m:oMath>
  <!-- plain run containing "a=" -->
  <m:r>
    <m:t>a=</m:t>
  </m:r>
  <!-- fraction: numerator b over denominator c -->
  <m:f>
    <m:num>
      <m:r>
        <m:t>b</m:t>
      </m:r>
    </m:num>
    <m:den>
      <m:r>
        <m:t>c</m:t>
      </m:r>
    </m:den>
  </m:f>
</m:oMath>
[/code]

In MathML the formula is represented as:

[code:xml]<math:math>
  <math:mrow>
    <math:mi>a</math:mi>
    <math:mo>=</math:mo>
    <!-- mfrac: first child is the numerator, second the denominator -->
    <math:mfrac>
      <math:mi>b</math:mi>
      <math:mi>c</math:mi>
    </math:mfrac>
  </math:mrow>
</math:math>
[/code]

(both examples have been slightly trimmed)

So how would one specify both these ways of writing mathematical content? Well, it could look like this:

[code:xml]<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document
    xmlns:omml="http://schemas.openxmlformats.org/officeDocument/2006/math"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006">
  <w:body>
    <w:p>
      <omml:oMathPara>
        <mc:AlternateContent xmlns:mathml="http://www.w3.org/1998/Math/MathML">
          <!-- Selected if the consumer understands the MathML namespace -->
          <mc:Choice Requires="mathml">
            <mathml:math>
              <mathml:mrow>
                <mathml:mi>a</mathml:mi>
                <mathml:mo>=</mathml:mo>
                <mathml:mfrac>
                  <mathml:mi>b</mathml:mi>
                  <mathml:mi>c</mathml:mi>
                </mathml:mfrac>
              </mathml:mrow>
            </mathml:math>
          </mc:Choice>
          <!-- Otherwise selected if the consumer understands OMML -->
          <mc:Choice Requires="omml">
            <omml:oMath>
              <omml:r>
                <omml:t>a=</omml:t>
              </omml:r>
              <omml:f>
                <omml:num>
                  <omml:r>
                    <omml:t>b</omml:t>
                  </omml:r>
                </omml:num>
                <omml:den>
                  <omml:r>
                    <omml:t>c</omml:t>
                  </omml:r>
                </omml:den>
              </omml:f>
            </omml:oMath>
          </mc:Choice>
          <mc:Fallback>
            <!-- do whatever -->
          </mc:Fallback>
        </mc:AlternateContent>
      </omml:oMathPara>
    </w:p>
  </w:body>
</w:document>[/code]

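So you simply add the compatibility namespace to the file and add the "AlternateContent" element. This element contains a list of "choices" and, optionally, a fallback. The choices are evaluated in the order in which they appear: the consumer selects the first choice whose required namespaces (listed as prefixes in the Requires attribute) it understands, and uses the fallback only if none of the choices qualify.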
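To illustrate the selection rule, here is a sketch of the same AlternateContent placed directly in the paragraph, this time with a Fallback that carries something useful: a plain-text run for consumers that understand neither MathML nor OMML. The plain-text fallback is an illustrative choice, not something the specification prescribes, and the Choice contents are abbreviated to comments.

[code:xml]<w:p
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    xmlns:omml="http://schemas.openxmlformats.org/officeDocument/2006/math"
    xmlns:mathml="http://www.w3.org/1998/Math/MathML">
  <mc:AlternateContent>
    <!-- Checked first: selected by a consumer that understands MathML -->
    <mc:Choice Requires="mathml">
      <!-- a <mathml:math> fragment like the one in the example above -->
    </mc:Choice>
    <!-- Checked only if the first Choice was not selected -->
    <mc:Choice Requires="omml">
      <!-- an <omml:oMath> fragment like the one in the example above -->
    </mc:Choice>
    <!-- Used when no Choice qualifies -->
    <mc:Fallback>
      <w:r>
        <w:t>a = b/c</w:t>
      </w:r>
    </mc:Fallback>
  </mc:AlternateContent>
</w:p>[/code]

Because the first matching Choice wins, the order of the Choice elements expresses the producer's preference.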

And the benefit? Well, you can now have your cake and eat it too. If the consuming application supports it, it will display the equation based on the mathml-fragment – otherwise it will use OMML.

This is immensely interesting and applies to all sorts of places and use cases – heck, you can even use it to take advantage of some of the new stuff in the strict schemas of IS29500 while keeping intelligent compatibility with existing applications that only support ECMA-376 1st Ed. Imagine the ECMA-376 way of doing dates in spreadsheets, and now add the possibility of using some of the new functionality added at the BRM, without the risk of breaking applications or losing data.

… that is if we change the namespace of the strict schemas, of course.
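A rough sketch of what that could look like for a single date cell, assuming (hypothetically) that the strict SpreadsheetML schemas had their own namespace, here bound to the prefix strict with a made-up URI. The Choice uses the ISO 8601 date cell type (t="d") that IS29500 introduced, and the Fallback uses the ECMA-376 style serial date, where 39770 is 18 November 2008 in the 1900 date system. The style index s="1" is assumed to point to a date number format, and whether an application accepts AlternateContent at this position in a worksheet is also an assumption.

[code:xml]<worksheet
    xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    xmlns:strict="urn:example:hypothetical-strict-spreadsheetml">
  <sheetData>
    <row r="1">
      <mc:AlternateContent>
        <!-- Consumers that understand the (hypothetical) strict namespace
             get an ISO 8601 date cell -->
        <mc:Choice Requires="strict">
          <strict:c r="A1" t="d">
            <strict:v>2008-11-18T00:00:00</strict:v>
          </strict:c>
        </mc:Choice>
        <!-- Everyone else gets the ECMA-376 style serial date;
             s="1" is assumed to reference a date number format -->
        <mc:Fallback>
          <c r="A1" s="1">
            <v>39770</v>
          </c>
        </mc:Fallback>
      </mc:AlternateContent>
    </row>
  </sheetData>
</worksheet>[/code]

This only works because the two choices can be told apart by namespace; with a shared namespace for strict and transitional, Requires has nothing to distinguish them by.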

Copyright © 2018 - Powered by BlogEngine.NET 2.9.1.0 - Theme by Farzin Seyfolahi