a 'mooh' point

clearly an IBM drone

Embrace and extend - SVG in ODF revisited

One of the lines of attack on OOXML has been its lack of reuse of existing standards. Specifically, it comes up in the discussions of DrawingML vs. SVG and OMML vs. MathML, both of which are relatively interesting subjects. The question has been why Microsoft chose not to reuse SVG and created DrawingML instead - and likewise with MathML and OMML.

Now, some of the arguments for reusing existing standards are:

  • Reuse of other people's code
    As a programmer, I love this - there is nothing more satisfying than being able to reuse something that others have made an effort to produce
  • Increase quality
    If something is an existing standard, someone else has probably reviewed it and the worst bugs have likely been removed.
  • Brain cycle reuse
    If you reuse work that is already defined, you will probably be able to find someone in your organization with skills in this area - and you avoid the cost of re-educating them to use a new tool.

ODF has tried to reuse as many standards as possible, so mathematical content is done using MathML and vector graphics are supposedly done using SVG. Microsoft has chosen a different path and created new formats of its own, so mathematical content is done using OMML (Office Math Markup Language) and vector graphics are done using DrawingML.

A couple of weeks ago I heard rumours that ODF did not simply use SVG as its vector graphics format but had extended it beyond the standardized format. My initial response was that this had to be wrong. One of the cornerstones of ODF is that it reuses existing standards and that there is a "clean cut" between ODF and the standards it utilizes. That way I would be able to buy or acquire a library that supports SVG and simply incorporate it into my product implementing ODF. But if the referenced standard is extended, I will either get less functionality, because the extensions are not part of the standard, or I could see crashing code when I pass the extended format to the external library - at least if it performs DTD/schema validation and finds invalid elements in the input.

So what did I do?

Basically I started by doing a random text search in the ODF spec for occurrences of "[SVG]". One of the first things that caught my attention was Table 2 in section 1.3 Namespaces, where it says:

Prefix:      svg
Description: For elements and attributes that are compatible to elements or attributes defined in [SVG].
Namespace:   urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0
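To make the namespace point concrete, here is a minimal Python sketch. It is my own illustration and not taken from either specification. It parses an element written with ODF's svg prefix and compares its qualified name with the one a plain SVG library would look for. The sample fragment is made up, and the W3C namespace URI is the one SVG 1.1 itself uses.

```python
import xml.etree.ElementTree as ET

# Namespace bound to the "svg" prefix in ODF documents (from Table 2 above).
ODF_SVG_NS = "urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
# Namespace used by real SVG 1.1 documents.
W3C_SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical fragment: a gradient written with ODF's "svg" prefix.
fragment = (
    f'<svg:linearGradient xmlns:svg="{ODF_SVG_NS}" '
    'x1="0" y1="0" x2="1" y2="1"/>'
)
element = ET.fromstring(fragment)


def is_w3c_svg(elem):
    """True only if the element lives in the W3C SVG namespace."""
    return elem.tag.startswith("{" + W3C_SVG_NS + "}")


odf_tag = element.tag              # qualified name as the parser sees it
looks_like_svg = is_w3c_svg(element)
print(odf_tag)
print(looks_like_svg)
```

The prefix is the same, but the qualified name is not, so a namespace-aware SVG library has no reason to treat such an element as SVG at all.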


The phrase "compatible to elements or attributes" seems quite odd to me, since it should not be necessary to say this unless the referenced standard is modified. I did another quick search and stumbled over these sections of the specification:

  • 14.14.2 SVG Gradients
  • 15.13.13 Line Join

Let me quickly walk through the contents of each section.

14.14.2 SVG Gradients

Section 14.14.2 says, amongst other things:

In addition to the gradients specified in section 14.14.1, gradient may be defined by the SVG gradient elements <linearGradient> and <radialGradient> as specified in §13.2 of [SVG].

Cool!

Now, the section goes on as

The following rules apply to SVG gradients if they are used in documents in OpenDocument format:

  • The gradients must get a name. It is specified by the draw:name attribute.
  • For <linearGradient>, only the attributes gradientTransform, x1, y1, x2, y2 and spreadMethod will be evaluated.
  • For <radialGradient>, only the attributes gradientTransform, cx, cy, r, fx, fy and spreadMethod will be evaluated.
  • The gradient will be calculated like having a gradientUnits of objectBoundingBox, regardless what the actual value of the attribute is.
  • The only child element that is evaluated is <stop>.
  • For <stop>, only the attributes offset, stop-color and stop-opacity will be evaluated.
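The rules above can be sketched in a few lines of Python. This is only an illustration of how I read the list, assuming an ODF consumer silently ignores everything the section does not mention; the function names and the sample gradient are hypothetical, and the draw:name requirement is left out.

```python
import xml.etree.ElementTree as ET

# Attributes that section 14.14.2 says will be evaluated, per element.
EVALUATED = {
    "linearGradient": {"gradientTransform", "x1", "y1", "x2", "y2", "spreadMethod"},
    "radialGradient": {"gradientTransform", "cx", "cy", "r", "fx", "fy", "spreadMethod"},
    "stop": {"offset", "stop-color", "stop-opacity"},
}


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def odf_view(elem):
    """Split an element's attributes into (evaluated, ignored) under the ODF rules."""
    name = local_name(elem.tag)
    allowed = EVALUATED.get(name, set())
    kept = {k: v for k, v in elem.attrib.items() if k in allowed}
    ignored = {k: v for k, v in elem.attrib.items() if k not in allowed}
    if name in ("linearGradient", "radialGradient"):
        # Calculated as objectBoundingBox, whatever the file actually says.
        kept["gradientUnits"] = "objectBoundingBox"
    return kept, ignored


# Made-up gradient as a plain SVG tool might write it.
svg_gradient = ET.fromstring(
    '<linearGradient id="g1" x1="0" y1="0" x2="100" y2="0" '
    'gradientUnits="userSpaceOnUse" spreadMethod="pad">'
    '<stop offset="0" stop-color="red"/>'
    '<stop offset="1" stop-color="blue" stop-opacity="0.5"/>'
    '</linearGradient>'
)
kept, ignored = odf_view(svg_gradient)
print(kept)
print(ignored)
```

In this example x2="100" was written in user space units, but under the ODF rules it ends up being read relative to the object's bounding box, which is a very different gradient.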

So, to be able to determine whether ODF is only referencing SVG, we need to look at section 13.2 of the SVG spec. It says:

<!ELEMENT %SVG.linearGradient.qname; %SVG.linearGradient.content; >
<!-- end of SVG.linearGradient.element -->]]>
<!ENTITY % SVG.linearGradient.attlist "INCLUDE" >
<![%SVG.linearGradient.attlist;[
<!ATTLIST %SVG.linearGradient.qname;
    %SVG.Core.attrib;
    %SVG.Style.attrib;
    %SVG.Color.attrib;
    %SVG.Gradient.attrib;
    %SVG.XLink.attrib;
    %SVG.External.attrib;
    x1 %Coordinate.datatype; #IMPLIED
    y1 %Coordinate.datatype; #IMPLIED
    x2 %Coordinate.datatype; #IMPLIED
    y2 %Coordinate.datatype; #IMPLIED
    gradientUnits ( userSpaceOnUse | objectBoundingBox ) #IMPLIED
    gradientTransform %TransformList.datatype; #IMPLIED
    spreadMethod ( pad | reflect | repeat ) #IMPLIED  
>

So it seems that at least the attribute gradientUnits is not used in the ODF-adapted version of SVG.

For <radialGradient>, we need to cross-reference the corresponding DTD fragment in SVG. It says:

<!ENTITY % SVG.radialGradient.extra.content "" >
<!ENTITY % SVG.radialGradient.element "INCLUDE" >
<![%SVG.radialGradient.element;[
<!ENTITY % SVG.radialGradient.content
    "(( %SVG.Description.class; )*, ( %SVG.stop.qname; | %SVG.animate.qname;
    | %SVG.set.qname; | %SVG.animateTransform.qname;
    %SVG.radialGradient.extra.content; )*)"
>
<!ELEMENT %SVG.radialGradient.qname; %SVG.radialGradient.content; >
<!-- end of SVG.radialGradient.element -->]]>
<!ENTITY % SVG.radialGradient.attlist "INCLUDE" >
<![%SVG.radialGradient.attlist;[
<!ATTLIST %SVG.radialGradient.qname;
    %SVG.Core.attrib;
    %SVG.Style.attrib;
    %SVG.Color.attrib;
    %SVG.Gradient.attrib;
    %SVG.XLink.attrib;
    %SVG.External.attrib;
    cx %Coordinate.datatype; #IMPLIED
    cy %Coordinate.datatype; #IMPLIED
    r %Length.datatype; #IMPLIED
    fx %Coordinate.datatype; #IMPLIED
    fy %Coordinate.datatype; #IMPLIED
    gradientUnits ( userSpaceOnUse | objectBoundingBox ) #IMPLIED
    gradientTransform %TransformList.datatype; #IMPLIED
    spreadMethod ( pad | reflect | repeat ) #IMPLIED
>

So here the attribute gradientUnits is not used either.

But luckily the good guys at the ODF TC have solved this mystery for us: they have decided that the (non-existing) attribute gradientUnits is always treated as having the value "objectBoundingBox", regardless of the value actually passed. It's a bit odd, but I suppose it has something to do with the way the SVG fragments position themselves relative to the other objects in the document.

15.13.13 Line Join

Section 15.13.13 reads:

The attribute draw:stroke-linejoin specifies the shape at the corners of paths or other vector shapes, when they are stroked. The values are the same as for [SVG]'s stroke-linejoin attribute, except that the attribute in addition to the values supported by SVG may have the value middle, which means that the mean value between the joints is used.

They have even been so kind as to provide us with a schema fragment defining the possible values of this attribute in ODF:

<define name="style-graphic-properties-attlist" combine="interleave">
    <optional>
        <attribute name="draw:stroke-linejoin">
            <choice>
                <value>miter</value>
                <value>round</value>
                <value>bevel</value>
                <value>middle</value>
                <value>none</value>
                <value>inherit</value>
            </choice>
        </attribute>
    </optional>
</define>

Compare this with the DTD of SVG (Appendix A.1.7 Paint Attribute Model):

<!ENTITY % SVG.stroke-linejoin.attrib
    "stroke-linejoin ( miter | round | bevel | inherit ) #IMPLIED"
>

So the attribute value "middle" is indeed an addition to SVG.
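The comparison is easy to do mechanically. The small Python sketch below simply copies the two value lists from the schema fragment and the DTD line quoted above and takes the set difference; the names are my own.

```python
# Values allowed by the SVG DTD for stroke-linejoin (quoted above).
SVG_LINEJOIN = {"miter", "round", "bevel", "inherit"}
# Values allowed by the ODF schema for draw:stroke-linejoin (quoted above).
ODF_LINEJOIN = {"miter", "round", "bevel", "middle", "none", "inherit"}

odf_only = sorted(ODF_LINEJOIN - SVG_LINEJOIN)  # accepted by ODF, unknown to SVG
svg_only = sorted(SVG_LINEJOIN - ODF_LINEJOIN)  # accepted by SVG, unknown to ODF


def accepted_by_svg(value):
    """Would a validator built on the quoted SVG DTD line accept this value?"""
    return value in SVG_LINEJOIN


print(odf_only)
print(svg_only)
```

Going by the two fragments as quoted, the difference contains "none" as well as "middle", even though the prose of the section only mentions "middle".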

Conclusion 

You might be wondering whether a couple of additions to and exclusions from SVG are really worth an entire article, and you kind of have a point. However, the devil lies in the details.

The modifications to SVG (even if they are minor) are bad enough as they are, because they basically kill high-fidelity interoperability when using existing SVG libraries. When you restrict the usage of a component (the limitation on the values of gradientUnits) you basically lose control of how existing data behaves. And when you enlarge a standard (the addition of the value middle to the stroke-linejoin attribute) you lose control of how your own data behaves when it is used in other scenarios. You know, this is exactly what Microsoft did when they extended not only CSS but also JavaScript. Maybe the memory of the ODF founders is not that great, but I certainly remember the loads of crap-work we had to do in the late nineties when creating web pages for "IE5-compatible" browsers and "the rest". In fact, this nightmare still haunts us with the Microsoft additions to JavaScript. Maybe they just thought: "If Microsoft pulled it off, so can we". I think that's a bad choice.

Also, you should note that ODF does not use SVG "as such" at all. It uses fragments of SVG, i.e. elements with the same names and attributes, and fits them into the overall architecture of ODF. This is hardly "just referencing". As the stroke-linejoin paragraph quoted above says, the elements specifying this are not SVG elements. They are similar to SVG elements and even extended beyond them. I actually find it really hard to understand how the ODF TC can claim - with a straight face - that ODF only references SVG. I suppose that if I made my own JLSMarkup for document formats and used an element called <body>, I would also be able to claim that I was reusing W3C XHTML 1.0. I just don't find it the right thing to do.

My only surprise is that this has not surfaced until now. How anyone (pro-ODF or pro-choice) can sit down and read the ODF spec and not be just a little confused about the claim of "just referencing existing standards" is a bit mind-boggling to me. I suppose ECMA could do the same with OOXML and claim "reuse of the HTML DOM in the OOXML architecture", since a WordprocessingML document contains both a <body> element and a <p> element.

Post scriptum

On his blog, Brian Jones wondered in his last comment on the thread "Why all the secrecy?" whether you could take an existing SVG drawing, put it into an ODF document and expect it to work. Well, just like OOXML, ODF places no limits on what kind of data you might want to put into it, so using SVG in an ODF document is indeed possible from a technical/architectural point of view. It is not a format question but an implementation-specific question. However - will it work?

ODF has several ways to embed data into the document. The two relevant means are inclusion of an SVG-drawing as an image and inclusion of an SVG-image as an object. ODF supports two ways to embed an object, as stipulated in section 9.3.3:

A document in OpenDocument format can contain two types of objects, as follows:

  1. Objects that have an OpenDocument representation. These objects are:
    1. Formulas (represented as [MathML])
    2. Charts
    3. Spreadsheets
    4. Text documents
    5. Drawings
    6. Presentations
  2. Objects that do not have an XML representation. These objects only have a binary representation. An example of this kind of object is OLE objects (see [OLE]).

 

Well, SVG is clearly XML but it is not an "OpenDocument representation" - but then again, neither is MathML. So I'll try these two methods of embedding an SVG drawing in an ODT document:

  • Insert the SVG-drawing as an image
  • Insert the SVG-drawing as an XML part using the <draw:object>-element as specified in section 9.3.3 of the ODF spec.

I'll use the latest and greatest release of OOo, OpenOffice 2.3.1 DA, to try to display the files. You can see the SVG-file here: ex.svg (482,00 bytes)

Insert SVG as an image

I have created a small ODT document, referenced the SVG file from content.xml as a regular image and put the SVG file in a folder of its own inside the package. The content.xml file is shown below.

<?xml version="1.0" encoding="UTF-8" ?>
<office:document-content
 xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
 xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
 xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
 xmlns:xlink="http://www.w3.org/1999/xlink"
 xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
>
 <office:body>
  <office:text>
   <text:p>Test of insertion of SVG-image in ODT-document</text:p>
   <text:p>
    <draw:frame
     draw:style-name="fr1"
     draw:name="grafik1"
     text:anchor-type="paragraph"
     svg:width="17cm"
     svg:height="13cm"
     draw:z-index="0">
     <draw:image
      xlink:href="SVG/ex.svg"
      xlink:type="simple"
      xlink:show="embed"
      xlink:actuate="onLoad" />
    </draw:frame>
   </text:p>
  </office:text>
 </office:body>
</office:document-content>

As you can see, the SVG image is simply added as a regular image using the ODF-modified version of SVG. The ODT file is available here: test svg image.odt (1,48 kb). Anyone want to take a guess at what the result of opening this file will be?

 

 

Insert SVG as an "XML-object"

As noted above, ODF allows insertion of objects with an "XML representation" as just a text file. The construction of the ODF package is a bit more complicated, and I'd be happy if anyone could tell me whether I made a mistake - and what the correct way would be. As the basis for my file I have used an ODT file with an embedded MathML formula, and so I'll again just show the contents of content.xml below.

<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
  xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
  xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
  xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
  office:version="1.0">
  <office:body>
    <office:text>
      <text:p >Test of insertion of SVG in OOo</text:p>
      <text:p >
        <draw:frame
          draw:name="My SVG drawing [JLS]"
          text:anchor-type="as-char"
          svg:width="1.011cm"
          svg:height="0.467cm"
          draw:z-index="0"
        >
          <draw:object
            xlink:href="./SVG"
            xlink:type="simple"
            xlink:show="embed"
            xlink:actuate="onLoad"
           />
        </draw:frame>
      </text:p>
    </office:text>
  </office:body>
</office:document-content>

Again an xlink reference to the SVG-file is "simply" added to content.xml. The ODT-file is available here: test insert svg.odt (1,48 kb). Anyone want to take a guess on what the result of opening this file will be?

 

 

 

So it seems to recognize the SVG filetype - it just doesn't understand how to process it.

I have a feeling that I might have made an error in the manifest, so I'll include it here and hopefully someone can pinpoint if there is an error:

<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0">
 <manifest:file-entry manifest:media-type="application/vnd.oasis.opendocument.text" manifest:full-path="/"/>
 <manifest:file-entry manifest:media-type="image/svg+xml" manifest:full-path="SVG/ex.svg"/>
 <manifest:file-entry manifest:media-type="application/vnd.oasis.opendocument.image" manifest:full-path="SVG/"/>
</manifest:manifest>
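For anyone who wants to reproduce the package and check the manifest mechanically, here is a rough Python sketch using only the standard library. It assumes the package holds content.xml, SVG/ex.svg and the manifest quoted above, and it follows the packaging convention of writing an uncompressed mimetype entry first. The helper names and the placeholder file contents are hypothetical, and whether OOo actually cares about a missing manifest entry is not something this sketch can tell you.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"
MANIFEST_PATH = "META-INF/manifest.xml"


def build_odt(parts):
    """Build an ODT package in memory from a {path: text} mapping."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        # The mimetype entry is written first and stored uncompressed.
        package.writestr(
            zipfile.ZipInfo("mimetype"),
            "application/vnd.oasis.opendocument.text",
            compress_type=zipfile.ZIP_STORED,
        )
        for path, data in parts.items():
            package.writestr(path, data)
    return buffer.getvalue()


def unlisted_files(odt_bytes):
    """Return package files that have no manifest:file-entry of their own."""
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as package:
        names = set(package.namelist())
        manifest = ET.fromstring(package.read(MANIFEST_PATH))
    listed = {
        entry.get("{%s}full-path" % MANIFEST_NS)
        for entry in manifest.iter("{%s}file-entry" % MANIFEST_NS)
    }
    return sorted(names - listed - {"mimetype", MANIFEST_PATH})


# The manifest exactly as quoted above.
manifest_xml = """<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0">
 <manifest:file-entry manifest:media-type="application/vnd.oasis.opendocument.text" manifest:full-path="/"/>
 <manifest:file-entry manifest:media-type="image/svg+xml" manifest:full-path="SVG/ex.svg"/>
 <manifest:file-entry manifest:media-type="application/vnd.oasis.opendocument.image" manifest:full-path="SVG/"/>
</manifest:manifest>"""

odt = build_odt({
    "content.xml": "<placeholder/>",  # stand-in for the real content.xml
    "SVG/ex.svg": "<svg xmlns='http://www.w3.org/2000/svg'/>",  # stand-in drawing
    MANIFEST_PATH: manifest_xml,
})
missing = unlisted_files(odt)
print(missing)
```

Under these assumptions the check reports that content.xml has no entry of its own in the manifest, which might be one place to start looking.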

OOo and SVG

I have said before that the devil lies in the details - but here it actually lies right up front. You see, OpenOffice.org does not presently (version 2.3.1) support SVG. It doesn't support SVG as regular images and it does not support SVG as vector graphics or "line art". You can import SVG images into OOo, but they are converted to OpenDocument Draw, and OpenDocument Draw data can be exported to SVG. The import/export is not done by OOo itself but by a filter that converts the SVG into the internal ODF Draw format. SVG support is apparently the single most requested feature in OOo, so maybe it will soon be a part of OOo. Also take a look at the "General note" on the "Unsupported SVG features" page of the filter:

SVG and what's named SVG-compatible in OpenDocument is really different. Therefore, the import filter can only approximate the SVG contents.

Ooh - and incidentally - the way ODF and OOo handle SVG is exactly the same way OOXML and Microsoft Office 2007 handle MathML.

:-)

ECMA has sent out the final responses

Yesterday was the day the final responses from ECMA were made available to the national bodies around the world. ECMA has thereby responded to all of the roughly 3500 comments that came in during the processing of DIS 29500 in the summer and autumn of 2007.

While working on the standard and following the discussions about it over the summer, I could not help thinking that a great many of the comments were pure nonsense or at best irrelevant. They seemed to be written according to the motto "if I deliberately try to misunderstand this - where is that easiest to do?" (example: OLE). Of course there were many good comments, but many of them were factually rubbish.

But I have to admit, now that I am looking at the result of the comment processing, that the total body of comments has resulted in a standard that is in many ways better than it was before. The standard has quite simply become more precisely worded and generally easier to use. That clearly deserves a nod of recognition to all the people who (whether they are for or against OOXML) have combed through the draft standard. Thank you! It is worth stressing that the standard has not been completely redone - rather, it has been improved in a number of areas where it needed polishing. The architecture itself is the same, so any energy you may have spent on adopting the existing ECMA-376 is certainly not wasted. The points where I think the biggest improvements have come are:

  • There is no longer any requirement to use VML in new documents
  • Country codes must now be given as specified in RFC-4646
  • It is clearer that OOXML should use existing, well-tested hash algorithms, as specified in FIPS-180 among others
  • The conformance requirements have become clearer
  • The famous "leap year bug" is now marked as deprecated
  • It is possible to use dates before 1900
  • The formula specifications for spreadsheets are now described in EBNF notation

And what about the rest of the many comments, such as the "compatibility elements"? Well, I have only mentioned the parts that I consider the most important (and I have of course surely forgotten some other important ones).

:-)

Another bucket of responses from ECMA

ECMA is now ready with another bucket of responses to the various countries in connection with the work on DIS 29500. Their press release shows that ECMA has now responded to 92% of the 3500 comments received, and it looks as if they will manage to complete all responses before the deadline on Monday 14 January 2008. Of the Danish comments, only a handful still await a response, and it will be exciting to see what ECMA answers on the last points.

One thing I have had trouble figuring out is whether ISO/IEC will allow ECMA to publish a new consolidated standard with all corrections included. Do any of you readers have this information? As far as I read the JTC1 directives, they may not publish the individual dispositions by themselves, nor the comments from the countries, so the only way to get the responses to the comments out is presumably to publish the final, full report. Personally I do not think ECMA will publish the full, revised standard until after the BRM in February - but whatever the outcome, it is a bit of a choice between cholera and plague.

I must honestly admit that I have enjoyed the peace to work in the months since 2 September 2007, and especially since ECMA began circulating the responses to the individual countries. Clearly there has not been much debate - in fact much less than I had expected - but the individual countries are also in a completely different situation now.

In the first part of the five-month ballot period it was, in my view, a clear advantage that OOXML was discussed so widely, because it uncovered a long list of shortcomings and impracticalities in the standard. I doubt that the individual countries could have delivered the same work had it not been for GrokDoc, IBM, Andy and others who combed through the OOXML spec for errors. There was an imminent risk that the countries would simply have voted "abstain" because they could not understand the spec - just as they did with ODF in December 2006. Fortunately they did not, and the situation now is that the individual countries must assess whether ECMA's responses to the individual comments are good enough. That is of course work of an entirely different nature, and it is my view that we do not need the key people from the other side of the river for it.

But - it will be exciting to see the final result of ECMA TC45's work.

:-)

 

Hypocrisy 101

Software politics: Hypocrisy 101

Course Title: Hypocrisy 101
Major: Software politics
Points earned: 10 ECTS
Prerequisites: none
Attending professor: Mr. Rob Weir, IBM

Course abstract:

When participating in the ever-evolving landscape of software politics, you need to master a variety of tools essential to ensure your success and accomplish your goals. This course will give you the ability to master any discussion involving software politics and to blow your opponents off the field.

Course contents:

Deny, deny, deny: 5 lessons

Introduction: You will have to be able to deny anything at any place without flinching. Even if your opponent has fact-based arguments, simply dismiss these as either

  1. wrong
  2. from questionable resources
  3. or based on faulty assumptions

Remember, you choose what you want to comment on and you choose which arguments to take. 

You’re fucked either way: 4 lessons

Introduction: Never say anything wrong but also never say anything correct. If you are challenged on something you said, simply take any valid and solid resource and pick something out of context. If you are challenged on this, do one more iteration. Most documents can be interpreted either way, and only your imagination limits you in what you can do. 

Talk is silver but silence is gold: 6 lessons

Introduction: This is tricky to apply in a real-world, offline discussion - but when applied to an asynchronous discussion on a blog, it works wonders. Whenever you feel challenged or cornered in a discussion, simply leave for a few days. This will effectively dampen the discussion and cool everything off. Then come back after a few days and pick up another, easier discussion as if nothing happened. If someone complains about you leaving the discussion, simply argue that you have a day job to do to support wife and kids and that bloggin' is your secondary activity.

Beat around the bush: 8 lessons

Introduction: Aside from not being concrete about anything, you will have to master appearing to answer a question when you are really not. A perfect real-world example of this is from the blog of Brian Jones.

Brian Jones said:

You know what Rob, how about if you just take the questions and responses and post them on your own site? You are a member of the US national body and you have access to all of the materials. If it's no big deal, then post them for everyone.

To which the attending professor, Rob Weir, replied:

I wouldn't want to give the Ecma password out, because the Ecma server is already slow as it is, and I wouldn't want to put more load on it.  Best to keep that for NB-access only.

Notice how this should be mastered? It appears that the question is answered ... when really, it's not.

 

With regards to which books and resources will be used throughout the course, feel free to contact the attending professor, preferably on his blog at http://www.robweir.com/blog/ .

Smile!

Further responses from ECMA

The responses to the 750 unique comments from the national bodies have now begun to arrive in a steady stream from ECMA. The latest announcement from ECMA is that 51% of the comments have now been processed and are available from the ISO/IEC-controlled website. The announcement also lists some of the proposals they have chosen to follow. It is an interesting list, especially since they have actually chosen to make some changes that I, at least, had not expected them to make. The list is on their website, but I have added it to this post as well:

  • Use of ISO dates
    This is actually one of the things I am indifferent to, but it looks as if dates in spreadsheets may now be persisted both as ISO dates and as numbers. This is the same approach as in ODF, where you also have the choice between numbers and dates (which, by the way, has gone right over most people's heads)
  • Internationalisation of weekdays
    One criticism of OOXML has been the spreadsheet function WEEKDAY(), which does not allow a week to start on anything other than Sunday or Monday. It will now be possible to specify the day on which the week starts
  • Language specifications
    As far as I remember this is the "ST_Lang issue", where dates had to be specified in two different ways instead of using a given standard. Great that this is now being fixed
  • Page borders
    The graphics available for page borders were previously a closed list, i.e. not very flexible. The list now becomes open. That is a good idea, since control over the document's visual layout is thereby given back to the user ... and is not reserved for the format itself.
  • Formula grammar
    OOXML used its own way of describing formulas, and not only was it not a standardised way of doing it, it was also incomplete. It is now being changed to use a so-called "extended BNF notation". The technical reference for this is "ISO/IEC 14977:1996 – Syntactic metalanguage – Extended BNF."
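Two of the items in this list, dates stored as numbers versus ISO dates and the first day of the week, are easy to illustrate. The Python sketch below is only an illustration under my own assumptions: it uses the familiar 1900-based serial numbering, in which serial 60 is the non-existent 29 February 1900, and a made-up week_starts_on parameter that is not the WEEKDAY() signature from the specification.

```python
from datetime import date, timedelta


def serial_to_iso(serial):
    """Convert a 1900-based spreadsheet date serial to an ISO 8601 date string."""
    if serial < 1:
        raise ValueError("serial must be 1 or greater")
    if serial == 60:
        # Serial 60 is the fictitious 29 February 1900 (the "leap year bug").
        raise ValueError("serial 60 does not correspond to a real date")
    # Serials after 60 are shifted by one day because of the phantom date.
    epoch = date(1899, 12, 31) if serial < 60 else date(1899, 12, 30)
    return (epoch + timedelta(days=serial)).isoformat()


def weekday_number(day, week_starts_on=0):
    """Number a date 1..7 within a week that starts on week_starts_on.

    week_starts_on follows Python's convention: 0 = Monday ... 6 = Sunday.
    """
    return (day.weekday() - week_starts_on) % 7 + 1


first_day = serial_to_iso(1)
after_phantom = serial_to_iso(61)
monday = date(2008, 1, 14)
print(first_day, after_phantom)
print(weekday_number(monday, week_starts_on=6))  # week starting on Sunday
print(weekday_number(monday, week_starts_on=5))  # week starting on Saturday
```

Storing the ISO string instead of the serial avoids the special case around serial 60 altogether, which is presumably part of the appeal of allowing both.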

I attended the ECMA meeting on 7 December 2007 and the ISO/IEC plenary on 8 December 2007 in Kyoto as a representative of Denmark, and the other national body representatives and I took part in, among other things, ECMA's discussions of several of the points above. It was good to hear, and be assured, that they genuinely cared about our views on their corrections, and several times they asked us specifically for our respective positions on one solution or another. It was clear to me how important the approval of DIS 29500 is, and that the format was not sacred in its current form; there was an acceptance that the format would be changed as a result of the process.

Should you be interested in my own personal travel experiences, drop by www.ringtilcamilla.dk :-)

What's up with OLE?

A few weeks back I wrote an article about how Microsoft Office 2007 deals with password protection of an OPC package, since this feature is not a part of the OOXML specification. The answer I found was that Microsoft Office 2007 persists the password-protected file as an OLE2 Compound File ... more commonly known as an "OLE-file". I also concluded that using OLE2 Compound Files is not a problem - and certainly not an issue regarding OOXML.

Now - the whole topic around OLE has been at the front row of the worldwide debates regarding OOXML. My personal opinion is that the people jumping up and down screaming about problems with OLE ... really haven't understood what OLE is.

So let me start with a small recap of what it is really all about.

... there is OLE and then there is OLE 

First of all:

there is "OLE" and then there is ... "OLE"

... or put in another way:

there is the "OLE-technology" and then there is the "OLE-file"

or in a third, more correct, way:

there is the "OLE application technology"  and then there are "Compound Files".

The former is the technology that - on the Windows platform - enables a program to use the UI of another program ... without launching the entire application itself. I mostly use this when editing MS Visio drawings in Word, but another use is working with an Excel spreadsheet inside an MS Word document. The OLE technology itself is a tool on the Windows platform that all applications can - and do - use to enable "utilizing other applications in their own applications". It is important to understand that there is (today) nothing really revolutionary about OLE. Another similar technology on the Windows platform is DDE, and on the Linux platform it could be KParts and Bonobo. These technologies simply enable one program to communicate with another (simply put).

But what about these OLE-files?

Well, Compound Files are actually not dependent on OLE technology. Or put another way: you don't need OLE technology to read and use the contents of a Compound File. Compound Files are just files. A Compound File is a collection of persisted streams - actually much like a ZIP archive. Most commonly it is used because it brings the ability to "utilize a file system within a file". Of course you will need to know how to use the contents of the file, be it created by OpenOffice, Corel Draw, Adobe Acrobat or any other application that might store its files as Compound Files. But this is separate from being able to read and write the contents of a Compound File.
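To underline that a Compound File is "just a file", here is a small Python sketch that tells the two container types apart by looking at the bytes alone, with no OLE technology involved. The eight-byte signature is the one Compound Files start with; the function name and the test data are made up for the illustration.

```python
import io
import zipfile

# Eight-byte signature found at the start of an OLE2 Compound File.
COMPOUND_FILE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def container_kind(data):
    """Classify raw bytes as 'compound-file', 'zip' or 'unknown'."""
    if data.startswith(COMPOUND_FILE_MAGIC):
        return "compound-file"
    if zipfile.is_zipfile(io.BytesIO(data)):
        return "zip"
    return "unknown"


# Made-up test data: a tiny ZIP archive and a fake Compound File header.
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, "w") as archive:
    archive.writestr("content.xml", "<placeholder/>")

zip_kind = container_kind(zip_buffer.getvalue())
cf_kind = container_kind(COMPOUND_FILE_MAGIC + b"\x00" * 504)
print(zip_kind, cf_kind)
```

Recognising the container is of course only the first step; making sense of the streams inside is the application-specific part described above.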

Ok - I will not bother you any more with this. You should check out the original article about OLE and also look into the specification of the binary formats for Microsoft Office 95 - Office 2007, available from Microsoft. It is actually quite interesting. Just remember that OLE technology and Compound Files are not the same thing.

And now for something completely different (kind of)

In the lab tests I have been part of for the Danish Government (National IT and Telecom Agency) we have tested OLE interoperability. It is important since it is quite normal to embed e.g. a spreadsheet file in a text-processing file, so the contents of the file must actually be usable when it is received and opened using another application or on another platform. In this setup we only tested Compound File interop and not interop between OOXML and ODF.

What we did was this:

We created an ODF file using OpenOffice in which we embedded an Excel spreadsheet (binary .XLS file) (on the Windows platform)

We sent this file to a number of different platforms and applications

  • Windows XP using OpenOffice.org 2.3 DA
  • Windows XP using OpenOffice Novell Edition
  • Linux using OpenOffice Novell Edition
  • Linux (SLED) using IBM Lotus Notes 8

We tried to open the file and documented what happened.

  1. Setup: Windows XP using OpenOffice.org 2.3 DA
     What happened? OpenOffice.org opened the document and correctly displayed the contents of the spreadsheet. It was possible to edit the spreadsheet and save it back into the ODF-container
  2. Setup: Windows XP using OpenOffice Novell Edition
     What happened? OpenOffice Novell Edition opened the document and correctly displayed the contents of the spreadsheet. It was possible to activate the spreadsheet but only in "read-only"-mode
  3. Setup: Linux using OpenOffice Novell Edition
     What happened? OpenOffice Novell Edition opened the document and correctly displayed the contents of the spreadsheet. It was possible to activate the spreadsheet but only in "read-only"-mode
  4. Setup: Linux (SLED) using IBM Lotus Notes 8
     What happened? Lotus Notes 8 opened the document and correctly displayed the contents of the spreadsheet. When activating the spreadsheet the user was prompted to convert the spreadsheet. When accepting this it became editable and when saving it back into the ODF-container, the spreadsheet was persisted as an Open Document Spreadsheet.


So what we saw was basically three different approaches to handling the embedded object. In general the Excel object (Compound File) itself was not a problem - regardless of application and platform. Every combination opened the file and displayed the contents without problems - even on platforms without OLE technology present. The difference was in the applications and their handling of the object. OpenOffice.org took the approach that most people would expect: it allowed editing the embedded object and saving it back into the container. OpenOffice Novell Edition allowed activating the embedded object but not saving it back into the container, and Lotus Notes 8 took the approach of converting the Excel object to an Open Document Spreadsheet.
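As a small illustration of what "a Compound File inside an ODF container" means in practice, here is a minimal sketch that lists the entries of an ODF package that look like OLE compound files. It relies on two facts: an ODF package is a ZIP archive, and a compound file starts with the eight signature bytes D0 CF 11 E0 A1 B1 1A E1. The function name `find_compound_files` and the entry name "Object 1" are my own placeholders, not something taken from the lab setup.

```python
import io
import zipfile

# Signature bytes found at the start of every OLE compound file.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def find_compound_files(package):
    """Return the names of ZIP entries that start with the compound file signature.

    `package` may be a file name or a file-like object (hypothetical helper).
    """
    found = []
    with zipfile.ZipFile(package) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            with zf.open(info) as entry:
                if entry.read(len(OLE_MAGIC)) == OLE_MAGIC:
                    found.append(info.filename)
    return found


if __name__ == "__main__":
    # Build a tiny stand-in package in memory; a real .odt file would be passed by name.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("content.xml", "<office:document-content/>")
        zf.writestr("Object 1", OLE_MAGIC + b"\x00" * 16)  # placeholder entry name
    buf.seek(0)
    print(find_compound_files(buf))
```

Finding the object this way says nothing about what an application does with it afterwards, which is exactly where the three approaches above differ.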

A conclusion?

Well, we took great care not to conclude much - that was not for us to do, we merely provided the technical background for post-lab conclusions. However - the pattern emerging from the description above was similar to a pattern we saw a lot. The problems were not in incompatibility between the formats but instead in how the applications and converters dealt with the formats. We also saw no indications that any of the formats were tied to a specific platform. There were no platform problems with round-tripping - or to put it more clearly: the problems we saw when round-tripping documents were not caused by incompatibilities between the platforms (e.g. Linux and Windows) but by different behaviour in the applications implemented on either platform.

So is this good or bad news? Well, as always, truth lies in the eyes of the beholder ... but I think it is good news. 

Where did my line go?

When we started doing our tests in the lab and started thinking about what we expected to see, we had a very clear understanding that it would not all be blue-sky conversions and that we would identify problems - some more severe than others. We were also pretty aware that there would be areas where conversion was just not possible.

But - I am pretty sure I speak for the rest of the group - we were quite surprised to see which areas this concerned.

One area where absolutely nothing could be converted was ... lines. Not only line art, not only complex line drawings ... but simply - lines.

Lines are done in OOXML as either VML or DrawingML, and in ODF they are done using an SVG-derivative. The puzzling thing is that this area is apparently simply left out of all of the converters. We made some simple documents (line.docx 10,47 kb) and (line.odt 6,60 kb)  [I have re-made these for this article]. When converting these files using CleverAge 1.0 on Microsoft Office 2003 and 2007, Novell OOXML Translator (on Windows and SLED) or IBM Lotus Notes 8 (on SLED), the lines are simply removed. They are not altered, they are not just hidden, they are not moved to a different location in the document ... they are just removed.

This is another example of the overall observation from our tests ... the quality of the converters is simply not good enough today. If you look at the XML in either of the files above, you will see that even though they look different, they basically specify the same thing (start and end point for the line drawn), so technically it should pose no problem to do a better conversion.
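To make the "same thing, different XML" point concrete, here is a rough sketch of what such a mapping could look like. It assumes the ODF line is a `draw:line` element carrying `svg:x1`, `svg:y1`, `svg:x2` and `svg:y2` attributes and that the target is a VML `v:line` with `from` and `to` attributes; the markup in the actual line.odt and line.docx files may well differ, and the function names are mine. Styling, anchoring and positioning relative to the page are ignored entirely.

```python
import xml.etree.ElementTree as ET

DRAW_NS = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
SVG_NS = "urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
VML_NS = "urn:schemas-microsoft-com:vml"

# Conversion factors from common length units to points.
PT_PER_UNIT = {"cm": 72 / 2.54, "mm": 72 / 25.4, "in": 72.0, "pt": 1.0}


def to_points(length):
    """Convert a length such as '2.5cm' to points (hypothetical helper)."""
    if length is None:
        raise ValueError("missing coordinate")
    for unit, factor in PT_PER_UNIT.items():
        if length.endswith(unit):
            return float(length[: -len(unit)]) * factor
    raise ValueError(f"unsupported length: {length!r}")


def odf_line_to_vml(odf_fragment):
    """Map the start and end point of an ODF draw:line onto a VML v:line element."""
    line = ET.fromstring(odf_fragment)
    if line.tag != f"{{{DRAW_NS}}}line":
        raise ValueError("expected a draw:line element")
    x1, y1, x2, y2 = (
        to_points(line.get(f"{{{SVG_NS}}}{name}")) for name in ("x1", "y1", "x2", "y2")
    )
    return ET.Element(
        f"{{{VML_NS}}}line",
        {"from": f"{x1:.2f}pt,{y1:.2f}pt", "to": f"{x2:.2f}pt,{y2:.2f}pt"},
    )


if __name__ == "__main__":
    sample = (
        f'<draw:line xmlns:draw="{DRAW_NS}" xmlns:svg="{SVG_NS}" '
        'svg:x1="1in" svg:y1="1in" svg:x2="2in" svg:y2="1in"/>'
    )
    ET.register_namespace("v", VML_NS)
    print(ET.tostring(odf_line_to_vml(sample), encoding="unicode"))
```

A real converter has far more to deal with than this, but the core of the line (where it starts and where it ends) carries over directly.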

It is often said that the main problem with converting from ODF to OOXML (and vice versa) is incompatibilities between the formats. This example at first glance supports that argument, but if you dig a bit deeper into the technicalities of it, it simply boils down to a problem with bad converters.

Conclusion: The world is seldom black/white ... even if people are trying to convince you so. More often, the world is grey and depressing as a rainy day. 

Guidance from the National IT and Telecom Agency (IT- og Telestyrelsen)

Yesterday afternoon the National IT and Telecom Agency published its guidance material on the use of open document formats in the public sector. If you are a journalist or responsible for IT in a public authority, their guidance is a "must-read" for you - it is full of valuable information.

See it at http://dokumentformater.oio.dk 

What is a conversion, really?

I have been part of some work for the Danish National IT and Telecom Agency (IT- og Telestyrelsen). They have coordinated quite a few projects around the country to evaluate the usage of ODF and OOXML and possible problems with co-existence of the two document standards. The website for this work is at http://dokumentformater.oio.dk .

The basic setup for the projects and tests has been:

  • How does a particular department handle the two document formats and possible conversion between them?
  • Which problems will arise given their current software install-base?
  • Is it possible to provide some guidance to the departments regarding which specific features of a document format to avoid since they cause problems?

In other words it has been a rather pragmatic approach based on trying to answer the question: "Why do you experience the problems you see?"

Observations

The first thing we realized during the very first day was something quite crucial:

We were not testing compatibility between two formats - instead we were testing quality of converter-tools and compatibility between the specific format and the internal object model the format is loaded into.

Converter-tools

Both OOXML and ODF are rather immature document formats in the market today since neither of them has broad market penetration as such. Despite the document count on Google, ODF is not widely used and most people still save their work in .DOC-files - even though they have Microsoft Office 2007 installed. This means that conversion between them is also rather immature, and this affects the quality of the converters and the results of converting between one format and another. The ODF-Converter project has an extensive list of the differences between the formats themselves and also a list of features currently not supported by the converter, and similar lists exist of features not supported by the other tools used. Luckily it seems that the quality of the converters is drastically improving with each incremental release.

We also noted that a converter is not "just a converter". It lives and breathes in the application it is installed in. This was of particular interest when looking at the ODF-Converter Office Add-In and the Sun OOXML-converter. They are both add-ons to existing Office applications, but the application behaviour we saw was in principle the same when using OpenOffice.org, IBM Lotus Notes 8 or OpenOffice Novell Edition.

The problem lies in the fact that every application has an internal object model that determines how a document is represented in memory in the application. The binary formats for Microsoft Office files were essentially a binary dump of the application's current memory, and the same basically applies to a lot of applications with binary file formats. Anyway - regardless of how a document is "converted" or "transformed" using another application than the originator, at the end of the day it has to be loaded into the internal object model of the receiving application. This essentially means that unless there is a 100% air-tight 1-1 mapping between the document format and the internal object model ... information will be lost. This was one part of the problem - the other was the sequence of conversion. Take a look at the two sequences listed here:

Sequence 01: Load original format → Convert format to new format → Load new format into internal object model → (make changes) → Persist as new document format

Sequence 02: Load original format → Load original format into internal object model → (make changes) → Persist as new document format

It is not at all evident that the two sequences will produce the same output, and we have seen no evidence that any of the applications tested actually had a 1-1 mapping between (any) document format and their internal object model. This also applies to Microsoft Office and its corresponding file types, and to OpenOffice itself. In short, this was a fact that we had to deal with in our tests.
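Why the two sequences can end up with different results is easiest to see with a toy model. The sketch below treats a document as a dictionary of features and assumes, purely for illustration, that the converter and the internal object model each understand only a subset of them. The feature sets are made up and do not describe any of the tools we tested.

```python
# Hypothetical feature sets, chosen only to illustrate the argument.
CONVERTER_SUPPORTS = {"text", "tables"}    # what the format converter can translate
OBJECT_MODEL_SUPPORTS = {"text", "lines"}  # what the application's object model can hold


def convert(document):
    """Translate to the new format, dropping what the converter does not support."""
    return {k: v for k, v in document.items() if k in CONVERTER_SUPPORTS}


def load(document):
    """Load into the internal object model, dropping what the model cannot represent."""
    return {k: v for k, v in document.items() if k in OBJECT_MODEL_SUPPORTS}


def persist(model):
    """Write the object model back out as a document in the new format."""
    return dict(model)


def sequence_01(document):
    # Convert first, then load the converted file, then persist.
    return persist(load(convert(document)))


def sequence_02(document):
    # Load the original format directly, then persist as the new format.
    return persist(load(document))


original = {"text": "Hello", "tables": "2x2 grid", "lines": "one straight line"}
print(sorted(sequence_01(original)))
print(sorted(sequence_02(original)))
```

In this toy setup the first sequence loses both the table and the line, while the second only loses the table. Neither result is "the" conversion; what survives depends on which lossy steps the document passes through.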

On a funny note:

The conversion tools we used were all based on XSLT-transformation between the document formats. Both document formats are XML-formats, so it is a good choice. However, we heard rumours that Novell would dump their OOXML-converter (based on XSLT) and develop their own converter based on the internal object model. It will be interesting to see whether that brings greater quality to the converters.

On a lighter note:

We saw in our tests that using the binary Microsoft Office file format as a middle-man when converting from OOXML to ODF (and back) actually produced the best results ... by a long shot. Having this step and using the binary Office file format as a type of "Lingua Franca" was more or less the key to "flawless conversion". If you stop and think about it, it makes perfect sense why we saw this. The Microsoft Office binary file format is well established in the market (not thanks to Microsoft, but to reverse engineering) and the format has been around for a long time. Basically, all applications can read it and all applications can write it. But why is this interesting? Well, OOXML is an XML-version of the binary Office file format, so since there are "no problems" with converting from the binary format to ODF, it should be technically relatively easy to convert from OOXML to ODF.

It is just a matter of time ... and continuous improvement of the format converters. 

First comments from ISO/IEC

It is funny how right one can be. The day before yesterday (18 November 2007) the first corrections arrived from ISO/IEC. They have sent out responses to a total of 19% of the overall number. Their press release can be seen on ECMA's website, where it can be studied in more detail. In short, ISO/IEC editor Rex Jaeschke has begun sending out the first corrections. This happens via a website where the individual national standards bodies can retrieve the corrections to their proposals. I do not know whether e.g. the French standards body has access to the corrections to the Danish ones, but for now it looks as if that will be the case. The content of the website is only accessible to the individual NBs (national bodies), but according to Microsoft's Brian Jones this is due to ISO rules, which specify that comments on proposals and the responses to them are a matter between ISO and the individual countries. In Denmark, Dansk Standard and the work within it are subject to some confidentiality restrictions, so one should probably not expect too much information here to begin with - but as far as I know it is diametrically opposite in the English counterpart to Dansk Standard, so perhaps one should keep an eye on the information coming out of there. Or rather - their rules are quite similar to the Danish ones, but they have chosen a different approach to the work on OOXML, where among other things they have used wikis to collect information, and they have also published their comments on DIS 29500 just as we have done in Denmark. Again, the format of choice was a wiki.

I have previously referred to the proverb "May you live in interesting times" ... and I wonder whether they are on their way again?
