a 'mooh' point

clearly an IBM drone

Losing data the silent way - ISO8601-dates

In Prague we spent quite some time discussing how to deal with the fact that applications supporting ECMA-376 1st Ed. not necessarily support ISO/IEC 29500:2008 strict as well. Our talks revolved primarily around how major implementations dealt with the modified functionality of the elements <cell> and <v> in SpreadsheetML now that ISO-dates are allowed as content of the <v>-element. But “dates in spreadsheets” is not the only place where changes occurred. Changes were also made to other areas, including

  • Object embedding
  • Comments in spreadsheets
  • Hash-functions for document protection

This will be the first post in a series of posts evolving around how IS29500 differs from ECMA-376 and how existing applications behave when encountering a document with new content. What I will do here is to create some sample documents and load them in the applications I have access to that supports OOXML the best. In my case these are Microsoft Office 2007 SP2, OpenOffice.org 3.0.1 and NeoOffice for Mac and Apple iWorks. If you want to contribute and you have access to other applications, please let me know the result and I’ll update the article with your findings. If you have access to Microsoft Office 2007 SP1, I'd really like to know. When the series is done I’ll post a bit about MCE and how it might help overcome some of the problems I have highlighted (if we’ll get to change the namespace for the strict edition of IS29500 schemas)

I should also note that as the series progresses, the examples I make will increase in complexity. A consequence of this will be that my examples will be more of a “magic-8-ball-type prediction” than “simple examples of IS29500-strict documents”. Since there is not a single application out there supporting IS29500-strict, the examples will be my “qualified guesses” to how applications might interpret IS29500-strict when they implement it.

ISO-8601 dates in SpreadsheetML

Let me first touch upon the problem with dates in SpreadsheetML since this was the problem we talked about the most. Gareth Horton from the UK national body hand-crafted a spreadsheet document with these new dates. I have modified his example a bit to better illustrate the point. Files are found at the bottom of this post.

In the original submission to ISO dates were persisted in SpreadsheetML as “Julian numbers” (serial representation) and subsequently formatted as dates using number format styles.

[code=xml]<sheetData>
  <row r="1">
    <c r="A1" s="1">
      <v>39904</v>
    </c>
  </row>
  <row r="2">
    <c r="A2" s="1">
      <v>39905</v>
    </c>
  </row>
(…)
  <row r="10">
    <c r="A10" s="1">
      <v>39913</v>
    </c>
  </row>
</sheetData>[/code]

So the above would create a column with 10 rows displaying the dates from April 1st to April 10th.

Let’s change one of the cells to contain a date persisted in ISO-8601 format.

[code=xml]<row r="9">
  <c r="A9" s="1" t="d">
    <v>2009-04-09T01:02:03.04Z</v>
  </c>
</row>[/code]

So the cell contains an ISO-8601 date and it is formatted using the same number format as the other cells. I have added a bit of additional data to the spreadsheet to illustrate the problem with using formulas on these values.

Result

The interesting thing to investigate iswhat happens when this cell is loaded in a popular OOXML-supporting application. Note here that the existing corpus of implementations supporting OOXML supports the initial edition of OOXML, ECMA-376 1st Ed.So they would have no way to look into the specification and see what to do with a cell containing an ISO/IEC 8601 date value.

Microsoft Excel 2007 SP2

As you can see Excel 2007 screws up the content of the cell. And on top of that, should you try to manipulate the content of the cells with formulas, they are also basically useless. The trouble? Well, you are not notified that Excel 2007 does not know how to handle the content of the cell, so chances are that you’ll never find out – until you find yourself in a position where there are real consequences to the faulty data and kittens are killed.

OpenOffice 3.0.1 Calc

 

 

The result here is almost the same. Data is lost and the user is not notified.

NeoOffice for Mac

 

Again we see the same result. This is not so strange, since the latest version of NeoOffice shares the same code base as OOo 3.0.1 so behavious should be the same.

iWorks 09 Numbers

 



Wow, so for iWorks on the Mac, the user is actually warned that something went wrong. Only trouble is - it does not warn you that the content of the cell is not valid - it informs you that the system cannot find the font "Calibri".

Conclusion

It is pretty hard to conclude enything but "this sucks!". None of the applications warn the user that they have lost data - and they all do exactly that - loose data.

Original file: Book1.xlsx (8.82 kb)

Modified file: 

book2.xlsx (8.22 kb)