a 'mooh' point

clearly an IBM drone

Trapped in a monopoly's web

Monopolies or monopoly-like situations are seldom a benefit to anyone in the long run - except for the monopolist itself, naturally. This is true regardless of whether the monopoly is controlled by Microsoft, iTunes, Ford, Fox News or Google. Sadly, I've been caught in the market dominated by the latter.

Basically, I have (or had) a need to figure out the number of files of a specific file type located on the internet. Luckily, Google has a method for doing just this. You simply add a "filetype:" argument to your search, and you can then see that there are roughly 93,700 files of the type "Open Document Text", which are created by office applications such as KOffice or OpenOffice. You can also determine that there are roughly 45,100,000 documents of the type "Microsoft Office Word Document", primarily created by Microsoft Office. Now, you can also see that there are just 1,040 files of the type "docx", the file type of OOXML word-processing documents.
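The "filetype:" operator is just part of the query text, so the searches above can be reproduced by building ordinary Google search URLs. Below is a minimal sketch in Python, assuming the public `https://www.google.com/search?q=...` URL form. The helper name `filetype_query_url` is hypothetical, and it only builds the URL; the estimated hit count still has to be read off the result page by hand.

```python
from urllib.parse import urlencode

GOOGLE_SEARCH = "https://www.google.com/search"  # assumed public search endpoint


def filetype_query_url(ext: str, terms: str = "") -> str:
    """Build a Google search URL restricted to one file type (hypothetical helper)."""
    # The operator is plain query text, e.g. "annual report filetype:odt".
    query = f"{terms} filetype:{ext}".strip()
    # urlencode percent-encodes the query: ':' becomes %3A and spaces become '+'.
    return f"{GOOGLE_SEARCH}?{urlencode({'q': query})}"


if __name__ == "__main__":
    for ext in ("odt", "doc", "docx"):
        print(ext, filetype_query_url(ext))
```

For example, `filetype_query_url("docx")` yields `https://www.google.com/search?q=filetype%3Adocx`, which is the same search as typing `filetype:docx` into the search box.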

See, this is kind of weird, since docx is the default format of Microsoft Office 2007 - the latest edition of the Microsoft Office suite ... and by far the most used office application in the world. 1,040 files doesn't sound like a whole lot, and it hardly seems to represent the document world as it is right now. Some have even spun this as the ... ahem ... naïve ... proof that the world doesn't care about OOXML and that its claimed market penetration is a joke.
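To put the three hit counts side by side, here is a quick calculation in Python of each file type's share of the combined total, and of how many ODT and .doc hits there are for every docx hit. It uses only the rough estimates quoted above, so the results are no more precise than Google's own counts.

```python
# Rough Google hit-count estimates quoted in the post.
hits = {"odt": 93_700, "doc": 45_100_000, "docx": 1_040}

total = sum(hits.values())  # combined number of hits across the three types

# Percentage of the combined total for each file type, rounded to 4 decimals.
shares = {ext: round(100 * n / total, 4) for ext, n in hits.items()}

# Number of hits of each other type for every single docx hit.
per_docx = {ext: round(n / hits["docx"], 1) for ext, n in hits.items() if ext != "docx"}

if __name__ == "__main__":
    print(total)
    print(shares)
    print(per_docx)
```

Taken at face value, the numbers imply roughly 90 ODT files and more than 43,000 .doc files for every docx file on the web, with docx making up only about 0.0023% of the three combined.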

So ... whatcha gonna do? Well, I looked at the Google search results and noticed something. The results of the ODT search included data like this:

[Screenshot: ODT results from the Google search]

Notice that Google's index recognizes the file as an OpenDocument file and correctly displays a portion of its content.

But when I looked at the results for the docx-search, it listed data like this:

[Screenshot: docx results from the Google search]

In case you don't read Danish, Google says that the file type is not recognized ("Ikke genkendt") and, as a consequence, it cannot display the file correctly. Notice also that Google somehow actually managed to display the contents of the embedded files in the docx file container. Well, my conclusion is that OOXML files are probably not included in Google's index at all, and that the few files found represent a few "off bits" in Google's spiders/crawlers. Hence the comparison between the results of the ODT search and the docx search is ... well, moot. Of course ... if you had, say, a business need for it, you could always conclude that there are no docx files on the internet. My take on this argument is:

Dear Rob, your argument concerning the market penetration of OOXML is - in the words of Comic Book Guy from The Simpsons - the worst argument ever ...
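The fact that Google showed the contents of the files embedded in the docx container makes more sense once you recall that both formats are ZIP packages. An ODT file carries a `mimetype` entry naming its document type, while a docx file is an OPC package whose `[Content_Types].xml` declares the content type of its main document part. A crawler that does not look inside the package would see little more than a generic ZIP archive. Below is a minimal sketch, in Python using only the standard library, of the kind of sniffing needed to tell the two apart. The function names are hypothetical, and it checks only these two markers (plain docx, not macro-enabled or template variants) rather than validating the whole package; it says nothing about how Google's crawlers actually work.

```python
import io
import zipfile

ODT_MIMETYPE = "application/vnd.oasis.opendocument.text"
# Content type of the main part of a WordprocessingML (docx) document.
DOCX_MAIN_TYPE = (
    "application/vnd.openxmlformats-officedocument."
    "wordprocessingml.document.main+xml"
)


def sniff_office_package(data: bytes) -> str:
    """Return "odt", "docx" or "unknown" for raw file bytes (hypothetical helper)."""
    buf = io.BytesIO(data)
    # Both formats are ZIP archives; anything else is not an office package.
    if not zipfile.is_zipfile(buf):
        return "unknown"
    with zipfile.ZipFile(buf) as zf:
        names = set(zf.namelist())
        # ODF: a "mimetype" entry holds the document's media type as plain text.
        if "mimetype" in names:
            if zf.read("mimetype").decode("ascii", "replace").strip() == ODT_MIMETYPE:
                return "odt"
        # OOXML/OPC: [Content_Types].xml lists the content type of the package parts.
        if "[Content_Types].xml" in names:
            content_types = zf.read("[Content_Types].xml").decode("utf-8", "replace")
            if DOCX_MAIN_TYPE in content_types:
                return "docx"
    return "unknown"


def make_package(entries: dict) -> bytes:
    """Build a tiny in-memory ZIP with the given entries, for demonstration only."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, text in entries.items():
            zf.writestr(name, text)
    return buf.getvalue()


if __name__ == "__main__":
    fake_odt = make_package({"mimetype": ODT_MIMETYPE, "content.xml": "<office/>"})
    fake_docx = make_package({
        "[Content_Types].xml": '<Types><Override PartName="/word/document.xml" '
                               f'ContentType="{DOCX_MAIN_TYPE}"/></Types>',
        "word/document.xml": "<w:document/>",
    })
    for label, blob in (("odt", fake_odt), ("docx", fake_docx), ("text", b"hello")):
        print(label, "->", sniff_office_package(blob))
```

The fake packages in the demo are far from valid documents; they contain just enough structure to trigger each check.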

And this brings me back to the title of this post, 'coz what can you do when Google fails? I've looked far and wide for a way to do a similar search, but so far without any luck. Some search engines allow searching by file type but limit which file types you can choose. Most search engines don't allow file type searches at all. I naturally tried searching using Live Search, but I cannot figure out whether it can even do this kind of search. I have been through most of the engines listed on Search Engine Watch ... but it was a fruitless effort.

My question to you is: where can I go to find the facts I'm looking for?