On Sat, Jul 10, 2004 at 12:02:18PM -0500, Peter Kupfer wrote:
> Out of curiosity, what is it that makes the OOo file format better? I'm
> not trying to argue. I am just trying to learn as much as possible.
The Microsoft file format is a binary dump of whatever is in memory. This makes it fragile: if a small part of the data gets corrupted, a significant part of the document can be lost. Furthermore, the file size grows rapidly with the length of the document (roughly linearly).
In contrast, OpenOffice.org uses a collection of XML files compressed with ZIP. Consequences:
1) An extremely reliable format. The use of XML ensures that the document is well-structured with clearly defined tags. It is possible to lose several bytes to corruption and still recover the original file without error. This is because you can use the rules of XML and the OOo schema to reconstruct the missing pieces. Even in cases where the file is so hopelessly corrupted that it is not possible to repair it, you can still remove the XML tags and at least recover the *data* in the file (but not the formatting). I have done this myself for a couple of people. If this occurred with a .doc file, the data would be hopelessly lost.
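The last-resort recovery described above (throw away the tags, keep the text) can be sketched in a few lines of Python. This is a minimal, assumed approach using a regular expression plus `html.unescape`, not necessarily the procedure used on those files; `salvage_text` is a hypothetical helper name. It ignores the document structure entirely and keeps only the character data, which is exactly the "data but not formatting" trade-off.

```python
import html
import re

# A complete tag, or an unterminated tag cut off at the very end of the data.
TAG = re.compile(r"<[^>]*(?:>|$)")


def salvage_text(damaged_xml: str) -> str:
    """Crudely recover the words from XML that a parser would reject.

    Every tag is replaced by a space, entities such as &amp; are decoded,
    and runs of whitespace are collapsed. All formatting is lost.
    """
    without_tags = TAG.sub(" ", damaged_xml)
    return " ".join(html.unescape(without_tags).split())


# Truncated mid-document: the last paragraph never closes.
broken = "<office:body><text:p>Hello &amp; world</text:p><text:p>Second para"
print(salvage_text(broken))  # Hello & world Second para
```

Because XML requires a literal `<` in text to be escaped as `&lt;`, every raw `<` starts a tag, so stripping first and unescaping afterwards does not eat real text.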
2) The use of ZIP compression keeps file sizes reasonable. In particular, the size of an OOo file grows slowly as the length of the document increases. ZIP compression works by removing redundancy and repetition in the document, and, generally speaking, the longer the document, the more redundancy it contains.
Here is a simple test. Let's make a file that simply repeats the text "Hello world" over and over. Note, this is not a "typical" document by any means. But it's an easy way to demonstrate the effect of compression. Look at how the file size increases with the number of pages:
| Pages | OOo file size | MS Word file size |
|------:|--------------:|------------------:|
| 1 | 6 KB | 17 KB |
| 10 | 6 KB | 105 KB |
| 100 | 9 KB | 948 KB |
| 1000 | 38 KB | 9776 KB |
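You can see that the MS Word file grows roughly in proportion to the number of pages, while the OOo file grows much more slowly: at 1000 pages, the Word file is more than 250 times larger.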
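To see why, here is a small Python sketch using `zlib`, which implements DEFLATE, the compression method ZIP normally uses. It compresses the same kind of repetitive "Hello world" text at several lengths. It measures plain text only, with an assumed (hypothetical) 500 repetitions per page, so its numbers will not match the table; the point is the trend. The raw size grows in proportion to the page count, while the compressed size grows far more slowly, because long runs of repeated text are encoded as short back-references to text already seen.

```python
import zlib

PHRASE = "Hello world "
REPEATS_PER_PAGE = 500  # hypothetical: how many phrases fill one "page"


def sizes(pages: int) -> tuple[int, int]:
    """Return (raw_bytes, deflate_bytes) for `pages` pages of repeated text."""
    raw = (PHRASE * REPEATS_PER_PAGE * pages).encode("ascii")
    return len(raw), len(zlib.compress(raw))


for pages in (1, 10, 100, 1000):
    raw, packed = sizes(pages)
    print(f"{pages:>5} pages: raw {raw:>9} bytes, DEFLATE {packed:>6} bytes")
```

Real documents are far less repetitive than this, so the savings are smaller in practice, but the same mechanism is at work.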
3) Open standards
XML and ZIP are well-established standard formats, and there are dozens of libraries and tools for reading and writing them, so it is easier for other people to support the OOo format. It goes further than that: the OOo format itself is open, published and well documented. This means that third parties can support it easily, unencumbered by patents or other issues. The OOo format also forms the basis of the OASIS file format, which is intended to become the new open standard for office applications. The OASIS committee includes corporations such as Corel, Sun and Boeing, non-profit groups like the KDE League, and individuals. The goal of the OASIS format is to remove the barriers between different office applications by becoming the new standard. Currently only OOo uses it, but future versions of KOffice (1.4) will use it as well. This is the way it should be.
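As a rough illustration of how far generic tools go, here is a minimal Python sketch that reads an OOo-style document using only the standard `zipfile` and `xml.etree.ElementTree` modules. The sample document is built in memory and is a hypothetical, stripped-down stand-in for a real file; the member names (`mimetype`, `content.xml`), the MIME type string and the OOo 1.x namespace URIs are what a Writer file is assumed to contain, so treat them as assumptions. The helper names are also hypothetical, and the paragraph extractor simply matches any element whose local name is `p` rather than checking the exact namespace.

```python
import io
import xml.etree.ElementTree as ET
import zipfile


def list_parts(doc_bytes: bytes) -> list[str]:
    """Return the names of the files stored in the document's ZIP container."""
    with zipfile.ZipFile(io.BytesIO(doc_bytes)) as zf:
        return zf.namelist()


def paragraphs(doc_bytes: bytes) -> list[str]:
    """Parse content.xml and return the text of every element named 'p'."""
    with zipfile.ZipFile(io.BytesIO(doc_bytes)) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    found = []
    for elem in root.iter():
        # ElementTree spells namespaced tags as '{uri}local'; keep only 'local'.
        if elem.tag.rsplit("}", 1)[-1] == "p":
            found.append("".join(elem.itertext()))
    return found


def make_sample() -> bytes:
    """Build a tiny, hypothetical OOo-style document in memory for the demo."""
    content = (
        "<office:document-content"
        ' xmlns:office="http://openoffice.org/2000/office"'
        ' xmlns:text="http://openoffice.org/2000/text">'
        "<office:body>"
        "<text:p>Hello world</text:p>"
        "<text:p>Second paragraph</text:p>"
        "</office:body>"
        "</office:document-content>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("mimetype", "application/vnd.sun.xml.writer")
        zf.writestr("content.xml", content)
    return buf.getvalue()


doc = make_sample()
print(list_parts(doc))  # ['mimetype', 'content.xml']
print(paragraphs(doc))  # ['Hello world', 'Second paragraph']
```

Nothing here is specific to OpenOffice.org: any language with a ZIP library and an XML parser can do the same.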
Do you know which mail client I am using? Probably not. But you don't have to worry about whether I can read your emails. Email is an open standard. Do you know what brand of phone your mom uses? Probably not. But you don't have to worry about whether her phone uses the same protocol as yours.
Why should office applications be any different?
:-)
I hope this answers your question. :-)
Cheers,
--
Daniel Carrera | No trees were harmed in the generation of this
PhD student. | e-mail. A significant number of electrons were,
Math Dept. UMD | however, severely inconvenienced.
For questions, please contact peschtra@openoffice.peschtra.com. Last edited 21 June 2006.