While I’m not going to go into a full description of the SpreadsheetML format, I’d at least like to give you a brief introduction. A SpreadsheetML package is made up of a few different pieces. Let’s look at a basic diagram of the pieces of a spreadsheet:

The main parts I wanted to call out for today are:
- “sheet1” – This is the data for the worksheet. Each worksheet is stored as its own XML file within the ZIP package, which means you can easily get at your data within a particular sheet without having to parse all the other sheets.
- “sharedStrings” – Any string (not number, just string) used in the sheet is actually stored in a separate location. There is a part called the “shared string table” that stores all the strings used in the file. So, if you have a column called “states”, and “Washington” appears 100 times in the spreadsheet, it only needs to be saved into the file once, and then just referenced.

I think an example might be best to help show what I’m talking about. Let’s take a spreadsheet that looks like this:
| ID | Num | Resource |
| 1 | 543 | F068BP106B.DWG |
| 2 | 248 | F068BP106B.DWG |
In the Open XML file, there would be an XML file that contained the strings used, that would look like this:
Shared String Table
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/5/main">
<si>
<t>ID</t>
</si>
<si>
<t>Num</t>
</si>
<si>
<t>Resource</t>
</si>
<si>
<t>F068BP106B.DWG</t>
</si>
</sst>
Then, in the main sheet, there would be cell values, and pointers into the string table wherever a string occurs:
Sheet1
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/5/main">
<sheetData>
<row>
<c t="s">
<v>0</v>
</c>
<c t="s">
<v>1</v>
</c>
<c t="s">
<v>2</v>
</c>
</row>
<row>
<c>
<v>1</v>
</c>
<c>
<v>543</v>
</c>
<c t="s">
<v>3</v>
</c>
</row>
<row>
<c>
<v>2</v>
</c>
<c>
<v>248</v>
</c>
<c t="s">
<v>3</v>
</c>
</row>
</sheetData>
</worksheet>
Notice that in the first row, each cell has the attribute t="s", which means it’s a string. The value is then interpreted as an index into the string table, rather than an actual number value. In the 2nd and 3rd rows, the first two cells are interpreted as numbers, so they don’t have the t="s" attribute, and the values are actual values.
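To make the lookup concrete, here is a minimal Python sketch of how a reader might resolve those cells. It is only an illustration under a few assumptions: the two XML parts are inlined as strings rather than read from the ZIP package, the namespace URI is copied from the listings above (other versions of the format may use a different one), the Num cells carry the values from the table (543 and 248), and the helper names `q`, `load_shared_strings`, and `resolve_rows` are made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace URI as it appears in the listings above (an assumption for other files).
NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/5/main"

SHARED_STRINGS_XML = f"""<sst xmlns="{NS}">
  <si><t>ID</t></si>
  <si><t>Num</t></si>
  <si><t>Resource</t></si>
  <si><t>F068BP106B.DWG</t></si>
</sst>"""

SHEET_XML = f"""<worksheet xmlns="{NS}">
  <sheetData>
    <row><c t="s"><v>0</v></c><c t="s"><v>1</v></c><c t="s"><v>2</v></c></row>
    <row><c><v>1</v></c><c><v>543</v></c><c t="s"><v>3</v></c></row>
    <row><c><v>2</v></c><c><v>248</v></c><c t="s"><v>3</v></c></row>
  </sheetData>
</worksheet>"""


def q(tag):
    """Return the namespace-qualified tag name ElementTree expects."""
    return f"{{{NS}}}{tag}"


def load_shared_strings(xml_text):
    """Read the shared string table into a list, preserving order."""
    root = ET.fromstring(xml_text)
    return [si.findtext(q("t"), default="") for si in root.findall(q("si"))]


def resolve_rows(sheet_xml, strings):
    """Turn each row into a list of Python values.

    Cells marked t="s" are indexes into the string table;
    cells without that attribute are treated as plain numbers.
    """
    root = ET.fromstring(sheet_xml)
    rows = []
    for row in root.iter(q("row")):
        values = []
        for cell in row.findall(q("c")):
            raw = cell.findtext(q("v"))
            if raw is None:
                values.append(None)
            elif cell.get("t") == "s":
                values.append(strings[int(raw)])
            else:
                values.append(float(raw))
        rows.append(values)
    return rows


strings = load_shared_strings(SHARED_STRINGS_XML)
for values in resolve_rows(SHEET_XML, strings):
    print(values)
```

The string table is loaded once into a list, so each t="s" cell costs a single index lookup no matter how many times the same string is used.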
This may seem a bit complex, but remember that while this format was designed for developers to be able to use, we couldn’t take the hit that comes with making it completely intuitive. Believe me, as a developer, I would have loved to make the formats more verbose and straightforward, but that would have meant that everyone else opening the files would have to suffer for it.

If the example above were a more complex set of data with a number of separate worksheets, each with a few thousand rows, you can imagine how quickly the savings of the string table and terse tag names would add up. I had a couple of posts back in the summer talking about some other basic things we do to make sure that the formats are quick and efficient.
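For a rough feel of how those string table savings add up, here is a small Python sketch of the writing side. It is an illustration only, not how Excel itself builds the table: the function name `build_shared_strings` and the ("s", index) / ("n", value) cell tuples are invented for this example, and it counts characters of string data only, ignoring the XML tags around them.

```python
def build_shared_strings(rows):
    """Deduplicate strings into a table and replace them with indexes.

    Returns (table, encoded_rows). Each encoded cell is either
    ("s", index_into_table) for a string or ("n", value) for a number.
    """
    table = []        # unique strings, in first-seen order
    index_of = {}     # string -> position in table
    encoded_rows = []
    for row in rows:
        encoded = []
        for value in row:
            if isinstance(value, str):
                if value not in index_of:
                    index_of[value] = len(table)
                    table.append(value)
                encoded.append(("s", index_of[value]))
            else:
                encoded.append(("n", value))
        encoded_rows.append(encoded)
    return table, encoded_rows


# A "states" column where "Washington" appears 100 times.
rows = [["states"]] + [["Washington"] for _ in range(100)]
table, encoded_rows = build_shared_strings(rows)

# Characters of string data if every cell stored its text inline,
# versus storing each distinct string once in the table.
inline_chars = sum(len(v) for row in rows for v in row if isinstance(v, str))
shared_chars = sum(len(s) for s in table)

print(table)
print(inline_chars, shared_chars)
```

With this data, the table holds just two entries, and every “Washington” cell in the sheet shrinks to a short index.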
This tradeoff of who you design around and how you weigh ease of use versus efficiency is something folks have to look at every day when they design products. Whether it’s an API, a user interface, or a file format, you need to decide which target user you are going to give more weight to when you make your design decisions. We had to give more weight to the end user, and instead require a bit more knowledge from the developer. That’s why the Ecma documentation is so important. We need to make sure that the format is documented 100% and there are no barriers to interoperability. The great group of people we have on TC45 is really helping a lot here. As I said last week, the Novell guys have already built some working code that allows Gnumeric to open and save spreadsheet files in the Open XML format. I’m sure we’ll see more and more implementations as we provide better documentation and get closer to a complete standard. It’s really exciting!
That’s one of the great things we’ll see more and more of up on the openxmldeveloper.org site.
