blue       earthtones
Emitting Well-Formed XML from Excel

Discussion: I've actually been amazed by the volume of technical and scientific data, all of a non-statistal or numeric character, that has been compiled in Excel workbooks over the course of decades. In 2008 I was handed an extensive mineral database (geology) which converged data seldom found in other sources, and even more seldom searchable on the internet other than by gleaning individual tidbits from a wide range of textbooks and specialized sources. This data had been compiled over time in an Excel workbook, and after a brief analysis, I concluded that in order to present this on the web using AJAX, the first step was to emit the entire dataset from the workbook as an XML database, and proceed from there.

Note: I've retired the custom code published on this page some years ago for Excel 2003 in deference to taking advantage of developments in Excel since then. Read on.

March 2013 produced the requirement to deliver an updated version of the mineral database as an .xml with .xsd file from Excel 2010 for a mobile app (that should appear in the Apple AppStore in a few months). Things have progressed a long way since Excel 2003 as you might imagine, and I took a different approach for this requirement.

Solution: It turns out to be a relatively straightforward process to emit well-formed XML directly from an Excel workbook today, though one risks encountering a few impenetrable error messages along the way unless every undocumented requirement is met, and the steps are taken in the correct order. So I'll share a "cookbook" here that avoids the pitfalls.

For completeness, I'll assume that you're starting with nothing but an Excel worksheet containing the data you want to transform to and export as XML. In this event, you'll need to accomplish the following five simple steps:

  1. create a small "specimen" xml file with two prototype elements in it
  2. create an .xsd file (XML Schema) from the prototype
  3. load the xsd into the Excel worksheet containing your data
  4. "map" the xsd to the columns in your worksheet
  5. export the XML file

It really is that simple, but with that being said, I will confess that it took me a few hours to get it right, so here's a complete example based on the steps outlined above.

Instructions provided here were developed with Firefox on Win-7 and Excel 2010, but this process will also work with other browsers, operating systems, and Excel versions.


1. Create a Specimen XML File

Using a plain-text editor such as Notepad++ , Windows Notepad, or Mac Edit, create an example of the XML you need. There must be TWO of the recurring elements, in the example below, <mineral>, so that you won't have to make a small hand-edit to the xsd after it's generated. The data content is generally unimportant as this file is just a template for the schema.

For this tutorial, the content of the specimen should be:

To create the specimen file:

  1. copy the specimen xml data:
    • right mouse-click the textarea above
    • click "Select All"
    • click "Copy"
  2. paste into your text file
  3. Save the file as minerals_template.xml

2. Create an .xsd File (XML Schema)

You could readily hand-code the xsd if you're determined to, but I find it's far simpler and quicker just to take advantage of one of the numerous websites that do the work of generating a well-formed xsd from your prototype xml for you. For this example, I've used Dan Wahlin's XML Schema Generator, who's output Excel groks well. To do this:

  • bring up XML Schema Generator in any browser
  • copy your specimen XML file text from step 1 above (for this tutorial: minerals_template.xml)
  • paste it into the text area provided
  • select Russian Doll Style
  • click the Generate Schema button
  • right mouse click the results, => Select All, => Copy
  • create a new local file with Notepad++, Notepad, or Edit, and Paste the results
  • save the file as minerals.xsd

You now have the .xsd file for Excel, and won't require the template produced in Step 1 again.


3. Load the.xsd File into Excel

For this step, you'll need the Developer features of Excel. If you don't already have this enabled, see the Microsoft tech note Enabling Excel Developer features for instructions. Then take the following steps:

  • open a new Excel workbook
  • navigate to the Developer ribbon
  • click Source in the XML section, this will open the XML Source window
  • click the XML Maps button on the lower right, this will open the XML Maps window
  • click Add in the XML Maps window
  • browse to the minerals.xsd file created in Step 2 above and click OK
  • the XML map should appear in thw XML Maps window

4. Map the xsd to the columns in your worksheet

  • copy the data from the textarea below:
    • right mouse-click the data below
    • click "Select All"
    • click "Copy"
  • Paste the data into cell A1 in your worksheet
  • Drag the topmost node of the map to the first cell of the data you want to map.
    For this tutorial, that means drag the mineral_database node to cell A1.

Result – Excel will map and highlight your data. It should appear as shown below.


5. Export the XML

Click Export in the XML section of the Developer ribbon and Save your XML file as minerals.xml.

You're done.


General

Of course, there are any number of similar steps that will bring you to an equally similar result depending on your starting point. Some, if not most, will also bring along some impenetrable error messages from Excel that can take quite some time to work out. One thing to bear in mind when experimenting with variations on this example is that Excel requires the maxOccurs="unbounded" property in the <xsd:element > tag in xsd in order to map more than one row.

Here are some closing thoughts:

  1. Tag Names: XML tag names are quite flexible, and there is no firm requirement on their style in the XML specification. You'll find a wide range of opinions in a web search, but that being said, experience has taught me that the most bullet-proof approach is to use lower case, underscore-separated names such as <refraction_indices>.
  2. Embedded HTML: In a word - don't. As a highly experienced UI developer I've tried every variation. Yes, CDATA tags will allow you to embed HTML in XML elements, but CDATA is - by definition - unparsed and will not render on a web page as you might like or expect. If you need to format XML data on a web page, such as the <formula> element in this example, use a JavaScript function or jQuery widget after the data is loaded. For an example of this, see Formatting Chemical Formulas in DHTML Pages elsewhere on this website.
  3. Copyright Notice: The data used in this example is courtesy of and Copyright © 2006 - 2013, Gelogy Park, LLC (theGeoZone.com)