06 November 2006

Working with XML - Part 2 - Using XPath to query XML

XPath is a simple language for querying XML documents in order to retrieve nodes matching particular criteria. There are some good references and tutorials out there to help you get to grips with the basics; I'd recommend reading XPath @ W3Schools for starters and then running through the Zvon XPath Tutorial before reading on.

XSL, which the next part of this series covers, makes heavy use of XPath, so you need to get up to speed with it before you venture into XSL.


In part one we loaded the following XML into the DOM and performed various operations with DOM properties and methods. To use XPath you only need two methods: selectNodes, which returns a node list, and selectSingleNode, which returns the first matching node.

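<?xml version="1.0" ?>
<library>
  <authors>
     <author id="12345">
        <name>Charles Dickens</name>
     </author>
     <author id="23456">
        <name>Rudyard Kipling</name>
     </author>
  </authors>
  <books>
     <book>
        <title>Great Expectations</title>
        <author>12345</author>
     </book>
     <book>
        <title>The Jungle Book</title>
        <author>23456</author>
     </book>
  </books>
</library>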

XPath allows you to do some quite complex data selection and analysis using a mixture of path syntax, axes, predicates and functions. Having said that, I had quite a hard time finding decent examples of doing some quite simple stuff.

Important - Setting the SelectionLanguage property

If you're using the Microsoft XMLDOM COM component you need to set the SelectionLanguage property of the DOM document to "XPath". MSXML 3.0 defaults to the older XSLPattern dialect, so without this you'll get some very odd results. You do this as follows:

xmlDoc.setProperty "SelectionLanguage", "XPath"
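
To show where that line fits, here's a minimal sketch of creating and loading the document. The ProgID "MSXML2.DOMDocument.3.0", the file name library.xml and the use of Windows Script Host (for WScript.Echo) are my assumptions here, so swap in whatever you used in part one.

```vbscript
Option Explicit
Dim xmlDoc

'Assumed ProgID - use whichever MSXML version you have installed
Set xmlDoc = CreateObject("MSXML2.DOMDocument.3.0")
xmlDoc.async = False   'load synchronously so the document is ready straight away

'Switch the query dialect before calling selectNodes or selectSingleNode
xmlDoc.setProperty "SelectionLanguage", "XPath"

'library.xml is a hypothetical file holding the XML shown above
If Not xmlDoc.load("library.xml") Then
  WScript.Echo "Load failed: " & xmlDoc.parseError.reason
  WScript.Quit 1
End If
```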

Example 1 - Selecting nodes and checking return value

'a single node
strXPath = "/library/authors"
Set ndAuthors = xmlDoc.documentElement.selectSingleNode(strXPath)

'This test checks whether authors was found or not...
If ndAuthors Is Nothing Then
  'error not found!
Else
  'do something with authors
End If

'multiple nodes
strXPath = "/library/authors/author"
Set nlAuthors = xmlDoc.documentElement.selectNodes(strXPath)

'This test checks whether author nodes were found or not...
If nlAuthors.Length = 0 Then
  'error not found!
Else
  For Each ndAuthor In nlAuthors
     'do something with nodes
  Next
End If
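
As for what "do something with nodes" might look like, here's a rough sketch that prints each author's id and name. It loads a cut-down copy of the sample XML from a string so it runs on its own; the ProgID and WScript.Echo are assumptions (Windows Script Host). Note that the path "name" is relative, so it is evaluated against each author node in turn.

```vbscript
Dim xmlDoc, nlAuthors, ndAuthor
Set xmlDoc = CreateObject("MSXML2.DOMDocument.3.0")   'assumed ProgID
xmlDoc.async = False
xmlDoc.setProperty "SelectionLanguage", "XPath"
xmlDoc.loadXML "<library><authors>" & _
  "<author id=""12345""><name>Charles Dickens</name></author>" & _
  "<author id=""23456""><name>Rudyard Kipling</name></author>" & _
  "</authors></library>"

Set nlAuthors = xmlDoc.selectNodes("/library/authors/author")
For Each ndAuthor In nlAuthors
  'getAttribute reads the id attribute; the relative path picks the child name element
  WScript.Echo ndAuthor.getAttribute("id") & ": " & ndAuthor.selectSingleNode("name").text
Next
```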

If you ran through the Zvon XPath tutorial earlier you should now be able to do some basic selecting of nodes using the two methods I've just shown you.

In the next few examples I'm going to run through some of the things you'll probably want to do but tutorials like the Zvon one don't cover.

Example 2 - Predicates and Axes

Selecting the author element with id "12345"...

strXPath = "/library/authors/author[@id='12345']"
Set ndNode = xmlDoc.documentElement.selectSingleNode(strXPath)

Only selecting the author's name...

strXPath = "/library/authors/author[@id='12345']/name"
Set ndNode = xmlDoc.documentElement.selectSingleNode(strXPath)

Titles of books written by that author...

strXPath = "/library/books/book[author='12345']/title"
Set nlNodes = xmlDoc.documentElement.selectNodes(strXPath)
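
If you only know the author's name rather than the id, a predicate can contain a whole path of its own. The sketch below is one way to do that lookup in a single expression; the ProgID, the inline copy of the sample XML and WScript.Echo are assumptions so that it runs standalone. With the sample data it should pick out Great Expectations.

```vbscript
Dim xmlDoc, strXPath, nlTitles, ndTitle
Set xmlDoc = CreateObject("MSXML2.DOMDocument.3.0")   'assumed ProgID
xmlDoc.async = False
xmlDoc.setProperty "SelectionLanguage", "XPath"
xmlDoc.loadXML "<library>" & _
  "<authors>" & _
  "<author id=""12345""><name>Charles Dickens</name></author>" & _
  "<author id=""23456""><name>Rudyard Kipling</name></author>" & _
  "</authors>" & _
  "<books>" & _
  "<book><title>Great Expectations</title><author>12345</author></book>" & _
  "<book><title>The Jungle Book</title><author>23456</author></book>" & _
  "</books>" & _
  "</library>"

'The inner path finds the id of the named author; the outer predicate
'keeps the books whose author element holds the same value
strXPath = "/library/books/book[author=/library/authors/author[name='Charles Dickens']/@id]/title"
Set nlTitles = xmlDoc.selectNodes(strXPath)
For Each ndTitle In nlTitles
  WScript.Echo ndTitle.text
Next
```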

Example 3 - Functions

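Simple counting of nodes (selectNodes and selectSingleNode can only return nodes, so a bare count(...) expression, which evaluates to a number, raises an error in MSXML; read the Length of the node list instead)...

strXPath = "/library/authors/author"
lngCount = xmlDoc.documentElement.selectNodes(strXPath).Length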

Inside a predicate count works fine; combining it with the ancestor axis allows you to select nodes nested below a particular depth (use = instead of > to match one exact depth)...

strXPath = "//*[count(ancestor::*) > 2]"
Set nlDeepNodes = xmlDoc.documentElement.selectNodes(strXPath)

Books with "The" in their title...

strXPath = "/library/books/book[contains(title,'The')]"
Set nlNodes = xmlDoc.documentElement.selectNodes(strXPath)
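
Bear in mind that contains is case-sensitive, and XPath 1.0 has no lower-case function. A common workaround is to fold the case with translate first. Here's a sketch of that, again with an assumed ProgID, an inline copy of the books from the sample XML, and WScript.Echo for output.

```vbscript
Dim xmlDoc, strXPath, nlBooks, ndBook
Const UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
Const LOWER = "abcdefghijklmnopqrstuvwxyz"

Set xmlDoc = CreateObject("MSXML2.DOMDocument.3.0")   'assumed ProgID
xmlDoc.async = False
xmlDoc.setProperty "SelectionLanguage", "XPath"
xmlDoc.loadXML "<library><books>" & _
  "<book><title>Great Expectations</title><author>12345</author></book>" & _
  "<book><title>The Jungle Book</title><author>23456</author></book>" & _
  "</books></library>"

'translate maps each upper-case letter in title to its lower-case partner,
'so the search term must be given in lower case too
strXPath = "/library/books/book[contains(translate(title,'" & UPPER & "','" & LOWER & "'),'the')]"
Set nlBooks = xmlDoc.selectNodes(strXPath)
For Each ndBook In nlBooks
  WScript.Echo ndBook.selectSingleNode("title").text
Next
```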

Example 4 - Common tasks

Remove nodes that match certain criteria...

strXPath = "/library/books/book[contains(title,'The')]"
Set nlNodes = xmlDoc.documentElement.selectNodes(strXPath)
For Each ndNode In nlNodes
   ndNode.parentNode.removeChild ndNode
Next