XPath: Extracting Values from XML
XPath is a query language for navigating an XML document and selecting nodes or values from it. It is comparable to JSONPath, but designed specifically for XML.
Syntax
XPath expressions navigate the elements and attributes of an XML document. A leading slash (/) denotes the root of the document, and each further slash steps down to a child element, much like a file path.
A basic XPath expression looks like this:
/store/book[1]/title
This XPath expression selects the title of the first book in the store. Positions in XPath start at 1, so [1] is the first book.
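As a rough sketch, the expression above can be evaluated with Python's standard-library xml.etree.ElementTree module. Two assumptions apply: the two-book document below is a small stand-in for a real store document, and ElementTree supports only a subset of XPath with paths taken relative to the element they are called on, so /store/book[1]/title is written as ./book[1]/title starting from the store element.

```python
import xml.etree.ElementTree as ET

# Small stand-in document, used only for illustration.
doc = """
<store>
  <book><title>Sayings of the Century</title></book>
  <book><title>Sword of Honour</title></book>
</store>
"""

root = ET.fromstring(doc)  # root is the <store> element

# ElementTree paths are relative to the element they are called on,
# so /store/book[1]/title becomes ./book[1]/title from <store>.
first_title = root.find("./book[1]/title")
print(first_title.text)
```

A library with full XPath support would accept the absolute form of the expression directly.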
Operators
- / : Selects from the root node.
- // : Selects matching nodes at any depth below the starting point, not just direct children.
- . : Selects the current node.
- .. : Selects the parent of the current node.
- @ : Selects attributes.
- * : Matches any element node.
- [@attrib] : Selects elements that have the given attribute, whatever its value.
- [@attrib='value'] : Selects elements whose attribute is exactly equal to the given value.
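The operators above can be tried out with a short Python sketch using the standard-library xml.etree.ElementTree module, which implements a subset of XPath and evaluates paths relative to the element they are called on. The lang attribute and its values are hypothetical, added only so the attribute operators have something to match.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the lang attribute is an assumption for illustration.
doc = """
<store>
  <book lang="en"><title>Sayings of the Century</title></book>
  <book lang="fr"><title>Sword of Honour</title></book>
  <book><title>Moby Dick</title></book>
  <bicycle><color>red</color></bicycle>
</store>
"""
root = ET.fromstring(doc)

# * matches any element node: here, every direct child of <store>.
children = [el.tag for el in root.findall("./*")]

# // searches at any depth; [@lang] keeps elements that have the attribute.
with_lang = [b.findtext("title") for b in root.findall(".//book[@lang]")]

# [@lang='fr'] keeps elements whose attribute equals the given value.
french = [b.findtext("title") for b in root.findall(".//book[@lang='fr']")]

# .. steps from each matched <title> up to its parent element.
parents = [el.tag for el in root.findall(".//title/..")]

print(children)
print(with_lang)
print(french)
print(parents)
```

Note that the third book is matched by neither attribute predicate, because it has no lang attribute at all.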
Examples
Consider the following XML document:
<store>
<book>
<category>reference</category>
<author>Nigel Rees</author>
<title>Sayings of the Century</title>
<price>8.95</price>
</book>
<book>
<category>fiction</category>
<author>Evelyn Waugh</author>
<title>Sword of Honour</title>
<price>12.99</price>
</book>
<book>
<category>fiction</category>
<author>Herman Melville</author>
<title>Moby Dick</title>
<isbn>0-553-21311-3</isbn>
<price>8.99</price>
</book>
<bicycle>
<color>red</color>
<price>19.95</price>
</bicycle>
</store>
Here are some XPath expressions and their corresponding results based on the given XML document:
- /store/book/title: This XPath expression selects the title of every book in the store. The result is a list of all the book titles:
"Sayings of the Century", "Sword of Honour", "Moby Dick"
- /store/bicycle/color: This XPath expression selects the color of the bicycle in the store.
"red"
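- //book[author='Evelyn Waugh']: This XPath expression selects every book whose author is 'Evelyn Waugh'. The result is the whole book element, not just its title:
<book>
<category>fiction</category>
<author>Evelyn Waugh</author>
<title>Sword of Honour</title>
<price>12.99</price>
</book>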
- //book[1]: This XPath expression selects the first book in the book list. The result is a single book element:
<book>
<category>reference</category>
<author>Nigel Rees</author>
<title>Sayings of the Century</title>
<price>8.95</price>
</book>
- //book[last()]: This XPath expression selects the last book in the book list. The result is a single book element:
<book>
<category>fiction</category>
<author>Herman Melville</author>
<title>Moby Dick</title>
<isbn>0-553-21311-3</isbn>
<price>8.99</price>
</book>
These examples illustrate how XPath extracts information from an XML document, either by following a path to specific elements or by using predicates to filter elements by position or value.
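The example expressions can be checked against the store document with a short Python sketch. It assumes the standard-library xml.etree.ElementTree module, which supports only a subset of XPath and resolves paths relative to the element they are called on. For that reason /store/... is written as ./... from the store element, and //book as .//book.

```python
import xml.etree.ElementTree as ET

doc = """
<store>
  <book>
    <category>reference</category>
    <author>Nigel Rees</author>
    <title>Sayings of the Century</title>
    <price>8.95</price>
  </book>
  <book>
    <category>fiction</category>
    <author>Evelyn Waugh</author>
    <title>Sword of Honour</title>
    <price>12.99</price>
  </book>
  <book>
    <category>fiction</category>
    <author>Herman Melville</author>
    <title>Moby Dick</title>
    <isbn>0-553-21311-3</isbn>
    <price>8.99</price>
  </book>
  <bicycle>
    <color>red</color>
    <price>19.95</price>
  </bicycle>
</store>
"""
root = ET.fromstring(doc)  # root is the <store> element

# /store/book/title: the text of every book title.
titles = [t.text for t in root.findall("./book/title")]

# /store/bicycle/color: the text of the bicycle's color.
color = root.findtext("./bicycle/color")

# //book[author='Evelyn Waugh']: whole <book> elements, filtered by author.
waugh_books = root.findall(".//book[author='Evelyn Waugh']")

# //book[1] and //book[last()]: first and last <book> elements.
first_book = root.find(".//book[1]")
last_book = root.find(".//book[last()]")

print(titles)
print(color)
print([b.findtext("title") for b in waugh_books])
print(first_book.findtext("title"), "|", last_book.findtext("title"))
```

The predicate queries return element objects rather than strings, which is why the sketch reads a child such as title from each matched book before printing.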