How to select DOM elements with XPath

The XML Path Language (XPath) is used to locate XML elements based on their relationships to other elements. DOM elements written in HTML can also be accessed via XPath. Using XPath to locate or select an HTML element is advantageous because of these relational criteria: elements can be found based on where they are in the DOM tree, the characteristics they have, where they may be on a rendered web page, or the other elements they are near.

Requirements

An HTML editor (a text editor will do) and an XPath parser. Many parsers are available, and JavaScript includes methods for evaluating XPath expressions.

Process

Once you have identified which element you wish to select, you can begin to write your XPath expression. For the HTML fragment below, we can write an expression that selects any of its elements:

<table> 
<tr> 
<td class="td class">Row value 1</td><td class="td class">Row value 2</td> 
</tr> 
</table>

The above HTML elements would render as a table with one row which contains two columns. The values within the columns are “Row value 1” and “Row value 2.” If we wanted to select the <tr> row element which is unnamed and has no specified ID, we could write an XPath expression to locate it.

To locate an element starting from the root of the element hierarchy, prefix the path with:

/

To find matching elements anywhere in the document, regardless of depth, use:

//

To find an element relative to another, you can use an axis such as:

parent:: 
child:: 
ancestor:: 
descendant:: 
following-sibling:: 
preceding-sibling::

among others. A node’s siblings are the other direct children of its parent, while ancestors and descendants include an element’s parents and children plus more distant relations. To select any element:

*

its parent:

..

the current node:

.

a node’s attributes:

@
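As a rough illustration of these path symbols, the sketch below uses Python's standard-library xml.etree.ElementTree. That choice is an assumption of this example, not a tool the article prescribes. ElementTree implements only a subset of XPath and evaluates every path relative to an element, so a leading ./ stands in here for a path rooted at the <table> element. Named axes such as parent:: are not available in that subset, but the .. abbreviation is.

```python
import xml.etree.ElementTree as ET

# The HTML fragment from above; it is well-formed, so an XML parser can read it.
FRAGMENT = """
<table>
<tr>
<td class="td class">Row value 1</td><td class="td class">Row value 2</td>
</tr>
</table>
"""

table = ET.fromstring(FRAGMENT)

# "." is the current node (here, the <table> element itself).
current = table.find(".")

# "./tr" walks one level down from the current node.
rows = table.findall("./tr")

# ".//td" finds <td> elements at any depth below the current node.
cells = table.findall(".//td")

# "*" matches any element; "./*" is every direct child of <table>.
children = table.findall("./*")

# ".." steps from each matched <td> up to its parent element.
parents = table.findall(".//td/..")

# "@" addresses an attribute inside a predicate.
classed = table.findall(".//td[@class='td class']")

print(current.tag, [r.tag for r in rows], len(cells),
      [p.tag for p in parents], len(classed))
```

A full XPath 1.0 engine would also accept the rooted forms /table/tr and //td directly.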

Selection conditions, called predicates, are written within brackets [ ], and multiple conditions can be combined with and or or. A node’s text value can also be tested using the text() function. We can select the <tr> row element with the following XPath expression:

//td[@class="td class" and text()="Row value 1"]/parent::tr
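Selecting the row through one of its cells can be tried out in Python with the standard-library xml.etree.ElementTree, which is an assumption of this sketch and not the article's required parser. ElementTree's XPath subset has no text() function, no and operator, and no parent:: axis, so the path below expresses the same idea with two chained predicates and the .. abbreviation. The name ROW_PATH is purely illustrative.

```python
import xml.etree.ElementTree as ET

table = ET.fromstring(
    '<table><tr>'
    '<td class="td class">Row value 1</td>'
    '<td class="td class">Row value 2</td>'
    '</tr></table>'
)

# Two chained predicates act like "and": the class attribute must match,
# and the cell's text content must equal "Row value 1" (Python 3.7+).
# The trailing ".." then steps up from the matching <td> to its parent row.
ROW_PATH = ".//td[@class='td class'][.='Row value 1']/.."

matches = table.findall(ROW_PATH)
row = matches[0]
row_texts = [cell.text for cell in row.findall("td")]
print(row.tag, row_texts)
```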

This is one way to locate the row. In a larger HTML sample, there can be many <tr> elements, such as:

<table> 
<tr> 
<td class="td class">Row value 1</td><td class="td class">Row value 2</td> 
</tr> 
<tr> 
<td class="td class">Row value 3</td><td class="second td class">Row value 4</td> 
</tr> 
<tr> 
<td class="third td class">Row value 1</td><td class="td class">Row value 1</td> 
</tr> 
</table>

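so a single criterion may match more than one element. Here, both the first and third rows contain a cell with the class "td class" and the text "Row value 1", so other selection criteria can be useful. The XPath language contains many functions, operators, and node identifiers not covered here, which can help in defining selection criteria.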
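The effect of extra criteria on the larger table can be sketched as follows, again assuming Python's standard-library xml.etree.ElementTree and its limited XPath subset. The helper row_numbers is a hypothetical name introduced only for this illustration. It reports which rows a path matches, first for a class-and-text criterion, then with a second cell condition, and finally with a positional predicate.

```python
import xml.etree.ElementTree as ET

table = ET.fromstring("""
<table>
<tr>
<td class="td class">Row value 1</td><td class="td class">Row value 2</td>
</tr>
<tr>
<td class="td class">Row value 3</td><td class="second td class">Row value 4</td>
</tr>
<tr>
<td class="third td class">Row value 1</td><td class="td class">Row value 1</td>
</tr>
</table>
""")


def row_numbers(path):
    """Return the 1-based positions of the <tr> elements matched by path."""
    all_rows = table.findall("tr")
    return [all_rows.index(row) + 1 for row in table.findall(path)]


# The class-and-text criterion alone is ambiguous in this table.
ambiguous = row_numbers(".//td[@class='td class'][.='Row value 1']/..")

# Requiring a second cell with different text narrows the match.
by_sibling = row_numbers("./tr[td='Row value 1'][td='Row value 2']")

# A positional predicate picks a row by its order among its siblings.
by_position = row_numbers("./tr[3]")

print(ambiguous, by_sibling, by_position)
```

Which extra criterion is appropriate depends on what reliably distinguishes the target row. Positional predicates are short but break if rows are reordered, while content-based predicates survive reordering but depend on the cell text staying the same.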

More Information

Please see a list of XPath functions, operators and node identifiers for more options when defining XPath expressions.