XPath

XPath (XML Path Language) is a language for building expressions that traverse and process an XML document. The idea is similar to using regular expressions to select parts of plain, unstructured text, except that XPath searches and selects with the hierarchical structure of the XML in mind. XPath was created for use in the XSLT standard, where it selects and examines the structure of the transformation's input document. It was defined by the W3C.

Introduction

All the processing carried out with an XML file is based on the possibility of addressing or accessing each of the parts that compose it, so that we can treat each of the elements in a differentiated way.

The treatment of the XML file begins by locating it throughout the set of existing documents in the world. To carry out this location unequivocally, URIs (Uniform Resource Identifiers) are used, of which URLs (Uniform Resource Locators) are undoubtedly the best known.

Once the XML document is located, information within it is selected with XPath, short for XML Path Language. With XPath we can select and refer to text, elements, attributes and any other information contained in an XML file.

XPath itself is a sophisticated and complex language, but different from the procedural languages we usually use (C, C++, Basic, Java...). Also, like almost everything in the world of XML, it is still in a state of development, so it is not easy to find tools that incorporate all its features.

XPath is, in turn, the basis on which other tools for processing XML documents have been specified, such as XPointer, XLink and XQuery (a language that handles XML documents as if they were a database). Thus, XPath is used to say how a style sheet should process the content of an XML page, but also to create links to, or load in a browser, specific areas of an XML page instead of the entire page.

The XPath data model

An XML document is processed by a parser, which builds a tree of nodes. This tree starts at the root, branches out through the elements that hang from it, and ends in leaf nodes, which contain only text, comments or processing instructions, or are empty elements that have only attributes.

The way XPath selects parts of the XML document is based precisely on this tree representation of the document. In fact, the "operators" of the language recall the terminology used when talking about trees in computing: root, child, ancestor, descendant, and so on.

Attribute nodes are a special case. An element can have as many attributes as needed, and an attribute node is created for each one. However, these attribute nodes are NOT considered children of the element, but rather labels attached to the element node.
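
The separation between children and attributes can be sketched in Python with the standard library's xml.etree.ElementTree. This is only an illustration: ElementTree's object model is not the XPath data model (it keeps attributes in a dictionary rather than in attribute nodes), and the sample element is made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one element with two attributes and one child element.
chapter = ET.fromstring('<chapter num="2" public="si"><paragraph/></chapter>')

# Iterating over an element yields only its child elements...
child_names = [child.tag for child in chapter]

# ...while the attributes are reached through a separate mapping.
attributes = dict(chapter.attrib)

print(child_names)   # ['paragraph']
print(attributes)    # {'num': '2', 'public': 'si'}
```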

The following is an example of how an XML document is converted to a tree. This same example will be used throughout the tutorial. First, the XML document is shown, and then the tree it generates.

XML document:

  <book>
    <title>Two by three streets</title>
    <author>Josefa Santos</author>
    <chapter num="1">
      The first street
      <paragraph>
        It was a grim night of August...
      </paragraph>
      <paragraph destacar="si">
        She, innocent as a
        <link href="enlace">butterfly</link>
        that crosses the sky in search of libations...
      </paragraph>
    </chapter>
    <chapter num="2" public="si">
      The second street
      <paragraph>
        It was a dark night of September...
      </paragraph>
      <paragraph>
        She, innocent as a
        <link href="enlace">little bee</link>
        that crosses the wind in search of the nectar of the flowers...
      </paragraph>
    </chapter>
    <appendix num="a" public="si">
      The third street
      <paragraph>
        It was a dense December night...
      </paragraph>
      <paragraph>
        She, candid as a
        <link href="enlace">little bee</link>
        that crosses the space in search of bugs to eat...
      </paragraph>
    </appendix>
  </book>

Generated tree:

  /
  |
  +---book
       |
       +---title
       |    |
       |    +---(text)Two by three streets
       |
       +---author
       |    |
       |    +---(text)Josefa Santos
       |
       +---chapter [num=1]
       |    |
       |    +---(text)The first street
       |    |
       |    +---paragraph
       |    |    |
       |    |    +---(text)It was a grim night...
       |    |
       |    +---paragraph [destacar=si]
       |         |
       |         +---(text)She, innocent as a
       |         |
       |         +---link [href=enlace]
       |         |    |
       |         |    +---(text)butterfly
       |         |
       |         +---(text)that crosses the sky in search of libations...
       |
       +---chapter [num=2, public=si]
            |
            +---(text)The second street
            |
            +---paragraph
            |    |
            |    +---(text)It was a dark night...
            |
            +---paragraph
                 |
                 +---(text)She, innocent as a little bee...


Types of Nodes

A tree built from an XML document contains different types of nodes, namely: root, element, attribute, text, comment and processing instruction. XPath 1.0 also defines namespace nodes, which are not described separately here.

Root Node

Identified by /. The root node should not be confused with the root element of the document. If the XML document in our example has book as its root element, book is the first node that hangs from the root node of the tree, which is /.

To repeat: / refers to the root node of the tree, not to the root element of the XML document, even though an XML document can have only one root element. In fact, the root node of the tree contains the root element of the document.

Element Node

Every element in an XML document becomes an element node in the tree. Each element has a parent node. The parent of any element is itself an element, except for the root element, whose parent is the root node. Element nodes in turn have children, which can be element nodes, text nodes, comment nodes and processing instruction nodes. Element nodes also have properties such as their name, their attributes, and information about the namespaces that are in effect for them.

An interesting property of element nodes is that they can have unique identifiers. For this, the document must be accompanied by a DTD that declares the corresponding attributes as taking unique values. Such identifiers allow the elements to be referenced much more directly.

Text nodes

By text we mean all the characters of the document that are not part of any markup. A text node has no children; that is, the individual characters that form it are not considered its children.

Attribute nodes

As already indicated, attribute nodes are not children of the element node that carries them, but labels attached to that element node. Each attribute node consists of a name, a value (which is always a string), and possibly a namespace.

Attributes that are given a default value in the DTD are treated as if the value had been written in the XML document. By contrast, no nodes are created for attributes that are declared #IMPLIED in the DTD and not specified in the XML document. Attribute nodes are not created for namespace declarations either. All this is to be expected, bearing in mind that a DTD is not required to process an XML document.

Comment and processing instruction nodes

In addition to the nodes described above, the tree also contains a node for each comment and each processing instruction. The content of these nodes can be accessed through their string-value.

Syntax and Semantics (XPath 1.0)

The most important type of expression in XPath is the location path. A location path consists of a sequence of location steps. Each location step has three components:

  • An axis
  • A node test
  • Zero or more predicates

An XPath expression is evaluated with respect to a context node. An axis specifier such as child or descendant specifies the direction to navigate from the context node. The node test and the predicates then filter the nodes found along that axis. For example, the node test A requires that all nodes navigated to have the name A. A predicate can require that the selected nodes have a certain property, which is itself specified by an XPath expression.

The XPath syntax has two forms. The abbreviated syntax is more compact and lets XPath expressions be written and read easily, in many cases using familiar characters and constructs. The full syntax is more verbose, but it allows more options to be specified and is more descriptive when read carefully.

Shortcut Syntax

The compact notation allows many default values and abbreviations for the most common cases. Given the XML containing the following example:

  <A> <B> <C/> </B> </A>

A simple select with XPath shorthand syntax takes a form like this:

  • /A/B/C

This selects C elements that are children of B elements that are children of the A element that forms the outermost element of the XML document. The XPath syntax is designed to mimic URI (Uniform Resource Identifier) and Unix-style file path syntax.
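
As a rough illustration, the same selection can be tried in Python with xml.etree.ElementTree, which implements only a subset of XPath 1.0 and does not accept absolute paths on an element. The sketch therefore assumes that the context node is the outermost A element and evaluates the remaining steps relative to it.

```python
import xml.etree.ElementTree as ET

# A document with the A, B, C nesting that the path /A/B/C expects.
root = ET.fromstring("<A><B><C/></B></A>")   # root is the A element

# ElementTree paths are relative to the element they are called on,
# so /A/B/C becomes B/C evaluated from A.
matches = root.findall("B/C")

print([e.tag for e in matches])   # ['C']
```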

More complex expressions can be constructed by specifying an axis other than the default child axis, a node test other than a simple name, or predicates, which can be written in square brackets after any step. For example, the expression:

  • A//B/*[1]

selects the first child ('*[1]'), whatever its name, of every B element that is itself a child or deeper descendant ('//') of an A element that is a child of the current context node (the expression does not begin with a '/'). Note that the predicate [1] binds more tightly than the / operator. To select the first node selected by the expression A//B/*, write (A//B/*)[1]. Note also that index values in XPath predicates (technically, the 'proximity position' of a node within an XPath node set) start at 1, not 0 as is common in languages like JavaScript, C and Java.
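
The difference between A//B/*[1] and (A//B/*)[1] can be sketched in Python. The document below is hypothetical, and the positional part is emulated with ordinary list indexing instead of being passed to xml.etree.ElementTree, because that library supports only a subset of XPath and does not accept parenthesised expressions.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the context node <ctx> has one A child
# with two B descendants at different depths.
ctx = ET.fromstring(
    "<ctx><A>"
    "<B><x/><y/></B>"
    "<D><B><z/></B></D>"
    "</A></ctx>"
)

# Every B element below an A child of the context node (A//B).
b_elements = ctx.findall("A//B")

# A//B/*[1]: the predicate applies per B, so take the first child of each B.
# Python lists are 0-based, XPath positions are 1-based.
first_child_of_each_b = [list(b)[0].tag for b in b_elements if len(b) > 0]

# (A//B/*)[1]: collect all the children first, then keep only the first one.
all_children = ctx.findall("A//B/*")
first_overall = [e.tag for e in all_children[:1]]

print(first_child_of_each_b)  # ['x', 'z']
print(first_overall)          # ['x']
```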

Expanded Syntax

We can write the two examples above in the expanded (unabbreviated) syntax as follows:

  • /child::A/child::B/child::C
  • child::A/descendant-or-self::node()/child::B/child::node()[position()=1]

Here, at each step of the XPath, the axis (for example child or descendant-or-self) is specified explicitly, followed by :: and then the node test, such as A or node() in the examples above.

The second example can also be written in abbreviated form with the position made explicit:

A//B/*[position()=1]

Axis Specifier

The axis specifier indicates the direction of navigation within the representation tree of the XML document. The axes available are:

Axis Specifiers in XPath

Full Syntax           Abbreviated Syntax   Notes
ancestor
ancestor-or-self
attribute             @                    @abc is short for attribute::abc
child                 (omitted)            xyz is short for child::xyz
descendant
descendant-or-self    //                   // is short for /descendant-or-self::node()/
following
following-sibling
namespace
parent                ..                   .. is short for parent::node()
preceding
preceding-sibling
self                  .                    . is short for self::node()

As an example of using the attribute axis in the abbreviated syntax, //a/@href selects the attribute named href of a elements anywhere in the document tree. The expression . (short for self::node()) is most commonly used within a predicate to refer to the node currently being tested. For example, h3[.='See also'] selects an element named h3 in the current context whose text content is See also.
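
A small Python sketch of these two expressions, using xml.etree.ElementTree on a made-up XHTML-like fragment. ElementTree cannot return attribute nodes, so the attribute step is emulated by reading the attribute from each matching element; the [.='text'] predicate assumes Python 3.7 or later.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML-like fragment, for illustration only.
body = ET.fromstring(
    "<body>"
    "<h3>See also</h3>"
    "<p><a href='help.php'>Help</a></p>"
    "<h3>Notes</h3>"
    "<div><a href='index.php'>Home</a><a>no link</a></div>"
    "</body>"
)

# //a/@href: read the attribute from every a element that carries one.
hrefs = [a.get("href") for a in body.findall(".//a[@href]")]

# h3[.='See also']: "." stands for the element being tested.
see_also = body.findall("h3[.='See also']")

print(hrefs)                         # ['help.php', 'index.php']
print([h.text for h in see_also])    # ['See also']
```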

Node Test

The node test can consist of a specific node name or a more general expression. In an XML document in which the namespace prefix gs has been defined, //gs:enquiry finds all the enquiry elements in that namespace, and //gs:* finds all the elements in that namespace, regardless of their local name.
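
A namespace-qualified name test can be tried out in Python as follows. This is a sketch with xml.etree.ElementTree; the namespace URI and the sample document are assumptions made for the example, and the prefix is resolved through the mapping passed to findall().

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI bound to the gs prefix.
GS = "http://example.com/gs"

doc = ET.fromstring(
    f'<root xmlns:gs="{GS}">'
    "<gs:enquiry>first</gs:enquiry>"
    "<enquiry>not in the namespace</enquiry>"
    "<group><gs:enquiry>second</gs:enquiry></group>"
    "</root>"
)

# //gs:enquiry: only elements named enquiry in the gs namespace match.
prefixes = {"gs": GS}
enquiries = doc.findall(".//gs:enquiry", prefixes)

print([e.text for e in enquiries])   # ['first', 'second']
```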

Other node test formats are:

comment()
Finds an XML comment node, for example <!-- Comment -->.
text()
Finds a text node, for example the hello in <k>hello<m> world</m></k>.
processing-instruction()
Finds XML processing instructions such as <?php echo $a; ?>. In this case, processing-instruction('php') would match.
node()
Finds any node at all.

Predicates

Predicates, written as expressions in square brackets, can be used to filter a node set according to some condition. For example, a returns a node set (all the a elements that are children of the context node), and a[@href='help.php'] keeps only those elements that have an href attribute with the value help.php.

There is no limit to the number of predicates in a step, and predicates need not be confined to the last step of an XPath. They can also be nested to any depth. Paths specified in predicates begin at the context of the current step (that is, the node selected by the immediately preceding node test) and do not alter that context. All predicates must be satisfied for a match to occur.

When the value of a predicate is numeric, it is syntactic sugar for a comparison with the node's position in the node set (as given by the position() function). So p[1] is short for p[position()=1] and selects the first p child element, while p[last()] is short for p[position()=last()] and selects the last p child of the context node.

In all other cases, the value of the predicate is automatically converted to a boolean. When the predicate evaluates to a node set, the result is true if the node set is non-empty. So p[@x] selects those p elements that have an x attribute.
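
These three predicate forms happen to be within the XPath subset that Python's xml.etree.ElementTree supports, so they can be tried directly. The section element and its contents are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical context node with three p children.
context = ET.fromstring(
    "<section>"
    "<p>one</p>"
    "<p x='1'>two</p>"
    "<p>three</p>"
    "</section>"
)

first = context.findall("p[1]")        # same as p[position()=1]
last = context.findall("p[last()]")    # same as p[position()=last()]
with_x = context.findall("p[@x]")      # true when the @x node set is non-empty

print([p.text for p in first])    # ['one']
print([p.text for p in last])     # ['three']
print([p.text for p in with_x])   # ['two']
```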

A more complex example: the expression a[/html/@lang='en'][@href='help.php'][1]/@target selects the value of the target attribute of the first a element among the children of the context node that have an href attribute with the value help.php, provided that the document's top-level html element also has a lang attribute with the value en. The reference to an attribute of the top-level element in the first predicate affects neither the context of the other predicates nor that of the location step itself.

The order of predicates matters if they test the position of a node. Each predicate takes a node set and returns a (potentially) smaller one. So a[1][@href='help.php'] finds a match only if the first a child of the context node satisfies the condition @href='help.php', while a[@href='help.php'][1] finds the first a child that satisfies this condition.
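
The effect of predicate order can be sketched in plain Python by treating each predicate as a function from a list of nodes to a smaller list. The helper names and the sample data are hypothetical; no XPath engine is involved, and the tuples merely stand in for the a children of the context node.

```python
# Each link is (href, label); the list stands for the a children of the
# context node, in document order. Hypothetical data.
links = [("index.php", "Home"), ("help.php", "Help"), ("help.php", "More help")]


def position_filter(nodes, position):
    """Emulate [n]: keep the node at 1-based position n, if there is one."""
    return nodes[position - 1:position]


def href_filter(nodes, href):
    """Emulate [@href='...']: keep the nodes whose href equals the value."""
    return [node for node in nodes if node[0] == href]


# a[1][@href='help.php']: take the first a, then test its href.
first_then_href = href_filter(position_filter(links, 1), "help.php")

# a[@href='help.php'][1]: keep all matching a, then take the first of them.
href_then_first = position_filter(href_filter(links, "help.php"), 1)

print(first_then_href)   # []
print(href_then_first)   # [('help.php', 'Help')]
```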

Functions and operators

XPath 1.0 defines four data types: node sets (sets of nodes with no intrinsic order), strings, numbers and booleans.

The available operators are:

  • The operators "/", "//" and "[...]", used in path expressions as described above.
  • The union operator "|", which forms the union of two node sets.
  • The Boolean operators "and" and "or", together with the function "not()".
  • The arithmetic operators "+", "-", "*", "div" (division) and "mod" (modulo).
  • The comparison operators "=", "!=", "<", ">", "<=" and ">=".

The function library includes:

  • Functions to manipulate strings: concat(), substring(), contains(), substring-before(), substring-after(), translate(), normalize-space(), string-length()
  • Functions to manipulate numbers: sum(), round(), floor(), ceiling()
  • Functions to obtain the properties of a node: name(), local-name(), namespace-uri()
  • Functions for obtaining information on the processing context: position(), last()
  • Conversion functions: string(), number(), boolean()

Some of the commonly used functions are described below.

Node set functions

position()
Returns a number representing the position of the current node in the sequence of nodes being processed (for example, the nodes selected by an xsl:for-each instruction in XSLT).
count(node-set)
Returns the number of nodes in the node set supplied as its argument.

String Functions

string(object?)
Converts any of the four XPath data types into a string according to built-in rules. If the argument is a node set, the function returns the string value of its first node in document order, ignoring any further nodes.
concat(string, string, string*)
Concatenates two or more strings.
starts-with(s1, s2)
Returns true if s1 starts with s2.
contains(s1, s2)
Returns true if s1 contains s2.
substring(string, start, length?)
Example: substring("ABCDEF",2,3) returns "BCD".
substring-before(s1, s2)
Example: substring-before("1999/04/01","/") returns 1999.
substring-after(s1, s2)
Example: substring-after("1999/04/01","/") returns 04/01.
string-length(string?)
Returns the number of characters in the string.
normalize-space(string?)
Removes all leading and trailing whitespace and replaces any sequence of whitespace characters with a single space. This is very useful when the original XML may have been formatted for pretty-printing, which could make further string processing unreliable.
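
For readers more familiar with a general-purpose language, the behaviour of some of these functions can be approximated in Python. The helpers below are hypothetical analogues, not an XPath implementation: they assume integer arguments with start >= 1 and a non-empty s2, and Python's notion of whitespace is slightly broader than XPath's.

```python
def substring(s, start, length=None):
    """Approximate XPath substring() for integer arguments with start >= 1."""
    begin = start - 1                      # XPath counts characters from 1
    return s[begin:] if length is None else s[begin:begin + length]


def substring_before(s1, s2):
    """Part of s1 before the first occurrence of s2, or '' if s2 is absent."""
    head, sep, _ = s1.partition(s2)
    return head if sep else ""


def substring_after(s1, s2):
    """Part of s1 after the first occurrence of s2, or '' if s2 is absent."""
    _, sep, tail = s1.partition(s2)
    return tail if sep else ""


def normalize_space(s):
    """Trim both ends and collapse inner whitespace runs to a single space."""
    return " ".join(s.split())


print(substring("ABCDEF", 2, 3))             # BCD
print(substring_before("1999/04/01", "/"))   # 1999
print(substring_after("1999/04/01", "/"))    # 04/01
print(normalize_space("  It was \n  a night  "))   # It was a night
```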

Boolean Functions

not(boolean)
Negates a Boolean expression.
true()
Evaluates to true.
false()
Evaluates to false.

Number functions

sum(node-set)
Converts the string values of all the nodes found by the XPath argument into numbers, according to the built-in casting rules, then returns the sum of these numbers.
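
Python's xml.etree.ElementTree does not implement XPath functions, but the effect of sum(), and of count() described earlier, can be emulated once the node set has been selected. The order document is hypothetical. One difference to keep in mind: XPath turns a non-numeric string into NaN, whereas float() raises an exception.

```python
import xml.etree.ElementTree as ET

# Hypothetical order document.
order = ET.fromstring(
    "<order>"
    "<item><price>2.50</price></item>"
    "<item><price>4</price></item>"
    "<item><price>1.25</price></item>"
    "</order>"
)

prices = order.findall("item/price")

# count(item/price): the size of the node set.
how_many = len(prices)

# sum(item/price): convert each node's string value to a number, then add.
total = sum(float("".join(p.itertext())) for p in prices)

print(how_many)  # 3
print(total)     # 7.75
```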

Example of use

Expressions can be created inside predicates using the operators: =, !=, <=, <, >= and >. Boolean expressions can be combined with parentheses () and the Boolean operators and and or as well as the not() described above. Numerical calculation can use *, +, -, div and mod. Strings can consist of Unicode characters.

//item[@price > 2*@discount] selects items whose price attribute is greater than twice the numeric value of their discount attribute.
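
Arithmetic comparisons like this one are outside the XPath subset of Python's xml.etree.ElementTree, so the sketch below emulates the predicate with an ordinary function. The catalogue and the helper name are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue.
catalogue = ET.fromstring(
    "<catalogue>"
    "<item name='pen' price='10' discount='4'/>"
    "<item name='ink' price='6' discount='3'/>"
    "<group><item name='pad' price='9' discount='2'/></group>"
    "<item name='cap' price='5'/>"
    "</catalogue>"
)


def price_exceeds_twice_discount(item):
    """Emulate @price > 2*@discount; a missing attribute makes the test false."""
    try:
        return float(item.attrib["price"]) > 2 * float(item.attrib["discount"])
    except (KeyError, ValueError):
        return False


# //item: every item element at any depth, in document order.
selected = [i.get("name") for i in catalogue.iter("item")
            if price_exceeds_twice_discount(i)]

print(selected)   # ['pen', 'pad']
```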

Entire node sets can be combined ('unioned') using the vertical bar character |. Node sets that satisfy one or more of several conditions can be found by combining the conditions inside a predicate with 'or'.

v[x or y] | w[z] returns a single node set consisting of all the v elements that have x or y child elements, as well as all the w elements that have z child elements, found in the current context.
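
A possible emulation of this union in Python: xml.etree.ElementTree supports neither 'or' nor '|', so the sketch runs one query per alternative and merges the results in a set, which removes duplicates just as a node set does. The context document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical context node.
context = ET.fromstring(
    "<ctx>"
    "<v id='v1'><x/></v>"
    "<w id='w1'><z/></w>"
    "<v id='v2'><y/><x/></v>"
    "<v id='v3'/>"
    "<w id='w2'/>"
    "</ctx>"
)

# v[x or y] is split into v[x] and v[y]; w[z] is supported directly.
wanted = set()
for path in ("v[x]", "v[y]", "w[z]"):
    wanted.update(context.findall(path))

# List the members of the union in document order.
union_ids = [child.get("id") for child in context if child in wanted]

print(union_ids)   # ['v1', 'w1', 'v2']
```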
