XPath

From RapidWiki


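Like regular expressions, XPath is a formal query language for "scraping" data out of structured documents, although it works on the tree of an XML (or XML-like) document rather than on raw text. Because it relies heavily on XML structure, it partially solves the "multiple matching and capture" problem that is so tedious with regular expressions: a single expression can select every matching node, and each selected node can then be queried for its fields. Such a query language is therefore well suited to transforming an XML file into an "Excel-like" table of rows and columns.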
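
As a rough illustration of the XML-to-table idea, the following Python sketch uses the standard library's xml.etree.ElementTree module, which supports only a limited subset of XPath. The catalog document, its element names, and the helper functions xml_to_rows and rows_to_csv are hypothetical and chosen only for this example; they do not come from RapidMiner.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
XML = """<catalog>
  <book id="b1"><title>XPath Basics</title><price>10.50</price></book>
  <book id="b2"><title>Regex Cookbook</title><price>8.00</price></book>
</catalog>"""


def xml_to_rows(xml_text):
    """Select every <book> node and turn it into one table row."""
    root = ET.fromstring(xml_text)
    rows = []
    # "./book" selects all <book> children of the root: the "multiple match".
    for book in root.findall("./book"):
        # Relative paths evaluated on each match play the role of captures.
        rows.append([book.get("id"), book.findtext("title"), book.findtext("price")])
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text, a simple "Excel-like" format."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["id", "title", "price"])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(xml_to_rows(XML)))
```

Each path expression plays the role of one capture group, but the matching is driven by the document tree rather than by character patterns.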

Use cases have been reported where XPath succeeds in parsing Microsoft "*.doc" formats, whose inner grammar is so complicated that regular expressions fail to parse it. The complications include:

  • Use of "&nbsp;" (non-breaking space) delimiters, which are critical for the tokenization process
  • Use of "object-like" lexicons, such as CSS metadata
  • Use of "span" tags
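
The features listed above can be sketched on a small, well-formed fragment. The markup below is a hypothetical imitation of Word-style output, not an actual "*.doc" file, and it writes the non-breaking space as the numeric reference &#160; because a plain XML parser does not know the named entity. The function names are illustrative only, and xml.etree.ElementTree supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment imitating Word-style markup (not a real *.doc file).
DOC = """<html><body>
<p class="MsoNormal"><span style="color:red">Hello</span><span>&#160;</span><span style="font-weight:bold">world</span></p>
</body></html>"""


def span_tokens(xml_text):
    """Return the text of every <span>, skipping spans that hold only spacing."""
    root = ET.fromstring(xml_text)
    tokens = []
    # ".//span" selects <span> elements at any depth below the root.
    for span in root.findall(".//span"):
        # Treat the non-breaking space (U+00A0) as an ordinary delimiter.
        text = (span.text or "").replace("\u00a0", " ").strip()
        if text:
            tokens.append(text)
    return tokens


def styled_spans(xml_text):
    """Return the CSS style strings of the spans that carry a style attribute."""
    root = ET.fromstring(xml_text)
    # The predicate [@style] keeps only spans that have that attribute.
    return [span.get("style") for span in root.findall(".//span[@style]")]


if __name__ == "__main__":
    print(span_tokens(DOC))
    print(styled_spans(DOC))
```

Selecting by element and attribute in this way avoids having to describe the nesting of tags and style metadata as a character pattern.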

XPath tutorials and examples

to be completed

XPath use cases in RapidMiner

to be completed

See Also

Regular expressions
