XPath
From RapidWiki
Like regular expressions, XPath is a formal query language for "scraping" data out of structured documents. It is built on the tree structure of XML rather than on flat text, and it partially resolves the "multiple matching and capture" problem that is so tedious with regular expressions: a single expression selects every matching node at once. Such a query grammar is therefore well suited to transforming an XML file into an "Excel-like" table.
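As an illustration of the XML-to-table idea, here is a minimal sketch in Python. It uses the standard library module xml.etree.ElementTree, which supports only a subset of XPath. The sample catalog, its element names and the helper functions xml_to_rows and rows_to_csv are invented for this example and do not come from any real data set.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data, invented for illustration only.
XML = """<catalog>
  <book id="b1"><title>XML Basics</title><price>29.90</price></book>
  <book id="b2"><title>Data Mining</title><price>45.00</price></book>
</catalog>"""


def xml_to_rows(xml_text):
    """Return one [id, title, price] row per <book> element."""
    root = ET.fromstring(xml_text)
    rows = []
    # ".//book" selects every <book> at any depth: one query, many matches.
    for book in root.findall(".//book"):
        rows.append([book.get("id"), book.findtext("title"), book.findtext("price")])
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "title", "price"])
    writer.writerows(rows)
    return buf.getvalue()


rows = xml_to_rows(XML)
print(rows_to_csv(rows))
```

Each matched node becomes one row of the table, which is the step that would otherwise need repeated capture groups with regular expressions.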
Use cases have been reported where XPath succeeds in parsing Microsoft "*.doc" formats, whose inner grammar is so complicated that regexps fail to parse it:
- Use of "&nbsp;" space delimiters, critical for the tokenization process
- Use of "object-like" lexicons, such as CSS metadata
- Use of "span" tags
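The features listed above can be sketched on a small example. This is only an assumption about what such markup might look like once a document has been exported to well-formed XML or XHTML: the fragment, its class names and the variable names are hypothetical. The non-breaking space is written as the numeric reference &#160; because a plain XML parser does not know the named entity.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment, invented for illustration; &#160; is a non-breaking space.
FRAGMENT = """<body>
  <p><span class="label">Name:</span>&#160;<span class="value">Alice</span></p>
  <p><span class="label">City:</span>&#160;<span class="value">Paris</span></p>
</body>"""

root = ET.fromstring(FRAGMENT)

# An attribute predicate selects only the spans that carry a given class.
label_spans = root.findall(".//span[@class='label']")
value_spans = root.findall(".//span[@class='value']")

labels = [s.text.rstrip(":") for s in label_spans]
values = [s.text for s in value_spans]

# The text that follows each label span is the non-breaking space delimiter.
separators = [s.tail for s in label_spans]

print(list(zip(labels, values)))
```

The query addresses the span elements by structure, so the delimiter characters and styling attributes do not have to be matched character by character.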
XPath tutorials and examples
to be completed
XPath use cases in RapidMiner
to be completed
