Regular expressions
From RapidWiki
Regular Expressions (REGEXP)
This article is a draft: feel free to modify it and to change its categories.
Introduction
Regular expressions are a character-based formal grammar. As in any formal language, the "symbols" are characters and the "formal words" are the matched patterns, each of which may span several words of ordinary text. A regexp is a string whose characters are partly literals, to be matched as is, and partly metacharacters, which introduce specific rules between literals. Because the same symbols can serve both as literals and as metacharacters, the escape character '\' is placed just before such an ambiguous symbol to fix its role in the regexp. The rules introduced by metacharacters (quantifiers, character classes, anchors, groups and so on) are grouped into categories; see the references at the bottom of the page for more information.
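As a minimal illustration of literals, metacharacters and escaping, the sketch below uses Python's `re` module. This is an assumption made for illustration only: RapidMiner is a Java application, and the constructs shown here ('.', '+' and the '\' escape) behave the same way in Java's regex syntax. The helper `matches` is hypothetical; it reports whether a whole string matches a pattern.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# 'a' and 'c' are literals; '.' is a metacharacter matching any single character.
print(matches(r"a.c", "abc"))     # True
print(matches(r"a.c", "a.c"))     # True

# Escaping '.' with '\' fixes its role: it now matches only a literal period.
print(matches(r"a\.c", "abc"))    # False
print(matches(r"a\.c", "a.c"))    # True

# '+' is a quantifier (one or more of the preceding item); '\+' is a literal plus sign.
print(matches(r"ab+c", "abbbc"))  # True
print(matches(r"1\+1", "1+1"))    # True
print(matches(r"1+1", "1+1"))     # False: here '+' repeats the first '1'
```

The `r"..."` raw strings avoid doubling the backslash; in a Java string literal the same regexp would be written `"a\\.c"`.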
Regexps serve two purposes:
- To match a given short "pattern" against a "sample" text, that is, to find where the pattern occurs in it (a bit like the keywords highlighted in search engine results)
- To extract, from inside the matched text, a useful sub-pattern, which is then typically used in Yale/RapidMiner (for example as an attribute value)
Writing a regexp thus amounts to writing the grammar rules that the text to be matched must follow.
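The two purposes can be sketched with Python's `re` module, used here only as a stand-in for the Java regex engine RapidMiner is assumed to rely on. The sample text, the date pattern and the function names `locate_date` and `extract_year` are hypothetical: the first function finds where the pattern occurs, the second extracts a sub-pattern through a capturing group.

```python
import re
from typing import Optional, Tuple

# Hypothetical sample text; any string would do.
SAMPLE = "The report was published on 2008-05-17 and revised later."

# \d matches a digit, {4} repeats it four times, '-' is a literal.
# Each pair of parentheses is a capturing group, numbered 1, 2, 3 from the left.
DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def locate_date(text: str) -> Optional[Tuple[int, int]]:
    """Purpose 1: return (start, end) of the first date in text, or None."""
    m = DATE.search(text)
    return m.span() if m else None


def extract_year(text: str) -> Optional[str]:
    """Purpose 2: return only the year sub-pattern (group 1), or None."""
    m = DATE.search(text)
    return m.group(1) if m else None


print(locate_date(SAMPLE))
print(extract_year(SAMPLE))  # 2008
```

Group 0 would be the whole matched date; choosing group 1 is what turns a plain match into an extraction.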
Uses in RapidMiner
In RapidMiner, regexps are used in the following operators:
- ExampleSource
- AttributeSubsetPreprocessing
- TextInput
- Segmenter
- FeatureExtraction
- SingleTextInput
- StringTextInput
- AttributeSumClusterCharacterizer
- Crawler
- MashUp
Some of these operators are deprecated.
The uses of regexps in RapidMiner fall into three categories:
- Unique text matching, as in AttributeSubsetPreprocessing to select attributes by name, or in any "text query" parameter of the operators above
- Multiple matching, to feed attribute values, as in MashUp and FeatureExtraction
- "Exclusive" multiple matching, where the matched text itself is not kept, as in Segmenter or ExampleSource
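The three categories can be illustrated with Python's `re` module, again only as a stand-in for the Java regex engine RapidMiner is assumed to use. The attribute names, the text and the separator pattern below are hypothetical examples, not RapidMiner defaults, and the code does not call any RapidMiner API.

```python
import re

# 1. Unique text matching: keep the (hypothetical) attribute names that match
#    the pattern as a whole, similar in spirit to selecting attributes by name.
attribute_names = ["att1", "att2", "label", "att10", "weight"]
selected = [name for name in attribute_names if re.fullmatch(r"att\d+", name)]
print(selected)  # ['att1', 'att2', 'att10']

# 2. Multiple matching: every occurrence of the pattern becomes a value,
#    e.g. to feed an attribute. (?:...) groups without capturing.
text = "Prices: 12.50 EUR, 7.99 EUR and 100 EUR."
prices = re.findall(r"\d+(?:\.\d+)?", text)
print(prices)  # ['12.50', '7.99', '100']

# 3. "Exclusive" multiple matching: the matches are discarded and only the text
#    between them is kept, for example when a data line is split into columns.
line = "5.1, 3.5;1.4   0.2"
columns = re.split(r",\s*|;\s*|\s+", line)
print(columns)  # ['5.1', '3.5', '1.4', '0.2']
```

In the third case the regexp describes what to throw away rather than what to keep, which is why the categories differ even though the syntax is the same.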
Text Queries
To be completed
Attribute feeding
To be completed
Segmentation
To be completed
See also
Articles
Documentation
Websites...
Les Expressions Régulières (in French)
Michael Wurst's tutorial on regexp
