Regular expressions

From RapidWiki

Regular Expressions (REGEXP)

This article is a mockup: feel free to modify it and to change its categories.

Introduction

Regular expressions are a character-based formal grammar. In formal-language terms, the "symbols" are characters and the "formal words" are the matched patterns, which may span several ordinary words at a time. A regexp is a string whose characters are partly literals, to be matched as is, and partly metacharacters, which introduce specific rules between the literals. Because the same symbol can serve either as a literal or as a metacharacter, the escape symbol '\' should be placed just before such an ambiguous symbol to fix its role in the regexp. The rules produced by metacharacters are grouped into categories; see the references at the bottom of the page for more information.
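As a small illustration of literals, metacharacters and the '\' escape, here is a sketch using Python's standard re module. Python is chosen only for convenience; the variable names and sample strings are hypothetical, and regexp syntax in other tools may differ in its details.

```python
import re

# '.' is a metacharacter: it stands for any single character (except newline).
unescaped = re.compile(r"3.14")
# '\.' is an escaped dot: it matches only a literal '.'.
escaped = re.compile(r"3\.14")

for text in ("3.14", "3x14"):
    print(text,
          "unescaped:", unescaped.fullmatch(text) is not None,
          "escaped:", escaped.fullmatch(text) is not None)

# re.escape() adds the backslashes needed to treat a whole string literally.
literal = re.escape("a+b")
print(re.fullmatch(literal, "a+b") is not None)  # '+' is taken as a literal
print(re.fullmatch(r"a+b", "aab") is not None)   # '+' means "one or more 'a'"
```

The unescaped pattern accepts both sample strings because the dot matches any character, while the escaped pattern accepts only the string that contains a real dot.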

Regexps are used for two purposes:

  • To match a short "pattern" against a "sample" text, in order to locate the pattern in that text (a bit like the keywords highlighted in search-engine results)
  • To extract, from the matched text, a useful sub-pattern, which is then typically used in Yale/RapidMiner

Writing and using a regexp therefore amounts to writing the grammar rules for matching.
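The two purposes described above, locating a pattern in a sample text and extracting a sub-pattern from the match, can be sketched with Python's standard re module. The sample sentence, the date pattern and the variable names are assumptions made for illustration only; they do not come from RapidMiner.

```python
import re

# Hypothetical sample text and pattern (a date written as YYYY-MM-DD).
sample = "Order 1042 shipped on 2009-03-17 to Dortmund."
pattern = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

# Purpose 1: find out where the pattern occurs in the sample text.
match = pattern.search(sample)
found = match is not None

# Purpose 2: extract sub-patterns from the matched text with groups.
whole = match.group(0) if found else None  # the full matched text
year = match.group(1) if found else None   # first parenthesised sub-pattern

print(found, whole, year)
```

Parentheses in the pattern mark the sub-patterns to extract; group 0 always refers to the whole match.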

Uses in RapidMiner

In RapidMiner, regexps are used in these operators:

Some of the operators that use regexps are deprecated:

The purposes of regexps in RapidMiner can be split into three categories:

Text Queries

To be completed

Attribute feeding

To be completed

Segmentation

To be completed

See also

Articles

XPath

Documentation

Websites...
Les Expressions Régulières (in French)
Michael Wurst's tutorial on regexp

Tools

"Visual regexp" (free)
