Mapping the data in RapidMiner table

From RapidWiki

Lexicon

Example
ExampleSet
Feature
Attribute
Weighted Attribute
Weighted Example
Itemset

Types of attributes

Regular attributes

To be continued...

Special attributes

The first three special attributes (label, prediction, confidence) are learner-related, while the others are more general.

Label

"Label" is the special attribute that a learner treats as its output. For example:

  • Imagine a table with 10 attributes, from "att1" to "att10"
  • To train a learner whose output is "att3", change att3 from "regular" to "label" with ChangeAttributeType
  • To then train a learner whose output is "att1", apply the same operator to "att3" to change it from "label" back to "regular", then apply it to "att1" to change it from "regular" to "label"
  • If the ExampleSet has been tagged with a ClusterModel, the "cluster" attribute can be changed into a label with the same operator
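
The role changes listed above can be pictured with a small Python sketch. This is only a conceptual model, not RapidMiner code: the `roles` dictionary and the `change_role` helper are hypothetical names introduced here, and the sketch assumes that every attribute has exactly one role, with "regular" as the default.

```python
# Conceptual model only: this is NOT the RapidMiner API.
# Each attribute name maps to a role; "regular" is the default role.

def change_role(roles, name, target_role):
    """Return a copy of `roles` with attribute `name` set to `target_role`."""
    if name not in roles:
        raise KeyError(f"unknown attribute: {name}")
    updated = dict(roles)
    updated[name] = target_role
    return updated

# A table with 10 regular attributes, "att1" to "att10".
roles = {f"att{i}": "regular" for i in range(1, 11)}

# Train on att3: make it the label.
roles_att3 = change_role(roles, "att3", "label")

# Train on att1 instead: restore att3 first, then promote att1.
roles_att1 = change_role(
    change_role(roles_att3, "att3", "regular"), "att1", "label"
)
```

Restoring att3 before promoting att1 keeps a single label in the table at any time, which is what the two-step procedure in the list achieves.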

Prediction

"Prediction" is created whenever an already trained learner is applied to an ExampleSet with the ModelApplier operator. It holds the label values computed by the learner, which can be compared with the actual label values.

Confidence

For a binominal or polynominal label, confidence values are also given. They are based on ratios such as a likelihood function.
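
As a rough illustration of how confidences, predictions and labels relate, the Python sketch below normalizes per-class scores into confidences, takes the most confident class as the prediction, and compares the predictions with the actual labels. The score values, the helper names `confidences` and `accuracy`, and the normalization itself are assumptions made for illustration. The actual ratio depends on the learner, and none of this is RapidMiner code.

```python
def confidences(scores):
    """Normalize non-negative per-class scores so that they sum to 1."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return {cls: score / total for cls, score in scores.items()}

def accuracy(labels, predictions):
    """Fraction of examples whose prediction equals the actual label."""
    if not labels or len(labels) != len(predictions):
        raise ValueError("need two non-empty sequences of equal length")
    matches = sum(1 for l, p in zip(labels, predictions) if l == p)
    return matches / len(labels)

# Hypothetical per-class scores for three examples with a binominal label.
scores_per_example = [
    {"yes": 3.0, "no": 1.0},
    {"yes": 1.0, "no": 4.0},
    {"yes": 2.0, "no": 3.0},
]
labels = ["yes", "no", "yes"]

# One confidence value per class and per example.
confidence_rows = [confidences(s) for s in scores_per_example]

# The prediction is the class with the highest confidence.
predictions = [max(row, key=row.get) for row in confidence_rows]

print(predictions)
print(accuracy(labels, predictions))
```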

Id

An "Id" attribute, if not already present, can be created with IdTagging, which assigns one Id per Example. Typically, this operator is used before clustering so that the different examples are not mixed up. Moreover, when used with ExampleSetJoin, redundant Ids can be either removed or kept.

Cluster

A clustering operator such as UPGMAClustering produces a ClusterModel object. The ClusterModel2ExampleSet operator can take this model and apply it to the last stacked ExampleSet. The output ExampleSet has a new special attribute, "cluster", whose values indicate which cluster each Example belongs to. As mentioned above, it can be reused as a new label (see "Label").

Weighted Example

Each Example in an ExampleSet can be given a numerical weight. Some learners can take these weights into account (mainly trees and Bayes nets).
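
To give an idea of what taking weights into account can mean, the Python sketch below computes a class distribution in which each example counts in proportion to its weight. It assumes one non-negative weight per example. The helper name `weighted_class_distribution` and the data are hypothetical, and the way a given RapidMiner learner actually uses weights may differ.

```python
from collections import defaultdict

def weighted_class_distribution(labels, weights):
    """Share of the total weight carried by each label value."""
    if len(labels) != len(weights):
        raise ValueError("labels and weights must have the same length")
    totals = defaultdict(float)
    for label, weight in zip(labels, weights):
        totals[label] += weight
    grand_total = sum(totals.values())
    if grand_total <= 0:
        raise ValueError("weights must have a positive sum")
    return {label: total / grand_total for label, total in totals.items()}

# One "yes" example weighted 2.0 counts as much as two "no" examples
# weighted 1.0 each.
labels = ["yes", "no", "no"]
weights = [2.0, 1.0, 1.0]
distribution = weighted_class_distribution(labels, weights)
```

Without weights the "no" class would hold two thirds of the examples. With these weights both classes carry the same share.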

Batch

This special attribute is used with the "DataStream" plugin.

Personal special attributes

With ChangeAttributeType (deprecated) and ChangeAttributeRole, any attribute can be given a custom ("home-made") special role.
With ExchangeAttributeRoles, the roles of any pair of attributes can be swapped, so that a label becomes regular and vice versa. The advantage of this operator is that, after working with a temporary label, the original roles can be restored by swapping once more.
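
The swap-and-restore idea can be sketched in Python as follows. The `exchange_roles` helper and the `roles` dictionary are hypothetical stand-ins used only to show that swapping twice gives back the original roles; they are not RapidMiner code.

```python
def exchange_roles(roles, first, second):
    """Return a copy of `roles` with the roles of two attributes swapped."""
    updated = dict(roles)
    updated[first], updated[second] = roles[second], roles[first]
    return updated

roles = {"att1": "regular", "att2": "regular", "att3": "label"}

# Work with att1 as a temporary label.
temporary = exchange_roles(roles, "att1", "att3")

# Swapping the same pair once more restores the original roles.
restored = exchange_roles(temporary, "att1", "att3")
```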

Types of values

  • Token, String. See also Text Mining
  • Nominal, binominal, polynominal
  • Numeric, integer, real
  • Special objects :
    • Valueseries
    • Datastream
    • Named Entity Recognition (NER)

Managing the data mapping

The data can be inspected through three views: metadata, data, and plot.
Types, names, and values can be changed. See also: Preprocessing attributes
