Mapping the data in a RapidMiner table
From RapidWiki
Lexicon
Example
ExampleSet
Feature
Attribute
Weighted Attribute
Weighted Example
Itemset
Types of attributes
Regular attributes
To be continued...
Special attributes
The first three special attributes (Label, Prediction, Confidence) are learner-related, while the others are more general.
Label
"Label" is a special attribute that is treated as the output (target) of a learner. For example:
- Imagine a table with 10 attributes, from "att1" to "att10".
- To train a learner (see "Training a learner") whose output is "att3", change "att3" from "regular" to "label" with ChangeAttributeType.
- From there, to train a learner whose output is "att1", use the same operator to change "att3" from "label" back to "regular", then use it again to change "att1" from "regular" to "label".
- If the ExampleSet has been tagged with a ClusterModel, the "cluster" attribute can be changed into a label with the same operator.
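The role changes listed above can be sketched in a few lines of Python. This is a conceptual illustration only, not RapidMiner code: the `set_role` helper and the dictionary that maps attribute names to roles are hypothetical, and the rule that only one attribute may hold the "label" role at a time is an assumption made for the example.

```python
# Conceptual sketch only: this is NOT RapidMiner's API.
# `roles` maps each attribute name to its role ("regular", "label", ...).

def set_role(roles, attribute, new_role):
    """Return a copy of `roles` with the role of one attribute changed."""
    if attribute not in roles:
        raise KeyError(f"unknown attribute: {attribute}")
    if new_role == "label":
        # Assumption for this sketch: at most one label at a time.
        others = [a for a, r in roles.items() if r == "label" and a != attribute]
        if others:
            raise ValueError(f"{others[0]} is already the label; make it regular first")
    updated = dict(roles)
    updated[attribute] = new_role
    return updated

# A table with 10 regular attributes, "att1" to "att10".
roles = {f"att{i}": "regular" for i in range(1, 11)}

# Train on "att3": make it the label.
roles = set_role(roles, "att3", "label")

# Switch the target to "att1": release "att3" first, then promote "att1".
roles = set_role(roles, "att3", "regular")
roles = set_role(roles, "att1", "label")

labels = [name for name, role in roles.items() if role == "label"]
print(labels)
```

Returning a copy rather than modifying the dictionary in place keeps the previous role assignment available, which is convenient when a label is only temporary.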
Prediction
"Prediction" is created whenever an already trained learner is applied to an ExampleSet with the ModelApplier operator. It holds the label values computed by the learner, which can be compared with the actual label values.
Confidence
For a binominal or polynominal label, confidence values are also given; they are based on ratios such as a "likelihood" function.
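How label, prediction, and confidence relate to each other can be sketched in Python. This is a conceptual illustration, not RapidMiner code: the function names, the dictionary-based examples, and the per-class scores are hypothetical, and normalizing likelihood scores so that they sum to 1 is only one plausible way to obtain confidences. The actual computation depends on the learner.

```python
# Conceptual sketch; names and the normalization scheme are assumptions,
# not RapidMiner's actual implementation.

def confidences(likelihoods):
    """Normalize per-class likelihood scores so that they sum to 1."""
    total = sum(likelihoods.values())
    if total <= 0:
        raise ValueError("likelihood scores must have a positive sum")
    return {cls: score / total for cls, score in likelihoods.items()}

def predict(likelihoods):
    """Return the class with the highest confidence, plus all confidences."""
    conf = confidences(likelihoods)
    return max(conf, key=conf.get), conf

def accuracy(examples):
    """Fraction of examples whose prediction equals the actual label."""
    if not examples:
        raise ValueError("no examples to evaluate")
    hits = sum(1 for ex in examples if ex["prediction"] == ex["label"])
    return hits / len(examples)

# Hypothetical learner output: one likelihood score per class and example.
scored = [
    {"label": "yes", "scores": {"yes": 3.0, "no": 1.0}},
    {"label": "no", "scores": {"yes": 2.0, "no": 6.0}},
    {"label": "no", "scores": {"yes": 4.0, "no": 1.0}},  # will be misclassified
    {"label": "yes", "scores": {"yes": 3.0, "no": 1.0}},
]

examples = []
for row in scored:
    prediction, conf = predict(row["scores"])
    examples.append({"label": row["label"], "prediction": prediction, "confidence": conf})

print(examples[0]["prediction"], examples[0]["confidence"])
print(accuracy(examples))
```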
Id
If the ExampleSet has no "Id" attribute, one can be created with IdTagging, which assigns one Id per Example. Typically, this operator is used before clustering so that the individual Examples are not mixed up. Moreover, when used with ExampleSetJoin, redundant Ids can be either removed or kept.
Cluster
A clustering operator such as UPGMAClustering produces a ClusterModel object. The ClusterModel2ExampleSet operator can reuse this model and apply it to the last stacked ExampleSet. The output ExampleSet has a new special attribute, "cluster", whose values indicate which cluster each Example belongs to. As noted above, it can be reused as a new label (see "Label").
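The interplay between the "id" and "cluster" special attributes can be sketched in Python. This is a conceptual illustration, not RapidMiner code: `tag_ids`, `attach_clusters`, and the `assignment` dictionary standing in for a cluster model are all hypothetical names introduced for the example.

```python
# Conceptual sketch only; helper names and data are hypothetical.

def tag_ids(examples, start=1):
    """Give each example a sequential "id"; an existing "id" is kept."""
    return [{"id": start + i, **ex} for i, ex in enumerate(examples)]

def attach_clusters(examples, assignment):
    """Add a "cluster" attribute by looking up each example's id."""
    return [{**ex, "cluster": assignment[ex["id"]]} for ex in examples]

examples = [{"att1": 5.1}, {"att1": 4.9}, {"att1": 6.3}]

# Tag the examples before clustering so each one stays identifiable.
tagged = tag_ids(examples)

# Stand-in for the output of a cluster model: id -> cluster name.
assignment = {1: "cluster_0", 2: "cluster_0", 3: "cluster_1"}
clustered = attach_clusters(tagged, assignment)

# The new attribute starts out with the "cluster" role and can then
# be turned into a label for a subsequent learner.
roles = {"id": "id", "att1": "regular", "cluster": "cluster"}
roles["cluster"] = "label"

for ex in clustered:
    print(ex)
```

Looking clusters up by id rather than by position means the assignment still lines up if the examples are reordered in between.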
Weighted Example
Each Example in an ExampleSet can be weighted with a numerical value. Some learners can take these weights into account (mainly trees and Bayes nets).
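As a rough illustration of what taking weights into account can mean, the Python sketch below computes class proportions in which each example counts according to its weight. It is not RapidMiner code: the `weighted_class_priors` function, the attribute names, and the default weight of 1.0 for unweighted examples are assumptions made for the example.

```python
from collections import defaultdict

# Conceptual sketch; function and attribute names are hypothetical.

def weighted_class_priors(examples, label="label", weight="weight"):
    """Class proportions where each example counts according to its weight."""
    totals = defaultdict(float)
    for ex in examples:
        # Assumption: an example without a weight counts as 1.0.
        totals[ex[label]] += ex.get(weight, 1.0)
    grand_total = sum(totals.values())
    if grand_total <= 0:
        raise ValueError("total weight must be positive")
    return {cls: w / grand_total for cls, w in totals.items()}

examples = [
    {"label": "yes", "weight": 3.0},
    {"label": "no", "weight": 1.0},
    {"label": "no"},  # no weight given, counts as 1.0
]

priors = weighted_class_priors(examples)
print(priors)
```

Here a single "yes" example with weight 3.0 outweighs the two "no" examples, which an unweighted count would not reflect.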
Batch
This special attribute is used with the "DataStream" plugin.
Personal special attributes
With ChangeAttributeType (deprecated) and ChangeAttributeRole, any attribute can become a custom ("home-made") special attribute.
With ExchangeAttributeRoles, the roles of any pair of attributes can be swapped, so that a label becomes regular and vice versa. The benefit of this operator is that, after working with a temporary label, the original roles can be restored by swapping once more.
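The reason a second swap restores the original state can be seen in a small Python sketch. This is a conceptual illustration, not RapidMiner code: `exchange_roles` and the role dictionary are hypothetical.

```python
# Conceptual sketch only; `exchange_roles` is a hypothetical helper.

def exchange_roles(roles, first, second):
    """Return a copy of `roles` with the roles of two attributes swapped."""
    updated = dict(roles)
    updated[first], updated[second] = roles[second], roles[first]
    return updated

original = {"att1": "regular", "att2": "regular", "att3": "label"}

# Work with "att1" as a temporary label.
temporary = exchange_roles(original, "att1", "att3")

# Swapping the same pair again restores the original roles.
restored = exchange_roles(temporary, "att1", "att3")

print(temporary)
print(restored == original)
```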
Types of values
- Token, String. See also Text Mining
- Nominal, binominal, polynominal
- Numeric, integer, real
- Special objects:
- Valueseries
- Datastream
- Named Entity Recognition (NER)
Managing the data mapping
Three views: metadata, data, plot.
Changing: types, names, values.
See also: Preprocessing attributes