Discretize by User Specification
Synopsis
This operator discretizes the selected numerical attributes into user-specified classes. The selected numerical attributes will be changed to nominal attributes.
Description
This operator discretizes the selected numerical attributes to nominal attributes. The numerical values are mapped to the classes according to the thresholds specified by the user in the classes parameter. The user can define the classes by specifying the upper limit of each class. The lower limit of every class is automatically defined as the upper limit of the previous class. The lower limit of the first class is assumed to be negative infinity. 'Infinity' can be used to specify positive infinity as the upper limit in the classes parameter; this is usually done for the last class. If a class is named '?', the numerical values falling in this class will be replaced by unknown values in the resulting attributes.
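The mapping rule above can be sketched in a few lines. This is a minimal illustration, not the operator's actual implementation. It assumes that a class's upper limit is inclusive, that values above the last upper limit become unknown, and that classes are given as a hypothetical ordered list of (name, upper limit) pairs, with math.inf standing in for 'Infinity'.

```python
import math
from bisect import bisect_left
from typing import List, Optional, Tuple

# Hypothetical representation of the classes parameter: ordered (name, upper limit)
# pairs, with math.inf standing in for 'Infinity'.
Classes = List[Tuple[str, float]]


def discretize_value(value: Optional[float], classes: Classes) -> Optional[str]:
    """Map one numerical value to a class name; None represents an unknown value."""
    if value is None or math.isnan(value):
        return None  # assumption: missing inputs stay missing
    upper_limits = [limit for _, limit in classes]
    # Index of the first class whose (assumed inclusive) upper limit is >= value.
    index = bisect_left(upper_limits, value)
    if index == len(classes):
        return None  # assumption: values above the last upper limit become unknown
    name = classes[index][0]
    return None if name == "?" else name


classes = [("low", 10.0), ("?", 20.0), ("high", math.inf)]
print([discretize_value(v, classes) for v in [-5.0, 10.0, 15.0, 25.0]])
# ['low', 'low', None, 'high']
```

Because the first class implicitly starts at negative infinity, only the upper limits need to be stored. A binary search over them finds the class in logarithmic time.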
Differentiation
Discretize by Binning
The Discretize By Binning operator creates bins in such a way that the range of all bins is (almost) equal.
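For comparison, equal-range binning can be sketched as follows. This is an illustrative sketch of the general technique, not the Discretize By Binning operator's code. The function name and the choice to return bin indices are assumptions.

```python
from typing import List


def equal_width_bins(values: List[float], number_of_bins: int) -> List[int]:
    """Assign each value a bin index so that every bin spans the same range."""
    low, high = min(values), max(values)
    width = (high - low) / number_of_bins
    if width == 0:
        return [0] * len(values)  # all values identical: a single bin suffices
    # The maximum value would land in bin number_of_bins, so clamp it into the last bin.
    return [min(int((v - low) / width), number_of_bins - 1) for v in values]


print(equal_width_bins([0, 1, 2, 3, 4, 10], 2))  # [0, 0, 0, 0, 0, 1]
```

Note how a single outlier (10) leaves most values in the first bin. With user-specified classes, you decide the boundaries directly instead.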
Discretize by Frequency
The Discretize By Frequency operator creates bins in such a way that the number of unique values in all bins is (almost) equal.
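A rough sketch of the idea described above follows. The sorted unique values are spread over the bins so that each bin receives (almost) the same number of them. This is an assumption-laden illustration of the description, not the operator's actual algorithm, and the function name is hypothetical.

```python
from typing import Dict, List


def equal_frequency_bins(values: List[float], number_of_bins: int) -> List[int]:
    """Spread the sorted unique values over the bins as evenly as possible."""
    unique = sorted(set(values))
    # rank * k // n gives each bin either floor(n/k) or ceil(n/k) unique values.
    bin_of: Dict[float, int] = {
        v: (rank * number_of_bins) // len(unique) for rank, v in enumerate(unique)
    }
    return [bin_of[v] for v in values]


print(equal_frequency_bins([1, 1, 2, 3, 4, 5, 6], 3))  # [0, 0, 0, 1, 1, 2, 2]
```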
Discretize by Size
The Discretize By Size operator creates bins in such a way that each bin has user-specified size (i.e. number of examples).
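A minimal sketch of size-based binning follows. Examples are ranked by value, and every consecutive group of bin_size examples forms one bin. The last bin may be smaller, and ties may be split across bins here. These details, and the function name, are assumptions rather than the operator's documented behavior.

```python
from typing import List


def size_bins(values: List[float], bin_size: int) -> List[int]:
    """Give each example a bin index so that each bin holds bin_size examples."""
    # Example indices ordered by their value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, example_index in enumerate(order):
        bins[example_index] = rank // bin_size
    return bins


print(size_bins([5, 1, 4, 2, 3], 2))  # [2, 0, 1, 0, 1]
```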
Discretize by Entropy
The discretization is performed by selecting bin boundaries such that the entropy is minimized in the induced partitions.
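The core step of entropy-based discretization is choosing one boundary that minimizes the weighted class entropy of the two resulting partitions. The sketch below shows only that single step. A full implementation would presumably apply it recursively with a stopping rule (Fayyad and Irani's MDL criterion is a common choice). The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Optional, Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def best_split(values: Sequence[float], labels: Sequence[str]) -> Optional[float]:
    """Return the boundary minimizing the weighted entropy of the two partitions."""
    pairs = sorted(zip(values, labels))
    best_threshold, best_score = None, math.inf
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary can separate equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if score < best_score:
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint boundary
            best_score = score
    return best_threshold


print(best_split([1, 2, 3, 10, 11, 12], ["a", "a", "a", "b", "b", "b"]))  # 6.5
```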
Input
example set
This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input. It is essential that meta data is attached to the input data, because attributes are specified in their meta data. The Retrieve operator provides meta data along with the data. Note that there should be at least one numerical attribute in the input ExampleSet; otherwise, this operator has nothing to discretize.
Output
example set
The selected numerical attributes are converted into nominal attributes according to the user specified classes and the resultant ExampleSet is delivered through this port.
original
The ExampleSet that was given as input is passed to the output through this port without changes. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.
preprocessing model
This port delivers the preprocessing model, which has information regarding the parameters of this operator in the current process.
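A preprocessing model is useful because the same discretization can later be applied to other data, such as a test set. The sketch below models that idea under stated assumptions. DiscretizationModel is a hypothetical class, and rows are plain dictionaries rather than a real ExampleSet. Applying the model returns new rows and leaves the input untouched, mirroring the separate original port.

```python
import math
from bisect import bisect_left
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]


@dataclass(frozen=True)
class DiscretizationModel:
    """Hypothetical stand-in for the preprocessing model: it records which
    attributes were discretized and with which (name, upper limit) classes."""

    attributes: Tuple[str, ...]
    classes: Tuple[Tuple[str, float], ...]

    def _map(self, value: Optional[float]) -> Optional[str]:
        if value is None:
            return None
        index = bisect_left([limit for _, limit in self.classes], value)
        if index == len(self.classes) or self.classes[index][0] == "?":
            return None  # '?' classes (and out-of-range values) become unknown
        return self.classes[index][0]

    def apply(self, rows: List[Row]) -> List[Row]:
        """Return discretized copies of the rows; the input rows are not modified."""
        return [
            {k: (self._map(v) if k in self.attributes else v) for k, v in row.items()}
            for row in rows
        ]


model = DiscretizationModel(("age",), (("young", 30.0), ("old", math.inf)))
print(model.apply([{"age": 25, "name": "a"}, {"age": 61, "name": "b"}]))
# [{'age': 'young', 'name': 'a'}, {'age': 'old', 'name': 'b'}]
```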
Parameters
Create view
It is possible to create a View instead of changing the underlying data. Simply select this parameter to enable this option. The transformation that would be normally performed directly on the data will then be computed every time a value is requested and the result is returned without changing the data.
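The difference between a view and a materialized transformation can be sketched as a wrapper that transforms values on each read. DiscretizedView is a hypothetical class used only for illustration. It keeps a reference to the original rows and never writes to them, so later changes to the data show up in subsequent reads.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class DiscretizedView:
    """Hypothetical lazy view over rows: the transformation runs on every read
    and the underlying data is never modified."""

    def __init__(self, rows: List[Row], attribute: str,
                 transform: Callable[[object], object]) -> None:
        self._rows = rows  # kept by reference, not copied
        self._attribute = attribute
        self._transform = transform

    def get(self, index: int, name: str) -> object:
        value = self._rows[index][name]
        # Only the discretized attribute is transformed, and only when requested.
        return self._transform(value) if name == self._attribute else value


rows = [{"temperature": 12.0}]
view = DiscretizedView(rows, "temperature", lambda v: "cold" if v <= 15 else "warm")
print(view.get(0, "temperature"), rows[0]["temperature"])  # cold 12.0
```

The trade-off is the usual one. A view saves memory because no new column is stored, but it repeats the computation on every access.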
Attribute filter type
This parameter allows you to select the attribute selection filter; the method you want to use for selecting attributes. It has the following options:
- all: This option simply selects all the attributes of the ExampleSet. This is the default option.
- single: This option allows selection of a single attribute. When this option is selected another parameter (attribute) becomes visible in the Parameters panel.
- subset: This option allows selection of multiple attributes through a list. All attributes of the ExampleSet are present in the list; required attributes can be easily selected. This option will not work if meta data is not known. When this option is selected, another parameter becomes visible in the Parameters panel.
- regular_expression: This option allows you to specify a regular expression for attribute selection. When this option is selected some other parameters (regular expression, use except expression) become visible in the Parameters panel.
- value_type: This option allows selection of all the attributes of a particular type. It should be noted that types are hierarchical. For example, real and integer types both belong to the numeric type. Users should have a basic understanding of the type hierarchy when selecting attributes through this option. When this option is selected, some other parameters (value type, use value type exception) become visible in the Parameters panel.
- block_type: This option is similar in working to the value_type option. This option allows selection of all the attributes of a particular block type. It should be noted that block types may be hierarchical. For example, value_series_start and value_series_end block types both belong to the value_series block type. When this option is selected, some other parameters (block type, use block type exception) become visible in the Parameters panel.
- no_missing_values: This option simply selects all the attributes of the ExampleSet which don't contain a missing value in any example. Attributes that have even a single missing value are removed.
- numeric value filter: When this option is selected, another parameter (numeric condition) becomes visible in the Parameters panel. All numeric attributes whose values satisfy the numeric condition in every example are selected. Please note that all nominal attributes are also selected irrespective of the given numeric condition.
Attribute
The required attribute can be selected from this option. The attribute name can be selected from the drop down box of the attribute parameter if the meta data is known.
Attributes
The required attributes can be selected from this option. This opens a new window with two lists. All attributes are present in the left list. Attributes can be shifted to the right list, which is the list of selected attributes.
Regular expression
Attributes whose names match this expression will be selected. Regular expressions are a very powerful tool but need a detailed explanation for beginners. It is always good to specify the regular expression through the edit and preview regular expression menu. This menu gives a good idea of regular expressions, and it also allows you to try different expressions and preview the results simultaneously.
Use except expression
If enabled, an exception to the first regular expression can be specified. When this option is selected, another parameter (except regular expression) becomes visible in the Parameters panel.
Except regular expression
This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first regular expression (the regular expression that was specified in the regular expression parameter).
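The interaction between the regular expression and the except regular expression can be sketched as follows. A name is kept only if it matches the first expression and does not match the exception. Full-name matching is an assumption here, and select_by_regex is a hypothetical helper.

```python
import re
from typing import List, Optional, Sequence


def select_by_regex(names: Sequence[str], regular_expression: str,
                    except_regular_expression: Optional[str] = None) -> List[str]:
    """Keep names matching the first expression unless they match the exception."""
    selected = []
    for name in names:
        if not re.fullmatch(regular_expression, name):
            continue
        if except_regular_expression and re.fullmatch(except_regular_expression, name):
            continue  # the exception wins over the first expression
        selected.append(name)
    return selected


print(select_by_regex(["temp_min", "temp_max", "humidity"], "temp_.*", "temp_max"))
# ['temp_min']
```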
Value type
The type of attributes to be selected can be chosen from a drop down list.
Use value type exception
If enabled, an exception to the selected type can be specified. When this option is enabled, another parameter (except value type) becomes visible in the Parameters panel.
Except value type
The attributes matching this type will not be selected even if they match the previously mentioned type, i.e. the value type parameter's value.
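Because value types are hierarchical, selecting by type means checking whether an attribute's type is the chosen type or a descendant of it. The except value type is then checked the same way. The sketch below uses a small, hypothetical parent table that covers only a few types. It is not the product's full hierarchy.

```python
from typing import Dict, List, Optional

# Hypothetical, simplified value type hierarchy (child -> parent).
TYPE_PARENT = {
    "real": "numeric", "integer": "numeric",
    "binominal": "nominal", "polynominal": "nominal",
    "numeric": "attribute_value", "nominal": "attribute_value",
}


def is_subtype(value_type: Optional[str], ancestor: str) -> bool:
    """Walk up the hierarchy from value_type, looking for ancestor."""
    current = value_type
    while current is not None:
        if current == ancestor:
            return True
        current = TYPE_PARENT.get(current)
    return False


def select_by_value_type(types: Dict[str, str], value_type: str,
                         except_value_type: Optional[str] = None) -> List[str]:
    """Select attributes of value_type (or a subtype), minus the exception type."""
    return [
        name for name, t in types.items()
        if is_subtype(t, value_type)
        and not (except_value_type and is_subtype(t, except_value_type))
    ]


types = {"age": "integer", "income": "real", "city": "polynominal"}
print(select_by_value_type(types, "numeric", "integer"))  # ['income']
```

The block type and except block type parameters could be sketched the same way with a block type hierarchy in place of TYPE_PARENT.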
Block type
The block type of attributes to be selected can be chosen from a drop down list.
Use block type exception
If enabled, an exception to the selected block type can be specified. When this option is selected another parameter (except block type) becomes visible in the Parameters panel.
Except block type
The attributes matching this block type will not be selected even if they match the previously mentioned block type, i.e. the block type parameter's value.
Numeric condition
The numeric condition for testing examples of numeric attributes is specified here. For example, the numeric condition '> 6' will keep all nominal attributes and all numeric attributes having a value greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. However, && and || cannot be used together in one numeric condition: conditions like '(> 0 && < 2) || (> 10 && < 12)' are not allowed because they use both && and ||. Use a blank space after '>', '=' and '<'; for example, '<5' will not work, so use '< 5' instead.
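The rules above can be sketched as a small parser. A condition is a list of "<operator> <number>" terms joined by either && or || but not both, with a required blank after the operator. The grammar is a simplified assumption based only on this description, and parse_condition and attribute_passes are hypothetical helpers. The sketch applies only to numeric attributes; nominal attributes are selected regardless.

```python
import operator
import re
from typing import Callable, List

_OPS = {"<=": operator.le, ">=": operator.ge, "<": operator.lt,
        ">": operator.gt, "=": operator.eq}
# One term: an operator, at least one blank, then a number (assumed grammar).
_TERM = re.compile(r"\s*(<=|>=|<|>|=)\s+(-?\d+(?:\.\d+)?)\s*")


def parse_condition(condition: str) -> Callable[[float], bool]:
    """Turn a numeric condition string into a predicate on one value."""
    if "&&" in condition and "||" in condition:
        raise ValueError("&& and || cannot be combined in one numeric condition")
    joiner, combine = ("&&", all) if "&&" in condition else ("||", any)
    tests = []
    for part in condition.split(joiner):
        match = _TERM.fullmatch(part)
        if match is None:
            raise ValueError(f"cannot parse term: {part!r}")
        op, number = _OPS[match.group(1)], float(match.group(2))
        # Default arguments bind the current op/number to this lambda.
        tests.append(lambda x, op=op, number=number: op(x, number))
    return lambda x: combine(t(x) for t in tests)


def attribute_passes(values: List[float], condition: str) -> bool:
    """A numeric attribute is selected only if every example satisfies the condition."""
    test = parse_condition(condition)
    return all(test(v) for v in values)


print(attribute_passes([7, 8, 10], "> 6 && < 11"))  # True
```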
Include special attributes
Special attributes are attributes with special roles, which identify the examples; in contrast, regular attributes simply describe the examples. Special attributes are: id, label, prediction, cluster, weight and batch. By default, all special attributes are selected irrespective of the conditions specified by the attribute selection parameters. If this parameter is set to true, special attributes are also tested against these conditions, and only those that satisfy them are selected.
Invert selection
If this parameter is set to true, it acts as a NOT gate and reverses the selection: all the selected attributes are unselected, and previously unselected attributes are selected. For example, if attribute 'att1' is selected and attribute 'att2' is unselected before this parameter is checked, then after checking this parameter 'att1' will be unselected and 'att2' will be selected.
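The last two parameters can be sketched together. Special attributes bypass the conditions unless include special attributes is set. Invert selection flips the outcome of the conditions, which an XOR expresses compactly. Attribute roles are given as a hypothetical name-to-role dictionary. The choice to apply inversion only to attributes that were actually tested is an assumption.

```python
from typing import Callable, Dict, List


def apply_filter(roles: Dict[str, str], matches: Callable[[str], bool],
                 include_special_attributes: bool = False,
                 invert_selection: bool = False) -> List[str]:
    """Return selected attribute names, following the behavior described above."""
    selected = []
    for name, role in roles.items():
        if role != "regular" and not include_special_attributes:
            selected.append(name)  # special attributes bypass the conditions
            continue
        # matches(name) XOR invert_selection acts as the NOT gate.
        if matches(name) != invert_selection:
            selected.append(name)
    return selected


roles = {"id": "id", "att1": "regular", "att2": "regular"}
print(apply_filter(roles, lambda n: n == "att1", invert_selection=True))  # ['id', 'att2']
```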
Classes
This is the most important parameter of this operator. It is used to specify the classes into which the numerical values will be mapped. The names and upper limits of the classes are specified here. The numerical values are mapped to the classes according to the defined thresholds. The user can define the classes by specifying the upper limit of each class. The lower limit of every class is automatically defined as the upper limit of the previous class. The lower limit of the first class is assumed to be negative infinity. 'Infinity' can be used to specify positive infinity as the upper limit in the classes parameter; this is usually done for the last class. If a class is named '?', the numerical values falling in this class will be replaced by unknown values in the resulting attributes.
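The sketch below shows how a list of class entries, with upper limits given as text, might be turned into numeric thresholds and the ranges they imply. It assumes, without confirmation from this description, that upper limits must strictly increase and that each range includes its upper limit. parse_classes and describe are hypothetical helpers.

```python
import math
from typing import List, Tuple


def parse_classes(entries: List[Tuple[str, str]]) -> List[Tuple[str, float]]:
    """Turn (name, upper limit text) pairs into numeric thresholds."""
    parsed = []
    previous = -math.inf  # the first class implicitly starts at negative infinity
    for name, limit_text in entries:
        limit = math.inf if limit_text.strip() == "Infinity" else float(limit_text)
        if limit <= previous:  # assumption: upper limits must strictly increase
            raise ValueError(f"upper limit of class {name!r} must exceed {previous}")
        parsed.append((name, limit))
        previous = limit
    return parsed


def describe(classes: List[Tuple[str, float]]) -> List[str]:
    """List each class with its (lower, upper] range; the lower limit is the previous upper."""
    lower = -math.inf
    ranges = []
    for name, upper in classes:
        ranges.append(f"{name}: ({lower}, {upper}]")
        lower = upper
    return ranges


print(describe(parse_classes([("low", "10"), ("?", "20"), ("high", "Infinity")])))
# ['low: (-inf, 10.0]', '?: (10.0, 20.0]', 'high: (20.0, inf]']
```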