Impute Missing Values
Synopsis
This operator estimates values for the missing values of the selected attributes by applying a model learned for missing values.
Description
This is a nested operator, i.e. it has a subprocess. The subprocess must accept an ExampleSet and return a model. The Impute Missing Values operator estimates missing values by learning a model for each attribute (except the label) and applying those models to the ExampleSet. The learner used for estimating missing values should be placed in the subprocess of this operator. Please note that, depending on the ability of the inner learner to handle missing values, this operator might not be able to impute all missing values; in that case a warning is issued. It can therefore be useful to combine this operator with a subsequent Replace Missing Values operator.
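The per-attribute procedure described above can be sketched in plain Python. This is only an illustration of the idea, not the operator's implementation: the `nearest_neighbor_learner` function stands in for whatever learner is placed in the subprocess, and the `MISSING` marker, the `impute` function and the dictionary-based rows are all assumptions made for this example.

```python
import math

MISSING = None  # marker used in this sketch for a missing value


def nearest_neighbor_learner(rows, target):
    """Toy stand-in for the subprocess learner: returns a 1-NN 'model'."""
    # Only rows where the target attribute is known can serve as training data.
    train = [r for r in rows if r[target] is not MISSING]

    def predict(row):
        def distance(other):
            # Compare only on attributes that are known in both rows.
            shared = [k for k in row
                      if k != target
                      and row[k] is not MISSING
                      and other[k] is not MISSING]
            if not shared:
                return math.inf
            return sum((row[k] - other[k]) ** 2 for k in shared) / len(shared)

        return min(train, key=distance)[target]

    return predict


def impute(rows, attributes, learner=nearest_neighbor_learner):
    """Learn one model per attribute and fill that attribute's missing values."""
    result = [dict(r) for r in rows]  # leave the input untouched
    for att in attributes:
        model = learner(rows, att)
        for row in result:
            if row[att] is MISSING:
                row[att] = model(row)
    return result


data = [
    {"a": 1.0, "b": 10.0},
    {"a": 2.0, "b": 20.0},
    {"a": 2.1, "b": MISSING},
    {"a": MISSING, "b": 11.0},
]
filled = impute(data, ["a", "b"])
print(filled[2]["b"], filled[3]["a"])  # prints: 20.0 1.0
```

A learner that cannot cope with missing inputs would fail or skip rows in this loop, which is the situation in which a subsequent Replace Missing Values step becomes useful.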
Input
example set in
This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input. Meta data must be attached to the input data because attributes are specified in the meta data. The Retrieve operator provides meta data along with the data.
Output
example set out
The missing values in the ExampleSet are replaced by the values estimated by the given model, and the resulting ExampleSet is delivered through this port.
Parameters
Attribute filter type
This parameter allows you to select the attribute selection filter, i.e. the method used to select the attributes in which missing values should be replaced. It has the following options:
- all: This option simply selects all the attributes of the ExampleSet. This is the default option.
- single: This option allows selection of a single attribute. When this option is selected another parameter (attribute) becomes visible in the Parameters panel.
- subset: This option allows selection of multiple attributes through a list. All attributes of the ExampleSet are present in the list; the required attributes can be easily selected. This option will not work if the meta data is not known. When this option is selected another parameter becomes visible in the Parameters panel.
- regular_expression: This option allows you to specify a regular expression for attribute selection. When this option is selected some other parameters (regular expression, use except expression) become visible in the Parameters panel.
- value_type: This option allows selection of all the attributes of a particular type. It should be noted that types are hierarchical. For example, the real and integer types both belong to the numeric type. Users should have a basic understanding of the type hierarchy when selecting attributes through this option. When this option is selected some other parameters (value type, use value type exception) become visible in the Parameters panel.
- block_type: This option works similarly to the value_type option. It allows selection of all the attributes of a particular block type. It should be noted that block types may be hierarchical. For example, the value_series_start and value_series_end block types both belong to the value_series block type. When this option is selected some other parameters (block type, use block type exception) become visible in the Parameters panel.
- no_missing_values: This option simply selects all the attributes of the ExampleSet which don't contain a missing value in any example. Attributes that have even a single missing value are not selected.
- numeric_value_filter: When this option is selected another parameter (numeric condition) becomes visible in the Parameters panel. All numeric attributes for which every example satisfies the given numeric condition are selected. Please note that all nominal attributes are also selected irrespective of the given numeric condition.
Attribute
The required attribute can be selected from this option. The attribute name can be selected from the drop-down box of the attribute parameter if the meta data is known.
Attributes
The required attributes can be selected from this option. This opens a new window with two lists. All attributes are present in the left list and can be shifted to the right list which is the list of selected attributes.
Regular expression
Attributes whose names match this expression will be selected. Regular expressions are a very powerful tool but require a detailed explanation for beginners. It is advisable to specify the regular expression through the edit and preview regular expression menu. This menu lets you try different expressions and preview the results immediately, which also helps in learning how regular expressions work.
Use except expression
If enabled, an exception to the first regular expression can be specified. When this option is selected another parameter (except regular expression) becomes visible in the Parameters panel.
Except regular expression
This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first expression (the expression that was specified in the regular expression parameter).
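How the regular expression and the except regular expression interact can be sketched as follows. This is a hedged illustration in Python, not the operator's code: the function name `select_by_regex` is hypothetical, and the sketch assumes that an expression has to match the whole attribute name.

```python
import re


def select_by_regex(names, expression, except_expression=None):
    """Keep names matching `expression` unless they also match the exception."""
    selected = []
    for name in names:
        # Assumption: the expression must match the complete attribute name.
        if not re.fullmatch(expression, name):
            continue
        if except_expression is not None and re.fullmatch(except_expression, name):
            continue  # filtered out although it matched the first expression
        selected.append(name)
    return selected


attributes = ["att1", "att2", "att10", "label"]
chosen = select_by_regex(attributes, r"att\d+", except_expression=r"att1\d*")
print(chosen)  # prints: ['att2']
```

Here `att1` and `att10` match the first expression but are removed by the exception, while `label` never matches at all.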
Value type
The type of attributes to be selected can be chosen from the drop-down list.
Use value type exception
If enabled, an exception to the selected type can be specified. When this option is selected another parameter (except value type) becomes visible in the Parameters panel.
Except value type
Attributes matching this type will be removed from the final output even if they matched the previously mentioned type, i.e. the value of the value type parameter.
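Because value types are hierarchical, selecting a type also selects its subtypes, and the exception removes a subtype again. A minimal sketch of that behaviour, assuming a parent table that contains only the two relationships named in this document (real and integer belong to numeric); the `PARENT` table and both function names are hypothetical:

```python
# Only the relationships stated in the text; the real hierarchy is larger.
PARENT = {"real": "numeric", "integer": "numeric"}


def is_a(value_type, ancestor):
    """Return True if value_type equals ancestor or descends from it."""
    while value_type is not None:
        if value_type == ancestor:
            return True
        value_type = PARENT.get(value_type)
    return False


def select_by_value_type(types, value_type, except_value_type=None):
    """Select attributes of a type, optionally excluding an exception type."""
    return [name for name, t in types.items()
            if is_a(t, value_type)
            and not (except_value_type and is_a(t, except_value_type))]


types = {"age": "integer", "income": "real", "city": "nominal"}
numeric_only = select_by_value_type(types, "numeric")
without_integers = select_by_value_type(types, "numeric", except_value_type="integer")
print(numeric_only)      # prints: ['age', 'income']
print(without_integers)  # prints: ['income']
```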
Block type
The block type of attributes to be selected can be chosen from the drop-down list.
Use block type exception
If enabled, an exception to the selected block type can be specified. When this option is selected another parameter (except block type) becomes visible in the Parameters panel.
Except block type
Attributes matching this block type will be removed from the final output even if they matched the previously mentioned block type.
Numeric condition
The numeric condition for testing the examples of numeric attributes is specified here. For example, the numeric condition '> 6' will keep all nominal attributes and all numeric attributes having a value greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. However, && and || cannot be used together in one numeric condition. Conditions like '(> 0 && < 2) || (> 10 && < 12)' are not allowed because they use both && and ||. Use a blank space after '>', '=' and '<', e.g. '<5' will not work, so use '< 5' instead.
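The rules above can be made concrete with a small evaluator. This is a sketch under stated assumptions rather than the operator's parser: the operator table `OPS`, the function names, and the column-dictionary data layout are invented for the example, and a condition without the blank space after the comparison sign is simply rejected.

```python
import operator

# Assumed set of comparison signs; the document itself only shows >, <, <= and =.
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "=": operator.eq}


def parse_condition(text):
    """Turn a condition such as '> 6 && < 11' into a predicate on one value."""
    if "&&" in text and "||" in text:
        raise ValueError("&& and || cannot be combined in one condition")
    joiner, combine = ("||", any) if "||" in text else ("&&", all)
    tests = []
    for part in text.split(joiner):
        # Requires a blank between sign and number; '<5' raises ValueError.
        symbol, number = part.split()
        tests.append((OPS[symbol], float(number)))
    return lambda value: combine(op(value, bound) for op, bound in tests)


def numeric_value_filter(columns, condition):
    """Keep nominal columns and numeric columns whose every value passes."""
    test = parse_condition(condition)
    kept = []
    for name, values in columns.items():
        is_numeric = all(isinstance(v, (int, float)) for v in values)
        if not is_numeric or all(test(v) for v in values):
            kept.append(name)
    return kept


columns = {
    "age": [7, 9, 10],
    "score": [3, 8, 12],
    "city": ["Oslo", "Rome", "Lima"],
}
kept = numeric_value_filter(columns, "> 6 && < 11")
print(kept)  # prints: ['age', 'city']
```

The `score` column is dropped because one of its values fails the condition, while the nominal `city` column is kept regardless of the condition.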
Invert selection
If this parameter is set to true, it acts as a NOT gate and reverses the selection: all previously selected attributes are unselected and all previously unselected attributes are selected. For example, if attribute 'att1' is selected and attribute 'att2' is removed before this parameter is enabled, then after enabling it 'att1' will be removed and 'att2' will be selected.
Include special attributes
Special attributes are attributes with special roles which identify the examples. In contrast, regular attributes simply describe the examples. Special attributes are: id, label, prediction, cluster, weight and batch. By default all special attributes are delivered to the output port irrespective of the conditions in the Select Attribute operator. If this parameter is set to true, special attributes are also tested against the conditions specified in the Select Attribute operator and only those attributes that satisfy the conditions are selected.
Iterate
Set this parameter to true if you want to impute the missing values immediately (after having learned the corresponding concept) and iterate afterwards.
Learn on complete cases
If this parameter is set to true, concepts are learned for estimating missing values only on the basis of complete cases. This option should be used when the inner learning approach cannot handle missing values.
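Restricting the training data to complete cases amounts to dropping every example that has at least one missing value before the learner sees it. A minimal sketch, assuming rows are dictionaries and `None` marks a missing value (both are conventions of this example, as is the function name `complete_cases`):

```python
def complete_cases(rows):
    """Return only the rows in which no attribute value is missing."""
    return [row for row in rows if all(v is not None for v in row.values())]


rows = [
    {"a": 1.0, "b": 2.0},
    {"a": None, "b": 3.0},
    {"a": 4.0, "b": None},
    {"a": 5.0, "b": 6.0},
]
training = complete_cases(rows)
print(len(training))  # prints: 2
```

Only the first and last rows would be handed to the inner learner here; the incomplete rows are still the ones whose missing values get estimated afterwards.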
Order
This parameter specifies the order of attributes in which missing values should be estimated.
Sort
This parameter specifies the sort direction to be used in order strategy.
Use local random seed
This parameter indicates if a local random seed should be used for randomization. Using the same value of the local random seed will produce the same randomization.
Local random seed
This parameter specifies the local random seed. This parameter is only available if the use local random seed parameter is set to true.
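The effect of a local random seed can be illustrated with Python's `random` module. The sketch below shuffles a list of attribute names purely as an example of a randomized step; the function name `shuffled_order` and its arguments are hypothetical and only mirror the two parameters described above.

```python
import random


def shuffled_order(attributes, use_local_random_seed=False, local_random_seed=0):
    """Return the attributes in random order, reproducibly if a seed is used."""
    if use_local_random_seed:
        rng = random.Random(local_random_seed)  # private, seeded generator
    else:
        rng = random.Random()  # seeded from system entropy, differs per run
    order = list(attributes)
    rng.shuffle(order)
    return order


names = ["att1", "att2", "att3", "att4", "att5"]
first = shuffled_order(names, use_local_random_seed=True, local_random_seed=42)
second = shuffled_order(names, use_local_random_seed=True, local_random_seed=42)
print(first == second)  # prints: True
```

With the same seed value the two calls produce the same order; without a local seed the order may change from run to run.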