Build Simulation
Synopsis
This operator allows you to build a new ExampleSet with similar statistical properties to a reference ExampleSet.
Description
This operator extracts the statistical properties from a reference ExampleSet (e.g. mean and standard deviation) and then builds a new one with the same statistical distribution as the input.
The sample size of the new simulated ExampleSet can be specified, as well as the algorithm used to create the new values for the simulated Examples. The operator also provides the SimulationModel used to create the simulated ExampleSets as an object at the mod output port. It can be stored and reused to create further simulated ExampleSets with the same properties by connecting it to the mod input port of a Build Simulation operator.
This operator allows the user to set attribute values of the generated data to constant values. This can be done in two ways: either by using the constant_attributes parameter, which allows a manual definition of constant attributes, or by providing an ExampleSet on the "con" port that lists the attributes and their values. The second option is generally preferable if many attributes are supposed to be constant.
Input
exa
The reference ExampleSet.
mod
A SimulationModel. If connected, the operator will not fit a new simulation model but will use the provided one.
constant
An ExampleSet with the information on constant attributes. The names of the name and value attributes can be set with the corresponding parameters.
Output
exa
The simulated ExampleSet.
mod
The SimulationModel, which can be used in another Build Simulation operator to avoid refitting the model. If the mod input is connected, you will receive the passed-through simulation model; if not, you will receive the fitted one.
ori
The original ExampleSet.
Parameters
Sample size
The number of simulated rows desired for the output.
Algorithm
This parameter allows you to select the algorithm used. It has the following options:
- normal_distribution: With this setting, the operator assumes that the attributes in the reference ExampleSet are statistically independent of one another and that each follows its own normal distribution. The mean and the standard deviation of each input attribute are computed, and each new value x is then built using the formula x = (r*s)+m, where r is a standard normally distributed random number (mean 0, standard deviation 1), s is the standard deviation, and m is the mean of the respective attribute.
- correlated_normal_distribution: With this setting, all attribute values are derived from a multi-dimensional, correlated normal distribution. Each new row X in the simulated ExampleSet is built using the formula X = (R*L)+m, where R is a row of normally distributed random data, L is the factor obtained from the Cholesky decomposition of the covariance matrix, and m is the vector of attribute means.
- empirical_distribution: With this setting, the operator uses a probability distribution derived from the observed data without making any assumptions about the functional form of the population distribution that the data come from. Every attribute is assumed to be independent of the others, so an independent distribution is fitted for each of them. For details on the implementation, see: http://commons.apache.org/proper/commons-math/javadocs/api-3.6/org/apache/commons/math3/random/EmpiricalDistribution.html
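The two normal-distribution options can be illustrated with a minimal Python sketch. This is not the operator's actual implementation: the function names are hypothetical, the reference data is plain lists of numbers rather than an ExampleSet, and the sample (n-1) covariance is an assumption. The Cholesky factor is computed here as a lower-triangular matrix and applied as L*r, which is the column-vector form of the row formula given above.

```python
import math
import random
import statistics


def simulate_independent(rows, n, rng):
    """Draw n rows, treating each column as an independent normal: x = r*s + m."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    stds = [statistics.stdev(c) for c in columns]
    return [
        [rng.gauss(0.0, 1.0) * s + m for m, s in zip(means, stds)]
        for _ in range(n)
    ]


def covariance_matrix(rows):
    """Sample covariance matrix (n-1 denominator) of the columns of rows."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    count, dim = len(rows), len(columns)
    return [
        [
            sum((columns[i][t] - means[i]) * (columns[j][t] - means[j])
                for t in range(count)) / (count - 1)
            for j in range(dim)
        ]
        for i in range(dim)
    ]


def cholesky(a):
    """Lower-triangular factor low with low * low^T == a (a positive definite)."""
    dim = len(a)
    low = [[0.0] * dim for _ in range(dim)]
    for i in range(dim):
        for j in range(i + 1):
            partial = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - partial)
            else:
                low[i][j] = (a[i][j] - partial) / low[j][j]
    return low


def simulate_correlated(rows, n, rng):
    """Draw n rows from a correlated normal fitted to rows: x = low*r + m."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    low = cholesky(covariance_matrix(rows))
    dim = len(means)
    result = []
    for _ in range(n):
        r = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        result.append(
            [sum(low[i][k] * r[k] for k in range(i + 1)) + means[i]
             for i in range(dim)]
        )
    return result


# Made-up reference data with two strongly correlated attributes.
reference = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2]]
rng = random.Random(0)
print(simulate_independent(reference, 3, rng))
print(simulate_correlated(reference, 3, rng))
```

With the independent variant, the correlation between the two attributes is lost in the simulated data; the correlated variant preserves it, at the cost of requiring a positive definite covariance matrix.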
Constant attributes
Allows you to manually specify constant attributes and their respective values.
Name attribute
If the constant port is connected, you can provide the attribute names and values using an ExampleSet. This parameter defines which attribute of this constants ExampleSet contains the names of the attributes that are to be constant.
Value attribute
If the constant port is connected, you can provide the attribute names and values using an ExampleSet. This parameter defines which attribute of this constants ExampleSet contains the values of the attributes that are to be constant.
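As a rough illustration of how a constants ExampleSet with a name column and a value column could be applied to simulated data, consider the Python sketch below. It is only a model of the idea, not the operator's code: rows are represented as dictionaries, and the column names "attribute" and "value", the helper names, and the sample data are all hypothetical.

```python
def constants_from_table(table, name_attribute, value_attribute):
    """Map each attribute name listed in the table to its constant value."""
    return {row[name_attribute]: row[value_attribute] for row in table}


def apply_constants(rows, constants):
    """Return copies of rows in which existing attributes listed in
    constants are overwritten; other attributes keep their simulated values."""
    return [
        {name: constants.get(name, value) for name, value in row.items()}
        for row in rows
    ]


# Hypothetical constants table: one row per attribute to hold fixed.
constants_table = [
    {"attribute": "temperature", "value": 21.5},
    {"attribute": "pressure", "value": 1.0},
]

# Hypothetical simulated rows.
simulated = [
    {"temperature": 19.8, "pressure": 0.97, "flow": 3.2},
    {"temperature": 23.1, "pressure": 1.04, "flow": 2.9},
]

constants = constants_from_table(constants_table, "attribute", "value")
fixed = apply_constants(simulated, constants)
print(fixed)
```

Here "attribute" plays the role of the Name attribute parameter and "value" the role of the Value attribute parameter.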
Use local random seed
This parameter indicates if a local random seed should be used.
Local random seed
If the use local random seed parameter is checked, this parameter determines the local random seed.
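The practical effect of a fixed seed is that repeated runs produce the same simulated values. The following Python sketch illustrates this with the standard library's random generator; the function name, its parameters, and the seed values are hypothetical, and the operator's own random generator is not described here.

```python
import random


def draw(sample_size, use_local_random_seed=False, local_random_seed=42):
    """Draw standard normal values, optionally from a locally seeded generator."""
    if use_local_random_seed:
        rng = random.Random(local_random_seed)  # reproducible sequence
    else:
        rng = random.Random()  # seeded from system entropy, varies per run
    return [rng.gauss(0.0, 1.0) for _ in range(sample_size)]


first = draw(5, use_local_random_seed=True, local_random_seed=42)
second = draw(5, use_local_random_seed=True, local_random_seed=42)
print(first == second)
```

Two calls with the same seed return identical values, while a different seed or an unseeded generator gives a different sample.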