Fill Data Gaps

Synopsis

This operator fills gaps in the ID attribute of the given ExampleSet by adding new examples where ID values are missing. The new examples have null values.

Description

The Fill Data Gaps operator fills gaps in the ID attribute of the given ExampleSet by adding new examples where ID values are missing. The new examples have null values for all attributes except the ID attribute; these values can be filled in afterwards by operators such as Replace Missing Values. Ideally, the ID attribute should be of integer type. This operator performs the following steps:

  • The data is sorted according to the ID attribute.
  • All occurring distances between consecutive ID values are calculated.
  • The greatest common divisor (GCD) of all distances is calculated.
  • Every ID value that is missing when stepping through the ID range in increments of the GCD is added to the data set as a new row.
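
The steps above can be sketched in a few lines of Python. This is only an illustration of the described procedure, not the operator's actual implementation: the function name fill_data_gaps, the id_key argument, and the representation of examples as dictionaries are all assumptions made for this example, and the ID values are assumed to be unique integers.

```python
from functools import reduce
from math import gcd


def fill_data_gaps(rows, id_key="id"):
    """Sketch of the described steps; rows are dicts with an integer ID (assumption)."""
    # Step 1: sort the data by the ID attribute.
    ordered = sorted(rows, key=lambda r: r[id_key])
    ids = [r[id_key] for r in ordered]

    # Step 2: distances between consecutive ID values.
    distances = [b - a for a, b in zip(ids, ids[1:])]

    # Step 3: greatest common divisor of all distances (0 if there are none).
    step = reduce(gcd, distances, 0)
    if step == 0:
        return ordered  # fewer than two distinct IDs: nothing to fill

    # Step 4: add a row with null (None) attributes for every missing ID.
    by_id = {r[id_key]: r for r in ordered}
    attributes = [k for k in ordered[0] if k != id_key]
    filled = []
    for i in range(ids[0], ids[-1] + 1, step):
        empty = {id_key: i, **{a: None for a in attributes}}
        filled.append(by_id.get(i, empty))
    return filled


rows = [
    {"id": 8, "value": "c"},
    {"id": 2, "value": "a"},
    {"id": 14, "value": "d"},
    {"id": 4, "value": "b"},
]
filled = fill_data_gaps(rows)
print([r["id"] for r in filled])  # [2, 4, 6, 8, 10, 12, 14]
```

Here the distances are 2, 4 and 6, so the GCD is 2, and rows with IDs 6, 10 and 12 are added with a null value.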

Input

example set input

This input port expects an ExampleSet. It is the output of the Subprocess operator in the attached Example Process. The output of other operators can also be used as input. Meta data must be attached to the input data, because the attributes are specified in the meta data.

Output

example set output

The gaps in the ExampleSet are filled with new examples, and the resulting ExampleSet is delivered through this port.

original

The ExampleSet that was given as input is passed through this port unchanged. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

Use gcd for step size

This parameter indicates if the greatest common divisor (GCD) should be calculated and used as the underlying distance between all data points.

Step size

This parameter is only available when the use gcd for step size parameter is set to false. This parameter specifies the step size to be used for filling the gaps.

Start

This parameter can be used for filling the gaps at the beginning (if they occur) before the first data point. For example, if the ID attribute of the given ExampleSet starts with 3 and the start parameter is set to 1, this operator will fill the gaps at the beginning by adding rows with IDs 1 and 2.

End

This parameter can be used for filling the gaps at the end (if they occur) after the last data point. For example, if the ID attribute of the given ExampleSet ends with 100 and the end parameter is set to 105, this operator will fill the gaps at the end by adding rows with IDs 101 to 105.
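
The effect of the step size, start and end parameters on the resulting ID values can be illustrated with a small Python sketch. The helper ids_after_filling is hypothetical and is not part of the operator. It assumes integer IDs, treats an unset start or end as None (how the operator itself represents an unset value is not stated here), and assumes that the step grid is anchored at the first existing ID.

```python
def ids_after_filling(ids, step=1, start=None, end=None):
    """Return the ID values present after filling (hypothetical helper)."""
    existing = sorted(set(ids))
    first, last = existing[0], existing[-1]

    # Extend the range only outwards: start below the first ID, end above the last.
    lo = first if start is None else min(start, first)
    hi = last if end is None else max(end, last)

    # Grid of IDs spaced by `step`, anchored at the first existing ID
    # and extended downwards as far as `lo` allows.
    steps_down = (first - lo) // step
    grid = range(first - steps_down * step, hi + 1, step)

    # Existing IDs are always kept; grid IDs that are missing are added.
    return sorted(set(existing) | set(grid))


# The two examples from the parameter descriptions:
print(ids_after_filling([3, 4, 5], start=1))        # [1, 2, 3, 4, 5]
print(ids_after_filling([98, 99, 100], end=105))    # [98, 99, 100, 101, 102, 103, 104, 105]

# A fixed step size of 5 combined with start and end:
print(ids_after_filling([10, 20], step=5, start=0, end=30))  # [0, 5, 10, 15, 20, 25, 30]
```

In the first call the IDs 1 and 2 are added before the first data point; in the second, the IDs 101 to 105 are added after the last one.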