Release of version 0.6.1 of the Operator Toolbox

tftemme (Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member; Posts: 164; RM Research)
edited December 2018 in Knowledge Base

Dear Community,

We are glad to announce the release of version 0.6.1 of the Operator Toolbox extension (Marketplace Link).

The new version includes two new Operators and adds enhanced functionality to two existing Operators.

Smote Upsampling

There are situations in which you have only a small number of Examples for one of your classes and want to upsample them, either to give your machine learning algorithm a larger number of Examples or to equalize the class distribution. Until now you could do this with the Sample (Bootstrapping) Operator. If you want to upsample with similar but not identical Examples, you can now use the new Smote Upsampling Operator.

The Operator implements the Synthetic Minority Over-sampling Technique, as proposed by Chawla et al., Journal of Artificial Intelligence Research 16 (2002), 321-357.

The Operator upsamples only the minority class. A new Example is generated by picking a random Example of the minority class. Then the k nearest neighbors of this Example (also from the minority class) are calculated and one of them is chosen at random. The new Example is created on the line between the two Examples.

Figure 1 illustrates the basic principle.

Figure 1 (Smote.png): Illustration of the basic principle of the Smote Upsampling algorithm. All Examples are from the minority class.
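The steps described above can be sketched in a few lines of plain Python. This is only an illustration of the technique, not the Operator's actual implementation: the function name `smote_upsample` and its parameters are hypothetical, Euclidean distance is an assumption, and the random position on the connecting line follows the original SMOTE paper rather than anything stated in this post.

```python
import math
import random


def smote_upsample(minority, n_new, k=5, seed=None):
    """Create n_new synthetic points from a list of minority-class points.

    Hypothetical helper for illustration; each point is a tuple of floats.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority Examples")
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # 1. Pick a random Example of the minority class.
        i = rng.randrange(len(minority))
        base = minority[i]
        # 2. Rank the other minority Examples by distance (Euclidean is an
        #    assumption) and choose one of the k nearest at random.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])
        # 3. Place the new Example at a random position on the line
        #    between the two Examples.
        t = rng.random()
        synthetic.append(tuple(b + t * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_examples = smote_upsample(minority, n_new=10, k=2, seed=0)
```

Because every synthetic point lies between two existing minority Examples, the new Examples stay inside the region the minority class already occupies instead of duplicating existing rows.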

Generate Univariate Series

The new Operator Generate Univariate Series is an enhanced version of the old Generate Date Series Operator of the Operator Toolbox extension. Besides generating an equally spaced date series, the new Operator can also generate an equidistant real-valued series. With the parameter data_type the user can now specify whether they want a numeric (linearly spaced) series or a date series. For a real-valued series, the min and max values and the step size can be specified. For a date series, the already familiar parameters of the Generate Date Series Operator are used to generate the series.

Figure 2 shows a process in which the new Operator is used to generate a series with values between -pi (-180°) and 2*pi (360°) with a step size of pi/9 (20°). The Generate Attributes Operator is used to calculate the sine of these values.

Figure 2 (Generate_Univariate_Series_Result.png): Process to generate a real-valued series (called x) between -pi and 2*pi with a step size of pi/9. The parameters of the Generate Univariate Series Operator are shown, as well as the result of the sine calculation in Generate Attributes.

The shown process is also provided as a tutorial process in the Operator help. The old Generate Date Series Operator is now deprecated and will be removed from the extension in the future.
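For readers who want to check the numbers outside RapidMiner, the same series can be reproduced with a rough Python sketch. The helper `generate_series` is hypothetical and only mirrors the min, max and step size parameters described above; whether the Operator includes the max value as the last point is an assumption made here.

```python
import math


def generate_series(min_value, max_value, step):
    """Return equidistant values from min_value up to max_value.

    Hypothetical helper; including the end point is an assumption.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    # Compute the number of steps once and multiply, instead of adding the
    # step repeatedly, so floating-point error does not accumulate.
    n_steps = int(math.floor((max_value - min_value) / step + 1e-9))
    return [min_value + i * step for i in range(n_steps + 1)]


# Series x between -pi and 2*pi with a step size of pi/9.
x = generate_series(-math.pi, 2 * math.pi, math.pi / 9)

# Equivalent of the Generate Attributes step: the sine of each value.
rows = [(value, math.sin(value)) for value in x]
```

With these settings the series spans 3*pi in steps of pi/9, which gives 27 steps and therefore 28 values.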

Enhancements

The Generate ExampleSet Operator can now parse all data as nominal Attributes.

The Group Into Collection Operator now preserves the special roles set for Attributes in the input ExampleSet in the ExampleSets of the Collection.


Comments

  • Manar (Member, Posts: 9, Newbie)
    Thank you so much, but I have a question, please.
    Can we implement the smote with the TF-IDF in text classification?
  • sgenzer (Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator; Posts: 2,959; Community Manager)
  • MartinLiebig (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor; Posts: 3400; RM Data Scientist)
    of course you can add it on tf-idf'ed texts
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany