Custom operators

This page explains how you can integrate your Python code more deeply into RapidMiner processes by creating custom operators with the Python Learner, Python Transformer and Python Forecaster operators. You can then share these custom operators with others who aren't adept Python coders.

The Python Learner operator

With Python Learner, you can create a Python-based model that is compatible with RapidMiner's model interfaces. Models created using Python Learner (and custom operators derived from it) can be applied using the Apply Model operator, can be trained using RapidMiner's Cross-Validation operators, can be fine-tuned using Optimize operators, and so on.

When you drag a new Python Learner to the canvas from the Operators panel in RapidMiner Studio, you will get an operator similar to other learners (in operator color and input/output ports). It will also have a few predefined parameters which you can edit. See the list of supported parameter types below.

The operator information panel also shows the capabilities of this Python-based learner.

To edit the parameter definitions and capabilities, click the gear icon on the Parameters panel. A JSON editor will appear where you can add and modify these traits. The JSON you write here will be validated and you will get warnings if something isn't correct. If you Apply an incorrect configuration, your input ports will disappear from the operator on the canvas.

To implement a Python learner, you need to define two functions named rm_train and rm_apply. As the names suggest, the first runs when you train a model in your Python Learner operator. The second runs when the model created with Python Learner is applied, e.g. with Apply Model. We advise studying the provided tutorial processes for more hints on how to implement these functions.
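As a rough illustration of this two-function contract, here is a minimal sketch of a majority-class baseline. The rm_train(X, y, parameters) signature is the one shown later on this page. The rm_apply argument order and its return type are assumptions made for this sketch only, so check the tutorial processes for the exact form RapidMiner expects.

```python
from collections import Counter


def rm_train(X, y, parameters):
    """Train a majority-class baseline and return it as the model object."""
    # Counter accepts any iterable of labels (a list or a pandas Series).
    majority_label, _ = Counter(y).most_common(1)[0]
    return {"majority_label": majority_label}


def rm_apply(X, model):
    """Predict the stored majority label for every row of X.

    Assumption: the (X, model) argument order and the list return value
    are illustrative; they are not taken from this page.
    """
    return [model["majority_label"]] * len(X)


# Local smoke test outside RapidMiner, with plain lists standing in for tables.
model = rm_train([[1.0], [2.0], [3.0]], ["yes", "no", "yes"], {})
predictions = rm_apply([[4.0], [5.0]], model)
print(predictions)
```

Whatever rm_train returns is the object that is later handed to rm_apply, so a plain dictionary is enough for a simple model like this one.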

The Python Transformer operator

You can think of the Python Transformer as an Execute Python operator with user-defined parameters and an arbitrary number of input and output ports.

The name of the operator, its parameters and their types and default values, as well as the input and output ports, are defined by clicking on the gear icon in the operator parameters panel, and editing the JSON definition in the editor that pops up. This is very similar to the one explained above for the Python Learner operator (Transformer doesn't take a list of capabilities, but it requires inputs and outputs). The JSON you write here will be validated and you will get warnings if something isn't correct. If you Apply an incorrect configuration, your input ports will disappear from the operator on the canvas. See the list of supported parameter types below.

In order for the code to execute as expected, you have to follow the same convention as for Execute Python: your main entry point will be the rm_main function, and the number and order of the function parameters and return values will correspond to the operator's input and output ports.

If you need multiple inputs and outputs for your Python Transformer based custom operator, you have to explicitly define this using the inputs and outputs part of the JSON definition. Users accustomed to Execute Python’s dynamic ports may find this non-intuitive.

The sample parameter configuration and code present when you drag a new Python Transformer to the canvas contains all the above hints.
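The correspondence between ports and function arguments can be sketched as follows for a hypothetical operator with two input ports and two output ports. The port roles, the max_rows parameter and the position of the parameters argument (placed last, by analogy with rm_train on this page) are all assumptions. The sample code generated in Studio is the authoritative reference.

```python
import pandas as pd


def rm_main(first, second, parameters):
    """Two inputs in, two outputs out, in the same order as the ports."""
    # Assumption: user-defined parameters arrive as a dict after the inputs.
    max_rows = int(parameters.get("max_rows", 100))

    # First output: the two input tables stacked and truncated.
    combined = pd.concat([first, second], ignore_index=True).head(max_rows)

    # Second output: a one-row summary table.
    summary = pd.DataFrame(
        {"rows_in": [len(first) + len(second)], "rows_out": [len(combined)]}
    )
    return combined, summary


# Local smoke test outside RapidMiner.
first = pd.DataFrame({"a": [1, 2]})
second = pd.DataFrame({"a": [3, 4]})
combined, summary = rm_main(first, second, {"max_rows": 3})
print(combined)
print(summary)
```

Returning a tuple with one element per output port mirrors the Execute Python convention described above.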

The Python Forecaster operator

Note: the Python Forecaster operator is only available from version 9.10.2.

With the Python Forecaster operator, you can create forecast models similar to those of RapidMiner Studio's time series operators. It is also compatible with the Apply Forecast operator.

When you drag a new Python Forecaster to the canvas from the Operators panel in RapidMiner Studio, you will get an operator similar to the Python Learner operator (in operator color and input/output ports). It will also have a few predefined parameters. You can also add extra parameters, inputs and outputs. See the list of supported parameter types and configurable in/output ports below.

To implement a Python Forecaster, you need to define two functions named rm_train and rm_apply. As the names suggest, the first runs when you train a model in your Python Forecaster operator. The second runs when the model created by Python Forecaster is applied with an Apply Forecast operator. In other words, this operator generates a Python Forecast model that is compatible with RapidMiner's Apply Forecast operator.

The Python Forecaster operator has parameters similar to those of the Python Learner operator, which allow you to:

  • modify the JSON configuration,
  • modify the Python scripts,
  • save the operator,
  • use different environments,
    1. Python binary,
    2. virtual environment,
    3. conda environment.

However, it also has some extra parameters that are not defined in Python Learner:

  • timeseries attribute: selects the attribute to forecast,
  • has indices: specifies whether the data has indices,
  • indices attribute: selects the index column,
  • sort time series: sorts the data if it is not already sorted.

By default, there are two hidden parameters that help you use the timeseries attribute and the index in the Python script. These two extra parameters are:

  • series_name,
  • index_name.

They can be accessed from the Python script in the following way:

    index_name = parameters['index_name']
    series_name = parameters['series_name']
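To show the two hidden parameters in context, here is a minimal sketch of a naive "repeat the last value" forecaster. It uses the rm_train(index, series, parameters) signature given later on this page. It assumes that iterating over series yields the observed values (as a list or a pandas Series would), and the layout of the returned model dictionary is purely illustrative.

```python
def rm_train(index, series, parameters):
    """Build a naive forecast model that remembers the last observed value."""
    # Hidden parameters provided by the Python Forecaster operator.
    index_name = parameters["index_name"]
    series_name = parameters["series_name"]

    # Assumption: iterating over `series` yields the observed values in order.
    values = list(series)
    return {
        "index_name": index_name,
        "series_name": series_name,
        "last_value": values[-1],
        "n_observations": len(values),
    }


# Local smoke test outside RapidMiner, with plain lists standing in for the data.
model = rm_train(
    ["2021-01", "2021-02", "2021-03"],
    [10.0, 12.0, 11.0],
    {"index_name": "date", "series_name": "sales"},
)
print(model)
```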

For more examples check the tutorial processes.

Supported parameter types

Here's a list of supported parameter types for Python Learner, Python Transformer and Python Forecaster which you can use in the parameters list of your operator parameter configuration JSON:

Type in JSON | Parameter appearance
string       | string in a textbox
category     | single-choice dropdown
boolean      | checkbox
integer      | integer in a textbox
real         | floating point number in a textbox

Each parameter definition has the following attributes, which are represented by key-value pairs in the JSON object describing a parameter:

Attribute  | Mandatory?                                | Description
name       | yes                                       | the parameter name shown on the operator parameter panel
type       | yes                                       | the parameter type (see above table for supported types)
categories | only if type is category                  | the choices shown in the parameter dropdown, displayed in the order provided by the user. Must be a list of values.
optional   | no                                        | if set to true, the operator will be executed even if the parameter value is empty
value      | only if optional is false or not provided | default value of the parameter

Here are some examples of the above parameter definitions:

    "parameters": [
      {
        "name": "1st_parameter",
        "type": "string",
        "optional": true
      },
      {
        "name": "2nd_parameter",
        "type": "integer",
        "value": 100
      },
      {
        "name": "3rd_parameter",
        "type": "category",
        "categories": [
          "Category A",
          "Category B",
          "Category C",
          "Default Category"
        ],
        "value": "Default Category"
      },
      {
        "name": "4th_parameter",
        "type": "boolean"
      },
      {
        "name": "5th_parameter",
        "type": "real",
        "value": 3.1415
      },
      {
        "name": "6th_parameter",
        "type": "string",
        "optional": true
      }
    ]

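If you want to sanity-check a parameter list before pasting it into the JSON editor, the rules from the attribute table can be sketched in a few lines of Python. This is not the validator built into RapidMiner Studio. The check_parameter helper is hypothetical and applies a strict reading of the table, which Studio itself may not enforce in the same way.

```python
import json

SUPPORTED_TYPES = {"string", "category", "boolean", "integer", "real"}


def check_parameter(definition):
    """Return a list of problems found in one parameter definition."""
    problems = []
    # name and type are mandatory for every parameter.
    for key in ("name", "type"):
        if key not in definition:
            problems.append(f"missing mandatory attribute '{key}'")
    param_type = definition.get("type")
    if param_type is not None and param_type not in SUPPORTED_TYPES:
        problems.append(f"unsupported type '{param_type}'")
    # categories must be a list when the type is category.
    if param_type == "category" and not isinstance(definition.get("categories"), list):
        problems.append("'categories' must be a list when type is 'category'")
    # value is needed unless the parameter is explicitly optional.
    if not definition.get("optional", False) and "value" not in definition:
        problems.append("no default 'value' for a non-optional parameter")
    return problems


config = json.loads("""
{
  "parameters": [
    {"name": "1st_parameter", "type": "string", "optional": true},
    {"name": "2nd_parameter", "type": "integer", "value": 100},
    {"name": "4th_parameter", "type": "boolean"}
  ]
}
""")
report = {p["name"]: check_parameter(p) for p in config["parameters"]}
print(report)
```

Under this strict reading, 4th_parameter is reported because it has no default value. Studio's own validation may treat boolean parameters more leniently.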
User configurable input and output ports

Note: the configurable input and output ports feature is only available from version 9.10.2.

You can define additional input and output ports for Python Learner, Python Transformer and Python Forecaster. To do so, add elements (JSON objects) to the inputs or outputs array in the editable JSON configuration.

Here are some examples of additional inputs:

    "inputs": [
      {
        "name": "additional input 1",
        "type": "table"
      },
      {
        "name": "additional input 2",
        "type": "table"
      }
    ]

Here are some examples of extra output ports:

    "outputs": [
      {
        "name": "additional output 1",
        "type": "table"
      },
      {
        "name": "additional output 2",
        "type": "table"
      }
    ]

To use the ports added this way, see the following sections. You can also find examples in the tutorial processes of the Python Forecaster operator.

How to use user configurable input ports

After extending the inputs array, the user-defined input ports can be accessed from the rm_train method. rm_train is called with an *inputs argument that includes the additional input ports, so if you add extra parameters to the rm_train() method definition, you will be able to access these inputs.

Here is an example for the Python Learner:

    # The original definition:
    rm_train(X, y, parameters)

    # The new definition with two additional inputs:
    rm_train(X, y, additional_input_1, additional_input_2, parameters)

Here is an example for the Python Forecaster:

    # The original definition:
    rm_train(index, series, parameters)

    # The new definition with two additional inputs:
    rm_train(index, series, additional_input_1, additional_input_2, parameters)

How to use user configurable output ports

After extending the outputs array, the rm_train method can transfer data to the user-defined output ports. The rm_train method returns the model (or other object) that you want to pass to the rm_apply method. The first object in the return statement is always passed to the rm_apply method; any additional ones are transferred to the additional output ports. These additional objects should be pandas DataFrames.

Here is an example of returning a model:

    # The original return statement:
    return model

    # The new definition with two additional outputs:
    return model, additional_output_1, additional_output_2

Another example, returning an object:

    # The original return statement:
    return { 'model': model }

    # The new definition with two additional outputs:
    return { 'model': model }, additional_output_1, additional_output_2
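Putting extra inputs and extra outputs together, a Python Learner's rm_train might look like the following sketch. It assumes two additional input ports and two additional output ports have been declared in the JSON configuration. The majority-label "model" and the contents of the two output tables are invented for illustration only.

```python
import pandas as pd


def rm_train(X, y, additional_input_1, additional_input_2, parameters):
    """Return the model first, then one DataFrame per additional output port."""
    labels = pd.Series(list(y))

    # The first returned object is what rm_apply receives later.
    model = {"majority_label": labels.mode().iloc[0]}

    # Additional output 1: row counts of the two extra inputs.
    additional_output_1 = pd.DataFrame(
        {
            "input": ["additional input 1", "additional input 2"],
            "rows": [len(additional_input_1), len(additional_input_2)],
        }
    )

    # Additional output 2: label distribution of the training data.
    additional_output_2 = (
        labels.value_counts().rename_axis("label").reset_index(name="count")
    )
    return model, additional_output_1, additional_output_2


# Local smoke test outside RapidMiner.
X = pd.DataFrame({"a": [1.0, 2.0, 3.0]})
extra_1 = pd.DataFrame({"b": [1, 2]})
extra_2 = pd.DataFrame({"c": [1]})
model, out_1, out_2 = rm_train(X, ["yes", "no", "yes"], extra_1, extra_2, {})
print(model)
print(out_1)
print(out_2)
```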

Environment handling in custom Python operators

Similarly to the Execute Python operator, you can uncheck the use default Python parameter and specify which environment to use. In the case of Python Learner, the model is applied using the same environment that was used for training.

If the model is applied on another machine (e.g. on RapidMiner AI Hub), ensure that the same Python environment, with the same name, is available there; otherwise the execution will either fail or produce unwanted results.

Sharing and distributing custom operators

When you are happy with how a Learner, Transformer or Forecaster you created behaves, the next step could be sharing it with others on your project. All of these operators have a Save button on their parameters panel.

When you click Save and then specify a location in your project or repository, a .pyop descriptor file will be created.

Users can then drag this .pyop file to the canvas in RapidMiner Studio, and the Learner or Transformer containing all your code and parameter definitions will be created, using the name you provided for your custom operator. This operator will not be editable, which ensures that the code you wrote will execute as you intended (provided the Python environment it uses is present on the machine running the RapidMiner process).

One drawback of this method of sharing is that it is not possible to update the operators after the .pyop descriptor has been dragged to the canvas and a new operator has been created from it. If you need to ensure that these operators get updated, you need to distribute your custom operators as an extension. To do this, right-click the folder containing your .pyop files and click Create Extension... Enter the details in the dialog that appears. You will also be given a list of the custom operators that will be compiled into your new extension. Click Create Extension.

Once the extension is created, you can distribute it as any other extension. When you want to update your operators, you create a new version of the extension and redistribute it to all users.

Note: the created extension will depend on the Python Scripting extension version 9.9 or later, so each user has to have that extension installed as well.