You are viewing the RapidMiner Developers documentation for version 9.7.

Python Scripting Extension

RapidMiner provides the Python Scripting extension, including the Execute Python operator. It enables you to run Python code within a RapidMiner process.

ExampleSets are handled as pandas DataFrame objects.

The extension supports a variety of Python environment management tools, including the popular Anaconda distribution and virtualenvwrapper.

Installation and configuration

The required installation and configuration steps differ depending on where you want to install the extension. Follow the installation and configuration instructions for your target platform.

When you're done with these steps, you should have an environment capable of running any of the tutorial processes provided with the Execute Python operator.

Usage

Here are some of the key features of the extension. Make sure to explore the tutorial processes provided with the Execute Python operator as well.

To execute your code successfully inside RapidMiner, structure it so that it declares an rm_main function as its main entry point. The number and order of the input parameters and returned values of your rm_main function correspond to the input and output ports of the Execute Python operator.
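As a minimal sketch of this port mapping, the script below declares an rm_main function with two parameters and two return values, so it would expect two connected input ports and deliver to two output ports. The DataFrame names and the columns customer_id and amount are hypothetical and chosen only for illustration.

```python
import pandas as pd


def rm_main(customers, orders):
    """Entry point: two parameters map to the first and second input port."""
    # Each ExampleSet arrives as a pandas DataFrame.
    # Attach customer attributes to every order (hypothetical key column).
    merged = orders.merge(customers, on="customer_id", how="left")

    # Aggregate the order amounts per customer (hypothetical value column).
    totals = merged.groupby("customer_id", as_index=False)["amount"].sum()

    # Two returned values map to the first and second output port, in order.
    return merged, totals
```

Because rm_main is an ordinary Python function, it can also be called outside RapidMiner with hand-made DataFrames, which is convenient for testing a script before wiring it into a process.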

Running scripts

You can execute your Python code either by editing it inline in the built-in script editor (which provides basic syntax highlighting but lacks the more powerful features of a Python IDE), or by specifying a script file in the Execute Python operator's script file parameter. If your script is stored in a location accessible over the internet (such as GitHub), you can also read the script file directly from there with the help of the Open File operator.

Running notebooks

You can also execute ipynb notebooks with the help of Execute Python. In this case, use the script file parameter of the operator to locate your notebook. The same considerations on how to structure code apply to notebooks as to Python scripts.

If you have tagged your notebook cells, you can use selective tag-based execution to pick which cells to exclude from the execution. Alternatively, you can specify which cells to execute by providing a regular expression.
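To illustrate what tag-based selection amounts to, the sketch below filters the code cells of a notebook that has already been loaded as a dictionary (an ipynb file is JSON, and cell tags live under each cell's metadata). This is not the operator's actual implementation. The function select_cells and its parameters are hypothetical, and matching the regular expression against cell tags is an assumption.

```python
import re


def select_cells(notebook, exclude_tags=(), include_pattern=None):
    """Return the source text of code cells that pass the tag filters."""
    selected = []
    for cell in notebook.get("cells", []):
        # Only code cells are candidates for execution.
        if cell.get("cell_type") != "code":
            continue
        tags = cell.get("metadata", {}).get("tags", [])

        # Exclusion mode: drop any cell carrying an excluded tag.
        if any(tag in exclude_tags for tag in tags):
            continue

        # Inclusion mode (assumed): keep only cells with a tag matching the pattern.
        if include_pattern is not None and not any(
            re.fullmatch(include_pattern, tag) for tag in tags
        ):
            continue

        # The notebook format stores source as a string or a list of lines.
        source = cell.get("source", "")
        selected.append("".join(source) if isinstance(source, list) else source)
    return selected
```

With a notebook loaded through json.load, a call such as select_cells(nb, exclude_tags={"explore"}) would return every code cell except those tagged explore.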

Customizing execution

Python environments are a great way to eliminate package dependency pollution and interference between different projects. If you work this way, you will probably have multiple Python environments in use.

To customize the Python environment used by one specific Execute Python operator, uncheck use default Python in the operator parameters and provide your desired Python environment there. The same options are available as in the RapidMiner Studio preferences (see the installation and configuration chapter above).
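When several environments are in play, it can help to confirm which interpreter a given operator actually uses. The following sketch returns that information as a one-row DataFrame. It assumes that rm_main may be declared without parameters when no input port is connected, and the column names are hypothetical.

```python
import platform
import sys

import pandas as pd


def rm_main():
    """Report the interpreter and library versions of the active environment."""
    info = {
        # Path of the Python binary running this script.
        "executable": [sys.executable],
        # Version of that interpreter, e.g. "3.8.10".
        "python_version": [platform.python_version()],
        # Version of pandas installed in the same environment.
        "pandas_version": [pd.__version__],
    }
    return pd.DataFrame(info)
```

Running this once with use default Python checked and once with a custom environment selected should show two different executable paths if the setting took effect.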

Using RapidMiner macros

Macros added to the Python code inline with the %{myMacro} syntax are resolved before the script is executed, both for an inline script and for one provided through the script file parameter. Unsurprisingly, such code only runs inside RapidMiner; anywhere else it produces a syntax error.

Another, more Pythonic way to handle this is to check the enable macros parameter on your Execute Python operator. You then need to add an extra parameter to your rm_main function, through which the macros are accessible during execution. This allows you not only to read macro values, but also to define new macros or overwrite the values of existing ones.
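A minimal sketch of the second approach follows. It assumes that the extra parameter is passed as the last argument of rm_main and behaves like a dictionary mapping macro names to string values. The macro names threshold and rows_kept and the column score are hypothetical.

```python
import pandas as pd


def rm_main(data, macros):
    """Filter rows by a macro-supplied threshold and report the result in a macro."""
    # Read an existing macro; fall back to a default if it is not defined.
    # Assumption: macro values are strings, so convert explicitly.
    threshold = float(macros.get("threshold", "0.5"))

    # Keep only rows whose score reaches the threshold (hypothetical column).
    filtered = data[data["score"] >= threshold]

    # Define a new macro (or overwrite an existing one) for later operators.
    macros["rows_kept"] = str(len(filtered))

    return filtered
```

Unlike the inline %{myMacro} syntax, this script remains valid Python outside RapidMiner: passing a plain dict as the second argument is enough to run it locally.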

Running on Server

There are only a few special considerations to take into account when running Execute Python operators on RapidMiner Server; otherwise, everything works as it does in Studio.

When using an environment manager such as Anaconda, it is good practice to install the same environments, under the same names, on both Studio and Server.

To provision the same Python environments for both web service and regular process execution easily and without errors, use the Python environment management functionality in the Platform Admin tool.

When opening an Execute Python operator in RapidMiner Studio, only the local Python environments will be listed, never the ones present on RapidMiner Server, even if the process was opened from a Server repository.