
Execute Python

The Execute Python operator enables a smooth integration of your Python code into your RapidMiner workflows. Your Python code may live inside the operator or outside the operator, in the catalog.

Workflow execute python

  • At the input ports (inp), RapidMiner data tables are converted to Pandas DataFrames.

  • At the output ports (out), Pandas DataFrames are converted back to RapidMiner data tables.

  • The fil port gives the operator access to Python code that is stored in the catalog. Alternatively, you can paste your Python code directly into the editor provided by Execute Python.

  • Your Python code must be structured as a function called rm_main(data1, data2, ...), with an arbitrary number (possibly zero) of inputs and outputs. Each input or output of rm_main() is a Pandas DataFrame.

    • The number of input and output Pandas DataFrames in rm_main() must equal the number of connected ports on Execute Python. In the screenshot above, rm_main(data1) accepts a single Pandas DataFrame as input and returns a single Pandas DataFrame as output.
    • If there are macros defined, the number of arguments in rm_main() should be the number of connected input ports plus one. The macros argument must take the form of a dictionary.
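
As an illustration of the rules above, here is a minimal sketch that assumes two connected input ports, defined macros, and one connected output port. The column names (id, amount) and the macro key min_amount are hypothetical. The value is passed through float() so the sketch works whether the macro arrives as a string or as a number.

```python
import pandas as pd


def rm_main(data1, data2, macros):
    # macros is a dictionary; "min_amount" is a hypothetical key used for illustration
    threshold = float(macros.get("min_amount", 0))

    # Join the two input tables on a shared (hypothetical) "id" column
    merged = data1.merge(data2, on="id", how="inner")

    # One connected output port, so a single DataFrame is returned
    return merged[merged["amount"] >= threshold].reset_index(drop=True)
```

With two data inputs plus macros, the function takes three arguments. The argument names themselves are free to choose.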

In what follows, we discuss the parameters of the Execute Python operator.

Coding Environment

When your workflow includes the Execute Python operator, the coding environment defines which Python packages are available to your Python code. If you have multiple Execute Python operators in your workflow, each of them can have its own coding environment.

The default Python coding environment is called rm-base, but you are free to use an alternative coding environment, so long as it includes Pandas and PyArrow. Only coding environments that include PyArrow will appear in the Coding Environment dropdown list.
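
If you build your own coding environment, a quick check like the following can confirm that both required packages are importable before you select it. This is a sketch that relies only on the Python standard library. The helper name missing_packages is hypothetical and is not part of RapidMiner.

```python
import importlib.util

# Packages the coding environment must include, per the text above
REQUIRED = ("pandas", "pyarrow")


def missing_packages(names=REQUIRED):
    """Return the names in `names` that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
print("missing packages:", missing if missing else "none")
```

Run it with the interpreter of the environment you intend to use. An empty result only shows that the packages can be located, not that their versions are compatible.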

Parameter coding environment

Use File Input Connector

Suppose you want to execute a Python program that has been uploaded to the catalog.

Your Python program is external to the Execute Python operator, so take the following steps:

  1. drag the file (e.g., analytics-cloud.py) to the canvas,
  2. attach the resulting Input to the fil port, and
  3. within Execute Python, enable the parameter Use File Input Connector.

Execute Python via an external file

Python Code


Hint: if you have two versions of your Python code, you can have one program external to the operator (as described above) and one program internal to the operator (as described here) and toggle between them via the switch Use File Input Connector.

Alternatively, disable Use File Input Connector and select Python Code if you wish to paste your Python code directly into the Execute Python operator.

Parameter python code

Under Parameters > Python Code, click on the Edit button to reveal the following sample Python code, then replace it with your own code. Note the helpful comments.

from pandas import DataFrame

# Mandatory main function. This example expects a single input.
# However, the number of arguments has to be the number of input ports (can be none)
# and can be multiple too, in case of connecting multiple data sources to the operator.
# Please note that the input script file is not a data source.
# So, for example, if there are two data inputs and a script file connected to the operator,
# it should look like this:
# def rm_main(data1, data2): ...
# If there are macros defined, the number of arguments should be the number of input ports plus one.
# So, for example, if there are two data inputs, a script file connected, and macros defined,
# it should look like this:
# def rm_main(data1, data2, macros):
# Note that there are no rules for the names of the arguments, but their number needs to match.
def rm_main(data):
    print('Hello, world!')
    # output can be found in Workload logs.
    print(type(data))

    # your code goes here

    # for example:
    data2 = DataFrame([3, 5, 77, 8])

    # The returned values should be pandas dataframes.
    # The number of connected output ports of the operator should match the number of elements in the returned tuple.
    # In case of 1 output port:
    # return data
    # connect 2 output ports to see the results
    return data, data2
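
Before pasting code into the operator, it can help to call rm_main locally with a small DataFrame and confirm that the number and type of returned values match the output ports you plan to connect. The sketch below assumes a local Python installation with pandas. The helper check_outputs is hypothetical and is not part of RapidMiner.

```python
import pandas as pd


def rm_main(data):
    # Same shape as the sample code: one input, two outputs
    data2 = pd.DataFrame([3, 5, 77, 8])
    return data, data2


def check_outputs(result, expected_ports):
    """Hypothetical local helper: verify the count and type of returned values."""
    outputs = result if isinstance(result, tuple) else (result,)
    if len(outputs) != expected_ports:
        raise ValueError(f"expected {expected_ports} outputs, got {len(outputs)}")
    for out in outputs:
        if not isinstance(out, pd.DataFrame):
            raise TypeError(f"output is {type(out).__name__}, not a DataFrame")
    return outputs


sample = pd.DataFrame({"x": [1, 2, 3]})
first, second = check_outputs(rm_main(sample), expected_ports=2)
print(first.shape, second.shape)
```

A mismatch raises an error locally, which is usually quicker to diagnose than a failed workflow run.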

Python Resource

You can modify the default setting if your Python code requires additional resources -- if, for example, you have a complex workflow or huge data files.

  • 1 vcore, 2 GB (default)
  • 2 vcore, 4 GB
  • 4 vcore, 8 GB
  • 8 vcore, 16 GB

The following message provides a strong hint that you should modify this setting.

PyRunner instance is out of memory, please select one with more resources
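
To get a rough sense of how much memory your input data occupies, you can log its size from inside rm_main. The sketch below uses the pandas method memory_usage(deep=True). The helper name dataframe_size_gb is hypothetical. Treat the figure as a lower bound rather than a sizing rule, since intermediate copies created by your own code need additional memory.

```python
import pandas as pd


def dataframe_size_gb(df):
    """Approximate in-memory size of a DataFrame in GB (1 GB = 1024**3 bytes)."""
    return df.memory_usage(deep=True).sum() / 1024**3


def rm_main(data):
    # The printed line appears in the Workload logs
    print(f"input: {len(data)} rows, ~{dataframe_size_gb(data):.3f} GB")
    return data
```

If the reported size approaches the memory of the selected resource, that is a reasonable cue to try the next larger option.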