Execute Python
The Execute Python operator enables a smooth integration of your Python code into your RapidMiner workflows. Your Python code may live inside the operator or outside the operator, in the catalog.
At the input ports (inp), RapidMiner data tables are converted to Pandas DataFrames.
At the output ports (out), Pandas DataFrames are converted back to RapidMiner data tables.
The fil port gives the operator access to Python code that is stored in the catalog. Alternatively, you can paste your Python code directly into the editor provided by Execute Python.
Your Python code must be structured as a function called rm_main(data1, data2, ...), with an arbitrary number (possibly zero) of inputs and outputs. Each input or output of rm_main() is a Pandas DataFrame.
- The number of input and output Pandas DataFrames in rm_main() must equal the number of connected ports on Execute Python. In the screenshot above, rm_main(data1) accepts a single Pandas DataFrame as input and returns a single Pandas DataFrame as output.
- If there are macros defined, the number of arguments in rm_main() should be the number of connected input ports plus one. The macros argument must take the form of a dictionary.
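As a sketch of the macros case, here is a hypothetical rm_main() with two connected input ports plus defined macros (the column name "a" and the macro key "run_id" are invented for illustration):

```python
from pandas import DataFrame

# Two connected input ports plus defined macros, so rm_main takes
# three arguments; the last one is the macros dictionary.
def rm_main(data1, data2, macros):
    # Concatenate the two input tables and tag rows with a macro value.
    combined = DataFrame({"a": list(data1["a"]) + list(data2["a"])})
    combined["source"] = macros.get("run_id", "unknown")
    return combined

# Local sanity check; in the workflow, RapidMiner calls rm_main() itself.
result = rm_main(DataFrame({"a": [1, 2]}),
                 DataFrame({"a": [3]}),
                 {"run_id": "demo"})
print(result)
```

Because only one DataFrame is returned, this variant would be wired to a single output port.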
In what follows, we discuss the parameters of the Execute Python operator.
Coding Environment
When your workflow includes the Execute Python operator, the coding environment defines which Python packages are available to your Python code. If you have multiple Execute Python operators in your workflow, each of them can have its own coding environment.
The default Python coding environment is called rm-base, but you are free to use an alternative coding environment, so long as it includes Pandas and PyArrow. Only coding environments that include PyArrow will appear in the Coding Environment dropdown list.
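If you are unsure which packages a given coding environment provides, a quick probe like the following (a generic standard-library sketch, not a RapidMiner API) can be pasted into the operator:

```python
import importlib.metadata

# Collect the installed version of each required package (or None if absent).
versions = {}
for pkg in ("pandas", "pyarrow"):
    try:
        versions[pkg] = importlib.metadata.version(pkg)
    except importlib.metadata.PackageNotFoundError:
        versions[pkg] = None
    print(pkg, versions[pkg] or "is not installed in this environment")
```

The printed lines appear in the Workload logs, like any other output from the operator.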
Use File Input Connector
Suppose you want to execute a Python program that has been uploaded to the catalog.
Your Python program is external to the Execute Python operator, so take the following steps:
- drag the file (e.g., analytics-cloud.py) to the canvas,
- attach the resulting Input to the fil port, and
- within Execute Python, enable the parameter Use File Input Connector.
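The uploaded file must itself define rm_main(). A minimal sketch of what analytics-cloud.py might contain (the column names "value" and "doubled" are invented for illustration) is:

```python
from pandas import DataFrame

# Minimal external script: one input port, one output port.
# Adds a hypothetical 'doubled' column to the incoming table.
def rm_main(data):
    out = data.copy()
    out["doubled"] = out["value"] * 2
    return out

# Local sanity check; in the workflow, RapidMiner calls rm_main() itself.
check = rm_main(DataFrame({"value": [1, 2, 3]}))
print(check["doubled"].tolist())  # -> [2, 4, 6]
```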
Python Code
info
Hint: if you have two versions of your Python code, you can keep one program external to the operator (as described above) and one program internal to the operator (as described here), and toggle between them via the switch Use File Input Connector.
Alternatively, disable Use File Input Connector and select Python Code if you wish to paste your Python code directly into the Execute Python operator.
Under Parameters > Python Code, click on the Edit button to reveal the following sample Python code, then replace it with your own code. Note the helpful comments.
from pandas import DataFrame

# Mandatory main function. This example expects a single input.
# However, the number of arguments has to match the number of connected
# input ports (possibly none), and can be more than one when multiple
# data sources are connected to the operator.
# Please note that the input script file is not a data source.
# So, for example, if there are two data inputs and a script file
# connected to the operator, it should look like this:
# def rm_main(data1, data2): ...
# If there are macros defined, the number of arguments should be the
# number of input ports plus one. So, for example, if there are two data
# inputs, a script file connected, and also macros defined, it should
# look like this:
# def rm_main(data1, data2, macros):
# Note that there are no rules for the names of the arguments, but their
# number needs to match.
def rm_main(data):
    print('Hello, world!')
    # output can be found in Workload logs
    print(type(data))
    # your code goes here
    # for example:
    data2 = DataFrame([3, 5, 77, 8])
    # The returned values should be Pandas DataFrames.
    # The number of connected output ports of the operator should match
    # the number of elements in the returned tuple.
    # In case of 1 output port:
    # return data
    # connect 2 output ports to see the results
    return data, data2
Python Resource
You can modify the default setting if your Python code requires additional resources: for example, if you have a complex workflow or huge data files.
- 1 vcore, 2 GB (default)
- 2 vcore, 4 GB
- 4 vcore, 8 GB
- 8 vcore, 16 GB
The following message provides a strong hint that you should modify this setting.
PyRunner instance is out of memory, please select one with more resources