Installing RapidMiner Radoop on RapidMiner AI Hub
This documentation assumes that RapidMiner AI Hub is deployed using a containerized deployment method, and that a working Radoop connection is available in a repository or project, as described in Configuring Radoop connections. In other cases, please consult the previous version of this documentation.
Prerequisites
The following requirements must be met before using RapidMiner Radoop in RapidMiner AI Hub:
- RapidMiner Radoop Extension installed and tested in RapidMiner Studio.
- A working Radoop connection to the Hadoop cluster in RapidMiner Studio, stored in a repository or project. See Configuring RapidMiner Radoop Connections to learn how to create it.
- The same version of the RapidMiner Radoop extension installed in RapidMiner AI Hub. (Containerized deployments ship with a bundled Radoop extension, so you only need to ensure that the versions match.)
- A valid license for RapidMiner Radoop installed in RapidMiner AI Hub. You can obtain your license from your RapidMiner Account portal.
Installing RapidMiner Radoop on RapidMiner AI Hub and the connected Job Agent(s)
Because the Radoop extension is already in place in AI Hub when using our containerized deployment, the only step needed is to install the Radoop license obtained above.
To do this, log in to AI Hub as an administrator, then click the Install license action on the Administration --> Manage licenses page and paste your Radoop license key.
Using Radoop connections with RapidMiner AI Hub
Using Radoop connections with RapidMiner AI Hub is as easy as it is with RapidMiner Studio, but there are some caveats, which are discussed in detail below. The Radoop connection used by a RapidMiner process executed in RapidMiner AI Hub must be stored in the same repository or project as that process.
Important note: Radoop processes are not supported in RapidMiner AI Hub web services.
Managing multiple Hadoop users with RapidMiner AI Hub executions
When multiple users run Radoop processes in RapidMiner AI Hub, it is natural to expect that each job Radoop creates on the Hadoop cluster runs as the individual user who started it, for auditability.
It is also expected that such clusters are secured using Kerberos and that keytabs are used for authentication, with each user having their own keytab.
By using RapidMiner AI Hub's vault to securely store these keytabs for each user, it is possible to create a connection that uses each user's own keytab directly from the vault.
To do this, the connection manager or administrator setting up the connection for other users must edit the exported Radoop connection, then click Set injected parameters on the Security tab and select the Kerberos keytab parameter to be injected from RapidMiner AI Hub.
Note: the RapidMiner AI Hub injection option is only available when the Radoop connection is stored in a RapidMiner AI Hub project. The legacy repository is not supported.
Note: administrators must ensure that each user has a valid keytab stored in that user’s vault in RapidMiner AI Hub. This can be done using RapidMiner AI Hub’s REST APIs, and it is much easier when automated with a script. Please contact our support team if you need a sample script.
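As a rough illustration of what such automation might look like, the Python sketch below prepares one upload request per user from a directory of keytab files. It is not the official sample script. The `<username>.keytab` file naming, the `{user}` URL template, the `radoop-keytab` entry name, and the JSON body layout are all assumptions made for this example; take the real vault endpoint and payload format from the AI Hub REST API reference. The only firm fact the code relies on is that Kerberos keytab files start with the byte 0x05 followed by a format version byte (0x01 or 0x02).

```python
import base64
import json
from pathlib import Path

# Kerberos keytab files start with 0x05 followed by a format version byte.
KEYTAB_MAGIC = 0x05
KEYTAB_VERSIONS = (0x01, 0x02)


def looks_like_keytab(data: bytes) -> bool:
    """Cheap header check only; it does not validate the keys themselves."""
    return len(data) >= 2 and data[0] == KEYTAB_MAGIC and data[1] in KEYTAB_VERSIONS


def build_vault_requests(keytab_dir, url_template, entry_name="radoop-keytab"):
    """Return one request description per keytab file found in keytab_dir.

    Assumptions (hypothetical, not taken from the AI Hub API reference):
    - keytabs are stored as <username>.keytab
    - url_template contains a {user} placeholder
    - the vault accepts a JSON body with "name" and base64-encoded "value"
    """
    prepared = []
    for path in sorted(Path(keytab_dir).glob("*.keytab")):
        data = path.read_bytes()
        if not looks_like_keytab(data):
            raise ValueError(f"{path.name} does not look like a Kerberos keytab")
        user = path.stem  # file name without the .keytab suffix
        body = {
            "name": entry_name,
            "value": base64.b64encode(data).decode("ascii"),
        }
        prepared.append({
            "user": user,
            "url": url_template.format(user=user),
            "body": json.dumps(body),
        })
    return prepared
```

The sketch deliberately stops short of sending anything: each prepared entry would still need to be submitted with an HTTP client and administrator credentials, which depend on your AI Hub setup. Rejecting files that fail the header check before upload helps catch a truncated or wrongly named keytab early, rather than at process execution time.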
Using Radoop Proxy with RapidMiner AI Hub executions
Radoop Proxy is automatically disabled when a process is executed on RapidMiner AI Hub, because in a typical setup, RapidMiner AI Hub runs inside the secure zone, so there is no need to route the traffic through the Radoop Proxy.
If this is not the case, and the RapidMiner AI Hub instance does need Radoop Proxy to access the Hadoop cluster, the Radoop connection needs to be adapted to support this scenario:
- Open the Manage Radoop Connections window and edit the original Radoop connection that was exported to a repository or project.
- On the RapidMiner AI Hub tab, check Force Radoop Proxy on AI Hub.
- Save, then Export the connection to a repository or project.
Note: the Radoop connection and the Radoop Proxy connection must be in the same repository or project, and both need to be located on the same AI Hub where the execution will take place.