You are viewing the RapidMiner Radoop documentation for version 9.9.

Azure HDInsight 4.0

Configuring the Hadoop cluster

RapidMiner Radoop supports version 4.0 of Azure HDInsight, a cloud-based Hadoop service built on the Hortonworks Data Platform (HDP) distribution.

If you don't have an HDInsight cluster running in the Azure network, you can follow the Azure documentation to create one. Make sure to select Spark as the cluster type.

Azure Data Lake Storage Gen2 as primary storage and the Enterprise Security Package are not yet supported by Radoop for HDInsight 4.0.

Hive setup

Complex functionality of Radoop is partly achieved by registering custom functions (UDF, UDAF and UDTF) in HiveServer2, which extends its capabilities.

Networking

If your networking allows direct access (DNS and reverse DNS for all hostnames, including the alias) to all of the cluster nodes, you can skip this step.

Please follow the general description for networking setup to access the Hadoop cluster. In an isolated network setup, Radoop users will need the connection details for a deployed Radoop Proxy.
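The direct-access condition above (forward and reverse DNS working for every cluster hostname) can be checked from the client machine before deciding whether a Radoop Proxy is needed. The following is a minimal sketch using only the Python standard library. It is not part of Radoop, and the function name `check_dns` is hypothetical.

```python
import socket


def check_dns(hostname, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Check forward and reverse DNS for one hostname.

    The resolvers are parameters so they can be replaced in tests;
    by default the system resolver is used.
    """
    result = {"host": hostname, "ip": None, "reverse": None, "ok": False}
    try:
        # Forward lookup: hostname -> IPv4 address
        result["ip"] = forward(hostname)
        # Reverse lookup: address -> (primary name, aliases, addresses)
        name, aliases, _ = reverse(result["ip"])
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError
        return result
    result["reverse"] = name
    # Accept a match on the primary name or any alias, ignoring case
    # and a trailing dot
    wanted = hostname.lower().rstrip(".")
    candidates = [name] + list(aliases)
    result["ok"] = any(c.lower().rstrip(".") == wanted for c in candidates)
    return result
```

Calling `check_dns` once for each node hostname (and for the alias) reported by the cluster manager gives a quick overview. Any entry with `ok` set to `False` suggests that the networking setup or a Radoop Proxy is required.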

Setting up the connection in RapidMiner Studio

We strongly recommend using the Import from Cluster Manager tool to create the connection, as several advanced properties required for correct operation are gathered automatically from the cluster during the import process.

  1. Use Import from Cluster Manager to create the connection directly from the configuration retrieved from Ambari.

  2. On the Hadoop tab, under Advanced Hadoop Parameters, provide storage credentials for the primary storage of the HDInsight cluster.

    Azure Storage credentials: On the Azure storage dashboard, find the Access keys tab. Copy one of the keys and set it as the value of the fs.azure.account.key.<storage-account-name>.blob.core.windows.net parameter in your Radoop Connection.

  3. On the Hive tab, enter the Database Name to connect to. Choose a database where privileges for all operations are granted for the given user. Tick UDFs are installed manually.

  4. If you use Radoop Proxy, a proxy connection to it should already be set up. As a final step for the Radoop Connection, tick Use Radoop Proxy on the Radoop Proxy tab and select the Radoop Proxy Connection that was created for this cluster.
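The storage parameter in step 2 embeds the storage account name between `fs.azure.account.key.` and `.blob.core.windows.net`. The sketch below shows how to build the key/value pair for Advanced Hadoop Parameters. The helper name `azure_blob_key_parameter` and the account name and key in the usage note are hypothetical placeholders. The validation assumes the Azure naming rule for storage accounts (3 to 24 lowercase letters and digits).

```python
import re


def azure_blob_key_parameter(storage_account, access_key):
    """Build the Hadoop parameter (name, value) for an Azure Blob access key.

    storage_account: name of the cluster's primary storage account
    access_key: one of the keys copied from the Access keys tab
    """
    # Azure storage account names are 3-24 lowercase letters and digits
    if not re.fullmatch(r"[a-z0-9]{3,24}", storage_account):
        raise ValueError("invalid storage account name: %r" % storage_account)
    if not access_key:
        raise ValueError("access key must not be empty")
    name = "fs.azure.account.key.%s.blob.core.windows.net" % storage_account
    return name, access_key
```

For example, `azure_blob_key_parameter("examplestorage", "<access-key>")` yields the parameter name `fs.azure.account.key.examplestorage.blob.core.windows.net`, which is entered as the key and paired with the access key as the value.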