Azure HDInsight 4.x, 5.x
Please be aware that Radoop 10.2.0 offers limited support for HDInsight 4.x and 5.x. For information about the available features and instructions for setting up Radoop on such clusters, please contact Altair support.
Configuring the Hadoop cluster
RapidMiner Radoop supports versions 4.x and 5.x of Azure HDInsight, a cloud-based Hadoop service that is built upon the Hortonworks Data Platform (HDP) distribution.
If you don't have an HDInsight cluster running in the Azure network, you can follow the Azure documentation to create one. Make sure to select Spark as the cluster type.
Azure Data Lake Storage Gen2 as primary storage and the Enterprise Security Package are not yet supported by Radoop on HDInsight 4.0 and 5.0.
Hive setup
Complex functionality of Radoop is partly achieved by adding custom functions (UDF, UDAF and UDTF) to HiveServer2, which extends its capabilities.
- Install RapidMiner Radoop UDF Jar files
- Register Hive UDF functions for Radoop
Networking
If your network allows direct access (DNS and reverse DNS for all hostnames, including the alias) to all of the cluster nodes, you can skip this step.
Please follow the general description of the networking setup for accessing a Hadoop cluster. In an isolated network setup, Radoop users will need the connection details for a deployed Radoop Proxy.
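To judge whether direct access is available, the DNS and reverse DNS condition above can be checked from the client machine. The following is a minimal sketch using only the Python standard library; the function names and the hostnames in the usage comment are hypothetical and not part of Radoop. The resolver functions are passed in as parameters so the check can also be exercised without network access.

```python
import socket


def check_host(hostname, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Resolve hostname to an IP, then resolve that IP back to a name.

    Returns (ip, reverse_name). Raises OSError if either lookup fails
    (socket.gaierror and socket.herror are both subclasses of OSError).
    """
    ip = forward(hostname)
    reverse_name, _aliases, _addresses = reverse(ip)
    return ip, reverse_name


def check_cluster_hosts(hostnames, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Run check_host for every hostname.

    The result maps each hostname to (ip, reverse_name) on success,
    or to the raised OSError on failure.
    """
    results = {}
    for host in hostnames:
        try:
            results[host] = check_host(host, forward, reverse)
        except OSError as exc:
            results[host] = exc
    return results


# Example call (hypothetical hostnames, replace with your cluster nodes):
# check_cluster_hosts(["hn0-example.internal", "wn0-example.internal"])
```

If any entry in the result is an error, or the reverse name does not match what the cluster expects, direct access is probably not available and the networking setup or a Radoop Proxy is the safer route.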
Setting up the connection in RapidMiner Studio
We strongly recommend using the Import from Cluster Manager tool to create the connection, as several advanced properties required for correct operation are seamlessly gathered from the cluster during the import process.
Use Import from Cluster Manager to create the connection directly from the configuration retrieved from Ambari.
On the Hadoop tab, under Advanced Hadoop Parameters, provide storage credentials for the primary storage of the HDInsight cluster.
Azure Storage credentials: On the Azure storage dashboard, find the Access keys tab. Copy one of the keys and set it as the value of the
fs.azure.account.key.<storage account name>.blob.core.windows.net
parameter in your Radoop Connection.
On the Hive tab, enter the Database Name to connect to. Choose a database where the given user has privileges for all operations. Tick UDFs are installed manually.
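For the Azure Storage credentials step above, the parameter name is built from the storage account name and the Blob endpoint suffix. As a small illustration, the sketch below assembles that name; the helper `storage_key_property` and the account name in the usage comment are hypothetical, and the access key itself must still be copied from the Azure portal.

```python
def storage_key_property(account_name: str) -> str:
    """Build the advanced Hadoop parameter name that holds the access key
    of an Azure Blob storage account.

    account_name is the bare storage account name, not the full hostname.
    """
    if not account_name or "." in account_name:
        raise ValueError("expected the bare storage account name, not a hostname")
    return f"fs.azure.account.key.{account_name}.blob.core.windows.net"


# Example (hypothetical account name):
# storage_key_property("mystorageacct")
```

The returned string goes into the key column of Advanced Hadoop Parameters, and the copied access key goes into the value column.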
If you use Radoop Proxy, a proxy connection to it should already be set up. As a final step for the Radoop Connection, tick Use Radoop Proxy on the Radoop Proxy tab and select the Radoop Proxy Connection that was created for this cluster.