You are viewing the RapidMiner Radoop documentation for version 9.9.
Connecting to a CDH Quickstart VM
As of this writing, the latest available version of the Cloudera Quickstart VM is 5.13, and this guide was written for that version.
Start and configure the Quickstart VM
Download the Cloudera Quickstart VM from the Cloudera website.
Import the OVA packaged VM to your virtualization environment (VirtualBox and VMware are covered in this guide).
It is strongly recommended to upgrade to Java 1.8 on the single-node cluster provided by the VM. Otherwise, the execution of the Single Process Pushdown and Apply Model operators will fail.
The following steps apply only if no clusters or Cloudera management services have been started yet. For the full upgrade process, read Cloudera's guide.
Upgrading to Java 1.8:
- Start the VM.
- Download and unzip JDK 1.8 (preferably jdk1.8.0_162 or greater) to /usr/java/jdk1.8.0_162.
- Add the following configuration line to /etc/default/cloudera-scm-server:
  export JAVA_HOME=/usr/java/jdk1.8.0_162
- Launch Cloudera Express (or the Enterprise trial version).
- Open a web browser and log in to Cloudera Manager (quickstart.cloudera:7180) using cloudera/cloudera as credentials. Navigate to Hosts/quickstart.cloudera/Configuration. In the Java Home Directory field, enter /usr/java/jdk1.8.0_162.
- On the home page of Cloudera Manager, (re)start the Cloudera QuickStart cluster and the Cloudera Management Service as well.
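The JAVA_HOME configuration step can be scripted so that it is safe to run more than once. The following is a minimal shell sketch, not part of the Cloudera tooling: the function name set_java_home is hypothetical, and the file and JDK paths are the ones used in this guide.

```shell
#!/bin/sh
# Append "export JAVA_HOME=<jdk_dir>" to a config file unless that exact
# line is already present. set_java_home is a hypothetical helper name.
set_java_home() {
    config_file=$1
    jdk_dir=$2
    line="export JAVA_HOME=$jdk_dir"
    # -x: match whole lines, -F: fixed string, -q: no output
    if grep -qxF -- "$line" "$config_file" 2>/dev/null; then
        return 0
    fi
    printf '%s\n' "$line" >> "$config_file"
}

# Example (run as root inside the VM):
# set_java_home /etc/default/cloudera-scm-server /usr/java/jdk1.8.0_162
```

Defining a function and leaving the call commented out means nothing is modified until you invoke it deliberately. If the file already contains a different JAVA_HOME line, this sketch does not remove it, so check the file by hand in that case.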
If you are using VirtualBox, make sure that the VM is shut down, and set the type of the primary network adapter from NAT to Host-only. The VM will work only with this setting in a VirtualBox environment.
Start the VM and wait for the boot to complete. A browser with some basic information will appear.
Edit your local hosts file (on your host operating system, not inside the VM) and add the following line (replace <VM IP address> with the IP address of the VM):
<VM IP address> quickstart.cloudera
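On a Linux or macOS host, the hosts file entry can be added with a small script. This is a minimal sketch under stated assumptions: the function name add_hosts_entry is hypothetical, the address 192.168.56.101 in the example is a made-up placeholder for your VM's actual IP address, and /etc/hosts is the usual location on Linux and macOS (Windows keeps the file elsewhere).

```shell
#!/bin/sh
# Append "<ip> <hostname>" to a hosts file unless a non-comment line
# already maps that hostname. add_hosts_entry is a hypothetical helper name.
add_hosts_entry() {
    hosts_file=$1
    vm_ip=$2
    host_name=$3
    # awk exits 0 when the hostname appears as a name on a non-comment line
    if awk -v h="$host_name" '
        $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
        END { exit !found }
    ' "$hosts_file" 2>/dev/null; then
        return 0
    fi
    printf '%s %s\n' "$vm_ip" "$host_name" >> "$hosts_file"
}

# Example (run with root privileges on the host; the IP is a placeholder):
# add_hosts_entry /etc/hosts 192.168.56.101 quickstart.cloudera
```

If quickstart.cloudera is already mapped, the function leaves the file untouched, even when the existing entry points to a different address. In that case, correct the existing line manually.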
Set up the connection in RapidMiner Studio
Click on the New Connection button and choose Add Connection Manually.
Set Hadoop username to hive. (As an alternative, you can set both Hadoop username and Username on the Hive tab to your own user.)
Add quickstart.cloudera as NameNode Address.
Add quickstart.cloudera as Resource Manager Address.
Add quickstart.cloudera as Hive Server Address.
Select Cloudera Hadoop (CDH5) as Hadoop version.
Add the following entries to the Advanced Hadoop Parameters:

Key                                 Value
dfs.client.use.datanode.hostname    true

(This parameter is not required when using the Import Hadoop Configuration Files option):

Key                        Value
mapreduce.map.java.opts    -Xmx256m
Select the appropriate Spark Version (this should be Spark 1.6 if you want to use the VM's built-in Spark assembly jar) and set the Assembly Jar Location to the following value:
local:///usr/lib/spark/lib/spark-assembly.jar
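Before saving the connection, it can help to confirm inside the VM that the assembly jar is actually present at that location. The following is a minimal sketch, assuming a POSIX shell on the VM. The function name local_jar_exists is hypothetical, and the check only strips the local:// scheme and tests for a regular file.

```shell
#!/bin/sh
# Return 0 if the file behind a local:// URI exists on this machine,
# non-zero otherwise. local_jar_exists is a hypothetical helper name.
local_jar_exists() {
    uri=$1
    # Strip the leading "local://" scheme, leaving the absolute path
    path=${uri#local://}
    [ -f "$path" ]
}

# Example (run inside the VM):
# local_jar_exists local:///usr/lib/spark/lib/spark-assembly.jar && echo "assembly jar found"
```

The test follows symbolic links, so it also succeeds when the path is a link to a versioned jar file.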