You are viewing the RapidMiner Radoop documentation for version 9.9.
Connecting to a CDH Quickstart VM
As of this writing, the latest available version of the Cloudera Quickstart VM is 5.13, and this guide was written for that version.
Start and configure the Quickstart VM
- Download the Cloudera Quickstart VM from the Cloudera website.
- Import the OVA-packaged VM to your virtualization environment (VirtualBox and VMware are covered in this guide).
- It is strongly recommended to upgrade to Java 1.8 on the single-node cluster provided by the VM. Otherwise, the execution of Single Process Pushdown and Apply Model operators will fail.
  - You can take the following steps only if no clusters or Cloudera management services have been started yet. For the full upgrading process, read Cloudera's guide.
  - Upgrading to Java 1.8:
    - Start the VM.
    - Download and unzip JDK 1.8 (preferably jdk1.8.0_162 or greater) to /usr/java/jdk1.8.0_162.
    - Add the following configuration line to /etc/default/cloudera-scm-server:
      export JAVA_HOME=/usr/java/jdk1.8.0_162
    - Launch Cloudera Express (or Enterprise trial version).
    - Open a web browser, and log in to Cloudera Manager (quickstart.cloudera:7180) using cloudera/cloudera as credentials. Navigate to Hosts/quickstart.cloudera/Configuration. In the Java Home Directory field, enter /usr/java/jdk1.8.0_162
    - On the home page of Cloudera Manager, (re)start the Cloudera QuickStart cluster and the Cloudera Management Service as well.
 
- If you are using VirtualBox, make sure that the VM is shut down, and set the type of the primary network adapter from NAT to Host-only. The VM will work only with this setting in a VirtualBox environment.
- Start the VM and wait for the boot to complete. A browser with some basic information will appear. 
- Edit your local hosts file (on your host operating system, not inside the VM) and add a line that maps the IP address of the VM to the hostname quickstart.cloudera.
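Two of the steps above add a single line to a text file: the JAVA_HOME export in /etc/default/cloudera-scm-server (inside the VM) and the quickstart.cloudera entry in the local hosts file. The following is a minimal Python sketch of doing that without creating duplicate lines. The helper name `ensure_line` is hypothetical, and the IP address is only a placeholder that is not taken from this guide; substitute the address your VM actually reports.

```python
def ensure_line(text: str, line: str) -> str:
    """Return `text` with `line` appended, unless an identical line already exists."""
    if any(existing.strip() == line for existing in text.splitlines()):
        return text  # already present, leave the content untouched
    if text and not text.endswith("\n"):
        text += "\n"  # make sure the new line starts on its own row
    return text + line + "\n"


# Line from this guide for /etc/default/cloudera-scm-server (inside the VM).
JAVA_HOME_LINE = "export JAVA_HOME=/usr/java/jdk1.8.0_162"

# Placeholder address (assumption): replace with the IP address of your VM.
VM_IP = "192.168.56.101"
HOSTS_LINE = f"{VM_IP} quickstart.cloudera"
```

To use it, read the file content, pass it through `ensure_line`, and write the result back with sufficient privileges. Both files normally require administrator rights to modify, so running the helper on a copy first is a cautious choice.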
Set up the connection in RapidMiner Studio
- Click on the New Connection button and choose Add Connection Manually.
- Set Hadoop username to hive. (As an alternative, you can set both Hadoop username and Username on the Hive tab to your own user.)
- Add quickstart.cloudera as NameNode Address
- Add quickstart.cloudera as Resource Manager Address
- Add quickstart.cloudera as Hive Server Address
- Select Cloudera Hadoop (CDH5) as Hadoop version
- Add the following entries to the Advanced Hadoop Parameters:
    Key                                 Value
    dfs.client.use.datanode.hostname    true
  (This parameter is not required when using the Import Hadoop Configuration Files option):
    Key                                 Value
    mapreduce.map.java.opts             -Xmx256m
- Select the appropriate Spark Version (this should be Spark 1.6 if you want to use the VM's built-in Spark assembly jar) and set the Assembly Jar Location to the following value: local:///usr/lib/spark/lib/spark-assembly.jar
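The values entered in the steps above can be collected into a short checklist for comparison against the connection dialog. This is only an illustrative Python sketch: the dictionary layout and the `format_checklist` helper are assumptions made for this example and do not correspond to a RapidMiner file format or API.

```python
# Values this guide enters in the connection dialog.
# The keys are informal field labels, not identifiers from RapidMiner.
CONNECTION_SETTINGS = {
    "Hadoop username": "hive",
    "NameNode Address": "quickstart.cloudera",
    "Resource Manager Address": "quickstart.cloudera",
    "Hive Server Address": "quickstart.cloudera",
    "Hadoop version": "Cloudera Hadoop (CDH5)",
    "Spark Version": "Spark 1.6",
    "Assembly Jar Location": "local:///usr/lib/spark/lib/spark-assembly.jar",
}

# Key/value pairs for the Advanced Hadoop Parameters table.
ADVANCED_HADOOP_PARAMETERS = {
    "dfs.client.use.datanode.hostname": "true",
    "mapreduce.map.java.opts": "-Xmx256m",
}


def format_checklist(settings: dict, parameters: dict) -> str:
    """Render the settings and parameters as one plain-text checklist."""
    lines = [f"{name}: {value}" for name, value in settings.items()]
    lines.append("Advanced Hadoop Parameters:")
    lines += [f"  {key} = {value}" for key, value in parameters.items()]
    return "\n".join(lines)
```

Calling `print(format_checklist(CONNECTION_SETTINGS, ADVANCED_HADOOP_PARAMETERS))` lists each field on its own line, which makes it easier to spot a mistyped hostname or a missing parameter.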