You are viewing the RapidMiner Radoop documentation for version 9.9.

Connecting to a CDH Quickstart VM

As of this writing, the latest available version of the Cloudera Quickstart VM is 5.13, and this guide was written for that version.

Start and configure the Quickstart VM

  1. Download the Cloudera Quickstart VM from the Cloudera website.

  2. Import the OVA-packaged VM into your virtualization environment (VirtualBox and VMware are covered in this guide).

  3. It is strongly recommended to upgrade to Java 1.8 on the single-node cluster provided by the VM. Otherwise, the execution of Single Process Pushdown and Apply Model operators will fail.

    You can take the following steps only if no clusters or Cloudera management services have been started yet. For the full upgrading process, read Cloudera's guide.

    Upgrading to Java 1.8:

    • Start the VM.
    • Download and unzip JDK 1.8 (preferably jdk1.8.0_162 or later) to /usr/java/jdk1.8.0_162.
    • Add the following configuration line to /etc/default/cloudera-scm-server:

      export JAVA_HOME=/usr/java/jdk1.8.0_162
    • Launch Cloudera Express (or the Enterprise trial version).

    • Open a web browser and log in to Cloudera Manager (quickstart.cloudera:7180) using cloudera/cloudera as credentials. Navigate to Hosts/quickstart.cloudera/Configuration. In the Java Home Directory field, enter

      /usr/java/jdk1.8.0_162
    • On the home page of Cloudera Manager, (re)start the Cloudera QuickStart cluster and the Cloudera Management Service as well.

  4. If you are using VirtualBox, make sure that the VM is shut down, and change the type of the primary network adapter from NAT to Host-only. The VM will work only with this setting in a VirtualBox environment.

  5. Start the VM and wait for the boot to complete. A browser with some basic information will appear.

  6. Edit your local hosts file (on your host operating system, not inside the VM) and add the following line (replace <vm-ip-address> with the IP address of the VM):

    <vm-ip-address> quickstart.cloudera
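Steps 3 and 6 both amount to adding a single line to a configuration file. A minimal shell sketch of that pattern follows. The helper name `append_line_once` is hypothetical (it is not part of Cloudera or RapidMiner tooling), and `192.168.56.101` in the usage comment is only a placeholder for whatever address your VM actually has. The helper skips the append when the exact line is already present, so it can be re-run safely.

```shell
# Hypothetical helper: append LINE to FILE only if that exact line is not already there.
append_line_once() {
  file="$1"
  line="$2"
  # -x: match the whole line, -F: treat the pattern as a fixed string, -q: no output.
  if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
    printf '%s\n' "$line" >> "$file"
  fi
}

# Example usage (needs root privileges; the IP address is a placeholder):
#   inside the VM:
#     append_line_once /etc/default/cloudera-scm-server 'export JAVA_HOME=/usr/java/jdk1.8.0_162'
#   on a Linux or macOS host:
#     append_line_once /etc/hosts '192.168.56.101 quickstart.cloudera'
```

After the hosts entry is in place, something like `ping -c 1 quickstart.cloudera` on the host should show the name resolving to the VM's address.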

Set up the connection in RapidMiner Studio

  1. Click the New Connection button and choose Add Connection Manually.

  2. Set Hadoop username to hive. (As an alternative, you can set both Hadoop username and Username on the Hive tab to your own user.)

  3. Add quickstart.cloudera as NameNode Address.

  4. Add quickstart.cloudera as Resource Manager Address.

  5. Add quickstart.cloudera as Hive Server Address.

  6. Select Cloudera Hadoop (CDH5) as Hadoop version.

  7. Add the following entries to the Advanced Hadoop Parameters:

    Key                                 Value
    dfs.client.use.datanode.hostname    true

    (This parameter is not required when using the Import Hadoop Configuration Files option):

    Key                                 Value
    mapreduce.map.java.opts             -Xmx256m

  8. Select the appropriate Spark Version (this should be Spark 1.6 if you want to use the VM's built-in Spark assembly jar) and set the Assembly Jar Location to the following value:

    local:///usr/lib/spark/lib/spark-assembly.jar
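The `local://` scheme in the Assembly Jar Location refers to a path on the cluster node's own filesystem rather than in HDFS, so the jar has to exist inside the VM. The following is a small sketch for checking this from a shell inside the VM. Both function names are hypothetical, and the only path it assumes is the one given above.

```shell
# Hypothetical helper: strip the local:// scheme and print the plain filesystem path.
local_uri_to_path() {
  printf '%s\n' "${1#local://}"
}

# Hypothetical helper: succeed only if the file behind a local:// URI exists and is readable.
check_local_jar() {
  path="$(local_uri_to_path "$1")"
  [ -r "$path" ]
}

# Example usage inside the VM:
#   check_local_jar local:///usr/lib/spark/lib/spark-assembly.jar && echo "assembly jar found"
```

If the check fails, the built-in assembly jar is probably not at that location in your VM image, and the Assembly Jar Location would need to be adjusted accordingly.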