You are viewing the RapidMiner Radoop documentation for version 9.9. Check here for the latest version.
Connecting to a 3.0.1+ Hortonworks Sandbox
At the time of writing, the latest available version of Hortonworks Data Platform (HDP) on the Hortonworks Sandbox VM is 3.0.1. This guide was created for that version.
Start and configure the Sandbox VM
Download the Hortonworks Sandbox VM for VirtualBox from the Download website.
Import the OVA-packaged VM into your virtualization environment (VirtualBox is covered in this guide).
Start the VM. After powering it on, you have to select the first option from the boot menu, then wait for the boot to complete.
Log in to the VM. You can do this by switching to the login console (Alt+F5), or better yet via SSH on localhost port 2122. Note that there are two exposed SSH ports on the VM: one belongs to the VM itself (2122), while the other (2222) belongs to a Docker container running inside the VM. The username is root and the password is hadoop for both.
Edit /sandbox/proxy/generate-proxy-deploy-script.sh to include the following ports in the tcpPortsHDP array: 8025, 8030, 8050, 10020, 50010.
vi /sandbox/proxy/generate-proxy-deploy-script.sh
Find the tcpPortsHDP variable and, leaving the other values in place, add the following entries to the hashtable assignment: [8025]=8025 [8030]=8030 [8050]=8050 [10020]=10020 [50010]=50010
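If you prefer not to edit the file in vi by hand, the same change can be scripted. The sketch below demonstrates the edit on a hypothetical one-line sample of the array assignment (it assumes, as in the 3.0.1 sandbox image, that the tcpPortsHDP assignment ends with a closing parenthesis on the same line):

```shell
# Demonstrate the edit on a sample line (hypothetical excerpt, not the real file).
printf 'tcpPortsHDP=([2222]=22 [8080]=8080)\n' > /tmp/tcpports-sample.sh

# Insert the five extra port mappings just before the closing ')'.
sed -i 's/)$/ [8025]=8025 [8030]=8030 [8050]=8050 [10020]=10020 [50010]=50010)/' \
  /tmp/tcpports-sample.sh

cat /tmp/tcpports-sample.sh
```

Inside the sandbox you would point the same sed expression at /sandbox/proxy/generate-proxy-deploy-script.sh (use sed -i.bak to keep a backup of the original).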
Run the edited generate-proxy-deploy-script.sh via
/sandbox/proxy/generate-proxy-deploy-script.sh
- This rebuilds the /sandbox/proxy/proxy-deploy.sh script along with the config files in /sandbox/proxy/conf.d and /sandbox/proxy/conf.stream.d, thus exposing the additional ports added to the tcpPortsHDP hashtable in the previous step.
Run the /sandbox/proxy/proxy-deploy.sh script via
/sandbox/proxy/proxy-deploy.sh
- Running the docker ps command will show an instance named sandbox-proxy and the ports it has exposed. The values inserted into the tcpPortsHDP hashtable should appear in the output, looking like 0.0.0.0:10020->10020/tcp.
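To sanity-check that the proxy container is actually listening, you can also probe the ports from a shell inside the VM. This is a minimal sketch using bash's built-in /dev/tcp, so it works even where nc is not installed; a "closed" line means that port is not being forwarded:

```shell
# Probe each newly exposed port; write one status line per port.
: > /tmp/radoop-port-check.txt
for p in 8025 8030 8050 10020 50010; do
  if (exec 3<>"/dev/tcp/127.0.0.1/$p") 2>/dev/null; then
    echo "port $p open" >> /tmp/radoop-port-check.txt
  else
    echo "port $p closed" >> /tmp/radoop-port-check.txt
  fi
done
cat /tmp/radoop-port-check.txt
```

Run it on the VM itself at this stage; from your host machine the ports only become reachable after the port forwarding rules in the next step are in place.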
These changes only ensure that the referenced ports of the Docker container are accessible on the respective ports of the VM. Since the network adapter of the VM is attached to NAT, these ports are not yet accessible from your local machine. To make them available, add the port forwarding rules listed below to the VM. In VirtualBox you can find these settings under Machine / Settings / Network / Adapter 1 / Advanced / Port Forwarding.
Name               Protocol  Host IP    Host Port  Guest IP  Guest Port
resourcetracker    TCP       127.0.0.1  8025                 8025
resourcescheduler  TCP       127.0.0.1  8030                 8030
resourcemanager    TCP       127.0.0.1  8050                 8050
jobhistory         TCP       127.0.0.1  10020                10020
datanode           TCP       127.0.0.1  50010                50010
Edit your local hosts file (on your host operating system, not inside the VM) and add sandbox.hortonworks.com and sandbox-hdp.hortonworks.com to your localhost entry. In the end it should look something like this:
127.0.0.1 localhost sandbox.hortonworks.com sandbox-hdp.hortonworks.com
Reset Ambari access. Use an SSH client to log in to localhost as root, this time using port 2222! (For example, on OS X or Linux, use the command ssh root@localhost -p 2222, password: hadoop.)
- At first login you have to set a new root password; do so and remember it.
- Run ambari-admin-password-reset as the root user.
- Provide a new admin password for Ambari.
- Run ambari-agent restart.
Open the Ambari website:
http://sandbox.hortonworks.com:8080
- Log in with admin and the password you chose in the previous step.
- Navigate to the YARN / Configs / Memory configuration page.
- Edit the Memory Node setting to at least 7 GB and click Override.
- You will be prompted to create a new "YARN Configuration Group"; enter a new name.
- On the "Save Configuration Group" dialog, click the Manage Hosts button.
- On the "Manage YARN Configuration Groups" page, move the node from the "Default" group into the group you created in the previous step.
- A "Warning" dialog will open requesting notes; click the Save button.
- A "Dependent Configurations" dialog may open, with Ambari offering to modify some related properties automatically. If so, untick tez.runtime.io.sort.mb to keep its original value, then click the OK button.
- Ambari may open a "Configurations" page with further suggestions. Review them as needed, but since that is out of the scope of this document, just click Proceed Anyway.
- Navigate to the Hive / Configs / Advanced configuration page.
In the Custom hiveserver2-site section, add the hive.security.authorization.sqlstd.confwhitelist.append property via Add Property... and set it to the following value (it must not contain whitespace):
radoop\.operation\.id|mapred\.job\.name|hive\.warehouse\.subdir\.inherit\.perms|hive\.exec\.max\.dynamic\.partitions|hive\.exec\.max\.dynamic\.partitions\.pernode|spark\.app\.name|hive\.remove\.orderby\.in\.subquery
Save the configuration and restart all affected services. More details on hive.security.authorization.sqlstd.confwhitelist.append can be found in the Hadoop Security / Configuring Apache Hive SQL Standard-based authorization section.
Set up the connection in RapidMiner Studio
Click the New Connection button and choose the Import from Cluster Manager option to create the connection directly from the configuration retrieved from Ambari.
On the Import Connection from Cluster Manager dialog enter
- Cluster Manager URL: http://sandbox-hdp.hortonworks.com:8080
- Username: admin
- Password: the password used in the Reset Ambari access step.
Click Import Configuration
The Hadoop Configuration Import dialog will open.
- If it succeeds, click the Next button and the Connection Settings dialog will open.
- If it fails, click the Back button and review the above steps and the logs to resolve the issue(s).
On the Connection Settings dialog, which opens when the Next button is clicked in the step above:
The Connection Name can stay at its default or be changed.
Global tab
- Hadoop Version should be Hortonworks HDP 3.x
- Set Hadoop username to hadoop.
Hadoop tab
- NameNode Address should be sandbox-hdp.hortonworks.com
- NameNode Port should be 8020
- Resource Manager Address should be sandbox-hdp.hortonworks.com
- Resource Manager Port should be 8050
- JobHistory Server Address should be sandbox-hdp.hortonworks.com
- JobHistory Server Port should be 10020
- In Advanced Hadoop Parameters add the following parameter:
Key: dfs.client.use.datanode.hostname, Value: true
The following parameter is not required when using the Import Hadoop Configuration Files option:
Key: mapreduce.map.java.opts, Value: -Xmx256m
Spark tab
- For Spark Version select Spark 2.3 (HDP)
- Check Use default Spark path
Hive tab
- Hive Version should be HiveServer3 (Hive 3 or newer)
- Hive High Availability should be checked
- ZooKeeper Quorum should be sandbox-hdp.hortonworks.com:2181
- ZooKeeper Namespace should be hiveserver2
- Database Name should be default
- JDBC URL Postfix should be empty
- Username should be hive
- Password should be empty
- UDFs are installed manually and Use custom database for UDFs should both be unchecked
- Hive on Spark/Tez container reuse should be checked
Click the OK button and the Connection Settings dialog will close.
You can test the connection created above on the Manage Radoop Connections page by selecting the connection and clicking the Quick Test and Full Test... buttons.
If errors occur during testing, confirm that the necessary components are started correctly at
http://localhost:8080/#/main/hosts/sandbox-hdp.hortonworks.com/summary