Connecting Hadoop with Radoop

ebtesam_almarzo Member Posts: 2 Contributor I
edited November 2018 in Help

Hi

I have just started using RapidMiner (I'm a beginner), and I'm also new to Hadoop and everything around it.

I wanted to ask about the steps to connect Hadoop 2.7.1 with RapidMiner (the newest version) on Ubuntu 15.10.

I have already added the "Radoop" extension, and it installed without problems.

After that, when I tried to connect Hadoop with Radoop, I ran into the following issues:

Where can I get the "master address name"? I have already read about it here but couldn't figure it out:

http://docs.www.turtlecreekpls.com/radoop/installation/configuring-radoop-connections.html

Moreover, with my version of Apache Hadoop (2.7.1), I can't install Spark 1.6 because it is not compatible with it. The only compatible version is Spark 2.0, and I don't have the option to select it in the connection window.

In RapidMiner, however, it should be Spark 1.6.

So which one should I install? And should I connect Spark with Hadoop, or just install Spark without configuring it in Hadoop?

http://spark.apache.org/downloads.html

Do I need to download Hive and install it to have a proper connection with Hadoop, or is it not mandatory?

And what is HiveServer2 needed for? Are Hive and HiveServer2 the same thing?

Thank you so much

Regards,

Ebtesam

Best Answers

  • bhupendra_patil Administrator, Employee, Member Posts: 168 RM Data Scientist
    Solution Accepted

    Your master address is the IP address or a fully qualified name (like server.corp.com) of the master node in your cluster.

    If it is a single-node cluster, you can use the IP address or the name of that node.

    As for Spark, RapidMiner can work with Spark only on Hadoop, so you will need to install Spark.

    Which flavor of Hadoop are you using? Apache? Cloudera? Hortonworks?

    If you are just trying things out, your easiest bet is to use the VMs provided by Cloudera or Hortonworks.
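If you are unsure which host name your own Apache Hadoop install uses, one place to look is the `fs.defaultFS` property in `core-site.xml` (in Hadoop 2.x this file normally lives under `etc/hadoop` in the Hadoop install directory). The sketch below is only an illustration: the function name `namenode_host` and the sample XML are made up for this example, and the host it finds is the NameNode host, which on a single-node setup is usually also the master address.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit


def namenode_host(core_site_xml):
    """Return the host part of fs.defaultFS (or the older fs.default.name), or None."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name in ("fs.defaultFS", "fs.default.name"):
            value = (prop.findtext("value") or "").strip()
            # e.g. "hdfs://server.corp.com:9000" gives "server.corp.com"
            return urlsplit(value).hostname
    return None


# Hypothetical sample; in practice, read the text of your own core-site.xml.
SAMPLE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://server.corp.com:9000</value>
  </property>
</configuration>"""

print(namenode_host(SAMPLE))
```

If the value turns out to be `localhost`, the cluster is only reachable from the same machine, so RapidMiner would have to run on that machine too.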

  • phellinger Employee, Member Posts: 103 RM Engineering
    Solution Accepted

    Hi Ebtesam,

    The Spark 1.6 build made for Hadoop 2.6 will work perfectly with Hadoop 2.7.1 as well.

    You can download that to your cluster and provide its HDFS (or local) path in the appropriate Radoop connection setting.

    Also, Apache Hive will work on Java 8. Basically, you can expect almost anything that supports Java 7 to work on Java 8.

    Peter
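The Spark step described above (download the Hadoop 2.6 build of Spark 1.6, then make it available on HDFS) could be sketched roughly as follows. Everything here is an assumption for illustration: the Spark version `1.6.2`, the unpack location under `/opt`, the HDFS directory, the helper name `upload_commands`, and the idea that the path Radoop wants is the Spark assembly jar from the `lib` folder of the download. Check the Radoop connection documentation for what your version expects. The script only prints the `hdfs dfs` commands; it does not run them.

```python
# Assumed values: adjust to the Spark 1.6.x release you actually downloaded.
SPARK_VERSION = "1.6.2"
LOCAL_SPARK_HOME = "/opt/spark-" + SPARK_VERSION + "-bin-hadoop2.6"  # hypothetical unpack location
HDFS_TARGET_DIR = "/user/spark/share/lib"                            # hypothetical HDFS directory


def upload_commands(spark_home, spark_version, hdfs_dir):
    """Build the hdfs CLI calls that copy the Spark assembly jar to HDFS."""
    # Assumed jar name for the "pre-built for Hadoop 2.6" download.
    jar = "{}/lib/spark-assembly-{}-hadoop2.6.0.jar".format(spark_home, spark_version)
    return [
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],  # create the target directory
        ["hdfs", "dfs", "-put", jar, hdfs_dir],     # upload the jar
        ["hdfs", "dfs", "-ls", hdfs_dir],           # list the directory to confirm
    ]


for cmd in upload_commands(LOCAL_SPARK_HOME, SPARK_VERSION, HDFS_TARGET_DIR):
    print(" ".join(cmd))
```

To actually execute the commands on a machine where the `hdfs` client is installed, each list can be passed to `subprocess.run(cmd, check=True)`. The resulting HDFS path of the jar is then what would go into the Spark path field of the Radoop connection.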


Answers

  • ebtesam_almarzo Member Posts: 2 Contributor I

    Thank you for your immediate reply.

    I'm using Apache Hadoop.

    For Spark, which version should I download? The "Configuring Radoop Connections" page

    http://docs.www.turtlecreekpls.com/radoop/installation/configuring-radoop-connections/

    says that it has to be version 1.6 or 1.5,

    but the one compatible with the Hadoop version I have installed (2.7.1) is Spark 2.0.

    So which one should I download and install?

    As for Apache Hive, I haven't found a Hive version that is compatible with Java 8.

    Hive versions are only compatible with Java 7.

    So how can I install Hive?

    https://cwiki.apache.org/confluence/display/Hive/GettingStarted
