You are viewing the RapidMiner Legacy documentation for version 9.9.

Installation guide

See the deployment documentation to learn how to deploy RapidMiner as a High Availability cluster.

The recommended method is to use Kubernetes. The documentation below is provided in case you prefer a non-Kubernetes solution.

In this guide, we'll walk through installing RapidMiner Server as a High Availability cluster in a Linux environment. It covers installing RapidMiner Server High Availability for the first time, with no existing data.

Terminology

In this guide we'll use the following terminology:

  • Installation directory: the directory where you installed RapidMiner Server on a node.
  • Shared home directory: the RapidMiner Server home directory that is accessible to all nodes in the cluster via the same path.

Testing a RapidMiner Server High Availability installation

Be sure to test your RapidMiner Server High Availability installation thoroughly before deploying to production.

  • Set up and test RapidMiner Server High Availability in your staging environment before deploying to a production environment.
  • Test RapidMiner Server High Availability with data (repositories, users, extensions) identical to that of your production instance.

Accessing a RapidMiner Server High Availability installation

When the installation is complete, the URL of RapidMiner Server will be the URL of the load balancer, so your DNS should resolve the RapidMiner Server host name to the load balancer. The remaining machines do not need to be publicly accessible to your users.

Provision the shared database, shared filesystem, and ActiveMQ broker

Provision the shared database

Set up the shared database server and make sure that it allows enough concurrent connections. With many RapidMiner Server nodes connecting to the same database, the default connection limit can be exceeded quickly. For PostgreSQL, for example, the default limit is 100 connections. To increase the limit, edit the postgresql.conf file, increase the value of max_connections, and then restart PostgreSQL.
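If you manage postgresql.conf with scripts, the edit described above could look like the following sketch. The helper name is hypothetical and it assumes GNU sed. The value you pass is your own choice; this guide does not prescribe a number.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set max_connections in a postgresql.conf file.
# Assumes GNU sed (in-place editing with -i), as found on most Linux distributions.
set_max_connections() {
  local conf="$1" value="$2"
  # Matches "max_connections = ..." even when the line is commented out.
  local pattern='^[[:space:]]*#?[[:space:]]*max_connections[[:space:]]*='
  if grep -Eq "$pattern" "$conf"; then
    # Replace every existing (possibly commented-out) max_connections line in place.
    sed -i -E "s/${pattern}.*/max_connections = ${value}/" "$conf"
  else
    # No entry yet: append one at the end of the file.
    echo "max_connections = ${value}" >> "$conf"
  fi
}
```

For example, `set_max_connections /path/to/postgresql.conf 300` (300 is only an illustration). In psql, `SHOW config_file;` prints the location of the active file, and `SHOW max_connections;` confirms the new value after PostgreSQL has been restarted.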

Provision the shared filesystem

Set up the shared NFS filesystem and make sure RapidMiner Server nodes can access it and have full read and write permissions.
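To verify the access requirement above, you can run a minimal check like the following on each node after the share is mounted (mounting is covered under "Prepare the first RapidMiner Server node"). It is a sketch with a hypothetical function and probe file name. It only proves that the current user can create, read, and delete a file in the directory, so run it as the user that RapidMiner Server runs as.

```bash
#!/usr/bin/env bash
# Hypothetical check: can the current user create, read and delete a file in DIR?
check_shared_dir_rw() {
  local dir="${1%/}" probe
  # Create a uniquely named probe file inside the shared directory.
  probe=$(mktemp "${dir}/.ha-rw-probe.XXXXXX" 2>/dev/null) || return 1
  # Write a marker that identifies this node, then read it back.
  if ! echo "written by ${HOSTNAME:-unknown}" > "$probe" || ! grep -q "written by" "$probe"; then
    rm -f "$probe"
    return 1
  fi
  # Clean up; failing to delete also counts as missing write permission.
  rm "$probe"
}
```

For example, `check_shared_dir_rw /var/rapidminer/application-data/rapidminer-server/ && echo OK`.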

Provision ActiveMQ broker

Although the RapidMiner Server cluster will function with a single instance of ActiveMQ, we highly recommend clustering it as well, because high availability depends on each component being highly available. You don't want ActiveMQ to be the single point of failure. For the sake of completeness, both a single-node setup and a clustered setup are outlined below.

Single node ActiveMQ setup

  • Download and install ActiveMQ.

Currently, only ActiveMQ version 5.14.5 has been tested and is officially supported, but feel free to test newer 5.x versions.

If you're using GNU/Linux, ActiveMQ packages may be provided by your distribution. In that case, you can install them with your package manager and start ActiveMQ through an init system such as SysV init or systemd.

  • Configure the ActiveMQ broker user that will be used by RapidMiner Server and the Job Agents:

    • Open /users.properties and add a new broker user and password (e.g., the user "brokerUser" with password "brokerP4ssw0rd"):

      admin=admin
      brokerUser=brokerP4ssw0rd
    • Open /groups.properties and add the new user to the users group:

      admins=admin
      users=brokerUser
  • Write down the new user's credentials. They are needed to configure the connection from RapidMiner Server and from the Job Agents to the broker.

  • Start ActiveMQ.
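You can also script the two file edits above. The sketch below is hypothetical. `CONF_DIR` stands for the folder that holds users.properties and groups.properties (its path is elided above). The script assumes simple user names and passwords without `/`, `&`, or regex special characters, because they are inserted into grep/sed expressions. Running it twice does not add duplicate entries.

```bash
#!/usr/bin/env bash
# Hypothetical helper: register an ActiveMQ broker user and put it in the "users" group.
# Usage: add_broker_user CONF_DIR USER PASSWORD
add_broker_user() {
  local conf_dir="${1%/}" user="$2" password="$3"
  local users_file="${conf_dir}/users.properties"
  local groups_file="${conf_dir}/groups.properties"

  # users.properties holds one user=password entry per line; add ours once.
  if ! grep -q "^${user}=" "$users_file"; then
    echo "${user}=${password}" >> "$users_file"
  fi

  # groups.properties holds group=member1,member2,...; add the user to "users".
  if grep -q '^users=' "$groups_file"; then
    local members
    members=$(sed -n 's/^users=//p' "$groups_file" | head -n 1)
    case ",${members}," in
      *",${user},"*) ;;  # already a member, nothing to do
      ",,") sed -i "s/^users=.*/users=${user}/" "$groups_file" ;;
      *) sed -i "s/^users=.*/users=${members},${user}/" "$groups_file" ;;
    esac
  else
    echo "users=${user}" >> "$groups_file"
  fi
}
```

For example, `add_broker_user /path/to/activemq/conf brokerUser brokerP4ssw0rd`, followed by an ActiveMQ restart so that the broker reads the new credentials.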

Clustered ActiveMQ setup

  • Download and install ActiveMQ on all your machines serving as ActiveMQ instances.
  • Configure the ActiveMQ instances on every machine as a cluster, following one of the master/slave setups described in the ActiveMQ documentation.
    • We advise using the Shared File System Master Slave setup, because your clustered setup already has a shared file system for the RapidMiner Server home directory.
    • Make sure that all instances share the same broker user credentials (see "Single node ActiveMQ setup" for how to set up credentials).
  • Start all instances.
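In the Shared File System Master Slave setup, only the broker that holds the lock on the shared data directory (the master) starts its transport connectors, while the others wait. To see which instance is currently the master, you can probe the OpenWire port on each host, as in this sketch. It assumes the default port 61616 used elsewhere in this guide. It requires bash, because `/dev/tcp` is a bash redirection feature rather than a real file. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: show which ActiveMQ hosts accept TCP connections on the OpenWire port.
# In a shared-file-system master/slave setup, only the master is expected to accept them.
broker_listening() {
  local host="$1" port="${2:-61616}"
  # bash's /dev/tcp redirection opens a TCP connection; give up after 3 seconds.
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

report_brokers() {
  local host
  for host in "$@"; do
    if broker_listening "$host"; then
      echo "${host}: accepting connections (likely the master)"
    else
      echo "${host}: not accepting connections (slave waiting for the lock, or down)"
    fi
  done
}
```

For example, `report_brokers 172.31.21.116 172.31.21.112` with the sample broker addresses used in this guide. Exactly one host should report that it accepts connections.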

Prepare a headless installation

To install RapidMiner Server on the nodes, we will use the headless installation option. A detailed description is given on the headless installation documentation page. Here's a short overview of how to prepare the headless installation:

  1. Download the RapidMiner Server installer on a machine with a UI
  2. Start the installer and choose the "Install RapidMiner Server on a headless machine" option
  3. Go through the installer steps and use configuration values appropriate for the clustered setup of RapidMiner Server
    1. For the server host name, use the reachable host name or IP address of the load balancer (load_balancer_address)
    2. Make sure to disable the bundled Job Agents
    3. Do not enable the Radoop proxy
  4. Finally, generate the installation XML file and store it on your disk. This file will be used to install RapidMiner Server on the nodes.

Prepare the first RapidMiner Server node

  1. Provision the infrastructure of the first RapidMiner Server node. You can automate this by using a configuration management tool such as Chef or Puppet or by spinning up identical virtual machine snapshots.
  2. Make sure your RapidMiner Server node uses a UTF-8 locale. If it does not, add the following lines to the /etc/environment configuration file:

    LC_ALL=en_US.UTF-8
    LANG=en_US.UTF-8
  3. Mount the shared home directory.

    • For example, assume your RapidMiner Server home directory is /var/rapidminer/application-data/rapidminer-server/ and your shared home directory is available as an NFS export called rapidminer-san:/rapidminer-server-home. Add the following line to /etc/fstab on each cluster node:

      rapidminer-san:/rapidminer-server-home /var/rapidminer/application-data/rapidminer-server/ nfs lookupcache=pos,noatime,intr,rsize=32768,wsize=32768 0 0
    • Then mount it:

      sudo mkdir -p /var/rapidminer/application-data/rapidminer-server/
      sudo mount -a
  4. Make sure all nodes have synchronized clocks and identical timezone configuration. Here are some examples for how to do this:

    • Red Hat Enterprise Linux or CentOS:

      sudo yum install ntp
      sudo service ntpd start
      sudo tzselect
    • Ubuntu:

      sudo apt-get install ntp
      sudo service ntp start
      sudo dpkg-reconfigure tzdata
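Before installing, you can spot-check the preparation steps above on each node with a small script like the one below. It is a sketch with hypothetical function names. The mount point is the example path used above, and `locale charmap` assumes a glibc-based system. Comparing the printed clock and timezone across nodes is left to you, and the check does not replace NTP.

```bash
#!/usr/bin/env bash
# Hypothetical per-node preflight check for the preparation steps above.
# Assumption: the shared home is mounted at the example path used in this guide.
RM_HOME_MOUNT="/var/rapidminer/application-data/rapidminer-server/"

# True if the active locale uses the UTF-8 character map (glibc `locale` utility).
locale_is_utf8() {
  [ "$(locale charmap 2>/dev/null)" = "UTF-8" ]
}

# True if MOUNTPOINT is a mount target in the mount table (default: /proc/mounts).
is_mounted() {
  local mp="$1" table="${2:-/proc/mounts}"
  # /proc/mounts lists targets without a trailing slash.
  if [ "$mp" != "/" ]; then mp="${mp%/}"; fi
  awk -v m="$mp" '$2 == m { found = 1 } END { exit !found }' "$table"
}

# Print UTC epoch seconds and the timezone so the output can be compared across nodes.
print_time_settings() {
  echo "${HOSTNAME:-unknown} utc_epoch=$(date -u +%s) tz=$(date +%Z%z)"
}

preflight() {
  if locale_is_utf8; then echo "locale: UTF-8"; else echo "locale: NOT UTF-8"; fi
  if is_mounted "$RM_HOME_MOUNT"; then echo "shared home: mounted"; else echo "shared home: NOT mounted"; fi
  print_time_settings
}
```

Run `preflight` on every node at about the same moment. The utc_epoch values should be nearly identical and the tz values should match. After editing /etc/environment, log in again before checking the locale.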

Install RapidMiner Server on the first node

Once the infrastructure for the first RapidMiner Server node is available and meets all the node requirements, you can start installing RapidMiner Server.

Install RapidMiner Server

  1. Download the RapidMiner Server installer and extract it
  2. Upload the headless installation XML file to the node
  3. Run the headless installation:

    cd  ./bin/rapidminer-server-installer .xml

Adapt configuration

After the installation has finished, you need to adapt a few configuration files to set up RapidMiner Server for High Availability.

  1. First, adapt the execution.properties configuration file to enable cluster mode. The file can be found in the /configuration/ folder.

    1. Enable clustered mode for RapidMiner Server via

      rapidminer.server.isClustered = true
    2. Configure the load balancer URL as the RapidMiner Server URL like this:

      rapidminer.server.protocol = http
      rapidminer.server.host = 
      rapidminer.server.port = 
    3. Disable the embedded ActiveMQ broker and point to the external broker like this:

      jobservice.queue.activemq.embeddedBroker.enabled = false
      jobservice.queue.activemq.uri = failover:(tcp://172.31.21.116:61616,tcp://172.31.21.112:61616)
      jobservice.queue.activemq.username = brokerUser
      jobservice.queue.activemq.password = brokerP4ssw0rd
  2. Next, update the scheduler.properties configuration file to enable a clustered scheduler. The file is located in the same folder as the execution.properties file. Add the following lines:

    org.quartz.jobStore.isClustered = true
    org.quartz.jobStore.clusterCheckinInterval = 10000
  3. Edit the standalone.conf file located in the /bin/ folder.

    1. Look for

      JAVA_OPTS="$JAVA_OPTS -Djboss.server.log.dir=$RAPIDMINER_SERVER_HOME/log"

      and change it to a new log folder that matches the instance name. For example:

      JAVA_OPTS="$JAVA_OPTS -Djboss.server.log.dir=$RAPIDMINER_SERVER_HOME/log/instance1"
    2. Also, next to the other JAVA_OPTS lines, add a new line that points the Execution Backend to localhost. For example:

      JAVA_OPTS="$JAVA_OPTS -Dexecution-backend-url=http://localhost:8080/executions"
  4. Add the RapidMiner Server node to the load balancer

  5. Start the first RapidMiner Server node
  6. Open the RapidMiner Server web UI at http(s)://: and log in as admin
  7. Make sure everything works (e.g., extensions are loaded and server logs can be inspected)
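The property changes in steps 1 and 2 can be scripted, which is handy when you rebuild a node. The sketch below uses hypothetical helper names and makes several assumptions. The files use single-line `key = value` entries as shown above, and GNU sed is available. `CONFIG_DIR` is the /configuration/ folder, whose full path is elided above. It also builds the `failover:(...)` URI from a list of broker hosts.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the execution.properties / scheduler.properties edits above.
# Assumes GNU sed and single-line "key = value" entries.

# Set KEY to VALUE in FILE: replace an existing uncommented entry, otherwise append one.
set_property() {
  local file="$1" key="$2" value="$3" key_re value_esc
  # Escape the dots in the key so they only match literal dots.
  key_re=$(printf '%s' "$key" | sed 's/\./\\./g')
  # Escape characters that are special in a sed replacement (&, \) or used as delimiter (|).
  value_esc=$(printf '%s' "$value" | sed 's/[&|\\]/\\&/g')
  if grep -Eq "^[[:space:]]*${key_re}[[:space:]]*=" "$file"; then
    sed -i -E "s|^[[:space:]]*${key_re}[[:space:]]*=.*|${key} = ${value_esc}|" "$file"
  else
    echo "${key} = ${value}" >> "$file"
  fi
}

# Build an ActiveMQ failover URI from broker hosts, e.g. failover:(tcp://h1:61616,tcp://h2:61616).
failover_uri() {
  local port="${BROKER_PORT:-61616}" host
  local parts=()
  for host in "$@"; do
    parts+=("tcp://${host}:${port}")
  done
  local IFS=,
  echo "failover:(${parts[*]})"
}

# Apply the cluster settings from steps 1 and 2 to CONFIG_DIR.
# Usage: apply_ha_settings CONFIG_DIR BROKER_HOST...
apply_ha_settings() {
  local config_dir="${1%/}"
  shift
  local exec_props="${config_dir}/execution.properties"
  local sched_props="${config_dir}/scheduler.properties"
  set_property "$exec_props" rapidminer.server.isClustered true
  set_property "$exec_props" jobservice.queue.activemq.embeddedBroker.enabled false
  set_property "$exec_props" jobservice.queue.activemq.uri "$(failover_uri "$@")"
  set_property "$sched_props" org.quartz.jobStore.isClustered true
  set_property "$sched_props" org.quartz.jobStore.clusterCheckinInterval 10000
}
```

For example, `apply_ha_settings /path/to/configuration 172.31.21.116 172.31.21.112`. You can set the broker user name and password and the load balancer host and port with additional `set_property` calls in the same way.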

Install additional RapidMiner Server nodes

Once the first RapidMiner Server node is up and running, you can add more nodes to the cluster, either manually or from a snapshot of the first node. Both options are described below; the manual option requires a little more effort.

Add nodes manually

To add nodes manually:

  1. Provision the infrastructure for additional nodes, and then repeat the headless installation steps described in the section above.
  2. You do not need to adapt the whole configuration again. However, each RapidMiner Server headless installation overwrites the shared configuration folder of the initial installation. Every time a headless installation has finished, go to the shared home directory and restore the backup configuration. For example:

    cd  ./bin/rapidminer-server-installer .xml
    ###
    # wait for installation to finish
    ###
    cd /var/rapidminer/application-data/rapidminer-server/
    # delete newly created configuration and replace initial config
    rm -rf configuration/
    mv configuration_backup_9.1.0_2018-11-08_14-40-42/ configuration/
  3. Configure a new log folder in the /bin/standalone.conf file, as described in the section above.

  4. Once the installation is finished and the initial configuration is restored, you can make the new node available as an endpoint by adding its IP address and port 8080 to the load balancer.
  5. Start the new RapidMiner Server node
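You can script the restore in step 2 so that you don't have to look up the timestamped backup folder name each time. This hypothetical helper assumes, as in the example above, that the headless installer leaves a `configuration_backup_*` folder next to the newly generated `configuration/` folder. To avoid guessing, it refuses to act when it finds no backup or more than one, and lists the candidates instead.

```bash
#!/usr/bin/env bash
# Hypothetical helper: replace the freshly generated configuration/ with the single backup.
# Usage: restore_config_backup SHARED_HOME_DIR
restore_config_backup() {
  local home="${1%/}"
  local backups=("${home}"/configuration_backup_*/)
  # Without a match, the glob stays unexpanded (or is empty with nullglob): no backup.
  if [ ! -d "${backups[0]}" ]; then
    echo "no configuration_backup_* folder in ${home}" >&2
    return 1
  fi
  if [ "${#backups[@]}" -gt 1 ]; then
    echo "several backups found; restore the right one manually:" >&2
    printf '  %s\n' "${backups[@]}" >&2
    return 1
  fi
  # Delete the newly created configuration and move the initial one back in place.
  rm -rf "${home}/configuration"
  mv "${backups[0]%/}" "${home}/configuration"
}
```

For example, run `restore_config_backup /var/rapidminer/application-data/rapidminer-server/` after each headless installation on an additional node.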

Add nodes from snapshot

If you are running RapidMiner Server in a virtual infrastructure or in the Cloud, we recommend creating a snapshot of the initial node, then adding new nodes from the snapshot.

To do so:

  1. Shut down RapidMiner Server on the initial node
  2. Create a snapshot of the virtual instance
  3. Restart the initial RapidMiner Server node once the snapshot has been created
  4. Create a new node from the just created snapshot
  5. SSH to the new cluster node and configure a new log folder in the /bin/standalone.conf file as described in the section above.
  6. Add the new node to the load balancer
  7. Start the new RapidMiner Server node
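Every node needs its own log folder, so the standalone.conf change from "Adapt configuration" is the one edit you repeat on each new node, whether it was installed manually or created from a snapshot. A hypothetical sed-based helper is sketched below. It assumes the log line looks like the one shown in that step (`-Djboss.server.log.dir=$RAPIDMINER_SERVER_HOME/log`, optionally already followed by an instance folder) and that the instance name is a simple word such as `instance2`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: point jboss.server.log.dir in standalone.conf at log/INSTANCE_NAME.
# Usage: set_instance_log_dir /path/to/bin/standalone.conf instance2
set_instance_log_dir() {
  local conf="$1" instance="$2"
  # Single quotes keep $RAPIDMINER_SERVER_HOME literal in the file. The optional group
  # \(/[^"]*\) also matches a line that already names an instance folder (e.g. from a
  # snapshot of the first node), so the folder is replaced rather than appended.
  sed -i 's|\(-Djboss\.server\.log\.dir=\$RAPIDMINER_SERVER_HOME/log\)\(/[^"]*\)\{0,1\}"|\1/'"${instance}"'"|' "$conf"
  # Succeed only if the expected line is now present.
  grep -qF "jboss.server.log.dir=\$RAPIDMINER_SERVER_HOME/log/${instance}\"" "$conf"
}
```

For example, on a node created from a snapshot of the first node: `set_instance_log_dir /path/to/bin/standalone.conf instance2`.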

Install Job Agents

Each Job Agent should be installed on a dedicated machine. You can download the Job Agent ZIP file from RapidMiner Server's web interface, or you can call the REST API. We recommend the second approach, because you don't have to upload the ZIP file via SSH to your dedicated Job Agent machine. Using the second approach, proceed as follows:

  1. SSH to the machine on which the Job Agent will run.
  2. To download the JobAgent ZIP file:

    1. Obtain a token (the value of the idToken field) for a user that is allowed to access the Job Agent download route, e.g. the admin user:

      curl -u admin:PASSWORD http(s)://:/api/rest/tokenservice
    2. Download the ZIP for a queue QUEUENAME. The default queue is named DEFAULT. Be aware that names are case sensitive.

      curl -H "Authorization: Bearer TOKEN_FROM_REQUEST_ABOVE" http(s)://:/executions/queues/QUEUENAME/agent --output /path/to/save/location/JobAgent.zip
  3. Unzip the ZIP file to your preferred location. For example:

    unzip /path/to/save/location/JobAgent.zip -d /path/to/extract/location
  4. Adjust the properties in the home/config/agent.properties file to your needs. The ActiveMQ broker URI should point to the ActiveMQ cluster that you've already configured in the execution.properties file of the shared RapidMiner Server home directory. The uri property lists the available ActiveMQ instances with their default port 61616. For example:

    jobagent.queue.activemq.uri = failover:(tcp://172.31.21.116:61616,tcp://172.31.21.112:61616)
    jobagent.queue.activemq.username = brokerUser
    jobagent.queue.activemq.password = brokerP4ssw0rd
  5. (Optional) Add extensions or JDBC drivers.

  6. Start the JobAgent.
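You can combine steps 2 and 3 into one script. In the sketch below, the function names are hypothetical and `SERVER_URL` stands for the load balancer URL. The script extracts the token from the token service response with `sed`, so no JSON tool is needed. This assumes the `"idToken": "..."` pair appears on a single line of the response. If you have `jq` installed, `jq -r .idToken` is the more robust choice.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fetch a token, download the Job Agent ZIP for a queue, and unzip it.
# Usage: install_job_agent SERVER_URL ADMIN_USER ADMIN_PASSWORD QUEUENAME TARGET_DIR

# Print the value of the "idToken" field from a JSON document on stdin.
extract_id_token() {
  sed -n 's/.*"idToken"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

install_job_agent() {
  local server="${1%/}" user="$2" password="$3" queue="$4" target="$5"
  local token zip status
  # -f: fail on HTTP errors instead of saving an error page; -sS: quiet but show errors.
  token=$(curl -fsS -u "${user}:${password}" "${server}/api/rest/tokenservice" | extract_id_token)
  if [ -z "$token" ]; then
    echo "could not obtain an idToken from ${server}" >&2
    return 1
  fi
  zip=$(mktemp) || return 1
  # Queue names are case sensitive; the default queue is DEFAULT.
  curl -fsS -H "Authorization: Bearer ${token}" \
    "${server}/executions/queues/${queue}/agent" --output "$zip" || { rm -f "$zip"; return 1; }
  unzip -q "$zip" -d "$target"
  status=$?
  rm -f "$zip"
  return "$status"
}
```

For example, run `install_job_agent "$SERVER_URL" admin PASSWORD DEFAULT /path/to/extract/location`, and then edit agent.properties as described in step 4.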

Congratulations!

That's it! RapidMiner Server is accessible in High Availability mode from a URL like this: http(s)://: