You are viewing the RapidMiner Legacy documentation for version 9.9.

Load balancer

There are several solutions for load balancing traffic between different instances of the same application (for example the nginx load balancer in its commercial version, or Elastic Load Balancing), but HAProxy is currently the preferred open-source solution for load balancing with support for session stickiness and is therefore the one presented in this guide. Like nginx, it uses a single-process, event-driven model, so it has a low memory footprint and can handle a large number of concurrent requests.

This article covers how to set up HAProxy to load balance between two RapidMiner Server instances. SSL configuration is not covered by this guide.

Setup

Be sure to follow the steps outlined in this article. The load balancer should be a dedicated machine that is only responsible for redirecting traffic and load balancing several RapidMiner Server instances. In this setup we'll assume that you use an Ubuntu machine and that SSL is configured not in the load balancer but in an additional reverse proxy.

  1. Install haproxy with the package manager of your distribution. For Ubuntu there's a dedicated repository to install the haproxy package:

    sudo add-apt-repository ppa:vbernat/haproxy-1.8
    sudo apt-get update
    sudo apt-get install -y haproxy

  2. After the installation, the HAProxy configuration can be found at /etc/haproxy/haproxy.cfg. The default configuration is split into two sections: global and defaults. If you want to change the user that runs the HAProxy process or adapt the logging behaviour, you can do this in those sections. See the HAProxy documentation for more details. For our basic setup we'll skip those and just define two additional sections: frontend and backend. The frontend section contains the connections on which HAProxy receives incoming traffic. The backend section contains the connections to which HAProxy redirects and load balances the traffic.

  3. Add the frontend section to your haproxy.cfg:

    frontend localnodes
        bind *:80
        mode http
        default_backend rapidminerservers

    In this example setup, HAProxy will listen for requests on all network interfaces (*) on port 80, but only for the HTTP protocol. The frontend section serves as traffic input. All incoming traffic on this port is load balanced between the nodes defined in the backend section rapidminerservers (traffic output).

  4. Add the backend section to your haproxy.cfg:

    backend rapidminerservers
        mode http
        balance roundrobin
        option forwardfor
        http-request set-header X-Forwarded-Port %[dst_port]
        http-request add-header X-Forwarded-Proto https if { ssl_fc }
        option httpchk HEAD / HTTP/1.1\r\nHost:localhost
        cookie RAPIDMINER_SRV prefix
        server rapidminerserver1 ip-address-of-first-instance:8080 cookie rapidminerserver1 check
        server rapidminerserver2 ip-address-of-second-instance:8080 cookie rapidminerserver2 check

    • mode http: Operate at the HTTP level and pass HTTP requests to the servers listed.
    • balance roundrobin: Use the round-robin strategy for load distribution.
    • option forwardfor: Adds the X-Forwarded-For header so that RapidMiner Server instances can see the client's actual IP address. Without it, every incoming request would appear to come from the load balancer's IP address.
    • http-request set-header X-Forwarded-Port %[dst_port]: Sets the X-Forwarded-Port header so that RapidMiner Server instances know which port to use when redirecting.
    • http-request add-header X-Forwarded-Proto https if { ssl_fc }: Adds the X-Forwarded-Proto header and sets it to "https" if the client connection uses SSL/TLS (detected via ssl_fc). Similar to the forwarded-port header, this helps RapidMiner Server instances determine which scheme to use when sending redirects.
    • option httpchk HEAD / HTTP/1.1\r\nHost:localhost: Sets the health check HAProxy uses to test whether the RapidMiner Server instances are still responding. It sends a HEAD request using HTTP/1.1 with the Host header set. An instance that fails to respond without error is marked as down and receives no traffic until it passes the check again.
    • cookie RAPIDMINER_SRV prefix: Names the cookie that carries the session-to-server assignment, which is what enables sticky sessions. With prefix, HAProxy prepends the server's cookie value to this cookie rather than creating a separate one.
    • server rapidminerserver1 ip-address-of-first-instance:8080 cookie rapidminerserver1 check: Adds a RapidMiner Server instance for HAProxy to balance traffic to, identified by its IP address and port (RapidMiner Server's default port is 8080). The cookie keyword takes a value, so each server gets its own identifier (here the server name), which HAProxy uses to send a session back to the same server (stickiness). The separate check keyword tells HAProxy to health check the server.

    Ensure that the load balancer can reach all RapidMiner Server instances on port 8080.

  5. (Optional) Add a statistics website to monitor traffic and load balancing. Adjust your haproxy.cfg and add:

    listen stats
        bind *:1936
        stats enable
        stats uri /
        stats hide-version
        stats auth someUser:somePassword

  6. (Optional) If you want to add another RapidMiner Server instance to HAProxy, add a line of the following form to the backend section, giving each server its own cookie value:

    server rapidminerserverX ip-address-of-instance:8080 cookie rapidminerserverX check

  7. The load balancer is ready to serve after you've started the service with sudo service haproxy start. Depending on the distribution you use, you might need to set ENABLED=1 in the /etc/default/haproxy config file. HAProxy will then load balance between all configured RapidMiner Server instances. If you've configured the stats listen section, you can now visit the load balancer's IP address on port 1936 and log in with the user someUser and the password somePassword to monitor HAProxy.
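
Before starting the service, it can help to validate the configuration and to confirm that each instance answers on its port. The following is a minimal shell sketch and not part of the original guide: list_backends, probe and preflight are hypothetical helper names, and it assumes that haproxy, curl and awk are available on the load balancer. It relies on haproxy -c -f, which only checks the configuration file without starting the proxy, and on a HEAD request to / similar to the health check configured in the backend section.

```shell
#!/bin/sh
# Sketch only: pre-flight checks for the HAProxy setup described in this guide.
# list_backends, probe and preflight are hypothetical helper names.

# Path used in this guide; override by setting CFG in the environment.
CFG="${CFG:-/etc/haproxy/haproxy.cfg}"

# Print the address:port field of every "server" line in the given config file.
list_backends() {
    awk '$1 == "server" { print $3 }' "$1"
}

# Succeed if the instance answers a HEAD request on / without an HTTP error.
probe() {
    curl --silent --head --fail --max-time 5 --output /dev/null "http://$1/"
}

# Validate the configuration, then probe every configured instance.
preflight() {
    haproxy -c -f "$CFG" || return 1
    preflight_rc=0
    for addr in $(list_backends "$CFG"); do
        if probe "$addr"; then
            echo "OK   $addr"
        else
            echo "FAIL $addr"
            preflight_rc=1
        fi
    done
    return "$preflight_rc"
}
```

After sourcing this file, running preflight prints one line per configured instance and returns a non-zero status if the configuration is invalid or if any instance does not respond, so it can be chained in front of the start command.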