RapidMiner Job Container
Job Containers (JCs) are the back-end components of RapidMiner Go that execute CPU-heavy computations such as model training and prediction. The default docker-compose-services starts only one Job Container, on the same host as RapidMiner Go, but in a production environment multiple Job Containers should be started on separate machines. Load balancing between JC instances is handled by the AMQ service. A JC instance performs only one job at a time, so the next job in the queue is picked up by the JC instance that becomes idle first.
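The dispatch behaviour described above can be illustrated with a small simulation. This is only a sketch of the "first idle instance takes the next job" rule, not the actual AMQ or Job Container implementation; the function name and the job durations are hypothetical.

```python
import heapq


def simulate_dispatch(job_durations, num_containers):
    """Assign queued jobs, in order, to whichever container becomes idle first.

    Returns a list of (job_index, container_index, start_time, end_time).
    Illustrative only: real scheduling is done by the AMQ service.
    """
    # Heap of (time_when_idle, container_index); every container is idle at t=0.
    idle_at = [(0, c) for c in range(num_containers)]
    heapq.heapify(idle_at)

    schedule = []
    for job, duration in enumerate(job_durations):
        # The container that becomes idle earliest takes the next queued job.
        free_time, container = heapq.heappop(idle_at)
        end_time = free_time + duration
        schedule.append((job, container, free_time, end_time))
        # Each container runs one job at a time, so it is busy until end_time.
        heapq.heappush(idle_at, (end_time, container))
    return schedule


if __name__ == "__main__":
    for entry in simulate_dispatch([5, 2, 3], num_containers=2):
        print(entry)
```

With two containers and jobs lasting 5, 2 and 3 time units, the third job waits until the second container finishes its 2-unit job and then starts there.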
Licensing
Job Containers depend on the license file in the licenses/rapidminer-go-on-prem directory. If it is not present, the JC will not start. This folder is automatically mounted into the file system of every RapidMiner Go and Job Container instance, so there is no need to copy it manually.
Configuration using environment variables
A Job Container is a Spring Boot application. It currently has a single valid Spring profile value: broker-amq.
Table of default environment variables:
Environment variable name | Description |
---|---|
JOB_QUEUE | AMQ job queue name |
JOB_STATUS_QUEUE | AMQ status queue name |
JOB_COMMAND_TOPIC | AMQ topic name |
AMQ_URL | AMQ URL |
AMQ_USERNAME | AMQ username |
AMQ_PASSWORD | AMQ password |
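Before starting a Job Container on a separate machine, it can be useful to confirm that the variables from the table are present in the environment. The helper below is a hypothetical pre-flight check, not part of RapidMiner Go. It treats every variable in the table as required, which is an assumption, since the product ships defaults for them.

```python
import os

# Variable names taken from the table above.
JC_ENV_VARS = [
    "JOB_QUEUE",
    "JOB_STATUS_QUEUE",
    "JOB_COMMAND_TOPIC",
    "AMQ_URL",
    "AMQ_USERNAME",
    "AMQ_PASSWORD",
]


def missing_vars(env=None):
    """Return the names from JC_ENV_VARS that are unset or empty in env.

    env defaults to the current process environment; pass a dict to test
    a candidate configuration without touching os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in JC_ENV_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Not set:", ", ".join(missing))
    else:
        print("All Job Container variables are set.")
```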
Multiple Job Containers and per-user job limits
Multiple Job Container instances can be run by increasing the JOB_CONTAINERS variable in the .env file. In this case, make sure the host machine has enough available RAM to allocate to these instances. The default value of MEMORY_PER_JOB_CONTAINER requires 4 GB per Job Container. For instance, running 2 JCs with the default memory settings requires 4 + 4 * 2 = 12 GB of RAM in total.
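The memory arithmetic above generalises to any number of Job Containers. The sketch below reproduces the 4 + 4 * 2 = 12 example; the function and parameter names are hypothetical, and reading the leading 4 GB as a fixed amount independent of the JC count is an assumption drawn from that example.

```python
def total_ram_gb(job_containers, memory_per_jc_gb=4, base_gb=4):
    """Estimate total host RAM in GB for a given number of Job Containers.

    memory_per_jc_gb mirrors the default 4 GB per Job Container.
    base_gb is the fixed 4 GB term from the documented example (assumed
    to be independent of the number of Job Containers).
    """
    if job_containers < 1:
        raise ValueError("at least one Job Container is required")
    return base_gb + memory_per_jc_gb * job_containers


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, "JC(s):", total_ram_gb(n), "GB")
```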
With multiple JCs available, you can also increase AUTOMODELER_EXECUTION_QUEUE_LIMIT_PER_USER in the AutoModeler settings. If this setting equals the number of JCs, one user's jobs can run in parallel on all JCs, so another user who submits a job later will need to wait until the JCs finish their current jobs. By decreasing the queue limit, you can restrict each user to a fraction of the JCs, preserving execution resources for other concurrent users.
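The effect of the per-user limit can be sketched as a selection rule over the job queue. This is an illustration under the assumption that the limit caps how many of a user's jobs may run at once; it is not the AutoModeler implementation, and the function name and data shapes are hypothetical.

```python
from collections import Counter


def pick_next_job(queue, running_users, limit_per_user):
    """Return the index of the first queued job whose owner is below the limit.

    queue is a list of (user, job_id) tuples in submission order.
    running_users lists the owner of each currently running job.
    Returns None when every queued job belongs to a user already at the limit.
    """
    running = Counter(running_users)
    for index, (user, _job_id) in enumerate(queue):
        if running[user] < limit_per_user:
            return index
    return None


if __name__ == "__main__":
    queue = [("alice", "a3"), ("bob", "b1")]
    # Alice already has two running jobs; with a limit of 2, Bob's job goes next.
    print(pick_next_job(queue, ["alice", "alice"], limit_per_user=2))
```

With a limit lower than the number of JCs, a user who submits many jobs cannot occupy every instance, so a later job from a different user can start as soon as any JC is idle.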