You are viewing the RapidMiner Server documentation for version 9.4.
Scalable architecture
To build a RapidMiner Server environment for your data science team, two components have to be installed:
- RapidMiner Server - the central component
- Job Agents - local or remote, to provide scalability
plus the following helper applications:
- RapidMiner Studio - to design the processes you will run in the RapidMiner Server environment
- A database - to store configuration files, cron job details, user report requests, and other support data
The following optional component can be installed, and it runs independently:
- Real-Time Scoring Agents - to provide scalability of low-latency web services
A simple schematic is shown below.
The design
The design of the RapidMiner Server environment reflects a typical data science workflow, where there are two kinds of activities:
Model building, involving long-running processes that can be placed on a queue and run asynchronously
RapidMiner Server offers a queue system for long-running jobs, which are executed externally via Job Agents. You increase processing power by adding Job Agents.
Prediction, or any other application of the models, where the need for a real-time response is paramount
There are two engines for generating predictions:
- Web services, executed directly by RapidMiner Server
- Real-Time Scoring Agents, external entities that run independently of RapidMiner Server
Only the latter are scalable. You increase processing power by adding Real-Time Scoring Agents.
RapidMiner Server
RapidMiner Server is the central component in the architecture. You interact with it via a web interface or via RapidMiner Studio. Its main responsibilities are:
- User, queue, and permissions management
- Scheduling of user jobs (processes)
- Execution of processes called via web services / web apps
- Execution of processes running on the local Job Agent, if it exists
- Repository management (storage of models, processes, etc. and permissions for them)
- Connection management (DB, Hadoop/Radoop, etc.)
Read more: Install RapidMiner Server
In the diagram below, each blue box represents a separate machine. RapidMiner Server is installed on the big blue box on the left, while the blue boxes on the right host remote Job Agents.
Job Agent
The design with Job Agents running remotely on dedicated machines is aimed at scalability. However, one or more Job Agents can be installed locally, on the same machine as RapidMiner Server.
Each Job Agent is configured to point to one of the queues on RapidMiner Server. Its only responsibility is to pick up jobs from the queue and run them, by spawning a Job Container. For each Job Agent, the number of Job Containers that can be spawned and the available memory are configurable.
Multiple Job Agents can point to the same queue. You can manage the queues, and therefore the allocation of resources, by assigning permissions.
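The relationship between queues, Job Agents, and Job Containers can be illustrated with a small simulation. This is only a conceptual sketch in Python, not RapidMiner code: the `JobAgent` class, the `max_containers` field, the queue name `DEFAULT`, and the notion of a scheduling "round" are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class JobAgent:
    """Hypothetical stand-in for a Job Agent bound to exactly one queue."""
    name: str
    queue_name: str
    max_containers: int  # assumed limit on Job Containers running at once
    finished: list = field(default_factory=list)

    def pick_up(self, queues):
        """Take up to max_containers jobs from this agent's queue."""
        queue = queues[self.queue_name]
        taken = 0
        while queue and taken < self.max_containers:
            # In this sketch, "running" a job just records it as finished.
            self.finished.append(queue.popleft())
            taken += 1
        return taken


def rounds_to_drain(queues, agents):
    """Count the rounds needed until no agent can pick up another job."""
    rounds = 0
    while True:
        picked = sum(agent.pick_up(queues) for agent in agents)
        if picked == 0:
            return rounds
        rounds += 1


def make_queues():
    """Build a fresh queue holding twelve placeholder jobs."""
    return {"DEFAULT": deque(f"job-{i}" for i in range(12))}


one_agent = rounds_to_drain(
    make_queues(), [JobAgent("agent-1", "DEFAULT", max_containers=2)]
)
two_agents = rounds_to_drain(
    make_queues(),
    [
        JobAgent("agent-1", "DEFAULT", max_containers=2),
        JobAgent("agent-2", "DEFAULT", max_containers=2),
    ],
)
print(one_agent, two_agents)  # 6 3
```

In this toy model, pointing a second agent at the same queue halves the number of rounds, which mirrors the idea that you increase processing power by adding Job Agents.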
Read more: Install a Job Agent
Job Container
The Job Container spawned by the Job Agent runs a RapidMiner Studio instance that executes a process. Once that process is finished, the Job Container terminates. Because each job runs in its own sandbox, the system is highly robust; problems with one job have no effect on any other job.
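The isolation idea, one short-lived process per job, can be demonstrated by analogy with ordinary operating-system processes. The sketch below is not how RapidMiner implements Job Containers; it only assumes that each job runs in its own process, and the helper `run_in_container` and the job names are hypothetical.

```python
import subprocess
import sys


def run_in_container(job_source, timeout=30):
    """Run one job in its own Python process; return (exit_code, stdout)."""
    result = subprocess.run(
        [sys.executable, "-c", job_source],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout.strip()


# Three placeholder jobs; the second one fails on purpose.
jobs = {
    "train_model": "print('model trained')",
    "broken_job": "raise RuntimeError('out of memory')",
    "score_batch": "print('batch scored')",
}

results = {name: run_in_container(source) for name, source in jobs.items()}
for name, (exit_code, output) in results.items():
    status = "ok" if exit_code == 0 else "failed"
    print(f"{name}: {status} {output}")
```

The failing job ends with a non-zero exit code, while the jobs before and after it complete normally because each one had a process of its own. Starting a fresh process per job is also what makes this approach comparatively slow to respond.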
The price of this safety is latency: the time needed to spawn a Job Container is measured in seconds. If you do not need a real-time response, this latency is unimportant; if you do, we recommend using a web service or a Real-Time Scoring Agent. For example, you might build a model in a Job Container and generate predictions for that model via a Real-Time Scoring Agent.
Real-Time Scoring Agent
As mentioned previously, there are two engines for generating predictions:
- Web services, executed directly by RapidMiner Server
- Real-Time Scoring Agents, external entities that run independently of RapidMiner Server
When generating predictions via the Real-Time Scoring Agent, you need RapidMiner Server to create the deployment, but once it's installed, it runs independently of RapidMiner Server.
As the table below makes clear, the Real-Time Scoring Agent is the scalable, low-latency counterpart to the Job Agent / Job Container. In short, it's just what you need for real-time predictions.
Component | Scalable | Low-latency |
---|---|---|
Job Agent / Job Container | yes | no |
Real-Time Scoring Agent | yes | yes |
Web service | no | yes |
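
The choice between the three execution engines can be written down as a small lookup. This is a hedged sketch: the `Engine` class and the `candidates` helper are hypothetical, and the true/false values are read off the descriptions on this page (only Real-Time Scoring Agents and Job Agents scale by adding instances, and spawning a Job Container takes seconds), not taken from any RapidMiner API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Engine:
    """Hypothetical record of one execution engine's properties."""
    name: str
    scalable: bool
    low_latency: bool


# Values inferred from the text of this page (an assumption of this sketch).
ENGINES = [
    Engine("Job Agent / Job Container", scalable=True, low_latency=False),
    Engine("Real-Time Scoring Agent", scalable=True, low_latency=True),
    Engine("Web service", scalable=False, low_latency=True),
]


def candidates(need_scalable, need_low_latency):
    """Return the names of engines that meet the stated requirements."""
    return [
        engine.name
        for engine in ENGINES
        if (engine.scalable or not need_scalable)
        and (engine.low_latency or not need_low_latency)
    ]


print(candidates(need_scalable=True, need_low_latency=True))
# ['Real-Time Scoring Agent']
```

Asking for both properties leaves only the Real-Time Scoring Agent, which matches the summary given above the table.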
Read more: Web services
Read more: Real-Time Scoring