You are viewing the RapidMiner Studio documentation for version 9.0.
The Design View
See also the video introduction to the RapidMiner GUI.
RapidMiner Studio is a visual workflow designer for predictive analytics that brings data science and machine learning to everyone on the analytics team.
When you're working on a new project of any kind, often the first step will be to go to a whiteboard, where you will plan the workflow and identify the key steps on the way to your goal. If you're a data scientist, the workflow will usually include one or more of the following steps:
- Import data
- Prepare data
- Build a model
- Validate the model
- Apply the model
RapidMiner Studio implements your whiteboard workflow in software, in the Design View. The Design View includes numerous panels.
- Data, processes, and results are stored in the Repository.
- The essential elements of every workflow are called Operators.
- Operators are connected via ports. The output of one Operator is passed as input to the next.
- A connected set of Operators that help you to transform and analyze your data is called a process.
- The behavior of an Operator can be modified by changing its parameters.
- The behavior of an Operator can be understood by reading the Help.
Each of these terms will be examined in more detail below.
The default view
Process
Process: A connected set of Operators that help you to transform and analyze your data.
Also known as: flow, program, pipeline, diagram
Your goal is to create a finished process, a connected set of Operators that produce a result. For example, your process might read a data set and build a predictive model. When you have connected all your Operators and set their parameters, press the Run button at the top of the user interface, and the results will be displayed in the Results View.
As discussed in Run a Process, there is more than one way to run your process. You can run it:
- locally
- in the background
- on RapidMiner Server
- on RapidMiner Server, as a scheduled process
As your processes grow in size, you will need some way to manage their complexity.
- You can hide the complexity by moving groups of Operators into a single Subprocess Operator.
- You can run a process from within another process, via the Execute Process Operator.
To save your process to a Repository, select File > Save Process from the main menu.
You can easily share a process by first exporting it to an XML file:
- To export the process, select File > Export Process. The export dialog allows you to save the file as .rmp or .xml; in reality, both file formats are identical (XML).
- To import the process, select File > Import Process.
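Because an exported process is plain XML, any XML tool can read it. The following is a minimal Python sketch, assuming an export in which the top-level `<process>` element wraps a root `<operator>` whose inner `<process>` holds `<operator>` and `<connect>` elements. The sample string is a hypothetical, heavily trimmed export of the simple Read Excel > Store > "res" process described under Ports; real files also contain parameters, layout, and context elements, which this sketch ignores.

```python
import xml.etree.ElementTree as ET
from typing import Union

# Hypothetical, heavily trimmed export: Read Excel -> Store -> result port.
# Assumption: real .rmp/.xml exports contain more elements (context,
# parameters, layout); this sketch only looks at operators and connections.
SAMPLE_RMP = """<process version="9.0.000">
  <operator activated="true" class="process" name="Process">
    <process expanded="true">
      <operator activated="true" class="read_excel" name="Read Excel"/>
      <operator activated="true" class="store" name="Store"/>
      <connect from_op="Read Excel" from_port="output" to_op="Store" to_port="input"/>
      <connect from_op="Store" from_port="through" to_port="result 1"/>
    </process>
  </operator>
</process>"""


def summarize_process(xml_data: Union[str, bytes]) -> dict:
    """List the Operators and connections of the top-level process."""
    root = ET.fromstring(xml_data)
    # Assumed layout: <process> -> root <operator> -> inner <process>.
    inner = root.find("./operator/process")
    if inner is None:
        raise ValueError("no top-level process found")
    operators = [op.get("name") for op in inner.findall("operator")]
    connections = [
        (c.get("from_op"), c.get("from_port"), c.get("to_op"), c.get("to_port"))
        for c in inner.findall("connect")
    ]
    # In this sample, a connection with no target Operator ends at a result port.
    results = [c for c in connections if c[2] is None]
    return {
        "version": root.get("version"),
        "operators": operators,
        "connections": connections,
        "results": results,
    }


if __name__ == "__main__":
    summary = summarize_process(SAMPLE_RMP)
    print(summary["operators"])  # ['Read Excel', 'Store']
    print(summary["results"])    # [('Store', 'through', None, 'result 1')]
```

To inspect one of your own exports, you could pass the raw file bytes, e.g. `summarize_process(Path("my_process.rmp").read_bytes())` with `Path` from `pathlib`; the file name is hypothetical. Passing bytes lets the parser honor the file's own XML encoding declaration.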
Ports
To build a process, you must connect the output from each Operator to the input of the next via a port. To connect two ports, click on each of them in turn. Hover over a port to see a tooltip with additional information. When connecting two Operators, you need to make sure that the output port of the first is compatible with the input port of the second, or you will get an error message. The input and output ports for each Operator are described in the Operator Help, and a complete list of ports is given in the Glossary.
If you want to see the results in the Results View, you must connect the last Operator in a process to a results port ("res") on the right side of the Process Panel.
Hint: double-click on an output port, and it will be connected to the next available results (“res”) port.
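The compatibility rule for ports can be pictured with a toy model. The Python sketch below is purely illustrative and is not RapidMiner's API: each hypothetical `Operator` maps its port names to the type of object on that port, and `connect` refuses to join an output to an input of a different type. The port names are modeled loosely on those shown in RapidMiner Studio but should be treated as assumptions, and RapidMiner's real port checks are richer than plain string equality.

```python
from dataclasses import dataclass, field


class PortMismatchError(Exception):
    """Raised when an output port cannot feed the chosen input port."""


@dataclass
class Operator:
    # Hypothetical toy model: port name -> type of object on that port.
    name: str
    inputs: dict = field(default_factory=dict)
    outputs: dict = field(default_factory=dict)


def connect(src: Operator, out_port: str, dst: Operator, in_port: str) -> tuple:
    """Connect src.out_port to dst.in_port if the port types match."""
    produced = src.outputs[out_port]
    expected = dst.inputs[in_port]
    if produced != expected:
        raise PortMismatchError(
            f"{src.name}.{out_port} delivers {produced}, "
            f"but {dst.name}.{in_port} expects {expected}"
        )
    return (src.name, out_port, dst.name, in_port)


read = Operator("Read Excel", outputs={"output": "ExampleSet"})
tree = Operator("Decision Tree", inputs={"training set": "ExampleSet"},
                outputs={"model": "Model"})
apply_model = Operator("Apply Model",
                       inputs={"model": "Model", "unlabelled data": "ExampleSet"})

print(connect(read, "output", tree, "training set"))  # data -> learner: OK
print(connect(tree, "model", apply_model, "model"))   # model -> model: OK
try:
    connect(read, "output", apply_model, "model")     # data into a model port
except PortMismatchError as err:
    print("error:", err)
```

The last call fails for the same reason RapidMiner Studio would flag the connection: a data set is offered where a model is expected.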
The following example shows a simple process, where the data from an Excel file is (1) read, (2) stored in the Repository, and (3) displayed in the Results View.
A simple process
The screenshot below shows a more complex process, generated by Auto Model.
A more complex process
Repository
Repository: The place where your data, processes, and results are stored, either locally or remotely.
Also known as: folder, workspace, project
When working with RapidMiner Studio, you need a place to save your work. The Repository can be used to store:
- data
- processes
- results
A Repository can be local or remote; remote Repositories facilitate group collaboration. It is the natural place to store your processes (File > Save Process); whether you save your data and results in a Repository depends on your use case.
To start with, your data probably lives in a file or a database. RapidMiner Studio provides numerous Operators to help you import your data, e.g., Read Excel or Read Database. To launch the Import Data wizard, click Import Data in the Repository Panel, or select File > Import Data from the main menu.
Given the data and the process, your results can always be regenerated, but there might be good reasons to store the results in a Repository:
- If the result is a complex model based on a large data set, regenerating it will take time.
- If you are running a process on RapidMiner Server, you will need to store the results (using the Store Operator), because RapidMiner Server has no equivalent to the Results View in RapidMiner Studio.
Bundled Repositories
For both new and experienced users, the Repositories bundled with RapidMiner Studio are an invaluable resource. They provide a wide array of sample data sets and sample processes, together with explanatory notes.
- The Training Resources Repository is a large set of data sets and processes designed as a companion to the training lessons and courses at the online RapidMiner Academy. We encourage you to take advantage of these free courses to practice your skills.
- The Samples Repository contains additional data sets and processes, including Time Series examples under Samples > Time Series.
- The Community Samples Repository is a special collection of data sets and processes published by our top users in the RapidMiner User Community, not by the RapidMiner team. Each process contains a header naming the author, giving a brief explanation of the purpose, and linking to the thread from which it originated. We encourage users to read the online conversation that accompanies each process before using it, because processes in this Repository do not necessarily run as published.
Operators
Operators: The elements of a Process. Each Operator takes input and creates output, depending on the choice of parameters.
Also known as: function, formula, node
To use RapidMiner Studio effectively, you have to learn about its Operators. RapidMiner Studio includes hundreds of Operators, and therefore a large part of the task is learning how to find what you need. As so often with search, there are two major strategies: hierarchical search and keyword search. The RapidMiner Community is also a source of support.
To verify that the Operator you have found has the functionality you expect, read the Help.
Once you've found the Operator you want, there are at least three ways of getting it into the Process Panel:
- Drag and drop the Operator.
- Double-click the Operator.
- Right-click the Operator, and choose Insert Operator from the context menu.
Hierarchical search
The hierarchy of folders in the Operators Panel reflects a typical data science workflow:
- Data Access
- Blending
- Cleansing
- Modeling
- Scoring
- Validation
- Utility
- Extensions
By opening these folders and their subfolders, you will get some insight into what's available.
This same hierarchy can be examined on the docs website, which includes the Help for each Operator.
Keyword search
The alternative is keyword search. Although the Operators Panel includes a search field, the recommended procedure is to use the global search, in the upper right corner of the user interface. The global search finds not just Operators, but also data and processes from the Repository, extensions from the Marketplace, and even actions you can take from the menu!
Hint: when you hover over an Operator displayed by the global search, the Help for that Operator is displayed immediately in the Help Panel. If you first maximize the Help Panel, you can quickly scan the Help pages for all the Operators that appear in your search.
Community search (Wisdom of Crowds)
If you've started building a process and you're looking for hints, the "Wisdom of Crowds" can be helpful. The "Wisdom of Crowds" is an opt-in recommender system, based on the usage patterns of other RapidMiner users. It predicts which Operators you might need, based on the Operators that are already included in your process. To activate it, click the button that says Activate Wisdom of Crowds. You can activate or deactivate it at any time via the menu item Settings > Preferences > Recommender > Enable operator recommendations.
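To give a feel for how "predicting Operators from the ones already in your process" can work in principle, here is a generic co-occurrence recommender in Python. This is a swapped-in textbook technique, not RapidMiner's actual model, which is not described here; the usage history is entirely hypothetical. Each past process is reduced to its set of Operators, and Operators that often appear alongside the ones you already have score highest.

```python
from collections import Counter
from itertools import permutations

# Hypothetical usage data: the set of Operators in each of four past processes.
HISTORY = [
    {"Read Excel", "Decision Tree", "Apply Model", "Performance"},
    {"Read Excel", "Normalize", "k-NN", "Apply Model"},
    {"Retrieve", "Decision Tree", "Apply Model", "Performance"},
    {"Retrieve", "Normalize", "Store"},
]


def build_cooccurrence(history):
    """Count, for each ordered pair (a, b), how many processes contain both."""
    counts = Counter()
    for ops in history:
        for a, b in permutations(ops, 2):
            counts[(a, b)] += 1
    return counts


def recommend(current_ops, counts, k=3):
    """Score Operators not yet in the process by co-occurrence with those that are."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a in current_ops and b not in current_ops:
            scores[b] += n
    # Highest score first; ties broken by name so the output is stable.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


counts = build_cooccurrence(HISTORY)
print(recommend({"Read Excel", "Decision Tree"}, counts))
# ['Apply Model', 'Performance', 'Normalize']
```

Apply Model ranks first because it co-occurs with both Read Excel and Decision Tree in several past processes; a production recommender would also need to handle large vocabularies and normalize for very common Operators.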
If you still can't find what you are looking for, the RapidMiner Community can probably help. RapidMiner's data science team actively contributes.
Parameters
Parameters: Options for configuring the behavior of an Operator.
The content of the Parameters Panel is context-dependent. Select any Operator that is displayed in the Process Panel, and the Parameters Panel displays the options for configuring that Operator. Because RapidMiner Studio includes many Operators, each with its own unique functionality, the range of parameters is also quite diverse. By default, RapidMiner Studio shows you only the more commonly used parameters. To see all of the available parameters, click Show Advanced Parameters.
To understand the parameters, you need to learn more about the Operator; reading the Help for that Operator is probably a good place to start. Alternatively, hover over the information icon next to the parameter of interest to display a help text.
Help
Help: Displays a help text for the current Operator.
The content of the Help Panel is also context-dependent. Select any Operator that is displayed in the Process Panel, and the Help Panel displays a help text for that Operator. The Help Panel provides useful background information, including:
- An overview of the Operator, its purpose, and its functionality
- A description of the Operator's input and output ports
- A description of the Operator's parameters
- One or more examples, in the form of a Tutorial Process
Within the Help Panel, clicking on an example immediately opens the associated Tutorial Process in RapidMiner Studio, so that you can examine a relevant application.
All of the Operator help texts provided within RapidMiner Studio are also available online.
Reconfiguring the Design View
To optimize your screen real estate, you might consider reorganizing the panels. Notice first that you can right-click the tab of any panel and select one of the following:
- Detach: The panel is detached from RapidMiner Studio.
- Maximize: The panel fills the entire space allotted to panels.
- Close: The panel is removed from the user interface.
If you need more space to read the Help, for example, you can maximize the Help Panel, then click the panel tab a second time to restore the user interface to its original state. If you don't really need a panel, you can close it. A closed panel can be restored via the menu items under View > Show Panel.
Additional configuration is available via drag and drop:
- Panels can be resized
- Panels can be moved
- Panels can be displayed as tabs.
The table below summarizes the available panels, most of which are not displayed by default, e.g., the XML Panel, which displays an XML representation of your process. The default panels are Repository, Operators, Process, Parameters, and Help. To display a panel from this list, select it from the menu under View > Show Panel.
Panel | Description |
---|---|
App Objects | Simulate a RapidMiner Server App environment |
Background Monitor | Manage background processes and results |
Cloud Monitor | Manage cloud processes and results |
Context | Advanced process settings and macros |
Data editor | Offers spreadsheet-like data manipulation |
Help | Documentation for the selected Operator |
History | Version control for processes on RapidMiner Server |
Log | View recorded events |
Macros | Live overview of defined macros |
Operators | All Operators available to add to your process |
Overview | A zoomed out overview for huge processes |
Parameters | Configure Operator behavior in your process |
Problems | View potential problems in your process |
Process | Create and design your process here |
Repository | Manage your data and processes |
Resource Monitor | Displays the RAM currently used |
Result History | A history of all process results this session |
Server Monitor | Processes running on RapidMiner Server |
XML | An XML representation of your process |
To restore the Design View to the default panel setup, select View > Restore Default View.