You are viewing the RapidMiner Studio documentation for version 9.7.
Connect to your data
To be effective as a data science tool, RapidMiner Studio has to first connect to your data.
- If the data is in a file on your computer, RapidMiner Studio has to read the file format.
- If the data is in a database, RapidMiner Studio has to connect to that database, and know the language of that database (SQL / NoSQL).
- If the data is in the cloud, RapidMiner Studio has to connect to the cloud service and know its API.
- If the data is imported from or exported to another software tool, for example Python or Tableau, RapidMiner Studio has to know about that tool.
- If the connection is via a proxy or a self-signed SSL certificate, RapidMiner Studio has to navigate that hurdle.
The good news is that RapidMiner Studio supports a wide range of file formats, databases, cloud services, and other software tools, either natively or via extensions.
Connection Objects
The concept of a connection object was introduced in RapidMiner Studio 9.3.
You can convert your legacy connections into connection objects.
When the connection to your data occurs over a network, you must first create a connection object. A connection object enables the connection to a database, cloud, or email service. All connection objects are stored in a repository, in the Connections subfolder.
From now on, we'll simply call them connections, remembering however that they have similarities to other objects in the repository. You can, for example, drag a database connection into the Process Panel to Retrieve it, before connecting the output to the Read Database Operator.
To create a connection, right-click on the Connections folder, and select Create Connection. The Create connection dialog opens, and you can configure your connection. If you're connecting to an SQL database:
- Choose the Connection Type (Database), Repository (where the connection will be stored) and Connection Name.
- Press Create and the Edit Connection dialog opens.
- Under the Setup tab, select the Database System and fill in User, Password, Host, Port, and (optionally) the Database Name.
- Press Test connection. Once it's working, Save the connection. The connection will appear in the Connections subfolder of the repository you selected in the first step.
You can view the connection details at any time by double-clicking on the connection in the Repository Panel, or by right-clicking on the connection and choosing Open or Edit.
Injected parameters: sharing connections
Connection objects can be shared.
Suppose that a group of users has access to the same database, and they collaborate on RapidMiner AI Hub. Can they share the database connection, without sharing their usernames and passwords? The answer is yes!
The solution is to build the connection as a template, where all the common parameters are pre-filled, and all the parameters unique to each user are injected. The values of the injected parameters are not stored in the connection object, but retrieved from an external source every time the connection is used. Possible external sources include macros and secure storage on RapidMiner AI Hub.
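The template idea can be sketched in a few lines of Python. This is only an illustration of the concept, not RapidMiner's implementation or API: the class ConnectionTemplate, its resolve method, and the sample host and credentials are all hypothetical. What it aims to show is that the stored object holds only the shared values plus the keys to inject, while the per-user values are fetched from a separate source at the moment of use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class ConnectionTemplate:
    """Hypothetical model of a shared connection: no secrets are stored here."""
    common: Dict[str, str]       # values shared by every user
    injected: Tuple[str, ...]    # keys whose values are supplied at use time

    def resolve(self, source: Callable[[str], str]) -> Dict[str, str]:
        """Build the full parameter set by asking `source` for each injected key."""
        params = dict(self.common)
        for key in self.injected:
            params[key] = source(key)
        return params


# Shared part, as the admin might define it (sample values are made up).
template = ConnectionTemplate(
    common={"host": "db.example.com", "port": "5432"},
    injected=("user", "password"),
)

# Each user has a separate source of injected values (a stand-in for macros
# or secure storage); the template itself never changes.
alice_store = {"user": "alice", "password": "alice-secret"}
bob_store = {"user": "bob", "password": "bob-secret"}

alice_params = template.resolve(alice_store.__getitem__)
bob_params = template.resolve(bob_store.__getitem__)
print(alice_params["user"], bob_params["user"])
```

Because the credentials live only in each user's source, sharing the template does not expose anyone's username or password.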
To create a connection in a RapidMiner AI Hub repository, or to copy a connection to one, a user has to belong to the connection manager group. See Sharing and permissions.
In outline, assuming the database credentials will be securely stored on RapidMiner AI Hub, the whole process of using a connection template might proceed as follows. We'll call the user with the connection manager role the admin.
- Within RapidMiner Studio, the admin creates a connection in a RapidMiner AI Hub repository. While it's possible to create a connection in a local repository, that connection will only provide macros as an injection source.
- While editing the connection, the admin presses the button Set injected parameters and selects the parameters whose values will be left blank until later (e.g. User and Password). The admin must also choose RapidMiner AI Hub as the source of the injected values.
- To set the injected values, a user must connect to the web interface of RapidMiner AI Hub. Either click the link displayed in the Edit connection dialog, or connect directly to the web interface, then navigate to Repository > Connections, and identify the connection by name. A warning says: This connection has missing values. The user clicks the link, fills in their own username and password, and presses the button Save in RapidMiner AI Hub, where the credentials are securely saved. This last step needs to be repeated by each individual user.
For more details, read the RapidMiner AI Hub documentation Create connections and Usage and injection.
Macros as a source of injected parameters
Within RapidMiner Studio, using values from process macros for your connection settings is immediately possible. When editing a connection, press Set injected parameters and choose which parameters should get values from macros. The macro name then needs to match the parameter key to be able to inject that value. The parameter key can be found in the information next to the parameter.
Configuration for the macro source is optional. Without a configured prefix, the macro name has to match the parameter key. If a prefix is configured, the macro name has to consist of the prefix, followed by an underscore (_), followed by the parameter key. For example, with the prefix myprefix, the parameter key user would require the macro name myprefix_user.
The macro name that should be used is shown when setting the injection, as well as in the view and edit dialogs themselves. Use this name for your macro so that its value is properly injected into the connection.
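The naming rule for macros can be written as a small Python helper. This is just a sketch of the rule as described above; the function macro_name is a hypothetical name for illustration and is not part of RapidMiner.

```python
from typing import Optional


def macro_name(parameter_key: str, prefix: Optional[str] = None) -> str:
    """Return the macro name that supplies a value for `parameter_key`.

    With no prefix, the macro name equals the parameter key; with a prefix,
    it is the prefix, an underscore, and then the parameter key.
    """
    if not prefix:
        return parameter_key
    return f"{prefix}_{parameter_key}"


print(macro_name("user"))              # user
print(macro_name("user", "myprefix"))  # myprefix_user
```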
Placeholders
Placeholders can be used inside any configuration parameter's value to reference other parameters. It is possible to concatenate placeholders and free text. Nesting of placeholders is not supported.
Since the syntax for placeholders is the same as for macros, it is important to make the context clear:
- The context for macros is processes.
- The context for placeholders is connections.
A placeholder can access parameter values from the current tab as well as from any other tab. To find out the key of a field you want to reference via placeholder in a different field, look at the information tooltip of the original field. The Full key is what you're looking for.
To use this placeholder in another field, reference the full key there, surrounded by a percentage sign (%) and curly brackets ({}), like this:
%{db_config.database}
If a placeholder cannot be resolved, it is simply replaced with an empty string, but still counts as an injected value and will not fail the process execution.
The JDBC-based database connections use this mechanism to create the URL from the parameters.
Without parameter information, the URL consists of several placeholders and a double colon. Setting the parameters replaces the placeholders with their values.
Use the placeholder system exactly like this to configure dynamic parameter values.
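To make the substitution behavior concrete, here is a minimal Python sketch of placeholder resolution. It is an illustration under stated assumptions, not RapidMiner's code: the function resolve_placeholders, the URL template, and the keys db_config.host and db_config.port are hypothetical (only db_config.database appears in the text above). It shows placeholders mixed with free text and an unresolved placeholder becoming an empty string.

```python
import re
from typing import Mapping

# Matches %{full.key}; the key may not contain braces, so nesting is not handled.
PLACEHOLDER = re.compile(r"%\{([^{}]*)\}")


def resolve_placeholders(value: str, parameters: Mapping[str, str]) -> str:
    """Replace each %{key} in `value` with parameters[key], or "" if unknown."""
    return PLACEHOLDER.sub(lambda m: parameters.get(m.group(1), ""), value)


# Hypothetical URL template; the real template used by a connection may differ.
url_template = "jdbc:postgresql://%{db_config.host}:%{db_config.port}/%{db_config.database}"

params = {
    "db_config.host": "db.example.com",
    "db_config.port": "5432",
    "db_config.database": "sales",
}

print(resolve_placeholders(url_template, params))  # jdbc:postgresql://db.example.com:5432/sales
print(resolve_placeholders(url_template, {}))      # jdbc:postgresql://:/
```

With no parameters set, only the free text of the template remains, which mirrors how an unconfigured URL is reduced to its separators.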