As mentioned in a previous blog post, we are on a journey to develop and improve the connectivity between RapidMiner Cloud and external data sources. The goal is to allow users to incorporate external data into their workflows, be it as input or output.
We plan to deliver connections in four phases, and we have now completed phase 2:
- Phase 1: connections are available, but the user still needs Studio to create and edit them (complete).
- Phase 2: it is possible to add and edit connections directly in the cloud (complete).
- Phase 3: connections will be shareable among multiple users and projects.
- Phase 4: it will be possible to create connection templates which can be shared in a secure way without credentials or other sensitive data.
Phase 2 is arguably the most important, because it makes connections usable in any RapidMiner project. The remaining phases focus on improving connection management for large environments.
What is a connection?
A connection is a RapidMiner object that contains the information you need to connect to a particular external data source. This information may include:
- keys and secrets for a cloud repository,
- credentials for a database, or
- a token for sources based on cloud authentication.
If you want to use data from an external source, you first need to create (or ask your administrator to create) a connection.
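To make the idea concrete, here is a minimal Python sketch of what such an object might hold. It is an illustration only, not RapidMiner's actual data model: the `Connection` class, the `settings` field names, the `SENSITIVE_KEYS` set, and the `to_template` helper are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical setting names treated as sensitive; not RapidMiner's schema.
SENSITIVE_KEYS = {"password", "secret_key", "token"}


@dataclass
class Connection:
    """Illustrative stand-in for a connection object."""
    name: str
    source_type: str  # e.g. "SQL Database" or "Amazon S3"
    settings: Dict[str, str] = field(default_factory=dict)

    def to_template(self) -> "Connection":
        """Return a copy with the sensitive settings removed."""
        safe = {k: v for k, v in self.settings.items() if k not in SENSITIVE_KEYS}
        return Connection(self.name, self.source_type, safe)


# Placeholder values only; nothing here points at a real system.
db = Connection(
    name="sales_db",
    source_type="SQL Database",
    settings={
        "host": "db.example.com",
        "user": "analyst",
        "password": "not-a-real-password",
    },
)

template = db.to_template()
print(sorted(template.settings))
```

The `to_template` helper mirrors the idea behind phase 4: the shareable copy keeps the non-sensitive settings and leaves the credentials behind, while the original connection is unchanged.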
How to use a connection
In the Designer, connections can be dragged and dropped into a workflow and attached to the corresponding read or write operators. The read, write, and loop operators automatically use the connection details to work with the data as if it were local.
Supported data sources
RapidMiner supports a variety of data sources, including the following:
- SQL Databases
- Amazon S3
- Azure Blob
- Azure Data Lake (Gen1)
- Azure Data Lake (Gen2)
- Dropbox
- Google Cloud
- Salesforce
This list will grow as we add support for more types.