You are viewing the RapidMiner Studio documentation for version 9.3. Check here for the latest version.

Using the Google Cloud Storage Connector

This guide targets the new Connection Management introduced with RapidMiner Studio 9.3.

For the old Legacy Google Cloud Storage Connections, see the 9.2 documentation.

The Google Cloud Storage Connector allows you to access your Google Cloud Storage directly from RapidMiner Studio. Both read and write operations are supported. You can also read from a set of files in a Google Cloud Storage directory, using the Loop Google Storage operator. This document will walk you through how to connect to your Google Cloud Storage account and how to read from Google Cloud Storage.

Connect to your Google Cloud Storage account

Before you can use the Google Cloud Storage connector, you have to configure a new Google Cloud Storage Connection. For this purpose, you will need the connection details of your account: a project ID and either an access token or a private key for a service account.

  1. In RapidMiner Studio, right-click on the repository you want to store your Google Cloud Storage Connection in and choose Create Connection.

    You can also click on Connections > Create Connection and select the repository from the dropdown of the following dialog.

  2. Give a name to the new Connection, and set Connection Type to Google Cloud Storage.

  3. Click on Create and switch to the Setup tab in the Edit connection dialog.

  4. Fill in the connection details of your Google Storage account. There are two alternative options, described in the next two steps.

  5. You may use an access token that you get after you allow RapidMiner to access your cloud account on a consent screen. This is the default option. Leave Use Service Account unchecked and follow the steps below.

    1. To the right of the Access Token field, click the button to request an access token.

    2. Click on Request access token to open the Google website in your browser. If you are not already logged into your Google Cloud account, you will have to do so now. You can manually copy the URL by clicking on Show URL instead.

    3. Click Allow to give RapidMiner access to your Google Cloud account and to generate a token. This will bring you to a page where you can see the access token. Copy the code you get there.

    4. Return to RapidMiner Studio, enter the access token, and click Complete.

    5. Specify the Project ID for the Connection as well.

  6. Alternatively, you may set up a Service account for your project. In this case, check the Use Service Account flag and follow the steps below.

    1. After setting up the Service account, create and download a JSON key for it. Use the file chooser button next to the Private Key File Content field to select the JSON file containing the key. Alternatively, you can paste the entire JSON file content (e.g. using a text editor and the clipboard) into the Private Key File Content field.

    2. Specify the Project ID for the Connection as well.

  7. While not required, we recommend testing your new Google Cloud Storage Connection by clicking on the Test connection button. If the test fails, please check whether the details are correct.

  8. Click Save to save your Connection and close the Edit connection dialog. You can now start using the Google Cloud Storage operators.
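If the connection test fails with a service account, one thing worth ruling out is a damaged or incomplete key file. The following is a minimal sketch, not part of RapidMiner Studio, that checks the JSON content before you paste it into the Private Key File Content field. The function name `check_service_account_key` and the list of checked fields are assumptions made for illustration; the field names themselves are those normally found in a downloaded Google service account key.

```python
import json

# Fields normally present in a downloaded Google service account JSON key.
# Assumption: this subset is enough for a quick sanity check; RapidMiner
# Studio performs its own validation when it tests the connection.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(key_json: str) -> str:
    """Return the project_id stored in the key, or raise ValueError."""
    try:
        key = json.loads(key_json)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(key, dict):
        raise ValueError("key file must contain a JSON object")
    # Report every missing or empty field at once.
    missing = [name for name in REQUIRED_FIELDS if not key.get(name)]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    if key["type"] != "service_account":
        raise ValueError(f"unexpected key type: {key['type']!r}")
    return key["project_id"]


if __name__ == "__main__":
    # Placeholder values only, not a real key.
    sample = json.dumps({
        "type": "service_account",
        "project_id": "my-example-project",
        "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
        "client_email": "reader@my-example-project.iam.gserviceaccount.com",
    })
    print(check_service_account_key(sample))
```

The returned value is the project in which the service account was created. It can be compared with the Project ID entered for the Connection, although the two may legitimately differ if the account has been granted access to another project.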

Read from Google Cloud Storage

The Read Google Storage operator reads data from your Google Cloud Storage account. The operator can be used to load arbitrary file formats, since it only downloads and does not process the files. To process the files you will need to use additional operators such as Read CSV, Read Excel, or Read XML.

Let us start with reading a simple CSV file from Google Cloud Storage.

  1. Drag a Read Google Storage operator into the Process Panel. Select your Google Cloud Storage Connection for the connection entry parameter from the Connections folder of the repository you stored it in by clicking on the repository chooser button next to it.

    Alternatively, you can drag the Google Cloud Storage Connection from the repository into the Process Panel and connect the resulting operator with the Read Google Storage operator.

  2. Click on the file chooser button to view the files in your Google Cloud Storage account. Select the file that you want to load and click Open. Note that you need storage.buckets.list permissions on the project to be able to list the buckets and use the file chooser. If you do not have that permission, please type the path from which you want to read directly into the parameter field.

    As mentioned above, the Read Google Storage operator does not process the contents of the specified file. In our example, we have chosen a CSV file (a comma separated values file). This file type can be processed via the Read CSV operator.

  3. Add a Read CSV operator between the Read Google Storage operator and the result port. You may set the parameters of the Read CSV operator, such as the column separator, depending on the format of your CSV file.

  4. Run the process. In the Results perspective, you should see a table containing the rows and columns of your chosen CSV file.
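The split between downloading and parsing in the steps above can be illustrated outside RapidMiner Studio. The following Python sketch is an analogy only and does not reflect how the operators are implemented: the raw bytes stand in for what Read Google Storage delivers, and the hypothetical function `parse_csv_bytes` plays the role of Read CSV, with `separator` corresponding to the column separator parameter. The sample data is invented.

```python
import csv
import io


def parse_csv_bytes(raw: bytes, separator: str = ",", encoding: str = "utf-8"):
    """Parse downloaded file bytes into a header row and a list of data rows."""
    text = raw.decode(encoding)
    # newline="" leaves line endings untouched so the csv module can handle them.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=separator)
    rows = [row for row in reader if row]  # drop blank lines
    if not rows:
        return [], []
    return rows[0], rows[1:]


if __name__ == "__main__":
    # Stand-in for the unprocessed content of a downloaded file.
    downloaded = b"event;count\nlogin;3\nlogout;2\n"
    header, rows = parse_csv_bytes(downloaded, separator=";")
    print(header)
    for row in rows:
        print(row)
```

Parsing the same bytes with the wrong separator yields a single column per row, which is the typical symptom of a mismatched column separator setting.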

You could now use further operators to work with this document, e.g., to determine the frequency of certain events. To write results back to Google Cloud Storage, you can use the Write Google Storage operator. It uses the same Connection Type as the Read Google Storage operator and has a similar interface. You can also read from a set of files in a Google Cloud Storage directory, using the Loop Google Storage operator. For this, you need to specify the connection entry and the folder that you want to process, as well as the steps of the processing loop with nested operators. For more details, read the help of the Loop Google Storage operator.
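Conceptually, looping over a folder means applying the same processing steps to every object whose name starts with the folder's path, since Google Cloud Storage represents folders as name prefixes. The following Python sketch illustrates that idea only; it is not the implementation of Loop Google Storage. The function `loop_over_folder`, its parameters and the inclusion of objects in nested subfolders are all assumptions made for this example.

```python
from typing import Callable, Dict, Iterable


def loop_over_folder(object_names: Iterable[str], folder: str,
                     process: Callable[[str], object]) -> Dict[str, object]:
    """Apply `process` to every object name located under `folder`."""
    prefix = folder.strip("/")
    prefix = prefix + "/" if prefix else ""  # an empty folder means the bucket root
    results = {}
    for name in sorted(object_names):
        # Skip objects outside the folder and the folder placeholder itself.
        if not name.startswith(prefix) or name == prefix:
            continue
        # Assumption: objects in nested subfolders are processed as well.
        results[name] = process(name)
    return results


if __name__ == "__main__":
    # Invented object names standing in for the contents of a bucket.
    names = ["logs/a.csv", "logs/b.csv", "logs/", "other/c.csv", "logs2/d.csv"]
    # str.upper stands in for the nested processing steps.
    for name, result in loop_over_folder(names, "logs", str.upper).items():
        print(name, result)
```

Appending the trailing slash before matching keeps a sibling folder such as `logs2/` from being picked up when `logs` is requested.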