You are viewing the RapidMiner Studio documentation for version 9.9.

Using the Amazon S3 Connector

This guide targets the new Connection Management introduced with RapidMiner Studio 9.3.

For the old legacy Amazon S3 connections, see the 9.2 documentation.

The Amazon S3 Connector allows you to access your Amazon S3 storage directly from RapidMiner Studio. Both read and write operations are supported. This document walks you through how to connect to your Amazon S3 account and how to read from Amazon S3.

Connect to your Amazon S3 account

To configure a new Amazon S3 connection, you need the connection details of your Amazon S3 account (at least the access key and the secret key).

  1. In RapidMiner Studio, right-click on the repository you want to store your Amazon S3 connection in and choose Create Connection.

    You can also click on Connections > Create Connection and select the repository from the dropdown of the following dialog.

  2. Enter a name for the new connection and set Connection Type to Amazon S3.

  3. Click on Create and switch to the Setup tab in the Edit connection dialog.

  4. Fill in the connection details of your Amazon S3 account:

    Note that Amazon S3 supports arbitrary folder "delimiters" (symbols to separate nested folders), e.g., "/" as used for URLs or "\" as used by Microsoft Windows. If the configuration specifies the wrong delimiter, your folder structure might not be displayed correctly in RapidMiner Studio. Don't worry though, you can always change the delimiter in the connection configuration later on.

    While not required, we recommend testing your new Amazon S3 connection by clicking the Test connection button. If the test fails, please check whether the details are correct.

  5. Click Save to save your connection and close the Edit connection dialog. You can now start using the Amazon S3 operators!
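
The delimiter note in step 4 can be illustrated with a small sketch. Amazon S3 stores objects under flat keys, and folders only appear when a listing groups those keys by a delimiter. The Python snippet below is a simplified model of that grouping, not RapidMiner or AWS code; the function name `list_folder` and the sample keys are hypothetical.

```python
def list_folder(keys, prefix="", delimiter="/"):
    """Return (subfolders, files) directly under prefix, grouping keys by delimiter."""
    subfolders, files = set(), []
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if not rest:
            continue  # the key is the folder placeholder itself
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # Everything up to the first delimiter is shown as a subfolder.
            subfolders.add(prefix + head + sep)
        else:
            files.append(key)
    return sorted(subfolders), sorted(files)


# Hypothetical object keys that were written with "/" as the delimiter.
KEYS = ["logs/2021/app.log", "logs/2021/db.log", "logs/readme.txt", "data.csv"]

right_folders, right_files = list_folder(KEYS, delimiter="/")
wrong_folders, wrong_files = list_folder(KEYS, delimiter="\\")
```

With the matching delimiter, the top level shows one folder (`logs/`) and one file (`data.csv`). With the wrong delimiter, no folders are found and all four keys appear as flat files, which is the kind of mismatch the note describes.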

Read from Amazon S3

The Read Amazon S3 operator reads data from your Amazon S3 account. The operator can be used to load arbitrary file formats, since it only downloads the files and does not process them. To process the files, you will need to use additional operators such as Read Document, Read Excel, or Read XML.

Let us start with reading a simple log file from Amazon S3.

  1. Drag a Read Amazon S3 operator into the Process Panel. For the connection entry parameter, select your Amazon S3 connection from the Connections folder of the repository you stored it in by clicking on the repository chooser button next to it.

    Alternatively, you can drag the Amazon S3 connection from the repository into the Process Panel and connect the resulting operator with the Read Amazon S3 operator.

  2. Click on the file chooser button to view the files in your Amazon S3 account. Select the file that you want to load and click Open.

    As mentioned above, the Read Amazon S3 operator does not process the contents of the specified file. In our example, we have chosen a log file (a plain text file). This file type can be processed via the Read Document operator, which is part of the Text Processing extension for RapidMiner Studio.

  3. If you have not already installed the Text Processing extension for RapidMiner Studio, please go to the marketplace and do so now. Then add a Read Document operator between the Read Amazon S3 operator and the result port.

  4. Run the process! In the Results perspective, you should see a single document containing the content of the log file.
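
The split between downloading and processing in the steps above can be sketched outside of RapidMiner Studio. The Python snippet below is an illustration only: `fetch_bytes` is a hypothetical stand-in for the download step (it reads from an in-memory dictionary instead of Amazon S3), and `parse_document` plays the role that Read Document plays in the process.

```python
# Hypothetical in-memory stand-in for an Amazon S3 bucket: key -> raw bytes.
FAKE_BUCKET = {
    "logs/app.log": b"INFO start\nERROR disk full\nINFO stop\n",
}


def fetch_bytes(key):
    """Download step: return the raw bytes without interpreting them."""
    return FAKE_BUCKET[key]


def parse_document(raw, encoding="utf-8"):
    """Processing step: decode the bytes into a list of text lines."""
    return raw.decode(encoding).splitlines()


raw = fetch_bytes("logs/app.log")
lines = parse_document(raw)
```

Keeping the two steps separate is what lets the same download step feed different parsers, depending on the file type.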

You could now use further text processing operators to work with this document, e.g., to determine how often certain events occur. To write results back to Amazon S3, you can use the Write Amazon S3 operator. It uses the same Connection Type as the Read Amazon S3 operator and has a similar interface. You can also read from a set of files in an Amazon S3 directory, using the Loop Amazon S3 operator. For this you need to specify the connection entry and the folder that you want to process, as well as the steps of the processing loop with nested operators. For more details, please read the help of the Loop Amazon S3 operator.
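
As a rough illustration of looping over a folder and counting events, the following Python sketch applies one processing step to every file under a folder and sums the results. It is not RapidMiner code: the sample files, the log format (event type as the first word of each line), and the function names `count_events` and `loop_folder` are assumptions made for the example.

```python
from collections import Counter

# Hypothetical folder contents: key -> decoded log text.
FILES = {
    "logs/app.log": "INFO start\nERROR disk full\nINFO stop\n",
    "logs/db.log": "WARN slow query\nERROR timeout\n",
    "other/notes.txt": "ERROR not a log\n",
}


def count_events(text):
    """Count the first word of every non-empty line (assumed to be the event type)."""
    return Counter(line.split()[0] for line in text.splitlines() if line.strip())


def loop_folder(files, folder):
    """Apply count_events to every file whose key starts with folder and sum the results."""
    total = Counter()
    for key, text in files.items():
        if key.startswith(folder):
            total += count_events(text)
    return total


totals = loop_folder(FILES, "logs/")
```

Here `loop_folder` corresponds loosely to the loop itself and `count_events` to the nested processing steps; only files under `logs/` contribute to `totals`.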