You are viewing the RapidMiner Studio documentation for version 9.9.

Using the Cassandra Connector

This guide targets the new Connection Management introduced with RapidMiner Studio 9.3.

For the old legacy Cassandra connections, see the 9.2 documentation.

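The Cassandra connector allows you to connect to clusters of the NoSQL database Cassandra directly from RapidMiner Studio. It supports all CRUD operations (Create, Read, Update, and Delete) as well as running more complex database commands. This document walks you through how to install the NoSQL Connector extension, connect to your Cassandra cluster, and read from and write to Cassandra.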

Install the NoSQL Connector extension

First, you need to install the NoSQL Connector extension.

Connect to Your Cassandra Cluster

Before you can use the Cassandra connector, you have to configure a new Cassandra connection. For this purpose, you will need the connection details of your database (host name, port, and keyspace name). If your Cassandra installation requires authentication, you will also need valid credentials.
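The connection details listed above can be summarized as a small record. The following Python sketch is purely illustrative: `CassandraConnectionDetails` is a hypothetical name that is not part of RapidMiner Studio, the host and keyspace values are placeholders, and the default port 9042 is assumed to be Cassandra's standard native (CQL) port.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CassandraConnectionDetails:
    """Hypothetical record of the details a Cassandra connection needs."""
    host: str
    keyspace: str
    port: int = 9042  # assumption: Cassandra's default native (CQL) port
    username: Optional[str] = None  # only needed if authentication is enabled
    password: Optional[str] = None

    @property
    def uses_authentication(self) -> bool:
        return self.username is not None

    def validate(self) -> None:
        """Raise ValueError if the details are obviously incomplete."""
        if not self.host:
            raise ValueError("host name is required")
        if not self.keyspace:
            raise ValueError("keyspace name is required")
        if not 1 <= self.port <= 65535:
            raise ValueError(f"invalid port: {self.port}")
        # Credentials come as a pair: either both are set or neither is.
        if (self.username is None) != (self.password is None):
            raise ValueError("username and password must be given together")


# Placeholder values; replace them with the details of your own cluster.
details = CassandraConnectionDetails(host="cassandra.example.com", keyspace="sales")
details.validate()
print(details.port, details.uses_authentication)  # 9042 False
```

Because Cassandra does not require authentication by default, the credentials are optional in this sketch.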

  1. In RapidMiner Studio, right-click on the repository you want to store your Cassandra connection in and choose Create Connection.

    You can also click on Connections > Create Connection and select the repository from the dropdown of the following dialog.

  2. Enter a name for the new connection and set Connection Type to Cassandra.

  3. Click on Create and switch to the Setup tab in the Edit connection dialog.

  4. Fill in the connection details of your Cassandra cluster (host name, port, keyspace name, and, if required, credentials).

    The preconfigured port is the default port used by Cassandra. Note that Cassandra does not require user authentication by default.

    While not required, we recommend testing your new Cassandra connection by clicking the Test connection button. If the test fails, please check whether the details are correct.

  5. Click Save to save your connection and close the Edit connection dialog.
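If the connection test fails, it can help to rule out basic network problems first. The following Python sketch is a rough, hypothetical analog of such a check, not what RapidMiner's Test connection button actually does: it only verifies that a TCP connection to the host and port can be opened. It does not speak the Cassandra protocol, so it cannot verify credentials or the keyspace.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This checks network reachability only; authentication and keyspace
    problems are not detected.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unknown hosts, and timeouts.
        return False


# Example with a placeholder host:
# port_is_reachable("cassandra.example.com", 9042)
```

If this check fails, the cause is more likely a wrong host name or port, or a firewall, than wrong credentials.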

You can now use the newly created connection with all of the Cassandra operators!

Read from Cassandra

The Read Cassandra operator allows you to read data from Cassandra tables.

  1. Open a new process in RapidMiner Studio, drag the Read Cassandra operator into the Process view, and connect its output port to the result port of the process. Select your Cassandra connection for the connection entry parameter from the Connections folder of the repository you stored it in by clicking the repository chooser button next to it.

    Alternatively, you can drag the Cassandra connection from the repository into the Process Panel and connect the resulting operator with the Read Cassandra operator.

  2. Define the query consistency level. For clusters with fewer than three nodes, it is recommended to set it to ONE. Otherwise, use the default value QUORUM.

  3. Define the query type (query, query file, or table). If you choose table, an additional parameter appears that lists the available tables.

  4. Run the process! In the Result Perspective, you should see the example set loaded from Cassandra. In our example, the example set contains RapidMiner Studio's Deals sample data set.
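The consistency-level recommendation in step 2 follows from how many replicas must answer a query. The following Python sketch uses a simplified model, assuming a single data center and a keyspace replication factor no larger than the number of nodes: QUORUM needs a majority of replicas (`replication_factor // 2 + 1`), while ONE needs a single replica. The function names are hypothetical and only illustrate the arithmetic.

```python
def replicas_required(consistency_level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for a read to succeed (simplified)."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    level = consistency_level.upper()
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # Majority of the replicas in a single data center.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {consistency_level}")


def tolerated_replica_failures(consistency_level: str, replication_factor: int) -> int:
    """How many replicas may be down while reads still succeed."""
    return replication_factor - replicas_required(consistency_level, replication_factor)


for rf in (1, 2, 3):
    print(
        f"RF={rf}: QUORUM needs {replicas_required('QUORUM', rf)}, "
        f"tolerates {tolerated_replica_failures('QUORUM', rf)} down; "
        f"ONE tolerates {tolerated_replica_failures('ONE', rf)} down"
    )
```

With a replication factor of 2, QUORUM requires both replicas, so a single unavailable node makes reads fail, whereas ONE still succeeds. From a replication factor of 3 on, QUORUM can tolerate one unavailable replica.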

Write to Cassandra

The Write Cassandra operator allows you to write data to Cassandra tables. Cassandra requires each data row to be identified by a unique ID (its primary key, which can consist of one or more columns). The following example illustrates how to write one of RapidMiner Studio's sample data sets to a new Cassandra table.
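Before writing, it can be worth checking that the columns you intend to use as the unique ID really identify every row. The following Python sketch is a hypothetical helper, not part of RapidMiner Studio; the rows and column names are made-up illustration values, not the contents of any sample data set.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def duplicate_keys(
    rows: Iterable[Dict[str, object]], key_columns: Sequence[str]
) -> List[Tuple[object, ...]]:
    """Return key values that occur more than once for the given key columns.

    An empty result means the (possibly composite) key identifies every row
    uniquely, which is what a Cassandra primary key requires.
    """
    seen = set()
    duplicates = []
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        if key in seen and key not in duplicates:
            duplicates.append(key)
        seen.add(key)
    return duplicates


# Illustrative rows only.
rows = [
    {"id": 1, "species": "setosa", "petal_length": 1.4},
    {"id": 2, "species": "setosa", "petal_length": 1.4},
    {"id": 3, "species": "virginica", "petal_length": 6.0},
]
print(duplicate_keys(rows, ["id"]))                       # []
print(duplicate_keys(rows, ["species", "petal_length"]))  # [('setosa', 1.4)]
```

Here `id` alone is a valid unique ID, while the composite key of `species` and `petal_length` is not, because two rows share the same values.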

  1. Open a new process in RapidMiner Studio.

  2. Drag the Iris sample data set and the Write Cassandra operator into the Process view and connect the operators as shown in the following screenshot. Select your Cassandra connection and enter a name for the new table.

    Note that you can also select an existing table. Cassandra then updates the table with the new data, provided the schema of the new data matches the schema of the selected Cassandra table. This means you have to be careful when writing to an existing table: any existing row with the same unique ID as a new row is simply overwritten.

  3. Connect the Write Cassandra operator to the results port and run the process!
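The overwrite behavior described in step 2 can be illustrated with a small in-memory model. The following Python sketch is a simplified, hypothetical model of Cassandra's write semantics, not the Write Cassandra operator itself: rows are stored by primary key, and a write with an existing key does not fail but replaces the stored values of the columns it contains. Write timestamps and null handling are ignored.

```python
from typing import Dict, Sequence, Tuple

Row = Dict[str, object]
Table = Dict[Tuple[object, ...], Row]


def upsert(table: Table, rows: Sequence[Row], key_columns: Sequence[str]) -> None:
    """Write rows into an in-memory table keyed by the primary key columns.

    A new key inserts a row; an existing key overwrites the values of the
    columns present in the new row (other stored columns are kept).
    """
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        table.setdefault(key, {}).update(row)


table: Table = {}
upsert(table, [{"id": 1, "label": "setosa"}, {"id": 2, "label": "versicolor"}], ["id"])
# Writing again with id 1 silently replaces the previously stored label.
upsert(table, [{"id": 1, "label": "virginica"}], ["id"])
print(len(table), table[(1,)]["label"])  # 2 virginica
```

No error is raised on the second write, which is why existing data with matching IDs can be lost without warning.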