You are viewing the RapidMiner Studio documentation for version 9.1.

Using the Cassandra Connector

The Cassandra connector allows you to connect to clusters of the NoSQL database Cassandra directly from RapidMiner Studio. It supports all CRUD operations (Create, Read, Update, and Delete), as well as running more sophisticated database commands. This document walks you through how to install the NoSQL Connector extension, connect to your Cassandra cluster, and read from and write to Cassandra.

Install the NoSQL Connector extension

First, you need to install the NoSQL Connector extension.

Connect to Your Cassandra Cluster

Before you can use the Cassandra connector, you have to configure a new Cassandra connection. For this purpose, you will need the connection details of your database (host name, port, and keyspace name). If your Cassandra installation requires authentication, you will also need valid credentials.

  1. Open the Manage Connections dialog in RapidMiner Studio by going to Tools > Manage Connections.

  2. Click Add Connection in the lower left corner.

  3. Enter a name for the new connection and select Cassandra Connection as the Connection Type.

  4. Fill in the connection details of your Cassandra cluster.

    The preconfigured port is the default port used by Cassandra. Note that Cassandra does not require user authentication by default. Optionally, you can test the new configuration by clicking the Test button.

  5. Click Save all changes to save your connection and close the Manage Connections window.
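If the Test button reports a failure, it can help to check from outside RapidMiner Studio whether the cluster accepts network connections at all. The following Python sketch does only that: it opens a plain TCP connection to the host and port. It assumes the cluster listens on 9042, Cassandra's default CQL native-transport port; the helper name `port_is_reachable` is hypothetical, and a successful result says nothing about credentials or the keyspace.

```python
import socket

# Assumption: Cassandra's default CQL native-transport port. Use the port
# you entered in the Manage Connections dialog if it differs.
DEFAULT_CASSANDRA_PORT = 9042


def port_is_reachable(host: str, port: int = DEFAULT_CASSANDRA_PORT,
                      timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper: it checks network reachability only, not
    authentication or whether the keyspace exists.
    """
    try:
        # create_connection resolves the host name and tries each address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False


# Example (the host name is a placeholder):
# port_is_reachable("cassandra.example.com")
```

A False result points to a wrong host name or port, or to a firewall blocking the connection; a True result while the Test button still fails suggests checking the keyspace name and credentials instead.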

You can now use the newly created connection with all of the Cassandra operators!

Read from Cassandra

The Read Cassandra operator lets you read data from Cassandra tables.

  1. Open a new process in RapidMiner Studio, drag the Read Cassandra operator into the Process view, and connect its output port to the result port of the process.

  2. Select your Cassandra connection from the connection drop-down menu in the Parameters view.

  3. Define the query consistency level. For clusters with fewer than three nodes, it is recommended to set it to ONE; otherwise, use the default value, QUORUM.

  4. Define the query type (query, query file, or table). If you choose table, an additional parameter appears that lists the available tables.

  5. Run the process! In the Result Perspective, you should see the example set loaded from Cassandra. In our example, the example set contains RapidMiner Studio's Deals sample data set.
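The consistency-level recommendation in step 3 follows from how QUORUM is computed. Within a single data center, a QUORUM request must be answered by floor(RF / 2) + 1 replicas, where RF is the keyspace's replication factor. The Python sketch below assumes, for illustration only, that the replication factor equals the number of nodes; under that assumption, two nodes means QUORUM needs both replicas, so one unavailable node makes the query fail, while ONE still tolerates one failure. The function names are hypothetical.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond at QUORUM (single data center): floor(RF/2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int, level: str) -> int:
    """How many replicas may be down while a request at `level` still succeeds."""
    required = {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }[level]
    return replication_factor - required


def recommended_level(node_count: int) -> str:
    """Rule of thumb from this page: ONE below three nodes, otherwise QUORUM."""
    return "ONE" if node_count < 3 else "QUORUM"


# Assumption for illustration: replication factor == number of nodes.
for nodes in (1, 2, 3, 5):
    level = recommended_level(nodes)
    print(f"{nodes} node(s): QUORUM needs {quorum(nodes)}, "
          f"{level} tolerates {tolerated_failures(nodes, level)} failure(s)")
```

From three nodes on, QUORUM also tolerates at least one unavailable replica while still requiring a majority of replicas to respond.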

Write to Cassandra

The Write Cassandra operator lets you write data to Cassandra tables. Cassandra requires each row to be identified by a unique ID (its primary key), which can consist of one or more columns. The following example illustrates how to write one of RapidMiner Studio's sample data sets to a new Cassandra table.
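Before writing, it is worth checking that the columns you intend to use as the unique ID really identify each row: rows that share the same ID values cannot coexist in a Cassandra table, so only one of them would survive the write. The following Python sketch performs this check on rows held as dictionaries. The function `duplicate_keys` and the column names and values in the example are hypothetical and do not correspond to anything in RapidMiner Studio.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Sequence, Tuple


def duplicate_keys(rows: Iterable[Mapping[str, object]],
                   key_columns: Sequence[str]) -> List[Tuple[object, ...]]:
    """Return every ID value (a tuple over key_columns) that occurs more than once."""
    # One or more columns together form the ID, so compare them as a tuple.
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]


rows = [
    {"id": 1, "species": "setosa"},
    {"id": 2, "species": "setosa"},
    {"id": 2, "species": "virginica"},
]
print(duplicate_keys(rows, ["id"]))             # [(2,)]
print(duplicate_keys(rows, ["id", "species"]))  # []
```

The second call shows why a composite ID can help: "id" alone is ambiguous for these rows, but "id" together with "species" identifies each row uniquely.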

  1. Open a new process in RapidMiner Studio.

  2. Drag the Iris sample data set and the Write Cassandra operator into the Process view and connect them. Select your Cassandra connection and enter a name for the new table.

    Note that you can also select an existing table.

    Cassandra then updates the table with the new data, provided the schema of the new data matches the schema of the selected table. For this reason, be careful when writing to an existing table: any stored row with the same unique ID as a new row is simply overwritten.

  3. Connect the Write Cassandra operator to the results port and run the process!
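The overwrite behavior described in step 2 can be modeled in a few lines. Cassandra treats a write to an existing primary key as an update (an "upsert") and raises no error, so earlier values are replaced silently. The Python sketch below mimics this with an in-memory dictionary keyed by the ID columns. It illustrates the semantics under that simplification and is not how the Write Cassandra operator is implemented; the names `upsert`, `id`, and `label` are hypothetical.

```python
from typing import Dict, Iterable, Mapping, Sequence, Tuple

Key = Tuple[object, ...]
Table = Dict[Key, Dict[str, object]]


def upsert(table: Table, rows: Iterable[Mapping[str, object]],
           key_columns: Sequence[str]) -> Tuple[int, int]:
    """Write rows into an in-memory table; return (inserted, overwritten) counts."""
    inserted = overwritten = 0
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        if key in table:
            # No error is raised: the written columns replace the stored values.
            table[key].update(row)
            overwritten += 1
        else:
            table[key] = dict(row)
            inserted += 1
    return inserted, overwritten


table: Table = {}
print(upsert(table, [{"id": 1, "label": "a"}, {"id": 2, "label": "b"}], ["id"]))  # (2, 0)
print(upsert(table, [{"id": 2, "label": "c"}, {"id": 3, "label": "d"}], ["id"]))  # (1, 1)
print(table[(2,)]["label"])  # c
```

The second call reports one overwritten row: the value "b" stored under ID 2 is gone, which is exactly the situation step 2 cautions against.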