
You are viewing the RapidMiner Studio documentation for version 9.9.

Using the MongoDB Connector

This guide targets the new Connection Management introduced with RapidMiner Studio 9.3.

For the old, legacy MongoDB connections, see the 9.2 documentation.

The MongoDB connector allows you to connect to instances of the NoSQL database MongoDB directly from RapidMiner Studio. It supports all CRUD operations (Create, Read, Update, and Delete), as well as running more sophisticated database commands. This document walks you through how to install the NoSQL Connector extension, connect to your MongoDB instance, read from MongoDB, and write to MongoDB.

Install the NoSQL Connector extension

First, you need to install the NoSQL Connector extension.

Connect to Your MongoDB Instance

Before you can use the MongoDB connector, you have to configure a new MongoDB connection. For this purpose, you will need the connection details of your database (host name, port, and database name). If your MongoDB installation requires authentication, you will also need valid credentials.

  1. In RapidMiner Studio, right-click on the repository you want to store your MongoDB connection in and choose Create Connection.

    You can also click Connections > Create Connection and select the repository from the drop-down menu in the dialog that follows.

  2. Enter a name for the new connection and set Connection Type to MongoDB.

  3. Click Create and switch to the Setup tab in the Edit connection dialog.

  4. Fill in the connection details of your MongoDB server.

    The preconfigured port is MongoDB's default port, 27017. Note that MongoDB does not require user authentication by default.

    While not required, we recommend testing your new MongoDB connection by clicking the Test connection button. If the test fails, please check whether the details are correct.

  5. Click Save to save your connection and close the Edit connection dialog.

You can now use the newly created connection with all of the MongoDB operators!
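Outside RapidMiner Studio, the same connection details are commonly combined into a single MongoDB connection string. The following Python sketch is not part of RapidMiner Studio. It shows how host, port, database and optional credentials fit together. The function name build_mongodb_uri and all host, database and user values are hypothetical. Credentials are percent-escaped, because characters such as @ or : would otherwise break the string. A string built this way could, for example, be passed to pymongo's MongoClient.

```python
from urllib.parse import quote_plus


def build_mongodb_uri(host, port=27017, database=None, username=None, password=None):
    """Assemble a mongodb:// connection string from the same details the
    Setup tab asks for. Credentials are optional because MongoDB does not
    require authentication by default. (Hypothetical helper.)"""
    credentials = ""
    if username is not None:
        # User names and passwords must be percent-escaped inside a URI;
        # otherwise characters such as '@' or ':' break the parsing.
        credentials = quote_plus(username)
        if password is not None:
            credentials += ":" + quote_plus(password)
        credentials += "@"
    uri = f"mongodb://{credentials}{host}:{port}/"
    if database:
        uri += database
    return uri


# Hypothetical examples: a local server without authentication,
# and a remote server with a password containing special characters.
print(build_mongodb_uri("localhost", database="rapidminer"))
print(build_mongodb_uri("db.example.com", 27017, "sales", "analyst", "p@ss:word"))
```

Escaping the credentials separately, rather than the whole string, keeps the structural characters of the URI (://, @, :, /) intact.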

Read from MongoDB

The Read MongoDB operator lets you read data from MongoDB collections. MongoDB represents data in the JSON format and does not enforce a database schema. The JSON to Data operator converts this format to RapidMiner Studio's native format. For the opposite direction, use the Data to JSON operator.
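The following minimal Python sketch illustrates what converting schemaless JSON documents into a table involves. It is not the JSON to Data operator's implementation. It assumes three rules:

- nested objects become dotted column names;
- arrays are kept as JSON strings;
- a field missing from a document becomes an empty (None) value.

The operator's actual naming and typing rules may differ. The helper names flatten and documents_to_table are hypothetical. So are the sample documents, apart from the Future Customer field.

```python
import json


def flatten(document, prefix=""):
    """Flatten nested JSON objects into dotted column names,
    e.g. {"a": {"b": 1}} becomes {"a.b": 1}. Arrays are kept as JSON strings."""
    row = {}
    for key, value in document.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=name + "."))
        elif isinstance(value, list):
            row[name] = json.dumps(value)
        else:
            row[name] = value
    return row


def documents_to_table(json_documents):
    """Turn a list of JSON strings into (columns, rows). Documents may have
    different fields; a field missing from a document becomes None."""
    flat = [flatten(json.loads(text)) for text in json_documents]
    # Union of all field names, in order of first appearance.
    columns = list(dict.fromkeys(name for row in flat for name in row))
    rows = [[row.get(name) for name in columns] for row in flat]
    return columns, rows


# Hypothetical documents with different structures (no common schema).
docs = [
    '{"Name": "A", "Future Customer": "yes", "Address": {"City": "Berlin"}}',
    '{"Name": "B", "Future Customer": "no"}',
]
columns, rows = documents_to_table(docs)
print(columns)  # ['Name', 'Future Customer', 'Address.City']
print(rows)     # [['A', 'yes', 'Berlin'], ['B', 'no', None]]
```

Taking the union of all field names is what lets documents with different structures share one table.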

Let us start with reading the raw JSON data without further conversions.

  1. Open a new process in RapidMiner Studio. Drag the Read MongoDB operator into the Process view and connect its output port to the result port of the process. Select your MongoDB connection for the connection entry parameter: click the repository chooser button next to it and browse to the Connections folder of the repository you stored the connection in.

    Alternatively, you can drag the MongoDB connection from the repository into the Process panel and connect it to the Read MongoDB operator.

  2. Select your MongoDB connection from the mongodb instance drop-down menu in the operator parameters.

  3. Select a MongoDB collection from the collection drop-down menu. It should be populated with the collections available in the configured MongoDB database.

  4. Run the process! In the Result Perspective, you should see a single collection of JSON documents (provided that the selected collection is not empty). In our example, the collection contains RapidMiner Studio's Deals sample data set.

Convert into a single example set

Let us now extend the process to convert this collection of JSON documents into a single example set, i.e., into a format that is compatible with RapidMiner Studio's core operators.

  1. Navigate to the Design perspective and add a JSON to Data operator between the Read MongoDB operator and the result port.

  2. Run the process again! In the Result Perspective, you should see a single example set containing the same data as in the previous run.

    You can now work with this example set as you are used to from other data sources. However, you might wonder how to query specific subsets of a MongoDB collection. So far, we have always queried the entire collection.

    An introduction to MongoDB's query syntax is beyond the scope of this guide. Please refer to the official MongoDB documentation for an in-depth introduction to MongoDB. However, to give you an idea, let us modify the process one last time.

    In our example, the JSON documents in our MongoDB collection contain a field named Future Customer. We can specify a simple query criterion that requires the value of this field to be yes (adapting the following example to your own data should be straightforward).

  3. Navigate to the Design Perspective, select the Read MongoDB operator, and edit the operator parameter named criteria.

  4. Enter the following short JSON document as the query criterion: {"Future Customer": "yes"}

  5. Run the process again. The result set should only contain examples where the value of the attribute Future Customer is yes.
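A query criterion such as {"Future Customer": "yes"} is an equality match: a document is returned only if each listed field has the given value. The following Python sketch evaluates such a criterion against documents held in memory, to make those semantics concrete. It is a simplified, hypothetical helper and handles only plain field equality. It does not support query operators such as $gt, dotted paths or array matching. The sample documents are made up. With the pymongo driver, the same dictionary could be passed directly to a collection's find method.

```python
def matches(document, criteria):
    """Return True if every field in `criteria` has an equal value in `document`.
    Multiple fields are combined with an implicit AND, as in MongoDB.
    A missing field compares equal to None, mirroring how MongoDB treats
    {field: null}. Query operators ($gt, $in, ...) are NOT supported."""
    return all(document.get(field) == value for field, value in criteria.items())


criteria = {"Future Customer": "yes"}

# Hypothetical documents; the third one lacks the field entirely.
documents = [
    {"Name": "A", "Future Customer": "yes"},
    {"Name": "B", "Future Customer": "no"},
    {"Name": "C"},
]
selected = [doc for doc in documents if matches(doc, criteria)]
print([doc["Name"] for doc in selected])  # ['A']
```

An empty criteria document matches every document, which corresponds to reading the entire collection as in the earlier runs.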

Write to MongoDB

Writing an example set to a MongoDB collection is easy: load the example set, convert it to a collection of JSON documents, and write it to MongoDB. The following example illustrates how to write one of RapidMiner Studio's sample data sets to a new MongoDB collection.
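Conceptually, the conversion step turns each row of the example set into one JSON document whose fields are the attribute names. The Python sketch below shows that idea. It is not the Data to JSON operator's implementation.

- The column names are hypothetical.
- The values are those of the first record of the Iris data set.
- The optional skip_missing flag is also hypothetical. It illustrates one choice a schemaless store allows: omitting missing values instead of storing null.

```python
import json


def rows_to_json_documents(columns, rows, skip_missing=False):
    """Convert a table (column names plus rows) into one JSON document per row.
    If skip_missing is True, fields whose value is None are left out."""
    documents = []
    for row in rows:
        fields = {
            name: value
            for name, value in zip(columns, row)
            if not (skip_missing and value is None)
        }
        documents.append(json.dumps(fields))
    return documents


# Hypothetical column names; the values are the first Iris record.
columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "label"]
rows = [[5.1, 3.5, 1.4, 0.2, "Iris-setosa"]]
for document in rows_to_json_documents(columns, rows):
    print(document)
```

When writing with pymongo instead, the JSON strings would be parsed back into dictionaries (json.loads) before being passed to a collection's insert_many method.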

  1. Open a new process in RapidMiner Studio.

  2. Drag the Iris sample data set, the Data to JSON operator, and the Write MongoDB operator into the Process view. Connect them in that order, and connect the output of Write MongoDB to the result port. Select your MongoDB connection and enter a name for the new collection.

    Note that you can also select an existing collection. MongoDB then adds the new JSON documents to this collection, regardless of the structure of the documents (remember that MongoDB collections have no static schema).

  3. Run the process! In the Result Perspective, you should see the collection of JSON documents that have been added to the specified MongoDB collection.

    Note that MongoDB automatically assigns a unique ID (stored in the _id field) to each newly added document. As a consequence, running this process multiple times stores the same data again under new IDs, resulting in duplicate entries.
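The following Python sketch simulates a collection as an in-memory dictionary keyed by _id. It shows why repeated runs duplicate data, and how keying documents on an identifying field avoids it. It assumes that each row carries a unique id attribute (hypothetical here). uuid4 merely stands in for MongoDB's ObjectId. The upsert pattern is shown for illustration only; this guide does not state whether the Write MongoDB operator offers it. With pymongo, the equivalent would be calling replace_one with a filter on _id and upsert=True.

```python
import uuid


def insert(collection, document):
    """Insert like MongoDB without an explicit _id: a fresh unique ID is
    generated, so the same content inserted twice yields two documents."""
    stored = dict(document)
    stored.setdefault("_id", uuid.uuid4().hex)  # stand-in for MongoDB's ObjectId
    collection[stored["_id"]] = stored


def upsert(collection, document, key):
    """Replace-or-insert keyed on a field that identifies the row, so
    re-running the write leaves exactly one document per key."""
    stored = dict(document)
    stored["_id"] = document[key]
    collection[stored["_id"]] = stored


# Hypothetical rows with an identifying "id" attribute.
rows = [{"id": 1, "label": "Iris-setosa"}, {"id": 2, "label": "Iris-versicolor"}]

appended = {}
for _ in range(2):  # run the "process" twice
    for row in rows:
        insert(appended, row)
print(len(appended))  # 4: every run added new copies

upserted = {}
for _ in range(2):
    for row in rows:
        upsert(upserted, row, key="id")
print(len(upserted))  # 2: re-running replaced the existing documents
```

Deriving _id from the data makes the write idempotent. The trade-off is that the chosen key must really be unique, otherwise distinct rows overwrite each other.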