You are viewing the RapidMiner Studio documentation for version 9.2.

Using the MongoDB Connector

The MongoDB connector allows you to connect to instances of the NoSQL database MongoDB directly from RapidMiner Studio. It supports all CRUD operations (Create, Read, Update, and Delete), as well as running more sophisticated database commands. This document walks you through how to install the NoSQL Connector extension, connect to your MongoDB instance, read from MongoDB, and write to MongoDB.

Install the NoSQL Connector extension

First, you need to install the NoSQL Connector extension.

Connect to Your MongoDB Instance

Before you can use the MongoDB connector, you have to configure a new MongoDB connection. For this purpose, you will need the connection details of your database (host name, port, and database name). If your MongoDB installation requires authentication, you will also need valid credentials.

  1. Open the Manage Connections dialog in RapidMiner Studio by going to Tools > Manage Connections.

  2. Click Add Connection in the lower left.

  3. Enter a name for the new connection and select MongoDB Connection as the Connection Type.

  4. Fill in the connection details of your MongoDB instance.

    The preconfigured port is the default port used by MongoDB. Note that MongoDB does not require user authentication by default. Optionally, you can test the new configuration by clicking the Test button.

  5. Click Save all changes to save your connection and close the Manage Connections window.

You can now use the newly created connection with all of the MongoDB operators!
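
Outside of RapidMiner Studio, the same connection details are commonly combined into a MongoDB connection string. The following Python sketch shows how the host name, port, database name, and optional credentials fit together. The helper name `build_mongo_uri` and all sample values are hypothetical, and the sketch makes no claim about how the connector stores a connection internally.

```python
from urllib.parse import quote_plus

DEFAULT_MONGODB_PORT = 27017  # the default port used by MongoDB


def build_mongo_uri(host, database, port=DEFAULT_MONGODB_PORT,
                    username=None, password=None):
    """Assemble a mongodb:// connection string (illustrative helper)."""
    credentials = ""
    if username is not None:
        # Credentials are optional. Percent-encode them so that characters
        # such as '@' or ':' do not break the URI.
        credentials = quote_plus(username)
        if password is not None:
            credentials += ":" + quote_plus(password)
        credentials += "@"
    return f"mongodb://{credentials}{host}:{port}/{database}"


if __name__ == "__main__":
    # Hypothetical hosts, database names, and credentials.
    print(build_mongo_uri("localhost", "test"))
    print(build_mongo_uri("db.example.com", "sales",
                          username="analyst", password="p@ss"))
```

Leaving out the user name corresponds to the unauthenticated default setup mentioned above.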

Read from MongoDB

The Read MongoDB operator lets you read data from MongoDB collections. MongoDB uses the JSON format to represent data and does not use database schemata. This data format can be converted to RapidMiner Studio's native format via the JSON to Data operator. For the opposite direction, use the Data to JSON operator.

Let us start with reading the raw JSON data without further conversions.

  1. Open a new process in RapidMiner Studio. Drag the Read MongoDB operator into the Process view, and connect its output port to the result port of the process.

  2. Select your MongoDB connection from the mongodb instance drop-down menu in the operator parameters.

  3. Select a MongoDB collection from the collection drop-down menu. It should be populated with the collections available in the configured MongoDB database.

  4. Run the process. In the Result Perspective, you should see a single collection of JSON documents (provided that the selected collection is not empty). In our example, the collection contains RapidMiner Studio's Deals sample data set.

Convert into a single example set

Let us now extend the process to convert this collection of JSON documents into a single example set, i.e., into a format that is compatible with RapidMiner Studio's core operators.
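
Conceptually, this conversion turns each JSON document into one row and each field into one column. The Python sketch below illustrates that idea for flat documents only. The function name `documents_to_table` and the sample values are made up for illustration, and the actual JSON to Data operator may treat nested documents, arrays, and missing fields differently.

```python
import json


def documents_to_table(documents):
    """Turn a list of flat JSON documents into (column names, rows)."""
    columns = []
    for doc in documents:
        for key in doc:
            # Collect every field name once, in order of first appearance.
            if key not in columns:
                columns.append(key)
    # A document that lacks a field gets None in that column.
    rows = [[doc.get(column) for column in columns] for doc in documents]
    return columns, rows


# Two made-up documents; the second one has no "Future Customer" field.
raw = """[
  {"Age": 64, "Gender": "male", "Future Customer": "yes"},
  {"Age": 35, "Gender": "female"}
]"""

columns, rows = documents_to_table(json.loads(raw))
print(columns)
for row in rows:
    print(row)
```

Because MongoDB collections have no fixed schema, the column set has to be derived from the documents themselves, which is why the sketch takes the union of all field names.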

  1. Navigate to the Design perspective and add a JSON to Data operator in between the Read MongoDB operator and the result port.

  2. Run the process again. In the Result Perspective, you should see a single example set containing the same data as in the previous run.

    You can now work with this example set as you are used to from other data sources. However, you might wonder how to query specific subsets of a MongoDB collection. So far, we have always queried the entire collection.

    An introduction to MongoDB's query syntax is beyond the scope of this guide. Please refer to the official MongoDB documentation for an in-depth introduction to MongoDB. However, to give you an idea, let us modify the process one last time.

    In our example, the JSON documents in our MongoDB collection contain a field named Future Customer. We can specify a simple query criterion that requires the value of this field to be yes (changing the following example to match your own data should be straightforward).

  3. Navigate to the Design Perspective, select the Read MongoDB operator, and edit the operator parameter named criteria.

  4. Enter the following short JSON document (the query criterion):

    {"Future Customer": "yes"}

  5. Run the process again. The result set should only contain examples where the value of the attribute Future Customer is yes.
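
To make the effect of such a criterion concrete, the following Python sketch applies an equality criterion on the Future Customer field to a few made-up documents in memory. It only covers equality on top-level scalar fields. MongoDB's query language offers far more (comparison operators, nested fields, and so on), none of which this hypothetical `matches` helper attempts to reproduce.

```python
import json


def matches(document, criteria):
    """Return True if every field in the criteria equals the document's value."""
    return all(
        key in document and document[key] == value
        for key, value in criteria.items()
    )


# A criterion requiring the field "Future Customer" to be "yes".
criteria = json.loads('{"Future Customer": "yes"}')

# Made-up sample documents for illustration.
documents = [
    {"Age": 64, "Gender": "male", "Future Customer": "yes"},
    {"Age": 35, "Gender": "female", "Future Customer": "no"},
    {"Age": 25, "Gender": "female", "Future Customer": "yes"},
]

selected = [doc for doc in documents if matches(doc, criteria)]
print(len(selected), "of", len(documents), "documents match")
```

An empty criterion matches every document in this sketch, which mirrors the earlier runs where the entire collection was returned.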

Write to MongoDB

Writing an example set to a MongoDB collection is easy: load the example set, convert it to a collection of JSON documents, and write it to MongoDB. The following example illustrates how to write one of RapidMiner Studio's sample data sets to a new MongoDB collection.

  1. Open a new process in RapidMiner Studio.

  2. Drag the Iris sample data set, the Data to JSON operator, and the Write MongoDB operator into the Process view and connect the operators in that order. Select your MongoDB connection and enter a name for the new collection.

    Note that you can also select an existing collection. MongoDB then adds the new JSON documents to this collection, regardless of the structure of the documents (remember that MongoDB collections have no static schema).

  3. Run the process. In the Result Perspective, you should see the collection of JSON documents that have been added to the specified MongoDB collection.

    Note that MongoDB automatically assigns unique IDs to newly added documents. As a consequence, running this process multiple times will result in duplicate entries.
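
The duplicate behavior can be illustrated with a small in-memory stand-in for a collection. In the Python sketch below, the helper names, the attribute names, and the use of random UUIDs as identifiers are all assumptions made for illustration. MongoDB generates its own identifier values, and this is not how the Write MongoDB operator is implemented.

```python
import uuid


def rows_to_documents(columns, rows):
    """Convert table rows into one dictionary (JSON document) per row."""
    return [dict(zip(columns, row)) for row in rows]


def insert_many(collection, documents):
    """Append copies of the documents, giving each one a fresh unique _id."""
    for doc in documents:
        stored = dict(doc)
        # Stand-in for the automatically assigned unique ID.
        stored.setdefault("_id", uuid.uuid4().hex)
        collection.append(stored)


collection = []  # in-memory stand-in for a MongoDB collection

# Illustrative rows; attribute names and values are made up.
columns = ["a1", "a2", "label"]
rows = [[5.1, 3.5, "Iris-setosa"], [7.0, 3.2, "Iris-versicolor"]]
documents = rows_to_documents(columns, rows)

insert_many(collection, documents)  # first run
insert_many(collection, documents)  # second run adds the same data again

print(len(collection), "documents stored")
```

Since every inserted document receives a new ID, the second run cannot be recognized as a repeat of the first, so the data is stored twice.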