You are viewing the RapidMiner Studio documentation for version 9.7 -Check here for latest version
Read Database(JDBC Connectors)
Synopsis
This operator reads an ExampleSet from a SQL database.Description
The Read Database operator is used for reading an ExampleSet from the specified SQL database. You need to have at least basic understanding of databases, database connections and queries in order to use this operator properly. Go through the parameters and Example Process to understand the flow of this operator.
When this operator is executed, the table delivered by the query will be copied into the memory of your computer. This will give all subsequent operators a fast access on the data. Even learning schemes like the Support Vector Machine with their high number of random accesses will run fast.
The javaResultSetMetaDatainterface does not provide information about the possible values of nominal attributes. The internal indices the nominal values are mapped to, will depend on the ordering they appear in the table. This may cause problems only when processes are split up into a training process and a testing process. This is not a problem for learning schemes which are capable of handling nominal attributes. If a learning scheme like the SVM is used with nominal data, RapidMiner pretends that nominal attributes are numerical and uses indices for the nominal values as their numerical value. The SVM may perform well if there are only two possible values. If a test set is read in another process, the nominal values may be assigned different indices, and hence the SVM trained is useless. This is not a problem for the label attributes, since the classes can be specified using theclasses参数,因此所有的学习计划to use with nominal data are safe to use. You might avoid this problem if you first combine both ExampleSets using the Append operator and then split it again using two Filter Examples operators.
Differentiation
Execute SQL
The Read Database operator is used for loading data from a database into RapidMiner. The Execute SQL operator cannot be used for loading data from databases. It can be used for executing SQL statements like CREATE or ADD etc on the database.Input
- connection(Connection)
This input port expects a Connection object if any. See the parameter connection entry for more information.
Output
- output(IOObject)
This port delivers the result of the query on database in tabular form along with the meta data. This output is similar to the output of the Retrieve operator.
- connection(Connection)
This output port delivers the Connection object from the input port. If the input port is not connected the port delivers nothing.
Parameters
- define_connectionThis parameter indicates how the database connection should be specified. It gives you four options: repository, predefined, url and jndi. The parameter is not visible if theconnectioninput port is connected.Range: selection
- connection_entryThis parameter is only available when thedefine connectionparameter is set torepository.This parameter is used to specify a repository location that represents a connection entry. The connection can also be provided using theconnectioninput port.Range: string
- connectionThis parameter is only available when thedefine connectionparameter is set topredefined.This parameter is used to connect to the database using a predefined connection. You can have many predefined connections. You can choose one of them using the drop down box. You can add a new connection or modify previous connections using the button next to the drop down box. You may also accomplish this by clicking on theManage Database Connections...from theToolsmenu in the main window. A new window appears. This window asks for several details e.g.Host, Port, Database system, schema, usernameandpassword.TheTestbutton in this new window will allow you to check whether the connection can be made. Save the connection once the test is successful. After saving a new connection, it can be chosen from the drop down box of theconnectionparameter. You need to have basic understanding of databases for configuring a connection.Range: string
- database_systemThis parameter is only available when thedefine connectionparameter is set tourl.This parameter is used to select the database system in use. It can have one of the following values: MySQL, PostgreSQL, Sybase, HSQLDB, ODBC Bridge (e.g. Access), Microsoft SQL Server (JTDS), Ingres, Oracle.Range: selection
- database_urlThis parameter is only available when thedefine connectionparameter is set tourl.This parameter is used to define the URL connection string for the database, e.g. 'jdbc:mysql://foo.bar:portnr/database'.Range: string
- usernameThis parameter is only available when thedefine connectionparameter is set tourl.This parameter is used to specify the username of the database.Range: string
- passwordThis parameter is only available when thedefine connectionparameter is set tourl.This parameter is used to specify the password of the database.Range: string
- jndi_nameThis parameter is only available when thedefine connectionparameter is set tojndi.This parameter is used to give the JNDI a name for a data source.Range: string
- define_queryQuery is a statement that is used to select required data from the database. This parameter specifies whether the database query should be defined directly, through a file or implicitly by a given table name. The SQL query can be auto generated giving a table name, passed to RapidMiner via a parameter or, in case of long SQL statements, in a separate file. The desired behavior can be chosen using thedefine queryparameter. Please note that column names are often case sensitive and might need quoting.Range: selection
- queryThis parameter is only available when thedefine queryparameter is set toquery.This parameter is used to define the SQL query to select desired data from the specified database.Range: string
- query_fileThis parameter is only available when thedefine queryparameter is set toquery file.这个参数是用来选择一个文件,反对tains the SQL query to select desired data from the specified database. Long queries are usually stored in files. Storing queries in files can also enhance reusability.Range: filename
- table_nameThis parameter is only available when thedefine queryparameter is set totable name.This parameter is used to select the required table from the specified database.Range: string
- prepare_statementIf checked, the statement is prepared, and '?' can be filled in using theparametersparameter.Range: boolean
- parametersParameters to insert into '?' placeholders when statement is prepared.Range: enumeration
Tutorial Processes
Reading ExampleSet from a mySQL database
The Read Database operator is used to read a mySQL database. The define connection parameter is set to predefined. The define connection parameter was configured using the button next to the drop down box. The name of the connection was set to 'mySQLconn'. The following values were set in the connection parameter's wizard. The Database system was set to 'mySQL'. The Host was set to 'localhost'. The Port was set to '3306'. The Database scheme was set to 'golf'; this is the name of the database. The User was set to 'root'. No password was provided. You will need a password if your database is password protected. Set all the values and test the connection. Make sure that the connection works.
The define query parameter was set to 'table name'. The table name parameter was set to 'golf_table' which is the name of the required table in the 'golf' database. Run the process, you will see the entire 'golf_table' in the Results Workspace. The define query parameter is set to 'table name' if you want to read an entire table from the database. You can also read a selected portion of the database by using queries. Set the define query parameter to 'query' and specify a query in the query parameter. One sample query is already defined in this example. This query reads only those examples from 'golf_table' where the 'Outlook' attribute has the value 'sunny'.