Skip to main content

Read Excel

Synopsis

This operator reads an ExampleSet from the specified Excel file.

Description

This operator can be used to load data from Microsoft Excel spreadsheets. This operator is able to read data from Excel 95, 97, 2000, XP, and 2003. The user has to define which of the spreadsheets in the workbook should be used as data table. The table must have a format such that each row is an example and each column represents an attribute. Please note that the first row of the Excel sheet might be used for attribute names which can be indicated by a parameter. The data table can be placed anywhere on the sheet and can contain arbitrary formatting instructions, empty rows and empty columns. Missing data values in Excel should be indicated by empty cells or by cells containing only "?".

For complete understanding of this operator read the parameters section. The easiest and shortest way to import an Excel file is to use theimport configuration wizardfrom the Parameters panel. The best way, which may require some extra effort, is to first set all the parameters in the Parameters panel and then use thewizard。Please make sure that the Excel file is read correctly before building a process using it.

Input

file

An Excel file is expected as a file object which can be created with other operators with file output ports like the Read File operator.

Output

output

This port delivers the Excel file in tabular form along with the meta data. This output is similar to the output of the Retrieve operator.

Parameters

Import configuration wizard

This option allows you to configure this operator by means of a wizard. This user-friendly wizard makes the use of this operator easy.

Excel file

The path of the Excel file is specified here. It can be selected using thechoose a filebutton.

Sheet selection

This option allows you to change the sheet selection between sheet number and sheet name.

Sheet number

The number of the sheet which you want to import should be specified here.

Sheet name

The name of the sheet which you want to import should be specified here.

Imported cell range

This is a mandatory parameter. The range of cells to be imported from the specified sheet is given here. It is specified in 'xm:yn' format where 'x' is the column of the first cell of range, 'm' is the row of the first cell of range, 'y' is the column of the last cell of range, 'n' is the row of the last cell of range. 'A1:E10' will select all cells of the first five columns from row 1 to 10.

First row as names

If this option is set to true, it is assumed that the first line of the Excel file has the names of attributes. Then the attributes are automatically named and the first line of Excel file is not treated as a data line.

Annotations

If thefirst row as namesparameter is not set to true, annotations can be added using the 'Edit List' button of this parameter which opens a new menu. This menu allows you to select any row and assign an annotation to it.Name,CommentandUnit注释可以被指定。如果行0分配Nameannotation, it is equivalent to setting thefirst row as namesparameter to true. If you want to ignore any rows you can annotate them asComment

Date format

The date and time format is specified here. Many predefined options exist; users can also specify a new format. If text in an Excel file column matches this date format, that column is automatically converted todatetype. Some corrections are automatically made in thedatetype values. For example a value '32-March' will automatically be converted to '1-April'. Columns containing values which can't be interpreted as numbers will be interpreted as nominal, as long as they don't match the date and time pattern of thedate formatparameter. If they do, this column of the Excel file will be automatically parsed asdateand the according attribute will be ofdatetype.

Time zone

This is an expert parameter. A long list of time zones is provided; users can select any of them.

Locale

This is an expert parameter. A long list of locales is provided; users can select any of them.

Read all values as polynominal

This option allows you to disable the type handling for this operator. Every column will be read as a polynominal attribute. To parse an excel date afterwards use 'date_parse(86400000 * (parse(date_attribute) - 25569))' (- 24107 for Mac Excel 2007) in the Generate Attributes operator.

Data set meta data information

This option is an important one. It allows you to adjust the meta data of the ExampleSet created from the specified Excel file.Column index,name,typeandrolecan be specified here. The Read Excel operator tries to determine an appropriate type of the attributes by reading the first few lines and checking the occurring values. If all values are integers, the attribute will become aninteger。同样如果所有值al numbers, the attribute will become of typereal。Columns containing values which can't be interpreted as numbers will be interpreted as nominal, as long as they don't match the date and time pattern of thedate formatparameter. If they do, this column of the Excel file will be automatically parsed asdateand the according attribute will be of typedate。Automatically determined types can be overridden using this parameter.

Read not matching values as missings

If this value is set to true, values that do not match with the expected value type are considered as missing values and are replaced by '?'. For example if 'abc' is written in anintegercolumn, it will be treated as a missing value. A question mark (?) or an empty cell in the Excel file is also read as a missing value.

Data management

This is an expert parameter. A long list is provided; users can select any option from this list.