Read Excel
Synopsis
This operator reads an ExampleSet from the specified Excel file.
Description
This operator can be used to load data from Microsoft Excel spreadsheets. This operator is able to read data from Excel 95, 97, 2000, XP, and 2003. The user has to define which of the spreadsheets in the workbook should be used as data table. The table must have a format such that each row is an example and each column represents an attribute. Please note that the first row of the Excel sheet might be used for attribute names which can be indicated by a parameter. The data table can be placed anywhere on the sheet and can contain arbitrary formatting instructions, empty rows and empty columns. Missing data values in Excel should be indicated by empty cells or by cells containing only "?".
For complete understanding of this operator read the parameters section. The easiest and shortest way to import an Excel file is to use theimport configuration wizardfrom the Parameters panel. The best way, which may require some extra effort, is to first set all the parameters in the Parameters panel and then use thewizard。Please make sure that the Excel file is read correctly before building a process using it.
Input
file
An Excel file is expected as a file object which can be created with other operators with file output ports like the Read File operator.
Output
output
This port delivers the Excel file in tabular form along with the meta data. This output is similar to the output of the Retrieve operator.
Parameters
Import configuration wizard
This option allows you to configure this operator by means of a wizard. This user-friendly wizard makes the use of this operator easy.
Excel file
The path of the Excel file is specified here. It can be selected using thechoose a filebutton.
Sheet selection
This option allows you to change the sheet selection between sheet number and sheet name.
Sheet number
The number of the sheet which you want to import should be specified here.
Sheet name
The name of the sheet which you want to import should be specified here.
Imported cell range
This is a mandatory parameter. The range of cells to be imported from the specified sheet is given here. It is specified in 'xm:yn' format where 'x' is the column of the first cell of range, 'm' is the row of the first cell of range, 'y' is the column of the last cell of range, 'n' is the row of the last cell of range. 'A1:E10' will select all cells of the first five columns from row 1 to 10.
First row as names
If this option is set to true, it is assumed that the first line of the Excel file has the names of attributes. Then the attributes are automatically named and the first line of Excel file is not treated as a data line.
Annotations
If thefirst row as namesparameter is not set to true, annotations can be added using the 'Edit List' button of this parameter which opens a new menu. This menu allows you to select any row and assign an annotation to it.Name,CommentandUnit注释可以被指定。如果行0分配Nameannotation, it is equivalent to setting thefirst row as namesparameter to true. If you want to ignore any rows you can annotate them asComment。
Date format
The date and time format is specified here. Many predefined options exist; users can also specify a new format. If text in an Excel file column matches this date format, that column is automatically converted todatetype. Some corrections are automatically made in thedatetype values. For example a value '32-March' will automatically be converted to '1-April'. Columns containing values which can't be interpreted as numbers will be interpreted as nominal, as long as they don't match the date and time pattern of thedate formatparameter. If they do, this column of the Excel file will be automatically parsed asdateand the according attribute will be ofdatetype.
Time zone
This is an expert parameter. A long list of time zones is provided; users can select any of them.
Locale
This is an expert parameter. A long list of locales is provided; users can select any of them.
Read all values as polynominal
This option allows you to disable the type handling for this operator. Every column will be read as a polynominal attribute. To parse an excel date afterwards use 'date_parse(86400000 * (parse(date_attribute) - 25569))' (- 24107 for Mac Excel 2007) in the Generate Attributes operator.
Data set meta data information
This option is an important one. It allows you to adjust the meta data of the ExampleSet created from the specified Excel file.Column index,name,typeandrolecan be specified here. The Read Excel operator tries to determine an appropriate type of the attributes by reading the first few lines and checking the occurring values. If all values are integers, the attribute will become aninteger。同样如果所有值al numbers, the attribute will become of typereal。Columns containing values which can't be interpreted as numbers will be interpreted as nominal, as long as they don't match the date and time pattern of thedate formatparameter. If they do, this column of the Excel file will be automatically parsed asdateand the according attribute will be of typedate。Automatically determined types can be overridden using this parameter.
Read not matching values as missings
If this value is set to true, values that do not match with the expected value type are considered as missing values and are replaced by '?'. For example if 'abc' is written in anintegercolumn, it will be treated as a missing value. A question mark (?) or an empty cell in the Excel file is also read as a missing value.
Data management
This is an expert parameter. A long list is provided; users can select any option from this list.