Read Sparse
Synopsis
This operator is used for reading files written in sparse formats.
Description
This operator reads sparse format files. The lines of a sparse file have the form:
label index:value index:value index:value...
Where index may be an integer (starting with 1) for the regular attributes or one of the prefixes specified by the prefix map parameter. The following formats are supported:
- xy format: The label is the last token in each line.
- yx format: The label is the first token in each line.
- prefix format: The label is prefixed by 'l:'.
- separate file format: The label is read from a separate file specified by the label file parameter.
- no label: The ExampleSet is unlabeled.
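The label placement rules above can be illustrated with a short Python sketch. It is not the operator's actual implementation. The function name parse_sparse_line and the format identifiers ("xy", "yx", "prefix", "separate_file", "no_label") are hypothetical, and the sketch assumes whitespace-separated tokens and numeric regular values.

```python
from typing import Dict, Optional, Tuple

# Hypothetical format identifiers; the operator's own option names may differ.
FORMATS = {"xy", "yx", "prefix", "separate_file", "no_label"}


def parse_sparse_line(line: str, fmt: str) -> Tuple[Optional[str], Dict[int, float]]:
    """Split one sparse line into (label, {index: value}).

    Assumes whitespace-separated tokens and numeric regular values.
    """
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt}")
    tokens = line.split()
    label: Optional[str] = None
    if fmt == "xy":
        # Label is the last token.
        label, tokens = tokens[-1], tokens[:-1]
    elif fmt == "yx":
        # Label is the first token.
        label, tokens = tokens[0], tokens[1:]
    elif fmt == "prefix":
        # Label is the token prefixed by 'l:'; it may appear anywhere.
        rest = []
        for tok in tokens:
            if tok.startswith("l:"):
                label = tok[2:]
            else:
                rest.append(tok)
        tokens = rest
    # For "separate_file" and "no_label" the line holds only index:value pairs.
    values: Dict[int, float] = {}
    for tok in tokens:
        index, value = tok.split(":", 1)
        values[int(index)] = float(value)
    return label, values


print(parse_sparse_line("yes 1:0.5 3:2", "yx"))  # ('yes', {1: 0.5, 3: 2.0})
```

The same pairs parse identically in xy format ("1:0.5 3:2 yes") and prefix format ("1:0.5 l:yes 3:2"); only the position of the label changes.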
Output
output
This port delivers the data read from the file in tabular form, along with its meta data. This output is similar to the output of the Retrieve operator.
Parameters
Format
This parameter specifies the format of the sparse data file.
Attribute description file
The name of the attribute description file is specified here. An attribute description file (extension: .aml) is required to retrieve meta data of the ExampleSet. This file is a simple XML document defining the properties of the attributes (like their name and range) and their source files. The data may be spread over several files. This file also contains the names of the files to read the data from. Therefore, the actual data files do not have to be specified as a parameter of this operator.
Data file
This parameter specifies the name of the data file. It is necessary if it is not specified in the attribute description file.
Label file
This parameter specifies the name of the file containing the labels. It is necessary if the format parameter is set to 'format separate file'.
Dimension
This parameter specifies the dimension of the example space. It is necessary if the attribute description file parameter is not set.
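Because a sparse line lists only some indices, the dimension fixes how many regular attributes each example has. The following minimal Python sketch expands one parsed line into a dense row. It assumes that indices start at 1 (as stated above) and that omitted entries are zero; the function name to_dense is hypothetical.

```python
from typing import Dict, List


def to_dense(values: Dict[int, float], dimension: int) -> List[float]:
    """Expand {1-based index: value} into a list of length `dimension`.

    Omitted indices are assumed to be 0.0 (the usual sparse convention).
    """
    row = [0.0] * dimension
    for index, value in values.items():
        if not 1 <= index <= dimension:
            raise IndexError(f"index {index} outside 1..{dimension}")
        row[index - 1] = value  # shift from 1-based to 0-based
    return row


print(to_dense({1: 0.5, 3: 2.0}, 4))  # [0.5, 0.0, 2.0, 0.0]
```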
Sample size
This parameter specifies the maximum number of examples which should be read. If it is set to -1, then all examples are read.
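The sample size semantics (a positive limit, or -1 for everything) can be sketched in Python as follows. The function name take_examples is hypothetical, and skipping blank lines is an assumption made for the illustration, not documented behavior of the operator.

```python
from itertools import islice
from typing import Iterable, List


def take_examples(lines: Iterable[str], sample_size: int = -1) -> List[str]:
    """Return at most `sample_size` non-empty lines; -1 means all of them."""
    non_empty = (line for line in lines if line.strip())  # assumption: skip blank lines
    if sample_size == -1:
        return list(non_empty)
    if sample_size < 0:
        raise ValueError("sample_size must be -1 or non-negative")
    return list(islice(non_empty, sample_size))


data = ["yes 1:0.5", "no 2:1.0", "yes 3:0.2"]
print(take_examples(data, 2))    # ['yes 1:0.5', 'no 2:1.0']
print(len(take_examples(data)))  # 3
```

Using islice keeps the read lazy, so a small sample does not require consuming the whole file.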
Use quotes
This parameter indicates if quotes should be regarded. If this option is set to true, the quotes character parameter can be used for specifying the quotes character.
Quotes character
This parameter defines the quotes character. It is only used if the use quotes parameter is set to true.
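One plausible reading of the two quote parameters is that text between quote characters stays in a single token even if it contains whitespace. The Python sketch below illustrates that assumption using the standard shlex module. The function tokenize is hypothetical, and the operator's exact quoting rules may differ.

```python
import shlex
from typing import List


def tokenize(line: str, use_quotes: bool = True, quote_char: str = '"') -> List[str]:
    """Split a sparse line on whitespace, optionally honouring quotes.

    With quotes enabled, text between two quote characters stays in one
    token and the quote characters are removed. POSIX mode also treats
    a backslash as an escape character.
    """
    if not use_quotes:
        return line.split()
    lexer = shlex.shlex(line, posix=True)
    lexer.whitespace_split = True  # split only on whitespace, keep ':' etc.
    lexer.quotes = quote_char      # only this character opens/closes a quote
    lexer.commenters = ""          # do not treat '#' as a comment marker
    return list(lexer)


print(tokenize('l:"big cat" 1:0.5'))         # ['l:big cat', '1:0.5']
print(tokenize('l:"big cat" 1:0.5', False))  # ['l:"big', 'cat"', '1:0.5']
```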
Datamanagement
This parameter determines how the data is represented internally. This is an expert parameter; several options are available, and users can choose any of them.
Decimal point character
This parameter specifies the character used as the decimal point.
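As an illustration, the following Python sketch parses a value token with a configurable decimal character. The function name parse_value is hypothetical, and the sketch assumes values contain no thousands separators.

```python
def parse_value(text: str, decimal_char: str = ".") -> float:
    """Convert a value token to float using a configurable decimal character.

    Assumes the value contains no thousands separators.
    """
    if len(decimal_char) != 1:
        raise ValueError("decimal_char must be a single character")
    if decimal_char != ".":
        text = text.replace(decimal_char, ".")
    return float(text)


print(parse_value("3,25", ","))  # 3.25
print(parse_value("3.25"))       # 3.25
```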
Prefix map
This parameter maps prefixes to names of special attributes.
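The Python sketch below shows how such a mapping could separate special-attribute tokens from regular index:value pairs. It assumes a prefix is the text before the first ':'. The function name split_tokens and the example map {"l": "label", "w": "weight"} are hypothetical illustrations, not the operator's defaults.

```python
from typing import Dict, Iterable, Tuple


def split_tokens(
    tokens: Iterable[str], prefix_map: Dict[str, str]
) -> Tuple[Dict[int, float], Dict[str, str]]:
    """Separate regular index:value pairs from special-attribute tokens.

    prefix_map maps a prefix (the text before ':') to a special attribute
    name; tokens whose prefix is not in the map are treated as regular.
    """
    regular: Dict[int, float] = {}
    special: Dict[str, str] = {}
    for tok in tokens:
        key, value = tok.split(":", 1)
        if key in prefix_map:
            special[prefix_map[key]] = value  # kept as text; type depends on the attribute
        else:
            regular[int(key)] = float(value)
    return regular, special


print(split_tokens(["l:yes", "w:0.7", "2:1.5"], {"l": "label", "w": "weight"}))
# ({2: 1.5}, {'label': 'yes', 'weight': '0.7'})
```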
Encoding
This is an expert parameter that specifies the character encoding of the file. A long list of encodings is provided; users can select any one of them.