Read Sparse

Synopsis

This operator is used for reading files written in sparse formats.

Description

This operator reads sparse format files. The lines of a sparse file have the form:

label index:value index:value index:value...

Here, index may be an integer (starting at 1) for regular attributes, or one of the prefixes specified by the prefix map parameter. The following formats are supported:

  • xy format: The label is the last token in each line.
  • yx format: The label is the first token in each line.
  • prefix format: The label is prefixed by 'l:'.
  • separate file format: The label is read from a separate file specified by the label file parameter.
  • no label: The ExampleSet is unlabeled.
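
As an illustration of how the label position differs between these formats, the following minimal Python sketch splits one line into a label and its index:value pairs. It is not the operator's implementation: the function name and the format identifiers ("xy", "yx", "prefix", "separate_file", "no_label") are made up for this example, and only integer indices are handled.

```python
def parse_sparse_line(line, fmt="yx"):
    """Split one sparse line into (label, {index: value}).

    Hypothetical helper: format names are illustrative only.
    """
    tokens = line.split()
    if not tokens:
        raise ValueError("empty line")

    label = None
    if fmt == "yx":
        # Label is the first token.
        label, tokens = tokens[0], tokens[1:]
    elif fmt == "xy":
        # Label is the last token.
        label, tokens = tokens[-1], tokens[:-1]
    elif fmt == "prefix":
        # Label is the token that starts with 'l:'.
        rest = []
        for tok in tokens:
            if tok.startswith("l:"):
                label = tok[2:]
            else:
                rest.append(tok)
        tokens = rest
    elif fmt not in ("separate_file", "no_label"):
        raise ValueError(f"unknown format: {fmt}")
    # For "separate_file" the label would come from another file,
    # so the line itself carries only index:value pairs.

    values = {}
    for tok in tokens:
        index, _, value = tok.partition(":")
        values[int(index)] = float(value)
    return label, values
```

For example, parse_sparse_line("1:0.5 3:2 pos", "xy") and parse_sparse_line("pos 1:0.5 3:2", "yx") both yield the label "pos" with values for indices 1 and 3.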

Output

output

This port delivers the required file in tabular form along with the meta data. This output is similar to the output of the Retrieve operator.

Parameters

Format

This parameter specifies the format of the sparse data file.

Attribute description file

The name of the attribute description file is specified here. An attribute description file (extension: .aml) is required to retrieve meta data of the ExampleSet. This file is a simple XML document defining the properties of the attributes (like their name and range) and their source files. The data may be spread over several files. This file also contains the names of the files to read the data from. Therefore, the actual data files do not have to be specified as a parameter of this operator.

Data file

This parameter specifies the name of the data file. It is required if the data file is not specified in the attribute description file.

Label file

This parameter specifies the name of the file containing the labels. It is required if the format parameter is set to the separate file format.

Dimension

This parameter specifies the dimension of the example space. It is required if the attribute description file parameter is not set.
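
To show why a dimension is needed when no attribute description file is given, here is a small Python sketch that expands sparse index:value pairs into a dense row of fixed length. The function name is hypothetical, and filling unlisted indices with 0.0 is an assumption of this example, not documented behaviour of the operator.

```python
def to_dense(values, dimension):
    """Expand {index: value} (1-based indices) into a list of length `dimension`.

    Assumption: indices that do not appear in the line default to 0.0.
    """
    row = [0.0] * dimension
    for index, value in values.items():
        if not 1 <= index <= dimension:
            raise ValueError(f"index {index} is outside 1..{dimension}")
        row[index - 1] = value  # sparse indices start at 1
    return row
```

Without a known dimension, the reader could not tell how many trailing attributes a line such as "1:0.5 3:2" leaves out.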

Sample size

This parameter specifies the maximum number of examples which should be read. If it is set to -1, then all examples are read.
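
The -1 convention can be pictured with a short Python sketch. The function name is hypothetical and the code only mirrors the behaviour described above; it is not taken from the operator.

```python
from itertools import islice


def limit_examples(lines, sample_size=-1):
    """Return at most `sample_size` lines; -1 means read everything."""
    if sample_size == -1:
        return list(lines)
    return list(islice(lines, sample_size))
```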

Use quotes

This parameter indicates whether quotes should be respected. If this option is set to true, the quotes character parameter can be used to specify the quotes character.

Quotes character

This parameter defines the quotes character.

Datamanagement

This parameter determines how the data is represented internally. This is an expert parameter. Several options are available, and users can choose any of them.

Decimal point character

This character is used as the decimal character.

Prefix map

This parameter maps prefixes to names of special attributes.
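
As a rough picture of what such a mapping does, the Python sketch below separates the tokens of one line into regular attributes (integer indices) and special attributes (prefixed tokens). The function name and the prefixes "w" and "i" with their attribute names are invented for illustration; the operator's actual prefix map entries are whatever the user configures.

```python
def split_special(tokens, prefix_map):
    """Separate index:value tokens into regular and special attributes.

    `prefix_map` maps a prefix (e.g. "w") to a special attribute name
    (e.g. "weight"). Both are hypothetical examples.
    """
    regular, special = {}, {}
    for tok in tokens:
        key, _, value = tok.partition(":")
        if key in prefix_map:
            # Prefixed token: store under the mapped special attribute name.
            special[prefix_map[key]] = value
        else:
            # Otherwise the key is expected to be an integer attribute index.
            regular[int(key)] = float(value)
    return regular, special
```

With prefix_map = {"w": "weight", "i": "id"}, the tokens ["1:0.5", "w:2.0", "i:ex7"] give one regular attribute at index 1 and two special attributes named weight and id.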

Encoding

This is an expert parameter. A long list of encodings is provided; users can select any one of them.