Read XML

Synopsis

This operator is used for reading an XML file.

Description

This operator can read XML files, where examples are represented by elements which match a given XPath and features are attributes and text-content of each element and its sub-elements.
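The mapping from XML elements to examples can be illustrated outside the operator. The following is a minimal Python sketch, not the operator's implementation: the sample document, the function name `rows_from_xml`, and the XPath `./order` are all hypothetical, and only direct sub-elements are flattened.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; each <order> element is one example.
XML_TEXT = """<orders>
  <order id="1" status="open"><customer>Ann</customer><total>19.5</total></order>
  <order id="2" status="paid"><customer>Bob</customer><total>7</total></order>
</orders>"""

def rows_from_xml(xml_text, example_xpath):
    """Return one dict per element that matches example_xpath."""
    root = ET.fromstring(xml_text)
    rows = []
    for element in root.findall(example_xpath):
        # XML attributes of the matched element become features.
        row = dict(element.attrib)
        # Text content of each direct sub-element becomes a feature too.
        for child in element:
            row[child.tag] = (child.text or "").strip()
        rows.append(row)
    return rows

rows = rows_from_xml(XML_TEXT, "./order")
for row in rows:
    print(row)
```

Every value is still a string at this point; deciding on a type per column is a separate step.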

This operator tries to determine an appropriate type for each attribute by reading the first few elements and checking the occurring values. If all values are integers, the attribute becomes integer; if real numbers occur, it becomes real. Columns containing values which can't be interpreted as numbers will be nominal, as long as they don't match the date and time pattern of the date format parameter. If they do, the attribute is automatically parsed as a date and the corresponding feature will be of type date.
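The type-guessing rule described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the operator's code: `guess_type` is a hypothetical helper, and the date pattern uses Python's `strptime` syntax rather than the pattern syntax of the date format parameter.

```python
from datetime import datetime

def guess_type(values, date_format="%Y-%m-%d"):
    """Guess a column type from sample string values (hypothetical helper)."""
    def all_match(parse):
        # True only if every sample value can be parsed by `parse`.
        for value in values:
            try:
                parse(value)
            except ValueError:
                return False
        return True

    if all_match(int):
        return "integer"
    if all_match(float):
        return "real"
    if all_match(lambda v: datetime.strptime(v, date_format)):
        return "date"
    return "nominal"

print(guess_type(["1", "2", "3"]))
print(guess_type(["1", "2.5"]))
print(guess_type(["2020-01-01", "2021-05-06"]))
print(guess_type(["abc", "1"]))
```

The order of the checks matters: integers are tested before reals because every integer string also parses as a real number.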

Input

file

An XML file is expected as a file object which can be created with other operators with file output ports like the Read File operator.

Output

output

This port delivers the XML file in tabular form along with the meta data. This output is similar to the output of the Retrieve operator.

Parameters

Parse numbers

Specifies whether numbers are parsed or not.

Decimal character

This character is used as the decimal character.

Grouped digits

This option decides whether grouped digits should be parsed or not. If this option is set to true, a grouping character parameter should be specified.

Grouping character

This character is used as the grouping character. If this character is found between digits, the digits are combined and the character is ignored. For example, if "22-14" is present in the XML file and "-" is set as the grouping character, then "2214" will be stored.
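How the decimal and grouping characters interact can be shown with a small Python sketch. The function `parse_number` and its parameter names are hypothetical and only mimic the behavior described here.

```python
def parse_number(text, decimal_char=".", grouping_char=None):
    """Parse a number string using custom decimal and grouping characters."""
    if grouping_char is not None:
        # The grouping character is dropped and the digits are joined.
        text = text.replace(grouping_char, "")
    if decimal_char != ".":
        # Normalize the decimal character to the one float() expects.
        text = text.replace(decimal_char, ".")
    return float(text)

print(parse_number("22-14", grouping_char="-"))
print(parse_number("1.234,5", decimal_char=",", grouping_char="."))
```

The grouping character is removed before the decimal character is normalized, so a format such as "1.234,5" (grouping ".", decimal ",") is handled correctly.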

Infinity string

This parameter can be set to parse a specific infinity representation (e.g. "Infinity"). If it is not set, the locale-specific infinity representation will be used.

Date format

The date and time format is specified here. Many predefined options exist; users can also specify a new format. If text in a column of the XML data matches this date format, that column is automatically converted to date type. Some corrections are automatically made in date type values. For example, a value '32-March' will automatically be converted to '1-April'. Columns containing values which can't be interpreted as numbers will be interpreted as nominal, as long as they don't match the date and time pattern of the date format parameter. If they do, the column will be automatically parsed as date and the corresponding attribute will be of date type.
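The automatic correction of out-of-range dates can be pictured as rolling surplus days over into the next month. The Python sketch below reproduces the '32-March' example; the helper `lenient_day_month`, the 'day-Month' input shape, and the default year 2023 are assumptions made for illustration only.

```python
from datetime import date, timedelta

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

def lenient_day_month(text, year=2023):
    """Parse 'day-Month' leniently, rolling surplus days into later months."""
    day_text, month_name = text.split("-")
    month = MONTHS.index(month_name) + 1
    # Start at the first of the month and add the remaining days, so a
    # day beyond the end of the month lands in the following month.
    return date(year, month, 1) + timedelta(days=int(day_text) - 1)

corrected = lenient_day_month("32-March")
print(corrected.day, MONTHS[corrected.month - 1])
```

A strict parser would reject '32-March' instead; the lenient behavior trades error detection for convenience.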

First row as names

If this option is set to true, it is assumed that the first row of the data contains the names of the attributes. The attributes are then named automatically and the first row is not treated as a data row.

Annotations

If first row as names is not set to true, annotations can be added using the 'Edit List' button of this parameter, which opens a new menu. This menu allows you to select any row and assign an annotation to it. Name, Comment and Unit annotations can be assigned. If row 0 is assigned a Name annotation, it is equivalent to setting the first row as names parameter to true. If you want to ignore any rows, you can annotate them as Comment. Note that row numbers in this menu do not count commented lines.

Time zone

This is an expert parameter. A long list of time zones is provided; users can select any of them.

Locale

This is an expert parameter. A long list of locales is provided; users can select any of them.

Read all values as polynominal

This option allows you to disable the type handling for this operator. Every XPath entry will be read as a polynominal attribute.

Data set meta data information

This option allows you to adjust the meta data of the XML data. Column index, name, type and role can be specified here. The operator tries to determine an appropriate type for each attribute by reading the first few elements and checking the occurring values. If all values are integers, the attribute becomes integer. Similarly, if all values are real numbers, the attribute becomes real. Columns containing values which can't be interpreted as numbers will be interpreted as nominal, as long as they don't match the date and time pattern of the date format parameter. If they do, the column will be automatically parsed as date and the corresponding attribute will be of type date. Automatically determined types can be overridden using this parameter.

Read not matching values as missings

If this parameter is set to true, values that do not match the expected value type are considered missing values and are replaced by '?'. For example, if 'abc' is written in an integer column, it will be treated as a missing value. A question mark (?) in the XML file is also read as a missing value.
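The effect on an integer column can be sketched as follows. This is a simplified Python illustration with a hypothetical helper name; it uses `None` to stand for a missing value, which the operator displays as '?'.

```python
def read_integer_column(raw_values):
    """Parse strings as integers; values that do not match become missing."""
    parsed = []
    for raw in raw_values:
        try:
            parsed.append(int(raw))
        except ValueError:
            # Covers both non-numeric text such as 'abc' and a literal '?'.
            parsed.append(None)
    return parsed

column = read_integer_column(["1", "abc", "?", "4"])
print(column)
```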

Datamanagement

This is an expert parameter. A long list is provided; users can select any option from this list.