
XGBoost

Synopsis

Wrapper for the XGBoost gradient boosting framework.

Description

The operator automatically selects the learning objective based on the training data: logistic regression for classification problems and regression with squared loss for regression problems. The exact objective and all other hyperparameters are listed in the model description (result view).
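The selection rule can be illustrated with a minimal Python sketch. The helper name `select_objective` and the way the label type is detected are assumptions made for illustration, not the operator's actual implementation. The strings `binary:logistic` and `reg:squarederror` are the XGBoost objective names for logistic regression and squared-loss regression; how the operator handles labels with more than two classes is not stated here and is not covered.

```python
def select_objective(labels):
    """Pick an XGBoost objective name from a label column (hypothetical helper).

    Assumption: a label column whose non-missing values are all numbers is a
    regression target; anything else is treated as a classification target.
    """
    non_missing = [v for v in labels if v is not None]
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool)
        for v in non_missing
    )
    if is_numeric:
        return "reg:squarederror"  # regression with squared loss
    return "binary:logistic"       # logistic regression


print(select_objective([3.2, 1.5, 4.0]))       # numeric label
print(select_objective(["yes", "no", "yes"]))  # nominal label
```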

XGBoost supports missing values but does not support categorical features out of the box. The operator converts categorical columns into one of the following two formats: if the column has at most two classes, the column is converted into a single numeric vector with 0 and 1 representing the negative and positive class respectively. If the column has no Boolean mapping, the class with the higher index is assumed to be the positive class. Missing values are encoded as such.

If the column has more than two classes, a modified one-hot encoding is applied: class vectors are encoded using missing values and the value 1 instead of the more common 0 and 1. In other words, they are encoded as unary instead of binary features.
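The two encodings described above can be sketched in plain Python as follows. The function names are hypothetical, `None` stands in for a missing value, and sorted order stands in for the class index; none of this is taken from the operator's source code. Leaving a missing entry in a multi-class column missing in every generated column is also an assumption.

```python
import math


def encode_binary(values, positive=None):
    """Encode a column with at most two classes as 0.0/1.0; missing stays NaN."""
    classes = sorted({v for v in values if v is not None})
    if len(classes) > 2:
        raise ValueError("column has more than two classes")
    if positive is None and classes:
        # Assumption: without a Boolean mapping, the class with the higher
        # index (here: the last class in sorted order) is the positive class.
        positive = classes[-1]
    return [
        math.nan if v is None else (1.0 if v == positive else 0.0)
        for v in values
    ]


def encode_unary(values):
    """Modified one-hot encoding: one column per class, 1.0 or missing (NaN)."""
    classes = sorted({v for v in values if v is not None})
    return {
        c: [1.0 if v == c else math.nan for v in values]
        for c in classes
    }


print(encode_binary(["no", "yes", None, "no"]))
print(encode_unary(["red", "green", "blue", "red"]))
```

In the unary form each generated column carries only the value 1 or a missing value, in contrast to the 0/1 columns of a conventional one-hot encoding.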

The operator currently exposes almost all XGBoost hyperparameters. See https://xgboost.readthedocs.io/en/latest/parameter.html for details. Note that some parameters can only be set in the list of expert parameters.

Input

training

The training data set.

validation

The validation data set used for early stopping (optional). This port is only available when using the early stopping mode 'custom'.
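As a rough illustration of how a validation set can drive early stopping, the sketch below applies a patience-based rule: training stops once the validation loss has not improved for a given number of rounds, and the best round so far is kept. The function name, the `patience` argument, and the loss values are hypothetical; the exact stopping rule and parameter names used by the operator are not specified in this description.

```python
def early_stopping_round(validation_losses, patience):
    """Return the index of the best boosting round under a patience rule.

    validation_losses: loss on the validation set after each boosting round.
    patience: number of rounds without improvement before stopping (assumed name).
    """
    best_round = 0
    best_loss = float("inf")
    for i, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_loss = loss
            best_round = i
        elif i - best_round >= patience:
            break  # no improvement for `patience` rounds: stop training
    return best_round


# Made-up validation losses, for illustration only.
losses = [0.90, 0.70, 0.60, 0.65, 0.64, 0.63, 0.50]
print(early_stopping_round(losses, patience=3))
```

With a small patience, a late improvement (the last value in this made-up list) is never reached because training has already stopped; a larger patience trades extra boosting rounds for the chance to find it.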

Output

model

The XGBoost model.

exampleSet

The unmodified training data set.