Skip to main content

Embedding Id to Text

Synopsis

Maps embedding "id-s" into their respective "tokens" using an embedding model.

Description

This operator utilizes a pre-trained embedding model and transforms its input. The output will hold tokens that belong to the id-s in the input data set.

Input

example set

Input ExampleSet. Contains column that holds the id-s which are to be mapped by this operator.

Output

example set

Transformed input. Id column values mapped into the respective tokens.

original example set

Original input data (~ throughput).

Parameters

Model type

Select the supported model types for embedding.

  • WORD2VEC: Word2vec is a group of related models that are used to produce word embeddings. These models are shallow, two-layer neural networks that are trained to reconstruct linguistic contexts of words. Word2vec takes as its input a large corpus of text and produces a vector space, typically of several hundred dimensions, with each unique word in the corpus being assigned a corresponding vector in the space. Word vectors are positioned in the vector space such that words that share common contexts in the corpus are located close to one another in the space.
  • STATIC_MODEL: A static model is a model for distributed word representation. Select this option, if the representation is stored in a txt file with one token as a string per line, followed by the (decimal, english notation) numbers of its representation separated by spaces. E.g.: word 0.123 0.432 0.445 as one line in a txt file. A prime example is the structure of the GloVe, coined from Global Vectors, model. The model is an unsupervised learning algorithm for obtaining vector representations for words. This is achieved by mapping words into a meaningful space where the distance between words is related to semantic similarity. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space.

Embedding model file

Provide path to the model file that contains the embedding weights/parameters.

Id属性

Select the attribute that contains the id-s to be mapped into tokens.