Generalized Sequential Patterns
Synopsis
This operator searches for sequential patterns in a set of transactions using the GSP (Generalized Sequential Pattern) algorithm. GSP is a popular algorithm used for sequence mining.
Description
This operator searches for sequential patterns in a set of transactions. The ExampleSet must contain one attribute for the time and one attribute for the customer. Moreover, each transaction must be encoded as a single example. The time and customer attributes are specified through the time attribute and customer id parameters respectively. This pair of attributes is used to generate one sequence per customer, containing all of that customer's transactions ordered by time. The algorithm then searches for sequential patterns of the form: if a customer bought item 'a' and item 'c' in one transaction, he bought item 'b' in the next. This pattern is represented in this form: <a, c> then <b>. The minimal support describes how many customers must support such a pattern for it to be regarded as frequent. Infrequent patterns are dropped. A customer supports a pattern if some part of his sequence includes that pattern. The above pattern would be supported, for example, by a customer with the transactions: <s, g> then <a, s, c> then <b> then <f, h>. The minimum support criterion is specified through the min support parameter.
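The sequence construction and support check described above can be sketched in a few lines of Python. This is only an illustration, not the operator's implementation: the function names and the sample data are hypothetical, time constraints are ignored, and the sketch assumes that a pattern element may be matched by any later transaction, not only the immediately following one.

```python
from collections import defaultdict

def build_sequences(rows):
    """Group (customer, time, items) rows into one time-ordered sequence per customer."""
    by_customer = defaultdict(list)
    for customer, time, items in rows:
        by_customer[customer].append((time, frozenset(items)))
    return {
        customer: [items for _, items in sorted(txs, key=lambda tx: tx[0])]
        for customer, txs in by_customer.items()
    }

def supports(sequence, pattern):
    """True if every pattern element is contained in a distinct transaction, in order."""
    pos = 0
    for element in pattern:
        # Advance to the first remaining transaction that contains the element.
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True

def support(sequences, pattern):
    """Fraction of customers whose sequence supports the pattern."""
    return sum(supports(seq, pattern) for seq in sequences.values()) / len(sequences)

# Hypothetical data: each item is a single letter, so "asc" means items a, s and c.
rows = [
    (1, 10, "sg"), (1, 20, "asc"), (1, 30, "b"), (1, 40, "fh"),
    (2, 10, "ac"), (2, 20, "f"),
]
sequences = build_sequences(rows)
pattern = [frozenset("ac"), frozenset("b")]  # <a, c> then <b>
min_support = 0.5
is_frequent = support(sequences, pattern) >= min_support
```

Here customer 1 supports the pattern and customer 2 does not, so the pattern is kept only if min support is at most one half.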
The min gap, max gap and window size parameters determine how transactions are handled. For example, if the above customer forgot to buy item 'c' and had to return 5 minutes later to buy it, his transactions would look like: <s, g> then <a, s> then <c> then <b> then <f, h>. This would not support the pattern <a, c> then <b>. To avoid this problem, the window size determines how long a subsequent transaction is treated as part of the same transaction. If the window size is larger than 5 minutes, then <c> would be treated as being part of the second transaction, and hence this customer would support the above pattern. The max gap parameter causes a customer's sequence not to support a pattern if the transactions containing this pattern are too widely separated in time. The min gap parameter does the same if they are too close together in time.
This technique overcomes some crucial drawbacks of existing mining methods, for example:
- absence of time constraints: This drawback is overcome by the min gap and max gap parameters.
- rigid definition of a transaction: This drawback is overcome by the sliding time window.
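How the window size, min gap and max gap interact can be sketched with a brute-force check. The code below is an assumption-laden illustration rather than the operator's implementation: the function names are hypothetical, times are plain numbers (minutes here), and the exact boundary conditions (strict or non-strict comparisons) are a guess about how such constraints are commonly defined.

```python
def windows(seq, element, window_size):
    """Yield (start_time, end_time) of runs of transactions whose union contains element."""
    for i in range(len(seq)):
        union = set()
        for j in range(i, len(seq)):
            if seq[j][0] - seq[i][0] > window_size:
                break  # run would be longer than the allowed window
            union |= seq[j][1]
            if element <= union:
                yield seq[i][0], seq[j][0]
                break  # the shortest run from this start is enough

def supports_timed(seq, pattern, window_size=0, min_gap=0, max_gap=float("inf")):
    """Check support of pattern in a time-stamped sequence under the three constraints."""
    def search(k, prev):
        if k == len(pattern):
            return True
        for start, end in windows(seq, pattern[k], window_size):
            if prev is not None:
                prev_start, prev_end = prev
                too_near = start - prev_end <= min_gap
                too_far = end - prev_start > max_gap
                if too_near or too_far:
                    continue
            if search(k + 1, (start, end)):
                return True
        return False
    return search(0, None)

# The customer who returned 5 minutes later for item 'c' (times in minutes).
seq = [
    (0, frozenset("sg")), (60, frozenset("as")), (65, frozenset("c")),
    (120, frozenset("b")), (180, frozenset("fh")),
]
pattern = [frozenset("ac"), frozenset("b")]  # <a, c> then <b>

no_window = supports_timed(seq, pattern)
with_window = supports_timed(seq, pattern, window_size=10)
with_max_gap = supports_timed(seq, pattern, window_size=10, max_gap=30)
with_min_gap = supports_timed(seq, pattern, window_size=10, min_gap=60)
```

Without a window the pattern is not supported. A window of 10 minutes merges <a, s> and <c>, so the pattern is supported. Adding a max gap of 30 or a min gap of 60 rejects it again, because <b> follows roughly an hour after the merged transaction.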
Please note that all attributes (except the customer and time attributes) of the given ExampleSet should be binominal, i.e. nominal attributes with only two possible values. If your ExampleSet does not satisfy this condition, you may use appropriate preprocessing operators to transform it into the required form. The discretization operators can be used for changing numerical attributes into nominal attributes. Then the Nominal to Binominal operator can be used for transforming nominal attributes into binominal attributes.
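To make the required shape of the data concrete, the following Python sketch mimics the two preprocessing steps on a tiny data set. It does not call or reproduce the actual preprocessing operators: the function names, the "amount" attribute, the bin boundary and the "true"/"false" encoding are all hypothetical.

```python
def discretize(value, boundaries, labels):
    """Return the label of the first bin whose upper boundary the value does not exceed."""
    for bound, label in zip(boundaries, labels):
        if value <= bound:
            return label
    return labels[-1]  # value lies above the last boundary

def nominal_to_binominal(rows, attribute):
    """Replace one nominal attribute by one two-valued attribute per nominal value."""
    values = sorted({row[attribute] for row in rows})
    result = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != attribute}
        for value in values:
            new_row[f"{attribute} = {value}"] = "true" if row[attribute] == value else "false"
        result.append(new_row)
    return result

rows = [
    {"customer": 1, "time": 10, "amount": 3.0},
    {"customer": 1, "time": 20, "amount": 40.0},
]
# Step 1: numerical -> nominal (one boundary at 10.0 gives the bins "low" and "high").
for row in rows:
    row["amount"] = discretize(row["amount"], [10.0], ["low", "high"])
# Step 2: nominal -> binominal.
binominal = nominal_to_binominal(rows, "amount")
```

After both steps every attribute other than customer and time takes exactly two values, which is the form this operator expects.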
Please note that the sequential patterns are mined for the positive entries in your ExampleSet, i.e. for those nominal values which are defined as positive in your ExampleSet. If your data does not specify the positive entries correctly, you may set them using the positive value parameter. This only works if all your attributes contain this value.
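The role of the positive value can be illustrated with a short Python sketch. It assumes, purely for illustration, that an example is a dictionary of attribute values, that the customer and time attributes are named "customer" and "time", and that the positive value is the string "true". The function name is hypothetical.

```python
def transaction_items(example, positive_value="true", special=("customer", "time")):
    """Return the attributes that carry the positive value, i.e. the items of the transaction."""
    return frozenset(
        name
        for name, value in example.items()
        if name not in special and value == positive_value
    )

example = {"customer": 7, "time": 20, "a": "true", "b": "false", "c": "true"}
items = transaction_items(example)
```

In this example the transaction consists of items 'a' and 'c'. Attribute 'b' holds the other value and is therefore not part of the transaction.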
Input
example set
This input port expects an ExampleSet. Please make sure that all attributes (except customer and time attributes) of the ExampleSet are binominal.
Output
example set
The ExampleSet that was given as input is passed unchanged to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.
patterns
The GSP algorithm is applied on the given ExampleSet and the resultant set of sequential patterns is delivered through this port.
Parameters
Customer id
This parameter specifies the name of the attribute that will be used for identifying the customers.
Time attribute
This parameter specifies the name of the numerical attribute that specifies the time of a transaction.
Min support
Prunes patterns that are supported by less than the min support percentage of the customers.
Window size
The time window within which successive transactions are additionally treated as a single transaction.
Max gap
The max gap parameter causes a customer's sequence not to support a pattern if the transactions containing this pattern are too widely separated in time.
Min gap
The min gap parameter causes a customer's sequence not to support a pattern if the transactions containing this pattern are too near in time.
Positive value
This parameter determines which value of the binominal attributes should be treated as positive. The attributes with this value in an example are considered to be part of that transaction.