FP-Growth (Deprecated)
Synopsis
This operator efficiently calculates all frequent itemsets from the given ExampleSet using the FP-tree data structure. It is compulsory that all attributes of the input ExampleSet should be binominal.
Description
In simple words, frequent itemsets are groups of items that often appear together in the data. It is important to know the basics of market-basket analysis for understanding frequent itemsets.
The market-basket model of data is used to describe a common form of a many-to-many relationship between two kinds of objects. On the one hand, we have items, and on the other we have baskets, also called 'transactions'. The set of items is usually represented as set of attributes. Mostly these attributes are binominal. The transactions are usually each represented as examples of the ExampleSet. When an attribute value is 'true' in an example; it implies that the corresponding item is present in that transaction. Each transaction consists of a set of items (an itemset). Usually it is assumed that the number of items in a transaction is small, much smaller than the total number of items i.e. in most of the examples most of the attribute values are 'false'. The number of transactions is usually assumed to be very large i.e. the number of examples in the ExampleSet is assumed to be large. The frequent-itemsets problem is that of finding sets of items that appear together in at least a threshold ratio of transactions. This threshold is defined by the 'minimum support' criteria. The support of an itemset is the number of times that itemset appears in the ExampleSet divided by the total number of examples. The 'Purchases' data set at "Samples/data/Purchases" in the repository of RapidMiner is an example of how transactions data usually look like.
The discovery of frequent itemsets is often viewed as the discovery of 'association rules', although the latter is a more complex characterization of data, whose discovery depends fundamentally on the discovery of frequent itemsets. Association rules are derived from the frequent itemsets. The FP-Growth operator finds the frequent itemsets and operators like the Create Association Rules operator uses these frequent itemsets for calculating the association rules.
This operator calculates all frequent itemsets from an ExampleSet by building a FP-tree data structure on the transaction data base. This is a very compressed copy of the data which in many cases fits into main memory even for large data bases. All frequent itemsets are derived from this FP-tree. Many other frequent itemset mining algorithms also exist e.g. the Apriori algorithm. A major advantage of FP-Growth compared to Apriori is that it uses only 2 data scans and is therefore often applicable even on large data sets.
Please note that the given ExampleSet should contain only binominal attributes, i.e. nominal attributes with only two different values. If your ExampleSet does not satisfy this condition, you may use appropriate preprocessing operators to transform it into the required form. The discretization operators can be used for changing the value of numerical attributes to nominal attributes. Then the Nominal to Binominal operator can be used for transforming nominal attributes into binominal attributes.
Please note that the frequent itemsets are mined for the positive entries in your ExampleSet, i.e. for those nominal values which are defined as positive in your ExampleSet. If your data does not specify the positive entries correctly, you may set them using thepositive valueparameter. This only works if all your attributes contain this value.
这个操作符有两种基本的工作模式:
- finding at least the specified number of itemsets with highest support without taking the 'min support' into account. This mode is available when thefind min number of itemsetsparameter is set to true. Then this operator finds the number of itemsets specified in themin number of itemsetsparameter. Themin supportparameter is ignored in this case.
- finding all itemsets with a support larger than the specified minimum support. The minimum support is specified through themin supportparameter. This mode is available when thefind min number of itemsetsparameter is set to false.
Input
example set
This input port expects an ExampleSet. Please make sure that all attributes of the ExampleSet are binominal.
Output
example set
The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.
频繁的sets
The frequent itemsets are delivered through this port. Operators like the Create Association Rules operator can use these frequent itemsets to generate association rules.
Parameters
Find min number of itemsets
If this parameter is set to true, this operator finds at least the specified number of itemsets with highest support without taking themin supportparameter into account. This operator finds (at least) the number of itemsets specified in themin number of itemsetsparameter. Themin supportparameter is ignored to some extent in this case. The minimal support is decreased automatically until the specified minimum number of frequent itemsets is found. The defined minimal support is lowered by 20 percent each time.
Min number of itemsets
This parameter is only available when thefind min number of itemsetsparameter is set true. This parameter specifies the minimum number of itemsets which should be mined.
Max number of retries
This parameter is only available when thefind min number of itemsetsparameter is set true. This parameter determines how many times the operator should lower the minimal support to find the minimal number of item sets. Each time the minimal support is lowered by 20 percent.
Positive value
This parameter determines which value of the binominal attributes should be treated as positive. The attributes with that value are considered as part of a transaction. If left blank, the ExampleSet determines which value is used.
Min support
The minimum support criteria is specified by this parameter. Please study the description of this operator for more information about minimum support.
Max items
This parameter specifies the upper bound for the length of the itemsets i.e. the maximum number of items in an itemset. If set to -1, this parameter imposes no upper bound.
Must contain
This parameter specifies the items that should be part of frequent itemsets. It is specified through a regular expression. If there is no specific item that you want to have in the frequent itemset, you can leave this blank.