Extracting Frequent Itemsets Using the FP-Growth Algorithm

A Demonstration of Association Rule Mining

Mansoor Aldosari
2 min read · Apr 24, 2023
Photo by Christina Winter on Unsplash

The following code is an example from the mlxtend library user guide that demonstrates how to use the frequent_patterns module to extract frequent itemsets from a dataset using the FP-Growth algorithm in Python. The mlxtend library is a popular tool for performing association rule mining and other data analysis tasks.

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, fpmax, fpgrowth

# Each inner list is one transaction (a basket of items).
dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
           ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
           ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]

# One-hot encode the transactions: one boolean column per distinct item.
te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)

# Keep the itemsets that occur in at least 60% of the transactions.
frequent_itemsets = fpgrowth(df, min_support=0.6, use_colnames=True)
### alternatively:
#frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True)
#frequent_itemsets = fpmax(df, min_support=0.6, use_colnames=True)

frequent_itemsets

The code computes frequent itemsets using the FP-Growth algorithm from the mlxtend library. It takes a dataset containing lists of items, transforms it into a one-hot encoded format using the TransactionEncoder class from mlxtend, and then applies the fpgrowth function to extract frequent itemsets.
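To make the encoding step concrete, here is a minimal pure-Python sketch of the kind of table the encoder produces: one column per distinct item and one boolean row per transaction. The helper name one_hot_encode is hypothetical and not part of mlxtend, and the alphabetical column order is an assumption about how TransactionEncoder orders its columns_ attribute.

```python
def one_hot_encode(transactions):
    """Return (columns, rows): sorted distinct items and one boolean row per transaction.

    Hypothetical helper for illustration; not part of mlxtend.
    """
    columns = sorted({item for t in transactions for item in t})
    rows = []
    for t in transactions:
        present = set(t)  # duplicates within a transaction collapse here
        rows.append([col in present for col in columns])
    return columns, rows


dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
           ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
           ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]

columns, rows = one_hot_encode(dataset)
print(columns)
for row in rows:
    print(row)
```

Note that 'Onion' appears twice in the last transaction but still yields a single True value: the encoding records presence, not quantity.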

The min_support parameter specifies the minimum support threshold for an itemset to be considered frequent. Here it is set to 0.6, which means that an itemset must appear in at least 60% of the transactions (3 of the 5 in this dataset) to be considered frequent.
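The support of an itemset can also be checked by hand. The sketch below counts the transactions that contain every item of a candidate itemset and then enumerates all candidates by brute force. This is not the FP-Growth algorithm, only an illustration of what the threshold means; the function names support and frequent_itemsets_bruteforce are hypothetical, and the exhaustive search is practical only for a tiny dataset like this one.

```python
from itertools import combinations

dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
           ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
           ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def frequent_itemsets_bruteforce(transactions, min_support):
    """Test every possible itemset against the threshold.

    Exponential in the number of distinct items; for illustration only.
    """
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                result[frozenset(candidate)] = s
    return result


frequent = frequent_itemsets_bruteforce(dataset, min_support=0.6)
# Print smaller itemsets first, then alphabetically.
for itemset, s in sorted(frequent.items(),
                         key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

For example, 'Eggs' occurs in four of the five transactions (support 0.8) and is kept, while 'Nutmeg' occurs in only two (support 0.4) and is dropped.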

The resulting frequent_itemsets object is a pandas DataFrame with two columns: support and itemsets. Because the use_colnames parameter is set to True, each itemset is reported using the item names (the column names of the one-hot encoded DataFrame) rather than the integer column indices.

Note that the code also includes, commented out, alternative calls that use the Apriori and FPMax algorithms from the mlxtend library. Apriori finds the same frequent itemsets as FP-Growth, whereas FPMax returns only the maximal ones, so the choice depends on the specific requirements of the analysis.
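To show how a maximal-itemset result differs from the full list, the sketch below finds the frequent itemsets of the example dataset by brute force and then keeps only those with no frequent proper superset. This is an illustration of the definition in plain Python, not of how FPMax works internally; the variable names are hypothetical.

```python
from itertools import combinations

dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
           ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
           ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]

min_support = 0.6
n = len(dataset)
items = sorted({item for t in dataset for item in t})

# Brute-force the frequent itemsets (fine for 11 distinct items).
frequent = [
    frozenset(candidate)
    for size in range(1, len(items) + 1)
    for candidate in combinations(items, size)
    if sum(set(candidate) <= set(t) for t in dataset) / n >= min_support
]

# An itemset is maximal if no other frequent itemset strictly contains it.
maximal = [s for s in frequent if not any(s < other for other in frequent)]

print(len(frequent), "frequent itemsets")
for s in maximal:
    print(sorted(s))
```

On this dataset only three of the frequent itemsets are maximal; for instance, {'Eggs', 'Kidney Beans'} is frequent but is dropped because it is contained in the frequent itemset {'Eggs', 'Kidney Beans', 'Onion'}.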
