Understanding Association Rule Mining
Key Terms and Metrics
Association rule mining is a technique used in data mining to uncover relationships between variables in large datasets. It involves identifying patterns or associations between items in a transactional database. Association rule mining has various applications, such as market basket analysis, customer segmentation, and web mining.
Before applying association rule mining, it is important to be familiar with some key terms and metrics. This article introduces antecedents, consequents, support, confidence, lift, leverage, conviction, and Zhang’s metric.
Antecedents and Consequents
In association rule mining, the items in the [if] part of a rule are called antecedents, and the items in the [then] part are called consequents. For example, consider the rule [If a customer buys bread, then the customer is likely to buy butter.] In this rule, [bread] is the antecedent, and [butter] is the consequent.
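A rule can be modeled as a pair of itemsets. The following minimal sketch assumes Python and uses a hypothetical `Rule` class that is not part of any library. It also enforces the usual convention that an item cannot appear on both sides of the same rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A hypothetical rule type: if `antecedent` occurs, expect `consequent`."""

    antecedent: frozenset  # items in the "if" part
    consequent: frozenset  # items in the "then" part

    def __post_init__(self):
        # Conventionally, the two sides of a rule share no items.
        if self.antecedent & self.consequent:
            raise ValueError("antecedent and consequent must be disjoint")

    def __str__(self):
        # Sort items so the printed form is deterministic.
        left = ", ".join(sorted(self.antecedent))
        right = ", ".join(sorted(self.consequent))
        return f"If {{{left}}}, then {{{right}}}"


rule = Rule(frozenset({"bread"}), frozenset({"butter"}))
print(rule)  # If {bread}, then {butter}
```

Using `frozenset` makes each side immutable and hashable, so rules can be stored in sets or used as dictionary keys.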
Support
Support measures how frequently an itemset occurs in the dataset: it is the proportion of transactions in the database that contain the itemset. For example, if a dataset has 100 transactions and the itemset {bread, butter} occurs in 30 of them, then the support of {bread, butter} is 30%.
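Support can be computed by counting the transactions that contain every item of the itemset. In this minimal sketch, transactions are assumed to be Python sets, and the toy dataset is hypothetical, constructed only to reproduce the counts in the example above.

```python
# Hypothetical toy data matching the example: 100 transactions,
# 30 of which contain both bread and butter. (The list repeats
# references to the same sets, which is fine because they are only read.)
TRANSACTIONS = (
    [{"bread", "butter"}] * 30
    + [{"bread"}] * 20
    + [{"milk"}] * 50
)


def support(itemset, transactions=TRANSACTIONS):
    """Return the fraction of transactions containing every item in `itemset`."""
    items = set(itemset)
    matches = sum(1 for t in transactions if items <= t)  # subset test
    return matches / len(transactions)


print(support({"bread", "butter"}))  # 0.3
print(support({"bread"}))            # 0.5
```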
Confidence
Confidence measures the reliability of an association rule. It is the proportion of transactions containing the antecedent that also contain the consequent, which equals the support of the combined itemset divided by the support of the antecedent. For example, if the antecedent [bread] appears in 50 transactions, and [butter] appears in 30 of these, then the confidence of the rule [If a customer buys bread, then the customer is likely to buy butter] is 60% (i.e., 30/50).
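The 30/50 calculation can be sketched directly from transaction counts. This assumes Python sets as transactions and a hypothetical toy dataset in which bread appears in 50 of 100 transactions and butter in 30 of those.

```python
# Hypothetical toy data: bread in 50 of 100 transactions,
# butter in 30 of those 50 (and never without bread).
TRANSACTIONS = (
    [{"bread", "butter"}] * 30
    + [{"bread"}] * 20
    + [{"milk"}] * 50
)


def count(itemset, transactions=TRANSACTIONS):
    """Number of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def confidence(antecedent, consequent, transactions=TRANSACTIONS):
    """Share of antecedent transactions that also contain the consequent.

    Raises ZeroDivisionError if the antecedent never occurs.
    """
    both = count(set(antecedent) | set(consequent), transactions)
    return both / count(antecedent, transactions)


print(confidence({"bread"}, {"butter"}))  # 0.6  (30 / 50)
print(confidence({"butter"}, {"bread"}))  # 1.0  (30 / 30)
```

Confidence is not symmetric: the reversed rule scores 100% on this data because every transaction with butter also contains bread.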
Lift
Lift measures the strength of the association between the antecedent and the consequent. It is the ratio of the observed support of the combined itemset to the support expected if the antecedent and the consequent were independent, that is, the product of their individual supports. Equivalently, it is the rule's confidence divided by the support of the consequent. A lift of 1 indicates that the antecedent and the consequent are independent, a value greater than 1 indicates a positive association, and a value less than 1 indicates a negative association.
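A minimal sketch of lift as observed support over expected support, assuming Python sets as transactions. The toy dataset is hypothetical: bread appears in 50 of 100 transactions, butter in 30 (always together with bread), and milk in the remaining 50.

```python
# Hypothetical toy data (see the lead-in for its composition).
TRANSACTIONS = (
    [{"bread", "butter"}] * 30
    + [{"bread"}] * 20
    + [{"milk"}] * 50
)


def support(itemset, transactions=TRANSACTIONS):
    """Fraction of transactions containing every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def lift(antecedent, consequent, transactions=TRANSACTIONS):
    """Observed joint support divided by the support expected under independence."""
    observed = support(set(antecedent) | set(consequent), transactions)
    expected = support(antecedent, transactions) * support(consequent, transactions)
    return observed / expected


print(lift({"bread"}, {"butter"}))  # 2.0: 0.3 / (0.5 * 0.3), positive association
print(lift({"bread"}, {"milk"}))    # 0.0: bread and milk never co-occur
```

Unlike confidence, lift is symmetric, so swapping the antecedent and consequent gives the same value.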
Leverage
Leverage measures how far the observed frequency of the itemset departs from what independence would predict. It is the observed support of the combined itemset minus its expected support if the antecedent and the consequent were independent (the product of their individual supports). A leverage of 0 indicates independence, a positive value indicates a positive association, and a negative value indicates a negative association.
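Leverage uses the same two quantities as lift but takes their difference instead of their ratio. This minimal sketch assumes Python sets as transactions and a hypothetical toy dataset: bread in 50 of 100 transactions, butter in 30 (always with bread), and milk in the other 50.

```python
# Hypothetical toy data (see the lead-in for its composition).
TRANSACTIONS = (
    [{"bread", "butter"}] * 30
    + [{"bread"}] * 20
    + [{"milk"}] * 50
)


def support(itemset, transactions=TRANSACTIONS):
    """Fraction of transactions containing every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def leverage(antecedent, consequent, transactions=TRANSACTIONS):
    """Observed joint support minus the support expected under independence."""
    observed = support(set(antecedent) | set(consequent), transactions)
    expected = support(antecedent, transactions) * support(consequent, transactions)
    return observed - expected


print(leverage({"bread"}, {"butter"}))  # 0.15: 0.3 - 0.5 * 0.3
print(leverage({"bread"}, {"milk"}))    # -0.25: 0.0 - 0.5 * 0.5
```

Because leverage is a difference of supports, it stays on the same scale as support, which makes it easy to compare against a minimum-support threshold.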
Conviction
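Conviction measures the degree of dependence between the antecedent and the consequent by looking at how often the rule is wrong, that is, how often the antecedent occurs without the consequent. It is the ratio of how often this would be expected if the antecedent and the consequent were independent to how often it is actually observed, which simplifies to (1 - support of the consequent) / (1 - confidence of the rule). A conviction of 1 indicates independence, a value greater than 1 indicates a positive association, and a value less than 1 indicates a negative association. A rule with 100% confidence is never contradicted, so its conviction is infinite.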
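A minimal sketch of conviction using the formula (1 - support of consequent) / (1 - confidence), assuming Python sets as transactions. The toy dataset is hypothetical: bread in 50 of 100 transactions, butter in 30 (always with bread), and milk in the other 50. Returning `math.inf` for a rule with 100% confidence is a design choice of this sketch.

```python
import math

# Hypothetical toy data (see the lead-in for its composition).
TRANSACTIONS = (
    [{"bread", "butter"}] * 30
    + [{"bread"}] * 20
    + [{"milk"}] * 50
)


def support(itemset, transactions=TRANSACTIONS):
    """Fraction of transactions containing every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(antecedent, consequent, transactions=TRANSACTIONS):
    """Support of the combined itemset divided by support of the antecedent."""
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / support(antecedent, transactions)


def conviction(antecedent, consequent, transactions=TRANSACTIONS):
    """(1 - support(consequent)) / (1 - confidence), or infinity if confidence is 1."""
    conf = confidence(antecedent, consequent, transactions)
    if conf >= 1.0:
        # The antecedent never occurs without the consequent,
        # so the rule is never contradicted.
        return math.inf
    return (1 - support(consequent, transactions)) / (1 - conf)


print(conviction({"bread"}, {"butter"}))  # about 1.75, i.e. 0.7 / 0.4
print(conviction({"butter"}, {"bread"}))  # inf
print(conviction({"bread"}, {"milk"}))    # 0.5, i.e. 0.5 / 1.0
```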
Zhang’s Metric
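Zhang’s metric measures both association and dissociation between the antecedent and the consequent. It compares the confidence of the rule with the proportion of transactions lacking the antecedent that still contain the consequent: it is the difference between these two values divided by the larger of them. Its value ranges from -1 to 1: values near 1 indicate a strong positive association, values near -1 a strong negative association, and 0 indicates independence.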
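Zhang’s metric can be computed from three supports. This minimal sketch assumes Python sets as transactions and uses the confidence-based form (confidence of the rule minus confidence of the consequent when the antecedent is absent, divided by the larger of the two). Both toy datasets are hypothetical: the main one has bread in 50 of 100 transactions, butter in 30 (always with bread), and milk in the other 50; the small `independent` list is built so that its two items are independent.

```python
# Hypothetical toy data (see the lead-in for its composition).
TRANSACTIONS = (
    [{"bread", "butter"}] * 30
    + [{"bread"}] * 20
    + [{"milk"}] * 50
)


def support(itemset, transactions=TRANSACTIONS):
    """Fraction of transactions containing every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def zhangs_metric(antecedent, consequent, transactions=TRANSACTIONS):
    """Zhang's metric in [-1, 1]; 0 indicates independence.

    Assumes the antecedent occurs in some but not all transactions and the
    consequent occurs at least once, so no denominator is zero.
    """
    s_a = support(antecedent, transactions)
    s_c = support(consequent, transactions)
    s_ac = support(set(antecedent) | set(consequent), transactions)
    conf_with = s_ac / s_a                   # confidence of the rule
    conf_without = (s_c - s_ac) / (1 - s_a)  # consequent rate without the antecedent
    return (conf_with - conf_without) / max(conf_with, conf_without)


print(zhangs_metric({"bread"}, {"butter"}))  # 1.0: butter never appears without bread
print(zhangs_metric({"bread"}, {"milk"}))    # -1.0: milk never appears with bread
independent = [{"a", "b"}, {"a"}, {"b"}, set()]
print(zhangs_metric({"a"}, {"b"}, independent))  # 0.0
```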
In conclusion, association rule mining is a powerful technique for discovering hidden patterns in large datasets. The metrics discussed in this article offer complementary ways to measure how frequent, reliable, and strong those patterns are.