

ASSOCIATION RULE MINING
OVERVIEW
Association Rule Mining (ARM) is a data mining technique used to discover interesting relationships, or associations, among items in large datasets. These relationships are represented as rules that capture co-occurrence patterns in the data. In the context of text summarization, the items are terms or phrases in a document, and the rules indicate which terms tend to occur together.
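As a minimal sketch of how a document can be cast into ARM's transaction model, the snippet below treats each sentence as a transaction and its distinct terms as items. The tokenizer and stop-word list are illustrative assumptions, not part of any standard preprocessing pipeline.

```python
# Sketch: representing a document as "transactions" for ARM, where each
# sentence is a transaction and its distinct terms are the items.
# The tokenizer and stop-word list here are illustrative assumptions.

STOP_WORDS = {"the", "a", "of", "in", "is", "and", "to"}

def sentences_to_transactions(document):
    """Split a document into sentences, then each sentence into a set of terms."""
    transactions = []
    for sentence in document.split("."):
        terms = {w.strip(",;").lower() for w in sentence.split()}
        terms -= STOP_WORDS
        if terms:
            transactions.append(terms)
    return transactions

doc = "Data mining finds patterns. Association mining finds rules in data."
print(sentences_to_transactions(doc))
```

Each resulting set of terms can then be fed to an ARM algorithm exactly as a market-basket transaction would be.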



Measures in ARM:
- Support: It measures the frequency of occurrence of an itemset in the dataset. Higher support indicates a stronger presence of the itemset. In text summarization, support can represent the frequency of a term or phrase in the document; higher support may indicate the importance of a term in the context of the document.
- Confidence: It measures the reliability of a rule. It is the conditional probability of finding the consequent in a transaction given that the transaction contains the antecedent. In a text summarization dataset, it may indicate how often a certain term contributes to the overall meaning of the document.
- Lift: It measures how much more likely the consequent is to be observed in transactions containing the antecedent than would be expected by chance. A lift value greater than 1 indicates a positive correlation. In this context, lift can represent the significance of the association between terms in the summarization process, indicating whether their co-occurrence is more meaningful than chance.
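The three measures above can be computed directly from their definitions. The following sketch does so for a rule {A} -> {B} over a toy set of transactions; the transactions themselves are illustrative, not from a real dataset.

```python
# Sketch: computing support, confidence, and lift for a rule {A} -> {B}
# over a toy list of transactions (sets of items). Data is illustrative.

transactions = [
    {"data", "mining"},
    {"data", "mining", "rules"},
    {"data", "rules"},
    {"mining", "rules"},
    {"data", "mining"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """P(consequent | antecedent) = support(A u B) / support(A)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """confidence(A -> B) / support(B); > 1 means positive correlation."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

A, B = {"data"}, {"mining"}
print(support(A | B, transactions))   # 3 of 5 transactions contain both
print(confidence(A, B, transactions))
print(lift(A, B, transactions))
```

Here the lift comes out slightly below 1, showing that "data" and "mining" co-occur a little less often than independence would predict in this toy data.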
Rules: In ARM, rules are statements that assert a relationship between sets of items. They typically have the form "If {A} then {B}", where A is the antecedent and B is the consequent (outcome).
Apriori Algorithm:
The Apriori algorithm is a popular and classic algorithm for Association Rule Mining. It works in two main steps:
- Generate frequent itemsets:
  - Start with individual items as 1-itemsets.
  - Iteratively generate candidate k-itemsets by joining the frequent (k-1)-itemsets.
  - Prune the candidate itemsets that do not meet the minimum support threshold.
- Generate association rules:
  - Create rules from the frequent itemsets.
  - Prune rules that do not meet the minimum confidence threshold.
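The two steps above can be sketched as a minimal pure-Python Apriori, assuming transactions are sets of items; the thresholds and data are illustrative, and a production implementation would add optimizations such as candidate pruning via subset checks.

```python
from itertools import combinations

# Minimal sketch of the two Apriori steps: generate frequent itemsets,
# then derive association rules. Transactions and thresholds are illustrative.

def apriori(transactions, min_support):
    """Return {frozenset: support} for all itemsets meeting min_support."""
    n = len(transactions)

    def supp(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Step 1a: frequent 1-itemsets.
    items = {item for t in transactions for item in t}
    frequent = {frozenset([i]) for i in items if supp(frozenset([i])) >= min_support}
    all_frequent = {fs: supp(fs) for fs in frequent}

    k = 2
    while frequent:
        # Step 1b: join frequent (k-1)-itemsets into candidate k-itemsets,
        # then prune candidates below the minimum support threshold.
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        frequent = {c for c in candidates if supp(c) >= min_support}
        all_frequent.update({fs: supp(fs) for fs in frequent})
        k += 1
    return all_frequent

def rules(frequent_itemsets, min_confidence):
    """Step 2: rules A -> B from frequent itemsets, pruned by confidence."""
    out = []
    for itemset, supp_ab in frequent_itemsets.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # Subsets of a frequent itemset are frequent (downward closure),
                # so their support is already in the dictionary.
                conf = supp_ab / frequent_itemsets[antecedent]
                if conf >= min_confidence:
                    out.append((set(antecedent), set(itemset - antecedent), conf))
    return out

transactions = [{"data", "mining"}, {"data", "mining", "rules"},
                {"data", "rules"}, {"mining", "rules"}, {"data", "mining"}]
freq = apriori(transactions, min_support=0.6)
print(rules(freq, min_confidence=0.7))
```

On this toy data only {data, mining} survives the support threshold, yielding the two rules {data} -> {mining} and {mining} -> {data}.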
