

ASSOCIATION RULE MINING

OVERVIEW

Association Rule Mining (ARM) is a data mining technique used to discover interesting relationships or associations among sets of items in large datasets. These relationships are usually expressed as rules, which describe how items tend to co-occur in the data. In the context of text summarization, ARM can be applied to discover associations between words or phrases in a document: the items represent terms, and the rules describe which terms tend to appear together.

Measures in ARM:

Support: The fraction of transactions in the dataset that contain a given itemset. Higher support means the itemset occurs more often. In text summarization, support can represent how frequently a term or phrase appears in the document, so a term with higher support may be more important in the context of that document.
Confidence: It measures the reliability of a rule. It is the conditional probability of finding the consequent in a transaction given that the transaction contains the antecedent. In a text summarization dataset, it indicates how often the consequent term appears in the text units that contain the antecedent term.
Lift: It measures how much more likely the consequent is to be observed in transactions containing the antecedent than would be expected if the two were independent. A lift greater than 1 indicates a positive association, a lift of 1 indicates independence, and a lift below 1 indicates a negative association. In this context, lift can represent the significance of the association between terms in the summarization process: it shows whether the co-occurrence of terms is more frequent than expected by chance.
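
As a rough illustration of the three measures, the sketch below computes them on a small, made-up set of transactions in which each transaction is the set of terms in one sentence. The data and the function names (support, confidence, lift) are assumptions chosen for this example and do not come from the project's dataset.

```python
# Hypothetical toy data: each "transaction" is the set of terms in one sentence.
transactions = [
    {"data", "mining", "rules"},
    {"data", "mining"},
    {"data", "quality"},
    {"mining", "rules"},
    {"text", "summary"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence of the rule divided by the support of the consequent."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

s = support({"data", "mining"}, transactions)        # 2 of 5 sentences
c = confidence({"data"}, {"mining"}, transactions)   # 2 of the 3 "data" sentences
l = lift({"data"}, {"mining"}, transactions)         # (2/3) / (3/5)
print(f"support={s:.2f} confidence={c:.2f} lift={l:.2f}")
# prints: support=0.40 confidence=0.67 lift=1.11
```

Here the lift is slightly above 1, so in this toy data "mining" appears with "data" a little more often than independence would predict.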

Rules: In ARM, rules are statements that assert a relationship between sets of items. They typically have the form "If {A} then {B}", where A is the antecedent and B is the consequent (outcome).

Apriori Algorithm:

The Apriori algorithm is a popular and classic algorithm for Association Rule Mining. It works in two main steps:

Generate frequent itemsets:
- Start with individual items as 1-itemsets.
- Iteratively generate candidate k-itemsets by joining the (k-1)-itemsets.
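- Prune the candidate itemsets that do not meet the minimum support threshold. Because every subset of a frequent itemset must itself be frequent, a candidate with an infrequent (k-1)-subset can be discarded without counting it.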
Generate association rules:
- Create rules from the frequent itemsets.
- Prune rules based on the minimum confidence threshold.
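
The two steps above can be sketched in plain Python as follows. This is a minimal, unoptimized illustration rather than a reference implementation: the function names (apriori, generate_rules), the toy transactions, and the thresholds are all assumptions made for the example, and the join step simply unions pairs of (k-1)-itemsets instead of using the prefix-based join of the classic algorithm.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    n = len(transactions)

    def frequent(candidates):
        # Count each candidate and keep those meeting the support threshold.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        return {c: k / n for c, k in counts.items() if k / n >= min_support}

    # Start with individual items as 1-itemsets.
    items = {item for t in transactions for item in t}
    current = frequent({frozenset([i]) for i in items})
    result = dict(current)
    k = 2
    while current:
        # Join: union pairs of (k-1)-itemsets that yield exactly k items.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must already be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        current = frequent(candidates)
        result.update(current)
        k += 1
    return result

def generate_rules(frequent_itemsets, min_confidence):
    """Return (antecedent, consequent, confidence) for rules above the threshold."""
    rules = []
    for itemset, sup in frequent_itemsets.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # Subsets of a frequent itemset are frequent, so the lookup succeeds.
                conf = sup / frequent_itemsets[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, conf))
    return rules

# Hypothetical toy data: each transaction is the set of terms in one sentence.
transactions = [
    {"data", "mining", "rules"},
    {"data", "mining"},
    {"data", "quality"},
    {"mining", "rules"},
    {"text", "summary"},
]
freq = apriori(transactions, min_support=0.4)
rules = generate_rules(freq, min_confidence=0.7)
for antecedent, consequent, conf in rules:
    print(set(antecedent), "->", set(consequent), f"confidence={conf:.2f}")
```

With these thresholds the candidate {data, mining, rules} is pruned before counting because its subset {data, rules} is not frequent, and the only rule that survives the confidence filter is {rules} -> {mining}.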