
ASSOCIATION RULE MINING

DATA PREPARATION

 

Association rule mining (ARM) requires a transactional dataset. The current dataset is textual, with the columns 'text' and 'abstract', so the first step is to convert the text into transactions. Data preparation loops through each row of the dataset and extracts the unique items from the 'text' column. The text is first converted to lowercase, and punctuation and numbers are removed. Common English stop words are removed as well. Extra whitespace is stripped, and stemming/lemmatization is applied to reduce words to their base forms. Empty strings are dropped, and the remaining items are added to the transaction list. This list is then converted to a transactional dataset suitable for association rule mining.
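The cleaning steps described above can be sketched roughly as follows. This is a minimal Python illustration, not the R code used in this project: the function names, the tiny stop word list, and the sample rows are all assumptions made for the example, and stemming/lemmatization is left out to keep the sketch free of external dependencies.

```python
import re

# Small illustrative stop word list (an assumption; a real list would be much longer).
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "that", "the", "this", "to", "was", "with",
}

def text_to_transaction(text):
    """Turn one document into a list of unique items, in order of first appearance."""
    text = text.lower()
    # Replace everything except letters and whitespace (punctuation, digits) with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    # split() with no arguments collapses extra whitespace and never yields empty strings.
    tokens = text.split()
    items, seen = [], set()
    for tok in tokens:
        if tok in STOP_WORDS or tok in seen:
            continue
        seen.add(tok)
        items.append(tok)
    return items

def build_transactions(rows):
    """Convert every row of text to a transaction, dropping rows that end up empty."""
    transactions = [text_to_transaction(row) for row in rows]
    return [t for t in transactions if t]

# Hypothetical sample rows standing in for the 'text' column.
sample_rows = [
    "The model was trained on 3 datasets, and the results are strong!",
    "Results of the second model: strong results.",
    "12345 ...",
]

transactions = build_transactions(sample_rows)
for t in transactions:
    # One comma-separated line per transaction (basket format).
    print(",".join(t))
```

Each printed line is one transaction in a comma-separated basket layout, which is a common input format for Apriori implementations. A stemming or lemmatization step, if wanted, would slot in just before the stop word check.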

Sample data before data preparation

image.png

Sample data converted to transaction data for association rule mining

Screenshot 2024-02-28 at 2.26.43 AM.png

CODE

 

Code for the ARM implementation in R, using the Apriori algorithm, can be found here.
