
CLUSTERING

Sample data for clustering after data preparation
DATA PREPARATION
K-means and hierarchical clustering both require unlabelled numeric data. Of the three available datasets, the one consisting of research documents is used for clustering. Because this dataset is mostly text, numeric features such as title_length and abstract_length are used as the clustering inputs. A sample of the data used for clustering is shown on the right and in the image below.
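As a rough illustration of how such length features might be derived from text, here is a minimal sketch in plain Python. The records, the field names `title` and `abstract`, and the choice to measure length in characters are all assumptions made for this example; the actual dataset may use different columns or count words instead. The standardization step is also an assumption, added because distance-based methods such as k-means are sensitive to feature scale.

```python
from statistics import mean, pstdev

# Hypothetical records; the real dataset's field names and contents may differ.
documents = [
    {"title": "Graph neural networks",
     "abstract": "We survey message passing models for graph data."},
    {"title": "A study of topic models",
     "abstract": "Topic models summarise large text collections into themes."},
    {"title": "Clustering short texts",
     "abstract": "Short texts are sparse, which makes clustering them difficult."},
]


def length_features(doc):
    """Return [title_length, abstract_length], measured in characters (an assumption)."""
    return [len(doc["title"]), len(doc["abstract"])]


def standardize(rows):
    """Scale each column to zero mean and unit variance (optional, assumed step)."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Fall back to 1.0 for a constant column to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


features = [length_features(doc) for doc in documents]
scaled = standardize(features)

for doc, row in zip(documents, features):
    print(doc["title"], row)
```

The `features` list holds one unlabelled numeric row per document, which is the shape of input that both k-means and hierarchical clustering expect.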
Sample data before data preparation


Visualization of sample data
CODE
Code for K-Means Clustering in Python can be found here.
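The linked code is not reproduced here. For readers who want a feel for what k-means does with two length features, the following is a minimal plain-Python sketch of the standard assign-then-update loop (Lloyd's algorithm). It is not the linked implementation: the function names, the sample points, and the choice of k = 2 are hypothetical, and in practice a library implementation such as scikit-learn's `KMeans` would normally be used.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Cluster points into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct points chosen at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the loop has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical [title_length, abstract_length] rows, not taken from the dataset.
points = [[20, 300], [25, 320], [22, 310], [90, 1500], [95, 1450], [88, 1520]]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

The cluster numbers themselves are arbitrary; only the grouping matters. With raw lengths like these, the larger feature dominates the distance, which is one reason features are often standardized before clustering.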
Code for Hierarchical Clustering in R can be found here.
