
CLUSTERING


Sample data for clustering after data preparation

DATA PREPARATION

 

Both k-means and hierarchical clustering require unlabelled numeric data. Of the three available datasets, the one consisting of research documents is chosen for clustering. Because this dataset is mostly textual, numeric features derived from the text, such as title_length and abstract_length, are used as the clustering inputs. Samples of the data before and after preparation are shown in the accompanying images.
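As a rough illustration of this preparation step, the sketch below derives title_length and abstract_length from text fields. The records, the "title" and "abstract" field names, and the choice of character counts are assumptions made for the example, since the dataset's actual columns are not listed here. The standardization step is also an assumption: it is a common precaution for distance-based clustering, not something this page states was done.

```python
from statistics import mean, pstdev

# Hypothetical records; the real dataset's field names may differ.
documents = [
    {"title": "Clustering short texts",
     "abstract": "We compare distance measures for grouping short documents."},
    {"title": "A survey of topic models for scientific literature",
     "abstract": "Topic models summarize large collections of papers. "
                 "This survey reviews common variants and their evaluation."},
    {"title": "Graph embeddings",
     "abstract": "A brief note on node representations."},
]

def length_features(docs):
    """Return one [title_length, abstract_length] row per document (character counts)."""
    return [[len(d["title"]), len(d["abstract"])] for d in docs]

def standardize(rows):
    """Scale each column to zero mean and unit variance so no feature dominates distances."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    # Fall back to 1.0 for a constant column to avoid dividing by zero.
    stds = [pstdev(c) or 1.0 for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

features = length_features(documents)  # unlabelled numeric data
scaled = standardize(features)         # optional scaling before clustering
```

Only the numeric columns are passed on to clustering; the text itself and any labels are left out.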

Sample data before data preparation


Visualization of sample data

CODE

 

Code for k-means clustering in Python can be found here.
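For readers who want a feel for what k-means does with such features, here is a minimal standard-library sketch of the usual assign-and-update loop. It is not the linked project code. The sample points are hypothetical [title_length, abstract_length] values, and k=2 is an assumption made for the example. In practice a library implementation such as scikit-learn's KMeans would typically be used instead.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means: returns (labels, centroids) for a list of numeric rows."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the loop has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical [title_length, abstract_length] rows forming two loose groups.
points = [[40, 800], [45, 820], [42, 790], [95, 1500], [100, 1550], [98, 1480]]
labels, centroids = kmeans(points, k=2)
```

Because abstract_length has a much larger range than title_length, it dominates the distances in this unscaled example; scaling the features first would give both a comparable influence.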


Code for hierarchical clustering in R can be found here.
