
VISUALIZATIONS

01

HISTOGRAM OF TEXT LENGTHS - JOURNALS DATASET

The histogram on the right displays the frequency of text lengths in the journals dataset. We can see that text length is between 100 and 200, and most of the records have a text length between 700 and 1000.
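The counts behind a histogram like this one come from grouping records into length bins. The following is a minimal sketch, assuming length is measured in characters; the bin width of 100 and the sample texts are hypothetical and not taken from the journals dataset.

```python
from collections import Counter

BIN_WIDTH = 100  # assumed bin width, chosen only for illustration


def length_histogram(texts, bin_width=BIN_WIDTH):
    """Count how many texts fall into each length bin.

    Returns a dict mapping each bin's lower edge to its record count,
    sorted by lower edge.
    """
    counts = Counter((len(t) // bin_width) * bin_width for t in texts)
    return dict(sorted(counts.items()))


# Hypothetical sample records, not from the actual dataset.
sample_texts = ["a" * 120, "b" * 180, "c" * 750, "d" * 820, "e" * 990]

hist = length_histogram(sample_texts)
for lower, count in hist.items():
    print(f"{lower}-{lower + BIN_WIDTH - 1}: {count} record(s)")
```

The resulting dictionary can be handed to any plotting library as bar heights; the same function applies to the summary column.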

02

HISTOGRAM OF SUMMARY LENGTHS FOR JOURNALS DATASET

This histogram shows the frequency of summary lengths in the journals dataset. We can see that summaries range from 30 to 200 characters, and the typical summary length is between 60 and 100 characters.

viz3.png
viz4.png

03

BAR CHART DEPICTING TOP 20 WORDS IN NEWS ARTICLE DATASET

The bar graph shows the top 20 words used in the second dataset, which contains news articles obtained from an API. From the visualization we can see that words like "the", "chars", and "to" are the most frequent words in the dataset.
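Seeing "the" and "to" at the top suggests the counts were taken before removing stop words. The sketch below shows one way such a ranking might be computed, with and without a stop-word filter. The tokenization regex, the tiny stop-word list, and the sample sentences are all assumptions for illustration, not the method or data used for the chart.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = frozenset({"the", "to", "a", "an", "of", "and", "in", "is"})


def top_words(texts, n=20, stop_words=frozenset()):
    """Return the n most frequent lowercase words across all texts."""
    counter = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counter.update(w for w in words if w not in stop_words)
    return counter.most_common(n)


# Hypothetical sample articles, not from the news dataset.
sample_articles = [
    "The model is shown to the public",
    "The team plans to release the model",
]

raw = top_words(sample_articles, n=3)
filtered = top_words(sample_articles, n=3, stop_words=STOP_WORDS)
print("without filtering:", raw)
print("with filtering:   ", filtered)
```

Without the filter, function words dominate the ranking; with it, content words such as "model" rise to the top.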

viz2.png
viz5.png

04

HISTOGRAM OF TEXT LENGTH FOR AMAZON REVIEWS DATASET

The histogram on the left shows the text lengths for the Amazon reviews dataset. We can see that the reviews range from 0 to 1000 in length, and most reviews are about 200 words long.

05

WORD CLOUD FOR TEXT - COMBINED DATASET

This is a word cloud for the text in the combined dataset, which merges the journals, news articles, and reviews datasets. We can see that words like "model", "show", "one", and "using" are prominent in the dataset.

Screenshot 2024-02-06 at 6.24.20 PM.png
Screenshot 2024-02-06 at 6.24.32 PM.png

06

WORD CLOUD FOR SUMMARY - COMBINED DATASET

The word cloud on the left shows the most prominent words in the summary column of the combined dataset. We can see that words like "model", "system", "good", and "great" are among the most frequent in the summaries.

07

HISTOGRAM OF TEXT LENGTH BY SOURCE - COMBINED DATASET

The histogram on the right shows the frequency of text lengths in the combined dataset, separated by source dataset. We can see how text length varies across the sources: journals account for the longer texts, while reviews are mostly shorter.
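A per-source comparison like this starts by grouping lengths by their source label. The sketch below assumes each record is a dictionary with hypothetical keys "source", "text", and "summary"; the sample records are invented for illustration only.

```python
from collections import defaultdict
from statistics import median


def lengths_by_source(records, field):
    """Group the character length of record[field] by record['source']."""
    groups = defaultdict(list)
    for record in records:
        groups[record["source"]].append(len(record[field]))
    return dict(groups)


# Hypothetical records; the key names are assumptions, not the real schema.
sample_records = [
    {"source": "journals", "text": "j" * 900, "summary": "s" * 120},
    {"source": "journals", "text": "j" * 700, "summary": "s" * 80},
    {"source": "news", "text": "n" * 400, "summary": "s" * 60},
    {"source": "reviews", "text": "r" * 150, "summary": "s" * 20},
    {"source": "reviews", "text": "r" * 50, "summary": "s" * 10},
]

text_lengths = lengths_by_source(sample_records, "text")
median_text_length = {src: median(vals) for src, vals in text_lengths.items()}
for src, value in median_text_length.items():
    print(f"{src}: median text length {value}")
```

Passing "summary" instead of "text" as the field gives the grouping for a summary-length comparison.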

Screenshot 2024-02-06 at 6.24.58 PM.png
Screenshot 2024-02-07 at 12.30.39 AM.png

08

HISTOGRAM OF SUMMARY LENGTH BY SOURCE - COMBINED DATASET

The histogram on the left shows the frequency of summary lengths in the combined dataset, separated by source dataset. We can see how summary length varies across the sources: the journals dataset accounts for most of the summaries and has longer summaries than the other datasets, while reviews have shorter summaries.

09

PAIR PLOT FOR MULTIVARIATE ANALYSIS

The pair plot on the right shows the multivariate analysis for the 3 datasets. The variables compared are summary length and text length.

Screenshot 2024-02-07 at 12.33.37 AM.png
bargraph.png

10

SOURCE DISTRIBUTION IN COMBINED DATASET

The bar graph on the left shows how the records in the combined dataset are distributed across the source datasets. We can see that the majority of the records come from journals, and news articles contribute the fewest.
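The bar heights in a chart like this are simply record counts per source. As a rough sketch, assuming each record carries a source label (the labels and counts below are made up, not the real dataset sizes):

```python
from collections import Counter


def source_shares(sources):
    """Return {source: (count, share_of_total)}, largest source first."""
    counts = Counter(sources)
    total = sum(counts.values())
    return {src: (n, n / total) for src, n in counts.most_common()}


# Hypothetical source labels, not the actual record counts.
sample_sources = ["journals"] * 5 + ["reviews"] * 3 + ["news"] * 2

shares = source_shares(sample_sources)
for src, (n, share) in shares.items():
    print(f"{src}: {n} records ({share:.0%})")
```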
