

VISUALIZATIONS
01
HISTOGRAM OF TEXT LENGTHS - JOURNALS DATASET
The histogram on the right displays the frequency of text lengths in the journals dataset. Text lengths range between 100 and 200, and most records have a text length between 700 and 1000.
02
HISTOGRAM OF SUMMARY LENGTHS FOR JOURNALS DATASET
This histogram shows the frequency of summary lengths in the journals dataset. Summaries range from 30 to 200 characters, and on average the summary length falls between 60 and 100 characters.
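A minimal sketch of how such a length histogram could be computed, assuming length is measured in characters and using fixed-width bins. The function name `length_histogram`, the bin width, and the sample records are hypothetical and are not taken from the project code.

```python
from collections import Counter


def length_histogram(texts, bin_width=100):
    """Count how many texts fall into each fixed-width length bin."""
    # Each key is the lower edge of a bin, e.g. 700 covers lengths 700-799.
    bins = Counter((len(t) // bin_width) * bin_width for t in texts)
    return dict(sorted(bins.items()))


# Hypothetical records standing in for the journals dataset.
sample_texts = ["a" * 120, "b" * 750, "c" * 790, "d" * 980]
hist = length_histogram(sample_texts, bin_width=100)

# Print a simple text rendering of the histogram, one row per bin.
for lower_edge, count in hist.items():
    print(f"{lower_edge}-{lower_edge + 99}: {'#' * count}")
```

The same function applies to the summary column; only the input list changes.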


03
BAR CHART DEPICTING TOP 20 WORDS IN NEWS ARTICLE DATASET
The bar graph shows the top 20 words in the second dataset, which contains news articles obtained from an API. Words such as "the", "chars", and "to" are among the most frequent in the dataset.
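The word ranking behind a chart like this can be sketched as follows. This assumes simple lowercase tokenization on letters and apostrophes; the function name `top_words`, the optional `stop_words` parameter, and the sample articles are hypothetical.

```python
import re
from collections import Counter


def top_words(texts, n=20, stop_words=frozenset()):
    """Return the n most frequent words across all texts."""
    counts = Counter()
    for text in texts:
        # Lowercase, then keep runs of letters and apostrophes as tokens.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(tok for tok in tokens if tok not in stop_words)
    return counts.most_common(n)


# Hypothetical articles standing in for the news dataset.
sample_articles = [
    "The market rallied to a new high",
    "The team beat the rivals to the final",
]
top = top_words(sample_articles, n=2)
print(top)
```

Passing a stop-word set such as `{"the", "to"}` would remove function words from the ranking, which may make the chart more informative.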


04
HISTOGRAM OF TEXT LENGTH FOR AMAZON REVIEWS DATASET
The histogram on the left shows the text lengths for the Amazon reviews dataset. Review lengths range from 0 to 1000, and most reviews are about 200 words long.
05
WORD CLOUD FOR TEXT - COMBINED DATASET
This is a word cloud for the text in the combined dataset, which merges all three sources: journals, news articles, and reviews. Words such as "model", "show", "one", and "using" are prominent in the dataset.


06
WORD CLOUD FOR SUMMARY - COMBINED DATASET
The word cloud on the left shows the most prominent words in the summary column of the combined dataset. Words such as "model", "system", "good", and "great" appear in the majority of the summaries.
07
HISTOGRAM OF TEXT LENGTH BY SOURCE - COMBINED DATASET
The histogram on the right shows the frequency of text lengths in the combined dataset, broken down by source. Text length varies by source: journals account for the longer texts, while reviews are mostly shorter.
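A rough sketch of the per-source grouping behind this kind of plot, assuming each record carries a `source` label and a `text` field. These field names, the function `lengths_by_source`, and the sample records are assumptions for illustration.

```python
from collections import defaultdict
from statistics import median


def lengths_by_source(records, field="text"):
    """Group character lengths of the given field by record source."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["source"]].append(len(rec[field]))
    return dict(grouped)


# Hypothetical records standing in for the combined dataset.
records = [
    {"source": "journal", "text": "x" * 900},
    {"source": "journal", "text": "x" * 700},
    {"source": "review", "text": "x" * 150},
    {"source": "news", "text": "x" * 300},
    {"source": "review", "text": "x" * 250},
]
grouped = lengths_by_source(records)

# The median per source gives a quick numeric check of what the plot shows.
medians = {src: median(vals) for src, vals in grouped.items()}
print(medians)
```

Each list in `grouped` can then be drawn as its own histogram series. Calling the function with `field="summary"` would give the summary-length version, provided the records have that field.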


08
HISTOGRAM OF SUMMARY LENGTH BY SOURCE - COMBINED DATASET
The histogram on the left shows the frequency of summary lengths in the combined dataset, broken down by source. Summary length varies by source: the journals dataset contributes the majority of the summaries and has longer summaries than the other datasets, while reviews have shorter summaries.
09
PAIRPLOT FOR MULTIVARIATE ANALYSIS
The pair plot on the right shows a multivariate analysis of the three datasets. The parameters compared are summary length and text length.


10
SOURCE DISTRIBUTION IN COMBINED DATASET
The bar graph on the left shows how the records in the combined dataset are distributed across sources. The majority of the records come from journals, and news articles are the fewest.
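The counts behind a source distribution chart could be computed along these lines. This is a sketch only; the function name `source_distribution` and the sample labels are hypothetical and do not reflect the real record counts.

```python
from collections import Counter


def source_distribution(sources):
    """Map each source to (record count, share of total), largest first."""
    counts = Counter(sources)
    total = sum(counts.values())
    return {src: (n, n / total) for src, n in counts.most_common()}


# Hypothetical source labels, one per record in the combined dataset.
sample_sources = ["journal"] * 5 + ["review"] * 3 + ["news"] * 2
dist = source_distribution(sample_sources)

for src, (n, share) in dist.items():
    print(f"{src}: {n} records ({share:.0%})")
```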