RESULTS
1. Sentiment Analysis using Naive Bayes
After the Naive Bayes model is trained on the training dataset, it is tested on the testing dataset.
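The report does not reproduce the training code. A minimal sketch of how such a pipeline is commonly built with scikit-learn follows; the toy reviews, labels, and choice of CountVectorizer with MultinomialNB are illustrative assumptions, not the project's actual data or implementation.

```python
# Illustrative sketch only: the real preprocessing and model settings
# used in this project are assumptions here.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

# Toy stand-in for the real review datasets
train_texts = ["great product, loved it", "terrible, broke after a day",
               "excellent quality", "awful experience, do not buy"]
train_labels = [1, 0, 1, 0]          # 1 = positive, 0 = negative
test_texts = ["loved the quality", "terrible, awful product"]
test_labels = [1, 0]

# Bag-of-words counts feed a multinomial Naive Bayes classifier
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

model = MultinomialNB().fit(X_train, train_labels)
preds = model.predict(X_test)
print(accuracy_score(test_labels, preds))
```

On a real corpus the test texts would come from a held-out split rather than being written by hand.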
Performance Metrics Summary
- Accuracy: 82.89% - This is the percentage of total predictions that were correct. An accuracy of 82.89% indicates that the model is quite good at predicting the sentiment of the reviews: for approximately 83 out of every 100 reviews, it correctly identified whether the sentiment was positive or negative.
- Precision: 81.00% - Precision measures the proportion of positive identifications that were actually correct. A precision of 81% means that when the model predicted a review to have a certain sentiment (positive or negative), it was correct about 81% of the time.
- Recall: 84.78% - Recall measures the proportion of actual positives that were correctly identified. A recall of 84.78% indicates that the model correctly identified approximately 85% of the reviews' sentiments.
- F1 Score: 82.85% - The F1 score is the harmonic mean of precision and recall, providing a single metric that balances the two. An F1 score of 82.85% is quite high, suggesting that the model maintains a good balance between precision and recall.
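As a quick consistency check, the reported F1 score can be recomputed as the harmonic mean of the reported precision and recall (using the rounded values given above):

```python
# Sanity check: the reported F1 should be the harmonic mean of the
# reported precision and recall.
precision = 0.8100
recall = 0.8478
f1 = 2 * precision * recall / (precision + recall)
print(round(f1 * 100, 2))  # close to the reported 82.85%
```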
Confusion Matrix

From the confusion matrix, the model's specific strengths and weaknesses can be studied. The matrix shows high true positive and true negative counts.
The top-left quadrant (1334) represents the true positives (TP), indicating the number of positive sentiments correctly identified as positive.
The bottom-right quadrant (1326) shows the true negatives (TN), reflecting the number of negative sentiments accurately recognized as negative.
The top-right quadrant (311) represents false positives (FP), where negative sentiments were incorrectly classified as positive.
The bottom-left quadrant (238) shows false negatives (FN), where positive sentiments were mistakenly classified as negative.
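The four quadrant counts are consistent with the reported accuracy, which can be verified directly:

```python
# Recompute overall accuracy from the four confusion-matrix counts
# reported above.
tp, tn, fp, fn = 1334, 1326, 311, 238
total = tp + tn + fp + fn          # 3209 test reviews in all
accuracy = (tp + tn) / total
print(round(accuracy * 100, 2))
```

Note that, unlike accuracy, precision and recall depend on which class is treated as positive, so they cannot be checked from the counts without fixing that convention.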
2. Sentence Classification using Naive Bayes
Once the Naive Bayes model is trained, it is tested on a testing dataset in which sentences are labeled as important or not important.
Performance Metrics Summary

The results show a model performance with a precision, recall, and F1-score of 1.00 for both classes, and an overall accuracy of 1.00 across 6 test samples. Here’s a breakdown of what these results imply:
- Precision: Indicates no false positives; every sentence the model predicted as important or not important was correct.
- Recall: Reflects no false negatives; the model successfully identified all important and not important sentences.
- F1-Score: A perfect F1-score for both classes shows an ideal balance between precision and recall.
- Support: Indicates the number of actual occurrences of each class in the test set, with 3 samples for each class, signifying a balanced test set.
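A report of this shape is what scikit-learn's classification_report produces. A minimal sketch reproducing the perfect scores on a six-sample test set follows; the specific labels and predictions are hypothetical stand-ins for the actual test data:

```python
from sklearn.metrics import classification_report, accuracy_score

# Hypothetical test set: 3 "Not Important" (0) and 3 "Important" (1)
# sentences, with the model predicting every sample correctly.
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1]

# Prints per-class precision, recall, F1, and support, all 1.00 here
print(classification_report(y_true, y_pred,
                            target_names=["Not Important", "Important"]))
print(accuracy_score(y_true, y_pred))  # 1.0
```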
Confusion Matrix

The matrix has two classes: "Not Important" (0) and "Important" (1).
There are no off-diagonal numbers, meaning there are no false positives or false negatives.
The diagonal cells show that 3 "Not Important" sentences and 3 "Important" sentences were correctly identified, matching the count of actual labels.
The colors indicate the magnitude of the values, with darker colors typically representing higher numbers; here, the dark cells show the count of 3 for both correct predictions.
Implications and Considerations:
A confusion matrix like this is indicative of an ideal situation where the model predictions are perfectly aligned with the actual labels. It suggests that the model has learned the distinctions between "important" and "not important" sentences effectively for the given test set.
While the model's performance looks perfect, it's essential to scrutinize the test dataset for size and diversity. This result could also indicate that the test dataset might not be challenging enough or could be too similar to the training data.
The model should be evaluated on a larger, more diverse, and completely unseen dataset to ensure these results are not due to overfitting. Implementing cross-validation can also provide a more accurate assessment of the model's performance.
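The cross-validation suggested above could be sketched with scikit-learn as follows; the corpus here is a small placeholder, and in practice the full labeled sentence dataset would be used.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Placeholder corpus; replace with the real labeled sentences.
texts = ["deadline moved to Friday", "lunch was nice", "submit the report",
         "weather is cloudy", "budget approved for Q3", "saw a movie"] * 5
labels = [1, 0, 1, 0, 1, 0] * 5   # 1 = important, 0 = not important

# Bundling the vectorizer and classifier in a pipeline ensures the
# vocabulary is refit on each training fold, avoiding leakage.
pipeline = make_pipeline(CountVectorizer(), MultinomialNB())

# 5-fold cross-validation yields five held-out accuracy estimates
# instead of a single score from a 6-sample test set.
scores = cross_val_score(pipeline, texts, labels, cv=5)
print(scores.mean(), scores.std())
```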
CONCLUSION
The Naive Bayes model has shown promising results in sentiment analysis, achieving an accuracy of 82.89%. Such a high degree of accuracy indicates that the model is quite effective at distinguishing between positive and negative sentiments in text reviews. It correctly predicts the sentiment nearly 83 out of 100 times, a significant achievement for a probabilistic model that considers each word's contribution to the sentiment independently. The precision and recall scores reflect the model’s capability to not only identify positive reviews accurately but also to retrieve a high proportion of actual positive sentiments. The F1 score, which is a balance between precision and recall, underscores a harmonious alignment of both measures, emphasizing the model's consistent performance across the spectrum of sentiments.
However, the perfection displayed in sentence classification, with a precision, recall, and F1-score of 1.00, warrants careful consideration. While it showcases the Naive Bayes model's potential in correctly classifying sentences as important or not, such impeccable metrics may also point to a lack of complexity in the test dataset or overfitting. In real-world applications, a dataset’s diversity and representativeness are crucial for a model's ability to generalize. Therefore, it's imperative to evaluate the model on a broader and more varied dataset to confirm these results. This serves as a reminder that while Naive Bayes can be a powerful tool for text classification tasks, its performance is heavily dependent on the quality of the input data and the model's capacity to handle diverse linguistic expressions.