Data Preparation and Code | Data Quality Classif

DATA PREPARATION

1. TEXT SUMMARIZATION USING NEURAL NETWORKS

Dataset used for NN

(b) Training and Testing Sets

Splitting Data: The labeled dataset must be divided into two disjoint sets: a Training Set and a Testing Set. The Training Set is used to build and train the neural network model, while the Testing Set is used to evaluate its performance and check that the model generalizes to new, unseen data.
Why Disjoint: The Training and Testing sets must be disjoint so that the evaluation is trustworthy. If test examples also appear in the training data, the measured performance is inflated and overfitting goes undetected. Overfitting occurs when a model learns the details and noise in the training data to an extent that hurts its performance on new data.
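
The split described above can be sketched in a few lines of Python. This is only an illustration and not the code used for this project: the function name `train_test_split`, the 80/20 ratio, the seed, and the placeholder (article, summary) pairs are all assumptions made for the example.

```python
import random


def train_test_split(records, test_fraction=0.2, seed=0):
    """Shuffle records and split them into disjoint training and testing lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    indices = list(range(len(records)))
    # A seeded generator makes the split reproducible across runs.
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(records) * test_fraction))
    test_idx = set(indices[:n_test])
    # Each record goes to exactly one side, so the two sets cannot overlap.
    train = [r for i, r in enumerate(records) if i not in test_idx]
    test = [r for i, r in enumerate(records) if i in test_idx]
    return train, test


# Hypothetical (article, summary) pairs standing in for the real dataset.
pairs = [(f"article text {i}", f"summary {i}") for i in range(10)]
train_set, test_set = train_test_split(pairs, test_fraction=0.2, seed=42)

# Sanity check: no example appears in both sets.
assert set(train_set).isdisjoint(test_set)
```

Splitting by index rather than by value guarantees that every record lands in exactly one set. If the dataset itself contains duplicate records, they should be removed before splitting, otherwise identical examples could still end up on both sides.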

Links to the data can be found here:

Training Dataset

Testing Dataset

CODE

Code for Text Summarization using Python can be found here.
