“Text Summarization is the task of producing a concise and fluent summary while preserving key information content and overall meaning.”
TEXT SUMMARIZATION TECHNIQUES:
There are two main types of techniques used for text summarization:
- NLP Based
- Deep Learning Based
Here, we will look at a simple NLP (Natural Language Processing) based technique for text summarization. There are many libraries for NLP; for this, we will be using NLTK, the Natural Language Toolkit.
TEXT SUMMARIZATION WITH NLTK:
Natural Language Toolkit (NLTK) is a leading platform for building Python programs to work with human language data. It provides tools for tokenizing and categorizing text, analyzing linguistic structure, and more.
Broadly, there are two approaches to summarizing texts:
- Extractive Summarization
- Abstractive Summarization
Extractive methods rely on extracting several parts, such as phrases and sentences, from a piece of text and stacking them together to create a summary. Identifying the right sentences is therefore of utmost importance in an extractive method.
In simple words, highlighting the important lines in a book with a highlighter is Extractive Summarization.
Input document → sentence similarity → weight sentences → select sentences with higher rank
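The extractive pipeline above can be sketched in a few lines of Python. This is only a minimal illustration, not a definitive implementation: it uses the standard library alone so that it runs without downloading any data, and the function names (`split_sentences`, `bag_of_words`, `cosine_similarity`, `summarize`), the tiny stop-word list, and the choice of cosine similarity over word counts are all assumptions made for this example. In practice, NLTK's `sent_tokenize`, `word_tokenize`, and `stopwords` corpus could replace the regex-based splitting and the hand-written stop-word set.

```python
import math
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; NLTK ships a fuller one).
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in",
              "on", "it", "that", "this", "for", "with", "as", "by"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def bag_of_words(sentence):
    """Count the lowercase words of a sentence, ignoring stop words."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def summarize(text, num_sentences=2):
    """Return the highest-weighted sentences, kept in their original order."""
    sentences = split_sentences(text)
    bags = [bag_of_words(s) for s in sentences]

    # Weight each sentence by its total similarity to all other sentences.
    weights = [
        sum(cosine_similarity(bag, other)
            for j, other in enumerate(bags) if j != i)
        for i, bag in enumerate(bags)
    ]

    # Rank by weight, keep the top ones, then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: weights[i], reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)
```

Calling `summarize(article_text, num_sentences=3)` on a longer document returns its three most "central" sentences in their original order. Sentences that share vocabulary with many others get a higher weight, which is one simple way to approximate the "sentence similarity → weight sentences" steps.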
Abstractive methods select words based on semantic understanding, even words that did not appear in the source document. They aim to present the important material in a new way, using advanced NLP techniques to generate an entirely new summary.
In simple words, after reading a book, writing the summary in your own words is Abstractive Summarization.
Input document → understand context → semantics → create own summary