Training Data Generation

12 - Textual Data Augmentation

Boost your model's performance by deriving new training examples from the data you already have, instead of collecting more.

The amount of available textual (training) data influences performance on many NLP tasks. If collecting more data is not an option, textual data augmentation offers several techniques for boosting performance on your NLP task.

Data augmentation is a standard part of Computer Vision tasks. However, due to the grammatical structure of language, the task is much more delicate for textual data and Natural Language Generation: a small change to a sentence can break its grammar or alter its meaning.

Here are some examples of how the textual data is transformed by Easy Data Augmentation (EDA) techniques and Back Translation:

Textual Data Augmentation Techniques (source)
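To make the EDA operations concrete, here is a minimal sketch of the four of them (synonym replacement, random insertion, random swap, random deletion) in plain Python. It is an illustration, not the reference implementation: the function names, the parameters, and the tiny hand-written `synonyms` dictionary are assumptions made for this example. A real setup would typically draw synonyms from a thesaurus such as WordNet. Back Translation is not shown because it needs a translation model.

```python
import random


def synonym_replacement(words, synonyms, n, rng):
    """Replace up to n words that have an entry in the synonym table."""
    words = list(words)
    candidates = [i for i, w in enumerate(words) if w in synonyms]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        words[i] = rng.choice(synonyms[words[i]])
    return words


def random_insertion(words, synonyms, n, rng):
    """Insert a synonym of a random word at a random position, n times."""
    words = list(words)
    for _ in range(n):
        candidates = [w for w in words if w in synonyms]
        if not candidates:
            break  # nothing to draw a synonym from
        new_word = rng.choice(synonyms[rng.choice(candidates)])
        words.insert(rng.randrange(len(words) + 1), new_word)
    return words


def random_swap(words, n, rng):
    """Swap the words at two random positions, n times."""
    words = list(words)
    if len(words) < 2:
        return words
    for _ in range(n):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words


def random_deletion(words, p, rng):
    """Drop each word with probability p, but never return an empty sentence."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]


# Hypothetical toy synonym table, for illustration only.
synonyms = {"quick": ["fast", "speedy"], "lazy": ["idle"]}
rng = random.Random(42)  # seeded so the augmentations are reproducible
sentence = "the quick brown fox jumps over the lazy dog".split()

print(" ".join(synonym_replacement(sentence, synonyms, 1, rng)))
print(" ".join(random_insertion(sentence, synonyms, 1, rng)))
print(" ".join(random_swap(sentence, 1, rng)))
print(" ".join(random_deletion(sentence, 0.1, rng)))
```

Each function returns a new list and leaves the input untouched, so one original sentence can be augmented several times with different operations.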

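Data augmentation might not help, but it’s worth a shot if you are stuck. Whatever you do, do not validate with augmented textual data! Keep the validation and test sets limited to original examples so that the scores reflect performance on real text.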
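One way to follow the validation rule is to split the data first and augment only the training portion. The sketch below assumes examples are `(text, label)` pairs. `split_then_augment` and `toy_augment` are hypothetical names introduced for this illustration, and the toy augmentation stands in for whatever technique you actually use.

```python
import random


def split_then_augment(examples, augment, val_fraction=0.2, seed=0):
    """Hold out validation data first, then augment the training part only."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)

    n_val = int(len(shuffled) * val_fraction)
    val = shuffled[:n_val]        # original examples only, never augmented
    train = shuffled[n_val:]

    # Augmented copies keep the label of the example they were derived from.
    augmented = [(augment(text, rng), label) for text, label in train]
    return train + augmented, val


def toy_augment(text, rng):
    """Placeholder augmentation: swap two random words (assumption for demo)."""
    words = text.split()
    if len(words) >= 2:
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return " ".join(words)


data = [(f"this is example sentence number {i}", i % 2) for i in range(10)]
train, val = split_then_augment(data, toy_augment)
print(len(train), len(val))
```

Splitting before augmenting also prevents an augmented copy of a validation sentence from leaking into the training set.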

This article is part of the project Periodic Table of NLP Tasks. Click to read more about the making of the Periodic Table and the project to systemize NLP tasks.