Training Data Generation

09 - Annotation with Active Learning

Use an annotation tool that builds on active learning to support a robust annotation process and balanced annotations.

A training dataset for Named Entity Recognition with 2000 annotations is of limited use if 100 of them tag ‘Barack Obama’ as a Person, because the repeated examples teach the model little that is new. Instead, you want to annotate the sentences where the model is least sure about its prediction.

With active learning, the model chooses which sentences are sent for annotation. Other sentences are skipped because the model is already more certain about how they should be annotated.
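The selection step can be illustrated with a minimal sketch of uncertainty sampling, one common active learning strategy. It is not how any particular annotation tool implements it. The names `entropy`, `select_for_annotation`, and `toy_predict_proba` are hypothetical, and the toy model with its hard-coded probabilities is an assumption standing in for a real NER model.

```python
import math


def entropy(probs):
    """Shannon entropy of a probability distribution (higher = less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(sentences, predict_proba, k):
    """Return the k sentences whose predictions have the highest entropy."""
    scored = [(entropy(predict_proba(s)), s) for s in sentences]
    # Most uncertain first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in scored[:k]]


def toy_predict_proba(sentence):
    """Hypothetical stand-in for a model: returns made-up label probabilities."""
    if "Barack Obama" in sentence:
        return [0.98, 0.01, 0.01]  # confident: already seen many times
    return [0.40, 0.35, 0.25]      # unsure: worth showing to an annotator


pool = [
    "Barack Obama gave a speech.",
    "Jordan signed the deal on Monday.",
    "Barack Obama visited Berlin.",
    "Apple fell from the tree near Washington.",
]

# Only the sentences the toy model is unsure about end up in the queue.
queue = select_for_annotation(pool, toy_predict_proba, k=2)
for sentence in queue:
    print(sentence)
```

In a real setup, the model would be retrained on the new annotations and the pool rescored, so the queue keeps shifting toward whatever the model currently finds hardest.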

The makers of spaCy built the annotation tool Prodi.gy, which is powered by active learning (video below).


This article is part of the project Periodic Table of NLP Tasks. Click to read more about the making of the Periodic Table and the project to systemize NLP tasks.