Information Visualization

77 - Wordcloud

Most Wordclouds seem to ignore best practices for visualizing information.

The Wordcloud has been around for a long time. Visualizing information is a profession in itself, with its own established best practices, yet Wordclouds seem to ignore them. Here are some remarks about the (missing) elements in a Wordcloud:

  • Stopwords are excluded, even though a word like don’t changes the meaning of the word that follows it. Including stopwords, however, would clutter the Wordcloud because of their high frequencies.
  • Multi-word expressions are not counted as a unit. The separate words of a multi-word expression (e.g. New York Times) are interpreted as unrelated words, which gives them a totally different meaning.
  • Different colors carry no meaning.
  • Word orientation (vertical or horizontal) carries no meaning, and neither does a word's position (top, bottom, left or right).
  • No context is given to clarify the sense in which a word is used.
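To make the first two remarks concrete, here is a minimal Python sketch (3.9+) of the word-counting step behind a typical Wordcloud. It assumes simple regex tokenization and a tiny hypothetical stopword list; the helper names `tokenize`, `merge_phrases` and `word_counts` are illustrative and not part of stylecloud or any other library.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for illustration).
# It contains "don't" to mirror the negation example above.
STOPWORDS = {"the", "a", "of", "in", "and", "is", "to", "i", "don't", "not"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens, keeping apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def merge_phrases(tokens: list[str], phrases: list[tuple[str, ...]]) -> list[str]:
    """Join known multi-word expressions into one token, e.g. 'new_york_times'.

    Phrases are tried in the given order, so list longer phrases first.
    """
    merged, i = [], 0
    while i < len(tokens):
        for phrase in phrases:
            n = len(phrase)
            if tuple(tokens[i:i + n]) == phrase:
                merged.append("_".join(phrase))
                i += n
                break
        else:  # no phrase matched at position i: keep the single token
            merged.append(tokens[i])
            i += 1
    return merged


def word_counts(text: str, phrases=(), stopwords=STOPWORDS) -> Counter:
    """Count the tokens a Wordcloud would size by frequency, minus stopwords."""
    tokens = merge_phrases(tokenize(text), list(phrases))
    return Counter(t for t in tokens if t not in stopwords)


text = "I don't like the New York Times. I like the times in New York."

# Plain counting: "like", "new", "york" and "times" each appear twice,
# and the negation "don't" has disappeared.
print(word_counts(text))

# Merging multi-word expressions first keeps "New York Times" and
# "New York" apart from the ordinary word "times".
print(word_counts(text, phrases=[("new", "york", "times"), ("new", "york")]))
```

In the plain count, "like" scores 2 even though the text says "don't like" once, so a Wordcloud would show the opposite sentiment. Merging phrases fixes only the multi-word problem; it needs a hand-made (or separately learned) phrase list and still gives no context for each word.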

Despite the criticism, Wordclouds are still widely used. You can generate your own Wordclouds with the stylecloud Python library.

Compare two Wordclouds of the State of the Union addresses of 2002 and 2011 (source)

This article is part of the project Periodic Table of NLP Tasks. Click to read more about the making of the Periodic Table and the project to systemize NLP tasks.