42 - Language Identification

Identifying the language of a text is often done before you select the right language model.

Language identification is usually one of the first steps in an NLP pipeline, because the components that follow are language-specific. In Python you can use the langdetect library (55 languages, 99% accuracy) or fastText with its pretrained language identification model (176 languages, 93% accuracy).

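    from langdetect import DetectorFactory, detect, detect_langs

    # langdetect is non-deterministic; fix the seed for reproducible results
    DetectorFactory.seed = 0

    detect("War doesn't show who's right, just who's left.")
    # → 'en'

    detect_langs("Otec matka syn.")
    # → [sk:0.572770823327, pl:0.292872522702, cs:0.134356653968]  (example; probabilities vary by run)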
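Both libraries rely on models trained on text in many languages. As a rough intuition only (this is not how langdetect or fastText are implemented internally), the sketch below builds a tiny character-trigram model per language from one hand-written sentence each and picks the language whose add-one-smoothed trigram probabilities best explain the input. The training sentences and the helpers `char_ngrams`, `train`, and `identify` are hypothetical illustrations.

```python
import math
from collections import Counter

# Hypothetical toy training data: one sentence per language.
# Real identifiers are trained on large corpora covering many languages.
TRAINING = {
    "en": "the quick brown fox jumps over the lazy dog and the cat sat on "
          "the mat while the children were playing in the garden",
    "de": "der schnelle braune fuchs springt über den faulen hund und die katze "
          "sitzt auf der matte während die kinder im garten spielen",
    "fr": "le renard brun rapide saute par dessus le chien paresseux et le chat "
          "est assis sur le tapis pendant que les enfants jouent dans le jardin",
}


def char_ngrams(text, n=3):
    """Lowercase the text, pad it with spaces, and return overlapping n-grams."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(corpus, n=3):
    """Count n-grams per language and measure the shared vocabulary size."""
    counts = {lang: Counter(char_ngrams(text, n)) for lang, text in corpus.items()}
    vocab = set()
    for c in counts.values():
        vocab.update(c)
    return counts, len(vocab)


def identify(text, model, n=3):
    """Return the language with the highest add-one-smoothed log-likelihood."""
    counts, vocab_size = model
    scores = {}
    for lang, c in counts.items():
        total = sum(c.values())
        # Unseen n-grams get count 0 (+1), so they lower a score but never zero it.
        scores[lang] = sum(
            math.log((c[g] + 1) / (total + vocab_size + 1))
            for g in char_ngrams(text, n)
        )
    return max(scores, key=scores.get)


model = train(TRAINING)
print(identify("die katze und der hund", model))  # → de
```

Because this toy knows only three languages, it will always answer en, de, or fr, even for a Slovak input like "Otec matka syn."; practical tools avoid this by training on far more text in far more languages.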

This article is part of the project Periodic Table of NLP Tasks.