Entity Enriching

29 - Price Parser

Extract prices and currencies from raw text and normalize them into a standard format.

A price parser extracts prices and currencies from raw text. It is useful in price management and competitive pricing, often in combination with web scraping. Another use case comes from the Panama Papers: extract all monetary amounts from the documents and link each amount to a person, organization, date, or bank account.

A good price parser normalizes all prices into a standard format and recognizes all currencies. Currency names and abbreviations can be hard to recognize because the same currency appears in many forms: Euro, EUR, US Dollar, USD, or just "dollar" (which one?). Currency symbols are easier to find because they share their own Unicode general category, Sc (Symbol, currency). Even so, checking for symbols does not guarantee completeness, because many amounts are written with a code or a name rather than a symbol.

    import unicodedata

    def is_currency_symbol(char):
        # "Sc" is the Unicode general category "Symbol, currency" (e.g. "€", "$", "£")
        return unicodedata.category(char) == "Sc"
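
To make the normalization step more concrete, here is a minimal sketch that combines the Unicode category check with a small lookup table and a separator heuristic. The names `SYMBOLS`, `WORDS`, `normalize_amount`, `find_currency` and `parse_price` are hypothetical, the tables are deliberately incomplete, and mapping "$" to USD is an assumption. It is meant as an illustration of the idea, not as a replacement for a real library.

```python
import re
import unicodedata
from decimal import Decimal

# Hypothetical, incomplete lookup tables (assumptions made for this sketch).
SYMBOLS = {"€": "EUR", "$": "USD", "£": "GBP"}  # "$" is assumed to mean USD
WORDS = {"eur": "EUR", "euro": "EUR", "usd": "USD", "us dollar": "USD"}

# A run of digits that may contain "." or "," but starts and ends with a digit.
AMOUNT_RE = re.compile(r"\d[\d.,]*\d|\d")


def normalize_amount(raw):
    """Convert '1.234,56' or '1,234.56' to Decimal('1234.56').

    Heuristic: the last separator is a decimal separator only when it is
    followed by one or two digits; otherwise all separators are treated
    as thousands separators.
    """
    sep_pos = max(raw.rfind("."), raw.rfind(","))
    digits_after = len(raw) - sep_pos - 1
    if sep_pos != -1 and digits_after in (1, 2):
        integer_part = re.sub(r"[.,]", "", raw[:sep_pos])
        return Decimal(f"{integer_part}.{raw[sep_pos + 1:]}")
    return Decimal(re.sub(r"[.,]", "", raw))


def find_currency(text):
    """Return a currency code, the raw symbol if it is unknown, or None."""
    # First look for any character in the Unicode category "Sc".
    for char in text:
        if unicodedata.category(char) == "Sc":
            return SYMBOLS.get(char, char)
    # Then try known names and codes, longest first.
    lowered = text.lower()
    for word in sorted(WORDS, key=len, reverse=True):
        if re.search(rf"\b{re.escape(word)}\b", lowered):
            return WORDS[word]
    return None  # e.g. a bare "dollar" stays ambiguous


def parse_price(text):
    """Return (amount, currency); either element may be None."""
    match = AMOUNT_RE.search(text)
    amount = normalize_amount(match.group()) if match else None
    return amount, find_currency(text)


print(parse_price("22,90 €"))
print(parse_price("USD 1,234.56"))
print(parse_price("5 dollar"))
```

A bare "dollar" is left unresolved on purpose: without more context the sketch cannot know which dollar is meant, so it returns no currency instead of guessing. The separator heuristic is also only a guess; a string such as "1,234" is read as one thousand two hundred thirty-four, which is wrong for locales that write three decimal places.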

A good Python package for this task is price-parser. Another library for finding amounts of money is Facebook’s Duckling, which is written in Haskell.

This article is part of the project Periodic Table of NLP Tasks. Click to read more about the making of the Periodic Table and the project to systemize NLP tasks.