A Price Parser extracts prices and currencies from raw text. Typical applications are price management and competitive pricing, often in combination with web scraping. Another example comes from the Panama Papers: extract all monetary amounts from the documents and link each amount to a person, organization, date or bank account.
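A minimal sketch of the extraction step follows. It assumes that a price is written as a currency symbol ($, €, £, ¥) or a standalone three-letter uppercase code, directly before or after a number. The function `extract_price_mentions` and both regular expressions are hypothetical illustrations, not a reference implementation. They only return the raw (currency, amount) strings and leave normalization for a later step.

```python
import re

# Hypothetical patterns for this sketch.
# A currency token: one of a few common symbols, or a standalone
# three-letter uppercase code such as EUR or USD.
CURRENCY = r"(?:[$€£¥]|\b[A-Z]{3}\b)"
# A number with optional '.' or ',' groups: 12, 49.99, 1,299.00, 1.234,56
NUMBER = r"\d+(?:[.,]\d+)*"

# Currency before the amount ("€49.99", "EUR 59,90") or after it ("4 GBP").
PRICE_RE = re.compile(
    rf"(?P<pre>{CURRENCY})\s?(?P<amount1>{NUMBER})"
    rf"|(?P<amount2>{NUMBER})\s?(?P<post>{CURRENCY})"
)


def extract_price_mentions(text):
    """Return a list of (currency_token, raw_amount) pairs found in text."""
    mentions = []
    for m in PRICE_RE.finditer(text):
        if m.group("pre") is not None:
            mentions.append((m.group("pre"), m.group("amount1")))
        else:
            mentions.append((m.group("post"), m.group("amount2")))
    return mentions


print(extract_price_mentions(
    "The jacket costs €49.99, down from EUR 59,90; "
    "shipping to the US is $5 or 4 GBP."
))
# → [('€', '49.99'), ('EUR', '59,90'), ('$', '5'), ('GBP', '4')]
```

The pattern also fires on unrelated capitalized three-letter words next to a number, and it misses amounts written with currency words such as "5 euros"; a production parser needs a larger lexicon and context checks.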
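A good price parser normalizes every price into a standard format. It should also recognize all currencies. Recognizing currency names and codes can be difficult, because the same currency is written in several ways (Euro, EUR; US Dollar, USD), and some names are ambiguous: "dollar" on its own could mean the US, Canadian, Australian or another dollar. Currency symbols are easier to find, because they have their own Unicode general category, Sc ("Symbol, currency"). Even so, this does not guarantee completeness, since many currencies are usually written with letters rather than a dedicated symbol (for example CHF).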
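The normalization step can be sketched as below, under stated assumptions. The alias table is a small hypothetical sample mapping names, codes and symbols to ISO 4217 codes, with `None` for tokens such as "dollar" that do not identify one currency. `normalize_amount` uses an assumed heuristic for decimal separators; the helper names are illustrative, not an existing library API.

```python
from decimal import Decimal

# Hypothetical alias table for this sketch; a real parser needs far more
# entries. None marks tokens that do not identify a single currency.
CURRENCY_ALIASES = {
    "€": "EUR", "eur": "EUR", "euro": "EUR", "euros": "EUR",
    "£": "GBP", "gbp": "GBP",
    "usd": "USD", "us dollar": "USD", "us dollars": "USD",
    "$": None, "dollar": None, "dollars": None,  # US, Canadian, Australian, ...?
}


def normalize_currency(token):
    """Map a symbol, name or code to an ISO 4217 code; None if unknown or ambiguous."""
    return CURRENCY_ALIASES.get(token.strip().lower())


def normalize_amount(raw):
    """Convert '49.99', '1,234.56', '1.234,56' or '12' to a Decimal.

    Assumed heuristic: the last '.' or ',' is the decimal separator only
    when one or two digits follow it; all other separators are treated
    as thousands separators.
    """
    last = max(raw.rfind("."), raw.rfind(","))
    if last != -1 and 1 <= len(raw) - last - 1 <= 2:
        whole, fraction = raw[:last], raw[last + 1:]
    else:
        whole, fraction = raw, ""
    digits = whole.replace(".", "").replace(",", "")
    return Decimal(f"{digits}.{fraction}" if fraction else digits)


def normalize_price(currency_token, raw_amount):
    """Return (ISO code or None, Decimal amount) for one extracted price."""
    return normalize_currency(currency_token), normalize_amount(raw_amount)


print(normalize_price("EUR", "1.234,56"))  # → ('EUR', Decimal('1234.56'))
print(normalize_price("US Dollar", "1,299.00"))
print(normalize_price("dollar", "5"))
```

`Decimal` is used instead of `float` so that amounts such as 0.10 are stored exactly. A three-digit tail such as "1.234" is read as 1234, which is usually right for prices but remains a guess.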
import unicodedata

def is_currency_symbol(char):
    # "Sc" is the Unicode general category "Symbol, currency" ($, €, £, ¥, ...)
    return unicodedata.category(char) == "Sc"
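The category check can be applied to every character of a text. In this sketch, the helper `find_currency_symbols` and the sample string are illustrative additions, not part of the original; it lists each currency symbol with its position and Unicode name.

```python
import unicodedata


def find_currency_symbols(text):
    """Return (position, character, Unicode name) for each character in category 'Sc'."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Sc"
    ]


text = "€5, $3, ₹100, 20 CHF, 50 kr"
for position, symbol, name in find_currency_symbols(text):
    print(position, symbol, name)
# prints one line each: 0 € EURO SIGN / 4 $ DOLLAR SIGN / 8 ₹ INDIAN RUPEE SIGN
```

CHF and kr consist of ordinary letters, so this scan does not find them, which is exactly why symbol detection alone is not complete.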
This article is part of the project Periodic Table of NLP Tasks, an effort to systematize NLP tasks.