Named Entity Recognition

Named Entity Recognition (NER), also known as entity recognition or entity extraction, is a subtask of information extraction that finds mentions of specified kinds of things in a given text. More precisely, it seeks to locate atomic elements in text and classify them into predefined categories such as the names of persons, organizations, and locations, expressions of time, quantities, monetary values, and percentages.
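To make "locate and classify" concrete, here is a minimal rule-based sketch. It is not how modern NER systems work (they are usually statistical or neural); it simply combines a small hypothetical gazetteer (a lookup table of known names) with two regular expressions, and returns each entity as a (start, end, type) character span. The names `GAZETTEER`, `tag_entities`, and the category labels are illustrative assumptions.

```python
import re
from typing import List, Tuple

# A span is (start offset, end offset, entity type); end is exclusive.
Span = Tuple[int, int, str]

# Hypothetical hand-written gazetteer mapping known names to a category.
GAZETTEER = {
    "Jim": "PERSON",
    "Acme Corp.": "ORGANIZATION",
}

YEAR_RE = re.compile(r"\b(?:1[89]\d\d|20\d\d)\b")  # four-digit years 1800-2099
NUMBER_RE = re.compile(r"\b\d+\b")                  # any integer


def tag_entities(text: str) -> List[Span]:
    """Locate entity mentions in `text` and classify each into a category."""
    spans: List[Span] = []

    # 1. Dictionary lookup; the lookarounds stop "Jim" from matching inside "Jimmy".
    for name, etype in GAZETTEER.items():
        pattern = re.compile(r"(?<!\w)" + re.escape(name) + r"(?!\w)")
        for m in pattern.finditer(text):
            spans.append((m.start(), m.end(), etype))

    # 2. Four-digit years are classified as DATE.
    years = {(m.start(), m.end()) for m in YEAR_RE.finditer(text)}
    for start, end in years:
        spans.append((start, end, "DATE"))

    # 3. Any other integer is classified as QUANTITY.
    for m in NUMBER_RE.finditer(text):
        if (m.start(), m.end()) not in years:
            spans.append((m.start(), m.end(), "QUANTITY"))

    return sorted(spans)


print(tag_entities("Jim bought 300 shares of Acme Corp. in 2006."))
```

A gazetteer only recognizes names it already contains, which is the main reason practical systems learn from annotated data instead.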

Most research on NER systems has been structured as taking an unannotated block of text, such as this one:
Jim bought 300 shares of Acme Corp. in 2006.

and producing an annotated version of the same text, such as this one:
<ENAMEX TYPE="PERSON">Jim</ENAMEX> bought <NUMEX TYPE="QUANTITY">300</NUMEX> shares of <ENAMEX TYPE="ORGANIZATION">Acme Corp.</ENAMEX> in <TIMEX TYPE="DATE">2006</TIMEX>.

In this example, the annotations use the so-called ENAMEX tags (together with the related NUMEX and TIMEX tags) that were developed for the Message Understanding Conferences (MUC) in the 1990s.
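Going from entity spans to this inline markup is mechanical. The sketch below assumes entities are given as non-overlapping (start, end, type) character spans and that each type maps to one of the three MUC elements as in the example (PERSON and ORGANIZATION under ENAMEX, QUANTITY under NUMEX, DATE under TIMEX); the extra types in `ELEMENT_FOR_TYPE` and the function name `to_muc_markup` are assumptions for illustration.

```python
from typing import List, Tuple
from xml.sax.saxutils import escape

Span = Tuple[int, int, str]

# Which MUC-style element wraps each entity type (assumed inventory).
ELEMENT_FOR_TYPE = {
    "PERSON": "ENAMEX", "ORGANIZATION": "ENAMEX", "LOCATION": "ENAMEX",
    "DATE": "TIMEX", "TIME": "TIMEX",
    "QUANTITY": "NUMEX", "MONEY": "NUMEX", "PERCENT": "NUMEX",
}


def to_muc_markup(text: str, spans: List[Span]) -> str:
    """Wrap each (start, end, type) span of `text` in its MUC-style tag."""
    out: List[str] = []
    cursor = 0
    for start, end, etype in sorted(spans):
        if start < cursor:
            raise ValueError("spans must not overlap")
        element = ELEMENT_FOR_TYPE[etype]
        out.append(escape(text[cursor:start]))  # untagged text before the entity
        out.append(f'<{element} TYPE="{etype}">{escape(text[start:end])}</{element}>')
        cursor = end
    out.append(escape(text[cursor:]))  # trailing untagged text
    return "".join(out)


sentence = "Jim bought 300 shares of Acme Corp. in 2006."
spans = [(0, 3, "PERSON"), (11, 14, "QUANTITY"), (25, 35, "ORGANIZATION"), (39, 43, "DATE")]
print(to_muc_markup(sentence, spans))
```

Keeping entities as offsets and generating markup only at the end makes it easy to switch to other output formats later.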
  • Performance: state-of-the-art NER systems for English achieve near-human performance on standard evaluation corpora, usually measured with the F-measure (F1 score).
  • Tools: Wikipedia lists a number of open-source tools, such as MALLET.
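Claims about "near-human performance" refer to scores like the F-measure, the harmonic mean of precision and recall. The sketch below uses strict exact-match scoring, where a predicted entity counts as correct only if its boundaries and type both match the gold annotation (the convention of the CoNLL shared tasks). The original MUC scorer gave separate credit for type and text, so its numbers are not directly comparable. The system output used here is invented for illustration.

```python
from typing import Iterable, Tuple

Span = Tuple[int, int, str]


def entity_f1(gold: Iterable[Span], predicted: Iterable[Span]) -> Tuple[float, float, float]:
    """Exact-match entity precision, recall, and F1."""
    gold_set, pred_set = set(gold), set(predicted)
    correct = len(gold_set & pred_set)  # same start, end, and type
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


gold = [(0, 3, "PERSON"), (11, 14, "QUANTITY"), (25, 35, "ORGANIZATION"), (39, 43, "DATE")]
# Hypothetical system output: misses "300" and truncates "Acme Corp." to "Acme".
pred = [(0, 3, "PERSON"), (25, 29, "ORGANIZATION"), (39, 43, "DATE")]
p, r, f = entity_f1(gold, pred)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Here two of three predictions are correct (precision 2/3) and two of four gold entities are found (recall 1/2), giving F1 = 4/7. The truncated organization earns no credit under exact matching.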
