Named Entity Recognition Using Web Document Corpus

June 03, 2020

International Journal of Managing Information Technology (IJMIT)

ISSN: 0975-5586 (Online); 0975-5926 (Print)

http://airccse.org/journal/ijmit/ijmit.html

Vol.3, No.1, February 2011

Page No 46 to 55

Article:

Named Entity Recognition Using Web Document Corpus

Authors

Wahiba Ben Abdessalem Karaa, Institut Superieur de Gestion de Tunis, Tunisia

Abstract

This paper introduces an approach to named entity recognition in textual corpora. A Named Entity (NE) can be a named location, person, organization, date, time, etc., and is characterized by its instances. A NE appears in texts accompanied by contexts: the words to the left or right of the NE. The work mainly aims at identifying the contexts that indicate the nature of the NE. For example, the occurrence of the word "President" in a text suggests that this word, or context, may be followed by the name of a president, as in President "Obama". Likewise, a word preceded by the string "footballer" is likely to be the name of a footballer. NE recognition may be viewed as a classification task, where every word is assigned to a NE class according to its context. The aim of this study is therefore to identify and classify the contexts that are most relevant for recognizing a NE, namely those that are frequently found with the NE. A learning approach is then suggested that uses a training corpus of web documents constructed from learning examples. Frequency representations and modified tf-idf representations are used to calculate the context weights associated with context frequency, learning example frequency, and document frequency in the corpus.
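
The context-weighting idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's modified tf-idf formula, so the code below substitutes the textbook form (context frequency times log of N over document frequency) as an assumption. The function names, the sample documents, the restriction to single-word examples, and the choice of only the immediate left word as context are all hypothetical and chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def left_contexts(tokens, examples):
    """Yield the word immediately to the left of each known example."""
    for i, tok in enumerate(tokens):
        if i > 0 and tok in examples:
            yield tokens[i - 1]


def context_weights(documents, examples):
    """Weight each left context with textbook tf-idf.

    This is an assumed stand-in, not the paper's modified tf-idf:
    weight = context frequency * log(N / document frequency).
    """
    examples = {e.lower() for e in examples}
    n_docs = len(documents)
    ctx_freq = Counter()  # times a word appears just before an example
    doc_freq = Counter()  # number of documents containing the word
    for doc in documents:
        tokens = tokenize(doc)
        ctx_freq.update(left_contexts(tokens, examples))
        doc_freq.update(set(tokens))
    return {c: f * math.log(n_docs / doc_freq[c]) for c, f in ctx_freq.items()}


# Hypothetical mini-corpus and learning examples for a "president" class.
docs = [
    "President Obama met President Sarkozy in Paris",
    "The footballer Messi scored while President Obama watched",
    "The weather in Paris was mild",
]
examples = ["Obama", "Sarkozy"]

weights = context_weights(docs, examples)
for context, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{context}: {weight:.4f}")
```

In this toy corpus, "president" is the only word that precedes a learning example, so it is the only context that receives a weight. A context word that occurs in every document would receive a weight of zero under this formula, which is one reason a modified weighting may be preferable in practice.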

Keywords

Named entity; Learning; Information extraction; tf-idf; Web document.

Paper URL:

http://airccse.org/journal/ijmit/papers/3111ijmit04.pdf

Current Issue URL:

http://airccse.org/journal/ijmit/vol3.html