Information Extraction (Questionnaire 2)

2009/03/28 at 5:51 pm

According to Jim Cowie and Yorick Wilks, Information Extraction is the name given to any process that selectively structures and combines data found in one or more texts. The final output of the extraction process varies; in every case, however, it can be transformed to populate some type of database. Information analysts who work long term on specific tasks already carry out information extraction manually with the express goal of creating a database.

The importance of Information Extraction stems from the huge amount of information that is available only in unstructured form; the Internet is a good example. This unstructured information can be made more accessible by transforming it into relational form or by marking it up with XML tags. Information Extraction is what is required to transform unstructured data into something that can be reasoned with.
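To make the two target forms concrete, here is a minimal sketch in Python. It assumes the entity spans have already been found by some earlier step; the sentence, the character offsets, the `mark_up` and `to_rows` helpers, and the `entity` table are all hypothetical names chosen for illustration, not part of any cited system.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical input: one sentence plus entity spans assumed to be extracted already.
text = "Bill works for IBM."
entities = [  # (start, end, type) as character offsets into `text`
    (0, 4, "PERSON"),
    (15, 18, "ORGANIZATION"),
]

def mark_up(text, entities):
    """Wrap each entity span in an XML element named after its type."""
    root = ET.Element("sentence")
    cursor = 0
    last = None  # most recently added child element
    for start, end, etype in sorted(entities):
        gap = text[cursor:start]  # plain text between the previous span and this one
        if last is None:
            root.text = gap
        else:
            last.tail = gap
        last = ET.SubElement(root, etype.lower())
        last.text = text[start:end]
        cursor = end
    # Whatever follows the final span (or the whole text if there were no spans).
    if last is None:
        root.text = text[cursor:]
    else:
        last.tail = text[cursor:]
    return ET.tostring(root, encoding="unicode")

def to_rows(text, entities):
    """Turn the same spans into (surface form, type) rows for a relational table."""
    return [(text[start:end], etype) for start, end, etype in sorted(entities)]

# Relational form: load the rows into an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entity (surface TEXT, type TEXT)")
conn.executemany("INSERT INTO entity VALUES (?, ?)", to_rows(text, entities))
rows = conn.execute("SELECT surface, type FROM entity ORDER BY rowid").fetchall()

xml_form = mark_up(text, entities)
print(xml_form)
print(rows)
```

Both outputs carry the same facts; the XML form keeps them in context inside the original sentence, while the table form makes them directly queryable.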

Experts have given many definitions of Information Extraction:

  1. Grishman (1997): “The identification of instances of a particular class of events or relationships in a natural language text, and the extraction of the relevant arguments of the event or relationship. It involves the creation of a structured representation (such as a data base) of selected information drawn from the text.”
  2. Riloff (1999): “A subfield of natural language processing that is concerned with identifying predefined types of information from text.”
  3. Yangarber (2001): “An emerging NLP technology whose function is to process unstructured, natural language text, to locate specific pieces of information, or facts in the text, and to use these facts to fill a database.”
  4. Peshkin and Pfeffer (2003): “The task of filling template information from previously unseen text which belongs to a predefined domain.”
  5. Cunningham (2005): “A technology based on analyzing natural language in order to extract snippets of information.”

One reason for the interest in Information Extraction is its role in evaluating and comparing different Natural Language Processing technologies. The evaluation process is concrete and, moreover, can be performed automatically. This, together with the immediate applications of a successful extraction system, has encouraged research funders to support both evaluations of and research into Information Extraction.
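As an illustration of how such an evaluation can be automated, the sketch below compares a system's extracted facts against a hand-made answer key. The sources above do not name the metrics; precision, recall and F1 are assumed here because they are the usual choice for this kind of comparison. The `score` function and the example triples are hypothetical.

```python
def score(predicted, gold):
    """Return (precision, recall, F1) for extracted facts against an answer key."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)  # facts that appear in both sets
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical answer key and system output, as (subject, relation, object) triples.
gold = [("Bill", "works_for", "IBM"), ("Bill", "located_in", "France")]
predicted = [("Bill", "works_for", "IBM"), ("IBM", "located_in", "France")]

precision, recall, f1 = score(predicted, gold)
print(precision, recall, f1)
```

Because the comparison is a plain set intersection, the same answer key can be reused to score any number of competing systems without human judgment at scoring time.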

Finally, it is worth mentioning the typical subtasks of Information Extraction:

  1. Named Entity Recognition: recognizing entity names (for people and organizations), place names, temporal expressions, and certain types of numerical expressions.
  2. Coreference: identifying chains of noun phrases that refer to the same object. Anaphora, for instance, is a kind of coreference.
  3. Terminology extraction: finding the relevant terms for a given corpus.
  4. Relationship Extraction: identifying relations between entities, such as:

     * Person works for organization (extracted from the sentence “Bill works for IBM.”)
     * Person located in location (extracted from the sentence “Bill is in France.”)
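The two relation examples above can be reproduced with a very small pattern-based extractor. This is only a sketch under strong assumptions: entities are taken to be single capitalized words, and each relation is tied to one fixed English phrase. The `PATTERNS` table, the relation names and `extract_relations` are hypothetical; real systems rely on named entity recognition and far richer patterns or learned models.

```python
import re

# One hand-written pattern per relation; a capitalized word stands in for an entity.
PATTERNS = [
    ("works_for", re.compile(r"\b([A-Z]\w*) works for ([A-Z]\w*)")),
    ("located_in", re.compile(r"\b([A-Z]\w*) is in ([A-Z]\w*)")),
]

def extract_relations(text):
    """Return (subject, relation, object) triples matched by the patterns."""
    relations = []
    for name, pattern in PATTERNS:
        for match in pattern.finditer(text):
            relations.append((match.group(1), name, match.group(2)))
    return relations

triples = extract_relations("Bill works for IBM. Bill is in France.")
print(triples)
```

The resulting triples are already in a shape that can be inserted into a database table with one column per slot.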


* Information Extraction. In Natural Language Processing Group, The University of Sheffield. Retrieved 11:53, March 18, 2009.
* Information extraction. (2007, February 06). In Open Clinical, knowledge management for medical care. Retrieved 13:24, March 28, 2009.
* Information Extraction. Jim Cowie and Yorick Wilks. In Department of Computer Science, University of Sheffield. Retrieved 13:11, March 28, 2009.
* Information extraction. (2009, February 14). In Wikipedia, The Free Encyclopedia. Retrieved 11:46, March 18, 2009.

