Information Extraction (Questionnaire 2)

2009/03/28

According to Jim Cowie and Yorick Wilks, Information Extraction is the name given to any process that selectively structures and combines data found in one or more texts. The final output of the extraction process varies; in every case, however, it can be transformed so as to populate some type of database. Information analysts who work long term on specific tasks have already been carrying out information extraction manually, with the express goal of creating a database.

The importance of Information Extraction comes from the huge amount of information that is available only in unstructured form; the Internet is a good example. This unstructured information can be made more accessible by transforming it into relational form or by marking it up with XML tags. Information Extraction is what is required to transform unstructured data into something that can be reasoned with.
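
As a small illustration of the two target forms mentioned above (XML mark-up and a relational row), here is a minimal Python sketch. The function name `mark_up`, the tag names, and the hand-written character spans are assumptions made for this example; a real system would have to find the spans itself.

```python
from xml.sax.saxutils import escape


def mark_up(text, spans):
    """Wrap each (start, end, tag) span of `text` in an XML element."""
    out = []
    pos = 0
    for start, end, tag in sorted(spans):
        out.append(escape(text[pos:start]))          # text before the span
        out.append(f"<{tag}>{escape(text[start:end])}</{tag}>")
        pos = end
    out.append(escape(text[pos:]))                   # text after the last span
    return "".join(out)


sentence = "Bill works for IBM."
# Hand-written spans, standing in for the output of an extraction system.
spans = [(0, 4, "person"), (15, 18, "organization")]

# Form 1: the same sentence with XML tags around the entities.
marked = mark_up(sentence, spans)
print(marked)

# Form 2: a relational row, ready to be inserted into a database table.
row = {tag: sentence[start:end] for start, end, tag in spans}
print(row)
```

Both outputs carry the same two facts; the XML form keeps them in context, while the row form is what a database would store.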

 

Experts have defined Information Extraction in several ways:

  1. Grishman (1997): “The identification of instances of a particular class of events or relationships in a natural language text, and the extraction of the relevant arguments of the event or relationship. It involves the creation of a structured representation (such as a data base) of selected information drawn from the text.”
  2. Riloff (1999): “A subfield of natural language processing that is concerned with identifying predefined types of information from text.”
  3. Yangarber (2001): “An emerging NLP technology whose function is to process unstructured, natural language text, to locate specific pieces of information, or facts in the text, and to use these facts to fill a database.”
  4. Peshkin and Pfeffer (2003): “The task of filling template information from previously unseen text which belongs to a predefined domain.”
  5. Cunningham (2005): “A technology based on analyzing natural language in order to extract snippets of information.”

One reason for the interest in Information Extraction is its role in evaluating and comparing different Natural Language Processing technologies. The evaluation process is well defined and can, moreover, be performed automatically. This, together with the immediate applications of a successful extraction system, has encouraged research funders to support both evaluations of and research into Information Extraction.
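
To give an idea of what an automatic evaluation can look like, the sketch below compares the facts produced by a system against a hand-made answer key. The sources cited here do not say which measures are used; precision, recall, and F1 are common choices and are assumed for this example. The function name `score` and the sample facts are also made up for illustration.

```python
def score(extracted, gold):
    """Return (precision, recall, f1) for extracted facts against a gold set."""
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)                  # facts the system got right
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Answer key written by a person (hypothetical data).
gold = {("works_for", "Bill", "IBM"), ("located_in", "Bill", "France")}
# System output: one correct fact and one wrong fact (hypothetical data).
extracted = {("works_for", "Bill", "IBM"), ("located_in", "IBM", "France")}

precision, recall, f1 = score(extracted, gold)
print(precision, recall, f1)
```

Because the comparison is a plain set operation, it can be repeated for any number of systems without human intervention once the answer key exists.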

Finally, it is worth listing the typical subtasks of Information Extraction:

  1. Named Entity Recognition: recognition of entity names (for people and organizations), place names, temporal expressions, and certain types of numerical expressions.
  2. Coreference: identification of chains of noun phrases that refer to the same object. For instance, anaphora is a kind of coreference.
  3. Terminology extraction: finding the relevant terms for a given corpus.
  4. Relationship Extraction: identification of relations between entities, such as:

       * Person works for organization (extracted from the sentence “Bill works for IBM.”)
       * Person located in location (extracted from the sentence “Bill is in France.”)
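
The two relationship examples above can be reproduced with a very small pattern-based sketch in Python. This is only a toy under stated assumptions: the relation names, the `PATTERNS` table, and the function `extract_relations` are invented for the example, and treating any capitalized word as an entity is a crude stand-in for real Named Entity Recognition.

```python
import re

# One hand-written pattern per relation type; a capitalized word
# is assumed to be an entity name.
PATTERNS = [
    ("works_for", re.compile(r"\b([A-Z]\w*) works for ([A-Z]\w*)")),
    ("located_in", re.compile(r"\b([A-Z]\w*) is in ([A-Z]\w*)")),
]


def extract_relations(text):
    """Return a list of (relation, first_entity, second_entity) tuples."""
    relations = []
    for name, pattern in PATTERNS:
        for match in pattern.finditer(text):
            relations.append((name, match.group(1), match.group(2)))
    return relations


relations = extract_relations("Bill works for IBM. Bill is in France.")
print(relations)
```

Each tuple corresponds to one row that could be stored in a database table of relations. Fixed patterns like these break as soon as the wording changes (for example "Bill is employed by IBM"), which is one reason the subtasks listed above are usually handled by more robust methods.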

REFERENCES:

* Information Extraction. In Natural Language Processing Group, The University of Sheffield. Retrieved 11:53, March 18, 2009, from http://gate.ac.uk/ie/
* Information extraction. (2007, February 06). In Open Clinical, knowledge management for medical care. Retrieved 13:24, March 28, 2009, from http://www.openclinical.org/informationextraction.html
* Information Extraction. Jim Cowie and Yorick Wilks. In Department of Computer Science, University of Sheffield. Retrieved 13:11, March 28, 2009, from http://www.dcs.shef.ac.uk/~yorick/papers/infoext.pdf
* Information extraction. (2009, February 14). In Wikipedia, The Free Encyclopedia. Retrieved 11:46, March 18, 2009, from http://en.wikipedia.org/wiki/Information_extraction
