MIR

Revision as of 15:46, 18 September 2012

MIR is a centralized repository of lexical data extracted from WordNet 3.0. It contains 117,659 UWs representing the different sets of synonyms (synsets) of English, which are expected to be associated with the corresponding lexical items of any language, whenever possible.

Goal

The MIR project has two main goals:

  1. To provide a concept-to-word multilingual database (i.e., a decoding or writer's dictionary). This dictionary is expected to be used in language-based shallow NLization, i.e., in generating natural language documents out of UNL graphs, especially through EUGENE.
  2. To assign a universality degree to each of the senses registered in WordNet 3.0, in order to decide in which section of the UNL Dictionary each sense should be included: the UNL Core Dictionary, the UNL Abridged Dictionary, or the UNL Unabridged Dictionary.

The repository

MIR is based on WordNet 3.0. It contains 117,659 UWs, which correspond to the different sets of synonyms (synsets) of English. Each UW is defined as a 9-digit string with the following format:

<POS><WORDNETID>

where:

  • <POS> is one of {1, 2, 3, 4}, where 1 = noun, 2 = verb, 3 = adjective, and 4 = adverb;
  • <WORDNETID> is the synset ID in WordNet 3.0.
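
The format above can be sketched in code. The following is a minimal illustration, not part of MIR itself: the function name parse_uw and the sample identifier are hypothetical, and the sketch assumes that the <WORDNETID> part takes up the remaining 8 digits of the 9-digit string.

```python
# Mapping of the <POS> digit to its part of speech, as listed above.
POS_BY_DIGIT = {"1": "noun", "2": "verb", "3": "adjective", "4": "adverb"}


def parse_uw(uw):
    """Split a 9-digit MIR UW into (part of speech, WordNet synset ID).

    Hypothetical helper: assumes <WORDNETID> is the 8 digits that
    follow the single <POS> digit.
    """
    if len(uw) != 9 or not (uw.isascii() and uw.isdigit()):
        raise ValueError("a UW must be a string of exactly 9 digits: %r" % (uw,))
    pos_digit, wordnet_id = uw[0], uw[1:]
    if pos_digit not in POS_BY_DIGIT:
        raise ValueError("unknown <POS> digit: %r" % (pos_digit,))
    return POS_BY_DIGIT[pos_digit], wordnet_id


# Illustrative value only; leading zeros of the synset ID are preserved
# because the ID is kept as a string.
print(parse_uw("100001740"))
```

Keeping the synset ID as a string rather than an integer preserves its leading zeros, so the <POS> digit and the ID can be joined back into the original UW.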

Along with the UW, we provide the definition, examples, headwords and other features extracted from WordNet 3.0.
As an English-biased repository, which is expected to cover only concepts lexicalized in English, MIR should not be mistaken for the whole UNL Dictionary, of which it is only a part.

Structure

MIR is divided into six subprojects according to the Framework of Reference for UNL:

Repository | Description | # of entries
MIR-A1
MIR-A2
MIR-B1
MIR-B2
MIR-C1
MIR-C2

Methodology

Software