English dictionary

From UNL Wiki

Revision as of 16:18, 27 July 2012

The English dictionaries are used, along with the English grammars, in the process of UNLization and NLization with IAN and EUGENE, respectively. They follow the UNL Dictionary Specs and use the tags provided in the Tagset. They are provided in two different formats:

  • The ENG-UNL Dictionary, used in UNLization (IAN), is presented in the enumerative format, i.e., it lists all word forms ("die", "dies", "dying", "dead") and not only base forms.
  • The UNL-ENG Dictionary, used in NLization (EUGENE), is presented in the generative format, i.e., it lists only base forms ("die") along with the instructions to generate the corresponding inflections.

ENG-UNL Dictionary

The ENG-UNL Dictionary for Corpus500 may be downloaded from ana_dic_eng.txt. The complete ENG-UNL Dictionary may be exported from the UNLarium: UNLWEB>UNLARIUM>DICTIONARY>ENGLISH>EXPORT.

Dictionary entry structure

In the ENG-UNL Dictionary, entries are provided in the following format:

[English Word Form]{}"UW"(feature list in attribute-value pair format)<eng,FREQUENCY,PRIORITY>;
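
As an illustration of the entry format above, the following is a minimal Python sketch that splits one entry into its fields. The regular expression and the names `ENTRY_RE` and `parse_entry` are assumptions made for this example, derived only from the format line and the sample entries on this page; they are not part of IAN or of the UNL Dictionary Specs. The sketch assumes that the UW contains no double quotes and that the feature list contains no parentheses.

```python
import re

# Hypothetical pattern inferred from the format line on this page:
# [form]{}"UW"(features)<eng,FREQUENCY,PRIORITY>;
ENTRY_RE = re.compile(
    r'^\[(?P<form>[^\]]*)\]'        # [English word form]
    r'\{(?P<id>[^}]*)\}'            # {} (empty in the examples on this page)
    r'\s*"(?P<uw>[^"]*)"'           # "UW", which may be empty
    r'\s*\((?P<features>[^)]*)\)'   # (feature list)
    r'\s*<(?P<lang>[^,>]+),(?P<freq>\d+),(?P<prio>\d+)>;\s*$'
)


def parse_entry(line):
    """Split one dictionary entry into a dict of its fields."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        raise ValueError("not a dictionary entry: %r" % line)
    features = {}
    for item in match.group("features").split(","):
        item = item.strip()
        if not item:
            continue
        # "LEX=D" becomes {"LEX": "D"}; a bare tag such as "P" maps to None.
        name, sep, value = item.partition("=")
        features[name] = value if sep else None
    return {
        "form": match.group("form"),
        "uw": match.group("uw"),
        "features": features,
        "language": match.group("lang"),
        "frequency": int(match.group("freq")),
        "priority": int(match.group("prio")),
    }


if __name__ == "__main__":
    print(parse_entry('[the]{}"" (LEX=D,POS=ART,att=@def)<eng,255,0>;'))
```

The pattern tolerates optional whitespace before the feature list because the sample entries on this page include a space there, while the format line does not.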

The dictionary is divided into three parts:

  • Corpus-specific words: open-class words (i.e., nouns, adjectives, verbs and adverbs) appearing in the Corpus500
  • Grammar words: closed-class words of English (determiners, prepositions, conjunctions and numbers)
  • Default dictionary: punctuation signs and regular expressions to process URLs, dates and other canned structures

The features used in the dictionary are the following:

Feature structure of the ENG-UNL Dictionary

Class                                Attributes
Common nouns and proper names (N)    LEX,POS,NUM
Adjectives (J)                       LEX,POS
Adverbs (A)                          LEX,POS,att*
Verbs (V)                            LEX,POS,TRA,ATE,PER
Conjunctions (C)                     LEX,POS,att*,rel**
Determiners (D)                      LEX,POS,att*,rel**
Prepositions (P)                     LEX,POS,att*,rel**
Pronouns (R)                         LEX,POS,CAS,PER,GEN,NUM
Numerals (U)                         LEX,POS
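
The feature table above could be used to check entries mechanically. The following Python sketch is only an illustration: the names `REQUIRED`, `OPTIONAL` and `check_features` are hypothetical, and treating the starred attributes ("att", "rel") as optional is an assumption based on the notes to the table, not a rule stated in the UNL Dictionary Specs. It also assumes that the class letter in parentheses is the value of LEX, as in the sample entries on this page.

```python
# Attributes listed in the table, keyed by the class letter (assumed to be
# the LEX value). Starred attributes are kept apart as optional.
REQUIRED = {
    "N": ("LEX", "POS", "NUM"),
    "J": ("LEX", "POS"),
    "A": ("LEX", "POS"),
    "V": ("LEX", "POS", "TRA", "ATE", "PER"),
    "C": ("LEX", "POS"),
    "D": ("LEX", "POS"),
    "P": ("LEX", "POS"),
    "R": ("LEX", "POS", "CAS", "PER", "GEN", "NUM"),
    "U": ("LEX", "POS"),
}
OPTIONAL = {
    "A": ("att",),
    "C": ("att", "rel"),
    "D": ("att", "rel"),
    "P": ("att", "rel"),
}


def check_features(features):
    """Return (missing, unexpected) attribute names for one feature dict."""
    lex = features.get("LEX")
    if lex not in REQUIRED:
        raise ValueError("unknown or missing LEX value: %r" % lex)
    missing = [name for name in REQUIRED[lex] if name not in features]
    allowed = set(REQUIRED[lex]) | set(OPTIONAL.get(lex, ()))
    unexpected = [name for name in features if name not in allowed]
    return missing, unexpected


if __name__ == "__main__":
    # Feature list of the entry for "the" shown on this page.
    print(check_features({"LEX": "D", "POS": "ART", "att": "@def"}))
    # A verb with only LEX and POS; "X" is a placeholder, not a Tagset value.
    print(check_features({"LEX": "V", "POS": "X"}))
```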

The tags follow the structure defined in the Tagset.
* The attribute "att", along with its corresponding value, is used when the word is associated with an attribute (instead of a UW):

[the]{}"" (LEX=D,POS=ART,att=@def)<eng,255,0>;
In this case, the English word "the" does not correspond to any UW, but to the attribute "@def".

** The attribute "rel", along with its corresponding value, is used when the word is associated with a relation (instead of a UW):

[of]{}"" (LEX=P,POS=PRE,rel=mod)<eng,255,0>;
In this case, the English word "of" does not correspond to any UW, but to the relation "mod".

Some entries may have both "att" and "rel":

[under]{}"" (P,rel=plc,att=@under)<eng,255,0>;
In this case, the English word "under" corresponds to both the relation "plc" (place) and the attribute "@under".
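
The three cases above (attribute only, relation only, both) can be told apart by looking at the UW field and at the "att" and "rel" features. The following is a minimal Python sketch of that check; the function `entry_targets` and the pattern `UW_FEATURES_RE` are hypothetical names introduced for this example, and the pattern assumes that the UW contains no double quotes.

```python
import re

# Hypothetical pattern: the quoted UW followed by the parenthesized features.
UW_FEATURES_RE = re.compile(r'"(?P<uw>[^"]*)"\s*\((?P<features>[^)]*)\)')


def entry_targets(line):
    """Return what an entry maps to: a UW, an attribute, a relation, or several."""
    match = UW_FEATURES_RE.search(line)
    if match is None:
        raise ValueError("not a dictionary entry: %r" % line)
    targets = {}
    if match.group("uw"):
        targets["uw"] = match.group("uw")
    for item in match.group("features").split(","):
        name, sep, value = item.strip().partition("=")
        # Only "att" and "rel" point to something other than a UW.
        if sep and name in ("att", "rel"):
            targets[name] = value
    return targets


if __name__ == "__main__":
    for entry in (
        '[the]{}"" (LEX=D,POS=ART,att=@def)<eng,255,0>;',
        '[of]{}"" (LEX=P,POS=PRE,rel=mod)<eng,255,0>;',
        '[under]{}"" (P,rel=plc,att=@under)<eng,255,0>;',
    ):
        print(entry_targets(entry))
```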
Software