English dictionary

From UNL Wiki

Revision as of 16:11, 27 July 2012

The English dictionaries are used, along with the English grammars, in the process of UNLization and NLization with IAN and EUGENE, respectively. They follow the UNL Dictionary Specs and are provided in two different formats:

  • The ENG-UNL Dictionary, used in UNLization (IAN), is presented in the enumerative format, i.e., it lists all word forms ("die", "dies", "dying", "dead") and not only base forms.
  • The UNL-ENG Dictionary, used in NLization (EUGENE), is presented in the generative format, i.e., it lists only base forms ("die") along with the instructions for generating the corresponding inflections.
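The difference between the two formats can be illustrated with a small Python sketch. It is only an illustration: the rule labels (`3PS`, `GER`) and the strip/add rule encoding are assumptions made for this example and are not the inflection notation defined in the UNL Dictionary Specs.

```python
# Enumerative format: every word form is stored as its own entry.
# The forms are the ones quoted above; the mapping to a base form is illustrative.
ENUMERATIVE = {
    "die": "die",
    "dies": "die",
    "dying": "die",
    "dead": "die",
}

# Generative format: only the base form is stored, together with rules.
# Each rule is (suffix to strip, suffix to add); labels are hypothetical.
GENERATIVE = {
    "die": {
        "3PS": ("", "s"),       # die -> dies
        "GER": ("ie", "ying"),  # die -> dying
    },
}


def lookup(word_form):
    """Analysis-style lookup: find the base form of an inflected form."""
    return ENUMERATIVE.get(word_form)


def generate(base):
    """Generation-style expansion: build inflected forms from the base form."""
    forms = {}
    for label, (strip, add) in GENERATIVE[base].items():
        stem = base[: len(base) - len(strip)] if strip else base
        forms[label] = stem + add
    return forms


if __name__ == "__main__":
    print(lookup("dying"))
    print(generate("die"))
```

The enumerative table makes analysis a plain lookup at the cost of a larger dictionary, while the generative table stays small but needs rules to be applied at generation time.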

ENG-UNL Dictionary

The ENG-UNL Dictionary for Corpus500 may be downloaded from ana_dic_eng.txt. The complete ENG-UNL Dictionary may be exported from the UNLarium: UNLWEB>UNLARIUM>DICTIONARY>ENGLISH>EXPORT.

Dictionary entry structure

In the ENG-UNL Dictionary, entries are provided in the following format:

[English Word Form]{}"UW"(feature list in attribute-value pair format)<eng,FREQUENCY,PRIORITY>;
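As a rough illustration of this layout, the following Python sketch splits one entry line into its parts. The sample entry and its feature values (`V`, `VER`, `3PS`) are hypothetical and were made up for this example; the function name `parse_entry` is likewise an assumption, and the sketch assumes that FREQUENCY and PRIORITY are integers.

```python
import re

# One entry line, following the layout shown above:
# [word form]{}"UW"(features)<language,frequency,priority>;
ENTRY_RE = re.compile(
    r'^\[(?P<form>[^\]]*)\]'        # [English Word Form]
    r'\{(?P<ident>[^}]*)\}'         # {} (empty in the format above)
    r'\s*"(?P<uw>[^"]*)"'           # "UW"
    r'\s*\((?P<features>[^)]*)\)'   # (feature list)
    r'\s*<(?P<lang>[^,>]*),(?P<freq>[^,>]*),(?P<prio>[^,>]*)>'
    r'\s*;\s*$'
)


def parse_entry(line):
    """Split a dictionary entry into form, UW, features and header fields."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        raise ValueError("not a dictionary entry: %r" % line)

    # Features are attribute-value pairs such as LEX=V, separated by commas.
    features = {}
    for pair in match.group("features").split(","):
        pair = pair.strip()
        if not pair:
            continue
        key, sep, value = pair.partition("=")
        features[key.strip()] = value.strip() if sep else None

    return {
        "form": match.group("form"),
        "uw": match.group("uw"),
        "features": features,
        "lang": match.group("lang").strip(),
        # Assumption: frequency and priority are integer fields.
        "frequency": int(match.group("freq")),
        "priority": int(match.group("prio")),
    }


if __name__ == "__main__":
    # Hypothetical entry, for illustration only.
    sample = '[dies]{}"die"(LEX=V,POS=VER,PER=3PS)<eng,0,0>;'
    print(parse_entry(sample))
```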

The dictionary is divided into three parts:

  • Corpus-specific words: open-class words (i.e., nouns, adjectives, verbs and adverbs) appearing in the Corpus500
  • Grammar words: closed-class words (determiners, prepositions, conjunctions and numbers) of English
  • Default dictionary: punctuation signs and regular expressions to process URLs, dates and other canned structures

The features used in the dictionary are the following:

Feature structure of the ENG-UNL Dictionary

Class                                Attributes
Common nouns and proper names (N)    LEX,POS,NUM
Adjectives (J)                       LEX,POS
Adverbs (A)                          LEX,POS,att*
Verbs (V)                            LEX,POS,TRA,ATE,PER
Conjunctions (C)                     LEX,POS,att*,rel**
Determiners (D)                      LEX,POS,att*,rel**
Prepositions (P)                     LEX,POS,att*,rel**
Pronouns (R)                         LEX,POS,CAS,PER,GEN,NUM
Numerals (U)                         LEX,POS
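
The feature structure above can be encoded as a lookup table, for instance to check that an entry carries the attributes listed for its class. This is a minimal sketch: the names `CLASS_ATTRIBUTES` and `missing_attributes` are hypothetical, the sample feature values are invented, and the `att*` and `rel**` items are left out because their footnotes are not given here.

```python
# Upper-case attributes listed for each class in the feature structure.
# att* and rel** are omitted: their meaning is not stated in this text.
CLASS_ATTRIBUTES = {
    "N": ["LEX", "POS", "NUM"],
    "J": ["LEX", "POS"],
    "A": ["LEX", "POS"],
    "V": ["LEX", "POS", "TRA", "ATE", "PER"],
    "C": ["LEX", "POS"],
    "D": ["LEX", "POS"],
    "P": ["LEX", "POS"],
    "R": ["LEX", "POS", "CAS", "PER", "GEN", "NUM"],
    "U": ["LEX", "POS"],
}


def missing_attributes(lex_class, features):
    """Return the listed attributes that are absent from an entry's features."""
    expected = CLASS_ATTRIBUTES[lex_class]
    return [name for name in expected if name not in features]


if __name__ == "__main__":
    # Hypothetical verb entry with only some of its attributes filled in.
    print(missing_attributes("V", {"LEX": "V", "POS": "VER", "TRA": "TSTD"}))
```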