English dictionary
The English dictionaries are used together with the English grammars for UNLization (with IAN) and NLization (with EUGENE). They follow the UNL Dictionary Specs and are provided in two different formats:
- The ENG-UNL Dictionary, used in UNLization (IAN), is presented in the enumerative format, i.e., it lists all word forms ("die", "dies", "dying", "dead") and not only base forms.
- The UNL-ENG Dictionary, used in NLization (EUGENE), is presented in the generative format, i.e., it lists only base forms ("die") along with the instructions to generate the corresponding inflections.
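The difference between the two formats can be sketched in a few lines of Python. This is a toy illustration only: the rule notation (number of characters to strip, suffix to append), the tags `3SG` and `GER`, and the function names are hypothetical and are not the notation defined by the UNL Dictionary Specs.

```python
# Enumerative style: every word form is listed as its own entry,
# each pointing back to the same base form.
ENUMERATIVE = {
    "die": "die",
    "dies": "die",
    "dying": "die",
}

# Generative style: one base form plus instructions for deriving inflections.
# Hypothetical rule notation: (characters to strip from the end, suffix to append).
GENERATIVE = {
    "die": {
        "3SG": (0, "s"),     # die -> dies
        "GER": (2, "ying"),  # die -> d + ying -> dying
    }
}


def generate(base, rules):
    """Return the base form together with every inflection the rules produce."""
    forms = {base}
    for strip, suffix in rules.values():
        stem = base[:len(base) - strip]
        forms.add(stem + suffix)
    return forms


def expand(generative):
    """Turn a generative dictionary into the equivalent enumerative one."""
    return {
        form: base
        for base, rules in generative.items()
        for form in generate(base, rules)
    }
```

Under these assumptions, `expand(GENERATIVE)` yields the same mapping as `ENUMERATIVE`, which is the sense in which the two formats carry the same information in different shapes.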
ENG-UNL Dictionary
The ENG-UNL Dictionary for Corpus500 may be downloaded from ana_dic_eng.txt. The complete ENG-UNL Dictionary may be exported from the UNLarium: UNLWEB>UNLARIUM>DICTIONARY>ENGLISH>EXPORT.
Dictionary entry structure
In the ENG-UNL Dictionary, entries are provided in the following format:
[English Word Form]{}"UW"(feature list in attribute-value pair format)<eng,FREQUENCY,PRIORITY>;
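As a rough sketch of how an entry in this format could be read programmatically, the following Python snippet splits one line into its fields. It assumes that features are comma-separated `ATTRIBUTE=VALUE` pairs and that frequency and priority are non-negative integers; the sample entry, its feature values, and the function name `parse_entry` are hypothetical and are not taken from the actual dictionary.

```python
import re

# One entry: [word form]{id}"UW"(features)<language,frequency,priority>;
ENTRY_RE = re.compile(
    r'^\[(?P<form>[^\]]*)\]'      # [English Word Form]
    r'\{(?P<id>[^}]*)\}'          # {} (empty in the format shown above)
    r'"(?P<uw>[^"]*)"'            # "UW"
    r'\((?P<features>.*)\)'       # (feature list)
    r'<(?P<lang>[^,]*),(?P<freq>\d+),(?P<prio>\d+)>;$'
)


def parse_entry(line):
    """Parse a single dictionary line into a dict; raise ValueError on mismatch."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        raise ValueError("not a dictionary entry: " + line)
    features = {}
    for item in match.group("features").split(","):
        item = item.strip()
        if not item:
            continue
        # Assumed attribute-value syntax: ATTRIBUTE=VALUE
        name, _, value = item.partition("=")
        features[name] = value
    return {
        "form": match.group("form"),
        "uw": match.group("uw"),
        "features": features,
        "language": match.group("lang"),
        "frequency": int(match.group("freq")),
        "priority": int(match.group("prio")),
    }


# Hypothetical sample line, made up for illustration.
sample = '[dies]{}"die"(LEX=V,POS=VER)<eng,0,0>;'
entry = parse_entry(sample)
```

A regular expression is enough here because each entry sits on one line with fixed delimiters; a UW that itself contains a double quote would need a more careful parser.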
The dictionary is divided into three parts:
- Corpus-specific words: open-class words (i.e., nouns, adjectives, verbs and adverbs) appearing in the Corpus500
- Grammar words: closed-class words (determiners, prepositions, conjunctions and numbers) of English
- Default dictionary: punctuation marks and regular expressions to process URLs, dates and other canned structures
The features used in the dictionary are the following:
{|
!Class
!Attributes
|-
|Common nouns and proper names (N)
|LEX,POS,NUM
|-
|Adjectives (J)
|LEX,POS
|-
|Adverbs (A)
|LEX,POS,att*
|-
|Verbs (V)
|LEX,POS,TRA,ATE,PER
|-
|Conjunctions (C)
|LEX,POS,att*,rel**
|-
|Determiners (D)
|LEX,POS,att*,rel**
|-
|Prepositions (P)
|LEX,POS,att*,rel**
|-
|Pronouns (R)
|LEX,POS,CAS,PER,GEN,NUM
|-
|Numerals (U)
|LEX,POS
|}
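The class-to-attribute table can also be held as a lookup structure, for instance to check that an entry carries the attributes listed for its class. The sketch below is one possible encoding in Python. The function name `missing_attributes` is hypothetical, and the asterisked items (`att*`, `rel**`) are skipped during the check as an assumption, because the notes the asterisks point to are not reproduced in this section.

```python
# Attributes listed per class, keyed by the one-letter class code.
CLASS_ATTRIBUTES = {
    "N": ["LEX", "POS", "NUM"],
    "J": ["LEX", "POS"],
    "A": ["LEX", "POS", "att*"],
    "V": ["LEX", "POS", "TRA", "ATE", "PER"],
    "C": ["LEX", "POS", "att*", "rel**"],
    "D": ["LEX", "POS", "att*", "rel**"],
    "P": ["LEX", "POS", "att*", "rel**"],
    "R": ["LEX", "POS", "CAS", "PER", "GEN", "NUM"],
    "U": ["LEX", "POS"],
}


def missing_attributes(class_code, features):
    """List the attributes expected for a class that are absent from `features`.

    Asterisked items are ignored (assumption: their meaning is defined
    in footnotes that are not available here).
    """
    expected = [
        name for name in CLASS_ATTRIBUTES[class_code]
        if not name.endswith("*")
    ]
    return [name for name in expected if name not in features]
```

For example, calling `missing_attributes("V", {"LEX": "V", "POS": "VER"})` with these made-up feature values reports `TRA`, `ATE` and `PER` as absent.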