English dictionary

From UNL Wiki
Revision as of 16:01, 27 July 2012

The English dictionaries are used, along with the English grammars, for UNLization (performed by IAN) and NLization (performed by EUGENE). They follow the UNL Dictionary Specs and are provided in two different formats:

  • The ENG-UNL Dictionary, used in UNLization (IAN), is presented in the enumerative format, i.e., it lists every word form ("die", "dies", "dying", "dead"), not only base forms
  • The UNL-ENG Dictionary, used in NLization (EUGENE), is presented in the generative format, i.e., it lists only base forms ("die"), along with the instructions for generating the corresponding inflections
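The difference between the two formats can be illustrated with a short sketch that expands one base form into the word forms an enumerative dictionary would list. The rule table and the feature labels (`BASE`, `3SG`, `PAST`, `ING`) are hypothetical and do not reproduce the inflection syntax actually used in the UNL-ENG Dictionary or by EUGENE.

```python
# Illustrative only: the rules below are hypothetical and do NOT reproduce
# the inflection syntax of the UNL-ENG Dictionary or EUGENE.

# Each rule: (feature label, ending to strip from the base, ending to append)
VERB_RULES = [
    ("BASE", "", ""),
    ("3SG", "", "s"),
    ("PAST", "", "d"),
    ("ING", "ie", "ying"),
]


def inflect(base: str, rules: list) -> dict:
    """Expand a base form (generative) into a set of word forms (enumerative)."""
    forms = {}
    for label, strip, add in rules:
        if strip and not base.endswith(strip):
            continue  # this rule does not apply to the given base form
        stem = base[: len(base) - len(strip)] if strip else base
        forms[label] = stem + add
    return forms


if __name__ == "__main__":
    print(inflect("die", VERB_RULES))
```

Running it prints `{'BASE': 'die', '3SG': 'dies', 'PAST': 'died', 'ING': 'dying'}`. A realistic rule set would need conditions for irregular forms; the single stripped ending here only shows the generative idea.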

ENG-UNL Dictionary

The ENG-UNL Dictionary for Corpus500 may be downloaded from ana_dic_eng.txt (http://www.unlweb.net/resources/ana_dic_eng.txt). The complete ENG-UNL Dictionary may be exported from the UNLarium: UNLWEB>UNLARIUM>DICTIONARY>ENGLISH>EXPORT.

Dictionary entry structure

In the ENG-UNL Dictionary, entries are provided in the following format:

[English Word Form]{}"UW"(feature list in attribute-value pair format)<eng,FREQUENCY,PRIORITY>/
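A minimal Python sketch of reading entries in this format is given below. It assumes that the feature list consists of comma-separated `ATTR=VALUE` pairs, that FREQUENCY and PRIORITY are integers, and that the trailing slash is optional. The sample entry and its attribute names (`LEX`, `POS`, `NUM`, `PER`) are made up for illustration and are not taken from ana_dic_eng.txt.

```python
import re

# Assumed layout: [word]{id}"UW"(ATTR=VALUE,...)<lang,FREQUENCY,PRIORITY>/
ENTRY_RE = re.compile(
    r'^\[(?P<word>[^\]]*)\]'                           # [English Word Form]
    r'\{(?P<id>[^}]*)\}'                               # {} (empty above)
    r'"(?P<uw>[^"]*)"'                                 # "UW" (may contain parentheses)
    r'\((?P<features>[^)]*)\)'                         # (feature list)
    r'<(?P<lang>[^,>]+),(?P<freq>\d+),(?P<prio>\d+)>'  # <eng,FREQUENCY,PRIORITY>
    r'/?$'                                             # optional trailing slash
)


def parse_entry(line: str) -> dict:
    """Split one dictionary line into its fields; raise ValueError if it does not match."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a dictionary entry: {line!r}")
    features = {}
    for pair in filter(None, m["features"].split(",")):
        attr, _, value = pair.partition("=")  # assumed ATTR=VALUE pairs
        features[attr.strip()] = value.strip()
    return {
        "word": m["word"],
        "id": m["id"],
        "uw": m["uw"],
        "features": features,
        "lang": m["lang"],
        "frequency": int(m["freq"]),
        "priority": int(m["prio"]),
    }


if __name__ == "__main__":
    # Hypothetical entry, for illustration only
    sample = '[dies]{}"die(icl>occur)"(LEX=V,POS=VER,NUM=SNG,PER=3)<eng,0,0>/'
    print(parse_entry(sample))
```

Because the UW is matched inside its quotes, UWs containing parentheses or `>` are handled; feature values containing parentheses or commas are not, under the assumptions above.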

The dictionary is divided into three parts:

  • Corpus-specific words: open-class words (i.e., nouns, adjectives, verbs and adverbs) appearing in the Corpus500
  • Grammar words: closed-class words of English (determiners, prepositions, conjunctions and numbers)
  • Default dictionary: punctuation signs and regular expressions for processing URLs, dates and other canned structures
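The use of regular expressions for canned structures in the default dictionary can be pictured with the sketch below, which tries a list of patterns against a token and returns the label of the first pattern that matches the whole token. The two patterns (a URL and a day/month/year date) and their labels are simplified, hypothetical examples, not the expressions actually shipped in the dictionary.

```python
import re
from typing import Optional

# Hypothetical, simplified patterns for two kinds of canned structures;
# these are NOT the regular expressions used in the default dictionary.
DEFAULT_PATTERNS = [
    ("URL", re.compile(r"https?://\S+")),
    ("DATE", re.compile(r"\d{1,2}/\d{1,2}/\d{4}")),  # assumed day/month/year
]


def classify_token(token: str) -> Optional[str]:
    """Return the label of the first pattern matching the whole token, else None."""
    for label, pattern in DEFAULT_PATTERNS:
        if pattern.fullmatch(token):
            return label
    return None  # no canned-structure pattern matched


if __name__ == "__main__":
    for tok in ["http://www.unlweb.net/resources/ana_dic_eng.txt", "27/07/2012", "die"]:
        print(tok, "->", classify_token(tok))
```

Trying patterns in list order is one simple way to resolve overlaps; how the actual dictionary orders its expressions is not stated here.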

The features used in the dictionary are the following:
