English dictionary
From UNL Wiki
Revision as of 16:01, 27 July 2012
The English dictionaries are used, along with the English grammars, in the process of UNLization and NLization with IAN and EUGENE, respectively. They follow the UNL Dictionary Specs and are provided in two different formats:
- The ENG-UNL Dictionary, to be used in UNLization (IAN), is presented in the enumerative format, i.e., it lists all word forms ("die", "dies", "dying", "dead") and not only base forms.
- The UNL-ENG Dictionary, to be used in NLization (EUGENE), is presented in the generative format, i.e., it lists only base forms ("die") along with the instructions for generating the corresponding inflections.
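The difference between the two formats can be illustrated with a toy Python sketch. It is not the actual IAN or EUGENE data format: the rule tags (`3SG`, `GER`, `PAS`) and the strip/add suffix rules are assumptions made up for illustration only.

```python
# Enumerative format (illustration): every word form is listed explicitly
# and points to the same base form. Forms are the ones quoted in the text.
ENUMERATIVE = {"die": "die", "dies": "die", "dying": "die", "dead": "die"}

# Generative format (illustration): only the base form is stored, together
# with rules for producing inflections. Each hypothetical rule is a pair
# (suffix to strip from the base, suffix to add).
RULES = {"3SG": ("", "s"), "GER": ("ie", "ying"), "PAS": ("", "d")}


def generate(base, rules):
    """Apply each strip/add rule to the base form and return the results."""
    forms = {}
    for tag, (strip, add) in rules.items():
        if not base.endswith(strip):
            raise ValueError(f"rule {tag} does not apply to {base!r}")
        stem = base[: len(base) - len(strip)]
        forms[tag] = stem + add
    return forms


if __name__ == "__main__":
    print(ENUMERATIVE["dying"])
    print(generate("die", RULES))
```

In the enumerative case analysis is a plain lookup, whereas in the generative case the forms are computed on demand from the base form.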
ENG-UNL Dictionary
The ENG-UNL Dictionary for Corpus500 may be downloaded from ana_dic_eng.txt. The complete ENG-UNL Dictionary may be exported from the UNLarium: UNLWEB>UNLARIUM>DICTIONARY>ENGLISH>EXPORT.
Dictionary entry structure
In the ENG-UNL Dictionary, entries are provided in the following format:
[English Word Form]{}"UW"(feature list in attribute-value pair format)<eng,FREQUENCY,PRIORITY>/
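As a rough illustration, an entry in this format could be split into its fields with a regular expression. The Python sketch below is based only on the template shown above: the function name `parse_entry`, the sample entry, and its feature names are hypothetical, and it assumes that the feature list contains no closing parenthesis and that FREQUENCY and PRIORITY are integers.

```python
import re

# Pattern derived from the entry template above; the trailing terminator
# is treated as optional. This is an assumption, not the official grammar.
ENTRY_RE = re.compile(
    r'^\[(?P<form>[^\]]*)\]'        # [English Word Form]
    r'\{(?P<id>[^}]*)\}'            # {} (empty in the template)
    r'"(?P<uw>[^"]*)"'              # "UW"
    r'\((?P<features>[^)]*)\)'      # (feature list)
    r'<(?P<lang>[^,>]*),(?P<freq>\d+),(?P<prio>\d+)>'  # <eng,FREQUENCY,PRIORITY>
    r'\s*[/;]?\s*$'
)


def parse_entry(line):
    """Split one dictionary entry into a dict of its fields."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a dictionary entry: {line!r}")
    features = {}
    for item in match.group("features").split(","):
        item = item.strip()
        if not item:
            continue
        # Attribute-value pairs look like ATTR=VALUE; a bare attribute is
        # stored with the value True.
        key, sep, value = item.partition("=")
        features[key.strip()] = value.strip() if sep else True
    return {
        "form": match.group("form"),
        "uw": match.group("uw"),
        "features": features,
        "language": match.group("lang"),
        "frequency": int(match.group("freq")),
        "priority": int(match.group("prio")),
    }


if __name__ == "__main__":
    # Hypothetical entry; the UW and feature names are invented for the example.
    print(parse_entry('[dies]{}"die"(POS=VER,NUM=SNG)<eng,0,0>/'))
```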
The dictionary is divided into three parts:
- Corpus-specific words: open-class words (i.e., nouns, adjectives, verbs and adverbs) appearing in the Corpus500
- Grammar words: closed-class words of English (determiners, prepositions, conjunctions and numbers)
- Default dictionary: punctuation signs and regular expressions to process URLs, dates and other canned structures
The features used in the dictionary are the following