EUGENE

From UNL Wiki

Revision as of 00:42, 23 July 2012

EUGENE is a natural language generation system. It generates natural language sentences from semantic networks represented in the UNL format. In its current release, it is a Java-based web application available at the UNLdev.

Requirements

As a universal engine, EUGENE must be parameterized for the target language with the following files, which are provided through EUGENE's interface:

  • The input document in the UNL document structure, i.e., the universal semantic network to be generated in natural language
  • The UNL-NL (generation) dictionary, i.e., a lexical database where UWs are mapped into natural language entries, along with the corresponding features, to be provided according to the UNL Dictionary Specs
  • The UNL-NL (generation) transformation grammar, i.e., a set of transformation rules used to convert the UNL graphs into natural language sentences, to be provided according to the UNL Grammar Specs
  • The UNL-NL (generation) disambiguation grammar, i.e., a set of disambiguation rules used to improve the results of the tokenization and of the transformation, to be provided according to the UNL Grammar Specs

Functioning

EUGENE performs the following three steps over the input file:

  • Segmentation, i.e., the division of the input document into a series of isolated graphs, which are processed one at a time
  • Tokenization, i.e., the identification of the tokens (UWs, relations and attributes) of each graph of the input document
  • Transformation, i.e., the application of the transformation rules of the grammar over each tokenized graph in order to generate a natural language sentence
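
The three steps above can be pictured as a small pipeline. The following Python sketch is only an illustration and is not EUGENE's implementation: the `{unl}` ... `{/unl}` delimiters and the `relation(source, target)` line layout are assumed for the toy input, the function names (`segment`, `tokenize`, `transform`, `generate`) are hypothetical, and the small lookup tables and the single hard-coded ordering rule merely stand in for the generation dictionary and the transformation grammar.

```python
import re

# Toy input. The tag layout and relation syntax are assumptions made for
# this illustration, not a statement of the UNL document structure.
SAMPLE_DOCUMENT = """\
{unl}
agt(eat.@entry.@past, cat.@def)
obj(eat.@entry.@past, fish.@indef)
{/unl}
{unl}
agt(sleep.@entry, dog.@def)
{/unl}
"""

GRAPH_PATTERN = re.compile(r"\{unl\}(.*?)\{/unl\}", re.DOTALL)
RELATION_PATTERN = re.compile(r"^(\w+)\((.+?),\s*(.+?)\)$")

# Hypothetical stand-ins for the generation dictionary.
TOY_VERBS = {"eat": ("eats", "ate"), "sleep": ("sleeps", "slept")}
TOY_NOUNS = {"cat": "cat", "fish": "fish", "dog": "dog"}
TOY_ARTICLES = {"def": "the", "indef": "a"}


def segment(document):
    """Segmentation: split the document into isolated graphs."""
    return [body.strip() for body in GRAPH_PATTERN.findall(document)]


def parse_node(text):
    """Split 'uw.@attr1.@attr2' into the UW and its list of attributes."""
    uw, *attributes = text.strip().split(".@")
    return uw, attributes


def tokenize(graph):
    """Tokenization: identify relations, UWs and attributes in one graph."""
    tokens = []
    for line in graph.splitlines():
        match = RELATION_PATTERN.match(line.strip())
        if match is None:
            continue  # skip lines this toy parser does not understand
        relation, source, target = match.groups()
        tokens.append((relation, parse_node(source), parse_node(target)))
    return tokens


def realize_noun(node):
    """Look up a noun and prepend an article chosen from its attributes."""
    uw, attributes = node
    article = next((TOY_ARTICLES[a] for a in attributes if a in TOY_ARTICLES), "")
    return f"{article} {TOY_NOUNS[uw]}".strip()


def realize_verb(node):
    """Look up a verb and pick the past form if '@past' is present."""
    uw, attributes = node
    present, past = TOY_VERBS[uw]
    return past if "past" in attributes else present


def transform(tokens):
    """Transformation: one hard-coded toy rule (agent, verb, object)."""
    subject = verb = obj = ""
    for relation, source, target in tokens:
        verb = realize_verb(source)
        if relation == "agt":
            subject = realize_noun(target)
        elif relation == "obj":
            obj = realize_noun(target)
    words = " ".join(part for part in (subject, verb, obj) if part)
    return words[0].upper() + words[1:] + "." if words else ""


def generate(document):
    """Run segmentation, tokenization and transformation in sequence."""
    return [transform(tokenize(graph)) for graph in segment(document)]


if __name__ == "__main__":
    for sentence in generate(SAMPLE_DOCUMENT):
        print(sentence)
```

In this sketch each graph is handled independently, mirroring the description of graphs being processed one at a time. A real grammar would replace the fixed agent-verb-object ordering in `transform` with rules loaded from the transformation grammar file.
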
Software