UNL-NL Dictionaries

From UNL Wiki

Revision as of 21:04, 16 April 2009

The UNL-NL dictionaries are bilingual dictionaries linking UWs to natural language (NL) words. They can be unidirectional (UNL-to-NL or NL-to-UNL) or bidirectional (NL-to-UNL-to-NL). UNL-to-NL dictionaries are used for deconversion, while NL-to-UNL dictionaries are used for enconversion.
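The two directions differ mainly in which field serves as the lookup key: the NLW during enconversion and the UW during deconversion. The following Python sketch illustrates this. It assumes the entries have already been reduced to (NLW, UW) pairs. The sample pairs and the function name `build_indexes` are hypothetical.

```python
from collections import defaultdict

# Hypothetical (NLW, UW) pairs; the UWs are illustrative, and "the" has an empty UW.
PAIRS = [
    ("happiness", "happiness(icl>state)"),
    ("joy", "happiness(icl>state)"),
    ("the", ""),
]


def build_indexes(pairs):
    """Index the same pairs by NLW (enconversion) and by UW (deconversion)."""
    nl_to_unl = defaultdict(list)   # NLW -> candidate UWs
    unl_to_nl = defaultdict(list)   # UW  -> candidate NLWs
    for nlw, uw in pairs:
        nl_to_unl[nlw].append(uw)
        if uw:                      # an empty UW gives nothing to look up in deconversion
            unl_to_nl[uw].append(nlw)
    return nl_to_unl, unl_to_nl


nl_to_unl, unl_to_nl = build_indexes(PAIRS)
print(nl_to_unl["joy"])                    # ['happiness(icl>state)']
print(unl_to_nl["happiness(icl>state)"])   # ['happiness', 'joy']
```

In this sketch, a bidirectional dictionary keeps both indexes over the same set of entries.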

Syntax

In the UNL System, the UNL-NL dictionaries are plain text files with a single entry per line in the following format:

[NLW] {ID} "UW" (ATTR, ...) <FLG, FRE, PRI>; COMMENT

Where:

NLW
The lexical item of the natural language. Its format should be decided by the dictionary builder. It can be:
  • a multiword expression: [United States of America]
  • a compound: [hot-dog]
  • a simple word: [happiness]
  • a simple morpheme: [happ]
  • a complex structure*: [[bring] [back]]
  • a non-motivated linguistic entity: [g]

* Complex structures are used for infixation, i.e. when other words can appear between the parts, as in "bring him back".

ID
The unique identifier (primary key) of the entry.
UW
The Universal Word of UNL. This field can be empty if a word does not need a UW.
ATTR
The list of features of the NLW. It can be:
  • a list of simple features: (NOU, MCL, SNG)
  • a list of attribute-value pairs: (pos=NOU, gen=MCL, num=SNG)
  • a list of attributes and transformation rules (see below): (plural:="oo":"ee")

Attributes should be separated by commas (",").

FLG
The two-character language code according to ISO 639-1.
FRE
The frequency of the NLW in natural texts. Used for natural language analysis (NL-UNL). It can range from 0 (least frequent) to 255 (most frequent).
PRI
The priority of the NLW. Used for natural language generation (UNL-NL). It can range from 0 to 255.
COMMENT
Any comment needed to clarify the mapping between the NL and UNL entries. It should end with a line break (return character).
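Each entry occupies a single line with fixed delimiters, so it can be split with one regular expression. The following Python sketch is a minimal, non-authoritative parser built on these assumptions:

* the UW contains no double quotes;
* the ATTR list contains no closing parenthesis and no comma inside a rule string;
* FLG is exactly two letters.

The names `parse_entry`, `Entry` and `ENTRY_RE` and the sample entries are hypothetical.

```python
import re
from dataclasses import dataclass

# Layout: [NLW] {ID} "UW" (ATTR, ...) <FLG, FRE, PRI>; COMMENT
# Straight or curly double quotes are accepted around the UW (an assumption).
ENTRY_RE = re.compile(
    r'^\[(?P<nlw>.*?)\]\s*'                      # NLW, may itself contain [..] parts
    r'\{(?P<eid>[^}]*)\}\s*'                     # ID
    r'["“”](?P<uw>[^"“”]*)["“”]\s*'              # UW, may be empty
    r'\((?P<attr>[^)]*)\)\s*'                    # ATTR list
    r'<\s*(?P<flg>[A-Za-z]{2})\s*,\s*(?P<fre>\d+)\s*,\s*(?P<pri>\d+)\s*>\s*;'
    r'\s*(?P<comment>.*)$'                       # COMMENT, up to end of line
)


@dataclass
class Entry:
    nlw: str
    entry_id: str
    uw: str
    attrs: list
    flg: str
    fre: int
    pri: int
    comment: str

    @property
    def parts(self):
        """Split a complex structure such as [[bring] [back]] into its parts."""
        inner = re.findall(r"\[([^\[\]]*)\]", self.nlw)
        return inner or [self.nlw]


def parse_entry(line):
    """Parse one dictionary line into an Entry (hypothetical helper)."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a dictionary entry: {line!r}")
    fre, pri = int(m["fre"]), int(m["pri"])
    if not (0 <= fre <= 255 and 0 <= pri <= 255):
        raise ValueError(f"FRE and PRI must be between 0 and 255: {line!r}")
    # Attributes are comma-separated; assumes no comma inside a rule string.
    attrs = [a.strip() for a in m["attr"].split(",") if a.strip()]
    return Entry(m["nlw"], m["eid"], m["uw"], attrs,
                 m["flg"], fre, pri, m["comment"].strip())


e = parse_entry('[happiness] {1} "happiness(icl>state)" (NOU, num=SNG) <en, 200, 5>; abstract noun')
print(e.uw, e.attrs, e.fre)
c = parse_entry('[[bring] [back]] {2} "return(icl>do)" (VER) <en, 120, 10>;')
print(c.parts)
```

Raising `ValueError` for an out-of-range FRE or PRI value is a choice of this sketch. A real loader might prefer to report the line and skip it.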

Transformation rules for dictionary entries

To deal with exceptions, infixation and irregular forms, the following rules can be included in dictionary entries:

For simple transformation:
<ATTRIBUTE>":="<SOURCE>":"<TARGET>

For left appending:
<ATTRIBUTE>":="<LEFT DELETION>"<"<LEFT ADDITION>

For right appending:
<ATTRIBUTE>":="<RIGHT ADDITION>">"<RIGHT DELETION>

Where:
<ATTRIBUTE> is the name of the attribute
<SOURCE> is the original form to be replaced (if empty, the whole NLW is replaced)
<TARGET> is the form to be used instead of the source (if empty, the whole NLW is deleted)
<LEFT DELETION> is the string, or the number of characters, to be deleted from the beginning of the NLW before the LEFT ADDITION is added
<RIGHT DELETION> is the string, or the number of characters, to be deleted from the end of the NLW before the RIGHT ADDITION is added
<LEFT ADDITION> is the string to be added to the beginning of the NLW
<RIGHT ADDITION> is the string to be added to the end of the NLW
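The three rule types can be illustrated with a small Python interpreter. The page leaves several details open, so this is a minimal sketch that fills them with assumptions:

* String operands are double-quoted and character counts are bare integers, e.g. `neg:=0<"un"` or `past:="ied">1`.
* A non-empty SOURCE replaces every occurrence in the NLW.
* A deletion string that does not match the start or end of the NLW raises `ValueError`.

The attribute names `neg`, `past` and `x` and the helper names are hypothetical. Only `plural:="oo":"ee"` comes from the page. The empty-TARGET branch follows the definition above literally and deletes the whole NLW.

```python
import re

# Hypothetical concrete syntax: string operands are double-quoted, character
# counts are bare integers, e.g. plural:="oo":"ee", neg:=0<"un", past:="ied">1
SIMPLE = re.compile(r'^(?P<attr>\w+):="(?P<src>[^"]*)":"(?P<tgt>[^"]*)"$')
LEFT = re.compile(r'^(?P<attr>\w+):=(?:"(?P<dels>[^"]*)"|(?P<deln>\d+))<"(?P<add>[^"]*)"$')
RIGHT = re.compile(r'^(?P<attr>\w+):="(?P<add>[^"]*)">(?:"(?P<dels>[^"]*)"|(?P<deln>\d+))$')


def _strip_left(nlw, dels, deln):
    """Delete a string or a number of characters from the start of nlw."""
    if deln is not None:
        return nlw[int(deln):]
    if not nlw.startswith(dels):
        raise ValueError(f"{nlw!r} does not start with {dels!r}")
    return nlw[len(dels):]


def _strip_right(nlw, dels, deln):
    """Delete a string or a number of characters from the end of nlw."""
    n = int(deln) if deln is not None else len(dels)
    if deln is None and not nlw.endswith(dels):
        raise ValueError(f"{nlw!r} does not end with {dels!r}")
    return nlw[:max(len(nlw) - n, 0)]


def apply_rule(nlw, rule):
    """Return (attribute, derived form) for one transformation rule."""
    # Curly quotes, as they appear in the wiki text, are treated as straight quotes.
    rule = rule.strip().replace("\u201c", '"').replace("\u201d", '"')
    if m := SIMPLE.match(rule):
        if not m["tgt"]:             # empty TARGET: the whole NLW is deleted
            return m["attr"], ""
        if not m["src"]:             # empty SOURCE: the whole NLW is replaced
            return m["attr"], m["tgt"]
        return m["attr"], nlw.replace(m["src"], m["tgt"])  # every occurrence
    if m := LEFT.match(rule):
        return m["attr"], m["add"] + _strip_left(nlw, m["dels"], m["deln"])
    if m := RIGHT.match(rule):
        return m["attr"], _strip_right(nlw, m["dels"], m["deln"]) + m["add"]
    raise ValueError(f"unrecognized rule: {rule!r}")


for word, rule in [("foot", 'plural:="oo":"ee"'), ("happy", 'neg:=0<"un"'),
                   ("study", 'past:="ied">"y"'), ("go", 'past:="":"went"')]:
    print(apply_rule(word, rule))
```

Both deletion forms are accepted on each side. For example, `past:="ied">1` and `past:="ied">"y"` both turn "study" into "studied".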

Software