UNL-NL Dictionaries

The '''UNL-NL dictionaries''' are bilingual dictionaries linking [[Universal Words|UWs]] to natural language (NL) words. They can be unidirectional (UNL-to-NL or NL-to-UNL) or bidirectional (NL-to-UNL-to-NL). UNL-to-NL dictionaries are used for [[deconversion]], while NL-to-UNL dictionaries are used for [[enconversion]]. The specifications presented here are not mandatory, but they are required for using the tools of the [[UNL Centre]] and the UNDL Foundation.

== Syntax ==

In the [[UNL System]], the UNL-NL dictionaries are plain text files with a single entry per line in the following format:

[NLW]  {ID}  "UW"  (ATTR , ... )  < LG , FRE , PRI >; COMMENTS

Where:

;NLW
:The lexical item of the natural language. Its format should be decided by the dictionary builder. It can be:
::*a multiword expression: [United States of America]
::*a compound: [hot-dog]
::*a simple word: [happiness]
::*a simple morpheme: [happ]
::*a complex structure (see below): [[bring] [back]]*
::*a non-motivated linguistic entity: [g]

;ID
:The unique identifier (primary key) of the entry.

;UW
:The Universal Word of UNL. This field can be empty if a word does not need a UW.

;ATTR
:The list of features of the NLW. It can be:
::*a list of simple features: (NOU, MCL, SNG)
::*a list of attribute-value pairs: (pos=NOU, gen=MCL, num=SNG)*
::*a list of transformation rules (see below): (plural:="oo":"ee")*
:Attributes should be separated by ",".

;LG
:The two-character language code according to ISO 639-1.

;FRE
:The frequency of the NLW in natural texts. Used for natural language analysis (NL-UNL). It can range from 0 (least frequent) to 255 (most frequent).

;PRI
:The priority of the NLW. Used for natural language generation (UNL-NL). It can range from 0 to 255.

;COMMENTS
:Any comment necessary to clarify the mapping between NL and UNL entries. It should end with a line break (the return code).

The features marked with * are not supported by the UNL Centre's tools.

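The entry format described in this section can be read with a single regular expression. The following is a minimal sketch in Python, not part of any UNL tool: the names <code>ENTRY_RE</code>, <code>Entry</code> and <code>parse_entry</code> are hypothetical, and the sketch assumes straight double quotes around the UW and a flat feature list (no nested sub-NLW feature lists).

```python
import re
from dataclasses import dataclass

# Hypothetical pattern for one simple entry:
# [NLW] {ID} "UW" (ATTR, ...) <LG, FRE, PRI>; COMMENTS
ENTRY_RE = re.compile(
    r'^\s*\[(?P<nlw>[^\[\]]*)\]\s*'      # [NLW]
    r'\{(?P<id>[^{}]*)\}\s*'             # {ID}, may be empty
    r'"(?P<uw>[^"]*)"\s*'                # "UW", may be empty
    r'\((?P<attrs>.*)\)\s*'              # (ATTR, ...)
    r'<\s*(?P<lg>[A-Za-z]{2})\s*,\s*(?P<fre>\d+)\s*,\s*(?P<pri>\d+)\s*>\s*;'
    r'(?P<comment>.*)$'                  # everything after ";" is a comment
)


@dataclass
class Entry:
    nlw: str
    entry_id: str
    uw: str
    attrs: list
    lg: str
    fre: int
    pri: int
    comment: str


def parse_entry(line: str) -> Entry:
    """Parse one dictionary line; raise ValueError if it does not fit the format."""
    m = ENTRY_RE.match(line)
    if m is None:
        raise ValueError(f"not a dictionary entry: {line!r}")
    fre, pri = int(m["fre"]), int(m["pri"])
    if not (0 <= fre <= 255 and 0 <= pri <= 255):
        raise ValueError("FRE and PRI must be in the range 0-255")
    # Assumption: a flat feature list, so a plain comma split is enough.
    attrs = [a.strip() for a in m["attrs"].split(",") if a.strip()]
    return Entry(m["nlw"], m["id"].strip(), m["uw"], attrs,
                 m["lg"].lower(), fre, pri, m["comment"].strip())


entry = parse_entry('[book] {} "book(icl>document)" (pos=NOU) <en,0,0>;')
print(entry.nlw, entry.uw, entry.attrs)
```

The range check on FRE and PRI follows the 0 to 255 limits stated above; how the real tools react to out-of-range values is not specified here.
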
== Complex structures as NLW* ==

In order to deal with '''infixation''', the NLW can be represented as a complex structure comprising several sub-NLW entries. The syntax for complex NLWs is:

[[sub-NLW][sub-NLW]...[sub-NLW]]  {ID}  "UW"  (ATTR , ..., 1#(ATTR, ...), 2#(ATTR, ...), ...)  < LG , FRE , PRI >; COMMENTS

Where:<br />
[sub-NLW] is a part of the NLW;<br />
1#(ATTR, ...) are the specific features for the first sub-NLW to appear in the NLW;<br />
2#(ATTR, ...) are the specific features for the second sub-NLW to appear in the NLW;<br />
and so on.<br />
The first sub-NLW to appear in an NLW is always #1, the second #2, and so on.<br />
A feature list preceded by <number># applies only to the corresponding sub-NLW.<br />
The features outside the sub-NLW feature lists are shared by all sub-NLWs.

:Example<br />
::[[bring] [back]] {} "to bring back(icl>to bring)" (pos=VER, 01#(past:=3>ought), 02#(pos=PRE)) <en, 0, 0>;<br />
:::In the entry above, the NLW is split into two sub-NLWs, [bring] and [back], separated by a blank space. Each sub-NLW has its own features, given in the numbered parentheses inside the feature list. The sub-NLW [bring], which appears first, has the feature "past:=3>ought" (delete the last three characters and append "ought", giving "brought"), while the sub-NLW [back], which appears second, has the feature "pos=PRE". The feature "pos=VER", which is outside the specific feature lists, is shared by both.

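The distribution of shared and sub-NLW-specific features described in this section can be sketched as follows. This is an illustrative Python sketch only: the function names <code>split_top_level</code> and <code>features_by_sub_nlw</code> are hypothetical, and the sketch assumes that feature lists nest at most one level deep and that the numeric prefix may carry leading zeros.

```python
import re

SUB_FEATURES_RE = re.compile(r'^(\d+)#\((.*)\)$')  # e.g. 01#(past:=3>ought)


def split_top_level(text: str) -> list:
    """Split on commas that are outside parentheses and double quotes."""
    parts, current = [], []
    depth, in_quotes = 0, False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes and ch == "(":
            depth += 1
        elif not in_quotes and ch == ")":
            depth -= 1
        if ch == "," and depth == 0 and not in_quotes:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [p for p in parts if p]


def features_by_sub_nlw(nlw: str, attrs: str) -> list:
    """Return (sub-NLW, features) pairs; shared features come first."""
    subs = re.findall(r'\[([^\[\]]*)\]', nlw)  # innermost bracketed parts
    shared, specific = [], {}
    for item in split_top_level(attrs):
        m = SUB_FEATURES_RE.match(item)
        if m:
            # The number before "#" selects the sub-NLW (1-based).
            specific.setdefault(int(m.group(1)), []).extend(
                split_top_level(m.group(2)))
        else:
            shared.append(item)
    return [(sub, shared + specific.get(i, []))
            for i, sub in enumerate(subs, start=1)]


pairs = features_by_sub_nlw('[[bring] [back]]',
                            'pos=VER, 01#(past:=3>ought), 02#(pos=PRE)')
for sub, feats in pairs:
    print(sub, feats)
# prints:
# bring ['pos=VER', 'past:=3>ought']
# back ['pos=VER', 'pos=PRE']
```
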
== Transformation rules for dictionary entries* ==

In order to deal with '''exceptions''' and '''irregular''' forms, the following rules can be included inside dictionary entries (in the feature list field):

;Replacement:
:<ATTRIBUTE>":="<SOURCE>":"<TARGET>
::Example: plural:="oo":"ee" (if the entry has the feature "plural", the string "oo" is replaced by "ee" in the NLW, as in foot>feet)

;Left appending:
:<ATTRIBUTE>":="<LEFT DELETION>"<"<LEFT ADDITION>
::Example: not:=<"un" (if the entry has the feature "not", the string "un" is added to the left of the NLW, as in dress>undress)

;Right appending:
:<ATTRIBUTE>":="<RIGHT DELETION>">"<RIGHT ADDITION>
::Example: plural:=y>ies (if the entry has the feature "plural", the rightmost "y" is deleted and the string "ies" is added to the right of the NLW, as in baby>babies)

Where:<br />
<ATTRIBUTE> is the name of the attribute<br />
<SOURCE> is the original form to be replaced (if empty, it means that the whole NLW should be replaced)<br />
<TARGET> is the form to be used instead of the source (if empty, it means that the whole NLW should be deleted)<br />
<LEFT DELETION> is the string or the number of characters to be deleted from the beginning of the NLW before the LEFT ADDITION is added<br />
<RIGHT DELETION> is the string or the number of characters to be deleted from the end of the NLW before the RIGHT ADDITION is added<br />
<LEFT ADDITION> is the string to be added to the beginning of the NLW<br />
<RIGHT ADDITION> is the string to be added to the end of the NLW<br />

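The three rule types can be illustrated with a small interpreter. The Python sketch below is an assumption-laden illustration, not the behaviour of any UNL tool: the names <code>apply_rule</code>, <code>_operand</code> and <code>_strip_edge</code> are hypothetical; bare digits are read as a number of characters and anything else as a string; replacement affects only the first occurrence of the source; and for right appending the deletion is taken to stand before the ">" and the addition after it, as in the example plural:=y>ies. Rules without an operator are rejected.

```python
import re

OPERATORS = ("<", ">", ":")
# A quoted string, a single operator, or a run of other characters.
TOKEN_RE = re.compile(r'"[^"]*"|[<>:]|[^"<>:]+')


def _operand(token: str):
    """Bare digits are character counts; everything else is a string."""
    token = token.strip()
    return int(token) if token.isdigit() else token.strip('"')


def _strip_edge(word: str, deletion, from_end: bool) -> str:
    """Remove a string or a number of characters from one edge of the word."""
    if isinstance(deletion, int):
        return word[:max(0, len(word) - deletion)] if from_end else word[deletion:]
    if deletion == "":
        return word
    found = word.endswith(deletion) if from_end else word.startswith(deletion)
    if not found:
        raise ValueError(f"{word!r} does not have {deletion!r} at that edge")
    return word[:len(word) - len(deletion)] if from_end else word[len(deletion):]


def apply_rule(nlw: str, rule: str):
    """Apply one rule such as 'plural:=y>ies'; return (attribute, new form)."""
    attribute, value = rule.split(":=", 1)
    tokens = [t for t in TOKEN_RE.findall(value) if t.strip()]
    op_index = next((i for i, t in enumerate(tokens) if t in OPERATORS), None)
    if op_index is None:
        raise ValueError(f"no operator in rule: {rule!r}")
    op = tokens[op_index]
    left = _operand("".join(tokens[:op_index]))
    right = _operand("".join(tokens[op_index + 1:]))
    if op == ":":    # replacement; an empty source replaces the whole NLW
        source, target = str(left), str(right)
        form = target if source == "" else nlw.replace(source, target, 1)
    elif op == ">":  # right appending: delete from the end, then append
        form = _strip_edge(nlw, left, from_end=True) + str(right)
    else:            # left appending: delete from the start, then prepend
        form = str(right) + _strip_edge(nlw, left, from_end=False)
    return attribute.strip(), form


print(apply_rule("foot", 'plural:="oo":"ee"'))  # ('plural', 'feet')
print(apply_rule("dress", 'not:=<"un"'))        # ('not', 'undress')
print(apply_rule("baby", "plural:=y>ies"))      # ('plural', 'babies')
```

A numeric deletion works the same way: <code>apply_rule("baby", 'pl:=1>"ies"')</code> removes one character from the end before appending "ies".
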
== Examples of dictionary entries ==

[a]{}  ""  (pos=DFA) <en,0,0>;<br>
[book]{} "book(icl>thing)"  (pos=N) <en,0,0>;<br>
[buy]{} "buy(icl>do)" (pos=DTV, PP=04) <en,0,0>;<br>
[book] {} "book(icl>document)" (pos=NOU) <en,0,0>;<br>
[foot] {} "foot(icl>vertebrate foot)" (pos=NOU, pl:="feet") <en,0,0>;<br>
[baby] {} "baby(icl>child)" (pos=NOU, pl:="y">"ies") <en,0,0>;<br>
[baby] {} "baby(icl>child)" (pos=NOU, pl:=1>"ies") <en,0,0>;<br>

Latest revision as of 21:24, 17 April 2009
