Dictionary

From UNL Wiki
Revision as of 12:53, 25 October 2009

The UNL-NL dictionaries are bilingual dictionaries linking UWs to natural language (NL) words. They can be unidirectional (UNL-to-NL or NL-to-UNL) or bidirectional (NL-to-UNL-to-NL). UNL-to-NL dictionaries are used for deconversion, while NL-to-UNL dictionaries are used for enconversion. In what follows, we present the current specifications for UNL-NL dictionaries. They are not mandatory, but they must be followed by anyone who wants to use the UNL Centre's and UNDL Foundation's tools. Features marked with an * are supported only by the UNDL Foundation's tools.


General syntax

In the UNL System, the UNL-NL dictionaries are plain text files with a single entry per line in the following format:

[NLW]  {ID}  “UW”  (ATTR , ... )  < LG , FRE , PRI >; COMMENTS

Where:

NLW
The lexical item of the natural language. Its format should be decided by the dictionary builder. It can be:
  • a multiword expression: [United States of America]
  • a compound: [hot-dog]
  • a simple word: [happiness]
  • a simple morpheme: [happ]
  • a non-motivated linguistic entity: [g]
  • a complex structure (see below)*: [[bring] [back]]
  • a regular expression*: [colou{0,1}r]
ID
The unique identifier (primary-key) of the entry.
UW
The Universal Word of UNL. This field can be empty if a word does not need a UW. It can also be a regular expression.
ATTR
The list of features of the NLW. It can be:
  • a list of simple features: NOU, MCL, SNG
  • a list of attribute-value pairs*: pos=NOU, gen=MCL, num=SNG
  • a list of inflection rules (see below)*: PLR:=”oo”:”ee”

Attributes should be separated by “,”.

LG
The two-character language code according to ISO 639-1.
FRE
The frequency of the NLW in natural texts. Used for natural language analysis (NL-UNL). It can range from 0 (least frequent) to 255 (most frequent).
PRI
The priority of the NLW. Used for natural language generation (UNL-NL). It can range from 0 to 255.
COMMENT
Any comment necessary to clarify the mapping between NL and UNL entries. It runs to the end of the line and is terminated by the return code.

The features marked with * are not supported by the UNL Centre's tools.
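
As a rough illustration of the entry format described above, the following Python sketch splits one dictionary line into its fields. It is not part of the UNL tools: the names ENTRY_RE, split_features and parse_entry are hypothetical, and the sketch assumes straight ASCII double quotes and the field order < LG , FRE , PRI >.

```python
import re

# Hypothetical pattern for: [NLW] {ID} "UW" (ATTR, ...) <LG, FRE, PRI>; COMMENTS
ENTRY_RE = re.compile(
    r'^\s*\[(?P<nlw>.*?)\]\s*'      # NLW, up to the "]" that precedes "{"
    r'\{(?P<id>\d*)\}\s*'           # ID (may be empty)
    r'"(?P<uw>[^"]*)"\s*'           # UW (may be empty)
    r'\((?P<attrs>.*)\)\s*'         # raw feature list
    r'<\s*(?P<lg>[a-z]{2})\s*,\s*(?P<fre>\d+)\s*,\s*(?P<pri>\d+)\s*>\s*;'
    r'(?P<comment>.*)$'             # everything after ";" is a comment
)


def split_features(attrs: str) -> list[str]:
    """Split a feature list on commas that are outside quotes and brackets."""
    parts, current, depth, in_quote = [], [], 0, False
    for ch in attrs:
        if ch == '"':
            in_quote = not in_quote
        elif not in_quote:
            if ch in "([":
                depth += 1
            elif ch in ")]":
                depth -= 1
            elif ch == "," and depth == 0:
                parts.append("".join(current).strip())
                current = []
                continue
        current.append(ch)
    tail = "".join(current).strip()
    if tail:
        parts.append(tail)
    return parts


def parse_entry(line: str) -> dict:
    """Return the fields of one dictionary entry as a dict."""
    m = ENTRY_RE.match(line)
    if m is None:
        raise ValueError(f"not a dictionary entry: {line!r}")
    return {
        "nlw": m["nlw"],
        "id": int(m["id"]) if m["id"] else None,
        "uw": m["uw"],
        "features": split_features(m["attrs"]),
        "lg": m["lg"],
        "fre": int(m["fre"]),
        "pri": int(m["pri"]),
        "comment": m["comment"].strip(),
    }
```

For a complex NLW such as [[bring] [back]], the "nlw" field keeps the inner brackets, so the sub-NLWs can be separated in a later step.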

Formal syntax

<dictionary entry> ::= <NLW><ID><UW><FEATURE LIST>”<”<LANG>”,”<FRE>”,”<PRI>”>;”
<NLW>::= “[”(<SIMPLE NLW>|<COMPOUND NLW>|<RESERVED NLW>)”]”
<SIMPLE NLW> ::= <text>
<COMPOUND NLW> ::= (“[”<text>”]”)+
<RESERVED NLW> ::= “RegEx”
<ID> ::= “{”<number>”}”
<UW> ::= “””<text>”””
<FEATURE LIST> ::= “(”<FEATURE> (”,”<FEATURE>)*”)”
<FEATURE> ::= (<VALUE>|<ATTRIBUTE>”=”<VALUE>|<RULE LIST>|”#”<SUBNLWID><FEATURE LIST>)
<SUBNLWID> ::= <number> 
<RULE LIST> ::= <RULE>(”;”<RULE>)*
<RULE> ::= <ATTRIBUTE>”(”<VALUE>”:=”<ACTION LIST>”)”
<ATTRIBUTE> ::= <text>
<VALUE> ::= <text>
<ACTION LIST> ::= <ACTION>(”,”<ACTION>)*
<ACTION> ::= <PREFIXATION>|<SUFFIXATION>|<REPLACEMENT>
<PREFIXATION> ::= (”””<text>””<”<number>)|(”””<text>””<””<text>”””)
<SUFFIXATION> ::= (<number>”>””<text>”””)|(”””<text>””>””<text>”””)
<REPLACEMENT> ::= (”””<text>””:””<text>”””)|(“[”<number>”-”<number>”]:””<text>”””)
<LANG> ::= [a..z][a..z]
<PRI> ::= [0..255]
<FRE> ::= [0..255]

Where:
+ = 1 or more times
* = 0 or more times
| = or (alternatives)

Complex structures as NLW*

In order to deal with multiword expressions, the NLW can be represented as a complex structure comprising several sub-NLW entries. The syntax for complex NLWs is:

[[sub-NLW][sub-NLW]...[sub-NLW]]  {ID}  “UW”  (ATTR , ..., 1#(ATTR, ...), 2#(ATTR, ...), ...)  < LG , FRE , PRI >; COMMENTS

Where:
[sub-NLW] is a part of the NLW;
1#(ATTR, ...) are the specific features for the first sub-NLW to appear in the NLW;
2#(ATTR, ...) are the specific features for the second sub-NLW to appear in the NLW;
and so on.
The first sub-NLW to appear in an NLW will always be #1, the second #2, and so on.
The feature list preceded by <number># will apply only to the corresponding sub-NLW.
The features outside the sub-NLW feature lists are shared by all sub-NLWs.

Example
[[bring] [back]] {} "to bring back(icl>to bring)" (pos=VER, 01#(ET0:=3>"ought"), 02#(pos=PRE)) <en, 0, 0>;
In the entry above, the NLW has been split into two different sub-NLWs ([bring] and [back], with a blank space in between). Each of these sub-NLWs has different features, given in the embedded parentheses inside the feature list. The sub-NLW [bring], which appears first, has the feature ET0:=3>"ought" (the last three characters are deleted and "ought" is appended, as in bring>brought), while the sub-NLW [back], which appears second, has the feature pos=PRE. The feature pos=VER, which is outside the specific feature lists, is shared by both of them.
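
The way shared and specific features combine can be sketched in Python as follows. This is only an illustration under assumptions: the function name expand_complex_nlw is hypothetical, the feature list is expected to be already split into top-level items, and the specific lists are written in the <number>#(...) form used in this section.

```python
import re

# Matches a sub-NLW feature list such as 01#(pos=PRE)
SUB_FEATURE_RE = re.compile(r'^(\d+)#\((.*)\)$')


def expand_complex_nlw(nlw: str, features: list[str]) -> list[tuple[str, list[str]]]:
    """Pair each sub-NLW with the shared features plus its own features."""
    # Innermost bracketed items are the sub-NLWs, in order of appearance.
    subs = re.findall(r'\[([^\[\]]*)\]', nlw)
    shared = [f for f in features if not SUB_FEATURE_RE.match(f)]
    specific: dict[int, list[str]] = {}
    for f in features:
        m = SUB_FEATURE_RE.match(f)
        if m:
            # Naive comma split: assumes no commas inside the values themselves.
            specific[int(m.group(1))] = [
                p.strip() for p in m.group(2).split(",") if p.strip()
            ]
    # Sub-NLWs are numbered from 1 in order of appearance.
    return [(sub, shared + specific.get(i, [])) for i, sub in enumerate(subs, start=1)]
```

Called with '[[bring] [back]]' and the three features of the example entry, the function pairs "bring" with pos=VER and the ET0 rule, and "back" with pos=VER and pos=PRE.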

Inflection rules for dictionary entries*

In order to deal with exceptions and irregular forms, the following rules can be included inside dictionary entries (in the feature list field):

Replacement
<ATTRIBUTE>”:=”<SOURCE>”:”<TARGET> or
<ATTRIBUTE>”:=["<INTERVAL>"]:”<TARGET>
Example: plural:="oo":"ee" (it means that, in case of the feature "plural", the "oo" string will be replaced by "ee" in the NLW, as in foot>feet)
Example: plural:=[2-3]:"ee" (it means that, in case of the feature "plural", the string "ee" will replace the string that goes from the second to the third character)
Prefixation (left appending)
<ATTRIBUTE> ”:=” <LEFT ADDITION> ”<” <LEFT DELETION>
Example: not:="un"<0 (it means that, in case of the feature "not", the string "un" will be added to the left of the NLW, and nothing will be deleted, as in dress>undress)
Suffixation (right appending)
<ATTRIBUTE> ”:=” <RIGHT DELETION> ”>” <RIGHT ADDITION>
Example: plural:="y">"ies" (it means that, in case of the feature "plural", the rightmost "y" will be deleted and the "ies" string will be added to the right of the NLW, as in baby>babies)

Where:
<ATTRIBUTE> is the name of the attribute
<SOURCE> is the original form to be replaced (if empty, it means that the whole NLW should be replaced)
<TARGET> is the form to be used instead of the source (if empty, it means that the whole NLW should be deleted)
<LEFT DELETION> is the string or the number of characters from the beginning of the NLW to be deleted before the addition of the LEFT ADDITION
<RIGHT DELETION> is the string or the number of characters from the end of the NLW to be deleted before the addition of the RIGHT ADDITION
<LEFT ADDITION> is the string to be added to the beginning of the NLW
<RIGHT ADDITION> is the string to be added to the end of the NLW
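
The three kinds of rule can be illustrated with a short Python sketch that applies the right-hand side of a rule (the part after ":=") to an NLW. It is not the implementation used by the UNL tools. The name apply_action is hypothetical, and several points are assumptions: interval positions are taken as 1-based and inclusive, only the first occurrence of a source string is replaced, and a deletion string that is not present in the NLW deletes nothing.

```python
import re

REPLACE_STR = re.compile(r'^"([^"]*)":"([^"]*)"$')          # "oo":"ee"
REPLACE_RANGE = re.compile(r'^\[(\d+)-(\d+)\]:"([^"]*)"$')  # [2-3]:"ee"
PREFIX = re.compile(r'^"([^"]*)"<(?:(\d+)|"([^"]*)")$')     # "un"<0
SUFFIX = re.compile(r'^(?:(\d+)|"([^"]*)")>"([^"]*)"$')     # "y">"ies" or 1>"ies"


def apply_action(word: str, action: str) -> str:
    """Apply one replacement, prefixation or suffixation action to word."""
    action = action.strip()
    if m := REPLACE_STR.match(action):
        source, target = m.groups()
        # An empty source stands for the whole NLW.
        return target if source == "" else word.replace(source, target, 1)
    if m := REPLACE_RANGE.match(action):
        start, end, target = int(m.group(1)), int(m.group(2)), m.group(3)
        # Assumption: positions are 1-based and inclusive.
        return word[:start - 1] + target + word[end:]
    if m := PREFIX.match(action):
        addition, count, text = m.groups()
        if count is not None:
            stem = word[int(count):]          # delete N leading characters
        else:
            stem = word[len(text):] if word.startswith(text) else word
        return addition + stem
    if m := SUFFIX.match(action):
        count, text, addition = m.groups()
        if count is not None:
            stem = word[:max(0, len(word) - int(count))]  # delete N trailing characters
        else:
            stem = word[:len(word) - len(text)] if word.endswith(text) else word
        return stem + addition
    raise ValueError(f"unrecognised action: {action!r}")
```

With the examples of this section, apply_action("foot", '"oo":"ee"') gives "feet", apply_action("dress", '"un"<0') gives "undress", and apply_action("baby", '"y">"ies"') gives "babies".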

Examples of dictionary entries

[a]{} "" (pos=DFA) <en,0,0>;
[book]{} "book(icl>thing)" (pos=N) <en,0,0>;
[buy]{} "buy(icl>do)" (pos=DTV, PP=04) <en,0,0>;
[book] {} "book(icl>document)" (pos=NOU) <en,0,0>;
[foot] {} "foot(icl>vertebrate foot)" (pos=NOU, pl:="feet") <en,0,0>;
[baby] {} "baby(icl>child)" (pos=NOU, pl:="y">"ies") <en,0,0>;
[baby] {} "baby(icl>child)" (pos=NOU, pl:=1>"ies") <en,0,0>;

Software