UNL-NL Dictionaries
From UNL Wiki
The UNL-NL dictionaries are bilingual dictionaries linking Universal Words (UWs) to natural language (NL) words. They can be unidirectional (UNL-to-NL or NL-to-UNL) or bidirectional (NL-to-UNL-to-NL). UNL-to-NL dictionaries are used for deconversion (generating natural language from UNL), while NL-to-UNL dictionaries are used for enconversion (analysing natural language into UNL).
Syntax
In the UNL System, the UNL-NL dictionaries are plain text files with a single entry per line in the following format:
[NLW] {ID} "UW" (ATTR, ...) <FLG, FRE, PRI>; COMMENTS
Where:
- NLW
- The lexical item of the natural language. Its format should be decided by the dictionary builder. It can be:
- a multiword expression: [United States of America]
- a compound: [hot-dog]
- a simple word: [happiness]
- a simple morpheme: [happ]
- a complex structure*: [[bring] [back]]
- a non-motivated linguistic entity: [g]
* complex structures are used for infixation, as in “Bring him back”
- ID
- The unique identifier (primary key) of the entry.
- UW
- The Universal Word of UNL. This field can be empty if a word does not need a UW.
- ATTR
- The list of features of the NLW. It can be:
- a list of simple features: (NOU, MCL, SNG)
- a list of attribute-value pairs: (pos=NOU, gen=MCL, num=SNG)
- a list of attributes and transformation rules: (plural:="oo":"ee")
Attributes should be separated by commas (",").
- FLG
- The two-character language code according to ISO 639-1.
- FRE
- The frequency of the NLW in natural texts. Used for natural language analysis (NL-UNL). It can range from 0 (least frequent) to 255 (most frequent).
- PRI
- The priority of the NLW. Used for natural language generation (UNL-NL). It can range from 0 to 255.
- COMMENTS
- Any comment necessary to clarify the mapping between NL and UNL entries. It should end with a line break, which also terminates the entry.
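The entry format described above can be illustrated with a small parser. The following is only a minimal sketch in Python, not part of the UNL System: the names `Entry` and `parse_entry` are hypothetical, the sample entries (including the UW and the ID values) are invented for illustration, and the sketch assumes that the dictionary file uses plain ASCII double quotes around the UW and that attribute values contain no commas.

```python
import re
from dataclasses import dataclass

# One dictionary line: [NLW] {ID} "UW" (ATTR, ...) <language, FRE, PRI>; COMMENTS
# Assumption: straight ASCII quotes delimit the UW field.
ENTRY_RE = re.compile(
    r'^\s*\[(?P<nlw>.*?)\]\s*'    # NLW; may itself contain brackets, e.g. [[bring] [back]]
    r'\{(?P<id>[^}]*)\}\s*'       # ID
    r'"(?P<uw>[^"]*)"\s*'         # UW; may be empty
    r'\((?P<attrs>.*?)\)\s*'      # comma-separated attribute list
    r'<\s*(?P<flg>[^,>]*?)\s*,\s*(?P<fre>\d+)\s*,\s*(?P<pri>\d+)\s*>\s*;'
    r'(?P<comment>.*)$'           # everything after ";" is the comment
)


@dataclass
class Entry:
    nlw: str
    entry_id: str
    uw: str
    attrs: list
    flg: str
    fre: int
    pri: int
    comment: str


def parse_entry(line: str) -> Entry:
    """Parse one dictionary line; raise ValueError if it is malformed."""
    match = ENTRY_RE.match(line.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"not a valid dictionary entry: {line!r}")
    fre = int(match.group("fre"))
    pri = int(match.group("pri"))
    # FRE and PRI are both documented as ranging from 0 to 255.
    if not (0 <= fre <= 255 and 0 <= pri <= 255):
        raise ValueError("FRE and PRI must be in the range 0-255")
    # Attributes are kept as raw strings (simple features, pairs, or rules).
    attrs = [a.strip() for a in match.group("attrs").split(",") if a.strip()]
    return Entry(
        nlw=match.group("nlw"),
        entry_id=match.group("id").strip(),
        uw=match.group("uw"),
        attrs=attrs,
        flg=match.group("flg"),
        fre=fre,
        pri=pri,
        comment=match.group("comment").strip(),
    )


if __name__ == "__main__":
    # Invented sample entries, for illustration only.
    print(parse_entry('[happiness] {101} "happiness(icl>feeling)" (NOU, SNG) <en, 120, 5>; sample entry'))
    print(parse_entry('[[bring] [back]] {7} "" (VER) <en, 0, 255>;'))
```

The attributes are deliberately left as uninterpreted strings, since the same field may hold simple features, attribute-value pairs or transformation rules; a real tool would probably need a more careful tokenizer for rules whose values contain commas.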