|
|
(16 intermediate revisions by one user not shown) |
Line 1: |
Line 1: |
| The '''UNL-NL dictionaries''' are bilingual dictionaries linking [[Universal Words|UWs]] to natural language (NL) words. They can be unidirectional (UNL-to-NL or NL-to-UNL) or bidirectional (NL-to-UNL-to-NL). UNL-to-NL dictionaries are used for [[deconversion]], while NL-to-UNL are used for [[enconversion]]. | | The '''UNL-NL dictionaries''' are bilingual dictionaries linking [[Universal Words|UWs]] to natural language (NL) words. They can be unidirectional (UNL-to-NL or NL-to-UNL) or bidirectional (NL-to-UNL-to-NL). UNL-to-NL dictionaries are used for [[deconversion]], while NL-to-UNL are used for [[enconversion]]. |
− |
| |
− | == Syntax ==
| |
− |
| |
− | In the [[UNL System]], the UNL-NL dictionaries are plain text files with a single entry per line in the following format:
| |
− |
| |
− | [NLW] {ID} "UW" (ATTR , ... ) < FLG , FRE , PRI >; COMMENTS
| |
− |
| |
− | Where:
| |
− |
| |
− | ;NLW
| |
− | :The lexical item of the natural language. Its format should be decided by the dictionary builder. It can be:
| |
− | ::*a multiword expression: [United States of America]
| |
− | ::*a compound: [hot-dog]
| |
− | ::*a simple word: [happiness]
| |
− | ::*a simple morpheme: [happ]
| |
− | ::*a complex structure*: [[bring] [back]]
| |
− | ::*a non-motivated linguistic entity: [g]
| |
− | <nowiki>*</nowiki> complex structures are used for infixation, as in “Bring him back”
| |
− |
| |
− | ;ID
| |
− | :The unique identifier (primary key) of the entry.
| |
− |
| |
− | ;UW
| |
− | :The Universal Word of UNL. This field can be empty if a word does not need a UW.
| |
− |
| |
− | ;ATTR
| |
− | :The list of features of the NLW. It can be:
| |
− | ::*a list of simple features: (NOU, MCL, SNG)
| |
− | ::*a list of attribute-value pairs: (pos=NOU, gen=MCL, num=SNG)
| |
− | ::*a list of attribute and transformation rules (see below): (plural:="oo":"ee")
| |
− | Attributes should be separated by “,”.
| |
− |
| |
− | ;FLG
| |
− | :The two-character language code according to ISO 639-1.
| |
− |
| |
− | ;FRE
| |
− | :The frequency of the NLW in natural texts. Used for natural language analysis (NL-UNL). It can range from 0 (least frequent) to 255 (most frequent).
| |
− |
| |
− | ;PRI
| |
− | :The priority of the NLW. Used for natural language generation (UNL-NL). It can range from 0 to 255.
| |
− |
| |
− | ;COMMENTS
| |
− | :Any comment necessary to clarify the mapping between NL and UNL entries. It must end with a line break (carriage return), which also terminates the entry.
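
The entry format above can be read field by field. The following is a minimal Python sketch, not the UNL System's own parser: the names <code>Entry</code>, <code>read_nlw</code> and <code>parse_entry</code> are hypothetical, the sample entry (ID, UW and features) is invented for illustration, and the sketch assumes straight double quotes around the UW and no quotation marks inside it.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Entry:
    nlw: str          # lexical item, without the outer brackets
    entry_id: str     # ID field
    uw: str           # Universal Word (may be empty)
    attrs: List[str]  # features, attribute-value pairs or rules
    flg: str          # language code
    fre: int          # frequency, 0..255
    pri: int          # priority, 0..255
    comment: str


def read_nlw(line: str) -> Tuple[str, int]:
    """Return the NLW and the index just past its closing bracket.

    Brackets are counted so that complex structures such as
    [[bring] [back]] are read as a single NLW.
    """
    if not line.startswith("["):
        raise ValueError("entry must start with [NLW]")
    depth = 0
    for i, ch in enumerate(line):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                return line[1:i], i + 1
    raise ValueError("unbalanced brackets in NLW")


# Everything after the NLW: {ID} "UW" (ATTR, ...) <FLG, FRE, PRI>; COMMENTS
REST = re.compile(
    r'\s*\{(?P<id>[^}]*)\}'
    r'\s*"(?P<uw>[^"]*)"'
    r'\s*\((?P<attrs>.*)\)'
    r'\s*<\s*(?P<flg>[^,>]*?)\s*,\s*(?P<fre>\d+)\s*,\s*(?P<pri>\d+)\s*>'
    r'\s*;(?P<comment>.*)'
)


def parse_entry(line: str) -> Entry:
    line = line.strip()
    nlw, pos = read_nlw(line)
    m = REST.fullmatch(line, pos)
    if m is None:
        raise ValueError("malformed entry: " + line)
    fre, pri = int(m["fre"]), int(m["pri"])
    if fre > 255 or pri > 255:
        raise ValueError("FRE and PRI must be in the range 0..255")
    # Split on commas, but keep commas inside "quoted" strings together.
    attrs = [a.strip() for a in re.findall(r'(?:"[^"]*"|[^,])+', m["attrs"])]
    attrs = [a for a in attrs if a]
    return Entry(nlw, m["id"].strip(), m["uw"], attrs,
                 m["flg"], fre, pri, m["comment"].strip())


# Invented sample entry, for illustration only.
sample = '[happiness] {42} "happiness(icl>emotion)" (NOU, SNG) <en, 120, 1>; sample entry'
entry = parse_entry(sample)
print(entry.nlw, entry.attrs, entry.flg, entry.fre, entry.pri)
```

Counting brackets rather than using a single regular expression for the NLW is a design choice made here so that nested items are not cut at the first closing bracket.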
| |
− |
| |
− | == Transformation rules for dictionary entries ==
| |
− |
| |
− | In order to deal with exceptions, infixation and irregular forms, the following rules can be included inside dictionary entries:
| |
− |
| |
− | In case of simple transformation:<br >
| |
− | <ATTRIBUTE>":="<SOURCE>":"<TARGET><br >
| |
− |
| |
− | In case of left appending:<br >
| |
− | <ATTRIBUTE>":="<LEFT DELETION>"<"<LEFT ADDITION><br >
| |
− |
| |
− | In case of right appending:<br >
| |
− | <ATTRIBUTE>":="<RIGHT ADDITION>">"<RIGHT DELETION><br >
| |
− |
| |
− | Where:<br >
| |
− | <ATTRIBUTE> is the name of the attribute<br >
| |
− | <SOURCE> is the original form to be replaced (if empty, it means that the whole NLW should be replaced)<br >
| |
− | <TARGET> is the form to be used instead of the source (if empty, it means that the whole NLW should be deleted)<br >
| |
− | <LEFT DELETION> is the string or the number of characters from the beginning of the NLW to be deleted before the addition of the LEFT ADDITION<br >
| |
− | <RIGHT DELETION> is the string or the number of characters from the end of the NLW to be deleted before the addition of the RIGHT ADDITION<br >
| |
− | <LEFT ADDITION> is the string to be added to the beginning of the NLW<br >
| |
− | <RIGHT ADDITION> is the string to be added to the end of the NLW<br >
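
The effect of the three rule types on an NLW can be sketched in Python as follows. This is an illustrative reading of the definitions above, not the UNL System's implementation: the function names are hypothetical, the rule components are passed as already-parsed arguments rather than read from the rule string, only the first occurrence of SOURCE is replaced (an assumption, since the text does not say), and the empty-TARGET case is not given any special treatment.

```python
from typing import Union


def simple_transformation(nlw: str, source: str, target: str) -> str:
    """ATTRIBUTE:=SOURCE:TARGET"""
    if source == "":
        # Empty SOURCE: the whole NLW is replaced by TARGET.
        return target
    if source not in nlw:
        raise ValueError(f"{source!r} not found in {nlw!r}")
    # Assumption: only the first occurrence is replaced.
    return nlw.replace(source, target, 1)


def left_appending(nlw: str, deletion: Union[str, int], addition: str) -> str:
    """Delete from the beginning of the NLW, then add to the beginning."""
    if isinstance(deletion, int):
        stem = nlw[deletion:]            # delete a number of characters
    else:
        if not nlw.startswith(deletion):
            raise ValueError(f"{nlw!r} does not start with {deletion!r}")
        stem = nlw[len(deletion):]       # delete a given string
    return addition + stem


def right_appending(nlw: str, addition: str, deletion: Union[str, int]) -> str:
    """Delete from the end of the NLW, then add to the end."""
    if isinstance(deletion, int):
        stem = nlw[:len(nlw) - deletion]
    else:
        if not nlw.endswith(deletion):
            raise ValueError(f"{nlw!r} does not end with {deletion!r}")
        stem = nlw[:len(nlw) - len(deletion)]
    return stem + addition


print(simple_transformation("foot", "oo", "ee"))  # feet
print(left_appending("happy", 0, "un"))           # unhappy
print(right_appending("city", "ies", "y"))        # cities
print(right_appending("box", "es", 0))            # boxes
```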
| |