Language settings

From UNL Wiki
Latest revision as of 13:37, 2 November 2009

Language settings are used to define the general parameters of a given language, such as word order, syntactic agreement and case marking. They can also be used to describe grammatical redundancy (and therefore to avoid proliferating rules) or to indicate how an absent (i.e., a non-grammaticalized) category should be translated.


When to use language settings

Language settings should be defined in three cases:

  • to set phonotactic, morphotactic and syntactic parameters of the language;
  • to state grammatical redundancy and to avoid proliferating rules; and
  • to ensure cross-linguistic mapping.

General parameters

In English, the grammatical category of person is, in almost all cases, marked by the same morpheme: zero. Instead of repeating this information inside every verb paradigm, we can state it once as a general language setting:

1PS:=0>""; (= if FIRST PERSON OF SINGULAR, then ADD NOTHING) 
2PS:=0>""; (= if SECOND PERSON OF SINGULAR, then ADD NOTHING) 
1PP:=0>""; (= if FIRST PERSON OF PLURAL, then ADD NOTHING) 
2PP:=0>""; (= if SECOND PERSON OF PLURAL, then ADD NOTHING) 
3PP:=0>""; (= if THIRD PERSON OF PLURAL, then ADD NOTHING) 

In this case, only the exceptions (such as the third person singular, 3PS, and the verb "to be") need to be treated inside the verb paradigm.
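The zero-morpheme settings can be illustrated with a small Python sketch. It assumes that an action such as `0>""` means "delete 0 characters from the end of the word form, then append the empty string", and that an exception stated in a verb paradigm takes precedence over the general setting. `PERSON_SETTINGS`, `right_append` and `inflect_person` are hypothetical names; this is not the UNL grammar engine.

```python
from __future__ import annotations

# General person settings (hypothetical encoding of 1PS:=0>""; etc.):
# person -> (characters to delete from the end, string to append).
PERSON_SETTINGS = {
    "1PS": (0, ""),
    "2PS": (0, ""),
    "1PP": (0, ""),
    "2PP": (0, ""),
    "3PP": (0, ""),
}


def right_append(form: str, delete: int, add: str) -> str:
    """Remove `delete` characters from the end of `form`, then append `add`."""
    stem = form[: max(len(form) - delete, 0)]
    return stem + add


def inflect_person(form: str, person: str,
                   paradigm_exceptions: dict[str, tuple[int, str]]) -> str:
    """Apply the paradigm's own rule if it has one, else the general setting.

    A person covered by neither (e.g. 3PS with no exception) raises KeyError.
    """
    if person in paradigm_exceptions:
        delete, add = paradigm_exceptions[person]
    else:
        delete, add = PERSON_SETTINGS[person]
    return right_append(form, delete, add)
```

For example, with the hypothetical exception `{"3PS": (0, "s")}` for "work", `inflect_person("work", "3PS", ...)` returns "works", while `inflect_person("work", "2PP", ...)` falls back to the zero setting and returns "work".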

The same applies to the present progressive tense, which is always formed by the periphrasis TO BE + GERUND. Instead of encoding this periphrasis inside each verb paradigm, we can create a general rule that applies in all cases:

ET1&PGS&1PS:=IP("am":VP(GER)); 
ET1&PGS&2PS:=IP("are":VP(GER));
ET1&PGS&3PS:=IP("is":VP(GER)); 
ET1&PGS&1PP:=IP("are":VP(GER));
ET1&PGS&2PP:=IP("are":VP(GER)); 
ET1&PGS&3PP:=IP("are":VP(GER)); 
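These periphrastic rules can be pictured as choosing an auxiliary from the person feature and placing it before the gerund supplied by the verb paradigm. The Python sketch below assumes that features arrive as a set of tags and that the two words are joined by a single space; `PROGRESSIVE_AUX` and `present_progressive` are hypothetical names, not part of the UNL tools.

```python
from typing import Optional, Set

# Auxiliary selected by each ET1&PGS&<person> setting above.
PROGRESSIVE_AUX = {
    "1PS": "am",
    "2PS": "are",
    "3PS": "is",
    "1PP": "are",
    "2PP": "are",
    "3PP": "are",
}


def present_progressive(features: Set[str], gerund: str) -> Optional[str]:
    """Return '<aux> <gerund>' when the features include ET1 and PGS.

    The gerund is taken as given (the GER form from the verb paradigm).
    Returns None when the setting does not apply.
    """
    if not {"ET1", "PGS"} <= features:
        return None
    for person, aux in PROGRESSIVE_AUX.items():
        if person in features:
            return f"{aux} {gerund}"
    return None
```

Because the auxiliary table is shared by every verb, adding a new verb only requires its gerund form, which is the point of stating the rule once as a language setting.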

Grammatical redundancy

In English, the values of the grammatical category of mood are largely conflated: there is no clear morphological distinction between the indicative, the subjunctive, the conditional and the other possible values of the attribute. This can be represented by general rules:

SUB:=IND; (= the SUBJUNCTIVE is equal to the INDICATIVE)
CON:=IND; (= the CONDITIONAL is equal to the INDICATIVE)
IMP:=IND; (= the IMPERATIVE is equal to the INDICATIVE)

In this case, the indicative (IND) is the only form that needs to be defined in the verb paradigm.
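One way to picture these redundancy settings is as a normalization step applied to the requested features before the paradigm is consulted, so that a paradigm storing only indicative forms also answers requests for the other moods. In the Python sketch below, the paradigm fragment, the tense tag `PST` and the function names are hypothetical illustrations.

```python
from typing import FrozenSet, Iterable, Optional

# Mood settings above: each value is rewritten as the indicative.
MOOD_SETTINGS = {"SUB": "IND", "CON": "IND", "IMP": "IND"}

# Hypothetical paradigm fragment: only indicative forms are stored.
# "PST" is an illustrative tense tag, not taken from the text.
PARADIGM = {
    frozenset({"IND", "PST"}): "worked",
}


def normalize(features: Iterable[str]) -> FrozenSet[str]:
    """Apply the redundancy settings before the paradigm lookup."""
    return frozenset(MOOD_SETTINGS.get(f, f) for f in features)


def lookup(features: Iterable[str]) -> Optional[str]:
    """Find the stored form for the normalized feature set, if any."""
    return PARADIGM.get(normalize(features))
```

With this arrangement, a request for `{"SUB", "PST"}` is rewritten to `{"IND", "PST"}` and resolves to the single stored form.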

Translation

In English, the grammatical category of number has only two values: singular and plural. Several other languages, however, also distinguish values such as dual, trial and quadrual. The English grammar must specify what to do in those cases in order to ensure full intertranslatability.

The language settings may indicate that:

DUA:=NS(PLR;<,"a couple of"); (if DUAL, the string "a couple of" should be generated as the specifier of the noun phrase (NS), whose head would assume the value of PLURAL) 
TRI:=PLR; (if TRIAL, the word will assume the value of PLURAL) 
QDR:=PLR; (if QUADRUAL, the word will assume the value of PLURAL)
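The number settings can be read as a table from each non-English value to an English number value plus an optional specifier. The Python sketch below assumes that the noun's singular and plural forms are already available and simply prefixes the specifier to the head; it does not model the NS phrase structure of the actual rule, and `NUMBER_SETTINGS` and `realize_number` are hypothetical names.

```python
from typing import Dict, Optional, Tuple

# number value -> (English number value, specifier generated before the head)
NUMBER_SETTINGS: Dict[str, Tuple[str, Optional[str]]] = {
    "DUA": ("PLR", "a couple of"),
    "TRI": ("PLR", None),
    "QDR": ("PLR", None),
}


def realize_number(number: str, singular: str, plural: str) -> str:
    """Map a source number value onto English and build the noun phrase."""
    # Values with no setting (e.g. PLR itself) are passed through unchanged.
    target, specifier = NUMBER_SETTINGS.get(number, (number, None))
    head = plural if target == "PLR" else singular
    return f"{specifier} {head}" if specifier else head
```

For instance, `realize_number("DUA", "book", "books")` returns "a couple of books", and `realize_number("TRI", "book", "books")` returns "books".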

Syntax

The syntax of language setting rules depends on the action to be performed:

  • Morphological rules (i.e., those involving prefixation, suffixation or infixation) must comply with the M-rule formalism
  • Syntactic rules (i.e., those involving the insertion of words) must comply with the S-Rule formalism
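The rules shown on this page share a common surface form: a condition made of dictionary features joined by `&`, the operator `:=`, an action, and a terminating `;`, optionally followed by a comment. The Python sketch below splits a rule along those lines, skipping semicolons inside parentheses or quotes (as in `NS(PLR;<,"a couple of")`). The `rule_kind` function, which guesses from the shape of the action whether the M-rule or the S-rule formalism applies, is a heuristic assumed for illustration and not part of either formalism; it also labels bare feature-to-feature rules such as `SUB:=IND` separately.

```python
import re
from typing import List, NamedTuple


class Setting(NamedTuple):
    condition: List[str]  # e.g. ["ET1", "PGS", "1PS"]
    action: str           # right-hand side, kept as text


def parse_setting(line: str) -> Setting:
    """Split '<features>:=<action>;' and drop any trailing comment."""
    lhs, sep, rest = line.partition(":=")
    if not sep:
        raise ValueError(f"missing ':=' in {line!r}")
    depth, in_quotes = 0, False
    for i, ch in enumerate(rest):
        if ch == '"':
            in_quotes = not in_quotes
        elif in_quotes:
            continue  # ignore everything inside a quoted string
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == ";" and depth == 0:
            return Setting(lhs.strip().split("&"), rest[:i].strip())
    raise ValueError(f"missing terminating ';' in {line!r}")


def rule_kind(action: str) -> str:
    """Heuristic guess at the rule type from the shape of the action."""
    if re.fullmatch(r"[A-Z0-9]+", action):
        return "feature mapping"         # e.g. SUB:=IND
    if re.match(r"[A-Z]+\(", action):
        return "syntactic (S-rule)"      # e.g. IP("am":VP(GER))
    return "morphological (M-rule)"      # e.g. 0>""
```

Applied to `DUA:=NS(PLR;<,"a couple of");`, the sketch keeps the inner semicolon and returns `NS(PLR;<,"a couple of")` as the action, which `rule_kind` then classifies as syntactic.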