S-rule
Revision as of 13:01, 26 March 2010
S-rule (syntactic rule) is the formalism used for describing syntactic structures and syntactic operations in the UNLarium framework.
When to use S-rules
S-rules are used for:
- composition, i.e., creating compounds out of the base forms (such as "take">"take into account");
- periphrasis, i.e., generating analytic grammatical structures (such as "love">"will love");
- subcategorization, i.e., defining the number and the type of arguments of a given base form;
- case marking, i.e., defining the grammatical cases of the arguments of a given base form;
- agreement, i.e., concord between different parts of a phrase;
- distribution, i.e., defining the precedence of word forms;
- adjacency, i.e., defining the distance between syntactic branches;
- projection, i.e., projecting syntactic structures out of the constituents; and
- movement, i.e., moving nodes and branches between different places in the syntactic structure.
When not to use S-rules
S-rules are not used for affixation (prefixation, infixation, suffixation) or spelling changes, which must be addressed by A-rules and Ph-rules, respectively.
Types of S-rules
There are four types of S-rules:
- Change
<CONDITION> := <RELATION>;
- Change the attributes of the constituents of the relation. The relation itself is not affected. Features are added through "+" and deleted through "-".
- VA(%head;%adjt):=VA(%head,+C;%adjt,-D); (add the feature C to the head and remove the feature D from the adjunct)
- Create
<CONDITION> := +<RELATION>;
- Create a new relation. Nodes to be created must be defined as strings (between quotes) or lemmas (between brackets), if not co-indexed to an existing node.
- VA(%head;%adjt):=+VC(%head;"c"); (add the relation VC between the head and "c", which is created.)
- Delete
<CONDITION> := -<RELATION>;
- Delete a relation between the head and the argument. The head and the argument are not deleted.
- VA(%head;%adjt):=-VA(%head;%adjt); (delete the relation VA between the head and its adjunct. The nodes are not deleted)
- Replace
<RELATION> := <RELATION>;
- Replace the relation in the left side by the one in the right side
- VA(%head;%any):=VC(%head;%any); (replace the relation VA by VC)
- Two special cases of replacement are
- Merge
- <RELATION><RELATION> := <RELATION>;
- Replace the relations in the left side by the one in the right side.
- VA(%head;%adjt)VC(%head;%comp):=VB(VB(%head;%adjt);%comp); (VA and VC are deleted, and VB is created)
- Divide
- <RELATION> := <RELATION><RELATION>;
- Replace the relation in the left side by those in the right side.
- VA(%head;%adjt):=VC(%head;%x)VC(%head;%y); (VA is deleted, and the two VCs are created)
Where:
- <CONDITION> (to be repeated 0 or more times) may be a tag or a <RELATION> that defines when the rule is applied. It may be empty in general cases (i.e., if the rule is always applied).
- <RELATION> (to be repeated 1 or more times) is a syntactic relation containing the <HEAD>, in case of head-only relations (VH, NH, JH, PH, IH, CH, AH, DH), or the <HEAD> and <ARGUMENT> (i.e., complement, adjunct or specifier), in case of binary relations (VA, VC, VS, VB, NA, NC, NS, etc).
- <HEAD> and <ARGUMENT> may be expressed as
- a "string" (strings come between quotes);
- a [lemma] (lemmas come between square brackets);
- a feature or a set of features, separated by comma, and extracted from the UNDLF Tagset;
- an index;
- an action, to be performed by adding features (through "+"), deleting features (through "-"), or through the right side of an A-rule (i.e., prefixation, suffixation, infixation); or
- a <RELATION> itself (i.e., rules may be recursive).
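The four rule types can be told apart from the surface form of a rule. The Python sketch below is a hypothetical helper (`classify` and `top_level_relations` are illustrative names, not part of any UNL tool). It assumes that a right side starting with "+" marks a create rule, a right side starting with "-" marks a delete rule, identical relation names on both sides mark a change rule, and anything else is a replacement. It only recognizes relations written with parentheses.

```python
def top_level_relations(side):
    """Return the names of the outermost relations in one side of a rule."""
    names, current, depth, in_quote = [], "", 0, False
    for ch in side:
        if ch == '"':
            in_quote = not in_quote      # skip over quoted strings
        elif in_quote:
            continue
        elif ch == "(":
            if depth == 0:
                names.append(current)    # a name directly before "(" at depth 0
            depth += 1
            current = ""
        elif ch == ")":
            depth -= 1
            current = ""
        elif depth == 0 and ch.isalnum():
            current += ch
    return names


def classify(rule):
    """Guess the S-rule type (assumption: type is decided by surface form)."""
    body = rule.strip().rstrip(";")
    left, sep, right = body.partition(":=")
    if not sep:                          # empty <CONDITION>, e.g. "+VA("a");"
        left, right = "", body
    right = right.strip()
    if right.startswith("+"):
        return "create"
    if right.startswith("-"):
        return "delete"
    lhs, rhs = top_level_relations(left), top_level_relations(right)
    if not lhs or lhs == rhs:
        return "change"
    if len(lhs) > 1 and len(rhs) == 1:
        return "replace (merge)"
    if len(lhs) == 1 and len(rhs) > 1:
        return "replace (divide)"
    return "replace"


print(classify('VA(%head;%adjt):=+VC(%head;"c");'))   # create
print(classify('VA(%head;%any):=VC(%head;%any);'))    # replace
```

The distinction between change and replace is drawn here only by comparing relation names, which is a simplification made for the sketch.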
Observations
- The <CONDITION> field may be empty in change, create and delete rules, in case of unconditional change, creation or deletion. It is obligatory in replace rules.
- VA(+C); (add the feature C to all adjuncts to the head in the verbal phrase)
- +VA("a"); (add an adjunct "a" to the head of the verbal phrase, whatever the case)
- -VA(C); (delete all adjuncts to the head of the verbal phrase that have the feature C)
- The <HEAD> and the <ARGUMENT> may be empty in case of no change. Empty heads are automatically extended
- Binary relations (?A, ?S, ?C)
- VA; (neither head nor argument: the relation is automatically extended to "VA(;);" )
- VA(); (same as above)
- VA(;); (same as above)
- VA("a"); (argument only: the relation is automatically extended to "VA(;"a");" )
- VA("a";); (head only)
- VA("a";"b"); (head and argument)
- Unary relations (?H)
- VH; (no head: the relation is automatically extended to "VH();" )
- VH(); (same as above)
- VH("a"); (head)
- Relations are always juxtaposed (they must not be separated by ",")
- VS("b")VC("c")VA("d"); (correct)
- VS("b"),VC("c"),VA("d"); (incorrect: relations are separated by ",")
- Order is not important between relations, but essential between constituents of the same relation
- VS("b")VC("c")VA("d") = VC("c")VA("d")VS("b") = VA("d")VC("c")VS("b")
- VA("a";"b"); ≠ VA("b";"a");
- Arguments of relations may be expressed by A-rules, but only in the right side of rules
- VA(0>"a"); (the verbal adjuncts, if any, receive an "a" as suffix)
- Rules are conservative. Features will be preserved unless explicitly deleted (through "-")
- VC(%comp,ACC):=VC(%comp,NOM); (is the same as "VC(%comp,ACC):=VC(%comp,+NOM);" i.e., add the feature "NOM" to the complements of verb that have the feature "ACC"; the feature "ACC" will be preserved and not replaced by "NOM")
- VC(%comp,ACC):=VC(%comp,NOM,-ACC); (add the feature "NOM" and delete the feature "ACC" from the complements of the verb that have the feature "ACC")
- Features and strings should not be repeated in the right side except in case of deletion or change. Indexes may be repeated for clarity.
- VC(%comp,ACC):=VC(%comp,+NOM); (the feature "ACC" should not be repeated in the right side of the rule)
- VC(%comp,"a"):=VC(%comp,+NOM); (the string "a" should not be repeated in the right side of the rule)
- A node may have as many features as necessary, but one single string or lemma
- VC(%comp,"a"):=VC(%comp,"b"); ("a" is replaced by "b")
- Strings are represented between quotes if invariable, or between brackets if variable (lemmas)
- VA("into account"); (the string "into account" is an adjunct to the verb)
- IH([be]); (the lemma "be" is the head of inflectional phrase)
- Negation
- "^" is used for negation, and may be applied over features, strings or relations:
- VA(^NOU); (if the adjunct does not have the feature "NOU")
- VA(^"a"); (if the adjunct is not the string "a")
- ^VA("a"); (if there is no VA relation between the head and "a")
- S-rules always end in ";"
- VA("a"); (correct)
- VA("a") (incorrect: the final ";" is missing)
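The automatic extension of abbreviated relations described in the observations above can be sketched in Python. The function `expand` is a hypothetical illustration, not part of any UNL tool. It handles a single relation at a time, and it assumes that a relation is unary exactly when its name ends in "H" (VH, NH, IH, etc.) and binary otherwise.

```python
def has_separator(content):
    """True if content holds a ';' outside quotes and nested parentheses."""
    depth, in_quote = 0, False
    for ch in content:
        if ch == '"':
            in_quote = not in_quote
        elif not in_quote:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch == ";" and depth == 0:
                return True
    return False


def expand(relation):
    """Expand an abbreviated relation to its full form."""
    if not relation.endswith(";"):
        raise ValueError("S-rules always end in ';'")
    body = relation[:-1]
    name, paren, rest = body.partition("(")
    if paren and not rest.endswith(")"):
        raise ValueError("unbalanced parentheses")
    content = rest[:-1] if paren else ""
    if name.endswith("H"):               # assumption: ?H relations are unary
        return f"{name}({content});"
    if not has_separator(content):       # a lone node is read as the argument
        content = ";" + content
    return f"{name}({content});"


print(expand('VA;'))        # VA(;);
print(expand('VA("a");'))   # VA(;"a");
print(expand('VA("a";);'))  # VA("a";);
print(expand('VH;'))        # VH();
```

A rule without the final ";" is rejected with a `ValueError`, following the observation that S-rules always end in ";".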
Indexes
- Nodes are always indexed in S-rules
- Indexes (%) are used for indexing nodes, attributes and values inside and between the left (condition) and the right side of rules.
- X(%a;%b)Y(%a;%c); (the head of X is also the head of Y)
- Indexes as variables
- Indexes are features and may be used as variables
- X(%a;%b)Y(%a;%c):=Z(%b;%c); (if the head of the relation X is the head of the relation Y, delete X and Y and create Z between the arguments of X and Y)
- X(%a,A;%b,B):=X(%a;%b,+C,-B); (add the feature C to the argument of X and remove the feature B from it if the head of X has the feature A)
- If omitted, indexes are assigned by default, according to the position
- X(A;B)Y(C;D)Z(E;F); is the same as X(A,%01;B,%02)Y(C,%03;D,%04)Z(E,%05;F,%06);
- X(A;B):=X(;+C,-B); is the same as X(A,%01;B,%02):=X(%01;+C,-B,%02);
- X(A;B):=X(+C,-B); is the same as X(A,%01;B,%02):=X(%01;+C,-B,%02); (same as above: the relation is automatically extended if the head is empty)
- However
- X(A;B)Y(A;C):=Z(B;C); is different from X(%a;%b)Y(%a;%c):=Z(%b;%c);
- X(A;B)Y(A;C):=Z(B;C); is the same as X(A,%01;B,%02)Y(A,%03;C,%04):=Z(B,%01;C,%02); while
- X(A,%a;B,%b)Y(A,%a;C,%c):=Z(B,%b;C,%c); is the same as X(A,%01;B,%02)Y(A,%01;C,%04):=Z(B,%02;C,%04);
- In the first case, the feature B is added to the head of X and the feature C is added to its argument; the relation Y is deleted. In the second case, the feature C is added to the argument of Y, and Z is made between the arguments of X and Y.
- If omitted, right side indexes are automatically co-indexed with the left side ones
- X(;):=Y(;); is the same as X(%01;%02):=Y(%01;%02);
- Right side indexes must be explicitly defined if the order is to be altered
- X(;):=Y(%02;%01);
- Indexes can be replaced by user-defined labels made of any sequence of alphabetic characters and underscore
- X(A,%a;B,%b)Y(C,%c;D,%d)Z(E,%e;F,%f)
- %01 = A, %02 = B, %03 = C, %04 = D, %05 = E, %06 = F and
- %a = A, %b = B, %c = C, %d = D, %e = E, %f = F
- Numeric characters cannot be used as user-defined indexes
- X(A,%03;B,%05)
- %01 = A, %02 = B (there is no %03 nor %05)
- To avoid ambiguities, users are strongly recommended to replace default values by customized labels
- X(A,%a;B,%b)
- instead of simply X(A;B) or X(A,%01;B,%02)
- In case of sub-nodes, the parent node must be indicated using the syntax <PARENT NODE><CHILD NODE>, where <PARENT NODE> may be, itself, a sub-node
- X(Y(A;B);C)
- %01 = Y(A;B), %02 = C, %01%01 = A, %01%02 = B
- X(Y(Z(A;B);C);D)
- %01 = Y(Z(A;B);C), %02 = D, %01%01 = Z(A;B), %01%02 = C, %01%01%01 = A, %01%01%02 = B
- Indexation is not affected by repetition
- X(A;B)Y(A;C)Z(A;D)
- %01 = A, %02 = B, %03 = A, %04 = C, %05 = A, %06 = D (and %01 = %03 = %05)
- Empty nodes are also indexed
- X(;)
- %01 = first node of X, %02 = second node of X
- Indexes may be used both in the left and in the right side of rules
- X(%a;%b):=Y(%b;%a); (the first node of the X relation becomes the second node of the Y relation)
- X(%a;)Y(%a;):=Z(%a); (if the first node of the X relation is the first node of the Y relation then make it the single node of a Z relation)
- Indexes may also be used to transfer attribute values expressed in the format ATTRIBUTE=VALUE
- X(A,%a,ATT1=VAL1;B,%b):=X(%a;%b,ATT1=%a); (the value "VAL1" of "ATT1" of %a is copied to the node %b)
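The default, position-based indexation described above (including the <PARENT NODE><CHILD NODE> convention for sub-nodes) can be reproduced with a short Python sketch. `Rel`, `render` and `default_indexes` are hypothetical names introduced for illustration. The sketch assumes that relations are already parsed into nested objects, that top-level nodes are numbered consecutively across relations, and that numbering restarts at 01 inside each sub-node, as in the examples above.

```python
from dataclasses import dataclass


@dataclass
class Rel:
    name: str
    nodes: list  # each item is a feature string or a nested Rel


def render(node):
    """Write a node back in S-rule notation."""
    if isinstance(node, Rel):
        return f"{node.name}({';'.join(render(n) for n in node.nodes)})"
    return node


def default_indexes(relations):
    """Map each default index (%01, %02, %01%01, ...) to the node it names."""
    table = {}

    def visit(nodes, prefix, start):
        for offset, node in enumerate(nodes):
            index = f"{prefix}%{start + offset:02d}"
            table[index] = render(node)
            if isinstance(node, Rel):
                visit(node.nodes, index, 1)   # children restart at 01

    counter = 1
    for rel in relations:
        visit(rel.nodes, "", counter)
        counter += len(rel.nodes)             # top-level numbering continues
    return table


# X(Y(A;B);C)
print(default_indexes([Rel("X", [Rel("Y", ["A", "B"]), "C"])]))
# {'%01': 'Y(A;B)', '%01%01': 'A', '%01%02': 'B', '%02': 'C'}
```

Empty nodes are represented as empty strings, so `Rel("X", ["", ""])` still yields the indexes %01 and %02.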
Examples
Examples of S-rules:
- composition
- VA("into account"); (add the string "into account" as the adjunct of the verb)
- subcategorization
- VC(PH("in")); (the complement of the verb is a prepositional phrase headed by the preposition "in")
- agreement
- VS(ANUM,APER); (the specifier of the verb assigns number (ANUM) and person (APER) to its head)
- case marking
- VS(NOM); (the specifier of the verb receives the nominative case (NOM))
- distribution
- VA(>>); (the adjunct of the verb comes at the right side of the verb after a blank space)
- adjacency
- VA(AJ2); (the adjunct of the verb integrates the second projection of the head)
- periphrasis
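- VH(%vh,FUT):=+IC([will];%vh,+INF); (if the head of the verbal phrase has the feature FUT, create the relation IC between the lemma "will" and that head, which receives the feature INF)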
- projection
- VS(%head;%spec)VB(%head;%comp):=VP(VB(%head;%comp);%spec); (integrate the two relations on the left side into a single relation)
Formal Syntax
S-rules comply with the following formal syntax:
<S-RULE> ::= <CONDITION> ":=" (<SYNTACTIC RELATION>)+";"
<CONDITION> ::= <TAG>(","<TAG>)* | (<SYNTACTIC RELATION>)*
<SYNTACTIC RELATION> ::= <HEAD-DRIVEN RELATION> "(" (<NODE>";")? <NODE> ")"
<HEAD-DRIVEN RELATION> ::= {one of the head-driven syntactic relations defined in the UNDLF Tagset}
<NODE> ::= <FEATURE>(","<FEATURE>)*
<FEATURE> ::= <ID>|<TAG>|"""<STRING>"""|"["<STRING>"]"|<DIRECTION>|<SYNTACTIC RELATION>|<ACTION>
<ID> ::= "%"[a-zA-Z_0-9]+
<TAG> ::= {one of the tags defined in the UNDLF Tagset}
<STRING> ::= [a..Z]+
<DIRECTION> ::= ">"|">>"|"<"|"<<"
<ACTION> ::= <PREFIXATION> | <SUFFIXATION> | <INFIXATION> | <REPLACEMENT> (cf. A-rule)
where
<a> = a is a non-terminal symbol
"a" = a is a constant
a | b = a or b
(a)? = a can be repeated 0 or 1 times
(a)* = a can be repeated 0 or more times
(a)+ = a can be repeated 1 or more times
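As an illustration of the formal syntax, the following Python sketch parses a sequence of juxtaposed syntactic relations into nested tuples. It is a minimal, hypothetical reader (`tokenize`, `parse_relation` and `parse_relations` are illustrative names), not the parser used by any UNL tool. It covers only indexes, tags, strings, lemmas, directions and nested relations; the "+", "-" and "^" operators and A-rule actions are left out, and empty nodes are accepted for convenience.

```python
import re

TOKEN = re.compile(r'''
    (?P<string>"[^"]*")              # "string"
  | (?P<lemma>\[[^\]]*\])            # [lemma]
  | (?P<id>%[A-Za-z_0-9]+)           # %index
  | (?P<dir>>>|<<|>|<)               # direction
  | (?P<name>[A-Za-z][A-Za-z0-9]*)   # tag or relation name
  | (?P<punct>[();,])
  | (?P<space>\s+)
''', re.VERBOSE)


def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "space":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def parse_relation(tokens, i):
    """Parse NAME "(" node (";" node)? ")" starting at tokens[i]."""
    kind, name = tokens[i]
    if kind != "name" or tokens[i + 1] != ("punct", "("):
        raise ValueError("expected a relation such as VA(...)")
    i += 2
    nodes = [[]]                      # each node is a list of features
    while tokens[i] != ("punct", ")"):
        kind, value = tokens[i]
        if kind == "end":
            raise ValueError("missing ')'")
        if (kind, value) == ("punct", ";"):
            nodes.append([])          # next node (head ; argument)
            i += 1
        elif (kind, value) == ("punct", ","):
            i += 1                    # next feature of the same node
        elif (kind, value) == ("punct", "("):
            raise ValueError("unexpected '('")
        elif kind == "name" and tokens[i + 1] == ("punct", "("):
            sub, i = parse_relation(tokens, i)   # relations may be recursive
            nodes[-1].append(sub)
        else:
            nodes[-1].append(value)
            i += 1
    if len(nodes) > 2:
        raise ValueError("a relation has at most two nodes")
    return (name, nodes), i + 1


def parse_relations(text):
    tokens = tokenize(text) + [("end", "")]
    relations, i = [], 0
    while tokens[i][0] != "end":
        rel, i = parse_relation(tokens, i)
        relations.append(rel)
    return relations


print(parse_relations('VC(PH("in"))'))
# [('VC', [[('PH', [['"in"']])]])]
```

Each relation is returned as a `(name, nodes)` pair, where every node is a list of features and a nested relation appears as a feature of its parent node.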