S-rule

From UNL Wiki

Revision as of 13:01, 26 March 2010

S-rule (syntactic rule) is the formalism used for describing syntactic structures and syntactic operations in the UNLarium framework.

When to use S-rules

S-rules are used for:

  • composition, i.e., creating compounds out of the base forms (such as "take">"take into account");
  • periphrasis, i.e., generating analytic grammatical structures (such as "love">"will love");
  • subcategorization, i.e., defining the number and the type of arguments of a given base form;
  • case marking, i.e., defining the grammatical cases of the arguments of a given base form;
  • agreement, i.e., concord between different parts of a phrase;
  • distribution, i.e., defining the precedence of word forms;
  • adjacency, i.e., defining the distance between syntactic branches;
  • projection, i.e., projecting syntactic structures out of the constituents; and
  • movement, i.e., for moving nodes and branches from different places in the syntactic structure.

When not to use S-rules

S-rules are not used for affixation (prefixation, infixation, suffixation) or spelling changes, which must be addressed by A-rules and Ph-rules, respectively.

Types of S-rules

There are four types of S-rules:

Change
<CONDITION> := <RELATION>;
Change the attributes of the constituents of the relation. The relation itself is not affected. Features are added through "+" and deleted through "-".
  • VA(%head;%adjt):=VA(%head,+C;%adjt,-D); (add the feature C to the head and remove the feature D from the adjunct)
Create
<CONDITION> := +<RELATION>;
Create a new relation. Nodes to be created must be defined as strings (between quotes) or lemmas (between brackets), if not co-indexed to an existing node.
  • VA(%head;%adjt):=+VC(%head;"c"); (add the relation VC between the head and "c", which is created.)
Delete
<CONDITION> := -<RELATION>;
Delete a relation between the head and the argument. The head and the argument are not deleted.
  • VA(%head;%adjt):=-VA(%head;%adjt); (delete the relation VA between the head and its arguments. The nodes are not deleted)
Replace
<RELATION> := <RELATION>;
Replace the relation in the left side by the one in the right side.
  • VA(%head;%any):=VC(%head;%any); (replace the relation VA by VC)
Two special cases of replacement are:
Merge
<RELATION><RELATION> := <RELATION>;
Replace the relations in the left side by the ones in the right side.
  • VA(%head;%adjt)VC(%head;%comp):=VB(VB(%head;%adjt);%comp); (VA and VC are deleted, and VB is created)
Divide
<RELATION> := <RELATION><RELATION>;
Replace the relation in the left side by those in the right side.
  • VA(%head;%adjt):=VC(%head;%x)VC(%head;%y); (VA is deleted, and the two VCs are created)

Where:

  • <CONDITION> (to be repeated 0 or more times) may be a tag or a <RELATION> that defines when the rule is applied. It may be empty in general cases (i.e., if the rule is always applied).
  • <RELATION> (to be repeated 1 or more times) is a syntactic relation containing the <HEAD>, in case of head-only relations (VH, NH, JH, PH, IH, CH, AH, DH), or the <HEAD> and <ARGUMENT> (i.e., complement, adjunct or specifier), in case of binary relations (VA, VC, VS, VB, NA, NC, NS, etc.).
  • <HEAD> and <ARGUMENT> may be expressed as
    • a "string" (strings come between quotes);
    • a [lemma] (lemmas come between square brackets);
    • a feature or a set of features, separated by comma, and extracted from the UNDLF Tagset;
    • an index;
    • an action, to be performed by adding features (through "+"), deleting features (through "-"), or through the right side of an A-rule (i.e., prefixation, suffixation, infixation); or
    • a <RELATION> itself (i.e., rules may be recursive).
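
The rule shapes listed under "Types of S-rules" can be told apart from their surface form. The following Python sketch is only an illustration and is not part of the UNLarium framework: the function names (split_rule, top_level_names, classify) are hypothetical. It also assumes that a rule whose two sides name the same relations is a change rule, which the templates above suggest but do not state.

```python
def split_rule(rule):
    """Split 'LEFT:=RIGHT;' into (left, right); rules without ':=' get an empty left side."""
    body = rule.strip()
    if not body.endswith(";"):
        raise ValueError("S-rules always end in ';'")
    left, sep, right = body[:-1].partition(":=")
    return (left, right) if sep else ("", left)


def top_level_names(side):
    """Return the names of the outermost relations, ignoring nested ones and quoted strings."""
    names, depth, in_quote, current = [], 0, False, ""
    for ch in side:
        if ch == '"':
            in_quote = not in_quote
        elif in_quote:
            continue
        elif ch == "(":
            if depth == 0:
                names.append(current.strip("+-^ "))
            depth += 1
            current = ""
        elif ch == ")":
            depth -= 1
            current = ""
        elif depth == 0:
            current += ch
    if current.strip():
        names.append(current.strip("+-^ "))  # relation written without parentheses
    return names


def classify(rule):
    """Guess the rule type from the shape of its right side (illustrative heuristic)."""
    left, right = split_rule(rule)
    if right.startswith("+"):
        return "create"
    if right.startswith("-"):
        return "delete"
    lnames, rnames = top_level_names(left), top_level_names(right)
    if not lnames or sorted(lnames) == sorted(rnames):
        return "change"  # assumption: same relation names on both sides
    if len(lnames) > 1 and len(rnames) == 1:
        return "merge"
    if len(lnames) == 1 and len(rnames) > 1:
        return "divide"
    return "replace"
```

Applied to the examples above, classify('VA(%head;%any):=VC(%head;%any);') yields "replace", while the merge and divide examples are reported as such.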

Observations

The <CONDITION> field may be empty in change, create and delete rules, in case of unconditional change, creation or deletion. It is obligatory in replace rules.
  • VA(+C); (add the feature C to all adjuncts to the head in the verbal phrase)
  • +VA("a"); (add an adjunct "a" to the head of the verbal phrase, whatever the case)
  • -VA(C); (delete all adjuncts to the head of the verbal phrase that have the feature C)
The <HEAD> and the <ARGUMENT> may be empty in case of no change. Empty heads are automatically extended.
Binary relations (?A, ?S, ?C)
  • VA; (no head nor argument: the relation is automatically extended to "VA(;);" )
  • VA(); (same as above)
  • VA(;); (same as above)
  • VA("a"); (argument only: the relation is automatically extended to "VA(;"a");" )
  • VA("a";); (head only)
  • VA("a";"b"); (head and argument)
Unary relations (?H)
  • VH; (no head: the relation is automatically extended to "VH();" )
  • VH(); (same as above)
  • VH("a"); (head)
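
The automatic extension described above can be sketched in Python. This is a minimal illustration, not the actual UNLarium implementation: extend_relation and has_top_level_semicolon are hypothetical names, and the sketch assumes that head-only (unary) relations are exactly those whose name ends in "H".

```python
def has_top_level_semicolon(text):
    """True if text contains a ';' outside quotes and outside nested parentheses."""
    depth, in_quote = 0, False
    for ch in text:
        if ch == '"':
            in_quote = not in_quote
        elif not in_quote:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch == ";" and depth == 0:
                return True
    return False


def extend_relation(rel):
    """Return the extended form of one relation, without the trailing ';'."""
    rel = rel.strip().rstrip(";")
    name, paren, inside = rel.partition("(")
    inside = inside[:-1] if paren else ""  # drop the closing ")"
    if name.endswith("H"):                 # assumption: unary relations end in H
        return f"{name}({inside})"
    if not has_top_level_semicolon(inside):
        inside = ";" + inside              # a lone node is read as the argument
    return f"{name}({inside})"
```

For instance, extend_relation('VA("a");') returns 'VA(;"a")', and extend_relation("VH;") returns "VH()".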
Relations are always juxtaposed (they must not be separated by ",")
VS("b")VC("c")VA("d"); (correct)
VS("b"),VC("c"),VA("d"); (incorrect: the relations are separated by ",")
Order is not important between relations, but essential between constituents of the same relation
VS("b")VC("c")VA("d") = VC("c")VA("d")VS("b") = VA("d")VC("c")VS("b")
VA("a";"b"); is different from VA("b";"a");
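
The point about ordering can be illustrated with a small Python sketch that compares two sequences of juxtaposed relations as unordered collections while keeping the order of the nodes inside each relation. The names split_relations and same_relations are hypothetical, and the sketch assumes every relation is written with its parentheses.

```python
from collections import Counter


def split_relations(text):
    """Split juxtaposed relations such as 'VS("b")VC("c")' at parenthesis depth 0."""
    rels, depth, in_quote, start = [], 0, False, 0
    for i, ch in enumerate(text):
        if ch == '"':
            in_quote = not in_quote
        elif not in_quote:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:
                    rels.append(text[start:i + 1])
                    start = i + 1
    return rels


def same_relations(a, b):
    """True if a and b contain the same relations, regardless of their order."""
    return Counter(split_relations(a)) == Counter(split_relations(b))
```

With these definitions, same_relations('VS("b")VC("c")VA("d")', 'VA("d")VC("c")VS("b")') holds, whereas same_relations('VA("a";"b")', 'VA("b";"a")') does not, because the constituents are compared in order.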
Arguments of relations may be expressed by A-rules, but only in the right side of rules
VA(0>"a"); (the verbal adjuncts, if any, receive an "a" as suffix)
Rules are conservative. Features will be preserved unless explicitly deleted (through "-")
VC(%comp,ACC):=VC(%comp,NOM); (is the same as "VC(%comp,ACC):=VC(%comp,+NOM);" i.e., add the feature "NOM" to the complements of verb that have the feature "ACC"; the feature "ACC" will be preserved and not replaced by "NOM")
VC(%comp,ACC):=VC(%comp,NOM,-ACC); (add the feature "NOM" and delete the feature "ACC" from the complements of the verb that have the feature "ACC")
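
The conservative behaviour can be modelled as set operations on the features of a node. The Python sketch below is an illustration only: apply_actions is a hypothetical helper, and it treats a bare feature on the right side as equivalent to "+feature", as stated above.

```python
def apply_actions(features, actions):
    """Apply right-side actions to a node's features; nothing is removed unless prefixed with '-'."""
    result = set(features)
    for action in actions:
        if action.startswith("-"):
            result.discard(action[1:])      # explicit deletion
        else:
            result.add(action.lstrip("+"))  # "NOM" and "+NOM" both add NOM
    return result
```

Starting from a complement with the feature ACC, apply_actions({"ACC"}, ["NOM"]) keeps ACC and adds NOM, while apply_actions({"ACC"}, ["NOM", "-ACC"]) leaves only NOM.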
Features and strings should not be repeated in the right side except in case of deletion or change. Indexes may be repeated for clarity.
VC(%comp,ACC):=VC(%comp,+NOM); (the feature "ACC" should not be repeated in the right side of the rule)
VC(%comp,"a"):=VC(%comp,+NOM); (the string "a" should not be repeated in the right side of the rule)
A node may have as many features as necessary, but one single string or lemma
VC(%comp,"a"):=VC(%comp,"b"); ("a" is replaced by "b")
Strings are represented between quotes if invariable, or between brackets if variable (lemmas)
VA("into account"); (the string "into account" is an adjunct to the verb)
IH([be]); (the lemma "be" is the head of inflectional phrase)
Negation
"^" is used for negation, and may be applied over features, strings or relations:
  • VA(^NOU); (if the adjunct does not have the feature "NOU")
  • VA(^"a"); (if the adjunct is not the string "a")
  • ^VA("a"); (if there is no VA relation between the head and "a")
S-rules always end in ";"
  • VA("a"); (correct)
  • VA("a") (incorrect: the final ";" is missing)

Indexes

Nodes are always indexed in S-rules
Indexes (%) are used for indexing nodes, attributes and values inside and between the left (condition) and the right side of rules.
  • X(%a;%b)Y(%a;%c); (the head of X is also the head of Y)
Indexes as variables
Indexes are features and may be used as variables
  • X(%a;%b)Y(%a;%c):=Z(%b;%c); (if the head of the relation X is the head of the relation Y, delete X and Y and create Z between the arguments of X and Y)
  • X(%a,A;%b,B):=X(%a;%b,+C,-B); (add the feature C to the argument of X and remove the feature B from it if the head of X has the feature A)
If omitted, indexes are assigned by default, according to the position
  • X(A;B)Y(C;D)Z(E;F); is the same as X(A,%01;B,%02)Y(C,%03;D,%04)Z(E,%05;F,%06);
  • X(A;B):=X(;+C,-B); is the same as X(A,%01;B,%02):=X(%01;+C,-B,%02);
  • X(A;B):=X(+C,-B); is the same as X(A,%01;B,%02):=X(%01;+C,-B,%02); (same as above: the relation is automatically extended if the head is empty)
However
  • X(A;B)Y(A;C):=Z(B;C); is different from X(%a;%b)Y(%a;%c):=Z(%b;%c);
    • X(A;B)Y(A;C):=Z(B;C); is the same as X(A,%01;B,%02)Y(A,%03;C,%04):=Z(B,%01;C,%02); while
    • X(A,%a;B,%b)Y(A,%a;C,%c):=Z(B,%b;C,%c); is the same as X(A,%01;B,%02)Y(A,%01;C,%04):=Z(B,%02;C,%04);
In the first case, the feature B is added to the head of X and the feature C is added to its argument; the relation Y is deleted. In the second case, the feature C is added to the argument of Y, and Z is made between the arguments of X and Y.
If omitted, right side indexes are automatically co-indexed with the left side ones
  • X(;):=Y(;); is the same as X(%01;%02):=Y(%01;%02);
Right side indexes are to be explicitly defined if order is to be altered
  • X(;):=Y(%02;%01);
Indexes can be replaced by user-defined labels made of any sequence of alphabetic characters and underscore
X(A,%a;B,%b)Y(C,%c;D,%d)Z(E,%e;F,%f)
%01 = A, %02 = B, %03 = C, %04 = D, %05 = E, %06 = F and
%a = A, %b = B, %c = C, %d = D, %e = E, %f = F
Numeric characters cannot be used as user-defined indexes
X(A,%03;B,%05)
%01 = A, %02 = B (there is no %03 nor %05)
To avoid ambiguities, users are strongly advised to replace default values by customized labels
  • X(A,%a;B,%b)
instead of simply X(A;B) or X(A,%01;B,%02)
In case of sub-nodes, the parent node must be indicated through the syntax <PARENT NODE><CHILD NODE>, where <PARENT NODE> may be, itself, a sub-node
X(Y(A;B);C)
%01 = Y(A;B), %02 = C, %01%01 = A, %01%02 = B
X(Y(Z(A;B);C);D)
%01 = Y(Z(A;B);C), %02 = D, %01%01 = Z(A;B), %01%02 = C, %01%01%01 = A, %01%01%02 = B
Indexation is not affected by repetition
X(A;B)Y(A;C)Z(A;D)
%01 = A, %02 = B, %03 = A, %04 = C, %05 = A, %06 = D (and %01 = %03 = %05)
Empty nodes are also indexed
X(;)
%01 = first node of X, %02 = second node of X
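
The default indexation described above (positional numbering across juxtaposed relations, parent-prefixed numbering for sub-nodes, and indexing of empty nodes) can be reproduced with the following Python sketch. It is an illustration under stated assumptions rather than the framework's own algorithm: the names split_top, relations, is_relation and index_nodes are hypothetical, user-defined labels are ignored, and a node counts as a sub-relation only if it consists of a single relation.

```python
def split_top(text, sep):
    """Split text on sep, ignoring separators inside quotes or parentheses."""
    parts, depth, in_quote, current = [], 0, False, ""
    for ch in text:
        if ch == '"':
            in_quote = not in_quote
        elif not in_quote:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch == sep and depth == 0:
                parts.append(current)
                current = ""
                continue
        current += ch
    parts.append(current)
    return parts


def relations(text):
    """Split juxtaposed relations at parenthesis depth 0."""
    rels, depth, in_quote, start = [], 0, False, 0
    for i, ch in enumerate(text):
        if ch == '"':
            in_quote = not in_quote
        elif not in_quote:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:
                    rels.append(text[start:i + 1])
                    start = i + 1
    return rels


def is_relation(node):
    """Assumption: a node is a sub-relation if it looks like NAME(...)."""
    return node.endswith(")") and "(" in node and not node.startswith('"')


def index_nodes(text):
    """Map default indexes (%01, %02, %01%01, ...) to the node text they designate."""
    table = {}

    def visit(rel, prefix, start):
        inside = rel[rel.index("(") + 1:-1]
        n = start
        for node in split_top(inside, ";"):
            n += 1
            key = f"{prefix}%{n:02d}"
            table[key] = node
            if is_relation(node):
                visit(node, key, 0)  # numbering restarts inside a sub-node
        return n

    counter = 0
    for rel in relations(text.rstrip(";")):
        counter = visit(rel, "", counter)  # numbering continues across relations
    return table
```

For example, index_nodes("X(Y(A;B);C)") maps %01 to Y(A;B), %02 to C, %01%01 to A and %01%02 to B, matching the listing above.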
Indexes may be used both in the left and in the right side of rules
X(%a;%b):=Y(%b;%a); (the first node of the X relation becomes the second node of the Y relation)
X(%a;)Y(%a;):=Z(%a); (if the first node of the X relation is the first node of the Y relation then make it the single node of a Z relation)
Indexes may also be used to transfer attribute values expressed in the format ATTRIBUTE=VALUE
X(A,%a,ATT1=VAL1;B,%b):=X(%a;%b,ATT1=%a); (the value "VAL1" of "ATT1" of %a is copied to the node %b)

Examples

Examples of S-rules:

  • composition
    • VA("into account"); (add the string "into account" as the adjunct of the verb)
  • subcategorization
    • VC(PH("in")); (the complement of the verb is a prepositional phrase headed by the preposition "in")
  • agreement
    • VS(ANUM,APER); (the specifier of the verb assigns number (ANUM) and person (APER) to its head)
  • case marking
    • VS(NOM); (the specifier of the verb receives the nominative case (NOM))
  • distribution
    • VA(>>); (the adjunct of the verb comes at the right side of the verb after a blank space)
  • adjacency
    • VA(AJ2); (the adjunct of the verb integrates the second projection of the head)
  • periphrasis
    • VH(%vh,FUT):=+IC([will];%vh,+INF);
  • projection
    • VS(%head;%spec)VB(%head;%comp):=VP(VB(%head;%comp);%spec); (integrate the two relations on the left side into a single relation)

Formal Syntax

S-rules comply with the following formal syntax:

<S-RULE>                ::= <CONDITION> ":=" (<SYNTACTIC RELATION>)+";"
<CONDITION>             ::= <TAG>(","<TAG>)* | (<SYNTACTIC RELATION>)*
<SYNTACTIC RELATION>    ::= <HEAD-DRIVEN RELATION> "(" (<NODE>";")? <NODE> ")"
<HEAD-DRIVEN RELATION>  ::= {one of the head-driven syntactic relations defined in the UNDLF Tagset} 
<NODE>                  ::= <FEATURE>(","<FEATURE>)* 
<FEATURE>               ::= <ID>|<TAG>|"""<STRING>"""|"["<STRING>"]"|<DIRECTION>|<SYNTACTIC RELATION>|<ACTION>
<ID>                    ::= "%"[a-zA-Z_0-9]+
<TAG>                   ::= {one of the tags defined in the UNDLF Tagset}
<STRING>                ::= [a..Z]+
<DIRECTION>             ::= ">"|">>"|"<"|"<<"
<ACTION>                ::= <PREFIXATION> | <SUFFIXATION> | <INFIXATION> | <REPLACEMENT> (cf. A-rule)

where
<a> = a is a non-terminal symbol
"a" = a is a constant
a | b = a or b
(a)? = a can occur 0 or 1 time
(a)* = a can be repeated 0 or more times
(a)+ = a can be repeated 1 or more times

Software