S-rule

From UNL Wiki
Revision as of 08:14, 26 March 2010 by Martins (Talk | contribs)

S-rule (syntactic rule) is the formalism used for describing syntactic structures and syntactic operations in the UNLarium framework.

When to use S-rules

S-rules are used for:

  • composition, i.e., creating compounds out of the base forms (such as "take">"take into account");
  • periphrasis, i.e., generating analytic grammatical structures (such as "love">"will love");
  • subcategorization, i.e., defining the number and the type of arguments of a given base form;
  • case marking, i.e., defining the grammatical cases of the arguments of a given base form;
  • agreement, i.e., concord between different parts of a phrase;
  • distribution, i.e., defining the precedence of word forms;
  • adjacency, i.e., defining the distance between syntactic branches; and
  • projection, i.e., projecting syntactic structures out of the constituents.

When not to use S-rules

S-rules are not used for affixation (prefixation, infixation, suffixation) or spelling changes, which must be addressed by A-rules and Ph-rules, respectively.

Types of S-rules

There are four types of S-rules:

Change
<CONDITION> := <RELATION>;
Change the attributes of the constituents of the relation. The relation itself is not affected. Features are added through "+" and deleted through "-".
  • VA(%head;%adjt):=VA(%head,+C;%adjt,-D); (add the feature C to the head and remove the feature D from the adjunct)
Create
<CONDITION> := +<RELATION>;
Create a new relation. Nodes to be created must be defined as strings (between quotes) or lemmas (between brackets), if not co-indexed to an existing node.
  • VA(%head;%adjt):=+VC(%head;"c"); (add the relation VC between the head and "c", which is created.)
Delete
<CONDITION> := -<RELATION>;
Delete a relation between the head and the argument. The head and the argument are not deleted.
  • VA(%head;%adjt):=-VA(%head;%adjt); (delete the relation VA between the head and its arguments. The nodes are not deleted)
Replace
<RELATION> := <RELATION>;
Replace the relation on the left side by the one on the right side.
  • VA(%head;%any):=VC(%head;%any); (replace the relation VA by VC)
Two special cases of replacement are:
Merge
<RELATION><RELATION> := <RELATION>;
Replace the relations on the left side by the one on the right side.
  • VA(%head;%adjt)VC(%head;%comp):=VB(VB(%head;%adjt);%comp); (VA and VC are deleted, and VB is created)
Divide
<RELATION> := <RELATION><RELATION>;
Replace the relation on the left side by those on the right side.
  • VA(%head;%adjt):=VC(%head;%x)VC(%head;%y); (VA is deleted, and the two VCs are created)

Where:

  • <CONDITION> (to be repeated 0 or more times) may be a tag or a <RELATION> that defines when the rule is applied. It may be empty in general cases (i.e., if the rule is always applied).
  • <RELATION> (to be repeated 1 or more times) is a syntactic relation containing the <HEAD>, in case of head-only relations (VH, NH, JH, PH, IH, CH, AH, DH), or the <HEAD> and <ARGUMENT> (i.e., complement, adjunct or specifier), in case of binary relations (VA, VC, VS, VB, NA, NC, NS, etc).
  • <HEAD> and <ARGUMENT> may be expressed as
    • a "string" (strings come between quotes);
    • a [lemma] (lemmas come between square brackets);
    • a feature or a set of features, separated by comma, and extracted from the UNDLF Tagset;
    • an index;
    • an action, to be performed by adding features (through "+"), deleting features (through "-"), or through the right side of an A-rule (i.e., prefixation, suffixation, infixation); or
    • a <RELATION> itself (i.e., rules may be recursive).
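The rule types described above can be told apart from the surface form of a rule. The following Python sketch is only an illustration, not part of the UNLarium framework: the function names `top_level_relations` and `classify` are hypothetical, relation names are assumed to be runs of uppercase letters, and a rule whose two sides name the same relation is assumed to be a change rule rather than a replace rule.

```python
import re

# Assumption: a relation name is a run of uppercase letters followed by "(".
_REL = re.compile(r"([A-Z]+)\(")


def top_level_relations(side: str) -> list[str]:
    """Return the names of the relations at nesting depth 0 of one rule side."""
    names, depth, i = [], 0, 0
    while i < len(side):
        m = _REL.match(side, i)
        if m:
            if depth == 0:
                names.append(m.group(1))
            depth += 1
            i = m.end()
            continue
        if side[i] == ")":
            depth -= 1
        i += 1
    return names


def classify(rule: str) -> str:
    """Guess the rule type (hypothetical helper, based on the templates above)."""
    left, sep, right = rule.strip().rstrip(";").partition(":=")
    if not sep:  # empty condition, e.g. +VA("a");
        left, right = "", left
    right = right.strip()
    if right.startswith("+"):
        return "create"
    if right.startswith("-"):
        return "delete"
    lhs, rhs = top_level_relations(left), top_level_relations(right)
    if len(lhs) > 1 and len(rhs) == 1:
        return "merge"
    if len(lhs) == 1 and len(rhs) > 1:
        return "divide"
    # Assumption: same relation name on both sides (or no condition) means "change".
    return "change" if not lhs or lhs == rhs else "replace"
```

For instance, `classify('VA(%head;%any):=VC(%head;%any);')` yields `"replace"`, while `classify('VA(%head;%adjt)VC(%head;%comp):=VB(VB(%head;%adjt);%comp);')` yields `"merge"`. Parentheses inside quoted strings are not handled by this sketch.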

Observations

The <CONDITION> field may be empty in change, create and delete rules, in case of unconditional change, creation or deletion. It is obligatory in replace rules.
  • VA(+C); (add the feature C to all adjuncts to the head in the verbal phrase)
  • +VA("a"); (add an adjunct "a" to the head of the verbal phrase, whatever the case)
  • -VA(C); (delete all adjuncts to the head of the verbal phrase that have the feature C)
The <HEAD> and the <ARGUMENT> may be empty in case of no change
Binary relations (?A, ?S, ?C)
  • VA; (neither head nor argument)
  • VA(); (same as above)
  • VA(;); (same as above)
  • VA("a"); (argument only)
  • VA("a";); (head only)
  • VA("a";"b"); (head and argument)
Unary relations (?H)
  • VH; (no head)
  • VH(); (no head)
  • VH("a"); (head)
Relations are always juxtaposed (they must not be separated by ",")
VS("b")VC("c")VA("d"); (correct)
VS("b"),VC("c"),VA("d"); (incorrect)
Order is not important between relations, but essential between constituents of the same relation
VS("b")VC("c")VA("d") = VC("c")VA("d")VS("b") = VA("d")VC("c")VS("b")
VA("a";"b"); ≠ VA("b";"a");
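One way to picture this observation is to model a rule side as an unordered bag of relations, each of which keeps its own constituents in order. The Python sketch below is an illustration under that assumption; the helper name `side` and the tuple encoding are hypothetical and not part of the formalism.

```python
from collections import Counter


def side(*relations):
    """Model one rule side as an unordered bag of relations.

    Each relation is a (name, constituents) pair; the constituents are an
    ordered tuple, so their order still matters.
    """
    return Counter(relations)


first = side(("VS", ("b",)), ("VC", ("c",)), ("VA", ("d",)))
second = side(("VC", ("c",)), ("VA", ("d",)), ("VS", ("b",)))

# Order between relations is irrelevant.
relations_equal = first == second

# Order between constituents of the same relation is essential.
constituents_equal = side(("VA", ("a", "b"))) == side(("VA", ("b", "a")))
```

Here `relations_equal` is `True` and `constituents_equal` is `False`, mirroring the two examples above.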
Arguments of relations may be expressed by A-rules, but only in the right side of rules
VA(0>"a"); (the verbal adjuncts, if any, receive an "a" as suffix)
Rules are conservative. Features will be preserved unless explicitly deleted (through "-")
VC(ACC):=VC(NOM); (is the same as "VC(ACC):=VC(+NOM);" i.e., add the feature "NOM" to the complements of verb that have the feature "ACC"; the feature "ACC" will be preserved and not replaced by "NOM")
VC(ACC):=VC(NOM,-ACC); (add the feature "NOM" and delete the feature "ACC" from the complements of the verb that have the feature "ACC")
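The conservative behaviour can be sketched as a set update in which existing features survive unless they are explicitly removed. This is a minimal Python illustration; the function name `apply_features` and the representation of a node as a set of feature names are assumptions made for the example.

```python
def apply_features(current: set[str], actions: list[str]) -> set[str]:
    """Apply "+F" / "F" (add) and "-F" (delete) actions to a feature set.

    Features that are not mentioned in the actions are preserved.
    """
    result = set(current)
    for action in actions:
        if action.startswith("-"):
            result.discard(action[1:])
        else:
            result.add(action.lstrip("+"))
    return result


# VC(ACC):=VC(NOM);      -> ACC is kept, NOM is added
kept = apply_features({"ACC"}, ["NOM"])
# VC(ACC):=VC(NOM,-ACC); -> ACC is removed, NOM is added
replaced = apply_features({"ACC"}, ["NOM", "-ACC"])
```

In this sketch `kept` is `{"ACC", "NOM"}` and `replaced` is `{"NOM"}`.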
Strings are represented between quotes if invariable, or between brackets if variable (lemmas)
VA("into account"); (add the invariable string "into account" as a verbal adjunct, take > take into account)
IH([be]); (add the lemma "be" to the head of inflectional phrase. The lemma may assume several forms: "am", "are", "is", and should be represented therefore between [brackets])
Negation
"^" is used for negation, and may be applied over features or relations:
  • VA(^NOU); (if the adjunct does not have the feature "NOU")
  • VA(^"a"); (if the adjunct is not the string "a")
  • ^VA("a"); (if there is no VA relation between the head and "a")
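The three uses of "^" can be sketched in Python as follows. The representation is an assumption made for illustration only: a node is a set of tokens (features and quoted strings), and the existing relations are a set of (name, head, argument) triples. The function names are hypothetical.

```python
def node_matches(node: set[str], condition: list[str]) -> bool:
    """True if the node satisfies every condition token; "^" negates a token."""
    for token in condition:
        negated = token.startswith("^")
        name = token[1:] if negated else token
        if (name in node) == negated:
            return False
    return True


def relation_absent(relations: set[tuple[str, str, str]],
                    name: str, head: str, arg: str) -> bool:
    """True if there is no such relation between head and arg (as in ^VA("a"))."""
    return (name, head, arg) not in relations
```

For example, `node_matches({"VER"}, ["^NOU"])` holds, whereas `node_matches({"NOU"}, ["^NOU"])` does not.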

Indexes

Indexes (%) are used for indexing nodes, attributes and values inside and between the left (condition) and the right side of rules.
X(%a;)Y(%a;)
the first node of X is also the first node of Y
If omitted, indexes are assigned by default, according to the position
X(A;B)Y(C;D)Z(E;F)
%01 = A, %02 = B, %03 = C, %04 = D, %05 = E, %06 = F
The right side indexes are coindexed with the left side ones
X(%01;%02):=Y(%02;%01); (the first node of X is the second node of Y - index explicitly assigned)
X(;):=Y(;); (the first node of X is the first node of Y - indexes assigned by default)
Indexes can be replaced by user-defined labels made of any sequence of alphabetic characters and underscore
X(A,%a;B,%b)Y(C,%c;D,%d)Z(E,%e;F,%f)
%01 = A, %02 = B, %03 = C, %04 = D, %05 = E, %06 = F and
%a = A, %b = B, %c = C, %d = D, %e = E, %f = F
Numeric characters cannot be used as user-defined indexes
X(A,%03;B,%05)
%01 = A, %02 = B (there is no %03 nor %05)
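The interplay between default indexes and user-defined labels can be sketched in Python. The sketch assumes that the nodes of all relations are given in order, each as a list of tokens; the function name `index_nodes` is hypothetical.

```python
def index_nodes(nodes: list[list[str]]) -> dict[str, list[str]]:
    """Map default indexes (%01, %02, ...) and user labels to node features.

    Labels are accepted only if they consist of alphabetic characters and
    underscores; numeric labels such as %03 are ignored.
    """
    table: dict[str, list[str]] = {}
    for position, tokens in enumerate(nodes, start=1):
        features = [t for t in tokens if not t.startswith("%")]
        table[f"%{position:02d}"] = features
        for token in tokens:
            label = token[1:]
            if (token.startswith("%") and label
                    and all(c.isalpha() or c == "_" for c in label)):
                table[token] = features
    return table


# X(A,%a;B,%b)
labelled = index_nodes([["A", "%a"], ["B", "%b"]])
# X(A,%03;B,%05): the numeric labels are not registered
numeric = index_nodes([["A", "%03"], ["B", "%05"]])
```

In this sketch `labelled` contains `%01`, `%02`, `%a` and `%b`, while `numeric` contains only `%01` and `%02`.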
To avoid ambiguities, users are strongly recommended to replace default values by customized labels
  • X(A,%a;B,%b)
instead of simply X(A;B) or X(A,%01;B,%02)
In case of sub-nodes, the parent node must be indicated by the syntax <PARENT NODE><CHILD NODE>, where <PARENT NODE> may be, itself, a sub-node
X(Y(A;B);C)
%01 = Y(A;B), %02 = C, %01%01 = A, %01%02 = B
X(Y(Z(A;B);C);D)
%01 = Y(Z(A;B);C), %02 = D, %01%01 = Z(A;B), %01%02 = C, %01%01%01 = A, %01%01%02 = B
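The indexing of sub-nodes can be sketched as a recursive walk. In the Python illustration below, a relation is encoded as a (name, children) pair and a plain node as a string; this encoding and the function names `render` and `index_tree` are assumptions made for the example.

```python
def render(node) -> str:
    """Write a node back in S-rule notation, e.g. Y(A;B)."""
    if isinstance(node, str):
        return node
    name, children = node
    return f"{name}({';'.join(render(child) for child in children)})"


def index_tree(relations) -> dict[str, str]:
    """Assign default indexes; sub-nodes are prefixed by their parent's index."""
    table: dict[str, str] = {}

    def visit(children, prefix: str, start: int) -> None:
        for offset, child in enumerate(children):
            key = f"{prefix}%{start + offset:02d}"
            table[key] = render(child)
            if not isinstance(child, str):
                visit(child[1], key, 1)  # sub-nodes restart at 01

    counter = 1
    for _name, children in relations:
        visit(children, "", counter)
        counter += len(children)  # top-level numbering continues across relations
    return table


# X(Y(A;B);C)
nested = index_tree([("X", [("Y", ["A", "B"]), "C"])])
```

Here `nested` maps `%01` to `Y(A;B)`, `%02` to `C`, `%01%01` to `A` and `%01%02` to `B`.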
Indexation is not affected by repetition
X(A;B)Y(A;C)Z(A;D)
%01 = A, %02 = B, %03 = A, %04 = C, %05 = A, %06 = D (and %01 = %03 = %05)
Empty nodes are also indexed
X(;)
%01 = first node of X, %02 = second node of X
Indexes may be used both in the left and in the right side of rules
X(%a;%b):=Y(%b;%a); (the first node of the X relation becomes the second node of the Y relation)
X(%a;)Y(%a;):=Z(%a); (if the first node of the X relation is the first node of the Y relation then make it the single node of a Z relation)
Indexes may also be used to transfer attribute values expressed in the format ATTRIBUTE=VALUE
X(A,%a,ATT1=VAL1;B,%b):=X(%a;%b,ATT1=%a); (the value "VAL1" of "ATT1" of %a is copied to the node %b)
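The attribute transfer in the last example can be pictured as copying one value between two indexed nodes. The Python sketch below assumes that each indexed node is a dictionary of ATTRIBUTE=VALUE pairs; the function name `transfer` is hypothetical.

```python
def transfer(nodes: dict[str, dict[str, str]],
             attribute: str, source: str, target: str) -> dict[str, dict[str, str]]:
    """Copy the value of `attribute` from the source node to the target node.

    The input is left untouched; an updated copy is returned.
    """
    updated = {label: dict(attrs) for label, attrs in nodes.items()}
    updated[target][attribute] = nodes[source][attribute]
    return updated


# X(A,%a,ATT1=VAL1;B,%b):=X(%a;%b,ATT1=%a);
before = {"%a": {"ATT1": "VAL1"}, "%b": {}}
after = transfer(before, "ATT1", source="%a", target="%b")
```

After the call, `after["%b"]["ATT1"]` is `"VAL1"` and the node `%a` keeps its own value.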


Examples

Examples of S-rules:

  • composition
    • VA("into account"); (add the string "into account" as the adjunct of the verb)
  • subcategorization
    • VC(PH("in")); (the complement of the verb is a prepositional phrase headed by the preposition "in")
  • agreement
    • VS(ANUM,APER); (the specifier of the verb assigns number (ANUM) and person (APER) to its head)
  • case marking
    • VS(NOM); (the specifier of the verb receives the nominative case (NOM))
  • distribution
    • VA(>>); (the adjunct of the verb comes at the right side of the verb after a blank space)
  • adjacency
    • VA(AJ2); (the adjunct of the verb integrates the second projection of the head)
  • projection
    • VS(%head;%spec)VB(%head;%comp):=VP(VB(%head;%comp);%spec); (integrate the two relations on the left side into a single relation)

Formal Syntax

S-rules comply with the following formal syntax:

<S-RULE>                ::= <CONDITION> ":=" (<SYNTACTIC RELATION>)+";"
<CONDITION>             ::= <TAG>(","<TAG>)* | (<SYNTACTIC RELATION>)*
<SYNTACTIC RELATION>    ::= <HEAD-DRIVEN RELATION> "(" (<NODE>";")? <NODE> ")"
<HEAD-DRIVEN RELATION>  ::= {one of the head-driven syntactic relations defined in the UNDLF Tagset} 
<NODE>                  ::= <FEATURE>(","<FEATURE>)* 
<FEATURE>               ::= <ID>|<TAG>|"""<STRING>"""|"["<STRING>"]"|<DIRECTION>|<SYNTACTIC RELATION>|<ACTION>
<ID>                    ::= "%"[a-zA-Z_0-9]+
<TAG>                   ::= {one of the tags defined in the UNDLF Tagset}
<STRING>                ::= [a..Z]+
<DIRECTION>             ::= ">"|">>"|"<"|"<<"
<ACTION>                ::= <PREFIXATION> | <SUFFIXATION> | <INFIXATION> | <REPLACEMENT> (cf. A-rule)

where
<a> = a is a non-terminal symbol
"a" = a is a constant
a | b = a or b
(a)? = a can occur 0 or 1 times
(a)* = a can be repeated 0 or more times
(a)+ = a can be repeated 1 or more times
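The <SYNTACTIC RELATION> production above can be read as a small recursive grammar. The following Python sketch parses a simplified version of it into nested tuples. It is an illustration only, not the UNLarium implementation: relation names are assumed to be runs of uppercase letters, tags are not checked against the tagset, and the function names `parse_relation` and `parse_side` are hypothetical.

```python
import re

_NAME = re.compile(r"[A-Z]+(?=\()")                      # relation name before "("
_ATOM = re.compile(r'"[^"]*"|\[[^\]]*\]|[^,;()]+')       # string, lemma or other token


def parse_relation(text: str, pos: int = 0):
    """Parse NAME(node;node) starting at pos; return (tree, next_pos)."""
    m = _NAME.match(text, pos)
    if not m:
        raise ValueError(f"relation name expected at position {pos}")
    name, pos = m.group(0), m.end() + 1                  # skip the "("
    nodes = [[]]                                         # features per node
    while pos < len(text) and text[pos] != ")":
        if text[pos] == ";":                             # node separator
            nodes.append([])
            pos += 1
        elif text[pos] == ",":                           # feature separator
            pos += 1
        elif _NAME.match(text, pos):                     # nested relation
            child, pos = parse_relation(text, pos)
            nodes[-1].append(child)
        else:
            atom = _ATOM.match(text, pos)
            if not atom:
                raise ValueError(f"unexpected character at position {pos}")
            nodes[-1].append(atom.group(0))
            pos = atom.end()
    if pos >= len(text):
        raise ValueError("missing closing parenthesis")
    if len(nodes) > 2:
        raise ValueError("a relation takes at most two nodes")
    return (name, nodes), pos + 1


def parse_side(text: str):
    """Parse a sequence of juxtaposed relations (one side of a rule)."""
    relations, pos = [], 0
    while pos < len(text):
        tree, pos = parse_relation(text, pos)
        relations.append(tree)
    return relations
```

For example, `parse_side('VC(PH("in"))')` returns `[("VC", [[("PH", [['"in"']])]])]`, showing a relation nested as a feature of a node. The sketch follows the formal syntax, so the bare form `VA;` without parentheses is rejected.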

Software