S-rule

From UNL Wiki

Revision as of 08:19, 26 March 2010

S-rule (syntactic rule) is the formalism used for describing syntactic structures and syntactic operations in the UNLarium framework.

When to use S-rules

S-rules are used for:

  • composition, i.e., creating compounds out of the base forms (such as "take">"take into account");
  • periphrasis, i.e., generating analytic grammatical structures (such as "love">"will love");
  • subcategorization, i.e., defining the number and the type of arguments of a given base form;
  • case marking, i.e., defining the grammatical cases of the arguments of a given base form;
  • agreement, i.e., concord between different parts of a phrase;
  • distribution, i.e., defining the precedence of word forms;
  • adjacency, i.e., defining the distance between syntactic branches; and
  • projection, i.e., projecting syntactic structures out of the constituents.

When not to use S-rules

S-rules are not used for affixation (prefixation, infixation, suffixation) or spelling changes, which must be addressed by A-rules and Ph-rules, respectively.

Types of S-rules

There are four types of S-rules:

Change
<CONDITION> := <RELATION>;
Change the attributes of the constituents of the relation. The relation itself is not affected. Features are added through "+" and deleted through "-".
  • VA(%head;%adjt):=VA(%head,+C;%adjt,-D); (add the feature C to the head and remove the feature D from the adjunct)
Create
<CONDITION> := +<RELATION>;
Create a new relation. Nodes to be created must be defined as strings (between quotes) or lemmas (between brackets), if not co-indexed to an existing node.
  • VA(%head;%adjt):=+VC(%head;"c"); (add the relation VC between the head and "c", which is created.)
Delete
<CONDITION> := -<RELATION>;
Delete a relation between the head and the argument. The head and the argument are not deleted.
  • VA(%head;%adjt):=-VA(%head;%adjt); (delete the relation VA between the head and its arguments. The nodes are not deleted)
Replace
<RELATION> := <RELATION>;
Replace the relation on the left side by the one on the right side.
  • VA(%head;%any):=VC(%head;%any); (replace the relation VA by VC)
Two special cases of replacement are:
Merge
<RELATION><RELATION> := <RELATION>;
Replace the relations in the left side by the ones in the right side.
  • VA(%head;%adjt)VC(%head;%comp):=VB(VB(%head;%adjt);%comp); (VA and VC are deleted, and VB is created)
Divide
<RELATION> := <RELATION><RELATION>;
Replace the relation in the left side by those in the right side.
  • VA(%head;%adjt):=VC(%head;%x)VC(%head;%y); (VA is deleted, and the two VCs are created)
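
The distinctions above can be illustrated with a small Python sketch that labels a rule by its surface form. This is only an illustration, not part of the UNL tools: the function names `top_level_relations` and `classify` are hypothetical, and the sketch assumes that a change rule keeps the same relation names on both sides while a replace rule does not. Quoted strings containing parentheses are not handled.

```python
def top_level_relations(side: str) -> list[str]:
    """Return the names of the relations juxtaposed at depth 0 of one rule side."""
    names, depth, current = [], 0, ""
    for ch in side:
        if ch == "(":
            if depth == 0:
                names.append(current.strip())
                current = ""
            depth += 1
        elif ch == ")":
            depth -= 1
        elif depth == 0:
            current += ch
    if current.strip():  # relation written without parentheses, e.g. "VA"
        names.append(current.strip())
    return names


def classify(rule: str) -> str:
    """Label a rule as change, create, delete, replace, merge or divide."""
    body = rule.strip().rstrip(";")
    left, sep, right = body.partition(":=")
    if not sep:  # no ":=" means the condition is empty
        left, right = "", body
    right = right.strip()
    if right.startswith("+"):
        return "create"
    if right.startswith("-"):
        return "delete"
    lhs, rhs = top_level_relations(left), top_level_relations(right)
    if not lhs or lhs == rhs:  # assumption: same relation names means "change"
        return "change"
    if len(lhs) > len(rhs):
        return "merge"
    if len(lhs) < len(rhs):
        return "divide"
    return "replace"


print(classify('VA(%head;%adjt):=+VC(%head;"c");'))                         # create
print(classify("VA(%head;%adjt)VC(%head;%comp):=VB(VB(%head;%adjt);%comp);"))  # merge
```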

Where:

  • <CONDITION> (to be repeated 0 or more times) may be a tag or a <RELATION> that defines when the rule is applied. It may be empty in general cases (i.e., if the rule is always applied).
  • <RELATION> (to be repeated 1 or more times) is a syntactic relation containing the <HEAD>, in case of head-only relations (VH, NH, JH, PH, IH, CH, AH, DH), or the <HEAD> and <ARGUMENT> (i.e., complement, adjunct or specifier), in case of binary relations (VA, VC, VS, VB, NA, NC, NS, etc).
  • <HEAD> and <ARGUMENT> may be expressed as
    • a "string" (strings come between quotes);
    • a [lemma] (lemmas come between square brackets);
    • a feature or a set of features, separated by comma, and extracted from the UNDLF Tagset;
    • an index;
    • an action, to be performed by adding features (through "+"), deleting features (through "-"), or through the right side of an A-rule (i.e., prefixation, suffixation, infixation); or
    • a <RELATION> itself (i.e., rules may be recursive).

Observations

The <CONDITION> field may be empty in change, create and delete rules, in case of unconditional change, creation or deletion. It is obligatory in replace rules.
  • VA(+C); (add the feature C to all adjuncts to the head in the verbal phrase)
  • +VA("a"); (add an adjunct "a" to the head of the verbal phrase, whatever the case)
  • -VA(C); (delete all adjuncts to the head of the verbal phrase that have the feature C)
The <HEAD> and the <ARGUMENT> may be empty in case of no change
Binary relations (?A, ?S, ?C)
  • VA; (no head nor argument)
  • VA(); (same as above)
  • VA(;); (same as above)
  • VA("a"); (argument only)
  • VA("a";); (head only)
  • VA("a";"b"); (head and argument)
Unary relations (?H)
  • VH; (no head)
  • VH(); (no head)
  • VH("a"); (head)
Relations are always juxtaposed (they must not be separated by ",")
VS("b")VC("c")VA("d"); (correct)
VS("b"),VC("c"),VA("d"); (incorrect)
Order is not important between relations, but essential between constituents of the same relation
VS("b")VC("c")VA("d") = VC("c")VA("d")VS("b") = VA("d")VC("c")VS("b")
VA("a";"b") ≠ VA("b";"a")
Arguments of relations may be expressed by A-rules, but only in the right side of rules
VA(0>"a"); (the verbal adjuncts, if any, receive an "a" as suffix)
Rules are conservative. Features will be preserved unless explicitly deleted (through "-")
VC(%comp,ACC):=VC(%comp,NOM); (is the same as "VC(%comp,ACC):=VC(%comp,+NOM);" i.e., add the feature "NOM" to the complements of verb that have the feature "ACC"; the feature "ACC" will be preserved and not replaced by "NOM")
VC(%comp,ACC):=VC(%comp,NOM,-ACC); (add the feature "NOM" and delete the feature "ACC" from the complements of the verb that have the feature "ACC")
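
The conservative behaviour can be sketched in Python as follows. This is a minimal model only: the helper `apply_actions` is hypothetical and represents the features of a node as a plain set of tags.

```python
def apply_actions(features: set[str], actions: list[str]) -> set[str]:
    """Apply the right-side feature actions of a rule to a node's features."""
    result = set(features)  # start from a copy: nothing is dropped implicitly
    for action in actions:
        if action.startswith("-"):
            result.discard(action[1:])      # explicit deletion through "-"
        else:
            result.add(action.lstrip("+"))  # "NOM" and "+NOM" both add the feature
    return result


print(sorted(apply_actions({"ACC"}, ["NOM"])))          # ['ACC', 'NOM']
print(sorted(apply_actions({"ACC"}, ["NOM", "-ACC"])))  # ['NOM']
```

In this model "NOM" and "+NOM" are interchangeable, and "ACC" disappears only when "-ACC" is stated.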
A node may have as many features as necessary, but only one string or lemma
VC(%comp,"a"):=VC(%comp,"b"); ("a" is replaced by "b")
Strings are represented between quotes if invariable, or between brackets if variable (lemmas)
VA("into account"); (the string "into account" is an adjunct to the verb)
IH([be]); (the lemma "be" is the head of inflectional phrase)
Negation
"^" is used for negation, and may be applied over features, strings or relations:
  • VA(^NOU); (if the adjunct does not have the feature "NOU")
  • VA(^"a"); (if the adjunct is not the string "a")
  • ^VA("a"); (if there is no VA relation between the head and "a")
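
The three uses of "^" can be modelled with a short Python sketch. The `Node` class and the helpers `node_matches` and `has_relation` are hypothetical names introduced for illustration; relations are simplified to (name, argument) pairs attached to a single head.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    string: str = ""
    features: set = field(default_factory=set)


def node_matches(node: Node, condition: str) -> bool:
    """Test a feature (NOU) or a quoted string ("a"), optionally negated by "^"."""
    negated = condition.startswith("^")
    test = condition[1:] if negated else condition
    if test.startswith('"') and test.endswith('"'):
        hit = node.string == test[1:-1]
    else:
        hit = test in node.features
    return hit != negated  # "^" flips the result


def has_relation(relations: list, name: str, condition: str) -> bool:
    """True if some relation with this name has an argument matching the condition."""
    return any(rel == name and node_matches(arg, condition) for rel, arg in relations)


adjunct = Node("b", {"VER"})
relations = [("VA", adjunct)]
print(node_matches(adjunct, "^NOU"))             # True: VA(^NOU)
print(node_matches(adjunct, '^"a"'))             # True: VA(^"a")
print(not has_relation(relations, "VA", '"a"'))  # True: ^VA("a")
```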

Indexes

Indexes (%) are used for indexing nodes, attributes and values inside and between the left (condition) and the right side of rules.
X(%a;)Y(%a;)
the first node of X is also the first node of Y
If omitted, indexes are assigned by default, according to the position
X(A;B)Y(C;D)Z(E;F)
%01 = A, %02 = B, %03 = C, %04 = D, %05 = E, %06 = F
The right side indexes are coindexed with the left side ones
X(%01;%02):=Y(%02;%01); (the first node of X is the second node of Y - index explicitly assigned)
X(;):=Y(;); (the first node of X is the first node of Y - indexes assigned by default)
Indexes can be replaced by user-defined labels made of any sequence of alphabetic characters and underscore
X(A,%a;B,%b)Y(C,%c;D,%d)Z(E,%e;F,%f)
%01 = A, %02 = B, %03 = C, %04 = D, %05 = E, %06 = F and
%a = A, %b = B, %c = C, %d = D, %e = E, %f = F
Numeric characters cannot be used as user-defined indexes
X(A,%03;B,%05)
%01 = A, %02 = B (there is no %03 nor %05)
To avoid ambiguities, users are strongly recommended to replace default values by customized labels
  • X(A,%a;B,%b)
instead of simply X(A;B) or X(A,%01;B,%02)
In case of sub-nodes, the parent node must be indicated using the syntax <PARENT NODE><CHILD NODE>, where <PARENT NODE> may be, itself, a sub-node
X(Y(A;B);C)
%01 = Y(A;B), %02 = C, %01%01 = A, %01%02 = B
X(Y(Z(A;B);C);D)
%01 = Y(Z(A;B);C), %02 = D, %01%01 = Z(A;B), %01%02 = C, %01%01%01 = A, %01%01%02 = B
Indexation is not affected by repetition
X(A;B)Y(A;C)Z(A;D)
%01 = A, %02 = B, %03 = A, %04 = C, %05 = A, %06 = D (and %01 = %03 = %05)
Empty nodes are also indexed
X(;)
%01 = first node of X, %02 = second node of X
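
The default numbering described so far (position-based, continuing across juxtaposed relations, restarting inside sub-nodes, and counting empty nodes) can be reproduced with the following Python sketch. The function names are hypothetical, and the sketch assumes that quoted strings do not contain parentheses or semicolons.

```python
def split_relations(text: str) -> list[tuple[str, str]]:
    """Split juxtaposed relations such as 'X(A;B)Y(C;D)' into (name, body) pairs."""
    pairs, depth, name, body = [], 0, "", ""
    for ch in text:
        if ch == "(":
            depth += 1
            if depth == 1:
                continue  # opening parenthesis of a top-level relation
        elif ch == ")":
            depth -= 1
            if depth == 0:
                pairs.append((name, body))
                name, body = "", ""
                continue
        if depth == 0:
            name += ch
        else:
            body += ch
    return pairs


def split_nodes(body: str) -> list[str]:
    """Split the body of a relation on ';' found outside nested parentheses."""
    nodes, depth, current = [], 0, ""
    for ch in body:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == ";" and depth == 0:
            nodes.append(current)
            current = ""
        else:
            current += ch
    nodes.append(current)
    return nodes


def assign_indexes(text: str, prefix: str = "") -> dict[str, str]:
    """Map default indexes (%01, %02, ..., %01%01, ...) to node contents."""
    indexes, position = {}, 0
    for _name, body in split_relations(text):
        for node in split_nodes(body):
            position += 1
            index = f"{prefix}%{position:02d}"
            indexes[index] = node
            if "(" in node and node.endswith(")"):  # the node is itself a relation
                indexes.update(assign_indexes(node, index))
    return indexes


print(assign_indexes("X(Y(A;B);C)"))
# {'%01': 'Y(A;B)', '%01%01': 'A', '%01%02': 'B', '%02': 'C'}
```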
Indexes may be used both in the left and in the right side of rules
X(%a;%b):=Y(%b;%a); (the first node of the X relation becomes the second node of the Y relation)
X(%a;)Y(%a;):=Z(%a); (if the first node of the X relation is the first node of the Y relation then make it the single node of a Z relation)
Indexes may also be used to transfer attribute values expressed in the format ATTRIBUTE=VALUE
X(A,%a,ATT1=VAL1;B,%b):=X(%a;%b,ATT1=%a); (the value "VAL1" of "ATT1" of %a is copied to the node %b)


Examples

Examples of S-rules:

  • composition
    • VA("into account"); (add the string "into account" as the adjunct of the verb)
  • subcategorization
    • VC(PH("in")); (the complement of the verb is a prepositional phrase headed by the preposition "in")
  • agreement
    • VS(ANUM,APER); (the specifier of the verb assigns number (ANUM) and person (APER) to its head)
  • case marking
    • VS(NOM); (the specifier of the verb receives the nominative case (NOM))
  • distribution
    • VA(>>); (the adjunct of the verb comes at the right side of the verb after a blank space)
  • adjacency
    • VA(AJ2); (the adjunct of the verb integrates the second projection of the head)
  • projection
    • VS(%head;%spec)VB(%head;%comp):=VP(VB(%head;%comp);%spec); (integrate the two relations on the left side into a single relation)

Formal Syntax

S-rules comply with the following formal syntax:

<S-RULE>                ::= <CONDITION> ":=" (<SYNTACTIC RELATION>)+";"
<CONDITION>             ::= <TAG>(","<TAG>)* | (<SYNTACTIC RELATION>)*
<SYNTACTIC RELATION>    ::= <HEAD-DRIVEN RELATION> "(" (<NODE>";")? <NODE> ")"
<HEAD-DRIVEN RELATION>  ::= {one of the head-driven syntactic relations defined in the UNDLF Tagset} 
<NODE>                  ::= <FEATURE>(","<FEATURE>)* 
<FEATURE>               ::= <ID>|<TAG>|"""<STRING>"""|"["<STRING>"]"|<DIRECTION>|<SYNTACTIC RELATION>|<ACTION>
<ID>                    ::= "%"[a-zA-Z_0-9]+
<TAG>                   ::= {one of the tags defined in the UNDLF Tagset}
<STRING>                ::= [a..Z]+
<DIRECTION>             ::= ">"|">>"|"<"|"<<"
<ACTION>                ::= <PREFIXATION> | <SUFFIXATION> | <INFIXATION> | <REPLACEMENT> (cf. A-rule)

where
<a> = a is a non-terminal symbol
"a" = a is a constant
a | b = a or b
(a)? = a can occur 0 or 1 times
(a)* = a can be repeated 0 or more times
(a)+ = a can be repeated 1 or more times

Software