L-rule

From UNL Wiki
Revision as of 14:24, 16 August 2013 by Martins (Talk | contribs)

L-rule (linear rule) is the formalism used for applying transformations over ordered sequences of isolated nodes.

When to use L-rules

L-rules are used for:

  • reordering nodes in a list (a b c > a c b)
  • replacing nodes in a list (a b c > a x c)
  • adding nodes in a list (a b c > a x b c)
  • deleting nodes in a list (a b c > a c)

When not to use L-rules

L-rules are not used for transformations over structures other than lists (i.e., trees and graphs).

Syntax

The general syntax for L-rules is the following:

(CONDITION) := (ACTION);

Where:

  • CONDITION is a single node or a sequence of nodes over which actions will take place; and
  • ACTION is the action to be performed over each node or sequence of nodes of the CONDITION.

Examples:

  • ("Mr."):=("Mister"); (replace "Mr." by "Mister")
  • ("I")(BLK)("am"):=("I'm"); (replace "I am" by "I'm")
  • ("a")(BLK)("/[aeiou].*/"):=("an")()(); (replace "a" by "an" before a blank space (BLK) followed by a word beginning with "a", "e", "i", "o" or "u")
  • ("he")(BLK)("is"):=(%03)(%02)(%01); (reorder "he is" to "is he")

Types of L-rules

There are three types of L-rules:

  • replacement, when the number of parentheses in the CONDITION field is equal to the number of parentheses in the ACTION field;
  • addition, when the number of parentheses in the CONDITION field is lower than the number of parentheses in the ACTION field;
  • deletion, when the number of parentheses in the CONDITION field is greater than the number of parentheses in the ACTION field.
Examples
RULE | BEFORE > AFTER | DESCRIPTION
("a")("b")("c"):=("d")("e")("f"); | abc > def | "a" will be replaced by "d"; "b" by "e"; and "c" by "f"
("a")("b")("c"):=("d")( )( ); | abc > dbc | "a" will be replaced by "d"; "b" and "c" will be preserved
("a")("b")("c"):=("d")("")(""); | abc > d | "a" will be replaced by "d"; "b" and "c" will be replaced by "" (i.e., blank)
("a")("b")("c"):=("d",%01)(%02); | abc > db | "a" will be replaced by "d"; "b" will be preserved; "c" will be deleted
("a")("b")("c"):=("d",%01); | abc > d | "a" will be replaced by "d"; "b" and "c" will be deleted
("a")("b")("c"):=(%03)(%02)(%01); | abc > cba | "a", "b" and "c" will be preserved, but reordered
("a")("b")("c"):=("d",%01)(%03); | abc > dc | "a" will be replaced by "d"; "b" will be deleted; "c" will be preserved
("a")("b")("c"):=("d",%01)("g")(%02)(%03); | abc > dgbc | "a" will be replaced by "d"; "b" and "c" will be preserved; and a new node "g" will be created between "a" and "b"
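The three rule types can be told apart mechanically by counting the parenthesized nodes on each side of ":=". The following Python sketch is only an illustration of that counting; the names NODE, count_nodes and rule_type are hypothetical and are not part of any UNL tool, and the node pattern assumes that quoted strings never contain a double quote.

```python
import re

# One parenthesized node. Quoted strings are matched as a unit so that a
# parenthesis inside a string is not mistaken for a node boundary.
# Assumption of this sketch: strings contain no double quote.
NODE = re.compile(r'\((?:"[^"]*"|[^()"])*\)')


def count_nodes(side: str) -> int:
    """Count the parenthesized nodes in one side of a rule."""
    return len(NODE.findall(side))


def rule_type(rule: str) -> str:
    """Classify a rule as replacement, addition or deletion."""
    condition, action = rule.strip().rstrip(";").split(":=", 1)
    left, right = count_nodes(condition), count_nodes(action)
    if left == right:
        return "replacement"
    return "addition" if left < right else "deletion"


print(rule_type('("a")("b")("c"):=("d")("e")("f");'))
print(rule_type('("a")("b")("c"):=("d",%01)(%02);'))
print(rule_type('("a")("b")("c"):=("d",%01)("g")(%02)(%03);'))
```

The three calls use rules taken from the table above, one for each type.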

Examples

RULE | BEFORE > AFTER | DESCRIPTION
("a",ART)(BLK)("/[aeiou].*/"):=("an")( )( ); | a adjective > an adjective | replace the article (ART) "a" by "an" before a blank space (BLK) and a node starting with "a", "e", "i", "o" or "u"; preserve the second node (BLK) and the third node without any change
("a",PRE)(BLK)("a",ART):=("à",PRE,ART,CTC); | a a > à | replace the preposition (PRE) "a" + blank (BLK) + article (ART) "a" by "à"; add the features PRE (preposition), ART (article) and CTC (contraction) to the node "à"
("de",PRE)(BLK)("le",ART):=("du",PRE,ART,CTC); | de le > du | replace the preposition (PRE) "de" + blank (BLK) + article (ART) "le" by "du"; add the features PRE, ART and CTC to the node "du"
("a",VER)(BLK)("il",PPR):=( )("-t-",-BLK)( ); | a il > a-t-il | replace the blank space (BLK) between the verb (VER) "a" and the pronoun (PPR) "il" by "-t-"; remove the feature BLK from the second form; preserve the first and the third form without any change
("de",PRE)(BLK)("/[aeiou].*/"):=("d'",%01)(%03); | de avoir > d'avoir | replace the preposition (PRE) "de" + blank space (BLK) + a node starting with "a", "e", "i", "o" or "u" by "d'"; delete the second form (BLK); and preserve the third form (%03) without any change

Observations

In L-rules, nodes may have the following arguments:
  • strings (between quotes or brackets): "de", "d'", [sense], etc.
  • features: PRE, BLK, VOW, etc.
  • indexes (preceded by "%"): %03, %head, etc.
  • regular expressions (between / /, only in the left side): /a[bcde]/, /a*/, etc.
  • A-rules (only in the right side): 0>"s", "a":"b"
Arguments may be combined (but strings, regular expressions and A-rules are mutually exclusive):
  • ("X",%x,X)("Y",%y,Y):=("Z",%x,-X,+A)(0>"s",%y,+B);
A node may contain only one string:
  • ("a"):=("b"); (valid: each node contains one string)
  • ("a","b"):=("c"); (invalid: the first node contains two strings)
Strings in the right side always replace strings in the left side
In the rule ("x"):=("y"); the string "x" is replaced by the string "y".
Strings are represented between "quotes" while lemmas are represented between [brackets].
The UNLarium distinguishes between strings (to be represented between "quotes") and lemmas (to be represented between [brackets]). The difference between strings and lemmas has to do with variance and dictionary status: if the constituent is expected to figure as an entry in the dictionary (e.g., "in", "the", "after", "love", "sense", etc.) or if it may vary (e.g., if it may be inflected, or further composed by specification, adjunction or complementation), it must be represented between brackets; if it is a full phrase whose internal structure is not relevant because it is invariant, it must come between quotes:
  • ("into account"); (the string "into account" does not vary: take > take into account, take into more account)
  • ([sense]); (the term "sense" may be further specified: make > make sense, make any sense, make no sense, etc).
Features are added through "+" and deleted through "-"
  • (X):=(+Y); (= add the feature Y to the node containing the feature X)
  • (X):=(-X); (= delete the feature X from any node containing the feature X)
L-rules are recursive: rules will apply while conditions are true
The rule "(BLK):=("-");" will transform "a b c d e" into "a-b-c-d-e" (and not only into "a-b c d e").
The rule "(X):=(+Y);" will never stop (i.e., it contains an infinite loop): the feature Y will keep being added forever (X,Y,Y,Y,Y,Y,Y,Y,...).
The symbol ^ is used for negation and may be used to prevent infinite loops
  • (X,^Y):=(+Y); (= add the feature Y to a node containing the feature X that does not contain the feature Y yet)
  • (^".")(STAIL):=(%01)(".")(%02); (Add a period before the end of the sentence if there is not a period yet)
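The difference between the looping rule (X):=(+Y); and the guarded rule (X,^Y):=(+Y); can be simulated with a small Python sketch. It is not the actual rule engine: nodes are modelled as dictionaries holding a feature list, the function and variable names are hypothetical, and the max_steps cap exists only so that the looping case terminates in the demonstration.

```python
def apply_until_stable(nodes, condition, action, max_steps=100):
    """Apply `action` to the first node satisfying `condition` until none does.

    Returns (steps_taken, hit_cap). `max_steps` is a safety cap added for
    this sketch only; the text does not say the real engine has one.
    """
    for step in range(max_steps):
        target = next((n for n in nodes if condition(n)), None)
        if target is None:
            return step, False      # no condition is true any more
        action(target)
    return max_steps, True          # still matching when the cap was hit


def add_y(node):
    node["features"].append("Y")    # models the action (+Y)


def has_x(node):
    return "X" in node["features"]  # models the condition (X)


def has_x_not_y(node):
    # models the condition (X,^Y)
    return "X" in node["features"] and "Y" not in node["features"]


# (X):=(+Y); keeps matching the same node, so only the cap stops it.
loop_nodes = [{"features": ["X"]}]
loop_result = apply_until_stable(loop_nodes, has_x, add_y, max_steps=5)

# (X,^Y):=(+Y); stops as soon as every X node carries Y.
safe_nodes = [{"features": ["X"]}, {"features": ["X"]}, {"features": []}]
safe_result = apply_until_stable(safe_nodes, has_x_not_y, add_y)

print(loop_result, loop_nodes)
print(safe_result, safe_nodes)
```

In the guarded run, the negated feature makes the condition false for a node once the action has applied to it, which is what lets the repetition end.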
Rules are conservative. No feature is changed or deleted unless explicitly indicated through "-".
In the rule ("x",FEA):=("y"); the string "x" is replaced by the string "y", but the feature FEA is not altered (i.e., the final state will be ("y",FEA)).
The rule "("a",ART)(BLK)(VOW):=("an")( )( );" does not affect the status of the second and the third word forms, which continue to be BLK and VOW. On the other hand, the rule "("a",VER)(BLK)("il",PPR):=( )("-t-",-BLK)( );" alters the status of the second form by deleting the feature BLK.
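A minimal Python sketch of this conservative behaviour follows. The node representation (a dictionary with a string and a feature list) and the function rewrite_node are assumptions made for illustration; they are not part of the UNL software.

```python
def rewrite_node(node, new_string=None, add=(), remove=()):
    """Return a rewritten copy of `node`.

    The string changes only if `new_string` is given, and features change
    only through explicit `add` ("+") and `remove` ("-") arguments.
    Everything else is carried over untouched.
    """
    features = [f for f in node["features"] if f not in remove]
    features += list(add)
    return {
        "string": node["string"] if new_string is None else new_string,
        "features": features,
    }


# ("x",FEA):=("y");  the string changes, FEA is preserved
print(rewrite_node({"string": "x", "features": ["FEA"]}, new_string="y"))

# (BLK) rewritten by ("-t-",-BLK): the string changes and BLK is removed
print(rewrite_node({"string": " ", "features": ["BLK"]},
                   new_string="-t-", remove=("BLK",)))

# ( )  an empty action node leaves the node exactly as it was
print(rewrite_node({"string": "il", "features": ["PPR"]}))
```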
Indexes are used to control rules
  • (%a)(%b)(%c):=(%b); (delete the first and the third nodes, and keep the second)
  • (%a)(%b)(%c):=(%c)(%b)(%a); (reverse the order)
In the ACTION field, changes may be expressed by the right side of A-rules (i.e., by prefixation, infixation, suffixation or replacement) inside each form. The default is replacement.
The rule "("a",ART)(BLK)(VOW):=("an")( )( );" could also be expressed as "("a",ART)(BLK)(VOW):=(0>"n")( )( );", i.e., the change from "a" to "an" could be expressed either by "an" or 0>"n".
Rules apply only if all conditions are true.
The rule "("a")(BLK)(VOW):=("an")( )( );" will apply only in case of "a" before a blank and a vowel.
To make conditions more expressive, strings in conditions (but not in actions) may be replaced by regular expressions between slashes (/ /).
("/a[bcd]e/"):=(""); (Delete the words "abe", "ace" and "ade")
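The effect of this rule can be approximated in Python. The sketch assumes that the regular expression must match the whole string of a node (which is consistent with the explicit ".*" in patterns such as /[aeiou].*/) and that, for a pattern this simple, the UNL dialect behaves like Python's re module. The function name blank_matching is hypothetical.

```python
import re


def blank_matching(words, pattern):
    """Replace by "" every word that the pattern matches in full."""
    rx = re.compile(pattern)
    return ["" if rx.fullmatch(w) else w for w in words]


words = ["abe", "ace", "ade", "aee", "abed"]
print(blank_matching(words, r"a[bcd]e"))
```

Under the whole-string assumption, "aee" and "abed" are left untouched: the first has no "b", "c" or "d" in second position, and the second has an extra character after the match.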

Indexes

Nodes are always indexed in L-rules
Indexes (%) are used for indexing nodes, attributes and values between the left (condition) and the right side of rules.
  • (%a)(%b):=(%b)(%a); (change the order of the constituents)
If omitted, indexes are assigned by default, according to the position
  • (A)(B):=(C)(D); is the same as (A,%01)(B,%02):=(C,%01)(D,%02);
Indexes can be replaced by user-defined labels made of any sequence of alphabetic characters and underscore
(A,%a)(B,%b):=(C,%a)(D,%b);
Numeric characters cannot be used as user-defined indexes, because numeric indexes always refer to node positions
(A,%03)(B,%05):=(C,%03)(D,%05);
Here %01 = A and %02 = B (there is no %03 or %05)
Indexes may also be used to transfer attribute values expressed in the format ATTRIBUTE=VALUE
(A,%a,ATT1=VAL1)(B,%b):=()(B,ATT1=%a); (the value "VAL1" of "ATT1" of %a is copied to the node %b)
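How indexes connect the two sides of a rule can be sketched in Python. This is a simplified model, not the real interpreter: each action node is written as a pair (index, new_string), where the index is the 1-based position of the condition node it refers to and None stands for a node with no counterpart on the left side. The function name apply_action and this pair notation are assumptions made for the example.

```python
def apply_action(matched, action):
    """Build the output list from the matched nodes and the action side.

    matched: strings of the nodes matched by the condition, in order.
    action:  list of (index, new_string) pairs, in output order.
             index None       -> a newly created node
             new_string None  -> keep the string of the indexed node
    Condition nodes that no pair refers to are dropped (deleted).
    """
    output = []
    for index, new_string in action:
        if index is None:
            output.append(new_string)
        elif new_string is None:
            output.append(matched[index - 1])
        else:
            output.append(new_string)
    return output


abc = ["a", "b", "c"]

# (%03)(%02)(%01)  reorder
print(apply_action(abc, [(3, None), (2, None), (1, None)]))

# ("d",%01)(%03)   replace the first node, delete the second, keep the third
print(apply_action(abc, [(1, "d"), (3, None)]))

# (%01)("x")(%02)(%03)  add a new node after the first one
print(apply_action(abc, [(1, None), (None, "x"), (2, None), (3, None)]))
```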

Common mistakes

  • "Mr":="Mister";
    • Conditions and actions must always come between parentheses: ("Mr"):=("Mister");
  • (Mr):=(Mister);
    • Constants must come between quotes (inside the parentheses): ("Mr"):=("Mister");
  • ("Mr"):=("Mister")
    • Rules must end in semicolon: ("Mr"):=("Mister");
  • ("I am"):=("I'm");
    • Each separate word form must be isolated between parentheses and described as a different condition: ("I")(BLK)("am"):=("I'm");
  • ("a",ART)(BLK)(VOW):=("an");
    • "a adjective">"a": the blank and the following form are deleted because they are not present at the right side
  • ("de",PRE)(BLK)(VOW):=("d'")(VOW);
    • "de avoir">"d' ": coindexation is based on ordering and not on features. The third form is deleted because it's not present at the right side; the second form, which is BLK, receives the feature VOW;
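The purely syntactic mistakes in this list (missing parentheses, missing semicolon) can be caught before a rule is run. The following Python sketch is a hypothetical checker, not an official validator: it assumes that nodes are written without whitespace between them and that strings contain no double quote. It cannot detect the semantic mistakes above, such as unquoted constants or unintended deletions.

```python
import re

# One parenthesized node; quoted strings are matched as a unit.
NODE = r'\((?:"[^"]*"|[^()"])*\)'
NODE_SEQUENCE = re.compile(rf'(?:{NODE})+')


def lint(rule: str) -> list:
    """Return a list of syntax problems found in `rule` (empty if none)."""
    rule = rule.strip()
    problems = []
    if not rule.endswith(";"):
        problems.append("missing final semicolon")
    body = rule.rstrip(";")
    if ":=" not in body:
        problems.append('missing ":="')
        return problems
    condition, action = body.split(":=", 1)
    for name, side in (("condition", condition), ("action", action)):
        if not NODE_SEQUENCE.fullmatch(side.strip()):
            problems.append(f"{name} is not a sequence of parenthesized nodes")
    return problems


print(lint('("Mr"):=("Mister");'))
print(lint('("Mr"):=("Mister")'))
print(lint('"Mr":="Mister";'))
```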

Formal syntax

L-rules comply with the following formal syntax:

<L-RULE>          ::= ( "("<CONDITION>")" )+ ":=" ( "("<ACTION>")" )+ ";"
<CONDITION>        ::= """<STRING>""" ("," <TAGLIST> )* | "["<STRING>"]" ("," <TAGLIST> )* | <TAGLIST>
<ACTION>           ::= (<INDEX>)? ( <AFFIXATION> ("," <AFFIXATION>)* )* ( <ATT_CHANGE> ("," <ATT_CHANGE>)* )*
<AFFIXATION>       ::= <PREFIXATION> | <SUFFIXATION> | <INFIXATION> | <REPLACEMENT> (cf. A-rule)
<ATT_CHANGE>       ::= { "+" | "-" } <TAG> 
<TAGLIST>          ::= <INDEX> | (<INDEX> ",")? <TAG> ("," <TAG>)* 
<INDEX>            ::= "%"[01..99]
<TAG>              ::= {one of the tags defined in the UNDLF Tagset}
<STRING>           ::= [a-Z]+
<INTEGER>          ::= [0-9]+

where

<a> = a is a non-terminal symbol
"a" = a is a constant
a | b = a or b
{ a | b } = either a or b
(a)? = a can occur 0 or 1 time
(a)* = a can be repeated 0 or more times
(a)+ = a can be repeated 1 or more times

Software