L-rule

From UNL Wiki
(Difference between revisions)
Jump to: navigation, search
(When not to use L-rules)
(Syntax)
Line 19: Line 19:
 
*("Mr."):=("Mister"); (replace "Mr." by "Mister")
 
*("Mr."):=("Mister"); (replace "Mr." by "Mister")
 
*("I")(BLK)("am"):=("I'm"); (replace "I am" by "I'm")
 
*("I")(BLK)("am"):=("I'm"); (replace "I am" by "I'm")
*("a")(BLK)("/[aeiou].*/"):=("an")()(); (replace "a" by "an" before a blank space (BLK) and word beginning with "a", "e", "i", "o" or "u")
+
*("a")(BLK)("/[aeiou].*/"):=("an")()(); (replace "a" by "an" before a blank space (BLK) and word beginning with "a", "e", "i", "o" or "u", such as a "a apple">"an apple")
 
*("he")(BLK)("is"):=(%03)(%02)(%01); (reorder "he is" to "is he")
 
*("he")(BLK)("is"):=(%03)(%02)(%01); (reorder "he is" to "is he")
 +
 
== Types of L-rules ==
 
== Types of L-rules ==
 
There are three types of L-rules:
 
There are three types of L-rules:

Revision as of 14:39, 16 August 2013

L-rule (linear rule) is the formalism used for applying transformations over ordered sequences of isolated nodes.

Contents

When to use L-rules

L-rules are used for:

  • reordering nodes in a list (a b c > a c b)
  • replacing nodes in a list (a b c > a x c)
  • adding nodes in a list (a b c > a x b c)
  • deleting nodes in a list (a b c > a c)

When not to use L-rules

L-rules are not used in transformations over structures other than lists (i.e., in trees and graphs). In these cases, we use S-rules (syntactic rules).

Syntax

The general syntax for L-rules is the following:

(CONDITION) := (ACTION);

Where:

  • CONDITION is a single node or a sequence of nodes over which actions will take place; and
  • ACTION is the action to be performed over each node or sequence of nodes of the CONDITION.

Examples:

  • ("Mr."):=("Mister"); (replace "Mr." by "Mister")
  • ("I")(BLK)("am"):=("I'm"); (replace "I am" by "I'm")
  • ("a")(BLK)("/[aeiou].*/"):=("an")()(); (replace "a" by "an" before a blank space (BLK) and word beginning with "a", "e", "i", "o" or "u", such as a "a apple">"an apple")
  • ("he")(BLK)("is"):=(%03)(%02)(%01); (reorder "he is" to "is he")

Types of L-rules

There are three types of L-rules:

  • replacement, when the number of parentheses in the CONDITION field is equal to the number of parentheses in the ACTION field:
  • addition, when the number of parentheses in the CONDITION field is lower than the number of parentheses in the ACTION field;
  • deletion, when the number of parentheses in the CONDITION field is greater than the number parentheses in the ACTION field.
Examples
RULE BEFORE > AFTER DESCRIPTION
("a")("b")("c"):=("d")("e")("f"); abc > def "a" will be replaced by "d"; "b" by "e"; and "c" by "f"
("a")("b")("c"):=("d")( )( ); abc > dbc "a" will be replaced by "d"; "b" and "c" will be preserved
("a")("b")("c"):=("d")("")(""); abc > d "a" will be replaced by "d"; "b" and "c" will be replaced by "" (i.e., blank)
("a")("b")("c"):=("d",%01)(%02); abc > db "a" will be replaced by "d"; "b" will be preserved; "c" will be deleted
("a")("b")("c"):=("d",%01); abc > d "a" will be replaced by "d"; "b" and "c" will be deleted
("a")("b")("c"):=(%03)(%02)(%01); abc > cba "a", "b" and "c" will be preserved, but reordered
("a")("b")("c"):=("d",%01)(%03); abc > dc "a" will be replaced by "d"; "b" will be deleted; "c" will be preserved
("a")("b")("c"):=("d",%01)("g")(%02)(%03); abc > dgc "a" will be replaced by "d"; "b" and "c" will be preserved; and a new node "g" will be created between "a" and "b"

Examples

Examples
RULE BEFORE > AFTER DESCRIPTION
("a",ART)(BLK)("/[aeiou].*/"):=("an")( )( ); a adjective > an adjective replace the article (ART) "a" by "an" before a blank space (BLK) and a node starting with "a", "e", "i", "o" or "u"; preserve the second node (BLK) and the third node without any change
("a",PRE)(BLK)("a",ART):=("à",PRE,ART,CTC); a a > à replace the preposition (PRE) "a" + blank (BLK) + article (ART) "a" by "à"; add the features PRE (preposition), ART (article) and CTC (contraction) to the node "à"
("de",PRE)(BLK)("le",ART):=("du",PRE,ART,CTC); de le > du replace the preposition (PRE) "de" + blank (BLK) + article (ART) "le" by "du"; add the features PRE, ART and CTC to the node "du"
("a",VER)(BLK)("il",PPR):=( )("-t-",-BLK)( ); a il > a-t-il replace the blank space (BLK) between the verb (VER) "a" and the pronoun (PPR) "il" by "-t-"; remove the feature BLK from the second form; preserve the first and the third form without any change
("de",PRE)(BLK)("/[aeiou].*/"):=("d'",%01)(%03); de avoir > d'avoir replace the preposition (PRE) "de" + blank space (BLK) + a node starting with "a", "e", "i", "o" or "u" by "d'"; delete the second form (BLK); and preserve the third form (%03) without any change

Observations

Strings in the right side always replace strings in the left side
In the rule ("x"):=("y"); the string "x" is replaced by the string "y".
L-rules are recursive: rules will apply while conditions are true
The rule "(BLK):=("-");" will transform "a b c d e" into "a-b-c-d-e" (and not only in "a-b c d e")
The rule "(X):=(+Y);" will never stop (i.e., it contains an infinite loop): the feature Y will keep been added eternally (X,Y,Y,Y,Y,Y,Y,Y,...)
The symbol ^ is used for negation and may be used to prevent infinite loops
  • (X,^Y):=(+Y); (= add the feature Y to a node containing the feature X that does not contain the feature Y yet)
  • (^".")(STAIL):=(%01)(".")(%02); (Add a period before the end of the sentence if there is not a period yet)
Rules are conservative. No feature is changed or deleted unless explicitly indicate through "-".
In the rule ("x",FEA):=("y"); the string "x" is replaced by the string "y", but the feature FEA is not altered (i.e.,the final state will be ("y",FEA));
The rule "("a",ART)(BLK)(VOW):=("an")( )( );" does not affect the status of the second and the third word forms, which continue to be BLK and VOW. On the other hand, the rule "("a",VER)(BLK)("il",PPR):=( )("-t-",-BLK)( );" alters the status of the second form by deleting the feature BLK.
In the ACTION field, changes may be expressed by the right side of A-rules (i.e., by prefixation, infixation, suffixation or replacement) inside each form. The default is replacement.
The rule "("a",ART)(BLK)("/[aeiou].*/"):=("an")( )( );" could also be expressed as "("a",ART)(BLK)("/[aeiou].*/"):=(0>"n")( )( );", i.e., the change from "a" to "an" could be expressed either by "an" or 0>"n".
Rules apply only if all conditions are true.
The rule "("a")(BLK)("/[aeiou].*/"):=("an")( )( );" will apply only in case of "a" before a blank and a vowel.
In order to enhance its power, conditions (but not actions) may be replaced by regular expressions between //.
("/a[bcd]e/"):=(""); (Delete the words "abe", "ace" and "ade")

Indexes

Indexes are used to associate nodes in the left side of the rule (CONDITION) to nodes in the right side of the rule (ACTION):

  • (%a)(%b)(%c):=(%b); (delete the first and the third nodes, and keep the second)
  • (%a)(%b)(%c):=(%c)(%b)(%a); (reverse the order)

Indexation is done automatically by the machine, as follows:

  • if the number of nodes is the same in the left and in the right side, NODES ARE CO-INDEXED
    ("a")("b")("c"):=("d")("e")("f"); is the same as ("a",%01)("b",%02)("c",%03):=("d",%01)("e",%02)("f",%03); (i.e., "a" will be replaced by "d", "b" by "e", and "c" by "f")
  • if the number of nodes is not the same in both sides, NODES ARE NOT CO-INDEXED
    ("a")("b")("c"):=("d")("e"); is the same as ("a",%01)("b",%02)("c",%03):=("d",%04)("e",%05); (i.e., "a", "b" and "c" will be deleted, and "d" and "e" will be created

In order to avoid ambiguities, it is highly recommended that indexes are replaced by user-defined labels made of any sequence of alphabetic characters and underscore:

(A,%a)(B,%b):=(C,%a)(D,%b);

Numeric characters cannot be used as user-defined indexes:

(A,%03)(B,%05):=(C,%03)(D,%05);

Indexes may also be used to transfer attribute values expressed in the format ATTRIBUTE=VALUE:

(A,%a,ATT1=VAL1)(B,%b):=()(B,ATT1=%a); (the value "VAL1" of "ATT1" of %a is copied to the node %b)

Common mistakes

  • "Mr":="Mister";
    • Conditions and actions must always come between parentheses: ("Mr"):=("Mister");
  • (Mr):=(Mister);
    • Constants must come between quotes (inside the parentheses): ("Mr"):=("Mister");
  • ("Mr"):=("Mister")
    • Rules must end in semicolon: ("Mr"):=("Mister");
  • ("I am"):=("I'm");
    • Each separate word form must be isolated between parentheses and described as a different condition: ("I")(BLK)("am"):=("I'm");
  • ("a",ART)(BLK)(VOW):=("an");
    • "a adjective">"a": the blank and the following form are deleted because they are not present at the right side
  • ("de",PRE)(BLK)(VOW):=("d'")(VOW);
    • "de avoir">"d' ": coindexation is based on ordering and not on features. The third form is deleted because it's not present at the right side; the second form, which is BLK, receives the feature VOW;

Formal syntax

L-rules comply with the following formal syntax:

<L-RULE>          ::= ( "("<CONDITION>")" )+ ":=" ( "("<ACTION>")" )+ ";"
<CONDITION>        ::= """<STRING>""" ("," <TAGLIST> )* | "["<STRING>"]" ("," <TAGLIST> )* | <TAGLIST>
<ACTION>           ::= (<INDEX>)? ( <AFFIXATION> ("," <AFFIXATION>)* )* ( <ATT_CHANGE> ("," <ATT_CHANGE>)* )*
<AFFIXATION>       ::= <PREFIXATION> | <SUFFIXATION> | <INFIXATION> | <REPLACEMENT> (cf. A-rule)
<ATT_CHANGE>       ::= { "+" | "-" } <TAG> 
<TAGLIST>          ::= <INDEX> | (<INDEX> ",")? <TAG> ("," <TAG>)* 
<INDEX>            ::= "%"[01..99]
<TAG>              ::= {one of the tags defined in the UNDLF Tagset}
<STRING>           ::= [a-Z]+
<INTEGER>          ::= [0-9]+

where

<a> = a is a non-terminal symbol
“a“ = a is a constant
a | b = a or b
{ a | b } = either a or b
(a)? = a can occur 0 or 1 time
(a)* = a can be repeated 0 or more times
(a)+ = a can be repeated 1 or more times

Software