A-rule

From UNL Wiki
Revision as of 12:50, 18 February 2010

A-rule (affixation rule) is the formalism used for generating affixes (prefixes, suffixes, infixes) in the UNLarium framework.

When to use a-rules

A-rules are used for prefixation, suffixation and infixation, i.e., for adding morphemes to a given base form. They are used for generating inflections (such as "book">"books", "love">"loved") or derivations (such as "dress">"undress", "write">"writer").

When not to use a-rules

A-rules are not to be used for composition (i.e., forming new words by combining existing words), as in "give">"give in", "go">"have gone" or "man">"fireman". These cases should be handled by c-rules.

Types of a-rules

There are two types of a-rules:

  • simple a-rules involve a single action (such as prefixation, suffixation, infixation and replacement); and
  • complex a-rules involve more than one action (such as circumfixation).

Simple a-rules

There are four types of simple a-rules:

  • prefixation, for adding morphemes at the beginning of a base form
  • suffixation, for adding morphemes at the end of a base form
  • infixation, for adding morphemes to the middle of the base form
  • replacement, for changing the base form

Syntax

The syntax for simple a-rules is the following:

prefixation

CONDITION := "ADDED" < DELETED;

suffixation

CONDITION := DELETED > "ADDED";

infixation

CONDITION := [REFERENCE] > "ADDED";
CONDITION := "ADDED" < [REFERENCE];

replacement

CONDITION := DELETED : "ADDED";

Where:

  • CONDITION = tag (such as “PLR”, “FEM”, etc) or list of tags (“FEM&PLR”) that indicates when the rule should be applied
  • ADDED (between quotes) = the string to be added;
  • REFERENCE (between square brackets) = the reference string (between quotes) or the position (without quotes) of the string to be added;
  • DELETED = the string (between quotes) or the number of characters (without quotes) to be deleted.

Examples

Prefixation
RULE | BEHAVIOR | BEFORE | AFTER
X:="y"<"z"; | if X, replace the string "z" by the string "y" at the beginning of the string | zabc | yabc
X:="y"<1; | if X, replace the first character of the string by "y" | zabc | yabc
X:="y"<0; | if X, add the string "y" to the beginning of the string | zabc | yzabc
X:="y"<;[1] | if X, add the string "y" to the beginning of the string (same as the previous rule) | zabc | yzabc
X:="y"<<0; | if X, add the string "y" and a blank space to the beginning of the string | zabc | y zabc
X:="y"<<; | if X, add the string "y" and a blank space to the beginning of the string (same as the previous rule) | zabc | y zabc
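
The prefixation behavior listed above can be mimicked in a few lines of Python. This is only an illustrative sketch, not the UNLarium implementation: the function name `apply_prefix` and its parameters are hypothetical, and leaving the base form untouched when it is shorter than the number of characters to delete is an assumption.

```python
def apply_prefix(base, added, deleted=0, space=False):
    """Sketch of a prefixation action ("ADDED" < DELETED).

    deleted is either a string that must open the base form or the
    number of leading characters to drop; space=True mimics "<<".
    """
    if isinstance(deleted, str):
        if not base.startswith(deleted):
            return base  # the string to delete is absent: no change
        cut = len(deleted)
    else:
        if deleted > len(base):
            return base  # assumption: too-short base forms are left alone
        cut = deleted
    separator = " " if space else ""
    return added + separator + base[cut:]


print(apply_prefix("zabc", "y", "z"))            # X:="y"<"z";
print(apply_prefix("zabc", "y", 0, space=True))  # X:="y"<<0;
```

Passing a string or an integer as `deleted` corresponds to the two ways of writing the DELETED part of the rule.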


Suffixation
RULE | BEHAVIOR | BEFORE | AFTER
X:="z">"y"; | if X, replace the string "z" by the string "y" at the end of the string | abcz | abcy
X:=1>"y"; | if X, replace the last character of the string by "y" | abcz | abcy
X:=0>"y"; | if X, add the string "y" to the end of the string | abcz | abczy
X:=>"y"; | if X, add the string "y" to the end of the string (same as the previous rule) | abcz | abczy
X:=0>>"y"; | if X, add a blank space and the string "y" to the end of the string | abcz | abcz y
X:=>>"y"; | if X, add a blank space and the string "y" to the end of the string (same as the previous rule) | abcz | abcz y
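
Suffixation mirrors prefixation at the other end of the string. The following Python sketch reproduces the rows above under the same caveats: `apply_suffix` is a hypothetical name, not part of any UNL tool, and the handling of base forms shorter than the deletion count is an assumption.

```python
def apply_suffix(base, added, deleted=0, space=False):
    """Sketch of a suffixation action (DELETED > "ADDED").

    deleted is either a string that must close the base form or the
    number of trailing characters to drop; space=True mimics ">>".
    """
    if isinstance(deleted, str):
        if not base.endswith(deleted):
            return base  # the string to delete is absent: no change
        cut = len(deleted)
    else:
        if deleted > len(base):
            return base  # assumption: too-short base forms are left alone
        cut = deleted
    separator = " " if space else ""
    return base[:len(base) - cut] + separator + added


print(apply_suffix("abcz", "y", "z"))            # X:="z">"y";
print(apply_suffix("abcz", "y", 0, space=True))  # X:=0>>"y";
```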


Infixation
RULE | BEHAVIOR | BEFORE | AFTER
X:=[2]>"y"; | if X, add "y" to the right of the second character | abc | abyc
X:="y"<[3]; | if X, add "y" to the left of the third character | abc | abyc
X:=["b"]>"y"; | if X, add "y" to the right of "b" | abc | abyc
X:="y"<["c"]; | if X, add "y" to the left of "c" | abc | abyc
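
The four infixation rows above can be sketched in Python as follows. The helper `apply_infix` and its `side` parameter are hypothetical names introduced for illustration. The sketch covers only positive positions and string references, and it assumes that a string reference points to its first occurrence in the base form.

```python
def apply_infix(base, added, reference, side):
    """Sketch of an infixation action.

    reference is a 1-based character position (int) or a reference string.
    side is "right" for [REFERENCE] > "ADDED" and "left" for "ADDED" < [REFERENCE].
    """
    if isinstance(reference, str):
        start = base.find(reference)  # assumption: first occurrence is used
        if start == -1:
            return base  # reference string not found: no change
        end = start + len(reference)
    else:
        if not 1 <= reference <= len(base):
            return base  # position outside the base form: no change
        start, end = reference - 1, reference
    cut = end if side == "right" else start
    return base[:cut] + added + base[cut:]


print(apply_infix("abc", "y", 2, "right"))   # X:=[2]>"y";
print(apply_infix("abc", "y", "c", "left"))  # X:="y"<["c"];
```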


Replacement
RULE | BEHAVIOR | BEFORE | AFTER
X:="y"; | if X, replace the whole string by "y" | X | y
X:="z":"y"; | if X, replace the string "z" by "y" | azbc | aybc
X:=[2-3]:"y"; | if X, replace the second to the third character by "y" | abcz | ayz
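
The three forms of replacement can be sketched in Python as below. `apply_replacement` is a hypothetical helper, and representing the interval [beginning-end] as a Python tuple is an assumption made for the sketch. A string is replaced at its first occurrence only, in line with the rule in this article that replacement applies only once.

```python
def apply_replacement(base, added, deleted=None):
    """Sketch of a replacement action (DELETED : "ADDED").

    deleted is None (replace the whole string), a string, or a tuple
    (first, last) of 1-based inclusive character positions.
    """
    if deleted is None:
        return added  # X:="y"; replaces the whole string
    if isinstance(deleted, tuple):
        first, last = deleted
        if not 1 <= first <= last <= len(base):
            return base  # assumption: invalid intervals leave the string alone
        return base[:first - 1] + added + base[last:]
    # Replace only the first occurrence of the string.
    return base.replace(deleted, added, 1)


print(apply_replacement("azbc", "y", "z"))     # X:="z":"y";
print(apply_replacement("abcz", "y", (2, 3)))  # X:=[2-3]:"y";
```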

Observations

Rules will only be applied if all conditions are true
X:=”y”<”z”; ( “zabc” changes to “yabc”, but “abc” remains “abc” since there is no "z" to be replaced)
Each action is applied only once (i.e., rules are not applied exhaustively)
PLR:=0>”s”; ("X" becomes "Xs", and not "Xssssss...")
The replacement rule applies only once to the same string
X:=”a”:”b”; ( “aaa” becomes “baa” and not “bbb”)
In prefixation and suffixation rules, the part to be deleted may be represented by the number of characters (without quotes)
PLR := “X”<””; = PLR := “X”<0; (ABC becomes XABC)
PLR:= “X”<”A”; = PLR:= “X”<1; (ABC becomes XBC)
PLR:= “XY”<”AB”; = PLR:= “XY”<2; (ABC becomes XYC)
PLR:=””>”X”; = PLR:= 0>”X”; (ABC becomes ABCX)
PLR:=”C”>”X”; = PLR:= 1>”X”; (ABC becomes ABX)
PLR:=”BC”>”XY”; = PLR:= 2>”XY”; (ABC becomes AXY)
In infixation rules, the position of the addition may be made with reference to the end of string by using "-".
RULE | BEHAVIOR | BEFORE | AFTER
X:=[1]>"y"; | if X, add "y" to the right of the first character | abc | aybc
X:=[-1]>"y"; | if X, add "y" to the right of the last character | abc | abyc
X:="y"<[2]; | if X, add "y" to the left of the second character | abcde | aybcde
X:="y"<[-2]; | if X, add "y" to the left of the second-to-last character | abcde | abcyde
In replacement rules, the part to be deleted may be omitted if the whole string is to be replaced
PLR:=”ABC”:”XYZ”; = PLR:=”XYZ”; (ABC becomes XYZ)
In replacement rules, the part to be deleted may be represented by an interval of characters in the format [beginning-end]
PLR:=”B”:”X”; = PLR:=[2-2]:”X”; (ABC becomes AXC)
The symbol “^” is used for negation (“^MCL” means “not MCL”)
NOU&^MCL:=”x”:”y”; (If NOU and not MCL then replace “x” by “y”)
“<<” and “>>” add blank spaces[2]
X:=”a”<<0; (“bc” becomes “a bc” and not “abc”)
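
How a condition with "&" and "^" is checked can be sketched in Python as below. This is an illustration rather than the actual UNLarium logic: the function name `condition_holds` is hypothetical, and representing the tags of an entry as a Python set is an assumption.

```python
def condition_holds(condition, tags):
    """Return True if every tag in the condition is satisfied.

    condition is written as in a-rules, e.g. "NOU&^MCL".
    tags is the set of tags assigned to the entry (assumed representation).
    """
    for item in condition.split("&"):
        if item.startswith("^"):
            if item[1:] in tags:
                return False  # negated tag is present
        elif item not in tags:
            return False      # required tag is missing
    return True


print(condition_holds("NOU&^MCL", {"NOU", "FEM"}))  # rule applies
print(condition_holds("NOU&^MCL", {"NOU", "MCL"}))  # rule does not apply
```

The comparison is case sensitive, so a condition written as "nou" does not match the tag "NOU".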

Common mistakes

  • nou:= ”y”<”z”; (WRONG: Tags are case sensitive)
  • NNN:= ”y”<”z”; (WRONG: NNN is not defined in the tagset)
  • NOUFEM:=”y”<”z”; (WRONG: Tags must be separated by “&”)
  • NOU,FEM:=”y”<”z”; (WRONG: Tags must be separated by “&”)
  • NOU & FEM:=”y”<”z”; (WRONG: There can be no blank spaces between tags)
  • X:=1<1; (WRONG: The left side must always be a string in a prefixation rule)
  • X:=1>1; (WRONG: The right side must always be a string in a suffixation rule)
  • X:=1; (WRONG: Replacement rules do not allow for numbers)
  • X:=1:1; (WRONG: Replacement rules do not allow for numbers)
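
The mistakes in the CONDITION part can be caught mechanically. The following Python sketch is only an illustration: `check_condition` is a hypothetical helper, and `KNOWN_TAGS` is a small subset of tags taken from the examples in this article, not the full tagset.

```python
# Illustrative subset only; the real tagset is much larger.
KNOWN_TAGS = {"NOU", "FEM", "MCL", "PLR", "ORD"}


def check_condition(condition, known_tags=KNOWN_TAGS):
    """Return a list of problems found in the CONDITION part of a rule."""
    problems = []
    if " " in condition:
        problems.append("blank spaces are not allowed between tags")
    for item in condition.replace(" ", "").split("&"):
        tag = item[1:] if item.startswith("^") else item
        if tag not in known_tags:
            # Covers wrong case, undefined tags and wrong separators.
            problems.append("unknown tag: " + tag)
    return problems


print(check_condition("NOU&FEM"))
print(check_condition("NOU,FEM"))
```

Wrong case ("nou"), missing separators ("NOUFEM") and wrong separators ("NOU,FEM") all surface as unknown tags because the lookup is an exact, case-sensitive match.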

Complex a-rules

Complex a-rules are formed from the combination of simple a-rules:

  • circumfixation (prefixation + suffixation), to add a prefix and a suffix at the same time
  • prefixation + infixation, to add a prefix and an infix at the same time
  • infixation + suffixation, to add an infix and a suffix at the same time
  • prefixation + infixation + suffixation, to add a prefix, an infix and a suffix at the same time

Syntax

Complex a-rules are formed by concatenating simple a-rules with ",":

circumfixation

CONDITION := "ADDED" < DELETED , DELETED > "ADDED";

prefixation + infixation

CONDITION := "ADDED" < DELETED , [REFERENCE] > "ADDED";

infixation + suffixation

CONDITION := [REFERENCE] > "ADDED" , DELETED > "ADDED";

etc.

Examples

Complex a-rules
RULE | BEHAVIOR | BEFORE | AFTER
X:="x"<0, 0>"y"; | if X, add "x" to the beginning and "y" to the end of the string | A | xAy
X:="x"<0, "A":"y"; | if X, add "x" to the beginning and replace "A" by "y" | ABC | xyBC
X:="A":"y", 0>"x"; | if X, replace "A" by "y" and add "x" to the end of the string | ABC | yBCx
X:="x"<0, "A":"y", 0>"z"; | if X, add "x" to the beginning, replace "A" by "y" and add "z" to the end of the string | ABC | xyBCz
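
A complex a-rule can be thought of as a list of simple actions applied one after another to the same string. The Python sketch below reproduces the rows above. All names (`prefix`, `suffix`, `replace_first`, `apply_complex`) are hypothetical, and each helper covers only the plain "add" or "replace first occurrence" case needed for these examples.

```python
def prefix(added):
    """Action for "ADDED" < 0: add a string to the beginning."""
    return lambda s: added + s


def suffix(added):
    """Action for 0 > "ADDED": add a string to the end."""
    return lambda s: s + added


def replace_first(deleted, added):
    """Action for "DELETED" : "ADDED": replace the first occurrence."""
    return lambda s: s.replace(deleted, added, 1)


def apply_complex(base, actions):
    """Apply the actions of a complex rule from left to right."""
    for action in actions:
        base = action(base)
    return base


# X:="x"<0, "A":"y", 0>"z";
print(apply_complex("ABC", [prefix("x"), replace_first("A", "y"), suffix("z")]))
```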

Observations

Complex a-rules are also used to merge several simple a-rules that share the same condition into a single rule. The three rules
ORD:="1">"1st";
ORD:="2">"2nd";
ORD:="3">"3rd";
can be written as
ORD:="1">"1st", "2">"2nd", "3">"3rd";
Actions are applied from left to right (i.e., order is important)
PLR := "s" > "ses", "y" > "ies"; (kiss > kisses, city > cities)
PLR := "y" > "ies", "s" > "ses"; (kiss > kisses, city>cities>citieses)
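
The effect of ordering can be reproduced with a small Python sketch. The helper names `replace_ending` and `apply_in_order` are hypothetical. The sketch assumes that each suffixation action is tried in turn on the result of the previous one and is skipped when the ending to be deleted is absent.

```python
def replace_ending(base, deleted, added):
    """Suffixation "DELETED" > "ADDED": applies only if base ends with deleted."""
    if base.endswith(deleted):
        return base[:len(base) - len(deleted)] + added
    return base


def apply_in_order(base, actions):
    """Apply (deleted, added) suffixation pairs from left to right."""
    for deleted, added in actions:
        base = replace_ending(base, deleted, added)
    return base


good = [("s", "ses"), ("y", "ies")]  # PLR := "s" > "ses", "y" > "ies";
bad = [("y", "ies"), ("s", "ses")]   # PLR := "y" > "ies", "s" > "ses";

for word in ("kiss", "city"):
    print(word, apply_in_order(word, good), apply_in_order(word, bad))
```

With the second ordering, "city" first becomes "cities" and then matches the "s" action as well, which yields "citieses".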

Formal syntax

A-rules comply with the following syntax:

<A-RULE>           ::= <CONDITION> “:=” <ACTION> ("," <ACTION>)* “;”
<CONDITION>        ::= <ATAG>(“&”(“^”)?<ATAG>)*
<ATAG>             ::= {one of the tags defined in the UNDLF Tagset}
<ACTION>           ::= <PREFIXATION> | <SUFFIXATION> | <INFIXATION> | <REPLACEMENT>
<PREFIXATION>      ::= <ADDED> {“<” | “<<”} (<DELETED>)?
<SUFFIXATION>      ::= (<DELETED>)? {“>” | “>>”} <ADDED>
<INFIXATION>       ::= "[" <DELETED> "]" ">" <ADDED> | <ADDED> "<" "[" <DELETED> "]"
<REPLACEMENT>      ::= ( <STRING> ":" )? <ADDED> | "[" <INTEGER> "-" <INTEGER> "]" ":"  <ADDED>
<ADDED>            ::= <STRING> 
<DELETED>          ::= <STRING> | <INTEGER>  
<STRING>           ::= “"” [a..Z]+ “"”
<INTEGER>          ::= [0..9]+

where

<a> = a is a non-terminal symbol
“a“ = a is a constant
a | b = a or b
{ a | b } = either a or b
(a)? = a can occur 0 or 1 time
(a)* = a can be repeated 0 or more times
(a)+ = a can be repeated 1 or more times
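
The grammar above can be approximated with a regular expression, which is handy for a quick well-formedness check. The Python sketch below is an approximation under stated assumptions, not an official validator: it accepts only straight quotes, tolerates blank spaces between tokens, treats a tag as a run of uppercase letters and digits, and lets a string contain any characters except quotes (wider than [a..Z]+, because the examples in this article use digits and empty strings). The name `is_well_formed` is hypothetical.

```python
import re

STRING = r'"[^"]*"'            # assumption: anything but quotes, may be empty
INTEGER = r'[0-9]+'
DELETED = rf'(?:{STRING}|{INTEGER})'
WS = r'\s*'                    # assumption: optional blanks between tokens

PREFIXATION = rf'{STRING}{WS}<<?{WS}(?:{DELETED})?'
SUFFIXATION = rf'(?:{DELETED})?{WS}>>?{WS}{STRING}'
INFIXATION = (rf'(?:\[{DELETED}\]{WS}>{WS}{STRING}'
              rf'|{STRING}{WS}<{WS}\[{DELETED}\])')
REPLACEMENT = (rf'(?:(?:{STRING}{WS}:{WS})?{STRING}'
               rf'|\[{INTEGER}-{INTEGER}\]{WS}:{WS}{STRING})')
ACTION = rf'(?:{INFIXATION}|{PREFIXATION}|{SUFFIXATION}|{REPLACEMENT})'

ATAG = r'[A-Z0-9]+'            # assumption: tags are uppercase letters/digits
CONDITION = rf'{ATAG}(?:&\^?{ATAG})*'

A_RULE = re.compile(
    rf'{CONDITION}{WS}:={WS}{ACTION}(?:{WS},{WS}{ACTION})*{WS};'
)


def is_well_formed(rule):
    """Return True if the rule matches the approximate a-rule pattern."""
    return A_RULE.fullmatch(rule.strip()) is not None


print(is_well_formed('PLR := "s" > "ses", "y" > "ies";'))
print(is_well_formed('X:=1<1;'))
```

The check is purely syntactic: it does not verify that a tag belongs to the tagset, and it follows the formal grammar in not accepting negative positions such as [-1].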

Notes

  1. This feature is not supported by the UNLdev and it is automatically replaced, in the UNLarium, by a "0".
  2. This feature is not supported by the UNLdev and it is automatically replaced, in the UNLarium, by a blank space.