A-rule

From UNL Wiki
Revision as of 14:19, 3 February 2010

A-rule (affixation rule) is the formalism used for generating affixes (prefixes, suffixes, infixes) in the UNLarium framework.

When to use a-rules

A-rules are used for prefixation, suffixation and infixation, i.e., for adding morphemes to a given base form. They are used for generating inflections (such as "book">"books", "love">"loved") or derivations (such as "dress">"undress", "write">"writer").

When not to use a-rules

A-rules are not to be used for composition (i.e., forming new words by combining existing words), as in "give">"give in", "go">"have gone" or "man">"fireman". Such cases should be handled by c-rules.

Types of a-rules

There are two types of a-rules:

  • simple a-rules involve a single action (such as prefixation, suffixation, infixation and replacement); and
  • complex a-rules involve more than one action (such as circumfixation).

Simple a-rules

There are four types of simple a-rules:

  • prefixation, for adding morphemes at the beginning of a base form
  • suffixation, for adding morphemes at the end of a base form
  • infixation, for adding morphemes to the middle of the base form
  • replacement, for changing the base form

Syntax

The syntax for simple a-rules is the following:

prefixation

CONDITION := "ADDED" < DELETED;

suffixation

CONDITION := DELETED > "ADDED";

infixation

CONDITION := [REFERENCE] > "ADDED";
CONDITION := "ADDED" < [REFERENCE];

replacement

 CONDITION := DELETED : "ADDED";

Where:

  • CONDITION = a tag (such as "PLR", "FEM", etc.) or a list of tags ("FEM&PLR") that indicates when the rule should be applied;
  • ADDED (between quotes) = the string to be added;
  • REFERENCE (between square brackets) = the point of insertion, given either as a reference string (between quotes) or as a character position (without quotes);
  • DELETED = the string (between quotes) or the number of characters (without quotes) to be deleted.

Examples

Prefixation

| RULE | BEHAVIOR | BEFORE | AFTER |
|---|---|---|---|
| X:="y"<"z"; | if X, replace the string "z" by the string "y" at the beginning of the string | zabc | yabc |
| X:="y"<1; | if X, replace the first character of the string by "y" | zabc | yabc |
| X:="y"<0; | if X, add the string "y" to the beginning of the string | zabc | yzabc |
| X:="y"<; | if X, add the string "y" to the beginning of the string (same as the previous rule) | zabc | yzabc |
| X:="y"<<0; | if X, add the string "y" and a blank space to the beginning of the string | zabc | y zabc |
| X:="y"<<; | if X, add the string "y" and a blank space to the beginning of the string (same as the previous rule) | zabc | y zabc |
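The prefixation behavior in the table above can be sketched in Python. This is only an illustration, not the UNLarium implementation: the function name `apply_prefixation` and its parameters are hypothetical, and leaving the word unchanged when the number of characters to delete exceeds its length is an assumption.

```python
def apply_prefixation(word, added, deleted=0, space=False):
    """Sketch of `CONDITION := "ADDED" < DELETED;` (space=True models `<<`)."""
    if isinstance(deleted, int):
        # Numeric DELETED: drop that many leading characters.
        if deleted > len(word):
            return word  # assumption: the rule does not apply
        rest = word[deleted:]
    else:
        # String DELETED: the word must start with it, otherwise nothing changes.
        if not word.startswith(deleted):
            return word
        rest = word[len(deleted):]
    return added + (" " if space else "") + rest


print(apply_prefixation("zabc", "y", "z"))            # X:="y"<"z";
print(apply_prefixation("zabc", "y", 0, space=True))  # X:="y"<<0;
```

Omitting `deleted` corresponds to the shorthand forms `X:="y"<;` and `X:="y"<<;`.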


Suffixation

| RULE | BEHAVIOR | BEFORE | AFTER |
|---|---|---|---|
| X:="z">"y"; | if X, replace the string "z" by the string "y" at the end of the string | abcz | abcy |
| X:=1>"y"; | if X, replace the last character of the string by "y" | abcz | abcy |
| X:=0>"y"; | if X, add the string "y" to the end of the string | abcz | abczy |
| X:=>"y"; | if X, add the string "y" to the end of the string (same as the previous rule) | abcz | abczy |
| X:=0>>"y"; | if X, add a blank space and the string "y" to the end of the string | abcz | abcz y |
| X:=>>"y"; | if X, add a blank space and the string "y" to the end of the string (same as the previous rule) | abcz | abcz y |
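Suffixation mirrors prefixation at the other end of the string. The following Python sketch reproduces the rows of the table above; the name `apply_suffixation` and its parameters are hypothetical, and leaving the word unchanged when the number of characters to delete exceeds its length is an assumption.

```python
def apply_suffixation(word, added, deleted=0, space=False):
    """Sketch of `CONDITION := DELETED > "ADDED";` (space=True models `>>`)."""
    if isinstance(deleted, int):
        # Numeric DELETED: drop that many trailing characters.
        if deleted > len(word):
            return word  # assumption: the rule does not apply
        stem = word[:len(word) - deleted]
    else:
        # String DELETED: the word must end with it, otherwise nothing changes.
        if not word.endswith(deleted):
            return word
        stem = word[:len(word) - len(deleted)]
    return stem + (" " if space else "") + added


print(apply_suffixation("abcz", "y", "z"))            # X:="z">"y";
print(apply_suffixation("abcz", "y", 0, space=True))  # X:=0>>"y";
```

The slice is written as `word[:len(word) - deleted]` rather than `word[:-deleted]` because the latter would return an empty string when `deleted` is 0.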


Infixation

| RULE | BEHAVIOR | BEFORE | AFTER |
|---|---|---|---|
| X:=[2]>"y"; | if X, add "y" to the right of the second character | abc | abyc |
| X:="y"<[3]; | if X, add "y" to the left of the third character | abc | abyc |
| X:=["b"]>"y"; | if X, add "y" to the right of "b" | abc | abyc |
| X:="y"<["c"]; | if X, add "y" to the left of "c" | abc | abyc |
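A minimal Python sketch of the infixation rows above follows. The function name `apply_infixation` and the `side` parameter are hypothetical; positions are taken to be 1-based, as in the table, and using the first occurrence of a reference string is an assumption.

```python
def apply_infixation(word, added, reference, side):
    """side="right" models `[REF] > "ADDED"`; side="left" models `"ADDED" < [REF]`."""
    if isinstance(reference, int):
        # Numeric reference: a 1-based character position.
        if not 1 <= reference <= len(word):
            return word
        start, end = reference - 1, reference
    else:
        # String reference: assumed to mean its first occurrence in the word.
        start = word.find(reference)
        if start == -1:
            return word
        end = start + len(reference)
    cut = end if side == "right" else start
    return word[:cut] + added + word[cut:]


print(apply_infixation("abc", "y", 2, "right"))   # X:=[2]>"y";
print(apply_infixation("abc", "y", "c", "left"))  # X:="y"<["c"];
```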


Replacement

| RULE | BEHAVIOR | BEFORE | AFTER |
|---|---|---|---|
| X:="y"; | if X, replace the whole string by "y" | X | y |
| X:="z":"y"; | if X, replace the string "z" by "y" | azbc | aybc |
| X:=[2-3]:"y"; | if X, replace the second to the third character by "y" | abcz | ayz |
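The three replacement forms can be sketched in Python as follows. The function name `apply_replacement` is hypothetical, and representing the interval `[2-3]` as the tuple `(2, 3)` is a choice made for this illustration only.

```python
def apply_replacement(word, added, deleted=None):
    """Sketch of `CONDITION := DELETED : "ADDED";`."""
    if deleted is None:
        # X:="y"; replaces the whole string.
        return added
    if isinstance(deleted, tuple):
        # X:=[2-3]:"y"; replaces a 1-based, inclusive interval of characters.
        first, last = deleted
        if not 1 <= first <= last <= len(word):
            return word
        return word[:first - 1] + added + word[last:]
    # X:="z":"y"; replaces the first occurrence only.
    return word.replace(deleted, added, 1)


print(apply_replacement("azbc", "y", "z"))     # X:="z":"y";
print(apply_replacement("abcz", "y", (2, 3)))  # X:=[2-3]:"y";
```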

Observations

  • Rules are applied only if all conditions are true:
      X:="y"<"z"; ("zabc" changes to "yabc", but "abc" remains "abc" since there is no "z" to be replaced)
  • Each action is applied only once (i.e., rules are not exhaustive):
      PLR:=0>"s"; ("X" becomes "Xs", and not "Xssssss...")
  • The replacement rule applies only once to the same string:
      X:="a":"b"; ("aaa" becomes "baa" and not "bbb")
  • In prefixation and suffixation rules, the part to be deleted may be represented by the number of characters (without quotes):
      PLR:="X"<""; is equivalent to PLR:="X"<0; (ABC becomes XABC)
      PLR:="X"<"A"; is equivalent to PLR:="X"<1; (ABC becomes XBC)
      PLR:="XY"<"AB"; is equivalent to PLR:="XY"<2; (ABC becomes XYC)
      PLR:="">"X"; is equivalent to PLR:=0>"X"; (ABC becomes ABCX)
      PLR:="C">"X"; is equivalent to PLR:=1>"X"; (ABC becomes ABX)
      PLR:="BC">"XY"; is equivalent to PLR:=2>"XY"; (ABC becomes AXY)
  • In replacement rules, the part to be deleted may be omitted if the whole string is to be replaced:
      PLR:="ABC":"XYZ"; is equivalent to PLR:="XYZ"; (ABC becomes XYZ)
  • In replacement rules, the part to be deleted may be represented by an interval of characters in the format [beginning-end]:
      PLR:="B":"X"; is equivalent to PLR:=[2-2]:"X"; (ABC becomes AXC)
  • The symbol "^" is used for negation ("^MCL" means "not MCL"):
      NOU&^MCL:="x":"y"; (if NOU and not MCL, then replace "x" by "y")
  • "<<" and ">>" add blank spaces:
      X:="a"<<0; ("bc" becomes "a bc" and not "abc")
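The way a CONDITION such as NOU&^MCL is checked can be sketched in Python as follows. The function name `condition_holds` and the representation of a word's features as a set of tags are assumptions made for this illustration.

```python
def condition_holds(condition, features):
    """Return True if every tag in `condition` is satisfied by `features`.

    `condition` is a string such as "NOU&^MCL"; `features` is the set of
    tags attached to the word (a hypothetical representation).
    """
    for tag in condition.split("&"):
        if tag.startswith("^"):
            # Negated tag: the feature must be absent.
            if tag[1:] in features:
                return False
        elif tag not in features:
            # Plain tag: the feature must be present (case sensitive).
            return False
    return True


print(condition_holds("NOU&^MCL", {"NOU", "FEM"}))  # feminine noun: rule applies
print(condition_holds("NOU&^MCL", {"NOU", "MCL"}))  # masculine noun: rule is skipped
```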

Common mistakes

  • nou:="y"<"z"; (WRONG: Tags are case sensitive)
  • NNN:="y"<"z"; (WRONG: NNN is not defined in the tagset)
  • NOUFEM:="y"<"z"; (WRONG: Tags must be separated by "&")
  • NOU,FEM:="y"<"z"; (WRONG: Tags must be separated by "&")
  • NOU & FEM:="y"<"z"; (WRONG: There can be no blank spaces between tags)
  • X:=1<1; (WRONG: The left side must always be a string in a prefixation rule)
  • X:=1>1; (WRONG: The right side must always be a string in a suffixation rule)
  • X:=1; (WRONG: Replacement rules do not allow for numbers)
  • X:=1:1; (WRONG: Replacement rules do not allow for numbers)
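The condition-related mistakes listed above could be caught by a small checker such as the Python sketch below. The tagset shown is a hypothetical stand-in for the real tagset, and the function name `condition_errors` and its messages are illustrative only.

```python
import re

# Hypothetical tagset for illustration; the real tags come from the UNL tagset.
TAGSET = {"NOU", "FEM", "MCL", "PLR"}


def condition_errors(condition, tagset=TAGSET):
    """Return a list of problems found in the CONDITION part of a rule."""
    errors = []
    if " " in condition:
        errors.append("blank space in condition")
    if "," in condition:
        errors.append("tags must be separated by '&'")
    # Check each tag name, ignoring the negation marker "^".
    for tag in re.split(r"[&,]", condition.replace(" ", "")):
        name = tag.lstrip("^")
        if name not in tagset:
            errors.append(f"unknown tag: {name}")
    return errors


for condition in ["NOU&FEM", "nou", "NOUFEM", "NOU,FEM", "NOU & FEM"]:
    print(condition, condition_errors(condition))
```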

Complex a-rules

Complex a-rules are formed from the combination of simple a-rules:

  • circumfixation (prefixation + suffixation), to add a prefix and a suffix at the same time
  • prefixation + infixation, to add a prefix and an infix at the same time
  • infixation + suffixation, to add an infix and a suffix at the same time
  • prefixation + infixation + suffixation, to add a prefix, an infix and a suffix at the same time

Syntax

Complex a-rules are formed by concatenating simple a-rules with ",":

circumfixation

CONDITION := "ADDED" < "DELETED" , "DELETED" > "ADDED";

prefixation + infixation

CONDITION := "ADDED" < "DELETED" , "DELETED" : "ADDED";

infixation + suffixation

CONDITION := "DELETED" : "ADDED" , "DELETED" > "ADDED";

prefixation + infixation + suffixation

CONDITION := "ADDED" < "DELETED" , "DELETED" : "ADDED" , "DELETED" > "ADDED";

Examples

Complex a-rules

| RULE | BEHAVIOR | BEFORE | AFTER |
|---|---|---|---|
| X:="x"<0, 0>"y"; | if X, add "x" to the beginning and "y" to the end of the string | A | xAy |
| X:="x"<0, "A":"y"; | if X, add "x" to the beginning and replace "A" by "y" | ABC | xyBC |
| X:="A":"y", 0>"x"; | if X, replace "A" by "y" and add "x" to the end of the string | ABC | yBCx |
| X:="x"<0, "A":"y", 0>"z"; | if X, add "x" to the beginning, replace "A" by "y" and add "z" to the end of the string | ABC | xyBCz |
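A complex a-rule can be pictured as a list of simple actions applied one after another. The Python sketch below reproduces the rows of the table above; the helper names `prefix`, `suffix`, `replace` and `apply_complex` are hypothetical, and only the numeric form of DELETED is modelled for prefixes and suffixes.

```python
def prefix(added, n=0):
    """`"ADDED" < n`: drop n leading characters, then prepend `added`."""
    return lambda word: added + word[n:]


def suffix(added, n=0):
    """`n > "ADDED"`: drop n trailing characters, then append `added`."""
    return lambda word: word[:len(word) - n] + added


def replace(deleted, added):
    """`"DELETED" : "ADDED"`: replace the first occurrence only."""
    return lambda word: word.replace(deleted, added, 1)


def apply_complex(word, actions):
    """Apply each simple action once, from left to right."""
    for action in actions:
        word = action(word)
    return word


# X:="x"<0, "A":"y", 0>"z";
print(apply_complex("ABC", [prefix("x"), replace("A", "y"), suffix("z")]))
```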

Observations

  • Complex a-rules can also be used to merge several simple a-rules into one. The three rules
      ORD:="1">"1st";
      ORD:="2">"2nd";
      ORD:="3">"3rd";
    can be written as the single rule
      ORD:="1">"1st", "2">"2nd", "3">"3rd";
  • Actions are applied from left to right (i.e., order is important):
      PLR := "s" > "ses", "y" > "ies"; (kiss > kisses, city > cities)
      PLR := "y" > "ies", "s" > "ses"; (kiss > kisses, but city > cities > citieses)
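The effect of action order can be reproduced with a short Python sketch. The helper names `suffix_swap` and `apply_in_order` are hypothetical; each action replaces a given ending if the word has it and otherwise leaves the word unchanged, which is an assumption drawn from the two PLR examples.

```python
def suffix_swap(old, new):
    """`"old" > "new"`: replace the ending `old` by `new`, if present."""
    def action(word):
        if word.endswith(old):
            return word[:len(word) - len(old)] + new
        return word
    return action


def apply_in_order(word, actions):
    """Apply the actions from left to right, each at most once."""
    for action in actions:
        word = action(word)
    return word


s_then_y = [suffix_swap("s", "ses"), suffix_swap("y", "ies")]
y_then_s = [suffix_swap("y", "ies"), suffix_swap("s", "ses")]

print(apply_in_order("city", s_then_y))  # cities
print(apply_in_order("city", y_then_s))  # citieses
```

In the second ordering, the output of the first action ("cities") ends in "s", so the second action fires on it as well.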

Formal syntax

A-rules comply with the following syntax:

<A-RULE>           ::= <CONDITION> ":=" <ACTION> ("," <ACTION>)* ";"
<CONDITION>        ::= <ATAG>("&"("^")?<ATAG>)*
<ATAG>             ::= {one of the tags defined in the UNDLF Tagset}
<ACTION>           ::= <PREFIXATION> | <SUFFIXATION> | <INFIXATION>
<PREFIXATION>      ::= <ADDED> {"<" | "<<"} (<DELETED>)?
<SUFFIXATION>      ::= (<DELETED>)? {">" | ">>"} <ADDED>
<INFIXATION>       ::= ( <STRING> ":" )? <ADDED> | "[" <INTEGER> "-" <INTEGER> "]" ":" <ADDED>
<ADDED>            ::= <STRING>
<DELETED>          ::= <STRING> | <INTEGER>
<STRING>           ::= """ [a..Z]+ """
<INTEGER>          ::= [0..9]+

where

<a> = a is a non-terminal symbol
"a" = a is a constant
a | b = a or b
{ a | b } = either a or b
(a)? = a can occur 0 or 1 time
(a)* = a can be repeated 0 or more times
(a)+ = a can be repeated 1 or more times
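
The top-level production (a condition, ":=", one or more comma-separated actions, and a final ";") can be split with a few lines of Python. This is a rough sketch rather than a full parser for the grammar: the function name `parse_rule` is hypothetical, and the actions are returned as raw strings without further analysis.

```python
import re


def parse_rule(rule):
    """Split an a-rule into its condition and its list of action strings."""
    match = re.fullmatch(r'\s*([^:=\s]+)\s*:=\s*(.*?)\s*;\s*', rule)
    if match is None:
        raise ValueError(f"not a well-formed a-rule: {rule!r}")
    condition, body = match.groups()
    # Split on commas, but keep commas that appear inside quoted strings.
    actions = [part.strip() for part in re.findall(r'(?:"[^"]*"|[^,])+', body)]
    return condition, actions


print(parse_rule('X:="x"<0, "A":"y", 0>"z";'))
print(parse_rule('NOU&^MCL:="x":"y";'))
```

A rule without the final ";" is rejected, since the grammar requires it.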

Software