A-rule
Revision as of 15:03, 24 January 2010
A-rule (affixation rule) is the formalism used for generating affixes in the UNLarium framework.
Generative and enumerative lexica
The repertoire of lexemes of a given language can be organized in two basic ways: 1) as a simple listing of all word forms, i.e., of all variants of the same lexeme ("die", "dies", "died", "dying", etc.); or 2) as a list of base forms accompanied by morphological rules for generating their inflections ("die", +s, +d, etc.). The first architecture, the "enumerative" one, assumes that a word form is more accurately retrieved as a single atomic entity than as a combination of several different morphemes. Its main advantages concern word matching (faster and more precise, as there is no possibility of over-generation) and construction (it is easier, and often less expensive, to list the irregular forms than to define paradigms for them). Nevertheless, the second architecture, the "generative" one, which relies on the principle that "the smaller the better", is far more common. Its main advantages concern access (word retrieval is expected to be faster), storage (it requires less memory) and maintenance (changes are automatically propagated to all instances of a given entry).
The UNLarium is mainly a generative environment, in the sense that word forms are expected to be represented by their corresponding LRUs and base forms, along with rules for generating their possible inflections. These are the m-rules, to be provided either as LRU-specific (in case of irregular behaviour) or as inflectional paradigms (applying to several different LRUs).
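The difference between the two architectures can be sketched in a few lines of Python. This is an illustration, not UNLarium code: the tag names (`3SG`, `PAST`, `GER`) and the (deleted, added) pairs used as rules are hypothetical. The point is only that one base form plus three rules regenerates the four forms that an enumerative lexicon lists one by one.

```python
from typing import Dict, Set, Tuple

# Enumerative lexicon: every word form is stored as an atomic entry.
ENUMERATIVE = {"die", "dies", "died", "dying"}

# Generative lexicon: one base form plus rules that derive the other forms.
# Hypothetical rule format: tag -> (string deleted at the end, string added).
BASE = "die"
RULES: Dict[str, Tuple[str, str]] = {
    "3SG": ("", "s"),        # die > dies
    "PAST": ("", "d"),       # die > died
    "GER": ("ie", "ying"),   # die > dying
}


def generate(base: str, rules: Dict[str, Tuple[str, str]]) -> Set[str]:
    """Return the base form plus every form produced by an applicable rule."""
    forms = {base}
    for deleted, added in rules.values():
        if base.endswith(deleted):  # a rule applies only if its deleted part is present
            forms.add(base[: len(base) - len(deleted)] + added)
    return forms


print(sorted(generate(BASE, RULES)))  # ['die', 'died', 'dies', 'dying']
```

The generative version stores one string and three rules instead of four strings; the rules can also be shared by other verbs with the same paradigm.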
When to use a-rules
A-rules are used for prefixation, suffixation and infixation, i.e., for morphological changes affecting a given base form. They are mainly used for generating inflections (such as "book" > "books", "love" > "loved") or derivations (such as "dress" > "undress", "write" > "writer", "give" > "give in"). They may also be used for generating some internal phonetic changes due to morphological behaviour, such as metaphony ("foot" > "feet").
When not to use a-rules
A-rules are not to be used for phonetic changes occurring at word boundaries (such as the alternation between "a" and "an" in "a book" and "an arm") or for syntactic changes (such as the insertion of new words in compound tenses: "go" > "have gone").
Types of a-rules
There are two types of a-rules:
- simple a-rules involve a single action (such as prefixation, suffixation or infixation); and
- complex a-rules involve more than one action (such as circumfixation).
Simple a-rules
There are three types of simple a-rules:
- prefixation, for adding morphemes at the beginning of a base form
- suffixation, for adding morphemes at the end of a base form
- infixation, for changing the internal structure of a base form
Syntax
The syntax for simple a-rules is the following:
prefixation
CONDITION := "ADDED" < "DELETED";
suffixation
CONDITION := "DELETED" > "ADDED";
infixation
CONDITION := "DELETED" : "ADDED";
Where:
- CONDITION = a tag (such as "PLR", "FEM", etc.) or a list of tags ("FEM&PLR") indicating when the rule should be applied;
- ADDED = the string to be added (between quotes);
- DELETED = the string to be deleted (between quotes).
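The behaviour of the three simple actions can be modelled as plain string functions. The Python sketch below is a hypothetical model, not the UNLarium implementation: the function names and the `blank` flag (standing for `<<` and `>>`) are assumptions. Following the examples and observations in this section, the deleted part may be a string or a number of characters, and an action whose string to be deleted is absent leaves the form unchanged.

```python
from typing import Optional, Union


def prefixation(form: str, added: str, deleted: Union[str, int] = "", blank: bool = False) -> str:
    """ADDED < DELETED (ADDED << DELETED when blank=True): rewrite the start of form."""
    if isinstance(deleted, int):
        deleted = form[:deleted]  # a number stands for that many leading characters
    if not form.startswith(deleted):
        return form  # the string to be deleted is absent: nothing happens
    return added + (" " if blank else "") + form[len(deleted):]


def suffixation(form: str, deleted: Union[str, int], added: str, blank: bool = False) -> str:
    """DELETED > ADDED (DELETED >> ADDED when blank=True): rewrite the end of form."""
    if isinstance(deleted, int):
        deleted = form[len(form) - deleted:]  # that many trailing characters
    if not form.endswith(deleted):
        return form
    return form[: len(form) - len(deleted)] + (" " if blank else "") + added


def infixation(form: str, deleted: Optional[str], added: str) -> str:
    """DELETED : ADDED replaces the first occurrence of DELETED only; with no
    DELETED part (as in X:="y";) the whole form is replaced."""
    if deleted is None:
        return added
    return form.replace(deleted, added, 1)


print(prefixation("zabc", "y", 0, blank=True))  # y zabc
print(suffixation("abcz", 1, "y"))              # abcy
print(infixation("aaa", "a", "b"))              # baa
```

The argument order mirrors the written syntax (ADDED < DELETED, DELETED > ADDED, DELETED : ADDED); passing `None` as the deleted part of `infixation` corresponds to a whole-string replacement.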
Examples
prefixation
RULE | BEHAVIOR | BEFORE | AFTER |
---|---|---|---|
X:="y"<"z"; | if X, replace the string "z" by the string "y" at the beginning of the string | zabc | yabc |
X:="y"<1; | if X, replace the first character of the string by "y" | zabc | yabc |
X:="y"<0; | if X, add the string "y" to the beginning of the string | zabc | yzabc |
X:="y"<; | if X, add the string "y" to the beginning of the string (same as previous) | zabc | yzabc |
X:="y"<<0; | if X, add the string "y" and a blank space to the beginning of the string | zabc | y zabc |
X:="y"<<; | if X, add the string "y" and a blank space to the beginning of the string (same as previous) | zabc | y zabc |
suffixation
RULE | BEHAVIOR | BEFORE | AFTER |
---|---|---|---|
X:="z">"y"; | if X, replace the string "z" by the string "y" at the end of the string | abcz | abcy |
X:=1>"y"; | if X, replace the last character of the string by "y" | abcz | abcy |
X:=0>"y"; | if X, add the string "y" to the end of the string | abcz | abczy |
X:=>"y"; | if X, add the string "y" to the end of the string (same as previous) | abcz | abczy |
X:=0>>"y"; | if X, add a blank space and the string "y" to the end of the string | abcz | abcz y |
X:=>>"y"; | if X, add a blank space and the string "y" to the end of the string (same as previous) | abcz | abcz y |
infixation
RULE | BEHAVIOR | BEFORE | AFTER |
---|---|---|---|
X:="y"; | if X, replace the whole string by "y" | X | y |
X:="z":"y"; | if X, replace the string "z" by "y" | azbc | aybc |
X:=[2-3]:"y"; | if X, replace the second to the third character by "y" | abcz | ayz |
X:=Y; | replace the feature X by the feature Y | X | Y |
Observations
- Rules will only be applied if all conditions are true
- X:="y"<"z"; ("zabc" changes to "yabc", but "abc" remains "abc", since there is no "z" to be replaced)
- Each action is applied only once (i.e., rules are not exhaustive)
- PLR:=0>"s"; ("X" becomes "Xs", and not "Xssssss...")
- A replacement rule applies only to the first occurrence of the string to be deleted
- X:="a":"b"; ("aaa" becomes "baa", and not "bbb")
- In prefixation and suffixation rules, the part to be deleted may be represented by the number of characters (without quotes)
STRING FORM | NUMERIC FORM | RESULT |
---|---|---|
PLR:="X"<""; | PLR:="X"<0; | ABC becomes XABC |
PLR:="X"<"A"; | PLR:="X"<1; | ABC becomes XBC |
PLR:="XY"<"AB"; | PLR:="XY"<2; | ABC becomes XYC |
PLR:="">"X"; | PLR:=0>"X"; | ABC becomes ABCX |
PLR:="C">"X"; | PLR:=1>"X"; | ABC becomes ABX |
PLR:="BC">"XY"; | PLR:=2>"XY"; | ABC becomes AXY |
- In replacement rules, the part to be deleted may be omitted if the whole string is to be replaced
FULL FORM | SHORT FORM | RESULT |
---|---|---|
PLR:="ABC":"XYZ"; | PLR:="XYZ"; | ABC becomes XYZ |
- In replacement rules, the part to be deleted may be represented by an interval of characters in the format [beginning-end]
STRING FORM | INTERVAL FORM | RESULT |
---|---|---|
PLR:="B":"X"; | PLR:=[2-2]:"X"; | ABC becomes AXC |
- The symbol "^" is used for negation ("^MCL" means "not MCL")
- NOU&^MCL:="x":"y"; (if NOU and not MCL, then replace "x" by "y")
- "<<" and ">>" add blank spaces
- X:="a"<<0; ("bc" becomes "a bc", and not "abc")
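Condition checking and single application can be sketched as follows. Representing the features of an entry as a Python set of tag names is an assumption made for the example, and `condition_holds`, `apply_if` and `replace_x` are hypothetical helpers. The action replaces only the first occurrence, as the observations above require.

```python
from typing import Callable, Set


def condition_holds(condition: str, features: Set[str]) -> bool:
    """True if every tag of condition is satisfied; tags are joined by '&'
    and a leading '^' means 'not'."""
    for tag in condition.split("&"):
        negated = tag.startswith("^")
        present = tag.lstrip("^") in features
        if present == negated:  # required tag missing, or negated tag present
            return False
    return True


def apply_if(condition: str, features: Set[str], form: str,
             action: Callable[[str], str]) -> str:
    """Apply action to form only when the condition holds; otherwise keep form."""
    return action(form) if condition_holds(condition, features) else form


def replace_x(form: str) -> str:
    """The action of NOU&^MCL:="x":"y"; (first occurrence only)."""
    return form.replace("x", "y", 1)


print(apply_if("NOU&^MCL", {"NOU"}, "xax", replace_x))         # yax
print(apply_if("NOU&^MCL", {"NOU", "MCL"}, "xax", replace_x))  # xax
```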
Common mistakes
- nou:= "y"<"z"; (WRONG: Tags are case sensitive)
- NNN:= "y"<"z"; (WRONG: NNN is not defined in the tagset)
- NOUFEM:="y"<"z"; (WRONG: Tags must be separated by "&")
- NOU,FEM:="y"<"z"; (WRONG: Tags must be separated by "&")
- NOU & FEM:="y"<"z"; (WRONG: There can be no blank spaces between tags)
- X:=1<1; (WRONG: The left side must always be a string in a prefixation rule)
- X:=1>1; (WRONG: The right side must always be a string in a suffixation rule)
- X:=1; (WRONG: Replacement rules do not allow bare numbers; use a string or an interval such as [2-2])
- X:=1:1; (WRONG: Replacement rules do not allow bare numbers)
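Most of the mistakes above concern the CONDITION part and can be detected mechanically. The checker below is a hypothetical sketch: `TAGSET` is a small invented excerpt standing in for the UNDLF Tagset, and only the condition side is checked (the numeric mistakes on the action side are not covered).

```python
from typing import List

# Hypothetical excerpt of the tagset, for illustration only; the authoritative
# list is the UNDLF Tagset.
TAGSET = {"NOU", "FEM", "MCL", "PLR", "ORD"}


def check_condition(rule: str) -> List[str]:
    """Return the problems found in the CONDITION part of an a-rule (empty if none)."""
    condition = rule.split(":=", 1)[0]
    if " " in condition:
        return ["there can be no blank spaces between tags"]
    if "," in condition:
        return ['tags must be separated by "&"']
    problems = []
    for tag in condition.split("&"):
        name = tag[1:] if tag.startswith("^") else tag  # ignore the negation mark
        if name in TAGSET:
            continue
        if name.upper() in TAGSET:
            problems.append(f"{name}: tags are case sensitive")
        elif any(name[:i] in TAGSET and name[i:] in TAGSET for i in range(1, len(name))):
            problems.append(f'{name}: tags must be separated by "&"')
        else:
            problems.append(f"{name}: not defined in the tagset")
    return problems


for rule in ('nou:="y"<"z";', 'NOUFEM:="y"<"z";', 'NOU&^MCL:="x":"y";'):
    print(rule, check_condition(rule))
```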
Complex a-rules
Complex a-rules are formed from the combination of simple a-rules:
- circumfixation (prefixation + suffixation), to add a prefix and a suffix at the same time
- prefixation + infixation, to add a prefix and an infix at the same time
- infixation + suffixation, to add an infix and a suffix at the same time
- prefixation + infixation + suffixation, to add a prefix, an infix and a suffix at the same time
Syntax
Complex a-rules are formed by concatenating simple a-rules with ",":
circumfixation
CONDITION := "ADDED" < "DELETED" , "DELETED" > "ADDED";
prefixation + infixation
CONDITION := "ADDED" < "DELETED" , "DELETED" : "ADDED";
infixation + suffixation
CONDITION := "DELETED" : "ADDED" , "DELETED" > "ADDED";
prefixation + infixation + suffixation
CONDITION := "ADDED" < "DELETED" , "DELETED" : "ADDED" , "DELETED" > "ADDED";
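A complex a-rule can be read as a list of simple actions applied one after another. In this Python sketch each action is a hypothetical (operator, left side, right side) tuple mirroring the written syntax; the blank-space operators and intervals are left out to keep it short.

```python
from typing import List, Tuple, Union

Action = Tuple[str, Union[str, int], Union[str, int]]  # (operator, left side, right side)


def apply_action(form: str, action: Action) -> str:
    """Apply one simple action written as  left op right  (op is '<', '>' or ':')."""
    op, left, right = action
    if op == "<":  # prefixation: left = ADDED, right = DELETED (string or count)
        deleted = form[:right] if isinstance(right, int) else right
        return left + form[len(deleted):] if form.startswith(deleted) else form
    if op == ">":  # suffixation: left = DELETED (string or count), right = ADDED
        deleted = form[len(form) - left:] if isinstance(left, int) else left
        return form[: len(form) - len(deleted)] + right if form.endswith(deleted) else form
    if op == ":":  # infixation: first occurrence of left replaced by right
        return form.replace(left, right, 1)
    raise ValueError(f"unknown operator {op!r}")


def apply_complex(form: str, actions: List[Action]) -> str:
    """Apply the comma-separated actions of a complex a-rule one after another."""
    for action in actions:
        form = apply_action(form, action)
    return form


# X:="x"<0, "A":"y", 0>"z";  (prefixation + infixation + suffixation)
print(apply_complex("ABC", [("<", "x", 0), (":", "A", "y"), (">", 0, "z")]))  # xyBCz
```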
Examples
RULE | BEHAVIOR | BEFORE | AFTER |
---|---|---|---|
X:="x"<0, 0>"y"; | if X, add "x" to the beginning and "y" to the end of the string | A | xAy |
X:="x"<0, "A":"y"; | if X, add "x" to the beginning and replace "A" by "y" | ABC | xyBC |
X:="A":"y", 0>"x"; | if X, replace "A" by "y" and add "x" to the end of the string | ABC | yBCx |
X:="x"<0, "A":"y", 0>"z"; | if X, add "x" to the beginning, replace "A" by "y" and add "z" to the end of the string | ABC | xyBCz |
Observations
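- Complex a-rules can also be used to merge several simple a-rules that share the same condition into a single rule
SEPARATE SIMPLE A-RULES | EQUIVALENT COMPLEX A-RULE |
---|---|
ORD:="1">"1st"; ORD:="2">"2nd"; ORD:="3">"3rd"; | ORD:="1">"1st", "2">"2nd", "3">"3rd"; |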
- Actions are applied from left to right (i.e., order is important)
- PLR := "s" > "ses", "y" > "ies"; (kiss > kisses, city > cities)
- PLR := "y" > "ies", "s" > "ses"; (kiss > kisses, city > cities > citieses)
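The effect of ordering can be checked with suffixation alone. This sketch (helper names are hypothetical) applies both orderings of the PLR rule above, and the merged ORD rule, strictly from left to right.

```python
from typing import List, Tuple


def suffix(form: str, deleted: str, added: str) -> str:
    """DELETED > ADDED: rewrite the end of form if it ends with deleted."""
    if form.endswith(deleted):
        return form[: len(form) - len(deleted)] + added
    return form


def apply_in_order(form: str, actions: List[Tuple[str, str]]) -> str:
    """Apply suffixation actions strictly from left to right."""
    for deleted, added in actions:
        form = suffix(form, deleted, added)
    return form


PLR_GOOD = [("s", "ses"), ("y", "ies")]            # PLR := "s" > "ses", "y" > "ies";
PLR_BAD = [("y", "ies"), ("s", "ses")]             # PLR := "y" > "ies", "s" > "ses";
ORD = [("1", "1st"), ("2", "2nd"), ("3", "3rd")]   # ORD:="1">"1st", "2">"2nd", "3">"3rd";

for word in ("kiss", "city"):
    print(word, apply_in_order(word, PLR_GOOD), apply_in_order(word, PLR_BAD))
# kiss kisses kisses
# city cities citieses
```

With the merged ORD rule, "1" becomes "1st" and the later actions leave it untouched, because "1st" no longer ends in "2" or "3". Ordering matters only when the output of one action can match the deleted part of a later one, as in the second PLR rule.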
Formal syntax for a-rules
A-rules comply with the following syntax:
<A-RULE> ::= <CONDITION> “:=” <ACTION> [“,” <ACTION>]* “;”
<CONDITION> ::= <ATAG>[“&”[“^”]<ATAG>]*
<ATAG> ::= {one of the tags defined in the UNDLF Tagset}
<ACTION> ::= <LEFT APPENDING> | <RIGHT APPENDING> | <REPLACEMENT>
<LEFT APPENDING> ::= <ADDED> {“<” | “<<”} [<DELETED>]
<RIGHT APPENDING> ::= [<DELETED>] {“>” | “>>”} <ADDED>
<REPLACEMENT> ::= [<STRING> “:”] <ADDED> | “[” <INTEGER> “-” <INTEGER> “]” “:” <ADDED>
<ADDED> ::= <STRING>
<DELETED> ::= <STRING> | <INTEGER>
<STRING> ::= “"” [a..Z]+ “"”
<INTEGER> ::= [0..9]+
where
<a> = a is a non-terminal symbol
“a” = a is a constant
[a] = a can be omitted
a | b = a or b
{ a | b } = either a or b
a* = a can be repeated 0 or more times
a+ = a can be repeated 1 or more times
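The grammar can be turned into a small interpreter. The Python sketch below is a hypothetical illustration, not the UNLarium parser: it uses one regular expression per <ACTION> alternative, represents the features of an entry as a set of tag names, accepts the empty string "" (used in the observations above even though <STRING> as written requires at least one character), and does not model feature replacement (X:=Y;). The names `apply_action`, `split_actions` and `apply_rule` are invented for the example.

```python
import re
from typing import List, Set

# One pattern per <ACTION> alternative; strings are double-quoted, counts are digits.
LEFT = re.compile(r'"(?P<add>[^"]*)"(?P<op><<?)(?:"(?P<ds>[^"]*)"|(?P<dn>\d+))?')
RIGHT = re.compile(r'(?:"(?P<ds>[^"]*)"|(?P<dn>\d+))?(?P<op>>>?)"(?P<add>[^"]*)"')
INTERVAL = re.compile(r'\[(?P<start>\d+)-(?P<end>\d+)\]:"(?P<add>[^"]*)"')
REPLACE = re.compile(r'(?:"(?P<ds>[^"]*)":)?"(?P<add>[^"]*)"')


def apply_action(form: str, action: str) -> str:
    """Apply a single <ACTION> to form."""
    action = action.strip()
    if m := LEFT.fullmatch(action):  # <LEFT APPENDING>
        deleted = form[: int(m["dn"])] if m["dn"] else (m["ds"] or "")
        if not form.startswith(deleted):
            return form
        return m["add"] + (" " if m["op"] == "<<" else "") + form[len(deleted):]
    if m := RIGHT.fullmatch(action):  # <RIGHT APPENDING>
        deleted = form[len(form) - int(m["dn"]):] if m["dn"] else (m["ds"] or "")
        if not form.endswith(deleted):
            return form
        return form[: len(form) - len(deleted)] + (" " if m["op"] == ">>" else "") + m["add"]
    if m := INTERVAL.fullmatch(action):  # [beginning-end], counted from 1, inclusive
        return form[: int(m["start"]) - 1] + m["add"] + form[int(m["end"]):]
    if m := REPLACE.fullmatch(action):  # <REPLACEMENT>
        if m["ds"] is None:
            return m["add"]  # no DELETED part: the whole form is replaced
        return form.replace(m["ds"], m["add"], 1)  # first occurrence only
    raise ValueError(f"not a valid action: {action!r}")


def split_actions(body: str) -> List[str]:
    """Split the right-hand side on commas that are not inside quotes."""
    parts, current, quoted = [], "", False
    for ch in body:
        if ch == '"':
            quoted = not quoted
        if ch == "," and not quoted:
            parts.append(current)
            current = ""
        else:
            current += ch
    return parts + [current]


def apply_rule(rule: str, features: Set[str], form: str) -> str:
    """Apply an a-rule such as 'NOU&^MCL:="x":"y";' to form, given its features."""
    condition, body = rule.strip().split(":=", 1)
    body = body.strip()
    if not body.endswith(";"):
        raise ValueError("an a-rule must end with ';'")
    for tag in condition.strip().split("&"):
        if (tag.lstrip("^") in features) == tag.startswith("^"):
            return form  # the condition is false: the rule is not applied
    for action in split_actions(body[:-1]):
        form = apply_action(form, action)  # actions run from left to right
    return form


print(apply_rule('X:="x"<0, "A":"y", 0>"z";', {"X"}, "ABC"))  # xyBCz
```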