Inflectional paradigms


Latest revision as of 17:49, 26 May 2014

Inflectional paradigms are sets of rules used to generate the inflected forms out of the base form.

When to use inflectional paradigms

Inflectional paradigms are used when:

  • inflections can be described by AFFIXATION (i.e., prefixation, infixation or suffixation) AND
  • inflections are REGULAR (i.e., they may be applied to several different words)

Consider, for instance, the case of the English nouns making the plural in -s (book>books, table>tables, etc.). This can be expressed by affixation (suffixation of -s) and it is regular (there are many words in this set). This morphological behavior is then described by the paradigm M2, which contains two rules:

  • SNG:=0>""; (do not add anything to the word in case of singular)
  • PLR:=0>"s"; (add "s" to the end of the word in case of plural)

This paradigm is created within the English grammar and is associated, in the English dictionary, to all words having the same morphological behavior.
In the grammar:

  • Paradigm M2: SNG:=0>""; PLR:=0>"s";

In the dictionary:

  • [table]{ID}"UW"(LEX=N,POS=NOU,...,PAR=M2)<eng,0,0>;
  • [book]{ID}"UW"(LEX=N,POS=NOU,...,PAR=M2)<eng,0,0>;
  • ...
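As a rough illustration of how the two rules of paradigm M2 act on a base form, the following Python sketch models each rule as a pair (number of characters to delete from the end, string to append). The names `PARADIGM_M2`, `apply_suffix_rule` and `inflect` are hypothetical and exist only for this example; this is not the UNL engine itself.

```python
# Hypothetical in-memory model of paradigm M2 (an assumption for illustration).
# Each rule is (characters_to_delete_from_end, string_to_append).
PARADIGM_M2 = {
    "SNG": (0, ""),   # SNG:=0>"";
    "PLR": (0, "s"),  # PLR:=0>"s";
}


def apply_suffix_rule(base, rule):
    """Delete the given number of trailing characters, then append the suffix."""
    delete, add = rule
    stem = base[:len(base) - delete]
    return stem + add


def inflect(base, paradigm):
    """Return a {tag: form} mapping with one form per rule in the paradigm."""
    return {tag: apply_suffix_rule(base, rule) for tag, rule in paradigm.items()}


if __name__ == "__main__":
    for word in ("table", "book"):
        print(word, inflect(word, PARADIGM_M2))
```

Because the paradigm is stored once and only referenced by name from each dictionary entry, adding a new regular noun requires no new rules.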

When not to use inflectional paradigms

Inflectional paradigms are not used in the following cases:

  • When the word is INVARIANT (such as adverbs and adjectives, in English); OR
  • When the word is NOT INFLECTIONAL, i.e., when it is CONCATENATIVE, in the sense that the word forms are formed out of the same base form simply by concatenating particles (such as auxiliaries, adpositions and even affixes) that do not affect the base form; OR
  • When the inflections are NOT REGULAR, i.e., when they are too specific (such as mouse>mice, in English); OR
  • When the inflections CANNOT BE EXPRESSED BY AFFIXATION, i.e., when inflections are represented by periphrases (such as future, in English: go > will go).

Consider, for instance, the cases below:

  • The English adverb "now"
    The adverb "now" is invariant (i.e., it does not change its form according to number, gender, tense, aspect, etc.) and, therefore, must be associated to the predefined paradigm M0 (INVARIANT). No new paradigm needs to be created in this case.
  • The English possessive case marker 's or ' ("John">"John's", or "Hans">"Hans'")
    The possessive case marker in English is simply concatenative, and not a real inflection, in the sense that it does not affect the base form, even if it may take different forms depending on the ending of the base form ("'s" or "'"). The possessive case must not be included within nominal paradigms in English.
  • The English noun "mouse"
    The noun "mouse" is inflectional (i.e., it changes its form in singular and plural), which can be expressed by two rules:
    SNG:=0>""; (do not add anything to the word in case of singular)
    PLR:="mice"; (replace everything by "mice" in case of plural)
    These rules, however, are very specific, because they apply to a very limited set of words. Therefore, this behavior should not be defined as a paradigm in the grammar, but as inflectional rules inside the dictionary:
    In the grammar:
    • Paradigm M1 (IRREGULAR), i.e., the inflectional rules are defined inside the dictionary because they are too specific
    In the dictionary:
    • [mouse]{ID}"UW"(LEX=N,POS=NOU,...,PAR=M1,FLX(SNG:=0>"";PLR:="mice";))<eng,0,0>;
  • The English verb "love"
    The verb "love" is inflectional (i.e., it changes its form according to tense, person, etc.), and its simple forms can be expressed by several suffixation rules:
    • INF:=0>""; (do not add anything to the word in case of infinitive)
    • PAS:=0>"d"; (add "d" to the word in case of past tense)
    • 3PS&PRS&IND:=0>"s"; (add "s" to the word in case of third person singular present indicative)
    • GER:=1>"ing"; (remove the last character and add "ing" in case of gerund)
    ...
    Note, however, that some inflections of the verb cannot be generated by affixation, such as the future (love>will love), the present progressive (love>is loving), the present perfect (love>has loved), etc. In these cases, the inflection is not simply a matter of appending strings to the end of the word. Note, for instance, that the negation comes in between: "will not love", "is not loving", "has not loved"; and that the order may be changed in interrogatives: "will he/Peter/the boy with the telescope love?". So, we cannot define these inflections as mere affixes, but as whole new syntactic structures. These inflections, thus, are not included inside paradigms, and must be defined in a different way. Paradigms must only include SIMPLE FORMS (the paradigm for English verbs contains only simple tenses, for instance) and must contain ONLY NON-REDUNDANT forms.
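The cases above amount to a lookup decision: M0 entries generate nothing, M1 entries carry their own rules in the dictionary, and every other entry uses a paradigm defined in the grammar. The sketch below illustrates that decision in Python. The rule encoding (`("suffix", n, text)` and `("replace", form)`), the paradigm name `V_E` and the `GRAMMAR` and `DICTIONARY` tables are all assumptions made for this example, not part of the UNL specification.

```python
# Illustrative rule encoding (an assumption, not UNL syntax):
#   ("suffix", n, text)  -> delete n trailing characters, append text
#   ("replace", form)    -> replace the whole base form
GRAMMAR = {
    # "V_E" is a hypothetical paradigm name for verbs such as "love".
    "V_E": {
        "INF": ("suffix", 0, ""),
        "PAS": ("suffix", 0, "d"),
        "3PS&PRS&IND": ("suffix", 0, "s"),
        "GER": ("suffix", 1, "ing"),
    },
}

DICTIONARY = {
    "now": {"PAR": "M0"},
    "mouse": {
        "PAR": "M1",
        "FLX": {"SNG": ("suffix", 0, ""), "PLR": ("replace", "mice")},
    },
    "love": {"PAR": "V_E"},
}


def apply_rule(base, rule):
    """Apply one encoded rule to the base form."""
    if rule[0] == "replace":
        return rule[1]
    _, delete, add = rule
    return base[:len(base) - delete] + add


def generate(base):
    """Generate simple inflected forms for a dictionary entry."""
    entry = DICTIONARY[base]
    paradigm = entry["PAR"]
    if paradigm == "M0":
        return {}  # invariant word: no inflected forms are generated
    # M1 entries keep their rules in the dictionary; others use the grammar.
    rules = entry["FLX"] if paradigm == "M1" else GRAMMAR[paradigm]
    return {tag: apply_rule(base, rule) for tag, rule in rules.items()}


if __name__ == "__main__":
    for word in ("now", "mouse", "love"):
        print(word, generate(word))
```

Periphrastic forms such as "will love" are deliberately absent: they are not string operations on the base form, so they do not fit this kind of rule table.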

Predefined Paradigms

There are two predefined paradigms in the UNLarium:

INVARIANT (M0)
If the word is not inflectional (case of adverbs in English, for instance) or does not accept any inflectional variant (case of "clothes", used only in the plural, or "species", which has the same form in singular and plural).
IRREGULAR (M1)
If the word is inflectional but does not follow any existing paradigm, as in irregular forms (such as "man", "mouse", "foot" and "child"). In this case, the corresponding inflectional rules should be provided as inflectional rules.

Semi-regular words

Semi-regular words, i.e., words that follow regular paradigms except for a few cases, must be linked to the corresponding regular paradigms. The irregular forms must be specified in the field INFLECTIONAL RULES, which will prevail over the forms created through inflectional paradigms.

Consider, for instance, the following cases:

MINOR SPELLING IRREGULARITIES (diacritics)
In French, the verb "acheter" (= to buy) follows, in general, the regular paradigm of verbs ending in -er, except for some forms in the present indicative, present subjunctive and conditional, where the root becomes "achèt", with an "è" instead of an "e". This verb must then be associated to the regular paradigm of verbs ending in -er and, additionally, in the field INFLECTIONAL RULES, the irregular forms must be listed as follows:
  • 1PS&PRS&IND:="achète"; (i.e., replace the first person present indicative by "achète")
  • 2PS&PRS&IND:="achètes"; (i.e., replace the second person present indicative by "achètes")
  • 3PS&PRS&IND:="achète"; (i.e., replace the third person present indicative by "achète")

etc.

DEFECTIVE WORDS
In Portuguese, the verb "colorir" (= to color) follows, in general, the regular paradigm of verbs ending in -ir, except for the first person present indicative, which does not exist. This verb must then be associated to the regular paradigm of verbs ending in -ir and, additionally, in the field INFLECTIONAL RULES, the defective form must be listed as follows:
  • 1PS&PRS&IND:=NULL; (i.e., do not generate the first person present indicative)
REDUNDANT FORMS
In English, the word "fish" may have two different plural forms: "fish" and "fishes". Both are accepted by the grammar. In order to cope with both possibilities, we have to associate "fish" to one paradigm (the one that adds -es to words ending in -sh, for instance) and to specify, in the INFLECTIONAL RULES, the other possibility as follows:
  • PLR&ALT:="fish"; (i.e., there is another possibility to form the plural)

In case of redundancy, ALT must be added. Otherwise, the form given in the INFLECTIONAL RULES prevails over the form generated by the paradigm, and the system keeps only the former.
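The three cases above (override, NULL and ALT) can be read as a merge of the forms produced by the paradigm with the entry-level INFLECTIONAL RULES. The following Python sketch shows one possible reading of that precedence. The function `merge_forms`, the use of `None` for NULL and the list-valued result are assumptions made for illustration, and the "paradigm" inputs are simply the forms a regular paradigm would be expected to produce.

```python
NULL = None  # stands in for the NULL marker, as in 1PS&PRS&IND:=NULL;


def merge_forms(paradigm_forms, inflectional_rules):
    """Combine paradigm output with entry-level INFLECTIONAL RULES.

    Returns {tag: [forms]}. A plain rule replaces the paradigm form,
    NULL suppresses it, and a rule tagged ALT adds an alternative form.
    """
    merged = {tag: [form] for tag, form in paradigm_forms.items()}
    for tag, form in inflectional_rules.items():
        parts = tag.split("&")
        if "ALT" in parts:
            key = "&".join(p for p in parts if p != "ALT")
            merged.setdefault(key, []).append(form)
        elif form is NULL:
            merged.pop(tag, None)
        else:
            merged[tag] = [form]
    return merged


if __name__ == "__main__":
    # French "acheter": the regular -er paradigm would give unaccented forms.
    print(merge_forms(
        {"1PS&PRS&IND": "achete", "2PS&PRS&IND": "achetes"},
        {"1PS&PRS&IND": "achète", "2PS&PRS&IND": "achètes"},
    ))
    # Portuguese "colorir": the first person form is suppressed.
    print(merge_forms(
        {"1PS&PRS&IND": "coloro", "3PS&PRS&IND": "colore"},
        {"1PS&PRS&IND": NULL},
    ))
    # English "fish": ALT keeps both plurals.
    print(merge_forms({"SNG": "fish", "PLR": "fishes"}, {"PLR&ALT": "fish"}))
```

Dropping ALT from the last call would replace "fishes" with "fish" instead of keeping both.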

How to create inflectional paradigms

See the page "How to create inflectional paradigms".

Syntax

Inflectional paradigms are expressed by A-rules, a special formalism for introducing prefixes, infixes and suffixes to the base form.

Examples

Name | Rules | Description | Examples
PLR:=0>"s" | SNG:=0>"";PLR:=0>"s"; | Add "s" to the end of the form in case of plural | table>tables, boy>boys, etc.
PLR:="y">"ies" | SNG:=0>"";PLR:="y">"ies"; | Replace "y" by "ies" at the end of the form in case of plural | baby>babies, city>cities, etc.
PLR:="f">"ves" | SNG:=0>"";PLR:="f">"ves"; | Replace "f" by "ves" at the end of the form in case of plural | wolf>wolves, half>halves, etc.
PAS:=0>"ed" | INF:=0>"";PAS:=0>"ed";GER:=0>"ing";PTP:=0>"ed";3PS&PRS&IND:=0>"s"; | Add "ed" in the simple past, "ing" in the gerund, ... | work>worked, ask>asked, etc.
PAS:=0>"d" | INF:=0>"";PAS:=0>"d";GER:="e">"ing";PTP:=0>"d";3PS&PRS&IND:=0>"s"; | Add "d" in the simple past, replace the final "e" by "ing" in the gerund, ... | use>used, arrange>arranged, etc.
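To make the textual rules in these examples concrete, the sketch below parses a paradigm string of the form `TAG:=ACTION;TAG:=ACTION;` and applies a right-appending action whose deletion part is either a character count (`0`, `1`) or a quoted string (`"y"`, `"f"`). It is a minimal Python approximation under those assumptions: `parse_paradigm` and `apply_action` are hypothetical helper names, only the right-appending pattern shown above is handled, and quoted strings containing ";" are not supported.

```python
import re

# One right-appending action: <deletion> ">" "<addition>".
# The deletion is either a character count or a quoted string.
ACTION = re.compile(r'^(?:(\d+)|"([^"]*)")>"([^"]*)"$')


def parse_paradigm(text):
    """Split 'TAG:=ACTION;TAG:=ACTION;' into a {tag: action} dict."""
    rules = {}
    for chunk in text.split(";"):
        chunk = chunk.strip()
        if chunk:
            tag, action = chunk.split(":=", 1)
            rules[tag.strip()] = action.strip()
    return rules


def apply_action(base, action):
    """Apply a right-appending action to the base form."""
    match = ACTION.match(action)
    if match is None:
        raise ValueError(f"unsupported action: {action!r}")
    count, suffix, addition = match.groups()
    if count is not None:
        stem = base[:len(base) - int(count)]
    else:
        # A quoted deletion only applies if the base really ends with it.
        if not base.endswith(suffix):
            raise ValueError(f"{base!r} does not end in {suffix!r}")
        stem = base[:len(base) - len(suffix)]
    return stem + addition


if __name__ == "__main__":
    rules = parse_paradigm('SNG:=0>"";PLR:="y">"ies";')
    for word in ("baby", "city"):
        print(word, {tag: apply_action(word, act) for tag, act in rules.items()})
```

Raising an error when the base form does not end in the quoted string is a design choice of this sketch; it makes a wrong paradigm assignment (for example, linking "table" to the "y">"ies" paradigm) visible instead of silently producing a malformed plural.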