How to create inflectional paradigms
Inflectional paradigms are sets of rules used to generate inflections from base forms. The dictionary stores only the base forms (e.g., "book" and "explain"); the inflections ("book/books", "explain/explains/explained/explaining") are generated through rules. These rules are of the A-rule (affixation rule) type.
Before starting, consider the following:
- Be critical about what you learnt in school and in traditional grammars
- Traditional grammars were written to describe language to humans. This is not the case here. We are describing the structure of a language to a machine, and this will require decisions that, on many occasions, will contradict and violate principles and rules defined in human-oriented grammars.
- Do not duplicate paradigms.
- Before creating a paradigm, check whether it is really necessary, i.e., whether there is no existing paradigm that may be used in order to generate the intended inflections.
- Do not create paradigms for a single word.
- Paradigms are used to describe the behavior of several words. If the behavior is irregular, i.e., if it is restricted to a single word, it should be described as an inflectional rule instead of an inflectional paradigm. For instance, the plural of the English word "foot" is better generated by an inflectional rule than by an inflectional paradigm. Inflectional rules are not included in the grammar. They are added directly to the corresponding dictionary entry.
- Do not include compound forms in your paradigm.
- Paradigms must deal only with simple forms, i.e., forms that can be generated by prefixation, infixation or suffixation. In many cases, inflections are also generated by adding auxiliary or supporting words. These compound forms must not be included inside the paradigm, but should be handled by the grammar. For instance, in English, the simple present ("explain">"explains") is defined inside the paradigm, but the present progressive and the future are not ("explain">"is explaining", "explain">"will explain") because they cannot be formed through suffixation. They require more complex structures and should not be treated as simple string manipulations (note that the negation, for instance, comes between the auxiliary and the main verb: "is NOT explaining", "will NOT explain", and this prevents treating "will explain" as a single string formed out of "explain" through the prefixation of "will ").
- Do not include concatenative (agglutinative) forms in your paradigm.
- Paradigms deal with inflections, not with agglutinations. The difference between inflection and agglutination is sometimes not that clear. In the UNLarium framework, we normally understand that inflection, unlike agglutination, causes changes to the base form. In this sense, the genitive case is said to be inflectional in Latin ("campus">"campi") and concatenative (agglutinative) in English ("John">"John's"). Only the former, i.e., inflectional cases, must be included in inflectional paradigms.
- For further information, please refer to Inflection or Agglutination?
- Do not include derivations in your paradigm.
- Paradigms deal with inflections, not with derivations. Derivations create new words, whereas inflections do not: they are used only to specify the meaning intended by a given base form (to indicate that it is plural, or masculine, or happened in the past). In that sense, inflections must necessarily preserve the lexical category of the base form (i.e., inflections cannot change nouns into verbs, or verbs into adjectives, etc.: "national" is not an inflection of "nation", "nationalize" is not an inflection of "national", and "nationalization" is not an inflection of "nationalize"). Additionally, inflections cannot promote any changes that will modify (instead of only specifying) the meaning of the base form, such as negation ("undo" is not an inflection of "do") and iteration ("redo" is not an inflection of "do").
- For further information, please refer to Inflection or Derivation?
- Avoid redundant forms in your paradigm
- Paradigms must generate the MINIMUM SET of different word forms that can be associated with the same base form. By "word form", we understand all the possible variants that a base form must have "at the string level".
- For instance, the verbal morphology of English is represented, in most regular cases (such as "to love"), only by 4 rules:
- because this is the MINIMUM SET of simple word forms that the verb may assume (as in "love, loved, loving, loves").
- Note that the base form "love" cannot generate any other word form. Obviously, some of these forms are used to convey different information (e.g., "love" is infinitive, but it is also first person present indicative, second person present indicative, ..., first person present subjunctive, second person imperative, etc.). But all this information is conveyed by the same string as the infinitive, and there is no reason to include all of them in the dictionary. If we did so, we would have a serious person/tense disambiguation problem during tokenization. In a very simple sentence such as "I love Paris", there would be so many different candidates for "love" that the analysis would be very difficult. In order to be sure that we are picking the correct candidate "love" (1PS&PRS&IND), we would need so many disambiguation rules that the analysis would become unfeasible in terms of processing and time. It is much easier to have one single form "love" as infinitive, and to calculate the tense, person and aspect inside the grammar during syntactic analysis.
- Therefore, the English verbal paradigm, that could be expressed through many different rules (most of which will generate the same word forms), such as:
- can be reduced to 4 rules (in case of "love"):
- All the others are calculated directly in the grammar.
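The disambiguation argument above can be made concrete with a small Python sketch. It is only an illustration: the table `FULL_TABLE` is a hypothetical "maximal" listing for "love" (the person/number tags follow the pattern used elsewhere on this page, but the exact set is an assumption), and the helper names are not part of the UNLarium.

```python
from collections import defaultdict

# Hypothetical "maximal" table for "love": one entry per tag, even when the
# resulting string is identical. The choice of tags is illustrative only.
FULL_TABLE = {
    "INF": "love",
    "1PS&PRS&IND": "love",
    "2PS&PRS&IND": "love",
    "3PS&PRS&IND": "loves",
    "1PP&PRS&IND": "love",
    "2PP&PRS&IND": "love",
    "3PP&PRS&IND": "love",
    "PAS": "loved",
    "PTP": "loved",
    "GER": "loving",
}


def candidates_per_string(table):
    """Group tags by surface string: every tag is one tokenization candidate."""
    index = defaultdict(list)
    for tag, form in table.items():
        index[form].append(tag)
    return dict(index)


def minimum_set(table):
    """Return the distinct strings, i.e. the forms a paradigm needs to generate."""
    return sorted(set(table.values()))


index = candidates_per_string(FULL_TABLE)
print(len(index["love"]))       # number of competing candidates for the token "love"
print(minimum_set(FULL_TABLE))  # the distinct word forms at the string level
```

Even with this reduced tag set, the token "love" already has several competing candidates, while the minimum set contains only the four strings the paradigm has to produce.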
In order to create inflectional paradigms, follow the steps below:
- Create the inflectional schema for the intended part-of-speech, if it has not been created yet
- Create the paradigm
- Name the paradigm
- Define the paradigm
- Create the rules
- Provide an exemplar base form (to test the paradigm)
- Provide examples
1. Create the inflectional schema
The inflectional schema is a template used to build paradigms and to assure that they will follow the same structure.
The inflectional schema is a list of inflectional categories for each part-of-speech. It describes the differences between the possible forms of the same lemma.
Consider the examples below for English, French and Latin.
In English, inflections concern only two parts of speech: nouns and verbs. The others (determiners, adjectives, adverbs, etc.) are not inflectional.
- English nouns may have two forms: singular (SNG) and plural (PLR). Therefore, the inflectional schema for English nouns is the following:
- SNG (singular): table, man, foot
- PLR (plural): tables, men, feet
- English verbs may have several forms, but there are only 5 simple distinctive forms: infinitive (INF), gerund (GER), participle (PTP), simple past (PAS) and third person present indicative (3PS&PRS&IND). Therefore, the inflectional schema for English verbs is the following:
- INF (infinitive): love, do
- GER (gerund): loving, doing
- PAS (past): loved, did
- PTP (participle): loved, done
- 3PS&PRS&IND (third person singular present indicative): loves, does
- Note, in the above, that the inflectional schema does not include simple present (PRS) because it uses the same form as the infinitive. Note, also, that the only person specified is the third person singular (3PS) in the present indicative (PRS&IND), because this is the only one that has a special behavior. Note, finally, that the schema does not include any compound tense (such as future, present progressive, present perfect, past perfect, etc.), because they cannot be generated through simple affixation.
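As a rough illustration of how a schema works as a template, the two English schemas above can be written down as plain data and used to check a table of forms. This is a minimal sketch, not the UNLarium's internal representation: `ENGLISH_SCHEMAS`, `check_against_schema` and the `FUT` tag are hypothetical names introduced only for this example.

```python
# English inflectional schemas, as listed above, written as plain data.
ENGLISH_SCHEMAS = {
    "noun": ["SNG", "PLR"],
    "verb": ["INF", "GER", "PAS", "PTP", "3PS&PRS&IND"],
}


def check_against_schema(schema, forms):
    """Return (missing, unknown) categories of `forms` relative to `schema`."""
    missing = [cat for cat in schema if cat not in forms]
    unknown = [cat for cat in forms if cat not in schema]
    return missing, unknown


# Forms of "do", taken from the schema examples above.
do_forms = {
    "INF": "do",
    "GER": "doing",
    "PAS": "did",
    "PTP": "done",
    "3PS&PRS&IND": "does",
}

# The same table with a compound tense added by mistake (FUT is a made-up tag).
do_with_future = dict(do_forms, FUT="will do")

print(check_against_schema(ENGLISH_SCHEMAS["verb"], do_forms))
print(check_against_schema(ENGLISH_SCHEMAS["verb"], do_with_future))
```

The second call reports `FUT` as unknown, which mirrors the point that compound tenses do not belong in the schema.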
In French, inflections affect nouns, adjectives and verbs.
- There are two types of French nouns: those that have only number inflection, and those that have number and gender. There will be, therefore, two inflectional schemes:
- Nouns inflecting only in number (such as "table" (=table), "ville" (=city), "voiture" (=car), "père" (=father), "dentiste" (=dentist), etc.)
- SNG (singular): table, ville, voiture, père, dentiste
- PLR (plural): tables, villes, voitures, pères, dentistes
- Nouns inflecting in number and gender (such as "ami" (=friend), "chien" (=dog), "danseur" (=dancer), etc.)
- MCL&SNG (masculine singular): ami, chien, danseur
- FEM&SNG (feminine singular): amie, chienne, danseuse
- MCL&PLR (masculine plural): amis, chiens, danseurs
- FEM&PLR (feminine plural): amies, chiennes, danseuses
- In French, adjectives vary regularly in number and gender, according to the following inflectional schema:
- MCL&SNG (masculine singular): beau
- FEM&SNG (feminine singular): belle
- MCL&PLR (masculine plural): beaux
- FEM&PLR (feminine plural): belles
- In French, verbs may have 51 different simple forms, as described in the following inflectional schema:
- INF (infinitive): aimer
- PTP&MCL&SNG (participle masculine singular): aimé
- PTP&MCL&PLR (participle masculine plural): aimés
- PTP&FEM&SNG (participle feminine singular): aimée
- PTP&FEM&PLR (participle feminine plural): aimées
- 1PS&PRS&IND (first person singular present indicative): aime
- 2PS&PRS&IND (second person singular present indicative): aimes
- 3PS&PRS&IND (third person singular present indicative): aime
- 1PP&PRS&IND (first person plural present indicative): aimons
- 2PP&PRS&IND (second person plural present indicative): aimez
- 3PP&PRS&IND (third person plural present indicative): aiment
- 1PS&PAS&NPFV&IND (first person singular past imperfective indicative): aimais
- 2PS&PAS&NPFV&IND (second person singular past imperfective indicative): aimais
- 3PS&PAS&NPFV&IND (third person singular past imperfective indicative): aimait
- 1PP&PAS&NPFV&IND (first person plural past imperfective indicative): aimions
- 2PP&PAS&NPFV&IND (second person plural past imperfective indicative): aimiez
- 3PP&PAS&NPFV&IND (third person plural past imperfective indicative): aimaient
- etc. (see the complete list at French grammar)
In Latin, inflections affect nouns, adjectives and verbs.
- Latin nouns may inflect in number and case (or in number, gender and case, in the special case of some words that refer to animals and humans, as in French)
- NOM&SNG (nominative singular): rosa
- NOM&PLR (nominative plural): rosae
- VOC&SNG (vocative singular): rosa
- VOC&PLR (vocative plural): rosae
- ACC&SNG (accusative singular): rosam
- ACC&PLR (accusative plural): rosas
- GNT&SNG (genitive singular): rosae
- GNT&PLR (genitive plural): rosarum
- DAT&SNG (dative singular): rosae
- DAT&PLR (dative plural): rosis
- ABL&SNG (ablative singular): rosa
- ABL&PLR (ablative plural): rosis
- Latin adjectives inflect in gender, number and case
- MCL&NOM&SNG (masculine nominative singular): bonus
- MCL&NOM&PLR (masculine nominative plural): boni
- MCL&VOC&SNG (masculine vocative singular): bone
- MCL&VOC&PLR (masculine vocative plural): boni
- MCL&ACC&SNG (masculine accusative singular): bonum
- MCL&ACC&PLR (masculine accusative plural): bonos
- MCL&GNT&SNG (masculine genitive singular): boni
- MCL&GNT&PLR (masculine genitive plural): bonorum
- MCL&DAT&SNG (masculine dative singular): bono
- MCL&DAT&PLR (masculine dative plural): bonis
- MCL&ABL&SNG (masculine ablative singular): bono
- MCL&ABL&PLR (masculine ablative plural): bonis
- FEM&NOM&SNG (feminine nominative singular): bona
- FEM&NOM&PLR (feminine nominative plural): bonae
- FEM&VOC&SNG (feminine vocative singular): bona
- FEM&VOC&PLR (feminine vocative plural): bonae
- FEM&ACC&SNG (feminine accusative singular): bonam
- FEM&ACC&PLR (feminine accusative plural): bonas
- FEM&GNT&SNG (feminine genitive singular): bonae
- FEM&GNT&PLR (feminine genitive plural): bonarum
- FEM&DAT&SNG (feminine dative singular): bonae
- FEM&DAT&PLR (feminine dative plural): bonis
- FEM&ABL&SNG (feminine ablative singular): bona
- FEM&ABL&PLR (feminine ablative plural): bonis
- NEU&NOM&SNG (neuter nominative singular): bonum
- NEU&NOM&PLR (neuter nominative plural): bona
- NEU&VOC&SNG (neuter vocative singular): bonum
- NEU&VOC&PLR (neuter vocative plural): bona
- NEU&ACC&SNG (neuter accusative singular): bonum
- NEU&ACC&PLR (neuter accusative plural): bona
- NEU&GNT&SNG (neuter genitive singular): boni
- NEU&GNT&PLR (neuter genitive plural): bonorum
- NEU&DAT&SNG (neuter dative singular): bono
- NEU&DAT&PLR (neuter dative plural): bonis
- NEU&ABL&SNG (neuter ablative singular): bono
- NEU&ABL&PLR (neuter ablative plural): bonis
- Latin verbs inflect in many different simple forms (see the complete list at Latin grammar)
- The same part-of-speech may involve different inflectional schemes.
- In French, for instance, some nouns, such as "livre" (= book), only inflect in number (SNG and PLR); other nouns, such as "ami" (= friend), inflect in number and in gender (MCL&SNG,MCL&PLR,FEM&SNG,FEM&PLR). In these cases, there can be more than one inflectional schema for the same part-of-speech.
- Inflectional schemes must only include simple forms (i.e., those that are formed by affixation).
- Do not include categories in inflectional schema if they involve auxiliary or supporting words (such as future, in English, or passé composé, in French)
- Rules are not cumulative.
- You have to combine inflectional categories in a single condition because it is not possible to apply rules sequentially. For instance, it is not possible, in French, to write simply FEM:=0>"e"; and PLR:=0>"s";. It is necessary to write FEM&PLR:=0>"es";. This happens because, for the time being, it is not possible to tell the machine in which order the rules should be applied, i.e., we could get "amise" instead of "amies" if we defined the number and the gender separately.
- Rules must be mutually exclusive.
- Inside the same paradigm, the conditions must necessarily be different, i.e., there cannot be two rules with the same condition, or a rule whose condition contains the condition of another rule:
- SNG:=0>"";MCL&SNG:=0>""; (the condition SNG, of the first rule, is included in the condition of the second rule)
- PLR:=0>"";PLR:=0>"es"; (the condition of the first rule is the same as that of the second rule)
- In order to deal with possible variants for the same lemma, the features ALT1, ALT2, ALT3, etc. must be used:
- For instance, the English word "fish" may have two different plurals: "fish" and "fishes". This is to be represented by
- SNG:=0>"";PLR&ALT1:=0>"";PLR&ALT2:=0>"es"; instead of
- Inflectional schemes are created inside the UNLarium (UNLARIUM>GRAMMAR>[LOCALE]>SETTINGS>INFLECTIONAL SCHEMES)
- Inflectional schemes, as templates, are created only once. After being created, they become available inside the UNLarium and are used as templates to create new paradigms.
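The constraints in the list above (rules are not cumulative, rules must be mutually exclusive, variants use ALT1, ALT2, etc.) can be sketched in Python. This is a hedged illustration only: it handles just the `CONDITION:=0>"suffix";` shape shown on this page, and `parse_paradigm`, `exclusivity_conflicts` and `apply_sequentially` are hypothetical helpers, not UNLarium functions.

```python
import re
from itertools import combinations

# Matches only the rule shape used on this page: CONDITION:=0>"suffix";
RULE_RE = re.compile(r'([A-Z0-9&]+):=0>"([^"]*)";')


def parse_paradigm(text):
    """Return (condition string, set of tags, suffix) for each rule found."""
    return [
        (cond, frozenset(cond.split("&")), suffix)
        for cond, suffix in RULE_RE.findall(text)
    ]


def exclusivity_conflicts(rules):
    """List pairs of rules whose conditions are equal or include one another."""
    conflicts = []
    for (c1, s1, _), (c2, s2, _) in combinations(rules, 2):
        if s1 == s2:
            conflicts.append(("same condition", c1, c2))
        elif s1 < s2 or s2 < s1:
            conflicts.append(("condition included", c1, c2))
    return conflicts


def apply_sequentially(base_form, suffixes):
    """Append suffixes one after another, in the order given."""
    for suffix in suffixes:
        base_form += suffix
    return base_form


# The two invalid paradigms and the valid "fish" paradigm from the list above.
print(exclusivity_conflicts(parse_paradigm('SNG:=0>"";MCL&SNG:=0>"";')))
print(exclusivity_conflicts(parse_paradigm('PLR:=0>"";PLR:=0>"es";')))
print(exclusivity_conflicts(parse_paradigm('SNG:=0>"";PLR&ALT1:=0>"";PLR&ALT2:=0>"es";')))

# Why separate FEM and PLR rules are unsafe: the result depends on the order.
print(apply_sequentially("ami", ["e", "s"]))  # gender first, then number
print(apply_sequentially("ami", ["s", "e"]))  # number first, then gender
```

The ALT1/ALT2 features keep the two plural rules distinct, so the "fish" paradigm passes the check, while the order-dependent output of the last two calls shows why gender and number have to be combined in one condition.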
2. Create the paradigm
Paradigms are created inside the UNLarium (at UNLWEB>UNLARIUM>GRAMMAR>[LOCALE]>INFLECTIONAL PARADIGMS>ADD). They are normally based on inflectional schemes (templates).
2.1 Name the paradigm
The first field to be provided in the paradigm form is "name". Paradigm names must be unique. The following standards have been used to name paradigms:
- a common name (such as "first declension", "first group"), in case of well-established reference;
- the rule itself, in case of single-rule paradigms;
- the most distinctive rule, if any; or
- a "leading form", i.e., a typical example (a prototype) representative of the whole category, otherwise.
2.2 Define the paradigm
The paradigm definition must state clearly what the paradigm does and when it is applied.
2.3 Create the rules
Rules are normally created by filling in the inflectional schema (which may be selected with the specific button at the right side of the form). Paradigm rules are always of the A-rule type, i.e., they are always affixation rules.
2.4 Provide an exemplar base form
The exemplar base form is used to test the paradigm, by pressing the right button after it. It must be a base form, over which the rules will be applied.
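To give an idea of what testing a paradigm against an exemplar base form amounts to, here is a minimal Python sketch. It assumes that `0>"x"` means "append x to the base form", as the French examples earlier on this page suggest, and it supports only that rule shape. The paradigm string `FRENCH_NOUN` and the function `apply_paradigm` are assumptions made for this illustration; the UNLarium's own test button may behave differently.

```python
import re

# Matches only the rule shape used on this page: CONDITION:=0>"suffix";
RULE_RE = re.compile(r'([A-Z0-9&]+):=0>"([^"]*)";')


def apply_paradigm(paradigm, base_form):
    """Return {condition: inflected form}, appending each suffix to the base form."""
    return {
        cond: base_form + suffix
        for cond, suffix in RULE_RE.findall(paradigm)
    }


# Hypothetical paradigm for French nouns inflecting in number and gender,
# written to match the schema example "ami, amie, amis, amies".
FRENCH_NOUN = 'MCL&SNG:=0>"";FEM&SNG:=0>"e";MCL&PLR:=0>"s";FEM&PLR:=0>"es";'

forms = apply_paradigm(FRENCH_NOUN, "ami")
for condition, form in forms.items():
    print(condition, form)
```

Comparing the generated forms with the expected ones for the exemplar is a quick way to spot a wrong suffix or a missing condition before the paradigm is assigned to other entries.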
2.5 Provide examples
The examples illustrate other uses of the paradigm (in addition to the exemplar base form). Examples of the same category must be separated by commas, and examples of different categories must be separated by semicolons. For instance:
- PAR: M2
- Examples: book,books; table,tables;
- It is important to stress that the changes must affect the BASE FORM, i.e., whenever the BASE FORM is preserved, even if there are changes to the appendices, there will be agglutination. For instance, in English, the genitive marker may have different forms depending on the base form: it will be "'s", if the word does not end in -s; and only "'", otherwise ("John">"John's", "Hans">"Hans'"). These variants, however, do not affect the base form and, therefore, should still be considered agglutinations, rather than inflections.
- Other English paradigms, such as "go" and "be", may include more rules, because they involve more unique word forms.
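Returning to the examples field of step 2.5, its comma/semicolon format can be split mechanically. The sketch below is only an illustration of that format; `parse_examples` is a hypothetical helper, not part of the UNLarium.

```python
def parse_examples(field):
    """Split an examples field into groups (semicolons) of items (commas)."""
    groups = []
    for chunk in field.split(";"):
        items = [item.strip() for item in chunk.split(",") if item.strip()]
        if items:  # skip the empty chunk left by a trailing semicolon
            groups.append(items)
    return groups


print(parse_examples("book,books; table,tables;"))
```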