How to create entries

From UNL Wiki
Revision as of 13:24, 2 October 2009 by Admin

In the UNLarium, each dictionary entry corresponds to a translation of a Universal Word (UW) into a given natural language. To facilitate the task, UWs have been divided into five categories ("adjectives", "adverbs", "nouns", "verbs" and "others"), each with its own form.

Required fields

LEMMA
The lemma is the canonical form or citation form of a word, i.e., the word as it normally appears in ordinary dictionaries. In English, for instance, run, runs, ran and running are forms of the same lexeme, with run as the lemma. The lemma is normally the singular form for nouns, the masculine singular form for adjectives, and the infinitive for verbs. The lemma can also be a compound ("skinhead" or "African-American") or a multi-word expression ("United States of America"). For separable words, however, it should be reduced to the part of the word that inflects ("take (sth) into account" is to be represented as "take"). In this last case, the separable part of the word ("into account") must be represented in the field SUBCATEGORIZATION RULES.
WORD FORMATION
The word formation refers to the structure of the natural language word. The word can be:
  • a free morpheme (WRD), i.e., a regular word, such as "table", "beautiful", "yesterday", "give";
  • a multi-word expression (MTW), i.e., a word containing more than one stem, linked by a hyphen ("African-American"), by blank spaces ("United States of America") or simply concatenated ("skinhead"); or
  • a bound morpheme (SBW), i.e., a morpheme that cannot stand alone as an independent word (such as "writ", "-s", "un-").

The word formation refers to the whole natural language word and not to the lemma. Thus the lemma "take", when standing for "take into account", is to be classified as a multi-word expression.
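The three word formation values can be illustrated with a minimal Python sketch. It is not part of the UNLarium: the function name `guess_word_formation` and the surface-form heuristic are assumptions made only for illustration. Only the codes WRD, MTW and SBW come from the list above.

```python
from enum import Enum


class WordFormation(Enum):
    """Word formation codes as listed above."""
    WRD = "free morpheme"
    MTW = "multi-word expression"
    SBW = "bound morpheme"


def guess_word_formation(word: str) -> WordFormation:
    """Hypothetical heuristic: guess the value from the surface form.

    `word` is the full natural language word, not the reduced lemma.
    """
    w = word.strip()
    # Affixes are conventionally written with an edge hyphen: "-s", "un-".
    if w.startswith("-") or w.endswith("-"):
        return WordFormation.SBW
    # Stems linked by a hyphen or by blank spaces.
    if "-" in w or " " in w:
        return WordFormation.MTW
    # Everything else is treated as a regular word.
    return WordFormation.WRD


print(guess_word_formation("table").name)
print(guess_word_formation("take into account").name)
```

A surface heuristic like this cannot recognize concatenated compounds ("skinhead") or bound stems written without a hyphen ("writ"), so in practice the value has to be chosen by the person creating the entry.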

PART OF SPEECH
The part of speech of the natural language word. The set of parts of speech is constrained by the class of the UW.
GENDER
Gender is required for nouns in languages that grammaticalize gender. It can be:
  • masculine (MCL), such as "he";
  • feminine (FEM), such as "she";
  • neutral (NEU), such as "it";
  • common, i.e., masculine or feminine (MOF), such as the French "pianiste", whose gender varies according to the referent: "le pianiste" (MCL) when referring to a man; "la pianiste" (FEM) when referring to a woman;
  • variable, i.e., masculine and feminine (MAF), such as the French "après-midi", which is used in both masculine ("un après-midi") and feminine ("une après-midi") form, without any semantic change.
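The gender requirement can be expressed as a small check. The following Python sketch is only an illustration under stated assumptions: the function `check_gender`, the plain string "noun" for the part of speech and the `language_has_gender` flag are all hypothetical, and only the five gender codes come from the list above.

```python
from enum import Enum
from typing import List, Optional


class Gender(Enum):
    """Gender codes as listed above."""
    MCL = "masculine"
    FEM = "feminine"
    NEU = "neutral"
    MOF = "common (masculine or feminine)"
    MAF = "variable (masculine and feminine)"


def check_gender(part_of_speech: str,
                 gender_code: Optional[str],
                 language_has_gender: bool) -> List[str]:
    """Return a list of problems; an empty list means the field is fine.

    "noun" is a hypothetical part-of-speech label used for illustration.
    """
    problems = []
    # Reject codes that are not among the five listed values.
    if gender_code is not None and gender_code not in Gender.__members__:
        problems.append(f"unknown gender code: {gender_code!r}")
    # Nouns need a gender only if the language grammaticalizes gender.
    if part_of_speech == "noun" and language_has_gender and gender_code is None:
        problems.append("GENDER is required for nouns in this language")
    return problems


print(check_gender("noun", "MOF", language_has_gender=True))
print(check_gender("noun", None, language_has_gender=True))
```

Under these assumptions, the French "pianiste" would be entered with MOF and "après-midi" with MAF, while a noun in a language without grammatical gender would pass the check with no gender at all.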
INFLECTIONAL PARADIGM
The inflectional paradigm must always be specified, even for words that do not inflect, such as adverbs. There are two predefined values:
  • invariant (INV), for words that do not vary, i.e., that do not receive any inflection (such as adverbs); and
  • irregular (IRR), for words that do vary, but not according to any general set of rules (such as English irregular verbs).

In the latter case, the inflectional rules should be specified in the field INFLECTIONAL RULES, below INFLECTIONAL PARADIGM. In all other cases (i.e., regular or quasi-regular words), the paradigm must first be created in the morphology module of the grammar so that it becomes available as an option to be selected. (See how to create paradigms)
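The rules for the paradigm field can be summarized in a short validation sketch. This is a Python illustration only, not the UNLarium's actual implementation: the `Entry` class, its field names, the `validate_paradigm` function and the paradigm name "M1" are hypothetical. The inflectional rules are treated as an opaque string, because their syntax is not described on this page.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Entry:
    """Hypothetical in-memory view of the fields discussed above."""
    lemma: str
    inflectional_paradigm: str               # "INV", "IRR" or a user-defined name
    inflectional_rules: Optional[str] = None  # opaque; only needed for "IRR"


def validate_paradigm(entry: Entry, user_paradigms: Set[str]) -> List[str]:
    """Return a list of problems; an empty list means the entry passes."""
    problems = []
    paradigm = entry.inflectional_paradigm
    if not paradigm:
        # The field is mandatory even for words that do not inflect.
        problems.append("INFLECTIONAL PARADIGM must always be specified")
    elif paradigm == "IRR":
        # Irregular words carry their own rules.
        if not entry.inflectional_rules:
            problems.append("IRR requires INFLECTIONAL RULES")
    elif paradigm != "INV" and paradigm not in user_paradigms:
        # Regular paradigms must exist before they can be selected.
        problems.append(
            f"paradigm {paradigm!r} must first be created in the morphology module"
        )
    return problems


print(validate_paradigm(Entry("yesterday", "INV"), set()))
print(validate_paradigm(Entry("run", "IRR"), set()))
```

The second call reports a problem because an irregular entry was given without any inflectional rules. Supplying a non-empty `inflectional_rules` value, or selecting a paradigm name that is present in `user_paradigms`, makes the check pass.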
