Lexica

From UNL Wiki
Revision as of 14:46, 21 September 2012 by Martins (Talk | contribs)

The UNL System contains three different types of lexical databases: dictionaries, a knowledge base and example bases.

Dictionaries

Main article: Dictionary Specs

In the UNL System, a dictionary is a flat list of entries with their corresponding features. The dictionaries comply with the structure defined in the Dictionary Specs and must contain only tags defined in the Tagset. They are divided into three different categories:

  • The UNL Dictionary, or simply UNLdic, is a list of UW's and their semantic (language-independent) features
  • The NL Dictionary, or simply NLdic, is a list of natural language entries with the corresponding morphological and syntactic (language-dependent) features
  • The UNL-NL Dictionary, or simply UNL-NLdic, is a list of lexical mappings between UW's and natural language entries

The UNL Dictionary and the NL Dictionary are monolingual databases whose entries are interlinked by the UNL-NL Dictionary, which provides the mappings between UW's and natural language entries, whenever available[1]. These three dictionaries are normally built through the UNLarium in separate steps and constitute the basic resource for UNLization and NLization.
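
The relationship between the three dictionaries can be pictured with a small Python sketch. The layout, the feature names (such as `category` and `pos`) and the identifier `uw:table` are assumptions made for illustration only; the actual entry format and tags are those defined in the Dictionary Specs and the Tagset.

```python
# Illustrative sketch only: the structures, feature names and the "uw:table"
# identifier are hypothetical, not the official Dictionary Specs format.

# UNLdic: UW -> semantic (language-independent) features
unl_dic = {
    "uw:table": {"category": "noun", "class": "artifact", "concrete": True},
}

# NLdic (English here): entry -> morphological and syntactic features
nl_dic = {
    "table": {"pos": "noun", "number": "singular"},
    "the": {"pos": "article"},
}

# UNL-NLdic: lexical mappings between UWs and NL entries
unl_nl_dic = [("uw:table", "table")]


def nl_entries_for(uw):
    """Return the NL entries mapped to a given UW."""
    return [nl for mapped_uw, nl in unl_nl_dic if mapped_uw == uw]


def unmapped_nl_entries():
    """Return NL entries with no UW counterpart (e.g. articles)."""
    mapped = {nl for _, nl in unl_nl_dic}
    return sorted(entry for entry in nl_dic if entry not in mapped)
```

In this toy data, `nl_entries_for("uw:table")` returns `["table"]` and `unmapped_nl_entries()` returns `["the"]`, which mirrors the point that mappings exist only where available.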

UNL Knowledge Base (UNLKB)

Main article: UNL Knowledge Base

The UNL Dictionary is simply a flat list of UW's and their corresponding classifiers (such as lexical category, semantic class, abstractness, cardinality, etc.). The UNL Dictionary does not contain any distinguisher, i.e., any information that can be used to differentiate a given UW from the others that belong to the same class. This information is provided in the UNL Knowledge Base, or UNLKB, which is a semantic network made of relations that are necessary to define UW's.

The UNL Knowledge Base is expected to represent the intension (the meaning) of UW's.



UW's are grouped in several different lexical databases:

  • The UNL Dictionary is a flat list of UW's with the corresponding semantic features. It is divided into three different nested dictionaries: the UNL Core Dictionary, the UNL Abridged Dictionary and the UNL Unabridged Dictionary. The UNL Core Dictionary contains permanent UW's which are supposed to be lexicalized in all languages; the UNL Abridged Dictionary contains permanent UW's which are lexicalized in at least two language families (and therefore includes the UNL Core Dictionary); the UNL Unabridged Dictionary, which contains the UNL Abridged Dictionary, holds the whole set of permanent UW's (i.e., the concepts that are lexicalized in at least one language).
  • The UNL Knowledge Base is a network where UW's are interconnected by the relations of UNL. Unlike the UNL Dictionary, which brings only general features (such as lexical category, semantic class, abstractness, cardinality, etc.), the UNL KB provides the information that distinguishes a UW from the others in its class. In the UNL KB, it is stated, for instance, that the UW "dog" is linked to the UW's "domesticated", "carnivorous", "mammal", etc.
  • The UNL Ontology is a part of the UNL Knowledge Base. It is a network where UW's are interconnected by the ontological relations of UNL, i.e., "is-a-kind-of" ("icl") and "is-an-instance-of" ("iof").
  • The UNL Memory is also a network where UW's are interconnected by the relations of UNL, but, unlike the UNL Knowledge Base, which brings the intension of a UW, the UNL Memory brings its extension, i.e., the set of instances of a UW. In the UNL Memory, it is stated, for instance, that the UW "dog" may be the agent of the UW "to bite", the object of the UW "to eat", the instrument of the UW "to chase", etc.
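
The nesting of the Core, Abridged and Unabridged dictionaries can be sketched as a membership rule. The function below is only one possible reading of the criteria above; the language codes, the family table and the name `dictionary_tiers` are toy assumptions and do not come from the UNL specification.

```python
# Illustrative sketch: language codes and family names are toy data chosen for
# the example; they are not part of the UNL specification.
FAMILY_OF = {"en": "Indo-European", "pt": "Indo-European", "ja": "Japonic"}
ALL_LANGUAGES = set(FAMILY_OF)


def dictionary_tiers(lexicalized_in):
    """Return the nested UNL dictionaries a permanent UW would belong to,
    given the set of languages in which its concept is lexicalized."""
    languages = set(lexicalized_in) & ALL_LANGUAGES
    if not languages:
        return set()  # lexicalized nowhere: not a permanent UW in this sketch
    tiers = {"unabridged"}  # lexicalized in at least one language
    families = {FAMILY_OF[lang] for lang in languages}
    is_core = languages == ALL_LANGUAGES
    # The Core Dictionary is included in the Abridged one by definition.
    if is_core or len(families) >= 2:
        tiers.add("abridged")
    if is_core:
        tiers.add("core")
    return tiers
```

With this toy data, a concept lexicalized in `{"en", "ja"}` falls in the Abridged and Unabridged dictionaries, while one lexicalized only in `{"en", "pt"}` (a single family) falls in the Unabridged dictionary alone.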




Consider, for instance, the case of the UW corresponding to the concept "a piece of furniture having a smooth flat top that is usually supported by one or more vertical legs", which may be conveyed by the English word "table". The UNL Dictionary records only the information that "table" is a nominal concrete concept which belongs to the class of artifacts. The information that "table" is "a piece of furniture having a smooth flat top" is stated in the UNLKB, where the UW corresponding to "table" is linked to several other UW's (such as "furniture", "smooth flat top", etc.), in order to specify its meaning.
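
The "table" example can be sketched as a small set of relation triples. This is a simplified illustration, not the UNLKB storage format: the plain strings stand in for real UW's, linking "table" to "furniture" through "icl" is an assumption, and the label "has" is a placeholder rather than an actual UNL relation.

```python
# Illustrative sketch: triples of (source UW, relation, target UW).
# "icl" and "iof" are the ontological relations of UNL; the "has" label and
# the choice of "icl" for table/furniture are assumptions for this example.
unlkb = {
    ("table", "icl", "furniture"),
    ("table", "has", "smooth flat top"),
}

ONTOLOGICAL_RELATIONS = {"icl", "iof"}


def ontology(kb):
    """Return the part of the knowledge base made of ontological relations."""
    return {triple for triple in kb if triple[1] in ONTOLOGICAL_RELATIONS}


def definition(kb, uw):
    """Return the (relation, target) pairs that define a given UW."""
    return sorted((rel, target) for source, rel, target in kb if source == uw)
```

Here `ontology(unlkb)` keeps only the "icl" triple, whereas `definition(unlkb, "table")` returns every link that contributes to the meaning of "table".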

Example Bases

In the UNL System, there are two different types of example bases:

  • The UNL Example Base, or simply UNLEB, is a network with frequent relations between UW's
  • The UNL-NL Memory, or UNL Memory Base, or simply UNL-NLMB, is a list of frequent mappings between UNL and a given natural language

The UNLEB is a monolingual resource and subsumes the UNLKB. The difference is that the UNLKB contains only necessary relations between UW's, whereas the UNLEB, which is corpus-based, brings any frequent relation between UW's. For instance, the idea that a "table" is "supported by one or more vertical legs" is not represented in the UNLKB because it is not supposed to be necessary (there are tables that are not supported by legs). This information, like the information that tables are normally round or square, that they are made of hard materials, etc., is represented in the UNLEB. The UNLEB extends and complements the UNLKB.
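
The subsumption relation between the two resources can be sketched with sets of triples. The relation labels `supported_by` and `shape`, and the use of plain strings in place of UW's, are placeholders invented for this example; only the necessary/frequent distinction comes from the description above.

```python
# Illustrative sketch: relation labels such as "supported_by" and "shape" are
# placeholders, not relations taken from the UNL specification.

# UNLKB: necessary relations only
unlkb = {
    ("table", "icl", "furniture"),
}

# Frequent (corpus-based) relations that are not necessary
frequent_relations = {
    ("table", "supported_by", "legs"),
    ("table", "shape", "round"),
    ("table", "shape", "square"),
}

# UNLEB: subsumes the UNLKB and adds the frequent relations
unleb = unlkb | frequent_relations


def only_in_example_base(uw):
    """Return relations about a UW that are frequent but not necessary."""
    return sorted(triple for triple in unleb - unlkb if triple[0] == uw)
```

Every triple of `unlkb` is also in `unleb`, while the "legs" and "shape" triples appear only in `unleb`.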

The UNL-NLMB is a bilingual database. The main difference between the UNL-NLdic and the UNL-NLMB is that the former involves only lexical units (i.e., entries defined as such in the UNL and the NL dictionaries) whereas the latter involves translation units, which may include several lexical units.

Notes

  1. Not all NL dictionary entries can be mapped onto UNL. Articles, prepositions, conjunctions and other particles have no counterpart in UNL.