How to create a UW

From UNL Wiki

Revision as of 17:58, 14 February 2014

The UNL Dictionary is never completed. It is expected to contain all the concepts that are lexicalized in at least one language. These include:

  • local concepts (such as "ilunga", from Tshiluba, which means "a person who is ready to forgive any transgression a first time and then to tolerate it for a second time, but never for a third time");
  • local named entities (rivers, mountains, beaches, cities, states, neighborhoods, brands, companies, people, etc.); and
  • local products and practices (food, clothing, rituals, festivities, etc.).

All these concepts, if lexicalized in at least one language, must be included in the UNL Dictionary as Universal Words.

Universal Word (UW)

A UW is a concept endowed with semantic accessibility. Semantic accessibility is granted when the concept is introduced into the UNL Knowledge Base (UNLKB), i.e., when the concept is connected to other existing concepts. Thereafter, the concept may be handled even by languages that do not have it yet.[1]

To include a UW in the UNLKB is to define its UCI (Uniform Concept Identifier), which is made of two parts:

  • the UCL (Uniform Concept Locator), which is a 9-digit number, automatically assigned by the machine; and
  • the UCN (Uniform Concept Name), which is an expression in the format:
      LRU(RELATION>CLASSIFIER)

In the above:

  • LRU stands for Lexical Realisation Unit, i.e., the name of the entity/concept. It can be a proper name (such as "Pablo Picasso", "Guernica", "Spanish Civil War", "Spanish Republican Armed Forces", "Facebook", "Candy Crush", etc.) or a common name ("paella", "baba ghanoush", "latifundium", "ilunga", etc.). For the time being, in order to ensure cross-language understanding, the name must be expressed in the way it is normally translated into English (e.g., "Spain" instead of "España", "Greece" instead of "Ελλάδα", "Egypt" instead of "مصر", "Spanish Civil War" instead of "Guerra Civil Española", etc.). Note, however, that many concepts are only transliterated into English: "paella", "baba ghanoush", "latifundium" and "ilunga" normally appear as such in English texts, even though they are not English words and are therefore not really translated. In these cases, the words are normally set in italics or between quotes in English texts, or are followed by a translator's note. In any case, the LRU must be a "lexical unit", i.e., a real word (either simple or complex), and never an expression used to define the word. For instance, the LRU for "baba ghanoush" is "baba ghanoush" and not "dish of eggplant mashed and mixed with olive oil and various seasonings".
  • CLASSIFIER is a category used to disambiguate and classify the LRU. It describes a major class, such as "person", "country", "city", "brand", etc.
  • RELATION is any of the Universal Relations that can be used to link the LRU to the CLASSIFIER.
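
The structure described above can be illustrated with a minimal Python sketch. It is not part of the UNL tooling: the names `UCN`, `parse_ucn`, `format_ucn` and `is_valid_ucl` are hypothetical, the regular expression assumes that the relation is a short lowercase label and that the classifier contains no parentheses, and the UCL is assumed to be written as a string of nine digits. Since the UCL is assigned by the machine, the sketch only checks its shape and never generates one.

```python
import re
from typing import NamedTuple


class UCN(NamedTuple):
    """The three parts of a Uniform Concept Name (hypothetical container)."""
    lru: str         # Lexical Realisation Unit, e.g. "Pablo Picasso"
    relation: str    # Universal Relation label, e.g. "iof" or "icl"
    classifier: str  # major class, e.g. "person"


# Assumed shape: LRU, "(", lowercase relation, ">", classifier, ")".
_UCN_PATTERN = re.compile(
    r"^(?P<lru>.+?)\((?P<relation>[a-z]+)>(?P<classifier>[^()]+)\)$"
)


def parse_ucn(text: str) -> UCN:
    """Split a string of the form LRU(RELATION>CLASSIFIER) into its parts."""
    match = _UCN_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not in LRU(RELATION>CLASSIFIER) form: {text!r}")
    return UCN(match["lru"].strip(), match["relation"], match["classifier"].strip())


def format_ucn(ucn: UCN) -> str:
    """Rebuild the LRU(RELATION>CLASSIFIER) string from its parts."""
    return f"{ucn.lru}({ucn.relation}>{ucn.classifier})"


def is_valid_ucl(ucl: str) -> bool:
    """Check only the shape of a UCL: exactly nine ASCII digits."""
    return re.fullmatch(r"[0-9]{9}", ucl) is not None
```

For example, `parse_ucn("Pablo Picasso(iof>person)")` yields the LRU "Pablo Picasso", the relation "iof" and the classifier "person", and `format_ucn` turns those parts back into the original string.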

icl and iof are Universal Relations and stand, respectively, for is-a-kind-of (icl) and is-an-instance-of (iof). The relation "icl" must be used when the concept is common, whereas "iof" must be used when the concept is proper. Compare the cases below:

  • Pablo Picasso is an instance (and not a type) of person, hence: Pablo Picasso(iof>person), and not Pablo Picasso(icl>person).
  • A painter is a type (and not an instance) of person, hence: painter(icl>person), and not painter(iof>person).
  • Metropolis(iof>city) is a specific city (the place where Superman lives).
  • metropolis(icl>city) is a type of city (a large city).
  • Paris is an instance (and not a type) of city, hence: Paris(iof>city), and not Paris(icl>city).
  • A metropolis is a type (and not an instance) of city, hence: metropolis(icl>city), and not metropolis(iof>city).[2]
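
The choice between the two relations can be sketched in Python as follows. This is only an illustration: `build_ucn` and its `is_proper_name` flag are hypothetical names, and deciding whether a concept is proper or common is left to the person creating the UW.

```python
def build_ucn(lru: str, classifier: str, is_proper_name: bool) -> str:
    """Build a UCN string, picking the relation from the kind of concept.

    Proper names are instances of their classifier (iof);
    common names are kinds of their classifier (icl).
    """
    relation = "iof" if is_proper_name else "icl"
    return f"{lru}({relation}>{classifier})"


# The cases discussed above:
print(build_ucn("Pablo Picasso", "person", is_proper_name=True))  # Pablo Picasso(iof>person)
print(build_ucn("painter", "person", is_proper_name=False))       # painter(icl>person)
print(build_ucn("Metropolis", "city", is_proper_name=True))       # Metropolis(iof>city)
print(build_ucn("metropolis", "city", is_proper_name=False))      # metropolis(icl>city)
```

Note that the same LRU spelling can yield two different UWs: capitalisation alone does not decide the relation, so the flag is passed explicitly.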
Software