Semantic network

From UNL Wiki
Revision as of 21:00, 18 September 2013 by Martins (Talk | contribs)

The main goal of the UNL is to represent natural language meaning, i.e., the information conveyed by natural language documents, in a machine-tractable format. In the UNL framework, this information is represented by a semantic network, i.e., a network of semantic relations between concepts. This semantic network, or UNL graph, is made of three types of discrete semantic entities: Universal Words, Universal Relations and Universal Attributes. Universal Words, or simply UW's, are the nodes of the network; Universal Relations are the arcs linking UW's; and Universal Attributes are used to instantiate UW's. This three-layered representation model is the cornerstone of the UNL and distinguishes it from other semantic networks, which normally propose only two levels: edges and vertices.

However, this three-layered representation poses several problems for UNLization, because it is not always clear what each type of unit is supposed to represent. One difficulty concerns what is to be represented as a UW (i.e., as a node in the UNL graph) and what is to be represented as a relation between UW's. How many UW's are there, for instance, in the sentence "Charles Dickens was the author of Oliver Twist"? Should "author" be represented as a UW or as a relation between "Charles Dickens" and "Oliver Twist"? Should the verb "to be" be represented as a UW or as a relation between "Charles Dickens" and "author"? Should the preposition "of" be represented as a UW or as a relation between "author" and "Oliver Twist"?

Given the difficulty of categorizing concepts, the UNL adopts the following principles:

1. If the information can only be conveyed by open lexical categories (nouns, adjectives, adverbs and verbs), or by pronouns and numbers[1],
   it is represented by UW's;

2. If the information can be conveyed, in any language, 
   by grammatical categories (affixes, articles, auxiliary verbs, copula, classifiers, conjunctions, interjections, prepositions), or
   by syntactic phenomena (word order, agreement, case marking), 
   it is represented
   2.1. as attributes, if the information is not relational, i.e., if it can be associated with a single node (or hyper-node) in the graph; or
   2.2. as relations, if the information is relational, i.e., if it is used to link two nodes in the graph.
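
The two principles amount to a small decision procedure. The following Python sketch is only an illustration of that reading: the names Unit and classify and the two boolean parameters are hypothetical and are not part of the UNL itself.

```python
from enum import Enum


class Unit(Enum):
    """The three types of entity in a UNL graph."""
    UW = "universal word"
    ATTRIBUTE = "attribute"
    RELATION = "relation"


def classify(open_class_only: bool, relational: bool = False) -> Unit:
    """Apply principles 1, 2.1 and 2.2 to one piece of information.

    open_class_only: the information can only be conveyed by open lexical
        categories, pronouns or numbers (principle 1).
    relational: the information links two nodes in the graph (2.2)
        rather than attaching to a single node or hyper-node (2.1).
    """
    if open_class_only:
        return Unit.UW
    return Unit.RELATION if relational else Unit.ATTRIBUTE
```

Note that the second parameter is only consulted when the first is false: relationality matters only for information that some language can convey by grammatical means.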

Let's consider, for instance, the case of "Charles Dickens was the author of Oliver Twist". In this sentence, we notice the following:

  • The concepts of "Charles Dickens" and "Oliver Twist" can only be realized by an open lexical category (noun) and, therefore, must be represented as UW's.
  • The concept of "author" (= "writer" or "creator") can only be fully[2] realized by an open lexical category (noun) and, therefore, must be represented as a UW.
  • The concept of past (as in "was") is realized, in several languages, by inflectional affixes (as the English suffix "-ed", in "killed") and affects only one UW (the verb "to be"). It must be represented, therefore, by an attribute: @past
  • The concept of definite (as in "the") is also realized, in several languages, by affixes (as the Romanian suffix "-ul", in "omul" = the man) and, even in English, it is represented by a closed class category (article). As it is non-relational, it must be represented by an attribute: @def
  • The concept of "to be" (= to equal in identity) is not lexicalized in several languages and, even in English, it is not represented by an open lexical category[3]. However, as it involves more than one UW (it links the subject to the predicate), it must be represented as a relation: aoj
  • The concept conveyed by "of" (= origin) is represented, in several languages, by affixes (case markers) and, even in English, is represented by a closed class category (preposition). As it is relational (it links "author" to "Oliver Twist"), it must be represented as a relation: cnt.
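
The outcome of this analysis can be sketched as a small data structure. The Python below is a minimal illustration, not an official UNL serialization: the class names, the label(source, target) rendering, the direction of each arc, and the choice to attach both @def and @past to the "author" node (since "to be" itself becomes the relation aoj rather than a node) are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A Universal Word together with the attributes that instantiate it."""
    uw: str
    attributes: list[str] = field(default_factory=list)

    def render(self) -> str:
        # Attributes are appended to the UW, each preceded by a dot.
        return self.uw + "".join(f".{a}" for a in self.attributes)


@dataclass
class Relation:
    """A labelled arc between two nodes."""
    label: str
    source: Node
    target: Node

    def render(self) -> str:
        return f"{self.label}({self.source.render()}, {self.target.render()})"


# Three UW's: the two proper nouns and "author".
dickens = Node("Charles Dickens")
twist = Node("Oliver Twist")
# Assumption: both attributes are placed on "author".
author = Node("author", ["@def", "@past"])

# Two relations: aoj for "to be", cnt for "of".
graph = [
    Relation("aoj", author, dickens),
    Relation("cnt", author, twist),
]

lines = [relation.render() for relation in graph]
for line in lines:
    print(line)
```

Under these assumptions the script prints aoj(author.@def.@past, Charles Dickens) and cnt(author.@def.@past, Oliver Twist): three nodes, two arcs and two attributes, one entity for each of the concepts listed above.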

Notes

  1. Pronouns are represented as UW's because they replace (and act as) nouns; numerals are represented as UW's because they do not form a truly closed set (the set of numbers is infinite).
  2. One may argue that this concept may be represented by derivational suffixes, such as -er (as in "writer") or -or (as in "creator"), or by the preposition "by" (as in "Oliver Twist by Charles Dickens"), but this is not accurate: both -er and -or are used for "one who performs an action", and "by" rather denotes an agent. None of them can fully replace "author" in this context.
  3. Linking verbs (copula) are not really productive and cannot be said to be an open lexical category.