Semantic network

From UNL Wiki
Latest revision as of 09:23, 5 January 2014

The main goal of the UNL is to represent, in a machine-tractable format, the information conveyed by natural language documents. In the UNL framework, this information is represented by a semantic network, i.e., a network that represents semantic relations between concepts. This semantic network, or UNL graph, is made of three different types of discrete semantic entities: Universal Words, Universal Relations and Universal Attributes. Universal Words, or simply UWs, are the nodes in the semantic network; Universal Relations are the arcs linking UWs; and Universal Attributes are used to instantiate UWs.

For instance, the English sentence "Peter killed Mary yesterday with a knife in the kitchen because of John" could be represented, in simplified UNL, as:

Graph0.png

In the above:

  • "Peter", "kill", "Mary", "yesterday", "knife", "kitchen" and "John" are Universal Words
  • "agt" (agent), "obj" (patient), "tim" (time), "ins" (instrument), "plc" (place) and "rsn" (reason) are Universal Relations
  • "@past", "@def" and "@indef" are Universal Attributes
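The figure is not reproduced here, but its three layers can be sketched as a small data structure. This is a minimal Python sketch, not part of any UNL toolkit. The classes `UW` and `Relation` are hypothetical, and headwords are simplified. Relations are printed in the rel(a;b) notation used later on this page. The placement of @past on "kill", @indef on "knife" and @def on "kitchen" is an assumption based on the English sentence.

```python
from dataclasses import dataclass, field


@dataclass
class UW:
    """Hypothetical Universal Word: a graph node plus the attributes that qualify it."""
    headword: str
    attributes: list[str] = field(default_factory=list)

    def __str__(self) -> str:
        # Attributes are appended to the headword, e.g. "kill.@past"
        return self.headword + "".join("." + attr for attr in self.attributes)


@dataclass
class Relation:
    """Hypothetical Universal Relation: a labelled arc linking exactly two UWs."""
    label: str
    source: UW
    target: UW

    def __str__(self) -> str:
        return f"{self.label}({self.source};{self.target})"


# Attribute placement below is an assumption (the original figure is not shown)
kill = UW("kill", ["@past"])
peter, mary, yesterday, john = UW("Peter"), UW("Mary"), UW("yesterday"), UW("John")
knife = UW("knife", ["@indef"])
kitchen = UW("kitchen", ["@def"])

graph = [
    Relation("agt", kill, peter),      # agent
    Relation("obj", kill, mary),       # patient
    Relation("tim", kill, yesterday),  # time
    Relation("ins", kill, knife),      # instrument
    Relation("plc", kill, kitchen),    # place
    Relation("rsn", kill, john),       # reason
]

for rel in graph:
    print(rel)
```

Keeping attributes on the nodes rather than on the arcs mirrors the three-layered model. A relation only ever links two UWs, while an attribute only ever qualifies one.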

General Principles

The three-layered representation model poses several problems for UNLization, as the distinction between what each type of unit is supposed to represent is not always clear. One difficulty concerns what is to be represented as a UW (i.e., as a node in the UNL graph) and what is to be represented as a link between UWs. How many UWs are there, for instance, in the sentence "Charles Dickens was the author of Oliver Twist"? Should "author" be represented as a UW or as a relation between "Charles Dickens" and "Oliver Twist"? Should the verb "to be" be represented as a UW or as a relation between "Charles Dickens" and "author"? Should the preposition "of" be represented as a UW or as a relation between "author" and "Oliver Twist"?

Given the difficulty of categorizing concepts, the UNL adopts the following principles:

1. If the information can only be conveyed by open lexical categories (nouns, adjectives, adverbs and verbs), or if it is conveyed by pronouns or numbers,
   it is represented by UWs, i.e., as nodes in the UNL graph;

2. If the information can be conveyed, in any language,
   by closed class categories (affixes, determiners, auxiliary verbs, copula, classifiers, conjunctions, interjections and prepositions), or
   by syntactic phenomena (word order, agreement, government and case marking),
   it is represented
   2.1. as attributes, if the information is not relational, i.e., if it can be associated with a single node (or hyper-node) in the graph;
   2.2. as relations, if the information is relational and reducible to the set of Universal Relations; or
   2.3. as relations and attributes, if the information is relational but not reducible to the set of Universal Relations.
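The principles above amount to a small decision procedure. The Python sketch below encodes it under one assumption: the linguistic judgment has already been made and is supplied as boolean flags. That judgment covers whether the information is conveyed only by open classes, whether it is relational, and whether it fits an existing Universal Relation. The `Information` and `Encoding` types are hypothetical illustrations, not part of the UNL specification. The sample inputs are taken from the examples on this page.

```python
from dataclasses import dataclass
from enum import Enum


class Encoding(Enum):
    UW = "UW"
    ATTRIBUTE = "attribute"
    RELATION = "relation"
    RELATION_AND_ATTRIBUTE = "relation and attribute"


@dataclass
class Information:
    """Hypothetical model of one piece of information identified in a sentence."""
    description: str
    open_class_only: bool = False    # conveyed only by nouns, adjectives, adverbs, verbs
    pronoun_or_number: bool = False  # pronouns and numbers are always UWs
    relational: bool = False         # links two nodes rather than qualifying one
    reducible: bool = True           # fits an existing Universal Relation


def encode(info: Information) -> Encoding:
    # Principle 1: open lexical categories, pronouns and numbers become nodes
    if info.open_class_only or info.pronoun_or_number:
        return Encoding.UW
    # Principle 2: closed class categories or syntactic phenomena
    if not info.relational:
        return Encoding.ATTRIBUTE                 # 2.1
    if info.reducible:
        return Encoding.RELATION                  # 2.2
    return Encoding.RELATION_AND_ATTRIBUTE        # 2.3


examples = [
    Information("Mary", open_class_only=True),
    Information("past tense of 'died'"),
    Information("Mary undergoes dying", relational=True),
    Information("book is 'on' the table", relational=True, reducible=False),
]
for info in examples:
    print(f"{info.description}: {encode(info).value}")
```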

Examples

(1) Mary died

Graph1.png

Consider, for instance, the sentence:
(1) Mary died.
This sentence is said to convey the following information:
(1a) There is Mary (i.e., there is someone named Mary)
(1b) There is the process of dying
(1c) There is a relation between "Mary" and "die" (i.e., "Mary" undergoes a change of state expressed by "dying")
(1d) The fact described by (1c) happened in the past
The information conveyed by (1a) and (1b) can only be expressed by open lexical categories (noun and verb, respectively) and, therefore, (1a) and (1b) are represented as UWs, i.e., as nodes in the graph. The information conveyed by (1c) cannot be said to be represented by a lexical item (such as "Mary" or "die"); it is conveyed by the position of the words in the sentence, i.e., by the fact that "Mary" comes right before "died". This information is relational, i.e., it links "Mary" and "die". This relation ("patient") is already part of the repertoire of Universal Relations and can be expressed by the tag "obj". The information conveyed by (1d) is not relational, in the sense that it does not link two nodes, but rather modifies the whole relation between "Mary" and "die". As it is not relational, and can be expressed by a closed class category (the suffix "-d"), it is represented by the attribute @past, which is assigned to the head of the relation (the UW "die").
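Putting (1a) to (1d) together yields a single relation with one attribute. The following is a minimal Python sketch of that assembly. It assumes the rel(head;dependent) notation used on this page and uses plain dictionaries and tuples as a hypothetical in-memory format.

```python
# (1a) and (1b): open-class information becomes two UWs (nodes)
nodes = {"Mary": [], "die": []}

# (1d): non-relational past tense becomes an attribute on the head of the relation
nodes["die"].append("@past")

# (1c): relational and reducible to the Universal Relation "obj" (patient)
relations = [("obj", "die", "Mary")]


def render(uw: str) -> str:
    """Append a node's attributes to its headword, e.g. 'die.@past'."""
    return uw + "".join("." + attr for attr in nodes[uw])


unl = [f"{label}({render(head)};{render(dep)})" for label, head, dep in relations]
print("\n".join(unl))  # obj(die.@past;Mary)
```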

(2) The book is on the table

Graph2.png

Now consider the sentence:
(2) The book is on the table.
This sentence is said to convey the following information:
(2a) There is a book
(2b) We know this book (i.e., this book is definite)
(2c) There is a table
(2d) We know this table (i.e., this table is definite)
(2e) There is a relation between "book" and "table"
(2f) The relation (2e) is of the type "on" (and not "under", or "inside")
The information conveyed by (2a) and (2c) is, again, expressed by open lexical categories (nouns, in both cases) and cannot be reduced to any closed class category. The information conveyed by (2b) and (2d), which is expressed by the article "the", modifies single nodes and, accordingly, is not relational: (2b) modifies (2a), and (2d) modifies (2c). It is therefore expressed by attributes (@def): book.@def and table.@def. The information conveyed by (2e) can be associated with the copula ("is"), and it is clearly relational: it links "book" to "table". This relation describes a "place", which is also part of the repertoire of Universal Relations (expressed by "plc"). However, "plc(book;table)" is too vague to express the information conveyed by the sentence, which explicitly indicates that the book is "on" the table. The information conveyed by (2f) is also relational and is expressed by a preposition ("on"). Ideally, we would have a relation "place_on", and we would represent (2) as "place_on(book;table)" instead of simply "plc(book;table)". But this would lead to many other relations: place_on, place_above, place_under, place_in_front_of, and so on. In order to keep the repertoire of relations from proliferating, we have decided to express these details by a combination of relations and attributes, i.e., "plc(book;table.@on)".
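The trade-off can be made concrete. Instead of one relation per spatial configuration, a single relation (plc) is reused, and the configuration is carried as an attribute of the place. In the Python sketch below, the function `locate` and the table `SPATIAL_ATTRIBUTE` are hypothetical. The attribute @on comes from the text above, while @under and @above are assumed by analogy. Chaining @def and @on on the same node (table.@def.@on) is an assumption about how the two notations combine.

```python
# Hypothetical table: each spatial preposition maps to an attribute on the
# place, so all of them share the single Universal Relation "plc".
SPATIAL_ATTRIBUTE = {
    "on": "@on",        # from the text above
    "under": "@under",  # assumed by analogy
    "above": "@above",  # assumed by analogy
}


def locate(thing: str, place: str, preposition: str) -> str:
    """Encode '<thing> <preposition> <place>' as plc(thing;place.@attribute)."""
    return f"plc({thing};{place}.{SPATIAL_ATTRIBUTE[preposition]})"


# (2) The book is on the table: both nouns also carry @def, from (2b) and (2d)
print(locate("book.@def", "table.@def", "on"))     # plc(book.@def;table.@def.@on)
print(locate("book.@def", "table.@def", "under"))  # plc(book.@def;table.@def.@under)
```

Supporting a new spatial preposition only extends the attribute table. The repertoire of relations stays at plc.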

(3) Charles Dickens was the author of Oliver Twist

Graph3.png

Finally, let's return to the case of
(3) Charles Dickens was the author of Oliver Twist.
In this sentence, we notice the following:
(3a) There is Charles Dickens
(3b) There is Oliver Twist
(3c) There is the concept of "author"
(3d) There is a relation between "Charles Dickens" and "author" (we can say that "Charles Dickens is an author")
(3e) There is a relation between "Oliver Twist" and "author" (we can say that "Oliver Twist is the product of an author")
(3f) There is a relation between "Charles Dickens" and "Oliver Twist" (and this relation is mediated by the concept of "author")
(3g) The relation described by (3f) happened in the past
Once again, (3a) and (3b) can only be realized by an open lexical category (noun) and, therefore, must be represented as UWs. The concept conveyed by (3c) is far more controversial. One may argue that this concept could be represented by derivational suffixes, such as -er (as in "writer") or -or (as in "creator"), or by the preposition "by" (as in "Oliver Twist by Charles Dickens"). This is not really accurate, however, since both -er and -or are used for "one who performs an action", and "by" rather denotes an agent. None of them can fully replace "author" in this context, i.e., in the sense of someone who wrote a book. Accordingly, (3c) should also be expressed by a UW. The information conveyed by (3d) and (3e) is relational and fits existing relations in the repertoire of Universal Relations: attribute ("author" is an attribute of "Charles Dickens", or aoj(Charles Dickens;author)) and content ("Oliver Twist" is the theme of "author", or cnt(author;Oliver Twist)). The relation (3f) poses a problem for UNL because UNL relations are necessarily binary, i.e., they take exactly two arguments. This is solved by the general assumption that, given rel(a;b) and rel(b;c), "a" is related to "c" through "b". As for (3g), it is another source of difficulty. The scope of the past tense, in this case, is not the whole sentence (the fact that Charles Dickens wrote Oliver Twist is still true); the information of past is rather related to Charles Dickens, in the sense that it indicates that he is no longer alive. In any case, this information is not relational and must be expressed by the attribute @past, assigned to "Charles Dickens".
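The workaround for (3f) can be illustrated with a one-step search over the graph: any two relations that share a middle node b link a to c through b. The Python sketch below is a hypothetical illustration; the names `mediated` and `render` are not UNL tooling. It follows this page's notation, including aoj(Charles Dickens;author), cnt(author;Oliver Twist) and @past on "Charles Dickens". It only follows a single intermediate node.

```python
from itertools import product

# Attributes per UW; @past is assigned to "Charles Dickens", as argued for (3g)
attributes = {"Charles Dickens": ["@past"], "author": [], "Oliver Twist": []}

# Binary relations as (label, first argument, second argument), i.e. rel(a;b)
relations = [
    ("aoj", "Charles Dickens", "author"),  # (3d)
    ("cnt", "author", "Oliver Twist"),     # (3e)
]


def render(uw: str) -> str:
    """Append a UW's attributes to its headword, e.g. 'Charles Dickens.@past'."""
    return uw + "".join("." + attr for attr in attributes[uw])


def mediated(rels):
    """Yield (a, b, c) whenever rel(a;b) and rel(b;c) share the middle node b."""
    for (_, a, b1), (_, b2, c) in product(rels, repeat=2):
        if b1 == b2 and a != c:
            yield a, b1, c


for label, first, second in relations:
    print(f"{label}({render(first)};{render(second)})")
for a, b, c in mediated(relations):
    print(f"{a} is related to {c} through {b}")  # (3f)
```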

Software