I UNL Panel


Revision as of 15:28, 20 September 2012

The main purpose of the UNL Panel is to collect the opinions of specialists, from inside and outside the UNL Community, about technical issues of the UNL, so as to prepare the ground for an in-depth revision of the current specifications. The I UNL Panel, which has been proposed as an associated event to COLING'2012, is devoted to the set, the notation and the properties of UW's.

Rationale

The Universal Networking Language (UNL) is an artificial language created to process information across language barriers. It was initially proposed by the Institute of Advanced Studies of the United Nations University, in Tokyo, Japan, in 1996, and has been enhanced and promoted by the UNDL Foundation, in Geneva, Switzerland, under a mandate of the United Nations, since 2000.

Originally designed more than 15 years ago, the UNL has not escaped the effects of time and has not yet incorporated several recent advances in the domain of natural language processing. In order to prepare the ground for the necessary updates to the present specifications, the UNDL Foundation has set up the UNL Panel initiative and proposes a three-chapter dialogue with the UNL community and other researchers. In each chapter, the UNDL Foundation will invite specialists, from inside and outside the UNL Community, to present their positions and views about technical issues concerning the UNL. The first meeting will be dedicated to the Universal Words, the second will focus on relations and attributes, and the third will be devoted to the UNL document structure.

Structure

In order to take the best directions concerning the nature and the role of the UW's, the UNDL Foundation will listen to 6 specialists, from inside and outside the UNL Community, about the 5 questions below. These questions illustrate some theoretical and practical issues concerning UW's and have received several different answers. The main goal of the I UNL Panel is to discuss which answers would be most appropriate and feasible, considering the nature and role of the UNL and the state of the art of the theory and technology of natural language processing.

Participants are expected to use the particular cases below as starting points for their presentations, but we also expect them to suggest general procedures to be adopted in similar cases, which could either confirm or reject our current practices, defined in the section UW's, which have been the object of revision. Participants should understand, however, that only the structure of UNL is under discussion. The commitments, assumptions and properties of the UNL, which are the keystones of the language and are presented in the Introduction to UNL, should be taken for granted, and are expected to be used as the general framework for all the answers.

The specialists are requested to explain their positions both in a paper in question-answer format (to be published on the UNLweb) and in a 30-minute oral presentation (to be delivered during the meeting). The oral presentations will be followed by a discussion session, according to the tentative program below.

Program (tentative)

Saturday, December 15th, 2012

  • 09:00-09:30 - Opening session
  • 09:30-10:00 - General presentation of the questions
  • 10:00-10:30 - First presentation
  • 10:30-11:00 - Coffee-break
  • 11:00-11:30 - Second presentation
  • 11:30-12:00 - Third presentation
  • 12:00-14:00 - Lunch break
  • 14:00-14:30 - Fourth presentation
  • 14:30-15:00 - Fifth presentation
  • 15:00-15:30 - Sixth presentation
  • 15:30-16:00 - Coffee-break
  • 16:00-17:30 - Discussion session
  • 17:30-18:00 - Closing session

Background

Questions

Considering the commitments, assumptions and properties of the UNL, defined in Introduction to UNL, and
Considering the state of the art of the theory and technology on natural language processing,

Which would be the most appropriate and feasible answers to the questions below?

1) How many UW's should be recognized in the sentence below?
"Charles Dickens is generally regarded as the most important English novelist of the Victorian period"
The basic assumption of the UNL approach is that the information conveyed by natural languages can be formally and usefully represented through semantic networks composed of three different types of discrete semantic entities: UW's, relations and attributes. UW's are nodes in the UNL graph; relations are arcs between nodes; and attributes are specifiers that restrict the extension of nodes. This three-layered representation poses several problems for UNLization, as the distinction between these three entities is not always clear. Consider, for instance, the sentence above. How many UW's (either permanent or temporary) should be recognized in this sentence?
  • Should "Victorian period" be represented as a single UW ("Victorian period") or as two different UW's ("Victorian" and "period")?
  • Should the verb "to be" be represented as a UW or as a relation between "Charles Dickens" and "the most important English novelist of the Victorian period"? (Consider also the options "was" and "has been" in the same context.)
  • Should the preposition "of" be represented as a UW or as a relation between "the most important novelist" and "the Victorian period"? (Consider also the options "since", "from ... on", "in" or "during" instead of "of".)
  • Should "generally regarded as" be represented by UW's ("generally", "regarded", "as", for instance) or as an attribute (a downtoner, which lowers the truth effect of the declaration) to be assigned to the whole proposition "Charles Dickens is the most important English novelist of the Victorian period"?
  • Should the adverb "most" be represented as a UW or as a superlative marker (to be represented as an attribute assigned to the adjective "important")? (Consider also "greatest English novelist" instead of "most important English novelist".)
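The alternatives listed above can be made concrete with a small data model. The following is only a minimal sketch, not the UNL specification: the class names, the relation labels "mod" and "man", and the attribute "@superlative" are illustrative assumptions introduced here. It shows how the UW count for "most important novelist" changes depending on whether "most" is treated as a UW or as an attribute.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class UW:
    """A node of the graph: a headword plus optional attributes."""
    headword: str
    attributes: tuple = ()


@dataclass(frozen=True)
class Relation:
    """A labelled arc between two nodes."""
    label: str
    source: UW
    target: UW


@dataclass
class Graph:
    relations: list = field(default_factory=list)

    def nodes(self):
        """Return the distinct UW's that appear in any relation."""
        seen = []
        for rel in self.relations:
            for node in (rel.source, rel.target):
                if node not in seen:
                    seen.append(node)
        return seen


# Analysis A (assumed labels): "most" is a UW of its own.
novelist = UW("novelist")
important = UW("important")
most = UW("most")
most_as_uw = Graph([
    Relation("mod", novelist, important),
    Relation("man", important, most),
])

# Analysis B (assumed attribute): "most" becomes an attribute of "important".
important_superlative = UW("important", ("@superlative",))
most_as_attribute = Graph([
    Relation("mod", novelist, important_superlative),
])

print(len(most_as_uw.nodes()))         # 3
print(len(most_as_attribute.nodes()))  # 2
```

Under these assumptions the same phrase yields three UW's in one analysis and two in the other, which is the kind of divergence the question asks the panel to settle.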
2) Should "Charles Dickens" be represented as a permanent UW or as a temporary UW?
The UNL Dictionary contains only permanent UW's. Untranslatable expressions, even though transliterated, are not included in the dictionary, but may be used in the UNL graphs as temporary UW's. This is the obvious case for URL's, e-mail addresses, phone numbers, formulae etc. However, there are cases in which these criteria are still under dispute: proper names (of people, of places, of brands etc.), for instance. When should they be considered permanent UW's (and included in the UNL Dictionary) and when should they not? Consider, for instance, the case of "Charles Dickens". Should it be defined as a permanent UW and included in the UNL Dictionary? Or should it be treated as a temporary UW? Consider also the cases of "Charles J Dickens" (an American citizen born on 06/17/1949 who died on 10/21/2004); the "Charles Dickens Museum", located in London; the bar and restaurant "Charles Dickens", located in Southwark; the "Charles Dickens School", located in Kent; and other entities named "Charles Dickens". Consider the size (and the maintenance) of the UNL Dictionary, if you suggest treating them all as permanent UW's; otherwise, consider how to handle concepts that have not been included in the UNL Dictionary.
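The trade-off described above can be illustrated with a toy lookup. This is a hypothetical sketch only: the function `classify` and the set `toy_dictionary` are invented for illustration and do not correspond to any actual UNL Dictionary interface. It assumes, as the text states, that an expression is a permanent UW exactly when it has an entry in the dictionary, and a temporary UW otherwise.

```python
def classify(expression, dictionary):
    """Return 'permanent' if the expression has a dictionary entry, else 'temporary'."""
    return "permanent" if expression in dictionary else "temporary"


# Toy stand-in for the UNL Dictionary (assumption: a plain set of headwords).
toy_dictionary = {"novelist", "period"}

before = classify("Charles Dickens", toy_dictionary)

# Including the proper name makes it permanent, at the cost of one more entry.
toy_dictionary.add("Charles Dickens")
after = classify("Charles Dickens", toy_dictionary)

# Other entities sharing the name stay temporary until they are also added.
museum = classify("Charles Dickens Museum", toy_dictionary)

print(before, after, museum)  # temporary permanent temporary
```

In this sketch every named entity that should be permanent costs one more dictionary entry, which is the size and maintenance concern raised in the question.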
3) Should "beauty" (= "the qualities that give pleasure to the senses"), "beautiful" (= "delighting the senses"), "beautifully" (= "in a beautiful manner") and "beautify" (= "to make or become beautiful") be represented as simple or complex UW's?
In the current framework, UW's can be simple, compound or complex. A simple UW is represented as a node in the UNL graph. A compound UW is represented as a node with attribute(s). A complex UW is represented as a sub-graph, i.e., as a set of interlinked nodes. This offers different possibilities of representing the concepts above. For instance:
Simplified[1] UW candidates for "beauty", "beautiful", "beautifully" and "beautify"

Concept     | Simple UW   | Compound UW             | Complex UW
------------|-------------|-------------------------|----------------------------------------------
beauty      | beauty      | beauty                  | the qualities that give pleasure to the senses
beautiful   | beautiful   | beauty.@full_of         | delighting the senses
beautifully | beautifully | beauty.@full_of.@manner | in a beautiful manner
beautify    | beautify    | beauty.@full_of.@make   | to make or become beautiful

Which is the best way to represent these concepts? Consider the fact that some of these concepts are not lexicalized in all languages. Consider also the actual importance of part-of-speech for lexical semantics. Consider, finally, the actual "compositionality" of these concepts.
4) How should "serendipity" be represented in UNL?
In English, "the faculty of making fortunate discoveries by accident" may be represented by a single lexical item: "serendipity". In most languages, this concept is not lexicalized, although it can obviously be expressed through approximate periphrases (such as "heureux hasard", in French, or "descubrimiento inesperado", in Spanish). How should "serendipity" be represented in UNL? Should it be represented as a temporary UW (not to be included in the UNL Dictionary) or as a permanent UW (to be included in the UNL Dictionary)? In the former case, how should culture-bound, language-dependent concepts that are not included in the UNL Dictionary be dealt with? In the latter case, how should this concept be represented: as a simple or as a complex UW?
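The three candidate encodings of "beautiful" can be written out side by side. The sketch below is illustrative only: the classes `Node` and `Arc` and their `render` methods are assumptions made here, and the output follows the simplified notation used on this page (dot-separated attributes, and `obj(...)` as in the note on simplified representations), not the full UNL syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A UW node: a headword with zero or more attributes."""
    headword: str
    attributes: tuple = ()

    def render(self):
        # Attributes are appended with a dot, e.g. beauty.@full_of
        return self.headword + "".join("." + a for a in self.attributes)


@dataclass(frozen=True)
class Arc:
    """A relation between two nodes, rendered as label(source, target)."""
    label: str
    source: Node
    target: Node

    def render(self):
        return f"{self.label}({self.source.render()}, {self.target.render()})"


# Simple UW: a single node.
simple = Node("beautiful")

# Compound UW: a node with attribute(s).
compound = Node("beauty", ("@full_of",))

# Complex UW: a sub-graph, here the definition "delighting the senses".
complex_uw = [Arc("obj", Node("to delight"), Node("sense", ("@plural",)))]

print(simple.render())                        # beautiful
print(compound.render())                      # beauty.@full_of
print("; ".join(a.render() for a in complex_uw))  # obj(to delight, sense.@plural)
```

The sketch makes the structural difference visible: the simple and compound options remain a single node, while the complex option replaces the node with a set of interlinked nodes.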

Notes

  1. The representations are simplified here in order to be more didactic. Simple UW's cannot be as ambiguous or English-biased as "beauty". The same holds for attributes such as "@full_of", "@make" or "@manner". The complex UW is actually the definition of the word. It indicates that, instead of a UW, the concept must be represented by a whole graph depicting the definition of the concept. For instance: "delighting the senses" would be represented, in simplified UNL, as obj(to delight, sense.@plural).