Introduction to UNL

From UNL Wiki
Example

In the UNL approach, information conveyed by natural language is represented as a hypergraph composed of a set of directed binary labelled links (referred to as “relations”) between nodes or hypernodes (the “Universal Words”, or simply “UWs”), which stand for concepts. UWs can also be annotated with “attributes” representing context information.
 
 
For example, the English sentence ‘The sky was blue?!’ can be represented in UNL as follows:
 
 
[Figure: UNL graph of ‘The sky was blue?!’ (Unl.ht1.gif)]
 
 
In the example above, "sky(icl>natural world)" and "blue(icl>color)", which represent individual concepts, are UWs; "aoj" (= attribute of an object) is a directed binary semantic relation linking the two UWs; and "@def", "@interrogative", "@past", "@exclamation" and "@entry" are attributes modifying UWs.
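The example can be sketched as a small data model. This is an illustrative sketch, not an official UNL tool: the `UW` and `Relation` classes and their `to_unl` method are hypothetical, and both the placement of the attributes (`@def` on "sky", the others on "blue") and the argument order `aoj(blue, sky)` are assumptions, since the description above does not spell them out. Hypernodes are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UW:
    """A Universal Word: headword, constraint list and attached attributes."""
    headword: str
    constraints: str = ""
    attributes: tuple[str, ...] = ()

    def to_unl(self) -> str:
        # Render as headword(constraints) followed by .@attribute suffixes.
        text = self.headword
        if self.constraints:
            text += f"({self.constraints})"
        return text + "".join(f".{attr}" for attr in self.attributes)


@dataclass(frozen=True)
class Relation:
    """A directed binary labelled link from a source UW to a target UW."""
    label: str
    source: UW
    target: UW

    def to_unl(self) -> str:
        return f"{self.label}({self.source.to_unl()}, {self.target.to_unl()})"


# "The sky was blue?!" (attribute placement and argument order are assumed).
sky = UW("sky", "icl>natural world", ("@def",))
blue = UW("blue", "icl>color", ("@entry", "@past", "@interrogative", "@exclamation"))
graph = [Relation("aoj", blue, sky)]

for relation in graph:
    print(relation.to_unl())
```

Running it prints `aoj(blue(icl>color).@entry.@past.@interrogative.@exclamation, sky(icl>natural world).@def)`. The dataclasses are frozen so that a UW is hashable and the same node can be shared by several relations in a larger graph.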
 
 
UWs are intended to represent universal concepts and are written here as English words for readability. They consist of a "headword" (the UW root) and a "constraint list" (the UW suffix between parentheses), the latter being used to disambiguate the general concept conveyed by the former. The set of UWs constitutes the UNL Dictionary. The UWs are defined in the UNL Knowledge Base (UNLKB) and are exemplified in the UNL Example Base (UNLEB).
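The split between headword and constraint list can be illustrated with a small parser. The sketch assumes that a UW is written as `headword(constraints)`, that individual constraints are separated by commas, and that the constraint list contains no nested parentheses; `split_uw` is a hypothetical helper, not part of any UNL toolkit.

```python
import re

# headword, optionally followed by a parenthesised constraint list.
# Assumption: no nested parentheses inside the constraint list.
UW_PATTERN = re.compile(r"^(?P<headword>[^()]+?)(?:\((?P<constraints>[^()]*)\))?$")


def split_uw(uw: str) -> tuple[str, list[str]]:
    """Return the headword and the (comma-separated) constraints of a UW."""
    match = UW_PATTERN.match(uw.strip())
    if match is None:
        raise ValueError(f"not a well-formed UW: {uw!r}")
    constraints = match.group("constraints") or ""
    return match.group("headword"), [c.strip() for c in constraints.split(",") if c.strip()]


print(split_uw("sky(icl>natural world)"))  # ('sky', ['icl>natural world'])
print(split_uw("blue(icl>color)"))         # ('blue', ['icl>color'])
```

A bare headword such as `"sky"` yields an empty constraint list, and an unbalanced string such as `"sky("` raises `ValueError`.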
 
 
Relations are intended to represent semantic links between concepts, or sets of concepts, that hold in every existing language. They can be ontological (such as "icl" = is a kind of, used in the constraint lists above, and "iof" = is an instance of), logical (such as "and" and "or") or thematic (such as "agt" = agent, "ins" = instrument, "tim" = time, "plc" = place, etc.).
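The three families of relations can be captured as a lookup table. The sketch covers only the labels named in the paragraph above (the full inventory is larger); `RelationKind`, `RELATIONS` and `relation_kind` are hypothetical names chosen for illustration.

```python
from enum import Enum


class RelationKind(Enum):
    ONTOLOGICAL = "ontological"
    LOGICAL = "logical"
    THEMATIC = "thematic"


# Only the relation labels mentioned in the text, with short glosses.
RELATIONS: dict[str, tuple[RelationKind, str]] = {
    "icl": (RelationKind.ONTOLOGICAL, "is a kind of"),
    "iof": (RelationKind.ONTOLOGICAL, "is an instance of"),
    "and": (RelationKind.LOGICAL, "conjunction"),
    "or": (RelationKind.LOGICAL, "disjunction"),
    "agt": (RelationKind.THEMATIC, "agent"),
    "ins": (RelationKind.THEMATIC, "instrument"),
    "tim": (RelationKind.THEMATIC, "time"),
    "plc": (RelationKind.THEMATIC, "place"),
}


def relation_kind(label: str) -> RelationKind:
    """Return the family of a relation label, or raise ValueError if unknown."""
    try:
        return RELATIONS[label][0]
    except KeyError:
        raise ValueError(f"unknown relation label: {label!r}") from None


print(relation_kind("agt").value)  # thematic
print(relation_kind("iof").value)  # ontological
```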
 
 
Attributes represent information that cannot be conveyed by UWs and relations. They normally represent information on tense ("@past", "@future", etc.), reference ("@def", "@indef", etc.), modality ("@can", "@must", etc.), focus ("@topic", "@focus", etc.) and other closed-class categories.
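Attributes can be peeled off a UW occurrence and grouped by the categories listed above. This minimal sketch assumes the `uw.@attr1.@attr2` suffix notation and that ".@" never occurs inside a UW itself; the category table holds only the attributes named in this paragraph, so anything else (such as "@entry") lands in a catch-all "other" bucket. `split_attributes` is a hypothetical helper.

```python
# Only the attributes named in the text; other closed-class categories exist.
ATTRIBUTE_CATEGORIES = {
    "tense": {"@past", "@future"},
    "reference": {"@def", "@indef"},
    "modality": {"@can", "@must"},
    "focus": {"@topic", "@focus"},
}


def split_attributes(node: str) -> tuple[str, dict[str, list[str]]]:
    """Separate a UW such as 'blue(icl>color).@past' from its grouped attributes."""
    uw, *names = node.split(".@")
    grouped: dict[str, list[str]] = {}
    for name in names:
        attr = "@" + name
        category = next(
            (cat for cat, members in ATTRIBUTE_CATEGORIES.items() if attr in members),
            "other",
        )
        grouped.setdefault(category, []).append(attr)
    return uw, grouped


print(split_attributes("sky(icl>natural world).@def"))
# ('sky(icl>natural world)', {'reference': ['@def']})
print(split_attributes("blue(icl>color).@entry.@past.@interrogative.@exclamation"))
# ('blue(icl>color)', {'other': ['@entry', '@interrogative', '@exclamation'], 'tense': ['@past']})
```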
 

Revision as of 20:25, 31 August 2012

The Universal Networking Language (UNL) is a knowledge representation language that has been used in several different fields of natural language processing, such as machine translation, multilingual document generation, summarization, information retrieval and extraction, sentiment analysis and semantic reasoning.

History

The UNL Programme started in 1996 as an initiative of the Institute of Advanced Studies of the United Nations University (UNU/IAS) in Tokyo, Japan. In January 2001, the United Nations University set up an autonomous organization, the UNDL Foundation, to be responsible for the development and management of the UNL Programme. The Foundation, a non-profit international organization, has an identity independent of the United Nations University, although it has special links with the UN. It inherited from the UNU/IAS the mandate of implementing the UNL Programme. Its headquarters are in Geneva, Switzerland.

The UNL Programme has already passed important milestones. The overall architecture of the UNL System has been developed, together with a set of basic software and tools necessary for its functioning; these are being tested and improved. A large body of linguistic resources for the various native languages already under development has been accumulated in the last few years. Moreover, the technical infrastructure for expanding these resources is already in place, making it easier for many more languages to join the UNL system. A growing number of scientific papers and academic dissertations on the UNL are published every year.

The most visible accomplishment so far is the recognition, under the Patent Cooperation Treaty (PCT), of the innovative character and industrial applicability of the UNL, obtained in May 2002 through the World Intellectual Property Organization (WIPO). Acquiring the patent for the UNL is a completely novel achievement within the United Nations.

Commitments

The main goal of the UNL Programme is to construct the UNL, an artificial language that could be used to process information across language barriers. The major commitments of the UNL are the following:

I - The UNL must represent knowledge
The UNL is an artificial language designed to represent knowledge. In this sense, the UNL is first and foremost a knowledge representation language. The most important corollary of this first commitment is that the UNL is not a meta-language, i.e., it is not intended to describe or represent natural languages; on the contrary, it is used to describe and represent the information conveyed by natural languages. The goal of the UNL is to represent "what was meant" and not "what was said" or "how it was said". Accordingly, the UNL is said to provide an interpretation rather than a translation of a given utterance, and should be understood as a declarative language (instead of a procedural language). The UNL version of an existing document is not committed to preserving the lexical and syntactic choices of the original, but must represent, in a non-ambiguous format, one of its possible meanings, preferably the most conventional one. For instance, given a performative utterance such as "Can you pass me the salt?", the role of the UNL is to represent that "someone uttered a polite request for another person to pass him or her the salt", and the UNL representation itself will not be a request, in the sense that it will not be bound to provoke the same (perlocutionary) effect caused by the original utterance. This means that the UNL is not expected to perform speech acts (such as promises, requests, orders, etc.), but only to represent their meaning in a constative manner.
II - The UNL must be language-independent
The linguistic neutrality of the UNL is one of its most imperative and strongest commitments and must be understood in two senses: the political and the technical. Politically, the UNL is expected to be the language of the United Nations and, therefore, must not be tied to any particular natural language, at the risk of being rejected by the member states of the General Assembly. Technically, the UNL document must be independent of any source or target language, i.e., it should be as semantically complete and saturated as possible. In the UNL approach, there are two basic movements: UNLization and NLization. UNLization is the process of representing the information conveyed by natural language in UNL; NLization, conversely, is the process of generating a natural language document from UNL. These processes should be completely independent, i.e., UNLization should not take into consideration the target language of any future NLization, and NLization should not need any information about the original source language of a UNL document.
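The independence requirement can be read as a constraint on interfaces: UNLization takes no target language, and NLization takes no source language. The toy sketch below illustrates only that constraint. Real UNLization builds full graphs rather than word-for-word mappings, and the `LEXICONS` table and the `unlize` and `nlize` functions are hypothetical stand-ins for UNL dictionaries and tools.

```python
# Hypothetical per-language lexicons mapping words to UWs.
LEXICONS: dict[str, dict[str, str]] = {
    "en": {"sky": "sky(icl>natural world)", "blue": "blue(icl>color)"},
    "fr": {"ciel": "sky(icl>natural world)", "bleu": "blue(icl>color)"},
}


def unlize(word: str, source_lang: str) -> str:
    """Map a source word to a UW; no target language is involved."""
    return LEXICONS[source_lang][word]


def nlize(uw: str, target_lang: str) -> str:
    """Generate a target word from a UW; the source language is never consulted."""
    inverse = {uw_: word for word, uw_ in LEXICONS[target_lang].items()}
    return inverse[uw]


uw = unlize("ciel", "fr")  # represented once, with no target in mind
print(nlize(uw, "en"))     # sky
print(nlize(uw, "fr"))     # ciel
```

Because the intermediate value is a bare UW, the same representation serves any target language, including the source language itself.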
III - The UNL must be general-purpose
At first glance, the UNL seems to be an "interlingua", a sort of pivot language into which source texts are converted before being translated into the target languages. It can, in fact, be used for such a purpose, but its primary objective is to serve as an infrastructure for handling knowledge rather than individual languages. In addition to translation, the UNL is expected to be used in several other tasks, such as text mining, multilingual document generation, summarization, text simplification, information retrieval and extraction, sentiment analysis, etc. Indeed, in UNL-based systems there is no need for the source language to be different from the target language: an English text may be represented in UNL in order to be generated, once again, in English, as a summarized, simplified, localized or simply rephrased version of the original.
IV - The UNL must be machine-tractable
The UNL is a formal system designed for computers. It is an artificial language shaped to represent knowledge in a machine-tractable format. Like other logical systems, it seeks to provide the linguistic and semiotic infrastructure for computers to handle what is meant by natural languages. Unlike other auxiliary languages (such as Esperanto, Interlingua, Volapük, Ido and others), the UNL is not intended to be a human language. We do not expect people to speak UNL or to communicate in UNL, and it must be opaque to end users. Just as no one is required to know HTML to browse the Internet or even to create websites, everyone should be able to write and read documents in UNL without any knowledge of UNL.

Assumptions

Languages convey information about the world
The most basic assumption of the UNL approach is that one of the most outstanding uses of natural languages is to convey information, i.e., that natural languages can be used to represent what we know about the world. This aboutness of natural languages, i.e., their representational role, is the main object of the UNL, which is expected not to do what natural languages do, but to represent what they represent.
Information can be represented by semantic networks
The UNL assumes that any information conveyed by natural language can be formally and usefully represented by a semantic network. This idea is not new: semantic networks have been used in knowledge representation at least since Charles S. Peirce, and as an interlingua for machine translation since the 1950s. In the UNL approach, this semantic network (or UNL graph) is made of three different types of discrete semantic entities: Universal Words, relations and attributes. This three-layered representation model is the cornerstone of the UNL and what most distinguishes it from other semantic networks, which normally have only two levels: vertices and edges. The reason for distributing the semantic content of a proposition over three different levels comes from first-order logic: Universal Words stand for objects, which may be either simple or complex, and figure as nodes (or hyper-nodes, if complex) in the graph; relations stand for predicates, which are always binary and directed, and figure as links between nodes, forming the graph; and attributes play a role very similar to that of quantifiers in first-order predicate calculus: they are operators that bind objects ranging over a domain of discourse. The only difference is that attributes are not limited to quantification but can express any type of specification. For instance, the English phrase "the boy kissed the girl" could be represented, in UNL, as


There are semantic universals
The UNL assumes that any information conveyed by natural languages is translatable, i.e., that natural languages differ, not in their power to express information, but in the way they express it. This means that there should be a sort of common semantic denominator between languages that ensures their intertranslatability. There are two approaches to this hypothesis: the weak one holds that this denominator varies with each language pair; the strong one states that, above and beyond this variation, there is a common denominator to all languages, a set of semantic universals or primitives that could be derived from the fact that humans share the same underlying biological infrastructure for perceiving and categorizing the world. The UNL follows the strong approach, but it makes no claim concerning the psychological reality of these universal entities.
Software