Introduction to UNL

From UNL Wiki
(Difference between revisions)
m (Protected "Introduction to UNL" ([edit=sysop] (indefinite) [move=sysop] (indefinite)))
 
(164 intermediate revisions by 4 users not shown)
Line 1: Line 1:
The '''Universal Networking Language''' (UNL) is an artificial language for representing, describing, summarizing, refining, storing and disseminating information in a natural-language-independent format. It is a kind of mark-up language which represents not the formatting but the core information of a text. As HTML annotations can be realized differently in the context of different applications, machines, displays, etc., so UNL expressions can have different realizations in different human languages. The UNL was born within the United Nations and was conceived at the Institute of the Advanced Studies of the United Nations University. It is the property of the United Nations and, therefore, an asset of all of humankind.  
+
The '''Universal Networking Language''' (UNL) is an artificial language created to represent and process information across language barriers.  
  
 
== History ==  
 
== History ==  
  
The [[UNL Programme]] started in 1996, as an initiative of the [http://www.ias.unu.edu Institute of Advanced Studies] of the [http://www.unu.edu United Nations University] in Tokyo, Japan. In January 2001, the United Nations University set up an autonomous organization, the [http://www.undlfoundation.org UNDL Foundation], to be responsible for the development and management of the UNL Programme.  The Foundation, a non-profit international organisation, has an independent identity from the United Nations University, although it has special links with the UN. It inherited from the UNU/IAS the mandate of implementing the UNL Programme so that it can fulfil its mission. Its headquarters are based in Geneva, Switzerland.
+
The [[UNL Programme]] started in 1996, as an initiative of the [http://www.ias.unu.edu Institute of Advanced Studies] of the [http://www.unu.edu United Nations University] in Tokyo, Japan. In January 2001, the United Nations University set up an autonomous organization, the [http://www.undlfoundation.org UNDL Foundation], to be responsible for the development and management of the UNL Programme.  The Foundation, a non-profit international organisation, has an independent identity from the United Nations University, although it has special links with the United Nations. It inherited from the UNU/IAS the mandate of implementing the UNL Programme. Its headquarters are based in Geneva, Switzerland.
  
From the very beginning, a consortium of university departments from all regions of the world has been engaged in developing the UNL. This consortium, the [[UNL Society]], is a global-scale network of R&D teams involving about 200 specialists in computer science and linguistics, who are creating the linguistic resources and developing the web structure of the [[UNL System]]. The UNDL Foundation provides technological support and co-ordinates the implementation of the Programme.
+
The UNL Programme has already crossed important milestones. The overall architecture of the UNL System has been developed with a set of basic software and tools necessary for its functioning. These are being tested and improved. A vast amount of linguistic resources from the various native languages already under development has been accumulated in the last few years. Moreover, the technical infrastructure for expanding these resources is already in place, thus facilitating the participation of many more languages in the UNL system from now on. A growing number of scientific papers and academic dissertations on the UNL are being published every year.   
 
+
The Programme has already crossed important milestones. The overall architecture of the UNL System has been developed with a set of basic software and tools necessary for its functioning. These are being tested and improved. A vast amount of linguistic resources from the various native languages already under development has been accumulated in the last few years. Moreover, the technical infrastructure for expanding these resources is already in place, thus facilitating the participation of many more languages in the UNL system from now on. A growing number of scientific papers and academic dissertations on the UNL are being published every year.   
+
  
 
The most visible accomplishment so far is the recognition by the Patent Co-operation Treaty (PCT) of the innovative character and industrial applicability of the UNL, which was obtained in May 2002 through the World Intellectual Property Organisation (WIPO). Acquiring the patent for the UNL is a completely novel achievement within the United Nations.
 
The most visible accomplishment so far is the recognition by the Patent Co-operation Treaty (PCT) of the innovative character and industrial applicability of the UNL, which was obtained in May 2002 through the World Intellectual Property Organisation (WIPO). Acquiring the patent for the UNL is a completely novel achievement within the United Nations.
  
== Scope and Goals ==  
+
== Commitments ==
 +
The main goal of the UNL Programme is to construct the UNL, an artificial language that can be used to process information across the language barriers. The major commitments of the UNL are the following:
  
The UNL is an effort to achieve a simple basis for representing the most central aspects of information and meaning in a human-language-independent form. As a knowledge representation language, the UNL aims at coding, storing, disseminating and retrieving information independently of the original language in which it was expressed. In this sense, UNL seeks to provide the tools for overcoming the language barrier in a systematic way.
+
;I - The UNL must represent information
 +
:The UNL is first and foremost a knowledge representation language. The most important corollary of this first commitment is that UNL is not a meta-language, i.e., it is not intended to describe or represent natural languages; on the contrary, it is used to represent the information conveyed by natural languages. The goal of UNL is to represent "what was meant" and not "what was said". Accordingly, the UNL provides an '''interpretation''' rather than a translation of a given utterance. The UNL version of an existing document is not bound to preserve the lexical and the syntactic choices of the original, but must represent, in a non-ambiguous format, one of its possible meanings, preferably the most conventional one.
  
At first glance, the UNL seems to be an “interlingua”, a sort of pivot-language to which the source texts are converted before being translated into the target languages. It can, in fact, be used for such a purpose, but its primary objective is to serve as an infrastructure for handling knowledge.
+
;II - The UNL must be a language for computers
 +
:The UNL is an artificial language shaped to represent knowledge in a machine-tractable format. Like other formal systems, it seeks to provide the infrastructure for computers to handle what is meant by natural languages. Unlike auxiliary languages (such as Esperanto, Interlingua, Volapük, Ido and others), the UNL is not intended to be a human language. We do not expect people to speak UNL or to communicate in UNL. But we do expect computers to process UNL: to generate UNL out of natural language, and vice versa, with and without human aid. We expect computers to be able to extract information from UNL documents, and to detect paraphrases, entailments, implicatures, presuppositions, inferences, contradictions and redundancies among a set of propositions represented in UNL.
  
In the UNL approach, there are two basic different movements: UNL-isation and NL-isation. UNL-isation is the process of representing/mapping/analysing the information conveyed by natural language utterances into UNL; NL-isation, conversely, is the process of realizing/manifesting/generating a natural language document out of a UNL graph. These processes are completely independent. For the time being, the NL-isation process is already fully automatic, but the UNL-isation process is still mostly human, even though machine-aided.
+
;III - The UNL must be self-sufficient
 +
:In the UNL approach, there are two basic movements: [[UNLization]] and [[NLization]]. UNLization is the process of representing in UNL the information conveyed by natural language; NLization, conversely, is the process of generating a natural language document out of UNL. In order to be fully "understandable" (and manageable) by machines, the UNL must be self-sufficient, i.e., it should be as semantically complete and saturated as possible. The UNL representation must not depend on any implicit knowledge, and should explicitly codify all information. This means that UNLization should be completely independent of NLization, and vice versa: UNLization should not take into consideration the target language or format of any future NLization, and NLization should not need any information about the original source language or previous structure of any UNL document.
  
Currently, the main goal of the UNL-isation process has been to map the information that is verbally elicited in the surface structure of written texts into a language-independent and machine-tractable database. This means that the UNL representation is not committed to replicating the lexical and syntactic choices of the original, but focuses on representing, in a non-ambiguous format, one of its possible readings, preferably the most conventional one. In this sense, the UNL representation has been an interpretation rather than a translation of a given text.
+
;IV - The UNL must be general-purpose
 +
:At first glance, the UNL seems to be a pivot-language to which the source texts are converted before being translated into the target languages. It can, in fact, be used for such a purpose, but its primary objective is to serve as an infrastructure for handling knowledge. In addition to translation, the UNL is expected to be used in several other different tasks, such as text mining, multilingual document generation, summarization, text simplification, information retrieval and extraction, sentiment analysis etc. Indeed, in UNL-based systems there is no need for the source language to be different from the target language: an English text may be represented in UNL in order to be generated, once again, in English, as a summarized, a simplified, a localized or a simply rephrased version of the original.  
  
Indeed, it is important to note that at this point in time it would be foolish to claim that it is possible to represent the “full” meaning of any word, sentence or text in any language. Subtleties of intention and interpretation make the “full meaning”, whatever concept we might have of it, too variable and subjective for any systematic treatment. The UNL avoids the pitfalls of trying to represent the “full meaning” of sentences or texts, targeting instead the “core” or “consensual” meaning that is most often attributed to them. In this sense, much of the subtlety of poetry, metaphor, figurative language, innuendo and other complex, indirect communicative behaviours is beyond the current scope and goals of the UNL. Instead, the UNL targets direct communicative behaviour and literal meanings as a tangible, concrete basis for much or most of human communication in practical, day-to-day settings.
+
;V - The UNL must be independent from any particular natural language
 +
:The UNL is expected to be the language of the United Nations and, therefore, must not be tied to any existing natural language in particular, at the risk of being rejected by the member states of the General Assembly.
  
This is the main reason why UNL has not been exactly a machine translation project, even though machine translation is one of the possible, and more obvious and promising, uses of UNL. The main problem is that the practice of translation has normally been restricted by the notion of "fidelity" (or faithfulness), i.e., any translated version of a text is expected to be a replica (of the content and of the form) of the original in another language. This transfer process, however, is “all too human”, as Nietzsche said, to be replicated by currently existing technology, which is not prepared to deal with several linguistic (and cultural) phenomena, such as vagueness, ambiguities, metaphors, ellipses, implicatures and so on. This does not mean that automatic natural language processing, and therefore machine translation, is impracticable; it just means that it is not yet possible to do it completely without humans or in the same way humans do. The results, in any case, are likely to be different from the ones produced by humans. Several techniques (rule-based, memory-based, corpus-based) have been proposed to decrease the role of humans in natural language analysis tasks, but the results, even though already promising, are not yet of publishable quality, and require substantial human revision.
+
== Assumptions ==
 +
;1. Languages convey information
 +
:The UNL assumes that one of the most outstanding uses of natural languages is to convey '''information''', i.e., that natural languages can be used to represent what we know about the world. This "aboutness" of natural languages, i.e., their representational role, is the main object of the UNL, which is expected not to do what natural languages do, but to represent what they represent.
  
In addition to translation, the UNL has been exploited for several other tasks in natural language engineering, such as multilingual document generation, summarization, text simplification, information retrieval and semantic reasoning. In UNL-based applications, there is no need for the source and the target languages to be different: an English text may be represented in UNL in order to be generated, once again, in English, as a summarized, a simplified or a simply rephrased version of the original.
+
;2. Information can be represented by semantic networks
 +
:The UNL assumes that any information conveyed by natural language can be formally and usefully represented by a '''semantic network'''. This idea is not new. Semantic networks have been used in knowledge representation at least since Charles S. Peirce, and as an interlingua for machine translation since the 1950s. In the UNL approach, this semantic network (or UNL graph) is made of three different types of discrete semantic entities: concepts, relations and attributes. Concepts are nodes in the network; relations are arcs linking nodes; and attributes are used to delimit the use of nodes. This three-layered representation model is the cornerstone of the UNL, and a feature that distinguishes it from other semantic networks, which normally propose only two levels: edges and vertices.
  
Finally, it should also be stressed that UNL, differently from other auxiliary languages (such as Esperanto, Interlingua, Volapük, Ido and others), is not intended to be a human language. We do not expect people to speak UNL or to communicate "in" UNL; only specialists will have to learn UNL. We do expect people to use UNL and to communicate "through" UNL, but in the same unconscious, invisible and spontaneous way they do with other declarative and procedural languages which are pervasive in everyday applications. As no one is required to know HTML to browse the Internet or even to create websites, everyone would be able to UNL-ize documents and to extract out of them the information needed without any knowledge of UNL. UNL is therefore a formal language designed for computers, not for humans. Like other logical systems, it seeks to provide the linguistic and semiotic infrastructure for computers (and not for humans) to handle natural languages.   
+
;3. Any information may be expressed in any language
 +
:The UNL assumes that any information conveyed by natural languages is '''translatable''', i.e., that natural languages differ, not in their power to express information, but in the way they do that. The UNL also assumes that, in order to ensure this "translatability" of information, the semantic network must be independent of any natural language in particular (i.e., it must be "universal"<ref>The idea of "universality", in UNL, must be understood in the sense of "capable of being used and understood by all" (as in "Coordinated '''Universal''' Time (UTC)", or in "'''universal''' adapter"), rather than "common to all" (as in "'''Universal''' Grammar"). See [[Universal]].</ref>). This is achieved by defining a standard (uniform) set of universally-accessible semantic entities, which are the elements of UNL: [[Universal Word]]s (or UW's), [[Universal Relations]] and [[Universal Attributes]].
 +
 
 +
== Properties ==
 +
;Non-Ambiguity
 +
:As a formal system, the UNL is not expected to have any ambiguity, at any level. The sentence "The girls saw the boy with the telescope" must be represented, in UNL, in a way that there is no ambiguity concerning the meaning of "saw" (past tense of the verb "to see" x present tense of the verb "to saw" x noun "saw") or the dependency relations of "with the telescope" ("saw with the telescope" x "the boy with the telescope").
 +
;Non-Redundancy
 +
:As a knowledge representation language, the UNL is not expected to have any redundancy. Expressions such as "free gift", "round circle" and "murder to death" are expected to be represented, in UNL, as "gift", "circle" and "murder", respectively. Likewise, sentences such as "Peter killed Mary", "Peter murdered Mary", "It's Peter who killed Mary" and "Mary was killed by Peter" are expected to be represented in UNL in the same way<ref>The differences between them can be represented by attributes such as @topic and @passive, but this is rather optional, because the goal of UNL is to represent "what was meant" and not "what was said" or "how it was said".</ref>.
 +
;Compositionality
 +
:As a formal system, the UNL is always literal, i.e., fully compositional. UNL expressions must derive their semantic value thoroughly from their components, which must be explicitly defined in the [[UNL Knowledge Base]]. Accordingly, the UNL does not allow for any figure of speech, such as metaphor and metonymy. Tropes must be represented, in UNL, by their intended meaning. A sentence such as "John devoured thousands of books", for instance, must be represented, in UNL, as "John read many books eagerly"<ref>The information that this content has been conveyed through figurative language can be indicated by the corresponding attributes (@metaphor, @hyperbole, etc.), but this is optional.</ref>.
 +
;Declarativeness
 +
:As a knowledge representation language, the UNL is not expected to perform speech acts (such as promises, requests, orders etc.), but only to describe them in a constative manner. For instance, given a performative utterance such as "Can you pass me the salt?", the role of the UNL is to represent "you pass the salt to me" and to indicate that this was a polite request<ref>This can be done by the use of the attributes @polite and @request.</ref>. The UNL representation itself will not be a request, nor will it be bound to provoke the same (perlocutionary) effect caused by the original utterance.
 +
;Completeness
 +
:As a fully-explicit semantic system, the UNL is not expected to have ellipses or pro-forms, except when the referent is not present in the document (exophora). A sentence such as "The monkey took the banana and ate it" must be represented, in UNL, as "[The monkey]<sub>i</sub> took [the banana]<sub>j</sub> and [the monkey]<sub>i</sub>  ate [the banana]<sub>j</sub>".
  
 
== Structure ==
 
== Structure ==
 +
{{:Semantic network}}
  
In the UNL approach, information conveyed by natural language is represented, sentence by sentence, as a hypergraph composed of a set of directed binary labeled links (referred to as “[[relations]]”) between nodes or hypernodes (the “[[Universal Words]]”, or simply “UW”), which stand for concepts. UWs can also be annotated with “[[attributes]]” representing context information.
+
== UNL Specs ==
 
+
The structure of the UNL is defined by the [[Specs|UNL Specs]]. The UNL Specs specify the structure of a UNL document; the syntax of a UNL graph; the syntax of Universal Words; the set of relations; the set of attributes; and all the information concerning UNL as a formalism:
As an example, the English sentence ‘The sky was blue?!’ can be represented in UNL as follows:
+
*[[Universal Words]]
 
+
*[[Universal Attributes]]
[[Image:Unl.ht1.gif]]
+
*[[Universal Relations]]
 
+
*[[UNL sentence|UNL sentence structure]]
In the example above, "sky(icl>natural world)" and "blue(icl>color)", which represent individual concepts, are UWs; "aoj" (= attribute of an object) is a directed binary semantic relation linking the two UWs; and "@def", "@interrogative", "@past", "@exclamation" and "@entry" are attributes modifying UWs.
+
*[[UNL document|UNL document structure]]
 
+
UWs are supposed to represent universal concepts and are expressed in English words in order to be human-readable. They consist of a "headword" (the UW root) and a "constraint list" (the UW suffix between parentheses), the latter being used to disambiguate the general concept conveyed by the former. The set of UWs, which currently has around 63,000 entries, is organized in an ontology-like structure (the so-called "UW System"), where upper concepts are used to disambiguate the lower ones through "icl" (= is a kind of) and "iof" (= is an instance of) relations.
+
 
+
Relations are expected to represent semantic links between words in every existing language. They can be ontological (such as "icl" and "iof" referred to above), logical (such as "and" and "or") and thematic (such as "agt" = agent, "ins" = instrument, "tim" = time, "plc" = place, etc). There are currently 46 relations in the UNL Specs, and they define the syntax of UNL.
+
  
Attributes represent information that cannot be conveyed by UWs and relations. Normally, they represent information on tense ("@past", "@future", etc.), reference ("@def", "@indef", etc.), modality ("@can", "@must", etc.), focus ("@topic", "@focus", etc.), and so on.
+
== Notes ==
 +
<references />
  
Under the UNL Programme, the process of representing natural language sentences in UNL graphs is called "[[enconverting]]", and the process of generating natural language sentences out of UNL graphs is called "[[deconverting]]". The former, which involves natural language analysis and understanding, is supposed to be carried out semi-automatically (i.e., on a computer-aided human basis); the latter is expected to be done fully automatically.
+
== References ==
 +
* Martins, R. (ed.). (2013). Lexical issues of UNL. Cambridge Scholars Publishing.
 +
* Uchida, H.; Zhu, M.; Della Senta, T. (1999). A gift for a millennium. Tokyo: IAS/UNU.
 +
* UNL. (1996). Universal Networking Language: an electronic language for communication, understanding and collaboration. Tokyo: UNL Center.
 +
* Cardeñosa, J.; Gelbukh, A.; Tovar, E. (Eds.) (2005).  [http://www.cicling.org/2005/UNL-book/ Universal Networking Language: Advances in Theory and Applications]. 443 pp.

Latest revision as of 16:14, 13 February 2014

The Universal Networking Language (UNL) is an artificial language created to represent and process information across language barriers.

History

The UNL Programme started in 1996, as an initiative of the Institute of Advanced Studies of the United Nations University in Tokyo, Japan. In January 2001, the United Nations University set up an autonomous organization, the UNDL Foundation, to be responsible for the development and management of the UNL Programme. The Foundation, a non-profit international organisation, has an independent identity from the United Nations University, although it has special links with the United Nations. It inherited from the UNU/IAS the mandate of implementing the UNL Programme. Its headquarters are based in Geneva, Switzerland.

The UNL Programme has already crossed important milestones. The overall architecture of the UNL System has been developed with a set of basic software and tools necessary for its functioning. These are being tested and improved. A vast amount of linguistic resources from the various native languages already under development has been accumulated in the last few years. Moreover, the technical infrastructure for expanding these resources is already in place, thus facilitating the participation of many more languages in the UNL system from now on. A growing number of scientific papers and academic dissertations on the UNL are being published every year.

The most visible accomplishment so far is the recognition by the Patent Co-operation Treaty (PCT) of the innovative character and industrial applicability of the UNL, which was obtained in May 2002 through the World Intellectual Property Organisation (WIPO). Acquiring the patent for the UNL is a completely novel achievement within the United Nations.

Commitments

The main goal of the UNL Programme is to construct the UNL, an artificial language that can be used to process information across the language barriers. The major commitments of the UNL are the following:

I - The UNL must represent information
The UNL is first and foremost a knowledge representation language. The most important corollary of this first commitment is that UNL is not a meta-language, i.e., it is not intended to describe or represent natural languages; on the contrary, it is used to represent the information conveyed by natural languages. The goal of UNL is to represent "what was meant" and not "what was said". Accordingly, the UNL provides an interpretation rather than a translation of a given utterance. The UNL version of an existing document is not bound to preserve the lexical and the syntactic choices of the original, but must represent, in a non-ambiguous format, one of its possible meanings, preferably the most conventional one.
II - The UNL must be a language for computers
The UNL is an artificial language shaped to represent knowledge in a machine-tractable format. Like other formal systems, it seeks to provide the infrastructure for computers to handle what is meant by natural languages. Unlike auxiliary languages (such as Esperanto, Interlingua, Volapük, Ido and others), the UNL is not intended to be a human language. We do not expect people to speak UNL or to communicate in UNL. But we do expect computers to process UNL: to generate UNL out of natural language, and vice versa, with and without human aid. We expect computers to be able to extract information from UNL documents, and to detect paraphrases, entailments, implicatures, presuppositions, inferences, contradictions and redundancies among a set of propositions represented in UNL.
III - The UNL must be self-sufficient
In the UNL approach, there are two basic movements: UNLization and NLization. UNLization is the process of representing in UNL the information conveyed by natural language; NLization, conversely, is the process of generating a natural language document out of UNL. In order to be fully "understandable" (and manageable) by machines, the UNL must be self-sufficient, i.e., it should be as semantically complete and saturated as possible. The UNL representation must not depend on any implicit knowledge, and should explicitly codify all information. This means that UNLization should be completely independent of NLization, and vice versa: UNLization should not take into consideration the target language or format of any future NLization, and NLization should not need any information about the original source language or previous structure of any UNL document.
IV - The UNL must be general-purpose
At first glance, the UNL seems to be a pivot-language to which the source texts are converted before being translated into the target languages. It can, in fact, be used for such a purpose, but its primary objective is to serve as an infrastructure for handling knowledge. In addition to translation, the UNL is expected to be used in several other different tasks, such as text mining, multilingual document generation, summarization, text simplification, information retrieval and extraction, sentiment analysis etc. Indeed, in UNL-based systems there is no need for the source language to be different from the target language: an English text may be represented in UNL in order to be generated, once again, in English, as a summarized, a simplified, a localized or a simply rephrased version of the original.
V - The UNL must be independent from any particular natural language
The UNL is expected to be the language of the United Nations and, therefore, must not be tied to any existing natural language in particular, at the risk of being rejected by the member states of the General Assembly.

Assumptions

1. Languages convey information
The UNL assumes that one of the most outstanding uses of natural languages is to convey information, i.e., that natural languages can be used to represent what we know about the world. This "aboutness" of natural languages, i.e., their representational role, is the main object of the UNL, which is expected not to do what natural languages do, but to represent what they represent.
2. Information can be represented by semantic networks
The UNL assumes that any information conveyed by natural language can be formally and usefully represented by a semantic network. This idea is not new. Semantic networks have been used in knowledge representation at least since Charles S. Peirce, and as an interlingua for machine translation since the 1950s. In the UNL approach, this semantic network (or UNL graph) is made of three different types of discrete semantic entities: concepts, relations and attributes. Concepts are nodes in the network; relations are arcs linking nodes; and attributes are used to delimit the use of nodes. This three-layered representation model is the cornerstone of the UNL, and a feature that distinguishes it from other semantic networks, which normally propose only two levels: edges and vertices.
3. Any information may be expressed in any language
The UNL assumes that any information conveyed by natural languages is translatable, i.e., that natural languages differ, not in their power to express information, but in the way they do that. The UNL also assumes that, in order to ensure this "translatability" of information, the semantic network must be independent of any natural language in particular (i.e., it must be "universal"[1]). This is achieved by defining a standard (uniform) set of universally-accessible semantic entities, which are the elements of UNL: Universal Words (or UW's), Universal Relations and Universal Attributes.

Properties

Non-Ambiguity
As a formal system, the UNL is not expected to have any ambiguity, at any level. The sentence "The girls saw the boy with the telescope" must be represented, in UNL, in a way that there is no ambiguity concerning the meaning of "saw" (past tense of the verb "to see" x present tense of the verb "to saw" x noun "saw") or the dependency relations of "with the telescope" ("saw with the telescope" x "the boy with the telescope").
Non-Redundancy
As a knowledge representation language, the UNL is not expected to have any redundancy. Expressions such as "free gift", "round circle" and "murder to death" are expected to be represented, in UNL, as "gift", "circle" and "murder", respectively. Likewise, sentences such as "Peter killed Mary", "Peter murdered Mary", "It's Peter who killed Mary" and "Mary was killed by Peter" are expected to be represented in UNL in the same way[2].
Compositionality
As a formal system, the UNL is always literal, i.e., fully compositional. UNL expressions must derive their semantic value thoroughly from their components, which must be explicitly defined in the UNL Knowledge Base. Accordingly, the UNL does not allow for any figure of speech, such as metaphor and metonymy. Tropes must be represented, in UNL, by their intended meaning. A sentence such as "John devoured thousands of books", for instance, must be represented, in UNL, as "John read many books eagerly"[3].
Declarativeness
As a knowledge representation language, the UNL is not expected to perform speech acts (such as promises, requests, orders etc.), but only to describe them in a constative manner. For instance, given a performative utterance such as "Can you pass me the salt?", the role of the UNL is to represent "you pass the salt to me" and to indicate that this was a polite request[4]. The UNL representation itself will not be a request, nor will it be bound to provoke the same (perlocutionary) effect caused by the original utterance.
Completeness
As a fully-explicit semantic system, the UNL is not expected to have ellipses or pro-forms, except when the referent is not present in the document (exophora). A sentence such as "The monkey took the banana and ate it" must be represented, in UNL, as "[The monkey]i took [the banana]j and [the monkey]i ate [the banana]j".

Structure

The main goal of the UNL is to represent, in a machine-tractable format, the information conveyed by natural language documents. In the UNL framework, this information is represented by a semantic network, i.e., a network which represents semantic relations between concepts. This semantic network, or UNL graph, is made of three different types of discrete semantic entities: Universal Words, Universal Relations and Universal Attributes. Universal Words, or simply UW's, are the nodes in the semantic network; Universal Relations are arcs linking UW's; and Universal Attributes are used to instantiate UW's.

For instance, the English sentence "Peter killed Mary yesterday with a knife in the kitchen because of John" could be represented, in simplified UNL, as:

Graph0.png

In the above:

  • "Peter", "kill", "Mary", "yesterday", "knife", "kitchen" and "John" are Universal Words
  • "agt" (agent), "obj" (patient), "tim" (time), "ins" (instrument), "plc" (place) and "rsn" (reason) are Universal Relations
  • "@past", "@def" and "@indef" are Universal Attributes
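
As a rough illustration of this three-part structure, the sketch below models the example graph in Python. The class and method names (Node, Relation, Graph, add_node, add_relation) are hypothetical. Because the figure is not reproduced here, the direction of each arc (from "kill" to its argument) and the assignment of @past to "kill", @indef to "knife" and @def to "kitchen" are assumptions inferred from the English sentence, not taken from the UNL Specs.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A Universal Word (UW) plus the Universal Attributes attached to it."""
    uw: str
    attributes: list[str] = field(default_factory=list)


@dataclass
class Relation:
    """A labelled, binary arc (Universal Relation) between two UWs."""
    label: str
    source: str
    target: str


@dataclass
class Graph:
    nodes: dict[str, Node] = field(default_factory=dict)
    relations: list[Relation] = field(default_factory=list)

    def add_node(self, uw: str, *attributes: str) -> None:
        self.nodes[uw] = Node(uw, list(attributes))

    def add_relation(self, label: str, source: str, target: str) -> None:
        # Both ends of an arc must already exist as nodes.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError(f"unknown UW in {label}({source};{target})")
        self.relations.append(Relation(label, source, target))


graph = Graph()
graph.add_node("kill", "@past")      # assumed: tense is attached to the verb
for word in ("Peter", "Mary", "yesterday", "John"):
    graph.add_node(word)
graph.add_node("knife", "@indef")    # assumed from "a knife"
graph.add_node("kitchen", "@def")    # assumed from "the kitchen"

# Assumed arc direction: every relation starts at "kill".
for label, target in [("agt", "Peter"), ("obj", "Mary"), ("tim", "yesterday"),
                      ("ins", "knife"), ("plc", "kitchen"), ("rsn", "John")]:
    graph.add_relation(label, "kill", target)
```

Keeping attributes on the node and labels on the arc mirrors the division of labour described above: UW's are nodes, relations link them, and attributes instantiate individual UW's.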

General Principles

The three-layered representation model poses several problems for UNLization, as the distinction between what is supposed to be represented by each unit is not always clear. One difficulty concerns what is to be represented as a UW (i.e., as a node in the UNL graph) and what is to be represented as a link between UW's. How many UW's are there, for instance, in the sentence "Charles Dickens was the author of Oliver Twist"? Should "author" be represented as a UW or as a relation between "Charles Dickens" and "Oliver Twist"? Should the verb "to be" be represented as a UW or as a relation between "Charles Dickens" and "author"? Should the preposition "of" be represented as a UW or as a relation between "author" and "Oliver Twist"?

Given the difficulty of categorizing concepts, the UNL assumes the following principles:

1. If the information can only be conveyed by open lexical categories (nouns, adjectives, adverbs and verbs), or if it is conveyed by pronouns and numbers,
   it is represented by UW's, i.e., as nodes in the UNL graph;

2. If the information can be conveyed, in any language, 
   by closed class categories (affixes, determiners, auxiliary verbs, copula, classifiers, conjunctions, interjections and prepositions), or
   by syntactic phenomena (word order, agreement, government and case marking), 
   it is represented
   2.1. as attributes, if the information is not relational, i.e., if it can be associated with a single node (or hyper-node) in the graph; or
   2.2. as relations, if the information is relational and reducible to the set of Universal Relations; or
   2.3. as relations and attributes, if the information is relational but not reducible to the set of Universal Relations.
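
Read as a decision procedure, these principles could be sketched as follows. The function name classify and its three boolean parameters are hypothetical simplifications: deciding whether information "can be conveyed, in any language, by closed class categories" is a linguistic judgment rather than a flag, and pronouns and numbers, which principle 1 also assigns to UW's, are assumed to be passed with closed_class_or_syntax=False.

```python
def classify(closed_class_or_syntax: bool,
             relational: bool = False,
             in_relation_set: bool = False) -> str:
    """Return which UNL unit(s) should carry a piece of information.

    closed_class_or_syntax: the information can be conveyed, in some language,
        by a closed-class category or by a syntactic phenomenon (principle 2).
    relational: the information links two nodes instead of qualifying one.
    in_relation_set: the information is reducible to the Universal Relations.
    """
    if not closed_class_or_syntax:
        return "UW"                       # principle 1
    if not relational:
        return "attribute"                # principle 2.1
    if in_relation_set:
        return "relation"                 # principle 2.2
    return "relation + attribute"         # principle 2.3


print(classify(False))                                          # UW
print(classify(True))                                           # attribute
print(classify(True, relational=True, in_relation_set=True))    # relation
print(classify(True, relational=True, in_relation_set=False))   # relation + attribute
```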

Examples

(1) Mary died

Graph1.png

Consider, for instance, the sentence
(1) Mary died.
This sentence is said to convey the following information
(1a) There is Mary (i.e., there is someone named Mary)
(1b) There is the process of dying
(1c) There is a relation between "Mary" and "die" (i.e., "Mary" undergoes a change of state expressed by "dying")
(1d) The fact described by (1c) happened in the past
The information conveyed by (1a) and (1b) can only be expressed by open lexical categories (noun and verb, respectively) and, therefore, (1a) and (1b) are defined as UW's, i.e., nodes in the graph. The information conveyed by (1c) cannot be said to be represented by a lexical item (such as "Mary" or "die"); it is defined by the position of the words in the sentence, i.e., by the fact that "Mary" comes right before "die". In fact, this information is relational, i.e., it links "Mary" and "die". This relation ("patient") is already part of the repertoire of Universal Relations and can be expressed by the tag "obj". The information conveyed by (1d) is not relational, in the sense that it does not link two nodes, but rather modifies the whole relation between "Mary" and "die". As it is not relational, and can be expressed by closed class categories (the suffix "-d"), it is represented by the attribute @past, to be assigned to the head of the relation (the UW "die").

(2) The book is on the table

Graph2.png

Consider, now, the sentence
(2) The book is on the table.
This sentence is said to convey the following information
(2a) There is a book
(2b) We know this book (i.e., this book is definite)
(2c) There is a table
(2d) We know this table (i.e., this table is definite)
(2e) There is a relation between "book" and "table"
(2f) The relation (2e) is of the type "on" (and not "under", or "inside")
The information conveyed by (2a) and (2c) is, again, expressed by open lexical categories (noun, in both cases) and cannot be reduced to any closed class category. The information conveyed by (2b) and (2d), which is expressed by the article "the", modifies isolated nodes and, accordingly, is not relational: (2b) modifies (2a), and (2d) modifies (2c). They are therefore expressed by attributes (@def): book.@def and table.@def. The information conveyed by (2e) can be associated with the copula ("is") and it is definitely relational: it links "book" to "table". This relation is said to describe a "place", which is also part of the repertoire of Universal Relations (expressed by "plc"). However, "plc(book;table)" is too vague to express the information conveyed by the sentence, which explicitly indicates that the book is "on" the table. The information conveyed by (2f) is also relational and is expressed by a preposition ("on"). Ideally, we would have a relation "place_on", and we would represent (2) as "place_on(book;table)" instead of simply "plc(book;table)". But this would lead to several other relations: place_on, place_above, place_under, place_in_front_of, and so on. To avoid such a proliferation of relations in the repertoire, we have decided to express these details by the combination of relations and attributes, i.e., "plc(book;table.@on)".
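
The notation used in this example can be assembled mechanically from a relation label, two UW's and their attributes. The helper names uw and rel below are hypothetical, and combining the @def attributes with "plc(book;table.@on)" into a single expression is an extrapolation from the text above, not a form quoted from the UNL Specs.

```python
def uw(word: str, *attributes: str) -> str:
    """Render a UW followed by its attributes, e.g. 'table.@def.@on'."""
    return ".".join((word, *attributes))


def rel(label: str, source: str, target: str) -> str:
    """Render a binary relation in the 'label(source;target)' notation."""
    return f"{label}({source};{target})"


# Sentence (2): a generic place relation refined by the attribute @on.
sentence_2 = rel("plc", uw("book", "@def"), uw("table", "@def", "@on"))
print(sentence_2)  # plc(book.@def;table.@def.@on)
```

Adding a new spatial distinction (say, @under) then only requires a new attribute string, while the set of relation labels stays unchanged.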

(3) Charles Dickens was the author of Oliver Twist

Graph3.png

Finally, let us return to the case of
(3) Charles Dickens was the author of Oliver Twist.
In this sentence, we notice the following:
(3a) There is Charles Dickens
(3b) There is Oliver Twist
(3c) There is the concept of "author"
(3d) There is a relation between "Charles Dickens" and "author" (we can say that "Charles Dickens is an author")
(3e) There is a relation between "Oliver Twist" and "author" (we can say that "Oliver Twist is the product of an author")
(3f) There is a relation between "Charles Dickens" and "Oliver Twist" (and this relation is mediated by the concept of "author")
(3g) The relation described by (3f) happened in the past
Once again, (3a) and (3b) can only be realized by an open lexical category (noun) and, therefore, must be represented as UW's. The concept conveyed by (3c) is far more controversial. One may argue that this concept may be represented by derivational suffixes, such as -er (as in "writer") or -or (as in "creator"), or by the preposition "by" (as in "Oliver Twist by Charles Dickens"), but this is not really accurate, since both -er and -or are used for "one who performs an action", and "by" rather denotes an agent. None of them can fully replace "author" in this context, i.e., as someone who writes a book. Accordingly, (3c) should also be expressed by a UW. The information conveyed by (3d) and (3e) is relational and fits existing relations in the repertoire of Universal Relations: attribute ("author" is an attribute of "Charles Dickens", or aoj(Charles Dickens;author)), and content ("Oliver Twist" is the theme of "author", or cnt(author;Oliver Twist)). The relation (3f) poses a problem for UNL because, in UNL, relations must necessarily be binary, i.e., they must have only two arguments. But this is solved by the general assumption that, given rel(a;b) and rel(b;c), "a" is related to "c" through "b". As for (3g), this is another source of difficulty. We know that the scope of the past tense, in this case, is not the whole sentence (the fact that Charles Dickens wrote Oliver Twist is still true); the information of past is rather related to Charles Dickens, in the sense that it indicates that he no longer lives. In any case, this information is not relational, and must be expressed by the attribute @past, to be assigned to "Charles Dickens".
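
The assumption that "a" is related to "c" through "b" amounts to following a path of binary arcs. The sketch below stores the two relations given for sentence (3) and searches for such a path. The function name find_path, and the choice of a breadth-first search that ignores arc labels and direction, are illustrative assumptions and not part of the UNL formalism.

```python
from collections import deque

# Binary relations from sentence (3), written as (label, first, second).
RELATIONS = [
    ("aoj", "Charles Dickens", "author"),
    ("cnt", "author", "Oliver Twist"),
]


def find_path(relations, start, goal):
    """Return the list of UWs linking start to goal, or None if unconnected."""
    neighbours = {}
    for _label, first, second in relations:
        # Treat arcs as undirected: we only ask whether two UWs are connected.
        neighbours.setdefault(first, []).append(second)
        neighbours.setdefault(second, []).append(first)

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbours.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(find_path(RELATIONS, "Charles Dickens", "Oliver Twist"))
# ['Charles Dickens', 'author', 'Oliver Twist']
```

The intermediate node in the returned path ("author") is the concept that mediates the relation described in (3f).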

UNL Specs

The structure of the UNL is defined by the UNL Specs. The UNL Specs specify the structure of a UNL document; the syntax of a UNL graph; the syntax of Universal Words; the set of relations; the set of attributes; and all the information concerning UNL as a formalism.

Notes

  1. The idea of "universality", in UNL, must be understood in the sense of "capable of being used and understood by all" (as in "Coordinated Universal Time (UTC)", or in "universal adapter"), rather than "common to all" (as in "Universal Grammar"). See Universal.
  2. The differences between them can be represented by attributes such as @topic and @passive, but this is rather optional, because the goal of UNL is to represent "what was meant" and not "what was said" or "how it was said".
  3. The information that this content has been conveyed through figurative language can be indicated by the corresponding attributes (@metaphor, @hyperbole, etc.), but this is optional.
  4. This can be done by the use of the attributes @polite and @request.

Software