UNL2010

Revision as of 13:34, 7 December 2010

The specifications stated here are still experimental and tentative, and have been continuously extended and amended in order to be as comprehensive as possible. They follow the general strategies defined in the UNL 2005 Specifications (version of June 7th, 2005), but introduce several important changes derived from different UNLization experiences (Cratylus, EOLSS, Le Petit Prince, IGLU). Although formally adopted in the UNDL Foundation tools, projects and certificates, they should not yet be taken as the official specs, as they are still under construction and have not been widely discussed with the UNL Community.

PREMISES

These specifications are derived from three main premises:

Information conveyed by natural language can be represented by a natural-language-independent hyper-graph structure.

Texts can be treated as a set of semantic nodes interlinked by semantic relations and modified by semantic attributes.

The UNL representation is an interpretation rather than a translation of a given text. 

The main goal of the UNLization process is to represent the knowledge structure of the source text, which should be detached from its verbal structure. This means that the UNL representation should not be committed to replicating the lexical and syntactic choices of the original, but should focus on representing, in a language-independent and unambiguous format, one of its possible readings, preferably the most conventional one.

The UNL representation should be as semantically complete as possible. 

Whenever possible, all the semantic valencies of the original text should be saturated, including anaphora, ellipses, presuppositions and implicatures. Pronouns and pro-forms, for instance, are expected to be replaced by their antecedents, and should not be represented in UNL, except in cases of exophoric reference (indefinite pronouns, interrogative pronouns and personal pronouns that are not co-indexed with any existing antecedent).

THREE-LAYERED REPRESENTATION

The basic assumption of the UNL approach is that the meaning conveyed by natural language can be formally represented through three different types of semantic units: UWs, attributes and relations. This three-layered representation model is the cornerstone of UNL and the feature that most distinguishes it from other semantic networks, which normally propose only two levels: edges and vertices.

Unlgraph.jpg
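
The three layers can be pictured as a small data structure. The following is only a minimal sketch, not part of the specification: the class names (`Node`, `Relation`, `UNLGraph`), the node identifiers, the plain-string UWs and the attribute name `@past` are all illustrative assumptions; only the relation label `ptp` is taken from the RELATIONS section of this page.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in the graph: one UW plus the attributes that modify it."""
    uw: str
    attributes: set = field(default_factory=set)


@dataclass(frozen=True)
class Relation:
    """A labelled, directed arc from a source node to a target node."""
    label: str
    source: str
    target: str


@dataclass
class UNLGraph:
    nodes: dict = field(default_factory=dict)      # node id -> Node
    relations: list = field(default_factory=list)  # list of Relation

    def add_node(self, node_id, uw, *attributes):
        self.nodes[node_id] = Node(uw, set(attributes))

    def add_relation(self, label, source, target):
        # A relation may only connect nodes that already exist.
        for node_id in (source, target):
            if node_id not in self.nodes:
                raise KeyError(f"unknown node: {node_id}")
        self.relations.append(Relation(label, source, target))


graph = UNLGraph()
graph.add_node("n1", "sleep", "@past")  # "@past" is a hypothetical attribute name
graph.add_node("n2", "cat")
graph.add_relation("ptp", "n1", "n2")   # "ptp" (participant) as named on this page
```

Keeping attributes on the node and relations in a separate list mirrors the distinction drawn above: a two-level network would have only the `nodes` and `relations` containers, with no per-node attribute layer.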

UNIVERSAL WORDS (UWs)

Uw.jpg
Universal Words, or simply UWs, are the words of UNL, and correspond to the nodes in a UNL graph, which are interlinked by relations or modified by attributes. They are labels for relatively stable units of knowledge (the concepts) that can be associated with natural language open lexical categories (noun, verb, adjective and adverb). The set of UWs is relatively open and is listed in the UNL Dictionary. Additionally, UWs are organized in a hierarchy (the UNL Ontology), are defined in the UNL Knowledge Base (UNLKB) and exemplified in the UNL Example Base (UNLEB), which are the lexical databases for UNL.

The structure of UWs is presented in UWs.


ATTRIBUTES

Attribute.jpg
Attributes are arcs linking a node to itself. In contrast to relations, they correspond to one-place predicates, i.e., functions that take a single argument. In UNL, attributes are always preceded by "@" and have normally been used to represent information conveyed by bound morphemes and closed classes, such as affixes (gender, number, tense, aspect, mood, voice, etc.), determiners (articles and demonstratives), adpositions (prepositions, postpositions and circumpositions), conjunctions, auxiliary and quasi-auxiliary verbs (auxiliaries, modals, coverbs, preverbs) and degree adverbs (specifiers). They are also used to deal with non-verbal elements of communication, such as prosody, sentence and text structure, politeness, schemes, speech acts, etc.

The current set of attributes is presented in Attributes.
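
As a rough illustration of attributes as one-place predicates, the sketch below attaches "@"-prefixed names to a single node and then views each one as an arc from that node to itself. The helper names (`add_attribute`, `as_self_loops`), the node identifier and the attribute names `@past` and `@not` are hypothetical placeholders, not taken from the Attributes page.

```python
def add_attribute(attributes, name):
    """Return a new attribute set for one node with `name` added."""
    # Attributes are always preceded by "@"; reject anything else.
    if not name.startswith("@") or len(name) == 1:
        raise ValueError(f"attribute names must start with '@': {name!r}")
    return frozenset(attributes) | {name}


def as_self_loops(node_id, attributes):
    """View each attribute as an arc linking the node to itself."""
    return [(node_id, name, node_id) for name in sorted(attributes)]


attrs = add_attribute(frozenset(), "@past")  # hypothetical attribute name
attrs = add_attribute(attrs, "@not")         # hypothetical attribute name
loops = as_self_loops("n1", attrs)
```

Each resulting triple has the same node at both ends, which is what separates an attribute from a relation: it takes one argument rather than two.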


RELATIONS

Relation.jpg
Relations, formerly known as "links", are labelled arcs connecting a node to another node in a UNL graph. They correspond to two-place semantic predicates holding between two Universal Words. In UNL, relations have normally been used to represent semantic cases or thematic roles (such as agent, object, instrument, etc.) associated with the interpretation of syntactic relations (such as specification, complementation and adjunction). These functions are binary and directed (from a source to a target) and are claimed to be universal.

Relations are organized in a hierarchy where upper nodes subsume lower nodes. The topmost level is the relation "rel", which simply indicates that there is a relation between two UWs. The next level comprises four general relations: participant (ptp), for the necessary arguments (subject and complements) of verbal predicates; attribute (aoj), for the necessary arguments (subject and complement) of nominal predicates; specifier (mod), for general specifiers; and adjunct (adj), for general adjuncts, including time, location and manner.

The current set of relations is presented in Relations.
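
The top two levels of the hierarchy described above can be sketched as a parent map, treating "rel" as the most general relation. This is a minimal sketch under that assumption: the names `PARENT`, `ancestors` and `subsumes` are illustrative, and only the five labels mentioned in this section are included; the full set is the one presented in Relations.

```python
# Child label -> parent label, for the two levels named in this section.
PARENT = {
    "ptp": "rel",  # participant
    "aoj": "rel",  # attribute
    "mod": "rel",  # specifier
    "adj": "rel",  # adjunct
}


def ancestors(label):
    """Return the chain of more general relations above `label`."""
    chain = []
    while label in PARENT:
        label = PARENT[label]
        chain.append(label)
    return chain


def subsumes(general, specific):
    """True if `general` is `specific` itself or lies above it."""
    return general == specific or general in ancestors(specific)
```

With this map, `subsumes("rel", "ptp")` holds while `subsumes("ptp", "aoj")` does not, since the four general relations are siblings under "rel".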
