Lexica

From UNL Wiki

Revision as of 17:17, 21 September 2012

The UNL System contains three different types of lexical databases: dictionaries, knowledge bases and example bases.

Features, Frames, Mappings and Rules

The lexical resources of UNL are represented in four different types of data structures: features, frames, mappings and rules.

  • Features are monadic predicates that describe distinctive properties of each entry. They are normally represented as <ATTRIBUTE>=<VALUE> pairs, where <ATTRIBUTE> is a general linguistic attribute (such as "part of speech", "gender", "number", "polarity" or "abstractness") and <VALUE> is a value that the attribute may assume. In the UNL framework, the set of attributes and values is closed and strongly standardized, and is explicitly and exhaustively defined by the Tagset. Language-independent features (such as semantic class, abstractness and polarity)[1] are represented only in the UNL Dictionary, and language-dependent features (such as number, tense, mood and aspect) are represented in the NL Dictionaries. Both kinds of features are merged in the UNL-NL dictionaries.
  • Frames are dyadic predicates that represent interactions between entries. They can be either semantic or syntactic.
    • Semantic Frames represent a collection of facts that specifies or distinguishes (i.e., "defines") each UW. They represent interactions between UW's and can be either "necessary" or "typical". The set of necessary (essential) interactions constitutes the UNL Knowledge Base; the set of typical (essential and accidental) interactions constitutes the UNL Memory, which includes the UNL Knowledge Base. The difference between "necessary" and "typical" interactions is a matter of logic: an interaction between two UW's X and Y is considered "necessary" if Y is a logical consequence of X, i.e., if X entails Y; it is considered "typical" if it is simply recurring[2]. The necessary interactions are further divided into monotonic and non-monotonic: the former correspond to the relations "is-a-kind-of" and "is-an-instance-of", whose set constitutes the UNL Ontology, a tree-structured hierarchy that is part of the UNL Knowledge Base.
      All these interactions, whether necessary or typical, are represented as UNL graphs, i.e., as a coherent (network) structure made of UW's, relations and attributes. Semantic frames are used mostly for word sense disambiguation, for lexicalization (i.e., to fill in lexical gaps) and for semantic reasoning.
    • Syntactic Frames represent interactions between natural language words. These interactions can also be "necessary" or "typical". They are considered necessary when a given word requires another word in order to form a syntactic unit. That is the case, for instance, of the English verb "to depend", which requires its complement to be introduced by the preposition "on". An interaction is considered "typical" when it is only recurring, but not obligatory. That is the case of collocations, such as "highly sophisticated" and "extremely happy" (instead of "extremely sophisticated" or "highly happy"). The necessary syntactic frames are defined as subcategorization frames or subcategorization rules inside the NL dictionaries. The typical syntactic frames are listed in the NL Example Base.
  • Mappings represent relations between UNL and natural languages, and are classified into two categories: lexical mappings and translation mappings. Lexical mappings are represented in the UNL-NL dictionaries, where UW's are associated with natural language lexical items, and vice versa. Translation mappings are represented in the UNL-NL memories, where recurring translations between UNL and NL are stored. The main difference between UNL-NL dictionaries and UNL-NL memories is that the former involve only lexical units, whereas the latter may involve larger segments. Both resources are used in UNLization and NLization, and UNL-NL memories normally prevail over UNL-NL dictionaries: the UNL-NL dictionary is activated only when no UNL-NL memory is available or suitable for a given input.
  • Rules represent
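Two of the structures listed above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the UNL System's actual format or API: the comma separator, the tag names (`POS`, `NOU`, `NUM`, `SNG`), the function names and the sample UNL strings are all hypothetical. The first function reads features written as ATTRIBUTE=VALUE pairs; the second shows the stated priority of a UNL-NL memory over a UNL-NL dictionary.

```python
from typing import Dict, Optional


def parse_features(spec: str) -> Dict[str, str]:
    """Parse ATTRIBUTE=VALUE pairs into a dict.

    Assumption: pairs are separated by commas; the real separator is
    defined by the Dictionary Specs, not by this sketch.
    """
    features: Dict[str, str] = {}
    for pair in spec.split(","):
        attribute, _, value = pair.strip().partition("=")
        if not attribute or not value:
            raise ValueError(f"malformed feature: {pair!r}")
        features[attribute] = value
    return features


def map_segment(
    segment: str,
    memory: Dict[str, str],
    dictionary: Dict[str, str],
) -> Optional[str]:
    """Map an NL segment to UNL: the memory prevails, the dictionary is the fallback."""
    if segment in memory:
        return memory[segment]
    # The dictionary is consulted only when the memory has no entry.
    return dictionary.get(segment)


# Hypothetical tag names and UNL strings, for illustration only.
entry_features = parse_features("POS=NOU, NUM=SNG")
unl_nl_memory = {"kick the bucket": "die"}
unl_nl_dictionary = {"kick": "kick", "bucket": "bucket"}
```

Here `map_segment("kick the bucket", unl_nl_memory, unl_nl_dictionary)` is answered by the memory, while `map_segment("bucket", ...)` falls back to the dictionary because the memory holds no entry for that segment.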

Dictionaries

Main article: Dictionary Specs

In the UNL System, a dictionary is a flat list of entries with their corresponding features. The dictionaries comply with the structure defined in the Dictionary Specs and must contain only tags defined in the Tagset. They are divided into three categories:

  • The UNL Dictionary, or simply UNLdic, is a list of UW's and their semantic (language-independent) markers. It is divided into three nested lexical databases: the UNL Core Dictionary, the UNL Abridged Dictionary and the UNL Unabridged Dictionary. The UNL Core Dictionary contains permanent UW's which are supposed to be lexicalized in all languages; the UNL Abridged Dictionary contains permanent UW's which are lexicalized in at least two language families (and therefore includes the UNL Core Dictionary); the UNL Unabridged Dictionary, which includes the UNL Abridged Dictionary, contains the whole set of permanent UW's (i.e., the concepts that are lexicalized in at least one language).
  • The NL Dictionary, or simply NLdic, is a list of natural language entries with the corresponding morphological and syntactic (language-dependent) features.
  • The UNL-NL Dictionary, or simply UNL-NLdic, is a list of lexical mappings between UW's and natural language entries. The UNL-NL Dictionary is provided in two formats: the generative format, which is normally used in natural language generation, contains only base forms and the corresponding inflectional rules; the enumerative format, which has been used in natural language analysis, contains all the word forms.

The UNL Dictionary and the NL Dictionary are monolingual databases, whose entries are interlinked in the UNL-NL Dictionary, which brings the mappings between UW's and natural language entries, whenever available[3]. These three dictionaries are normally made through the UNLarium in different steps and constitute the basic resource for UNLization and NLization.
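The nesting of the Core, Abridged and Unabridged dictionaries can be illustrated with a small classifier. This is a sketch under assumptions: the function name, the tier labels and the sample language and family codes are hypothetical, and the source does not say how "all languages" is operationalized, so the sketch simply takes a reference set of languages as a parameter. It returns the innermost tier a UW qualifies for; by the nesting described above, a core UW also belongs to the two outer dictionaries.

```python
from typing import Dict, Set


def dictionary_tier(
    languages: Set[str],
    family_of: Dict[str, str],
    all_languages: Set[str],
) -> str:
    """Return the innermost UNL Dictionary tier for a permanent UW.

    languages: the languages in which the UW is lexicalized.
    family_of: language -> language family (hypothetical lookup table).
    all_languages: the reference set standing in for "all languages".
    """
    if not languages:
        raise ValueError("a permanent UW is lexicalized in at least one language")
    if languages >= all_languages:
        return "core"  # lexicalized in every reference language
    families = {family_of[lang] for lang in languages}
    if len(families) >= 2:
        return "abridged"  # lexicalized in at least two language families
    return "unabridged"  # lexicalized in at least one language


# Hypothetical sample data, for illustration only.
FAMILY_OF = {"en": "germanic", "de": "germanic", "ja": "japonic"}
ALL_LANGUAGES = {"en", "de", "ja"}
```

With this sample data, a UW lexicalized in `{"en", "ja"}` spans two families and is classified as abridged, whereas one lexicalized only in `{"en", "de"}` stays in the unabridged tier.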

Knowledge Bases

Main article: UNL Knowledge Base

The UNL Dictionary is simply a flat list of UW's and their corresponding classifiers (such as lexical category, semantic class, abstractness, cardinality, etc.). The UNL Dictionary does not contain any distinguisher, i.e., any information that can be used to differentiate a given UW from the others that belong to the same class. This information is provided in the UNL Knowledge Base, or UNLKB, which is a semantic network made of relations that are necessary to define UW's.

The UNL Knowledge Base is expected to represent the intension (the meaning) of UW's.

The UNL Knowledge Base contains the UNL Ontology, which is a part of the UNLKB where UW's are interconnected by the ontological relations of UNL, i.e., "is-a-kind-of" ("icl") and "is-an-instance-of" ("iof").
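As a rough illustration of how the UNL Ontology sits inside the UNLKB, the sketch below stores relations as (label, source, target) triples and walks only the "icl" and "iof" links upward. The triple layout, the function name, the simplified UW spellings and the sample entries (including the "artifact" node and the "pur" label) are assumptions made for the example, not the actual UNLKB format.

```python
from typing import Dict, List, Tuple

# (relation label, source UW, target UW); a hypothetical storage layout.
Relation = Tuple[str, str, str]

ONTOLOGICAL = ("icl", "iof")  # "is-a-kind-of" and "is-an-instance-of"


def ancestors(uw: str, relations: List[Relation]) -> List[str]:
    """Follow icl/iof links upward from uw and return its chain of ancestors.

    The ontology is described as a tree, so each UW is assumed to have
    at most one parent; non-ontological relations are ignored.
    """
    parent: Dict[str, str] = {
        source: target
        for label, source, target in relations
        if label in ONTOLOGICAL
    }
    chain: List[str] = []
    seen = {uw}
    while uw in parent:
        uw = parent[uw]
        if uw in seen:
            raise ValueError("cycle found: the ontology is not a tree")
        seen.add(uw)
        chain.append(uw)
    return chain


# Illustrative knowledge base with simplified UW spellings.
UNLKB: List[Relation] = [
    ("icl", "table", "furniture"),
    ("icl", "furniture", "artifact"),
    ("pur", "furniture", "occupancy"),  # hypothetical non-ontological relation
]
```

Calling `ancestors("table", UNLKB)` walks from "table" to "furniture" to "artifact" and skips the non-ontological triple, which belongs to the UNLKB but not to the ontology.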

Example Bases

In the UNL System, there are two different types of example bases:

  • The UNL Memory is a network of UW's that extends and complements the UNLKB. The difference is that the UNLKB, which is dictionary-based, contains only necessary relations between UW's, whereas the UNL Memory, which is corpus-based, contains any relations between UW's along with their frequency of occurrence. For instance, the idea that a "table" is "supported by one or more vertical legs" is not represented in the UNLKB because it is not considered necessary (there are tables that are not supported by legs). This information, like the information that tables are normally round or square, that they are made of hard materials, etc., is represented in the UNL Memory, which is expected to represent not only common sense knowledge about UW's, but all the possible instances of a given UW.
  • The UNL-NL Memory is a list of frequent mappings between UNL and a given natural language. It is the UNLization (translation) memory. Unlike the UNL-NLdic, which involves only lexical mappings, the UNL-NL Memory involves any UNLization units, which may include several lexical units.
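The contrast between the dictionary-based UNLKB and the corpus-based UNL Memory can be sketched as a frequency count over observed relations. Everything specific here is an assumption for illustration: the triple layout, the function names, the "has_part" and "shape" labels and the sample counts are hypothetical and do not reflect real UNL data.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

# (relation label, source UW, target UW); a hypothetical storage layout.
Relation = Tuple[str, str, str]


def build_memory(observed: Iterable[Relation], kb: Set[Relation]) -> Counter:
    """Count how often each relation occurs in a corpus.

    The memory is described as including the knowledge base, so every
    necessary relation is kept even if the corpus never shows it.
    """
    memory: Counter = Counter(observed)
    for relation in kb:
        memory.setdefault(relation, 0)
    return memory


def typical_only(memory: Counter, kb: Set[Relation]) -> List[Tuple[Relation, int]]:
    """Return relations found in the memory but not in the KB, most frequent first."""
    return [(rel, n) for rel, n in memory.most_common() if rel not in kb]


# Illustrative data: the table/furniture/leg example with made-up counts.
KB: Set[Relation] = {("icl", "table", "furniture")}
CORPUS: List[Relation] = (
    [("has_part", "table", "leg")] * 3
    + [("icl", "table", "furniture")]
    + [("shape", "table", "round")]
)
MEMORY = build_memory(CORPUS, KB)
```

In this toy data, `typical_only(MEMORY, KB)` lists the table-leg relation first because it recurs most often, while the table-furniture relation is excluded because it is necessary and therefore already part of the knowledge base.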

Notes

  1. Language-independent (semantic) features are closely related to the notions of "classeme" (Pottier, 1965), and of "semantic markers" or "classifiers" (Katz & Fodor, 1963).
  2. Consider, for instance, the case of the UW's corresponding to the concepts conveyed by the English words "table" (= piece of furniture having a smooth flat top that is usually supported by one or more vertical legs), "furniture" (= furnishings that make a room or other area ready for occupancy) and "leg" (= one of the supports for a piece of furniture). The interaction between "table" and "furniture" is considered "necessary", because there is no table, in that sense, which is not a piece of furniture. However, the interaction between "table" and "leg" is considered "typical", because, although highly frequent, there can be tables without legs.
  3. Not all NL dictionary entries may be mapped onto UNL. Articles, prepositions, conjunctions and other particles do not have any correspondence in UNL.