CORE


Latest revision as of 19:00, 16 August 2013

The UNL Core Dictionary contains Universal Words (UWs) that are expected to be lexicalized in more than two language groups.

Goal

The main goal of the UNL Core Dictionary Project is to choose and classify UWs that are lexicalized in several language groups.

Methodology

The first candidate entries for the UNL Core Dictionary have been extracted from the intersection of WordNet 3.0 and the following English lexical databases

These entries are expected to be validated and analyzed according to the following categories:

  • Degree of Universality, from 1 to 4, according to the number of language groups in which the entry is lexicalized
  • Headword, which is the citation form of the UW in English
  • Semantic root, which is the UW that defines the root of a UW
  • Semantic structure, which defines the semantic structure of the UW
  • Hypernym, which is the UW that defines the class of which the UW is a type
  • Definition, which is the definition of the UW, in English
  • Examples, which are examples of the UW, in English
  • Abstractness, which refers to whether the UW is concrete or abstract
  • Alienability, which refers to whether the UW is alienable or not
  • Animacy, which refers to whether the UW is animate or inanimate
  • Cardinality, which refers to whether the UW is countable or not
  • Gender, which refers to the semantic gender of the UW, if any
  • Polarity, which refers to whether the UW conveys a positive or a negative concept
  • Semantic class, which refers to the semantic class of the UW
  • Semantic frame, which refers to the semantic frame of the UW
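
The categories above can be pictured as one record per UW. The following is only a minimal sketch in Python, not the actual UNLWEB data model: the class name `CoreEntry`, the field names, the value types and the helper `missing_fields` are all assumptions made for illustration. Only the list of categories and the "cow" values come from this page.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class CoreEntry:
    """Hypothetical record for one UW; field names mirror the categories above."""
    headword: str
    degree_of_universality: Optional[int] = None  # 1 to 4
    semantic_root: Optional[str] = None
    semantic_structure: Optional[str] = None
    hypernym: Optional[str] = None
    definition: str = ""
    examples: List[str] = field(default_factory=list)
    abstractness: Optional[str] = None   # "concrete" / "abstract"
    alienability: Optional[str] = None
    animacy: Optional[str] = None        # "animate" / "inanimate"
    cardinality: Optional[str] = None
    gender: Optional[str] = None
    polarity: Optional[str] = None       # "positive" / "negative"
    semantic_class: Optional[str] = None
    semantic_frame: Optional[str] = None
    source_language: str = "English"     # default stated on this page

    def missing_fields(self) -> List[str]:
        """Return the names of fields that are still empty (None, "" or [])."""
        return [f.name for f in fields(self)
                if getattr(self, f.name) in (None, "", [])]


# Values taken from the "cow" example used on this page.
cow = CoreEntry(
    headword="cow",
    semantic_root="cattle",
    semantic_structure="cattle.@female",
    gender="feminine",
)
print(cow.missing_fields())
```

A helper such as `missing_fields` reflects the fact that several fields are to be filled in only when empty or left empty in case of doubt. Whether an empty field is an error depends on the category.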

Instructions

  1. Join the project CORE at UNLWEB>PROJECT
  2. Create a VERIFICATION assignment at UNLWEB>ASSIGNMENT for the language UNL and project UNL CORE Dictionary
  3. Verify the corresponding entries according to the instructions below:
    • Assign, to each UW, a "degree of universality", according to the lexicalization in different language groups (consider "English", "French" and your native language as different language groups: if the concept is lexicalized in all three languages, the UW must receive the value 1; if it is lexicalized only in two, it must receive the value 2; if it is lexicalized only in English, assign the value 3; if the UW is not lexicalized even in English, assign the value 4).
    • Choose only one English word as the headword. Normally, the headword field contains all the English words belonging to the same synset; keep only the most frequent one as the headword for the UW.
    • Choose a semantic root for the UW. The semantic root corresponds to the basic unit of meaning and must always be expressed by a nominal UW. It should be used when the UW can be "reduced" to another UW. For instance: "hungry", "hungrily" and "hunger" can be reduced to "hunger": "hunger.@full_of", "hunger.@full_of.@manner", "hunger.@full_of.@make". Note that these attributes do not actually exist (they just translate the role of English derivational suffixes). In many cases, there is no root at all. But be careful: the semantic root, unlike the grammatical root, is not related only to compounds and derived forms, and is sometimes counter-intuitive (the derived word may be the root of the simple word). For instance: the root of "kill" (simple form) is "die" (simple form), because "kill" can be reduced to "die.@cause_to"; the root of "abstract" (simple form) is "abstractness" (derived form), because "abstract" can be reduced to "abstractness.@full_of" (note that "abstractness" could also be reduced to "abstract.@ness", but note that the root must be a nominal UW). Note also that heteronyms can normally be reduced to a non-marked form: the semantic root of "cow" is "cattle", because "cow" can be reduced to "cattle.@female".
    • Define the semantic structure of the UW. The semantic structure corresponds to the analyzed form of the UWs that can be reduced to others. For instance: in the case of "cow", the headword is "cow", the semantic root is "cattle" and the semantic structure is "cattle.@female".
    • Verify the hypernym of the UW. The hypernym is an existing nominal UW that defines a general class of which the UW is a type. For instance: the hypernym of "blue" is "color"; of "kill" is "action"; of "table" is "furniture"; etc. The existing hypernym has been imported from WordNet. Check whether it is really pertinent and change it if it is not.
    • Definition. Provide the definition in case this field is empty.
    • Examples. Provide examples, in English, in case this field is empty.
    • Abstractness. This field must be used only in case of nominal UWs. A UW is considered "concrete" if it conveys a concept that can be perceived by touch; and "abstract" otherwise. In case of doubt, leave this field empty.
    • Animacy. This field must be used only in case of nominal UWs. A UW is considered "animate" if it conveys a concept that may work as an agent; and "inanimate" otherwise. In case of doubt, leave this field empty.
    • Cardinality. This field must be used only in case of nominal UWs. It is associated with the "semantic number" of the UW, which is language-independent, and may not be the same as the "grammatical number", which is language-dependent. For instance: "glasses", in English, is plural, but its semantic number is singular, because it corresponds to a singular object.
    • Gender. This field must be used only in case of nominal UWs that refer to animate beings. Note that it corresponds to the "semantic gender", which is language-independent, and not to the "grammatical gender". The UW corresponding to "cow", for instance, is feminine; the one corresponding to "bull" is masculine. Many UWs do not have any semantic gender ("cattle", for instance). In this case, leave this field empty.
    • Polarity. This field refers to the sentiment conveyed by the word. There are UWs that convey positive senses ("win", "good", "birth") and UWs that convey negative senses ("lose", "bad", "death"). Many UWs do not convey any sentiment at all ("computer", for instance).
    • Semantic class. This field has been filled in from WordNet. Change it only in case it is empty.
    • Semantic frame. You have to associate a semantic frame with each UW. The semantic frame is the semantic valency of a UW: it indicates the relations that a UW "necessarily" assigns (i.e., the necessary specifiers or complements of the UW). Do not think of optional adjuncts, but only of the necessary arguments. For instance, the UW corresponding to the concept "to die" necessarily assigns one relation: exp (semantic frame = exp();); the UW corresponding to "to kill" necessarily assigns two relations: agt and obj (semantic frame = agt()obj();); the UW "to give" necessarily assigns three relations: agt, obj and adr (semantic frame = agt()obj()adr();). Note that this applies not only to verbal UWs, but to any UW: there are nominal UWs (such as "construction") and adjectival UWs (such as "necessary") that also assign necessary semantic relations. For the relations, use the extended set defined at Universal Relations. Some examples have been provided at UNLARIUM>GRAMMAR>UNL>SEMANTIC FRAME.
    • Source language. For the time being, this is English, by default, because all the entries have been extracted out of English lexical databases.
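
Three of the instructions above are mechanical enough to sketch in code: the degree of universality, the semantic structure and the semantic frame. The Python below is only an illustration under stated assumptions. The function names and signatures are hypothetical and not part of UNLWEB. The native language is assumed to be neither English nor French. The case of a concept lexicalized in a single non-English language is not covered by the instructions, so the sketch raises an error for it.

```python
from typing import Sequence


def degree_of_universality(english: bool, french: bool, native: bool) -> int:
    """Map lexicalization in the three language groups to a value from 1 to 4."""
    count = sum([english, french, native])
    if count == 3:
        return 1  # lexicalized in all three languages
    if count == 2:
        return 2  # lexicalized in only two
    if count == 1 and english:
        return 3  # lexicalized only in English
    if count == 0:
        return 4  # not lexicalized even in English
    # Assumption: the instructions do not say what to do in this case.
    raise ValueError("lexicalized in one non-English language only: not covered")


def semantic_structure(root: str, *attributes: str) -> str:
    """Build the analyzed form, e.g. root 'cattle' + 'female' -> 'cattle.@female'."""
    return ".".join([root, *("@" + a for a in attributes)])


def semantic_frame(relations: Sequence[str]) -> str:
    """Serialize the necessary relations, e.g. ['agt', 'obj'] -> 'agt()obj();'."""
    if not relations:
        return ""  # assumption: a UW with no necessary relation gets an empty frame
    return "".join(f"{r}()" for r in relations) + ";"


print(degree_of_universality(english=True, french=True, native=False))
print(semantic_structure("cattle", "female"))
print(semantic_frame(["agt", "obj", "adr"]))
```

The attribute names passed to `semantic_structure` are the illustrative ones used in the instructions (such as "female" or "full_of"), which, as noted there, do not all actually exist as UNL attributes.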

Observations

Think about concepts rather than about words
Note that we are dealing with concepts and not with English words. The same English word may belong to several different synsets. So, be careful when analyzing a UW: you should not think about the English word in general, but only about the instance of the English word that is captured by the UW.
Do not delete any UW
Although the delete button is available at the interface, do not use it.
Software