Tagset

From UNL Wiki
(Difference between revisions)
(Tree of attributes and values)
Line 274: Line 274:
 
***superior status (SPS)
 
***superior status (SPS)
 
*[[syntax|syntactic roles]] (SYN)
 
*[[syntax|syntactic roles]] (SYN)
**adverbial phrase (AP)
+
**adjunct (ADJT)
 +
***adjunct to the head of an adjective phrase (JA)
 
***adjunct to the head of an adverbial phrase (AA)
 
***adjunct to the head of an adverbial phrase (AA)
***adverbial phrase (intermediate projection) (AB)
 
***complement of the head of an adverbial phrase (AC)
 
***head of an adverbial phrase (AH)
 
***specifier of the head of an adverbial phrase (AS)
 
**complementizer phrase (CP)
 
 
***adjunct to the head of a complementizer phrase (CA)
 
***adjunct to the head of a complementizer phrase (CA)
***complementizer phrase (intermediate projection) (CB)
 
***complement of the head of a complementizer phrase (CC)
 
***head of a complementizer phrase (CH)
 
***specifier of the head of a complementizer phrase (CS)
 
**determiner phrase (DP)
 
 
***adjunct to the head of a determiner phrase (DA)
 
***adjunct to the head of a determiner phrase (DA)
***determiner phrase (intermediate projection) (DB)
 
***complement of the head of a determiner phrase (DC)
 
***head of a determiner phrase (DH)
 
***specifier of the head of a determiner phrase (DS)
 
**inflectional phrase (IP)
 
 
***adjunct to the head of an inflectional phrase (IA)
 
***adjunct to the head of an inflectional phrase (IA)
***inflectional phrase (intermediate projection) (IB)
 
***complement of the head of an inflectional phrase (IC)
 
***head of an inflectional phrase (IH)
 
***specifier of the head of an inflectional phrase (IS)
 
**adjective phrase (JP)
 
***adjunct to the head of an adjective phrase (JA)
 
***adjective phrase (intermediate projection) (JB)
 
***complement of the head of an adjective phrase (JC)
 
***head of an adjective phrase (JH)
 
***specifier of the head of an adjective phrase (JS)
 
**nominal phrase (NP)
 
 
***adjunct to the head of a nominal phrase (NA)
 
***adjunct to the head of a nominal phrase (NA)
***nominal phrase (intermediate projection) (NB)
 
***complement of the head of a nominal phrase (NC)
 
***head of a nominal phrase (NH)
 
***specifier of the head of a nominal phrase (NS)
 
**prepositional phrase (PP)
 
 
***adjunct to the head of a prepositional phrase (PA)
 
***adjunct to the head of a prepositional phrase (PA)
***prepositional phrase (intermediate projection) (PB)
 
***complement of the head of a prepositional phrase (PC)
 
***head of a prepositional phrase (PH)
 
***specifier of the head of a prepositional phrase (PS)
 
**verbal phrase (VP)
 
 
***adjunct to the head of a verbal phrase (VA)
 
***adjunct to the head of a verbal phrase (VA)
***verbal phrase (intermediate projection) (VB)
+
**complement (COMP)
 +
***complement of the head of an adjective phrase (JC)
 +
***complement of the head of an adverbial phrase (AC)
 +
***complement of the head of a complementizer phrase (CC)
 +
***complement of the head of a determiner phrase (DC)
 +
***complement of the head of an inflectional phrase (IC)
 +
***complement of the head of a nominal phrase (NC)
 +
***complement of the head of a prepositional phrase (PC)
 
***complement of the head of a verbal phrase (VC)
 
***complement of the head of a verbal phrase (VC)
 +
**head (HEAD)
 +
***head of an adverbial phrase (AH)
 +
***head of an adjective phrase (JH)
 +
***head of a complementizer phrase (CH)
 +
***head of a determiner phrase (DH)
 +
***head of an inflectional phrase (IH)
 +
***head of a nominal phrase (NH)
 +
***head of a prepositional phrase (PH)
 
***head of a verbal phrase (VH)
 
***head of a verbal phrase (VH)
 +
**specifier (SPEC)
 +
***specifier of the head of an adjective phrase (JS)
 +
***specifier of the head of an adverbial phrase (AS)
 +
***specifier of the head of a complementizer phrase (CS)
 +
***specifier of the head of a determiner phrase (DS)
 +
***specifier of the head of an inflectional phrase (IS)
 +
***specifier of the head of a nominal phrase (NS)
 +
***specifier of the head of a prepositional phrase (PS)
 
***specifier of the head of a verbal phrase (VS)
 
***specifier of the head of a verbal phrase (VS)
 +
**maximal projection (XP)
 +
***adjective phrase (JP)
 +
***adverbial phrase (AP)
 +
***complementizer phrase (CP)
 +
***determiner phrase (DP)
 +
***inflectional phrase (IP)
 +
***nominal phrase (NP)
 +
***prepositional phrase (PP)
 +
***verbal phrase (VP)
 +
**intermediate projection (XB)
 +
***adverbial phrase (AB)
 +
***adjective phrase (JB)
 +
***complementizer phrase (CB)
 +
***determiner phrase (DB)
 +
***inflectional phrase (IB)
 +
***nominal phrase (NB)
 +
***prepositional phrase (PB)
 +
***verbal phrase (VB)
 
*[[tense]] (TNS)
 
*[[tense]] (TNS)
 
**absolute tense  
 
**absolute tense  

Revision as of 12:28, 25 January 2010

The set of features in a UNL-driven dictionary depends on the structure of the natural language and may vary considerably. However, in order to better standardize lexical resources inside the UNL framework, the UNDL Foundation recommends the adoption of the following tags for some specific and pervasive grammatical phenomena. Several of these linguistic constants have already been proposed to the Data Category Registry (ISO 12620) and represent widely accepted linguistic concepts. Our main intention here is simply to provide a harmonized system to be shared by the UNL community, so as to make dictionaries as easily understandable and exchangeable as possible.

When to use the UNDLF Tagset

The UNDLF Tagset is required for providing lexical resources (dictionary entries and grammar rules) in the UNLarium framework. Indeed, the whole environment has already been prepared to accept only the tags presented here. In most cases, the use of tags is unnoticeable and effortless, since users are supposed to make higher-level choices ("adjective", for instance), which are internally represented through the corresponding authorized labels ("ADJ"). However, in several circumstances, such as when creating inflectional paradigms or subcategorization frames, users are expected to address more fine-grained linguistic phenomena that may require a specialized metalanguage. That is exactly the purpose of this tagset: to provide the technical means for describing any linguistic behaviour. It should do so in a strongly standardised way, i.e., so that others can easily understand and exploit the data for their own benefit.

General Guidelines

In order to define the tags to be used in the UNDLF Tagset, the following premises were adopted:

  • Tags should be as comprehensive as possible (i.e., they should cover all widely accepted linguistic concepts)
  • Tags should be as few as possible (i.e., they should avoid redundancy)
  • Tags should be as short as possible (i.e., they should fit in a three-character string)
  • Tags should be as mnemonic as possible (i.e., they should be provided through English acronyms or abbreviations)
  • Tags should constitute a taxonomic hierarchy (so that upper level values could be inferred from the lower ones).

Additionally, the following conventions were adopted:

  • Tags are written in upper case letters;
  • Negation is represented by the prefix "N" (past = PAS, nonpast = NPAS).
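
The two conventions above can be sketched in a few lines of Python. This is only an illustration, not part of the UNDLF tooling: the helper names `is_well_formed` and `negate` are hypothetical, and the assumption that a tag consists solely of upper-case ASCII letters is ours (the conventions only state that tags are upper case).

```python
import re

# Assumption: a tag is a non-empty run of upper-case ASCII letters.
TAG_PATTERN = re.compile(r"[A-Z]+")


def is_well_formed(tag: str) -> bool:
    """Return True if the tag follows the upper-case convention."""
    return TAG_PATTERN.fullmatch(tag) is not None


def negate(tag: str) -> str:
    """Build the negated tag by prefixing "N", as in PAS -> NPAS."""
    if not is_well_formed(tag):
        raise ValueError(f"not a well-formed tag: {tag!r}")
    return "N" + tag


print(negate("PAS"))          # NPAS
print(is_well_formed("adj"))  # False
```

The reverse direction is deliberately left out: a leading "N" does not by itself mark negation, since tags such as NP (nominal phrase) and NH (head of a nominal phrase) also begin with that letter.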

We have tried to stick to the standard abbreviations proposed by the Leipzig Glossing Rules and by David Crystal in A Dictionary of Linguistics and Phonetics (2008), insofar as they comply with the rules above. The resulting set of tags, which is still subject to additions and revisions, is presented below. For the time being, the definitions and examples have been taken from the Glossary of Linguistic Terms (Loos et al.), available at SIL International. The tags are expected to migrate to an online environment, still under construction, where accredited linguists will have the opportunity to enhance and improve this repertoire.

Tree of attributes and values

The hierarchy of tags is depicted in the tree below. The topmost level represents the attributes of which the tags are values. Each lower position implies the levels above it (for instance, animate is a value of concrete, which is a value of abstractness), but lower positions are not mandatory, as they can be too specialized ("go" is just a verb, and not any of the subcategories of verb). In any case, natural language phenomena should be classified as deeply as possible in the tagset structure ("un-" should be classified as a prefix rather than as an affix).
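
To illustrate how upper-level values can be inferred from lower ones, here is a minimal Python sketch that reads a small excerpt of the tree in its wiki list notation (the number of asterisks gives the depth) and walks upward from a tag. The excerpt uses the syntactic-role tags of the newer revision. The function names `parse_parents` and `ancestors` are hypothetical, and the sketch assumes that each tag occurs under a single parent.

```python
import re

# Excerpt of the tree in wiki list notation: asterisk count = depth.
TREE = """\
*[[syntax|syntactic roles]] (SYN)
**adjunct (ADJT)
***adjunct to the head of an adjective phrase (JA)
***adjunct to the head of an adverbial phrase (AA)
**head (HEAD)
***head of a verbal phrase (VH)
"""

# Leading asterisks, a free-text label, then the tag in parentheses.
LINE = re.compile(r"^(\*+)\s*(.*?)\s*\(([A-Z]+)\)\s*$")


def parse_parents(text: str) -> dict:
    """Map each tag to its parent tag (None for a top-level attribute)."""
    parents = {}
    stack = []  # stack[i] holds the most recent tag seen at depth i + 1
    for raw in text.splitlines():
        match = LINE.match(raw)
        if match is None:
            continue  # skip lines that do not look like tree entries
        depth = len(match.group(1))
        tag = match.group(3)
        del stack[depth - 1:]  # drop entries at this depth or deeper
        parents[tag] = stack[-1] if stack else None
        stack.append(tag)
    return parents


def ancestors(tag: str, parents: dict) -> list:
    """Return the upper-level tags implied by a tag, nearest first."""
    chain = []
    current = parents[tag]
    while current is not None:
        chain.append(current)
        current = parents[current]
    return chain


parents = parse_parents(TREE)
print(ancestors("JA", parents))  # ['ADJT', 'SYN']
```

Under these assumptions, an entry tagged only with JA can be treated as an adjunct (ADJT) and as carrying a syntactic role (SYN) without listing those tags explicitly, which is why classifying as deeply as possible loses no information.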

List of attributes and values in alphabetical order (pdf)

Software