Tagset

From UNL Wiki
(Difference between revisions)
(Tree of attributes and values)
Line 24: Line 24:
 
The hierarchy of tags is depicted in the tree below. The topmost level represents the attributes of which the tags are values. Each lower position implies the levels above it (for instance: progressive is a value of continuative, which is a value of imperfective, which is a value of the attribute aspect), but lower positions are not mandatory, as they can be too specialized ("go" is just a verb, and not any of the subcategories of verb). In any case, natural language phenomena should be classified as deeply as possible in the tagset structure ("un-" should be classified as a prefix, rather than as an affix).
  
  [[Media:UNDLFTagset.pdf|List of attributes and values in alphabetical order (pdf)]]
+
   
 +
[http://www.unlweb.net/unlarium/dictionary/export_tagset.php List of tags in alphabetical order]
  
{{#tree:id=tagset|openlevels=0|root=Attributes|
+
{{#tree:id=tagset|openlevels=0|root=Tags|
 +
 
 +
*abstractness (ABN)
 +
**abstract (ABT)
 +
**concrete (CCT)
 +
*[[adjacency]] (AJC)
 +
**immediate (AJ0)
 +
**nearest (AJ1)
 +
**near (AJ2)
 +
**distant (AJ3)
 +
**most distant (AJ4)
 
*[[agreement]] (AGR)
 
*[[agreement]] (AGR)
 
**assigns case (ACAS)
 
**assigns case (ACAS)
Line 32: Line 43:
 
**assigns number (ANUM)
 
**assigns number (ANUM)
 
**assigns person (APER)
 
**assigns person (APER)
 +
**assigns tense (ATNS)
 
**receives case (RCAS)
 
**receives case (RCAS)
 
**receives gender (RGEN)
 
**receives gender (RGEN)
 
**receives number (RNUM)
 
**receives number (RNUM)
 
**receives person (RPER)
 
**receives person (RPER)
 +
**receives tense (RTNS)
 +
*alienability (ALY)
 +
**alienable (ALI)
 +
**unalienable (NALI)
 
*animacy (ANI)
 
*animacy (ANI)
 
**animate (ANM)
 
**animate (ANM)
 
**inanimate (NANM)
 
**inanimate (NANM)
 
*[[aspect]] (ASP)
 
*[[aspect]] (ASP)
 +
**causative (CAU)
 
**perfective (PFV)
 
**perfective (PFV)
 
**imperfective (NPFV)
 
**imperfective (NPFV)
Line 54: Line 71:
 
**inceptive (ICP)
 
**inceptive (ICP)
 
**terminative (TER)
 
**terminative (TER)
 +
*cardinality (CAR)
 +
**one single referent (ONE)
 +
**a pair of referents (TWO)
 +
**three referents (TRE)
 +
**countable (CTB)
 +
**uncountable (NCTB)
 +
**collective (COL)
 +
**more than one referent (PLU)
 
*[[case]] (CAS)
 
*[[case]] (CAS)
 
**abessive (ABE)
 
**abessive (ABE)
 
**ablative (ABL)
 
**ablative (ABL)
 
**accusative (ACC)
 
**accusative (ACC)
 +
**adessive (ADE)
 
**allative (ALL)
 
**allative (ALL)
**absolutive (ASL)
+
**absolutive (ABS)
 
**benefactive (BEN)
 
**benefactive (BEN)
**causative (CAU)
 
 
**comitative (CMT)
 
**comitative (CMT)
**dative case (DAT)
+
**dative (DAT)
 
**delative (DEL)
 
**delative (DEL)
 
**elative (ELA)
 
**elative (ELA)
Line 70: Line 95:
 
**essive (ESS)
 
**essive (ESS)
 
**genitive (GNT)
 
**genitive (GNT)
 +
**hortative (HOR)
 
**illative (ILL)
 
**illative (ILL)
 
**inessive (INE)
 
**inessive (INE)
Line 81: Line 107:
 
**partitive (PTT)
 
**partitive (PTT)
 
**superessive (SPE)
 
**superessive (SPE)
 +
**terminative (TRM)
 
**translative (TLT)
 
**translative (TLT)
 
**vocative (VOC)
 
**vocative (VOC)
Line 90: Line 117:
 
**specified (SPC)
 
**specified (SPC)
 
*[[degree]] (DEG)
 
*[[degree]] (DEG)
 +
**augmentative (AUG)
 
**comparative (CMP)
 
**comparative (CMP)
 +
**diminutive (DIM)
 
**positive (PST)
 
**positive (PST)
 
**superlative (SUP)
 
**superlative (SUP)
 +
***absolute superlative (SUPA)
 +
***comparative superlative (SUPR)
 
*[[distribution]] (DIS)
 
*[[distribution]] (DIS)
**order
+
**after (AFT)
***premodifier (BEF)
+
**before (BEF)
***postmodifier (AFT)
+
**end (END)
***middle position (MID)
+
**free (FRE)
***free distribution (FRE)
+
**front (FRT)
**position
+
**immediately after (IAFT)
***immediately (IMM)
+
**immediately before (IBEF)
***far (FAR)
+
**middle (MID)
 +
*[[information structure]] (IST)
 +
**focus (FOC)
 +
**rheme (RHE)
 +
**theme (THE)
 
*[[gender]] (GEN)
 
*[[gender]] (GEN)
 
**feminine (FEM)
 
**feminine (FEM)
Line 109: Line 144:
 
**variable (VAR)
 
**variable (VAR)
 
*[[lexical category]] (LEX)
 
*[[lexical category]] (LEX)
**adjective (J)
+
**[[adjective]] (J)
**adposition (P)
+
**[[adposition]] (P)
**adverb (A)
+
**[[adverb]] (A)
**affix (F)
+
**[[affix]] (F)
**conjunction (C)
+
**[[conjunction]] (C)
**determiner (D)
+
**[[determiner]] (D)
**inflection (I)
+
**[[inflection]] (I)
**noun (N)
+
**[[noun]] (N)
***pronoun (R)
+
**[[numeral]] (U)
***proper noun (E)
+
**[[pronoun]] (R)
**verb (V)
+
**[[verb]] (V)
 
**other (O)
 
**other (O)
 
*[[lexical structure]] (LST)
 
*[[lexical structure]] (LST)
 
**subword (SBW)
 
**subword (SBW)
 
**simple word (WRD)
 
**simple word (WRD)
***abbreviation (ABR) and single-word contraction
+
***abbreviation (ABB) and single-word contraction
 +
***clitic (CLI)
 
**multiword expression (MTW)
 
**multiword expression (MTW)
 
***acronym (ACR) and initialism
 
***acronym (ACR) and initialism
 
***multiple-word contraction (CTT) and blend
 
***multiple-word contraction (CTT) and blend
 +
*[[modality]] (MOD)
 +
**realis (REA)
 +
**irrealis (NREA)
 +
**alethic (ALE)
 +
**deontic (DEO)
 +
***commissive (CMS)
 +
***directive (DRT)
 +
***volitive (VLT)
 +
**epistemic (EPI)
 +
***evidentiality (EVI)
 +
***judgment (JDG)
 
*[[mood]] (MOO)
 
*[[mood]] (MOO)
 +
**none (non-finite verb forms) (VBL)
 +
***gerund (GER)
 +
***gerundive (GDV)
 +
***infinitive (INF)
 +
***participle (PTP)
 +
***supine (SPN)
 
**assumptive (AUM)
 
**assumptive (AUM)
 +
**causative (CAU)
 
**conditional (CON)
 
**conditional (CON)
 
**declarative (DEC)
 
**declarative (DEC)
Line 139: Line 193:
 
**imprecative (IPC)
 
**imprecative (IPC)
 
**indicative (IND)
 
**indicative (IND)
 +
**inferential (INFR)
 
**interrogative (INT)
 
**interrogative (INT)
 
**jussive (JUS)
 
**jussive (JUS)
Line 153: Line 208:
 
***inflectional affix (IAX)  
 
***inflectional affix (IAX)  
 
***derivational affix (DAX)
 
***derivational affix (DAX)
**root (ROO)
+
**base form (BF)
**stem (STE)
+
***root (ROO)
 +
***stem (STE)
 
**word form (WFO)
 
**word form (WFO)
 +
**alternative form (ALT)
 +
***alternative form 1 (ALT1)
 +
***alternative form 2 (ALT2)
 +
***alternative form 3 (ALT3)
 +
***short or weak form (SHO)
 +
***long or strong form (STR)
 
*[[number]] (NUM)
 
*[[number]] (NUM)
 
**singular (SNG)
 
**singular (SNG)
Line 168: Line 230:
 
**invariant (INV)
 
**invariant (INV)
 
*[[part of speech]] (POS)
 
*[[part of speech]] (POS)
**adjective (ADJ)
+
**[[adjective]]s (J)
**adposition (ADP)
+
***adjective (ADJ)
 +
***participle (PTL)
 +
**[[adposition]] (P)
 
***circumposition (CIR)
 
***circumposition (CIR)
 
***postposition (PPS)
 
***postposition (PPS)
 
***preposition (PRE)
 
***preposition (PRE)
**adverb (ADV)
+
**[[adverb]] (A)
 
***specifier adverb (SAV)
 
***specifier adverb (SAV)
 
***adjunct adverb (AAV)
 
***adjunct adverb (AAV)
 
***conjunct (CJT)
 
***conjunct (CJT)
 
***disjunct (DJT)
 
***disjunct (DJT)
**affix (AFF)
+
**[[affix]] (F)
 
***circumfix (CCX)
 
***circumfix (CCX)
 
***infix (IFX)
 
***infix (IFX)
 
***prefix (PFX)
 
***prefix (PFX)
 
***suffix (SFX)
 
***suffix (SFX)
**classifier (CLA)
+
**[[conjunction]] (C)
**conjunction (CJC)
+
 
***coordinating conjunction (COO)
 
***coordinating conjunction (COO)
 
****correlative conjunction (CRC)
 
****correlative conjunction (CRC)
Line 191: Line 254:
 
****complementizer (CMR)
 
****complementizer (CMR)
 
****relativizer (RVZ)
 
****relativizer (RVZ)
**determiner (DET)
+
**[[determiner]] (D)
 
***article (ART)
 
***article (ART)
 
***demonstrative determiner (DEM)
 
***demonstrative determiner (DEM)
 
***possessive determiner (POD)
 
***possessive determiner (POD)
 
***quantifier (QUA)
 
***quantifier (QUA)
**interjection (ITJ)
+
**inflection (I)
**noun (NOU)
+
***auxiliary verb (AUX)
 +
****modal verb (MOV)
 +
**[[noun]] (N)
 +
***common noun (NOU)
 
***proper noun (PPN)
 
***proper noun (PPN)
**numeral (NMR)
+
**[[numeral]] (U)
 +
***DIGIT (digits)
 +
****DOZEN (used to deal with dozens)
 +
****HUNDRED (used to deal with hundreds)
 
***cardinal numeral (CDN)
 
***cardinal numeral (CDN)
 
***distributive numeral (DTN)
 
***distributive numeral (DTN)
Line 205: Line 274:
 
***multiplicative numeral (MLN)
 
***multiplicative numeral (MLN)
 
***ordinal numeral (ORD)
 
***ordinal numeral (ORD)
**particle (PTC)
+
**[[pronoun]] (R)
**pronoun (PRO)
+
 
***demonstrative pronoun (DEP)
 
***demonstrative pronoun (DEP)
 
***dummy pronoun (DUM)
 
***dummy pronoun (DUM)
Line 217: Line 285:
 
***reflexive pronoun (FPR)
 
***reflexive pronoun (FPR)
 
***relative pronoun (RPR)
 
***relative pronoun (RPR)
**verb (VER)
+
**[[verb]] (V)
***auxiliary verb (AUX)
+
***full verb (VER)
****modal verb (MOV)
+
 
***copula (COP)
 
***copula (COP)
***verbal (VBL)
+
**other (O)
****infinitive (INF)
+
***classifier (CLA)
****gerund (GER)
+
***interjection (ITJ)
****participle (PTP)
+
***particle (PTC)
****supine (SPN)
+
***punctuation (PUT)
****gerundive (GDV)
+
****blank (BLK)
 +
****<nowiki>' </nowiki>(APOSTROPHE)
 +
****<nowiki>- </nowiki>(HYPHEN)
 +
****<nowiki>! </nowiki>(EMARK)
 +
****<nowiki>" </nowiki>(QUOTE)
 +
****<nowiki># </nowiki>(HASH)
 +
****<nowiki>$ </nowiki>(DOLLAR)
 +
****<nowiki>% </nowiki>(PERCENTAGE)
 +
****<nowiki>& </nowiki>(AMPERSAND)
 +
****<nowiki>( </nowiki>(OPARENTHESIS)
 +
****<nowiki>) </nowiki>(CPARENTHESIS)
 +
****<nowiki>* </nowiki>(ASTERISK)
 +
****<nowiki>, </nowiki>(COMMA)
 +
****<nowiki>. </nowiki>(PERIOD)
 +
****<nowiki>/ </nowiki>(FSLASH)
 +
****<nowiki>: </nowiki>(COLON)
 +
****<nowiki>; </nowiki>(SEMICOLON)
 +
****<nowiki>? </nowiki>(QMARK)
 +
****<nowiki>[ </nowiki>(OSBRACKET)
 +
****<nowiki>\ </nowiki>(BSLASH)
 +
****<nowiki>] </nowiki>(CSBRACKET)
 +
****<nowiki>{ </nowiki>(OCBRACE)
 +
****<nowiki>} </nowiki>(CCBRACE)
 +
****<nowiki>€ </nowiki>(EURO)
 +
****<nowiki>+ </nowiki>(PLUS)
 +
****<nowiki>< </nowiki>(LTHAN)
 +
****<nowiki>= </nowiki>(EQUAL)
 +
****<nowiki>> </nowiki>(GTHAN)
 
*[[person]] (PER)
 
*[[person]] (PER)
**first person singular (1PS)
+
**impersonal (NPER)
**first person plural (1PP)
+
**first person (1PER)
**second person singular (2PS)
+
***first person singular (1PS)
**second person plural (2PP)
+
***first person plural (1PP)
**third person singular (3PS)
+
****123PP (me, you and others)
**third person plural (3PP)
+
****13PP (me and others)
 +
**second person (2PER)
 +
***second person singular (2PS)
 +
***second person plural (2PP)
 +
**third person (3PER)
 +
***third person singular (3PS)
 +
***third person plural (3PP)
 
*[[polarity]] (POL)
 
*[[polarity]] (POL)
 
**affirmative (AFM)
 
**affirmative (AFM)
Line 242: Line 342:
 
**dialect (DIA)
 
**dialect (DIA)
 
**jargon (JGN)
 
**jargon (JGN)
 +
**literary (LIT)
 +
**pejorative (PEJ)
 
**slang (SLG)
 
**slang (SLG)
*semantic typology (SEM)
+
**taboo (TAB)
**act or action (ACT)
+
**animal (ANL)
+
**artifact (ARF)
+
**attribute (ATT)
+
**body part (BON)
+
**body action (BOV)
+
**cognitive noun (CGN)
+
**cognitive verb (CGV)
+
**change (CHA)
+
**communication noun (CMN)
+
**communication verb (CMV)
+
**competition (CPT)
+
**creation (CRE)
+
**consumption (CSM)
+
**contact (CTC)
+
**emotion (EMO)
+
**feeling (FEE)
+
**food (FOO)
+
**group (GRO)
+
**location (LCT)
+
**motion (MOT)
+
**motive (MTV)
+
**natural event (NEV)
+
**natural object (NOB)
+
**perception (PCP)
+
**natural phenomena (PHE)
+
**plant (PLA)
+
**possession noun (PON)
+
**possession verb (POV)
+
**natural process (NAT)
+
**person (PRS)
+
**quantity (QTT)
+
**relation (REL)
+
**substance (SBS)
+
**shape (SHA)
+
**social (SOC)
+
**state (STA)
+
**stative (STT)
+
**time (TIM)
+
**weather (WEA)
+
 
*[[social deixis]] (SOD)
 
*[[social deixis]] (SOD)
 
**solidarity (SOL)
 
**solidarity (SOL)
***familiarity (FAM)
+
***familiar (FAM)
***intimate social deixis (ITM)
+
***intimate (ITM)
***politeness (PLN)
+
***polite (PLN)
 
**status (STS)
 
**status (STS)
 
***equivalent (EVL)
 
***equivalent (EVL)
***inferior status (IFS)
+
***inferior (IFS)
***reverential form (REV)
+
***reverential (REV)
***superior status (SPS)
+
***superior (SPS)
*[[syntax|syntactic roles]] (SYN)
+
*[[syntactic roles]] (SYN)
**adjunct (ADJT)
+
**adjunct (XA)
 
***adjunct to the head of an adjective phrase (JA)
 
***adjunct to the head of an adjective phrase (JA)
 
***adjunct to the head of an adverbial phrase (AA)
 
***adjunct to the head of an adverbial phrase (AA)
Line 304: Line 366:
 
***adjunct to the head of a prepositional phrase (PA)
 
***adjunct to the head of a prepositional phrase (PA)
 
***adjunct to the head of a verbal phrase (VA)
 
***adjunct to the head of a verbal phrase (VA)
**complement (COMP)
+
**complement (XC)
 
***complement of the head of an adjective phrase (JC)
 
***complement of the head of an adjective phrase (JC)
 
***complement of the head of an adverbial phrase (AC)
 
***complement of the head of an adverbial phrase (AC)
Line 313: Line 375:
 
***complement of the head of a prepositional phrase (PC)
 
***complement of the head of a prepositional phrase (PC)
 
***complement of the head of a verbal phrase (VC)
 
***complement of the head of a verbal phrase (VC)
**head (HEAD)
+
**head (XH)
 
***head of an adverbial phrase (AH)
 
***head of an adverbial phrase (AH)
 
***head of an adjective phrase (JH)
 
***head of an adjective phrase (JH)
Line 322: Line 384:
 
***head of a prepositional phrase (PH)
 
***head of a prepositional phrase (PH)
 
***head of a verbal phrase (VH)
 
***head of a verbal phrase (VH)
**specifier (SPEC)
+
**specifier (XS)
 
***specifier of the head of an adjective phrase (JS)
 
***specifier of the head of an adjective phrase (JS)
 
***specifier of the head of an adverbial phrase (AS)
 
***specifier of the head of an adverbial phrase (AS)
Line 349: Line 411:
 
***prepositional phrase (PB)
 
***prepositional phrase (PB)
 
***verbal phrase (VB)
 
***verbal phrase (VB)
 +
**trace (TRACE)
 
*[[tense]] (TNS)
 
*[[tense]] (TNS)
 
**absolute tense (ATE)
 
**absolute tense (ATE)
***present (PRS)
 
 
***past (PAS)
 
***past (PAS)
 +
***present (PRS)
 +
****preterit (PTR)
 
****hesternal past tense (HEP)
 
****hesternal past tense (HEP)
 
****prehesternal past tense (PEP)
 
****prehesternal past tense (PEP)
Line 376: Line 440:
 
***relative nonfuture (NRFT)
 
***relative nonfuture (NRFT)
 
*[[transitivity]] (TRA)
 
*[[transitivity]] (TRA)
**ditransitive (DTST)
+
**no transitivity (NTRA) (linking verb)
**indirect transitive (ITST)
+
**transitive (TST)
 +
***direct transitive (TSTD)
 +
***indirect transitive (TSTI)
 +
***ditransitive (TST2)
 +
***tritransitive (TST3)
 
**intransitive (NTST)
 
**intransitive (NTST)
 
***unergative (NERG)
 
***unergative (NERG)
 
***unaccusative (NACC)
 
***unaccusative (NACC)
**direct transitive (TST)
+
*[[Universal Attribute]]s (att)
**tritransitive (TTST)
+
**animacy attributes (ANIA)
 +
**aspect attributes (ASPA)
 +
**degree attributes (DEGA)
 +
**emotion attributes (FEEL)
 +
**figure of speech attributes (FIGA)
 +
**gender attributes (GENA)
 +
**information structure attributes (ISTA)
 +
**lexical attributes (LEXA)
 +
**manner attributes (HOW)
 +
**modality attributes (MODA)
 +
**person attributes (PERA)
 +
**polarity attributes (POLA)
 +
**place attributes (WHERE)
 +
**quantification attributes (QUAA)
 +
**register attributes (REGA)
 +
**social deixis attributes (SODA)
 +
**specification attributes (WHICH)
 +
**syntactic structures (SYNA)
 +
**time attributes (WHEN)
 +
**voice attribute (VOIA)
 +
*[[Universal Relations]] (rel)
 +
*[[Universal Words]] (SEM)
 +
**Adjective concepts
 +
***age (AGE)
 +
***colour (COR)
 +
***dimension (DMS)
 +
***human propensity (HPP)
 +
***physical property (PHY)
 +
***speed (SPD)
 +
***value (VLE)
 +
***other adjectives (JJJ)
 +
**Adverbial concepts
 +
***degree (DGR)
 +
***manner (MAN)
 +
***place (PLE)
 +
***time (TME)
 +
***other adverbs (AAA)
 +
**Nominal concepts
 +
***act or action (ACT)
 +
***animal (ANL)
 +
***artifact (ARF) (man-made objects)
 +
***attribute (ATR) (of people and objects)
 +
***body part (BON)
 +
***cognitive processes and contents (CGN)
 +
***communicative processes and contents (CMN)
 +
***feelings and emotions (FEE)
 +
***foods and drinks (FOO)
 +
***groupings of people or objects (GRO)
 +
***location (LCT) (spatial position)
 +
***motive (MTV) (goals)
 +
***natural events (NEV)
 +
***natural objects (NOB) (non man-made objects)
 +
***natural phenomena (PHE)
 +
***plant (PLA)
 +
***possession or transfer of possession (PON)
 +
***natural process (NAT)
 +
***person (HUM)
 +
***quantities and units of measure (QTT)
 +
***relations between people or things or ideas (REL)
 +
***substance (SBS)
 +
***shape (SHA) (two or three-dimensional shapes)
 +
***state (STA) (stable states of affairs)
 +
***time and temporal relations (TIM)
 +
**Verbal concepts
 +
***body action (BOV)
 +
***cognitive verb (CGV)
 +
***change (CHA)
 +
***communication verb (CMV)
 +
***competition (CPT)
 +
***creation (CRE)
 +
***consumption (CSM)
 +
***contact (CTC)
 +
***emotion (EMO)
 +
***motion (MOT)
 +
***perception (PCP)
 +
***possession verb (POV)
 +
***social (SOC)
 +
***stative (STT)
 +
***weather (WEA)
 
*[[valency]] (VAL)
 
*[[valency]] (VAL)
 
**avalent (VAL0)
 
**avalent (VAL0)
Line 393: Line 539:
 
**middle voice (MIV)
 
**middle voice (MIV)
 
**passive voice (PSV)
 
**passive voice (PSV)
 +
*other
 +
**System-defined values
 +
***CHEAD (beginning of a scope)
 +
***CTAIL (end of a scope)
 +
***DIGIT (digits)
 +
***SCOPE (scope)
 +
***SHEAD (beginning of the sentence)
 +
***STAIL (end of the sentence)
 +
***TEMP (temporary entry - not found in the dictionary)
 +
**Grammar-related attributes
 +
***FLX (inflectional rules)
 +
***FRA (subcategorization frame)
 +
***GOV (subcategorization rules)
 +
***PAR (inflectional paradigm)
 +
***SFR (semantic frame)
 
}}
 
}}

Revision as of 18:55, 19 November 2014

The set of features in a UNL-driven dictionary depends on the structure of the natural language and may vary considerably. However, to better standardize lexical resources within the UNL framework, the UNDL Foundation recommends adopting the following tags for some specific and pervasive grammatical phenomena. Several of these linguistic constants have already been proposed to the Data Category Registry (ISO 12620) and represent widely accepted linguistic concepts. Our main intention here is simply to provide a harmonized system to be shared by the UNL community, so that dictionaries are as easily understandable and exchangeable as possible.

When to use the UNDLF Tagset

The UNDLF Tagset is required for providing lexical resources (dictionary entries and grammar rules) in the UNLarium framework. Indeed, the whole environment has already been prepared to accept only the tags presented here. In most cases, the use of tags is unnoticeable and effortless, since users are expected to make higher-level choices ("adjective", for instance), which are internally represented by the corresponding authorized labels ("ADJ"). However, in several circumstances, such as when creating inflectional paradigms or subcategorization frames, users are expected to address more fine-grained linguistic phenomena that may require a specialized metalanguage. That is exactly the purpose of this tagset: to provide the technical means for describing any linguistic behaviour, and to do so in a strongly standardized way, so that others can easily understand and exploit the data for their own benefit.

General Guidelines

In order to define the tags to be used in the UNDLF Tagset, the following premises were adopted:

  • Tags should be as comprehensive as possible (i.e., they should cover all widely accepted linguistic concepts)
  • Tags should be as few as possible (i.e., they should avoid redundancy)
  • Tags should be as short as possible (i.e., they should fit in a three-character string)
  • Tags should be as mnemonic as possible (i.e., they should be provided through English acronyms or abbreviations)
  • Tags should constitute a taxonomic hierarchy (so that upper-level values can be inferred from the lower ones).

Additionally, the following conventions were adopted:

  • Tags are written in upper case letters;
  • Negation is represented by prefixation with "N-" (past = PAS, nonpast = NPAS).
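
The two conventions above can be illustrated with a minimal Python sketch. The helper names `is_well_formed` and `negate` are hypothetical and not part of any UNL tool, and the pattern assumes that tags consist only of upper-case letters and digits (as in PAS, VAL0 or 1PS).

```python
import re

# Assumption: a tag is a non-empty run of upper-case letters and digits.
TAG_PATTERN = re.compile(r"[A-Z0-9]+")


def is_well_formed(tag: str) -> bool:
    """Check the upper-case convention for a single tag."""
    return TAG_PATTERN.fullmatch(tag) is not None


def negate(tag: str) -> str:
    """Build the negated tag by prefixing "N" (past = PAS, nonpast = NPAS)."""
    if not is_well_formed(tag):
        raise ValueError(f"not a well-formed tag: {tag!r}")
    return "N" + tag


print(negate("PAS"))  # NPAS
print(negate("PFV"))  # NPFV (perfective -> imperfective)
```

The rule only works in one direction: a leading "N" does not always mark negation (NOU and NUM are ordinary tags), so a negated tag cannot be recognized without consulting the tagset itself.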

We have tried to follow the standard abbreviations proposed by the Leipzig Glossing Rules and by David Crystal in A Dictionary of Linguistics and Phonetics (2008), insofar as they comply with the rules above. The resulting set of tags, which is still subject to additions and revisions, is presented below. For the time being, the definitions and examples have been taken from the Glossary of Linguistic Terms (Loos et al.), available at SIL International. The tags are expected to migrate to an online environment, still under construction, where accredited linguists will have the opportunity to enhance and improve this repertoire.

Tree of attributes and values

The hierarchy of tags is depicted in the tree below. The topmost level represents the attributes of which the tags are values. Each lower position implies the levels above it (for instance: progressive is a value of continuative, which is a value of imperfective, which is a value of the attribute aspect), but lower positions are not mandatory, as they can be too specialized ("go" is just a verb, and not any of the subcategories of verb). In any case, natural language phenomena should be classified as deeply as possible in the tagset structure ("un-" should be classified as a prefix, rather than as an affix).
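
As a rough sketch of how upper levels could be inferred from a deep classification, the Python snippet below parses a small fragment of the tree (copied from the part-of-speech branch) and walks upwards from a tag. The function names `parse_tree` and `ancestors` are hypothetical, and the parsing assumes that depth is given by the number of leading asterisks and that the tag is the final parenthesized token on each line.

```python
import re

# Fragment of the tree; depth = number of leading asterisks.
FRAGMENT = """\
*[[part of speech]] (POS)
**[[affix]] (F)
***prefix (PFX)
***suffix (SFX)
**[[verb]] (V)
***full verb (VER)
"""

# Asterisks, free-text label, then the tag in parentheses at the end.
LINE = re.compile(r"^(\*+)(.*)\((\w+)\)\s*$")


def parse_tree(text):
    """Return a dict mapping each tag to its parent tag (None at the top)."""
    parent = {}
    stack = []  # stack[d - 1] is the most recent tag seen at depth d
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue
        depth, tag = len(match.group(1)), match.group(3)
        del stack[depth - 1:]  # drop entries at this depth or deeper
        parent[tag] = stack[-1] if stack else None
        stack.append(tag)
    return parent


def ancestors(tag, parent):
    """List the tags implied by `tag`, from nearest to topmost."""
    chain = []
    while parent.get(tag) is not None:
        tag = parent[tag]
        chain.append(tag)
    return chain


parent = parse_tree(FRAGMENT)
print(ancestors("PFX", parent))  # ['F', 'POS']
```

Classifying "un-" as PFX therefore also yields F (affix) and POS. Note that a dictionary keyed by tag alone only suits a fragment like this one: in the full tree some tags occur under more than one attribute (CAU and DIGIT, for example), so a complete implementation would need to key on the full path.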


List of tags in alphabetical order

Software