Tagset

From UNL Wiki
Revision as of 14:33, 17 November 2009 by Admin

The set of features in a UNL-driven dictionary depends on the structure of each natural language and may vary considerably. However, to better standardize lexical resources within the UNL framework, the UNDL Foundation recommends adopting the following tags for some specific and pervasive grammatical phenomena. Several of these linguistic constants have already been proposed to the Data Category Registry (ISO 12620) and represent widely accepted linguistic concepts. Our main intention is simply to provide a harmonized system, shared by the UNL community, that makes dictionaries as easy to understand and exchange as possible.

General Guidelines

In order to define the tags to be used in the UNL Tagset, the following premises were adopted:

  • Tags should be as few as possible
  • Tags should be as short as possible
  • Tags should be as mnemonic as possible

These assumptions led us to the following general guidelines:

  • Tags should consist of a three-character upper-case string
  • Tags should be derived from English words
  • Tags should be provided in an attribute-value structure, along with definitions and examples.
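Because these guidelines are formal constraints, they can be checked mechanically when a dictionary is loaded. The Python sketch below is illustrative only. It assumes that a tag is exactly three ASCII letters from A to Z, that both the attribute and the value of a feature are tags, and that features are written as ATTRIBUTE=VALUE pairs separated by commas. The names `is_valid_tag`, `TagDefinition` and `parse_features`, as well as the sample tags `NUM`, `PLR`, `GEN` and `FEM`, are hypothetical and are not taken from the official tagset.

```python
import re
from dataclasses import dataclass

# Assumption: a tag is exactly three upper-case ASCII letters (A to Z).
TAG_PATTERN = re.compile(r"[A-Z]{3}")


def is_valid_tag(tag: str) -> bool:
    """Return True if `tag` is a three-character upper-case string."""
    return TAG_PATTERN.fullmatch(tag) is not None


@dataclass(frozen=True)
class TagDefinition:
    """Hypothetical record pairing an attribute-value tag with its documentation."""
    attribute: str                # e.g. "NUM" (illustrative, not an official tag)
    value: str                    # e.g. "PLR" (illustrative, not an official tag)
    definition: str
    examples: tuple[str, ...] = ()

    def __post_init__(self) -> None:
        # Reject records whose attribute or value breaks the three-letter rule.
        for tag in (self.attribute, self.value):
            if not is_valid_tag(tag):
                raise ValueError(f"invalid tag: {tag!r}")


def parse_features(text: str) -> dict[str, str]:
    """Parse a hypothetical 'ATT=VAL, ATT=VAL' string into an attribute-value dict."""
    features: dict[str, str] = {}
    for pair in text.split(","):
        attribute, sep, value = pair.strip().partition("=")
        if sep != "=" or not (is_valid_tag(attribute) and is_valid_tag(value)):
            raise ValueError(f"malformed feature: {pair!r}")
        features[attribute] = value
    return features


if __name__ == "__main__":
    entry = TagDefinition("NUM", "PLR", "plural number (illustrative)", ("books",))
    print(entry)
    print(parse_features("NUM=PLR, GEN=FEM"))
```

Keeping attributes and values in the same three-letter format lets a single check cover both, which matches the "as few, as short, as mnemonic as possible" premises; a real dictionary format may of course use a different separator or allow other value types.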

The resulting set of tags, which is still subject to additions and revisions, is presented below. For the time being, the definitions and examples have been taken from the Glossary of Linguistic Terms (Loos et al.), available from SIL International. The tags are expected to migrate to an online environment, still under construction, where accredited linguists will be able to improve this repertoire.

List of tags

Software