What is it

The UNLarium is an integrated development environment for producing language resources for natural language processing (NLP). It is mainly a web-based database management system in which registered users can create, edit and export dictionary entries and grammar rules according to the UNDL Foundation standards for language engineering.

Although originally conceived inside the UNL framework, the UNLarium is designed not to require any deep knowledge of UNL, and its data may be used in several NLP systems in addition to UNL-based applications. Furthermore, the system is meant to serve as a research workplace for exchanging information and testing several linguistic constants that have been proposed for describing and predicting natural language phenomena. One of our main goals is to work out a language-independent metalanguage that would be as comprehensive, as harmonized and as confluent as required by multilingual processing.

How to participate

The UNLarium is an open and free collaborative environment. It intends to be as linguist-friendly as possible, and targets language specialists rather than computer experts. The system does not require in-depth knowledge of UNL or of Computational Linguistics. Nevertheless, it requires some acquaintance with linguistic terminology and with semantic and syntactic formalisms, as well as very good knowledge of the working language. For the time being, it also requires knowledge of English, which is the language of the interface and of all the documentation.

In order to join the project, users have to be approved in VALERIE, the Virtual Learning Environment for UNL. Non-accredited users have access to several facilities of the UNLarium, but are not allowed to add entries or rules.



Once in the system, users are assigned an account and start receiving UNLdots, a unit of time and complexity that measures the effort spent on UNLarium-related tasks. UNLdots are used not only to calculate how long it takes to create dictionary entries and grammar rules, but also to evaluate the expertise of a given contributor.

For the time being, there are seven different user levels:

  • A0, up to 5,000 UNLdots
  • A1, from 5,001 to 15,000 UNLdots
  • A2, from 15,001 to 30,000 UNLdots
  • B1, from 30,001 to 50,000 UNLdots
  • B2, from 50,001 to 75,000 UNLdots
  • C1, from 75,001 to 100,000 UNLdots
  • C2, above 100,000 UNLdots
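
The level thresholds listed above amount to a simple lookup from an accumulated UNLdots total to a level name. The following Python sketch illustrates that mapping only; it is not the UNLarium's actual implementation. The names `LEVEL_UPPER_BOUNDS`, `LEVEL_NAMES` and `level_for` are hypothetical, and the sketch assumes that C2 begins immediately after C1's upper bound of 100,000 UNLdots.

```python
from bisect import bisect_left

# Inclusive upper bound of each level from A0 to C1, as listed above.
# C2 has no upper bound (assumption: it starts at 100,001 UNLdots).
LEVEL_UPPER_BOUNDS = [5_000, 15_000, 30_000, 50_000, 75_000, 100_000]
LEVEL_NAMES = ["A0", "A1", "A2", "B1", "B2", "C1", "C2"]


def level_for(unldots: int) -> str:
    """Return the user level for an accumulated UNLdots total (hypothetical helper)."""
    if unldots < 0:
        raise ValueError("UNLdots cannot be negative")
    # bisect_left finds the first bound that is >= unldots, so a total equal
    # to a bound (e.g. 5,000) still belongs to the lower level.
    return LEVEL_NAMES[bisect_left(LEVEL_UPPER_BOUNDS, unldots)]


if __name__ == "__main__":
    for total in (0, 5_000, 5_001, 42_000, 100_000, 250_000):
        print(total, level_for(total))
```

Keeping the bounds in a sorted list means that adding or moving a threshold only requires editing the two lists, not the lookup logic.
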

The UNLarium is a lexicon-based environment. Everything starts from the dictionary, and grammar rules are expected to be provided on demand. In the UNLarium, users can create and store their own dictionaries and grammars, and export the data in several different formats. Users are also allowed to search, browse and export corpora in UNL, and to download dictionaries and grammars that have been provided by other users and in other languages.

Once you enter the system, you will have four different possibilities:

  • Dictionary, for adding and editing dictionary entries;
  • Grammar, for adding and editing inflectional paradigms, subcategorization frames, and other grammar rules;
  • Corpus, for exploring the UNL corpus; and
  • Tools, for exporting and importing language resources, and generating metrics concerning natural language engineering.

Permissions and Workflow

Contributions & Remuneration

Users are assigned a profile, which is defined according to several characteristics, including level, expertise, institutional status and academic records. They can be promoted or demoted at any time depending on their participation in the project. The initial (default) level is Observer. In order to be promoted to the author level, users have to be approved in VALERIE, the Virtual Learning Environment for UNL.

Permissions are related mainly to the scope of actions, as follows:
  • Observers are allowed to browse dictionaries and grammars and navigate the system, but cannot add entries or grammar rules;
  • Trainees are allowed to add entries, but only under supervision;
  • Authors (A1 required) are allowed to add entries, but may edit only their own data;
  • Editors (B1 required) may also edit authors' data, but cannot edit other editors' data;
  • Revisers (C1 required) may edit editors' data, but cannot edit other revisers' data;
  • Managers may edit any data, create projects and delete entries; and
  • Supermanagers may edit the source code of the system.
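
The permission scopes listed above can be read as an ordered hierarchy of roles. The Python sketch below is one possible reading of that list, not the UNLarium's actual access-control code; `Role`, `MIN_LEVEL`, `meets_level_requirement` and `can_edit` are hypothetical names. It rests on several assumptions that the list does not state explicitly: supervision of trainees is not modelled (trainees cannot edit in this sketch), editors and revisers may edit the data of any strictly lower role, and supermanagers inherit the managers' editing rights.

```python
from enum import IntEnum


class Role(IntEnum):
    """Roles in ascending order of privilege, following the list above."""
    OBSERVER = 0
    TRAINEE = 1
    AUTHOR = 2
    EDITOR = 3
    REVISER = 4
    MANAGER = 5
    SUPERMANAGER = 6


# User levels in ascending order, and the minimum level stated for each role.
LEVELS = ["A0", "A1", "A2", "B1", "B2", "C1", "C2"]
MIN_LEVEL = {Role.AUTHOR: "A1", Role.EDITOR: "B1", Role.REVISER: "C1"}


def meets_level_requirement(role: Role, level: str) -> bool:
    """True if `level` satisfies the minimum level stated for `role`, if any."""
    required = MIN_LEVEL.get(role)
    if required is None:
        return True  # no level requirement is stated for this role
    return LEVELS.index(level) >= LEVELS.index(required)


def can_edit(actor: Role, owner: Role, own_data: bool) -> bool:
    """Decide whether `actor` may edit data created by a user with role `owner`."""
    if actor >= Role.MANAGER:
        return True   # managers may edit any data (assumed for supermanagers too)
    if actor < Role.AUTHOR:
        return False  # observers cannot edit; trainee supervision is not modelled
    if own_data:
        return True   # authors, editors and revisers may edit their own data
    # Assumption: editors and revisers may edit data of strictly lower roles only.
    return actor >= Role.EDITOR and owner < actor


if __name__ == "__main__":
    print(can_edit(Role.EDITOR, Role.AUTHOR, own_data=False))
    print(can_edit(Role.EDITOR, Role.EDITOR, own_data=False))
    print(meets_level_requirement(Role.REVISER, "B2"))
```

Using an ordered enumeration keeps the "cannot edit peers" rule to a single comparison (`owner < actor`) instead of one special case per role.
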

In order to avoid problems, every entry or rule is double-checked inside the UNLarium: first by an editor, and then by a reviser. Permissions may be revoked depending on the user's track record. Authors can be demoted to Observers if the error rate of their entries exceeds 10%. Vandalism and non-compliance with our Terms and Conditions will also be punished.

The UNLarium is open to anyone interested in producing dictionaries and grammars for natural language processing. Lexicographers, grammarians, language specialists, students of Linguistics, translators and other language-related professionals are especially welcome. Institutions may also join the initiative, and may even propose or manage special projects.

For the time being, there are four different types of contributors:

  • Volunteers, i.e., those who participate in the project voluntarily;
  • Freelancers, i.e., accredited professionals who are paid for their work;
  • Partners, i.e., members of affiliate institutions; and
  • Employees of the UNDL Foundation.

Freelancers are remunerated according to their level and to the number of UNLdots accumulated in a given period of time. For the time being, freelancer assignments are restricted to some specific languages and to some specific projects, depending on the UNDL Foundation's needs and funding. Only accredited professionals (i.e., those approved in VALERIE) are admitted as freelancers.

Hosted Projects

The UNLarium is a corpus-driven environment. Users always have to choose a working corpus (project) in order to address the corresponding entries and grammatical phenomena, which are extracted automatically from the UNL-ized version of the source document. The main goal is to provide the resources needed to generate the corpus back from UNL into a natural language, which serves as a validation strategy for the data inserted in the database. Any institution or individual may propose new projects, provided that they comply with the UNL Specs.

About UNL

The Universal Networking Language (UNL) is a knowledge representation language that has been used for several different tasks in natural language engineering, such as machine translation, multilingual document generation, summarization, information retrieval and semantic reasoning. It was originally proposed by the Institute of Advanced Studies of the United Nations University, in Tokyo, and is currently promoted by the UNDL Foundation, in Geneva, Switzerland, under a mandate of the United Nations.



As the result of a collaborative project, the data stored in the UNLarium is available under an Attribution-ShareAlike (CC-BY-SA) Creative Commons license, which means that you may use the resources as you wish, provided that you credit the authors and release any derivative work under the same or a similar license.

Quick hints are available throughout the interface, and more complete documentation can be found at the UNLwiki. We have also created the UNLforum, a place to discuss linguistic issues related to the dictionary and grammar structure.