III UNL Olympiad

From UNL Wiki
Revision as of 19:11, 5 December 2013 by Martins (Talk | contribs)

The UNL Olympiad is a series of competitions organised by the UNDL Foundation in order to foster the development of UNL-driven resources (dictionaries, grammars and corpora). The third edition of the Olympiad is devoted to the development of grammars and dictionaries for UNLizing (with IAN) and NLizing (with EUGENE) the corpus derived from the project UGO-A1.

Important dates

Preparatory phases

  • UGO-A1: until Jan 15, 2014
  • Open discussion: from Jan 16 to Jan 30, 2014
  • Official release of the corpus: Feb 15th, 2014

Competition

  • Deadline for the grammars and dictionaries: Mar 30, 2014
  • Results: April 15, 2014

Preparatory Phases

Participation in preparatory phases is open to all candidates and is not compulsory, provided that at least one user of the working language completes the project UGO-A1.

UGO-A1

The first phase of the III UNL Olympiad is the project UGO-A1, which is open and funded for all languages. This project will set the corpus and the reference for UNLization and NLization. It consists of 250 UNL graphs that must be NLized (i.e., manually generated into natural language). This must be done only once for each language, i.e., it is not necessary (nor possible) for every user to address all 250 UNL graphs. The progress report for the project UGO-A1, including the number of available entries, can be found at UNLWEB>PROJECTS>UGO-A1>PROGRESS REPORT.

Open Discussion

The second phase of the III UNL Olympiad will consist of an open discussion of the results of the project UGO-A1. All users will be able to propose the inclusion, suppression or modification of the NLizations proposed for the UNL graphs, in order to avoid any biases.

Official Release

The official corpus and set of languages, resulting from the preparatory phases, will be released on Feb 15th, 2014.

Competition

Goals

The III UNL Olympiad has two main goals:

  • To prepare the dictionaries and grammars for UNLizing, with IAN, the corpus UGO-A1; and/or
  • To prepare the dictionaries and grammars for NLizing, with EUGENE, the corpus UGO-A1.

UNLization is the process of representing, in UNL, the information conveyed by a natural language document. NLization, conversely, is the process of representing, in natural language, the information conveyed by a UNL document. IAN and EUGENE are the engines used for automatic UNLization and NLization, respectively. They are available at the UNLdev.

Modalities

The competition is organised in two modalities:

  • Best UNLization Grammar for <LANGUAGE>
  • Best NLization Grammar for <LANGUAGE>

Where <LANGUAGE> is one of the languages participating in this Olympiad (see the complete list below).
Candidates may participate in one or two modalities, i.e., they may work with the UNLization grammar, with the NLization grammar, or with both.
Candidates may also participate in one or more languages, provided that they belong to the list below.

Prizes

Prizes are awarded to the best grammars of each modality (UNLization and NLization) for each language[1]:

  • 1st place: Gold Medal
  • 2nd place: Silver Medal
  • 3rd place: Bronze Medal

Additionally, the 10 best UNLization grammars overall (i.e., among all languages) and the 10 best NLization grammars overall will each be awarded USD 500.00.

Corpus

The corpus will be extracted from the results of the project UGO-A1 and will be officially released on Feb 15th, 2014.

Instructions

Manuals, instructions, examples and samples of grammars for the corpus UCA1 may be found at UCA1.

Requisites

The competition is free and open to any participant, but it is limited to the set of languages described below.
The files must comply with the following requisites:

  • The corpus must comply with the translation standards of the target language and should not be artificially translated in order to provoke better results.
  • The input corpus used for UNLization will also serve as the reference corpus for evaluating the NLization output.
  • The dictionary files must comply with the Dictionary Specs and may only use features present in the Tagset. They should not contain temporary words.
  • The grammar files must comply with the Grammar Specs and must be as generic as possible. They should not target the specific sentences of the corpus, but the general structures presented there.
  • The F-measure of the grammars must be equal to or greater than 0.9[2].
  • The files must be original. Grammars that prove to be similar beyond any reasonable doubt will be discarded, unless they are provided by the same author (for different languages).
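
As a rough illustration of the 0.9 threshold, the sketch below assumes the standard balanced F-measure, i.e., the harmonic mean of precision and recall. The exact computation performed by the UNLWEB tool is not described on this page, so the function names and sample values here are hypothetical.

```python
THRESHOLD = 0.9  # minimum F-measure stated in the requisites


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall.

    Assumption: this is the textbook definition, not necessarily the
    formula implemented by the UNLWEB F-measure tool.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


def meets_requisite(precision: float, recall: float) -> bool:
    """Check whether a grammar would reach the required F-measure."""
    return f_measure(precision, recall) >= THRESHOLD


if __name__ == "__main__":
    # Hypothetical scores for two grammars.
    print(meets_requisite(0.95, 0.90))
    print(meets_requisite(0.90, 0.80))
```

Because the harmonic mean is pulled towards the lower of the two scores, a grammar with high precision but weak recall (or vice versa) can still fall below the threshold.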

Evaluation

Grammars will be evaluated and ranked according to the following criteria:

  • Best F-measure
  • Scalability (i.e., extensibility, or the capacity to be reused for other corpora), in case of grammars with the same F-measure
  • Date of submission, in case of grammars with the same F-Measure and equally scalable
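
The three criteria above amount to a lexicographic ordering, which might be sketched as follows. The `Submission` record, the numeric scalability score, and the rule that an earlier submission wins a tie are all assumptions made for illustration; the page does not say how scalability is measured or which submission date is preferred.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Submission:
    author: str
    f_measure: float
    scalability: float  # hypothetical score; higher means more reusable
    submitted: date


def rank(submissions):
    """Order by F-measure, then scalability, then submission date.

    Assumption: higher F-measure and scalability rank first, and an
    earlier submission date breaks any remaining tie.
    """
    return sorted(
        submissions,
        key=lambda s: (-s.f_measure, -s.scalability, s.submitted),
    )


if __name__ == "__main__":
    entries = [
        Submission("A", 0.95, 0.5, date(2014, 3, 20)),
        Submission("B", 0.97, 0.2, date(2014, 3, 29)),
        Submission("C", 0.95, 0.5, date(2014, 3, 10)),
        Submission("D", 0.95, 0.8, date(2014, 3, 30)),
    ]
    for place, entry in enumerate(rank(entries), start=1):
        print(place, entry.author)
```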

Notes

  1. This means that up to 6 prizes will be awarded for each language: Best UNLization Grammar, Second Best UNLization Grammar, Third Best UNLization Grammar, Best NLization Grammar, Second Best NLization Grammar and Third Best NLization Grammar.
  2. The F-measure may be calculated at UNLWEB>UNLARIUM>GRAMMAR>[LOCALE]>F-MEASURE.