IV UNL Olympiad

From UNL Wiki
Revision as of 12:14, 5 November 2014 by Martins (Talk | contribs)

The UNL Olympiad is a series of competitions organised by the UNDL Foundation in order to foster the development of UNL-driven resources (dictionaries, grammars and corpora). The fourth edition of the Olympiad is devoted to the development of grammars and dictionaries for UNLizing (with IAN) and NLizing (with EUGENE) the corpus derived from the project AESOP-A1.

Important dates

Preparatory phase

  • AESOP-A1: until 22 Sep 2014
  • Open discussion: 23-30 Sep 2014
  • Official release of the corpus: 7 Oct 2014

Competition

  • Deadline for uploading the grammars and dictionaries: 15 Nov 2014
  • First results: 23 Nov 2014
  • Open discussion of the results: 24-30 Nov 2014
  • Final results: 7 Dec 2014

Preparatory Phase (Concluded)

Participation in the preparatory phase is open to all candidates and is not compulsory, provided that at least one user of the working language completes the project AESOP-A1.

AESOP-A1

The first phase of the IV UNL Olympiad is the project AESOP-A1, which is open and funded for all languages. This project will set the corpus and the reference for UNLization and NLization. It consists of 13 UNL graphs that must be NLized (i.e., generated into natural language, manually). This process must be done only once for each language, i.e., it is not necessary (nor possible) that all users address the 13 UNL graphs. The progress report of the project AESOP-A1, with the number of available entries, can be found at UNLWEB>PROJECTS>AESOP-A1>PROGRESS REPORT.

Open Discussion

The second phase of the IV UNL Olympiad will consist of an open discussion of the results of the project AESOP-A1. All users will be able to propose the inclusion, suppression or modification of the NLizations proposed for the UNL graphs, in order to avoid any biases and privileges for specific users.

Official Release

The official corpus and set of languages, resulting from the preparatory phases, were released on 7 Oct 2014.

Competition

Goals

The IV UNL Olympiad has two main goals:

  • To prepare the dictionaries and grammars for UNLizing, with IAN, the corpus AESOP-A1; and/or
  • To prepare the dictionaries and grammars for NLizing, with EUGENE, the corpus AESOP-A1.

UNLization is the process of representing, into UNL, the information conveyed by a natural language document. NLization, conversely, is the process of representing, in natural language, the information conveyed by a UNL document. These are done, respectively, with IAN and EUGENE, which are engines available at the UNLdev.

Modalities

The competition is organised in two modalities:

  • Best UNLization Grammar for <LANGUAGE>
  • Best NLization Grammar for <LANGUAGE>

Where <LANGUAGE> is one of the languages participating in this Olympiad (see the complete list below).
Candidates may participate in one or two modalities, i.e., they may work with the UNLization grammar, with the NLization grammar, or with both.
Candidates may also participate in one or more languages, provided that they belong to the official list.

Prizes

Prizes are awarded to the best grammars of each modality (UNLization and NLization) for each language[1]:

  • 1st place: Gold Medal
  • 2nd place: Silver Medal
  • 3rd place: Bronze Medal

Additionally, the 10 best UNLization grammars among all languages and the 10 best NLization grammars among all languages will be awarded USD500.00 each.[2]

Corpus

The official corpus has been available for download at UNLWEB>UNLARIUM>CORPUS>AESOP-A1>EXPORT since 7 Oct 2014.

Languages

The list of languages participating in the IV UNL Olympiad is the following:

  • Armenian
  • Baatonum
  • Bengali
  • Bosnian
  • Bulgarian
  • Czech
  • Estonian
  • French
  • Georgian
  • German
  • Greek (Ancient)
  • Greek (Modern)
  • Hindi
  • Khmer
  • Latin
  • Oriya
  • Panjabi
  • Persian
  • Portuguese
  • Russian
  • Sinhala
  • Tamil
  • Telugu
  • Ukrainian
  • Vietnamese

Instructions

  1. The competition is free and open to any participant, but it is limited to the set of languages listed above.
  2. The OFFICIAL CORPUS must be extracted from the UNLarium (at UNLWEB>UNLARIUM>CORPUS>AESOP-A1>EXPORT>WORKING LANGUAGE). The NL corpus is used in UNLization (with IAN); the UNL corpus is used in NLization (with EUGENE).
  3. Candidates must build their WORKING CORPUS out of the official corpus by selecting one NL sentence for each UNL graph. Note that, in the official corpus, the same UNL graph may have several candidate NL sentences in the same language. Candidates must select only one (per graph) to work with. This means that the UNLization will involve 13 NL sentences (i.e., candidates are expected to provide the dictionaries and grammars to map 13 sentences from their working language to UNL), and the NLization will involve 13 UNL graphs (i.e., candidates are expected to provide the dictionaries and grammars to map 13 UNL graphs to their working language).
  4. The goal of the UNLization modality is to UNLize ALL sentences from the NL WORKING CORPUS (not from the official corpus); the goal of the NLization modality is to NLize ALL graphs from the UNL WORKING CORPUS.[3]
  5. Absolutely no change can be made to any sentence from the official corpus (either in natural language or in UNL), i.e., candidates can only choose among the official sentences but cannot alter them, and must use them as they are.
  6. In order to apply, candidates must upload the grammar and dictionary files to www.unlweb.net/unlversity by the deadline (15 Nov 2014).
  7. The dictionary files must comply with the Dictionary Specs and may only use features present in the Tagset. They should not contain temporary words.
  8. The grammar files must comply with the Grammar Specs and must be as generic as possible. They should not target the specific sentences of the corpus, but the general structures presented there.
  9. The F-measure of the grammars must be equal to or greater than 0.9[4].
  10. The files must be original. Grammars that are similar to one another beyond any reasonable doubt will be discarded, unless they are provided by the same author (for different languages).
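
Instructions 3 and 5 can be illustrated with a minimal sketch of how a working corpus might be assembled: exactly one NL sentence is picked for each UNL graph, and the chosen sentence is copied verbatim. The data layout, the function name build_working_corpus, the graph identifiers and the sentences below are all hypothetical; the actual export format of the UNLarium is not described here.

```python
from typing import Dict, List, Optional


def build_working_corpus(
    official: Dict[str, List[str]],
    picks: Optional[Dict[str, int]] = None,
) -> Dict[str, str]:
    """Select exactly one NL sentence per UNL graph, without altering it.

    `official` maps a (hypothetical) graph identifier to its candidate
    NL sentences; `picks` maps a graph identifier to the index of the
    candidate to keep. Graphs without an explicit pick use candidate 0.
    """
    picks = picks or {}
    working: Dict[str, str] = {}
    for graph_id, candidates in official.items():
        if not candidates:
            raise ValueError(f"no candidate sentence for graph {graph_id}")
        index = picks.get(graph_id, 0)
        # The sentence is copied as is: no edits are allowed (instruction 5).
        working[graph_id] = candidates[index]
    return working


# Toy data: identifiers and sentences are invented placeholders.
official_corpus = {
    "graph-01": ["Sentence A1.", "Sentence A2."],
    "graph-02": ["Sentence B1."],
}
working_corpus = build_working_corpus(official_corpus, {"graph-01": 1})
print(working_corpus)
```

With the real corpus, the resulting dictionary would hold 13 entries, one per UNL graph.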

Evaluation

Grammars will be evaluated and ranked according to the following criteria:

  • Best F-measure
  • Scalability (i.e., extensibility, or the capacity to be reused for other corpora), in the case of grammars with the same F-measure
  • Date of submission, in the case of grammars with the same F-measure and equally scalable
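
The three criteria above behave like a lexicographic sort key, as the following sketch shows. It rests on several assumptions that are not stated on this page: scalability is represented as a numeric score (higher is better), an earlier submission date wins a tie, and grammars below the 0.9 threshold from the Instructions are excluded before ranking. The Submission class and the sample entries are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Submission:
    author: str
    f_measure: float
    scalability: float  # hypothetical numeric score, higher is better
    submitted: date


def rank(submissions: List[Submission], threshold: float = 0.9) -> List[Submission]:
    """Order eligible submissions by F-measure, then scalability, then date."""
    eligible = [s for s in submissions if s.f_measure >= threshold]
    # Negate the first two keys so that larger values sort first;
    # the date is left ascending so that earlier submissions sort first.
    return sorted(
        eligible,
        key=lambda s: (-s.f_measure, -s.scalability, s.submitted),
    )


# Invented sample entries, for illustration only.
entries = [
    Submission("A", 0.95, 0.5, date(2014, 11, 10)),
    Submission("B", 1.00, 0.7, date(2014, 11, 14)),
    Submission("C", 1.00, 0.7, date(2014, 11, 12)),
    Submission("D", 0.80, 0.9, date(2014, 11, 1)),
]
ranking = [s.author for s in rank(entries)]
print(ranking)
```

Here C precedes B because both have the same F-measure and scalability but C was submitted earlier, and D is left out for falling below the threshold.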

Notes

  1. This means that for each language up to 6 prizes will be awarded: Best UNLization Grammar, Second Best UNLization Grammar, Third Best UNLization Grammar, Best NLization Grammar, Second Best NLization Grammar and Third Best NLization Grammar.
  2. The value of USD500.00 will be paid only to the 10 best UNLization or NLization grammars in general, and not to the 10 best UNLization/NLization grammars of each language.
  3. The goal of the Olympiad is to provide ONE POSSIBLE MAPPING for each structure, i.e., to map natural language sentences to at least one valid UNL graph, and to map the UNL graph into at least one valid NL sentence. This means that, if the natural language is ambiguous, and may be mapped into several different UNL graphs, the UNLization will be considered valid if the resulting UNL graph is one of the possible candidates according to the OFFICIAL CORPUS. Conversely, whenever the same UNL graph may be mapped into several different NL sentences, the NLization is considered valid if the resulting NL sentence is one of the possible mappings according to the OFFICIAL CORPUS.
  4. The F-measure may be calculated at UNLWEB>UNLARIUM>TOOLS>F-MEASURE.
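
For orientation only, the sketch below computes an F-measure as the usual harmonic mean of precision and recall, counting an output as correct when it matches any of the candidates accepted for that item (the "one possible mapping" rule of note 3). This is an assumption: the official tool may compare and score results differently, and the exact-match comparison, the function name and the toy data are hypothetical.

```python
from typing import Dict, Optional, Set


def f_measure(
    outputs: Dict[str, Optional[str]],
    references: Dict[str, Set[str]],
) -> float:
    """Harmonic mean of precision and recall over a small corpus.

    `outputs` maps an item identifier to the result produced by the grammar
    (None when nothing was returned); `references` maps the same identifier
    to the set of results accepted as valid for that item.
    """
    returned = sum(1 for out in outputs.values() if out is not None)
    correct = sum(
        1
        for item_id, out in outputs.items()
        if out is not None and out in references.get(item_id, set())
    )
    expected = len(references)
    if returned == 0 or expected == 0 or correct == 0:
        return 0.0
    precision = correct / returned
    recall = correct / expected
    return 2 * precision * recall / (precision + recall)


# Toy run over 13 items: 11 correct, 1 wrong, 1 not returned.
references = {f"s{i}": {f"ref-{i}", f"alt-{i}"} for i in range(1, 14)}
outputs: Dict[str, Optional[str]] = {f"s{i}": f"ref-{i}" for i in range(1, 12)}
outputs["s12"] = "wrong output"
outputs["s13"] = None
score = f_measure(outputs, references)
print(round(score, 2), score >= 0.9)  # prints: 0.88 False
```

Under these assumptions, a grammar that fails on 2 of the 13 items in this way would fall just short of the 0.9 threshold.
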
Software