III UNL Olympiad

The UNL Olympiad is a series of competitions organised by the UNDL Foundation in order to foster the development of UNL-driven resources (dictionaries, grammars and corpora). The third edition of the Olympiad is devoted to the development of grammars and dictionaries for UNLizing (with IAN) and NLizing (with EUGENE) the corpus derived from the project UGO-A1.

Important dates

Preparatory phase

  • UGO-A1: until Jan 15, 2014
  • Open discussion: from Jan 16 to Jan 30, 2014
  • Official release of the corpus: Feb 15, 2014

Competition

  • Deadline for uploading the grammars and dictionaries: Mar 30, 2014
  • First results: Apr 15, 2014
  • Open discussion of the results: Apr 16-22, 2014
  • Final results: Apr 30, 2014

Preparatory Phase (Concluded)

Participation in the preparatory phase is open to all candidates and is not compulsory, provided that at least one user of the working language completes the project UGO-A1.

UGO-A1

The first phase of the III UNL Olympiad is the project UGO-A1, which is open and funded for all languages. This project will set the corpus and the reference for UNLization and NLization. It consists of 250 UNL graphs that must be NLized (i.e., manually generated into natural language). This process must be done only once for each language, i.e., it is not necessary (nor possible) for every user to address all 250 UNL graphs. The progress report for UGO-A1, including the number of available entries, can be found at UNLWEB>PROJECTS>UGO-A1>PROGRESS REPORT.

Open Discussion

The second phase of the III UNL Olympiad will consist of an open discussion of the results of the project UGO-A1. All users will be able to propose the inclusion, suppression or modification of the NLizations proposed for the UNL graphs, in order to avoid any bias towards, or privilege for, specific users.

Official Release

The official corpus and set of languages, resulting from the preparatory phases, were released on Feb 17th, 2014.

Competition (Concluded)

Goals

The III UNL Olympiad has two main goals:

  • To prepare the dictionaries and grammars for UNLizing, with IAN, the corpus UGO-A1; and/or
  • To prepare the dictionaries and grammars for NLizing, with EUGENE, the corpus UGO-A1.

UNLization is the process of representing, into UNL, the information conveyed by a natural language document. NLization, conversely, is the process of representing, in natural language, the information conveyed by a UNL document. These are done, respectively, with IAN and EUGENE, which are engines available at the UNLdev.

Modalities

The competition is organised in two modalities:

  • Best UNLization Grammar for <LANGUAGE>
  • Best NLization Grammar for <LANGUAGE>

Where <LANGUAGE> is one of the languages participating in this Olympiad (see the complete list below).
Candidates may participate in one or two modalities, i.e., they may work with the UNLization grammar, with the NLization grammar, or with both.
Candidates may also participate in one or more languages, provided that they belong to the official list.

Prizes

Prizes are awarded to the best grammars of each modality (UNLization and NLization) for each language[1]:

  • 1st place: Gold Medal
  • 2nd place: Silver Medal
  • 3rd place: Bronze Medal

Additionally, the 10 best UNLization grammars among all languages and the 10 best NLization grammars among all languages will be awarded USD500.00 each.[2]

Corpus

The official corpus is available for download at UNLWEB>UNLARIUM>CORPUS>UGO-A1>EXPORT.

Languages

The languages participating in the III UNL Olympiad are the following[3]:

  • Afrikaans
  • Bulgarian
  • Chinese
  • Estonian
  • Georgian
  • Greek (Modern)
  • Hindi
  • Kannada
  • Khmer
  • Malay
  • Nepali
  • Panjabi
  • Persian
  • Portuguese
  • Russian
  • Sinhala
  • Slovenian
  • Telugu
  • Ukrainian
  • Vietnamese

Instructions

  1. The competition is free and open to any participant, but it is limited to the set of languages listed above.
  2. In order to apply, candidates must upload the grammar and dictionary files to www.unlweb.net/unlversity by the deadline (i.e., March 30, 2014).
  3. The corpus must be extracted from the UNLarium (at UNLWEB>UNLARIUM>CORPUS>UGO-A1>EXPORT) and may not undergo any change. The goal of the UNLization modality is to UNLize ALL sentences from the NL corpus; the goal of the NLization modality is to NLize ALL graphs from the UNL corpus.[4]
  4. The dictionary files must comply with the Dictionary Specs and may only use features present in the Tagset. They should not contain temporary words.
  5. The grammar files must comply with the Grammar Specs and must be as generic as possible. They should not target the specific sentences of the corpus, but the general structures presented there.
  6. The F-measure of the grammars must be equal to or greater than 0.9[5].
  7. The files must be original. Grammars that are similar to one another beyond any reasonable doubt will be discarded, unless they were submitted by the same author (for different languages).
  8. Manuals, instructions, examples and samples of grammars may be found at UCA1.[6]
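
The threshold in instruction 6 can be illustrated with a short sketch. The exact computation performed by the F-measure tool in the UNLarium is not described on this page, so the code below assumes the conventional definition (the harmonic mean of precision and recall); the function names `f_measure` and `meets_threshold` are hypothetical.

```python
def f_measure(precision: float, recall: float) -> float:
    """Conventional F-measure (F1): harmonic mean of precision and recall.

    Assumption: the Olympiad tool may compute its score differently.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both values are 0
    return 2 * precision * recall / (precision + recall)


def meets_threshold(score: float, threshold: float = 0.9) -> bool:
    """Instruction 6: the score must be equal to or greater than 0.9."""
    return score >= threshold


# Hypothetical precision/recall values, for illustration only.
score = f_measure(precision=0.95, recall=0.90)
print(round(score, 3), meets_threshold(score))
```

Under this definition a grammar cannot reach 0.9 by excelling on only one of the two measures, since the harmonic mean is pulled towards the lower value.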

Evaluation

Grammars will be evaluated and ranked according to the following criteria:

  • Best F-measure
  • Scalability (i.e., extendibility, or the capacity to be reused with other corpora), in case of grammars with the same F-Measure
  • Date of submission, in case of grammars with the same F-Measure and equally scalable
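
The three criteria above amount to a lexicographic sort: F-measure first, then scalability, then submission date. The following is a minimal sketch of that ordering, not the organisers' actual procedure. The `Submission` class and `rank` function are hypothetical, and because this page does not say how scalability is measured, the numeric `scalability` field is an assumption (set to an equal placeholder value for every entry). The F-measures and dates are taken from the NLization results on this page.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Submission:
    author: str
    language_pair: str
    f_measure: float
    scalability: float  # hypothetical score; higher means more reusable
    submitted: date


def rank(submissions):
    """Sort by F-measure (descending), then scalability (descending),
    then submission date (earliest first)."""
    return sorted(
        submissions,
        key=lambda s: (-s.f_measure, -s.scalability, s.submitted),
    )


entries = [
    Submission("Yordanka Stancheva", "unl>bul", 1.000, 0.0, date(2014, 3, 21)),
    Submission("Parteek Kumar", "unl>pan", 0.996, 0.0, date(2014, 3, 30)),
    Submission("Grega Milharcic", "unl>slv", 1.000, 0.0, date(2014, 3, 7)),
]

ranked = rank(entries)
for position, s in enumerate(ranked, start=1):
    print(position, s.language_pair, s.f_measure, s.submitted.isoformat())
```

With equal placeholder scalability, the two grammars tied at 1.000 are separated by submission date, so the earlier submission is placed first.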

Final Results

UNLization

General Position | Language Pair | Author             | F-Measure | Submission Date | Medal | Files: Dictionary | T-Grammar | D-Grammar | Output
1                | slv>unl       | Grega Milharcic    | 1.000     | 07/03/2014      | GOLD  | [1]               | [2]       | [3]       | [4]
2                | pan>unl       | Parteek Kumar      | 0.990     | 29/03/2014      | GOLD  | [5]               | [6]       | [7]       | [8]
3                | bul>unl       | Yordanka Stancheva | 0.976     | 21/03/2014      | GOLD  | [9]               | [10]      | [11]      | [12]
4                | ukr>unl       | Sergiy Prots       | 0.946     | 25/03/2014      | GOLD  | [13]              | [14]      | [15]      | [16]
5                | rus>unl       | Sergiy Prots       | 0.942     | 23/03/2014      | GOLD  | [17]              | [18]      | [19]      | [20]

NLization

General Position | Language Pair | Author                 | F-Measure | Submission Date | Medal | Files: Dictionary | T-Grammar | D-Grammar | Output
1                | unl>slv       | Grega Milharcic        | 1.000     | 07/03/2014      | GOLD  | [21]              | [22]      | [23]      | [24]
2                | unl>bul       | Yordanka Stancheva     | 1.000     | 21/03/2014      | GOLD  | [25]              | [26]      | [27]      | [28]
3                | unl>pan       | Parteek Kumar          | 0.996     | 30/03/2014      | GOLD  | [29]              | [30]      | [31]      | [32]
4                | unl>per       | Maryam Faal Hamedanchi | 0.994     | 29/03/2014      | GOLD  | [33]              | [34]      | [35]      | [36]
5                | unl>ukr       | Sergiy Prots           | 0.957     | 30/03/2014      | GOLD  | [37]              | [38]      | [39]      | [40]
6                | unl>rus       | Sergiy Prots           | 0.916     | 29/03/2014      | GOLD  | [41]              | [42]      | [43]      | [44]

Notes

  1. This means that up to 6 prizes will be awarded for each language: Best UNLization Grammar, Second Best UNLization Grammar, Third Best UNLization Grammar, Best NLization Grammar, Second Best NLization Grammar and Third Best NLization Grammar.
  2. The value of USD500.00 will be paid only to the 10 best UNLization or NLization grammars in general, and not to the 10 best UNLization/NLizations of each language.
  3. Those are the languages that finished UGO-A1 on time.
  4. The goal of the Olympiad is to provide ONE POSSIBLE MAPPING for each structure, i.e., to map each natural language to at least one valid UNL graph, and to map the UNL graph into at least one valid NL sentence. This means that, if the natural language is ambiguous, and may be mapped into several different UNL graphs, the UNLization will be considered valid if the resulting UNL graph is one of the possible candidates according to the corpus. Conversely, whenever the same UNL graph may be mapped into several different NL sentences, the NLization is considered valid if the resulting NL sentence is one of the possible mappings according to the corpus.
  5. The F-measure may be calculated at UNLWEB>UNLARIUM>TOOLS>F-MEASURE.
  6. The corpus UC-A1, although similar, is not the same as UGO-A1, and the samples do not cover all cases.
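
The "one possible mapping" rule in note 4 can be sketched as a membership test: an output counts as valid if it matches any one of the candidates recorded in the corpus. The function `is_valid` is hypothetical, the whitespace normalisation is an assumption (the official tool may compare outputs differently), and the sentences are invented placeholders rather than corpus entries.

```python
from typing import Iterable


def _normalize(text: str) -> str:
    """Collapse runs of whitespace so formatting differences are ignored
    (an assumption, not a documented rule of the Olympiad)."""
    return " ".join(text.split())


def is_valid(output: str, candidates: Iterable[str]) -> bool:
    """Return True if the output matches at least one corpus candidate."""
    return _normalize(output) in {_normalize(c) for c in candidates}


# Placeholder data: one UNL graph with two acceptable NL sentences.
references = ["The boy reads a book.", "A book is read by the boy."]
print(is_valid("The boy  reads a book.", references))
print(is_valid("The boy reads books.", references))
```

The same check applies in both directions: for UNLization the candidates would be the UNL graphs accepted for a sentence, and for NLization the sentences accepted for a graph.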