I UNL Olympiad

Latest revision as of 13:40, 20 March 2013

The UNL Olympiad is a series of competitions organised by the UNDL Foundation in order to foster the development of UNL-driven resources (dictionaries, grammars and corpora). The first edition of the Olympiad is devoted to the development of grammars for the corpus UC-A1, comprising 100 sentences. The competition is open to any participant, and the deadline is February 15th, 2013.

Important dates

  • November 15th, 2012: Call for Participation
  • February 15th, 2013: Deadline for submitting the files

Modalities

The competition is organised in two modalities:

  • Best UNLization Grammar for <LANGUAGE>
  • Best NLization Grammar for <LANGUAGE>

Where <LANGUAGE> is one of the languages participating in this Olympiad (see the complete list below).
Candidates may participate in one or two modalities, i.e., they may work with the UNLization grammar, with the NLization grammar, or with both.
Candidates may also participate in one or more languages, provided that they belong to the list below.

Prizes

Prizes are awarded to the best grammars of each modality (UNLization and NLization) for each language[1]:

  • 1st place: Gold Medal and USD500.00
  • 2nd place: Silver Medal
  • 3rd place: Bronze Medal

Additionally, the authors of the three best UNLization Grammars among all languages and the authors of the three best NLization Grammars among all languages will also be invited to participate in the next intermediate-level grammar workshop, to be held in Geneva, Switzerland, in May 2013.

Registration

Candidates must be registered at the UNLweb. Participation is open and free. To register for the Olympiad, send the following files to r.martins@undlfoundation.org by 23:59:59 (UTC) on February 15th, 2013.

  • For the participants working with the UNLization grammar (IAN):
    • UCA1_<LID>.txt, with the human translation, to the target language, of the sentences of the Corpus UC-A1;
    • <LID>_unl_dic.txt, with the natural language analysis dictionary used to UNLize the translated version of the Corpus UC-A1;
    • <LID>_unl_tgrammar.txt, with the transformation grammar used to UNLize the translated version of the Corpus UC-A1;
    • <LID>_unl_dgrammar.txt, with the disambiguation grammar used to UNLize the translated version of the Corpus UC-A1;
    • <LID>_unl_output.txt, with the output provided by IAN.
  • For the participants working with the NLization grammar (EUGENE):
    • UCA1_<LID>.txt, with the human translation, to the target language, of the sentences of the Corpus UC-A1;
    • unl_<LID>_dic.txt, with the natural language generation dictionary used to NLize the UNL version of Corpus UC-A1;
    • unl_<LID>_tgrammar.txt, with the transformation grammar used to NLize the UNL version of the Corpus UC-A1;
    • unl_<LID>_dgrammar.txt, with the disambiguation grammar used to NLize the UNL version of the Corpus UC-A1;
    • unl_<LID>_output.txt, with the output provided by EUGENE.

Where <LID> must be replaced by the three-character language code according to ISO 639-3.[2]
All files must be provided in UTF-8.
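
The naming scheme above can be sketched in a few lines of Python. This is only an illustration: the function name `expected_files` and the modality labels `"unlization"` and `"nlization"` are assumptions made for this example and are not part of the Olympiad rules or of any UNLweb tool.

```python
def expected_files(lid: str, modality: str) -> list[str]:
    """Return the file names a submission is expected to contain.

    lid      -- three-letter ISO 639-3 language code, e.g. "rus"
    modality -- "unlization" (IAN) or "nlization" (EUGENE); these labels
                are hypothetical, chosen only for this sketch
    """
    if len(lid) != 3 or not lid.isalpha():
        raise ValueError("expected a three-letter ISO 639-3 code")
    lid = lid.lower()

    # UNLization files are named <LID>_unl_*, NLization files unl_<LID>_*.
    if modality == "unlization":
        prefix = f"{lid}_unl"
    elif modality == "nlization":
        prefix = f"unl_{lid}"
    else:
        raise ValueError("modality must be 'unlization' or 'nlization'")

    # Both modalities share the translated corpus file UCA1_<LID>.txt.
    parts = ["dic", "tgrammar", "dgrammar", "output"]
    return [f"UCA1_{lid}.txt"] + [f"{prefix}_{part}.txt" for part in parts]
```

For Russian, `expected_files("rus", "unlization")` yields the names given in note 2, starting with UCA1_rus.txt and rus_unl_dic.txt.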

Requisites

The competition is free and open to any participant, but it is limited to the set of languages described below.
The files must comply with the following requisites:

  • The corpus must comply with the translation standards of the target language and should not be artificially translated in order to produce better results.
  • The input corpus used in UNLization will serve as the reference corpus for evaluating the NLization output.
  • The dictionary files must comply with the Dictionary Specs and may only bring features present in the Tagset. They should not contain temporary words.
  • The grammar files must comply with the Grammar Specs and must be as generic as possible. They should not target the specific sentences of the corpus, but the general structures presented there.
  • The F-measure of the grammars must be equal to or greater than 0.8.[3]
  • The files must be original. Grammars that are similar beyond any reasonable doubt will be discarded, unless provided by the same author (for different languages).
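
The 0.8 threshold can be illustrated with a short Python sketch. It assumes the conventional balanced F-measure (the harmonic mean of precision and recall); this page does not spell out the formula, so the computation performed in the UNLarium may differ. The function names are hypothetical.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall.

    Assumption: the standard F1 definition, not necessarily the exact
    formula used by the UNLarium.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both values are 0
    return 2 * precision * recall / (precision + recall)


def meets_threshold(score: float, threshold: float = 0.8) -> bool:
    """True if the score is equal to or greater than the required minimum."""
    return score >= threshold
```

Under this assumption, a grammar with precision 0.9 and recall 0.72 sits at the limit, since 2 × 0.9 × 0.72 / 1.62 = 0.8.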

Evaluation

Grammars will be evaluated and ranked according to the following criteria:

  • Best F-measure
  • Scalability (i.e., extendibility, or the capacity of being reused for other corpora), in case of grammars with the same F-Measure
  • Date of submission, in case of grammars with the same F-Measure and equally scalable
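
The three criteria amount to a sort with two tie-breakers, which can be sketched in Python as follows. The `Submission` class and the numeric `scalability` score are assumptions made for illustration: this page does not say how scalability is measured, and the sketch assumes that an earlier submission date wins the final tie.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Submission:
    author: str
    f_measure: float
    scalability: float  # hypothetical score; higher means more reusable
    submitted: date


def rank(submissions: list[Submission]) -> list[Submission]:
    """Order by F-measure (descending), then scalability (descending),
    then submission date (earliest first, an assumption of this sketch)."""
    return sorted(
        submissions,
        key=lambda s: (-s.f_measure, -s.scalability, s.submitted),
    )
```

Negating the two numeric fields lets a single ascending sort handle all three criteria at once.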

Languages

The I UNL Olympiad will be dedicated to the development of grammars for the following languages[4]:

  • Assamese
  • Baatonum
  • Bengali
  • Bulgarian
  • Chinese
  • Croatian
  • Dutch
  • German
  • Gujarati
  • Hindi
  • Hungarian
  • Indonesian
  • Italian
  • Japanese
  • Kashmiri
  • Malayalam
  • Manipuri
  • Marathi
  • Oriya
  • Persian
  • Polish
  • Romanian
  • Russian
  • Sanskrit
  • Sindhi
  • Slovak
  • Swahili
  • Swedish
  • Tamil
  • Telugu
  • Thai
  • Turkish
  • Ukrainian

Candidates may participate in one or more of the languages above.

Results

General results

  • Best UNLization Grammar:
    • Gold Medal: Grega Milharcic (hun, ita, nld, slk, ukr)
    • Silver Medal: Mihaela Ilioaia (rom)
    • Bronze Medal: Sergiy Prots (pol)
  • Best NLization Grammar:
    • Gold Medal: Grega Milharcic (ukr, nld)
    • Silver Medal: Sergiy Prots (rus)
    • Bronze Medal: Mihaela Ilioaia (rom)

Results by language pair*

*Only for grammars whose F-measure is equal to or higher than 0.8

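Grammars | F-measure | Author | Position in the language pair | Files
bul>unl | 0.873 | Yordanka Stancheva | Gold Medal | http://www.unlweb.net/resources/grammar/UCA1/bul_unl.rar
hun>unl | 1.000 | Grega Milharcic | Gold Medal | http://www.unlweb.net/resources/grammar/UCA1/hun_unl.rar
ita>unl | 1.000 | Grega Milharcic | Gold Medal | http://www.unlweb.net/resources/grammar/UCA1/ita_unl.rar
nld>unl | 1.000 | Grega Milharcic | Gold Medal | http://www.unlweb.net/resources/grammar/UCA1/nld_unl.rar
ori>unl | 0.840 | Ranjan Das | Gold Medal | http://www.unlweb.net/resources/grammar/UCA1/ori_unl.rar
pol>unl | 0.920 | Sergiy Prots | Gold Medal | http://www.unlweb.net/resources/grammar/UCA1/pol_unl.rar
rom>unl | 0.940 | Mihaela Ilioaia | Gold Medal | http://www.unlweb.net/resources/grammar/UCA1/rom_unl.rar
rus>unl | 0.880 | Sergiy Prots | Gold Medal | http://www.unlweb.net/resources/grammar/UCA1/rus_unl.rar
slk>unl | 0.970 | Grega Milharcic | Gold Medal | http://www.unlweb.net/resources/grammar/UCA1/slk_unl.rar
ukr>unl | 0.970 | Grega Milharcic | Gold Medal | http://www.unlweb.net/resources/grammar/UCA1/ukr_unl_1.rar
ukr>unl | 0.880 | Sergiy Prots | Silver Medal | http://www.unlweb.net/resources/grammar/UCA1/ukr_unl_2.rar
unl>hun | 0.930 | Grega Milharcic | Gold Medal | http://www.unlweb.net/resources/grammar/UCA1/unl_hun.rar
unl>ita | 0.930 | Grega Milharcic | Gold Medal | http://www.unlweb.net/resources/grammars/UCA1/unl_ita.rar
unl>nld | 0.950 | Grega Milharcic | Gold Medal | http://www.unlweb.net/resources/grammar/UCA1/unl_nld.rar
unl>pol | 0.920 | Sergiy Prots | Gold Medal | http://www.unlweb.net/resources/grammar/UCA1/unl_pol.rar
unl>rom | 0.900 | Mihaela Ilioaia | Gold Medal | http://www.unlweb.net/resources/grammar/UCA1/unl_rom.rar
unl>rus | 0.940 | Sergiy Prots | Gold Medal | http://www.unlweb.net/resources/grammar/UCA1/unl_rus.rar
unl>slk | 0.930 | Grega Milharcic | Gold Medal | http://www.unlweb.net/resources/grammar/UCA1/unl_slk.rar
unl>ukr | 0.970 | Grega Milharcic | Gold Medal | http://www.unlweb.net/resources/grammar/UCA1/unl_ukr_1.rar
unl>ukr | 0.940 | Sergiy Prots | Silver Medal | http://www.unlweb.net/resources/grammar/UCA1/unl_ukr_2.rar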

Notes

  1. This means that for each language up to six prizes will be awarded: Best UNLization Grammar, Second Best UNLization Grammar, Third Best UNLization Grammar, Best NLization Grammar, Second Best NLization Grammar and Third Best NLization Grammar.
  2. For instance, the files to be provided by Russian (code = "rus") must be UCA1_rus.txt, rus_unl_dic.txt, rus_unl_tgrammar.txt, etc.
  3. The F-measure may be calculated at UNLWEB>UNLARIUM>GRAMMAR>[LOCALE]>F-MEASURE
  4. The choice of the languages was motivated by three criteria: 1) Languages for which we do not have the basic grammars yet; 2) Languages that have participated in the recent UNL Schools; and 3) Languages that have already started the project MIR-A1.