IX UNL School

From UNL Wiki

Revision as of 21:26, 29 June 2012

Goals

  • To build the basic modules of an NL-UNL (analysis) grammar
  • To build the basic modules of a UNL-NL (generation) grammar

Slides

Files

Methodology

The following activities must be accomplished during the workshop.

Corpus
  1. Translate the 50 sentences of Corpus50_eng.txt into your native language, staying as close as possible to the original.
  2. Save the translated text (without the English original) as a plain text (.txt) file with UTF-8 encoding and upload it to UNLWEB>UNLDEV>PROJECTS>IAN>NL FILES.
  3. Upload the file Corpus50_unl.txt to UNLWEB>UNLDEV>PROJECTS>EUGENE>UNL DOCUMENTS.
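Before uploading, it can help to confirm that the translation really is valid UTF-8 and still has one line per sentence. The sketch below is only an illustration: the function name `check_corpus_file` is hypothetical, and the one-sentence-per-line layout (50 non-empty lines) is an assumption about how Corpus50_eng.txt is arranged, not something this page states.

```python
from pathlib import Path


def check_corpus_file(path, expected_lines=50):
    """Return a list of problems found in a translated corpus file (empty list if none).

    Assumes one sentence per line, so a complete translation of Corpus 50
    should contain exactly `expected_lines` non-empty lines.
    """
    raw = Path(path).read_bytes()
    try:
        # Raises UnicodeDecodeError on any byte sequence that is not valid UTF-8.
        text = raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return [f"not valid UTF-8: {err}"]

    problems = []
    sentences = [line for line in text.splitlines() if line.strip()]
    if len(sentences) != expected_lines:
        problems.append(f"expected {expected_lines} non-empty lines, found {len(sentences)}")
    return problems
```

For example, calling `check_corpus_file` on your translated file returns an empty list when it passes both checks.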
NL-UNL Dictionary (Analysis)
  1. Extract the word list (i.e., the set of all distinct word forms) appearing in your translation of Corpus 50.
  2. Create the NL-UNL dictionary for all the word forms, following the English model available at English Analysis Dictionary 50. Use only the tags available in the tagset. For further information on the dictionary structure, see Dictionary Specs.
  3. Save the NL-UNL dictionary in a plain text (.txt) file with UTF-8 encoding and upload it to UNLWEB>UNLDEV>PROJECTS>IAN>DICTIONARIES.
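Step 1 can be scripted. The sketch below assumes words are separated by whitespace and trims leading and trailing Unicode punctuation (such as the danda "।"); it keeps case distinctions, so "The" and "the" count as different forms. It splits on whitespace instead of using a `\w+` regular expression because Python's `\w` is defined through `str.isalnum()`, which is false for many combining vowel signs in Indic scripts, so words would be cut apart. The entry layout in `skeleton_entries` (`[NLW]{ID}"UW"(ATTR)<LNG,FRE,PRI>;`) is an assumption about Dictionary Specs and should be checked there; all function names are hypothetical.

```python
import unicodedata


def strip_punctuation(token):
    """Trim Unicode punctuation (categories P*) from both ends of a token."""
    start, end = 0, len(token)
    while start < end and unicodedata.category(token[start]).startswith("P"):
        start += 1
    while end > start and unicodedata.category(token[end - 1]).startswith("P"):
        end -= 1
    return token[start:end]


def word_list(text):
    """Return the sorted set of distinct word forms in a whitespace-separated text."""
    forms = {strip_punctuation(token) for token in text.split()}
    forms.discard("")  # tokens made only of punctuation
    return sorted(forms)


def skeleton_entries(forms, lang):
    """One blank entry per word form; the UW and attributes are left to be filled in by hand.

    Assumed layout: [NLW]{ID}"UW"(ATTR)<LNG,FRE,PRI>;  (verify against Dictionary Specs)
    """
    return [f'[{form}]{{}}""()<{lang},0,0>;' for form in forms]
```

The skeleton only saves typing: choosing the UW and the tags for each form remains manual work.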
UNL-NL Dictionary (Generation)
  1. Localize the UNL-NL dictionary available at English Generation Dictionary 50. The localized version must reflect the word list of your translated corpus. Use only the tags available in the tagset. For further information on the dictionary structure, see Dictionary Specs.
  2. Save the UNL-NL dictionary in a plain text (.txt) file with UTF-8 encoding and upload it to UNLWEB>UNLDEV>PROJECTS>EUGENE>DICTIONARIES.
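Since the localized dictionary must reflect the word list of the translated corpus, one rough check is to list the headwords that never occur in that word list, which can reveal entries left over from the English model. The sketch assumes each entry line begins with its natural-language headword in square brackets; the function names are hypothetical. Generation dictionaries may list base forms that appear only inflected in the corpus, so every reported headword still needs to be judged by hand.

```python
import re

# Assumption: an entry line starts with its headword in square brackets, e.g. [boy]{}"..."
HEADWORD = re.compile(r"^\s*\[([^\]]*)\]")


def headwords(dictionary_text):
    """Return the bracketed headword of every line that looks like a dictionary entry."""
    found = []
    for line in dictionary_text.splitlines():
        match = HEADWORD.match(line)
        if match:
            found.append(match.group(1))
    return found


def unmatched_headwords(dictionary_text, corpus_words):
    """Headwords (in dictionary order) that do not occur in the corpus word list."""
    vocabulary = set(corpus_words)
    return [word for word in headwords(dictionary_text) if word not in vocabulary]
```

`corpus_words` can be the word list extracted in step 1 of the NL-UNL dictionary activity.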
Morphology
  1. Export the inflectional grammar of your language from UNLARIUM>GRAMMAR>[YOUR LOCALE]>EXPORT. If the grammar of your language is not available yet, you may:
    1. Provide it through the UNLarium (only for users approved in CLEA700); or
    2. Create the inflectional paradigms only for the inflected forms appearing in the UNL-NL dictionary. In that case, follow the model available at English Inflectional Grammar. The documentation of the English grammar is available at English Inflectional Grammar (only for reference). For further information, see Inflectional paradigms.
  2. Save the inflectional grammar in a plain text (.txt) file with UTF-8 encoding and upload it to UNLWEB>UNLDEV>PROJECTS>EUGENE>RULES.
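An inflectional paradigm can be thought of as a table that maps a base form and a feature value to an inflected form. The toy sketch below shows that idea with suffix-replacement rules only; it is not the rule syntax of the UNLarium or of the English Inflectional Grammar, and the paradigm names and feature labels (`NOUN_S`, `NOUN_Y`, `SNG`, `PLR`) are hypothetical.

```python
# Each paradigm maps a feature value to (number of final characters to strip, suffix to add).
# Hypothetical toy paradigms for English nouns; a real grammar needs many more rules.
NOUN_S = {"SNG": (0, ""), "PLR": (0, "s")}    # book -> books
NOUN_Y = {"SNG": (0, ""), "PLR": (1, "ies")}  # city -> cities


def inflect(lemma, paradigm, feature):
    """Build one inflected form: strip the given number of final characters, then add the suffix."""
    strip, suffix = paradigm[feature]
    return lemma[: len(lemma) - strip] + suffix


def all_forms(lemma, paradigm):
    """Return every form the paradigm generates for a lemma, keyed by feature value."""
    return {feature: inflect(lemma, paradigm, feature) for feature in paradigm}
```

Limiting the paradigms to the forms that actually appear in the UNL-NL dictionary, as option 2 above suggests, keeps such a table small.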
NL-UNL (Analysis) Grammar
  1. Provide the NL-UNL (analysis) grammar needed to analyze the natural language sentences of the translated corpus into UNL.
  2. Save the NL-UNL grammar in a plain text (.txt) file with UTF-8 encoding and upload it to UNLWEB>UNLDEV>PROJECTS>IAN>RULES.
  3. Test the grammar against the corpus and make the necessary changes.
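Testing against the corpus is easier with a script that compares the UNL produced by the analysis grammar with the reference Corpus50_unl.txt, graph by graph. The sketch below assumes that each graph is enclosed in `{unl}` and `{/unl}` tags with one relation per line, and it ignores the order of relations. It reports only exact matches, so graphs that differ merely in spacing inside a relation count as mismatches. The function names are hypothetical.

```python
import re

# Assumption: each graph sits between {unl} (possibly with extra text before the brace) and {/unl}.
UNL_BLOCK = re.compile(r"\{unl[^}]*\}(.*?)\{/unl\}", re.DOTALL)


def graphs(unl_text):
    """Split a UNL document into graphs, each a set of relation lines (order ignored)."""
    return [
        {line.strip() for line in block.splitlines() if line.strip()}
        for block in UNL_BLOCK.findall(unl_text)
    ]


def compare_graphs(produced_text, reference_text):
    """Return (number of exact matches, 1-based indices of graphs that differ or are missing)."""
    produced, reference = graphs(produced_text), graphs(reference_text)
    mismatches = [
        number
        for number, (mine, gold) in enumerate(zip(produced, reference), start=1)
        if mine != gold
    ]
    # Graphs present in only one of the two documents also count as mismatches.
    shorter, longer = sorted((len(produced), len(reference)))
    mismatches += list(range(shorter + 1, longer + 1))
    return longer - len(mismatches), mismatches
```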
UNL-NL (Generation) Grammar
  1. Provide the UNL-NL (generation) grammar necessary to generate natural language sentences from the UNL corpus.
  2. Save the UNL-NL grammar in a plain text (.txt) file with UTF-8 encoding and upload it to UNLWEB>UNLDEV>PROJECTS>EUGENE>RULES.
  3. Test the grammar against the corpus and make the necessary changes.
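For the generation grammar, a comparable check sets each generated sentence against the corresponding line of your translated Corpus 50. The sketch assumes one sentence per line in both files and counts only exact matches after collapsing whitespace. That is a deliberately strict choice, since several translations of a sentence may be acceptable, and a graded measure could be substituted. The function name is hypothetical.

```python
def compare_sentences(generated_lines, reference_lines):
    """Compare generated and reference sentences line by line.

    Returns (accuracy, mismatches), where mismatches lists (line number, generated, reference)
    for aligned lines that differ after whitespace is collapsed. Lines present in only one
    list lower the accuracy but are not listed individually.
    """
    def normalize(sentence):
        return " ".join(sentence.split())

    pairs = list(zip(generated_lines, reference_lines))
    mismatches = [
        (number, normalize(gen), normalize(ref))
        for number, (gen, ref) in enumerate(pairs, start=1)
        if normalize(gen) != normalize(ref)
    ]
    total = max(len(generated_lines), len(reference_lines))
    matched = len(pairs) - len(mismatches)
    return (matched / total if total else 1.0), mismatches
```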

Follow-up

To earn the bonus and apply to the intermediate-level workshop, participants are requested to repeat the steps above for Corpus 500. The instructions are available at Day #5.

Corpus

  • English
    • Corpus 50 Training corpus in English (50 sentences), to be manually translated into the target languages and used as the input for IAN
    • Corpus 500 Experimental corpus in English (500 sentences), to be manually translated into the target languages and used as the input for IAN
  • UNL
    • Corpus 50 Training corpus in UNL (50 sentences), to be used as the input for EUGENE
    • Corpus 500 Experimental corpus in UNL (500 sentences), to be used as the input for EUGENE
  • Corpus 500 according to the complexity of the graphs (the same corpus as above, but split into different files)

Corpus
Order | Description | Analysis (English original) | Generation (UNL)
0 | Training Corpus (Corpus 50) | Corpus 50 | Corpus 50
1 | Temporary entries | temp_org.txt | temp_unl.txt
2 | Entries with no attribute or relation | attribute0_org.txt | attribute0_unl.txt
3 | One-attribute entries | attribute1_org.txt | attribute1_unl.txt
4 | Two-attribute entries | attribute2_org.txt | attribute2_unl.txt
5 | Three-attribute entries | attribute3_org.txt | attribute3_unl.txt
6 | One-relation entries | relation1_org.txt | relation1_unl.txt
7 | Two-relation entries | relation2_org.txt | relation2_unl.txt
8 | Three-relation entries | relation3_org.txt | relation3_unl.txt
9 | Four-relation entries | relation4_org.txt | relation4_unl.txt
10 | Five-relation entries | relation5_org.txt | relation5_unl.txt
11 | Six-relation entries | relation6_org.txt | relation6_unl.txt
12 | Numbers and numerals | numbers_org.txt | numbers_unl.txt
13 | Expressions of time | time_org.txt | time_unl.txt
14 | Relative clauses | relatives_org.txt | relatives_unl.txt
15 | Special issues | problems_org.txt | problems_unl.txt

Dictionary

  • Analysis
    • Corpus 50 Sample of the English analysis dictionary for the entries appearing in the Corpus 50
  • Generation
    • Corpus 50 Sample of the English generation dictionary for the entries appearing in the Corpus 50

Participants

  • Aadil Kak (Kashmiri)
  • Ankur Aher (Marathi)
  • Arulmozi Selvaraj (Tamil)
  • Balaji Jagan (Tamil)
  • Brijesh Bhatt (Gujarati)
  • Jyotesh Choudhari (Marathi)
  • Kashyap Popat (Gujarati)
  • Laishram Rishikanta Meitei (Manipuri)
  • Navanath Saharia (Assamese)
  • Niladri Sekhar Dash (Bengali)
  • Pallab Bhattacharjee
  • Parameswarappa S (Kannada)
  • Parteek Kumar (Punjabi)
  • Pinkey Nainwani (Sindhi)
  • Pradnya Mohite (Marathi)
  • Raj Dabre (Marathi)
  • Ranjan Das (Oriya)
  • Renuka Devi (Telugu)
  • Sachin Pawar (Marathi)
  • Samir J. Sohoni (Sanskrit)
  • Shaikh Samiulla Z. (Marathi)
  • Shailendra Kumar (Hindi)
  • Sreelekha S. (Malayalam)
  • Sudha Bhingardire (Marathi)
  • Swapnil S. Ghuge (Marathi)
  • Tanuja Ajotikar (Sanskrit)
  • Trupti Nisar (Gujarati)

Schedule

Jun 11th, 2012 - Monday
09:00-10:00 Introduction
10:00-12:00 I – Corpus
14:00-17:00 II – UNL-NL dictionary
Jun 12th, 2012 - Tuesday
09:00-12:00 III – Morphology (inflectional paradigms)
14:00-17:00 IV – NL dictionary
Jun 13th, 2012 - Wednesday
09:00-12:00 V – UNL-NL grammar (I)
14:00-17:00 V – UNL-NL grammar (II)
Jun 14th, 2012 - Thursday
09:00-12:00 VI – NL-UNL grammar (I)
14:00-17:00 VI – NL-UNL grammar (II)
Jun 15th, 2012 - Friday
09:00-12:00 Evaluation
14:00-17:00 Discussion

Venue

SIC 301
Kanwal Rekhi Building
IIT Bombay
Mumbai, India

Local Organization

  • Pushpak Bhattacharyya
  • Deepak D Jagtap

Instructors

  • Ronaldo Martins (UNDL Foundation)
  • Sameh Alansary (University of Alexandria)
Software