IX UNL School


Revision as of 20:31, 29 May 2012

Goals

  • To build the basic modules of an NL-UNL (analysis) grammar
  • To build the basic modules of a UNL-NL (generation) grammar

Methodology

Corpus
  1. Translate the 50 sentences of Corpus50_eng.txt (http://www.unlweb.net/resources/mumbai2012/corpus50_eng.txt) into your native language. Stay as close as possible to the original.
  2. Save the translated text (without the English original) in a plain text (.txt) file with UTF-8 encoding and upload it to UNLWEB>UNLDEV>PROJECTS>IAN>NL FILES.
  3. Upload the file Corpus50_unl.txt (http://www.unlweb.net/resources/mumbai2012/corpus50_unl.txt) to UNLWEB>UNLDEV>PROJECTS>EUGENE>UNL DOCUMENTS.
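
The encoding requirement in step 2 can be checked before uploading. The following is a minimal sketch, not part of the workshop tooling: the function names are hypothetical, and it assumes one translated sentence per line.

```python
from pathlib import Path


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def save_as_utf8(sentences: list[str], path: Path) -> None:
    """Write one sentence per line (an assumed layout), encoded as UTF-8."""
    path.write_text("\n".join(sentences) + "\n", encoding="utf-8")
```

Calling `is_valid_utf8(Path("my_translation.txt").read_bytes())` on a file saved by a text editor (the file name here is only an example) shows whether the editor actually used UTF-8 rather than a legacy encoding.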
NL-UNL Dictionary (Analysis)
  1. Extract the word list (i.e., the set of all distinct word forms) appearing in your translation of the Corpus 50.
  2. Create the NL-UNL dictionary for all the word forms following the English model available at English Analysis Dictionary 50 (http://www.unlweb.net/resources/mumbai2012/eng50_dic_ana.txt). Use only the tags available at the tagset. For further information on the dictionary structure, see Dictionary Specs.
  3. Save the NL-UNL dictionary in a plain text (.txt) file with UTF-8 encoding and upload it to UNLWEB>UNLDEV>PROJECTS>IAN>DICTIONARIES.
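
Step 1 above (extracting the set of distinct word forms) can be sketched as follows. This is an illustration under stated assumptions, not the workshop's own procedure: it assumes words are separated by whitespace, strips punctuation and symbols only from the edges of each token, and keeps case distinctions. The function names are hypothetical. Splitting on whitespace instead of matching `\w+` keeps combining vowel signs attached to their base letters, which matters for Indic scripts.

```python
import unicodedata


def _is_punct(ch: str) -> bool:
    """True for Unicode punctuation (P*) and symbol (S*) characters."""
    return unicodedata.category(ch)[0] in ("P", "S")


def strip_punctuation(token: str) -> str:
    """Remove punctuation and symbols from both ends of a token."""
    start, end = 0, len(token)
    while start < end and _is_punct(token[start]):
        start += 1
    while end > start and _is_punct(token[end - 1]):
        end -= 1
    return token[start:end]


def word_forms(text: str) -> list[str]:
    """Return the sorted list of distinct word forms in the text."""
    # NFC normalization makes visually identical forms compare equal.
    text = unicodedata.normalize("NFC", text)
    forms = {strip_punctuation(tok) for tok in text.split()}
    forms.discard("")  # tokens that consisted only of punctuation
    return sorted(forms)
```

Languages written without spaces between words, or with clitics that should be split off, would need a different tokenization step.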
UNL-NL Dictionary (Generation)
  1. Localize the UNL-NL dictionary available at English Generation Dictionary 50 (http://www.unlweb.net/resources/mumbai2012/eng50_dic_gen.txt). The localized version must reflect the word list of your translated corpus. Use only the tags available at the tagset. For further information on the dictionary structure, see Dictionary Specs.
  2. Save the UNL-NL dictionary in a plain text (.txt) file with UTF-8 encoding and upload it to UNLWEB>UNLDEV>PROJECTS>EUGENE>DICTIONARIES.
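
Whether a dictionary reflects the word list of the translated corpus can be checked mechanically. The sketch below rests on an assumption that should be verified against Dictionary Specs: that each entry occupies one line and begins with the natural-language form in square brackets. The function names and the sample entries are hypothetical.

```python
import re

# Assumed entry layout: the line starts with "[word form]".
HEADWORD = re.compile(r"^\s*\[([^\]]+)\]")


def dictionary_headwords(lines: list[str]) -> set[str]:
    """Collect the bracketed headword of every line that has one."""
    heads = set()
    for line in lines:
        match = HEADWORD.match(line)
        if match:
            heads.add(match.group(1))
    return heads


def missing_forms(word_list: list[str], dictionary_lines: list[str]) -> list[str]:
    """Return the word forms that have no dictionary entry, sorted."""
    return sorted(set(word_list) - dictionary_headwords(dictionary_lines))


if __name__ == "__main__":
    # Hypothetical entries, for illustration only.
    sample = ["[cat] rest of entry", "[sat] rest of entry"]
    print(missing_forms(["cat", "sat", "ran"], sample))
```

An empty result means every word form in the list has at least one entry under the assumed layout.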
Morphology
  1. Export the inflectional grammar of your language from UNLARIUM>GRAMMAR>INFLECTIONAL PARADIGMS. If the grammar of your language is not available yet, create the inflectional paradigms for the inflectional entries appearing in the UNL-NL dictionary, following the model available at http://www.unlweb.net/resources/mumbai2012/eng_ig.txt. For further information, see Inflectional Paradigm.
  2. Save the inflectional grammar in a plain text (.txt) file with UTF-8 encoding and upload it to UNLWEB>UNLDEV>PROJECTS>EUGENE>RULES.

Corpus

  • English
    • Corpus 50: Training corpus in English (50 sentences), to be manually translated into the target languages and used as the input for IAN
    • Corpus 500: Training corpus in English (500 sentences), to be manually translated into the target languages and used as the input for IAN (to be provided after the workshop)
  • UNL
    • Corpus 50: Training corpus in UNL (50 sentences), to be used as the input for EUGENE
    • Corpus 500: Training corpus in UNL (500 sentences), to be used as the input for EUGENE (to be provided after the workshop)

Dictionary

  • Analysis
    • Corpus 50: Sample of the English analysis dictionary for the entries appearing in the Corpus 50
  • Generation
    • Corpus 50: Sample of the English generation dictionary for the entries appearing in the Corpus 50


Participants

  • Aadil Kak (Kashmiri)
  • Arulmozi Selvaraj (Tamil)
  • Balaji Jagan (Tamil)
  • Laishram Rishikanta Meitei (Manipuri)
  • Navanath Saharia (Assamese)
  • Niladri Sekhar Dash (Bengali)
  • Parameswarappa S (Kannada)
  • Parteek Kumar (Punjabi)
  • Pinkey Nainwani (Sindhi)
  • Ranjan Das (Oriya)
  • Renuka Devi (Telugu)
  • Sachin Pawar (Marathi)
  • Shailendra Kumar (Hindi)
  • Trupti Nisar (Gujarati)

Schedule

Jun 11th, 2012 - Monday
09:00-10:00 Introduction
10:00-12:00 I – Corpus
14:00-17:00 II – UNL-NL dictionary
Jun 12th, 2012 - Tuesday
09:00-12:00 III – Morphology (inflectional paradigms)
14:00-17:00 IV – NL dictionary
Jun 13th, 2012 - Wednesday
09:00-12:00 V – UNL-NL grammar (I)
14:00-17:00 V – UNL-NL grammar (II)
Jun 14th, 2012 - Thursday
09:00-12:00 VI – NL-UNL grammar (I)
14:00-17:00 VI – NL-UNL grammar (II)
Jun 15th, 2012 - Friday
09:00-12:00 Evaluation
14:00-17:00 Discussion
Software