IX UNL School

Revision as of 19:45, 29 May 2012

Goals

  • To build the basic modules of a NL-UNL (analysis) grammar
  • To build the basic modules of a UNL-NL (generation) grammar

Methodology

Corpus
  1. Translate the 50 sentences of Corpus50_eng.txt into your native language. Be as close as possible to the original.
  2. Save the translated text (without the English original) in a plain text (.txt) file with UTF-8 encoding and upload it to UNLWEB>UNLDEV>PROJECTS>IAN>NL FILES.
  3. Upload the file Corpus50_unl.txt to UNLWEB>UNLDEV>PROJECTS>EUGENE>UNL DOCUMENTS.
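Before uploading, it may help to confirm that the translated file really is UTF-8 and holds all 50 sentences. The following is a minimal sketch, not part of the UNL tools: it assumes one sentence per line (the page does not state the file layout), and the function name `check_corpus_file` is hypothetical.

```python
from pathlib import Path


def check_corpus_file(path, expected_sentences=50):
    """Return the sentences found in a translated corpus file.

    Assumption (not stated on this page): one sentence per line.
    Raises UnicodeDecodeError if the file is not valid UTF-8, and
    ValueError if the number of non-empty lines is not the expected one.
    """
    raw = Path(path).read_bytes()
    # Strict decoding: any byte sequence that is not UTF-8 raises an error.
    text = raw.decode("utf-8")
    # Some editors prepend a byte order mark; drop it if present.
    text = text.lstrip("\ufeff")
    sentences = [line.strip() for line in text.splitlines() if line.strip()]
    if len(sentences) != expected_sentences:
        raise ValueError(
            f"expected {expected_sentences} sentences, found {len(sentences)}"
        )
    return sentences
```

A file saved in a legacy encoding typically fails at the decoding step, which is the most common reason for garbled text after upload.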
NL-UNL Dictionary (Analysis)
  1. Extract the word list (i.e., the set of all distinct word forms) appearing in your translation of the Corpus 50.
  2. Create the NL-UNL dictionary for all the word forms, following the English model available at English Analysis Dictionary 50. Use only the tags available in the tagset. For further information on the dictionary structure, see Dictionary Specs.
  3. Save the NL-UNL dictionary in a plain text (.txt) file with UTF-8 encoding and upload it to UNLWEB>UNLDEV>PROJECTS>IAN>DICTIONARIES.
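The word list in step 1 can be drafted automatically and then reviewed by hand. The sketch below is an illustration under stated assumptions, not the workshop's own procedure: tokens are separated by whitespace, punctuation is stripped only from token edges, and case is left untouched so that differently cased forms count as distinct. The function name `extract_word_forms` is hypothetical. Punctuation is detected by Unicode category rather than by a `\w` regular expression, because `\w` in Python does not match the combining vowel signs used in many Indic scripts and would split words apart.

```python
import unicodedata


def _is_punct(ch):
    # Unicode general categories starting with "P" are punctuation
    # (this includes the Devanagari danda as well as ASCII marks).
    return unicodedata.category(ch).startswith("P")


def extract_word_forms(sentences):
    """Return the sorted list of distinct word forms in the sentences.

    Assumptions: whitespace separates tokens, and punctuation is
    removed only at the start and end of each token.
    """
    forms = set()
    for sentence in sentences:
        for token in sentence.split():
            start, end = 0, len(token)
            while start < end and _is_punct(token[start]):
                start += 1
            while end > start and _is_punct(token[end - 1]):
                end -= 1
            if start < end:
                forms.add(token[start:end])
    return sorted(forms)
```

Languages written without spaces between words, or with clitics that the dictionary should list separately, need a different tokenization step; the result is only a starting point for the dictionary entries.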
UNL-NL Dictionary (Generation)
  1. Localize the UNL-NL dictionary available at English Generation Dictionary 50. The localized version must reflect the word list of your translated corpus. Use only the tags available in the tagset. For further information on the dictionary structure, see Dictionary Specs.
  2. Save the UNL-NL dictionary in a plain text (.txt) file with UTF-8 encoding and upload it to UNLWEB>UNLDEV>PROJECTS>EUGENE>DICTIONARIES.
Morphology
  1. Export the inflectional grammar of your language from UNLARIUM>GRAMMAR>INFLECTIONAL PARADIGMS. If the grammar of your language is not yet available, create inflectional paradigms only for the inflected forms appearing in the UNL-NL dictionary, following the model available at English Inflectional Grammar. For further information, see Inflectional paradigms.
  2. Save the inflectional grammar in a plain text (.txt) file with UTF-8 encoding and upload it to UNLWEB>UNLDEV>PROJECTS>EUGENE>RULES.

Corpus

  • English
    • Corpus 50: training corpus in English (50 sentences), to be manually translated into the target languages and used as the input for IAN
    • Corpus 500: training corpus in English (500 sentences), to be manually translated into the target languages and used as the input for IAN (to be provided after the workshop)
  • UNL
    • Corpus 50: training corpus in UNL (50 sentences), to be used as the input for EUGENE
    • Corpus 500: training corpus in UNL (500 sentences), to be used as the input for EUGENE (to be provided after the workshop)

Dictionary

  • Analysis
    • Corpus 50: sample of the English analysis dictionary for the entries appearing in the Corpus 50
  • Generation
    • Corpus 50: sample of the English generation dictionary for the entries appearing in the Corpus 50

Participants

  • Aadil Kak (Kashmiri)
  • Arulmozi Selvaraj (Tamil)
  • Balaji Jagan (Tamil)
  • Laishram Rishikanta Meitei (Manipuri)
  • Navanath Saharia (Assamese)
  • Niladri Sekhar Dash (Bengali)
  • Parameswarappa S (Kannada)
  • Parteek Kumar (Punjabi)
  • Pinkey Nainwani (Sindhi)
  • Ranjan Das (Oriya)
  • Renuka Devi (Telugu)
  • Sachin Pawar (Marathi)
  • Shailendra Kumar (Hindi)
  • Trupti Nisar (Gujarati)

Schedule

Jun 11th, 2012 - Monday
09:00-10:00 Introduction
10:00-12:00 I – Corpus
14:00-17:00 II – UNL-NL dictionary
Jun 12th, 2012 - Tuesday
09:00-12:00 III – Morphology (inflectional paradigms)
14:00-17:00 IV – NL-UNL dictionary
Jun 13th, 2012 - Wednesday
09:00-12:00 V – UNL-NL grammar (I)
14:00-17:00 V – UNL-NL grammar (II)
Jun 14th, 2012 - Thursday
09:00-12:00 VI – NL-UNL grammar (I)
14:00-17:00 VI – NL-UNL grammar (II)
Jun 15th, 2012 - Friday
09:00-12:00 Evaluation
14:00-17:00 Discussion

Venue

SIC 301, Kanwal Rekhi Building, IIT Bombay, Mumbai, India

Local Organization

  • Dr. Pushpak Bhattacharyya
  • Deepak D Jagtap
  • Janardhan Singh
  • Vijay P. Ambre

Instructors

  • Ronaldo Martins (UNDL Foundation)
  • Sameh Alansary (University of Alexandria)

Software