NADIA

From UNL Wiki
(Difference between revisions)
Line 1: Line 1:
The project NADIA ('''NA'''tural language '''DI'''ctionary for UNL-NL m'''A'''ppings) is devoted to the creation of NL dictionaries for the entries derived from [[GD|generation dictionaries]].
+
The project NADIA ('''NA'''tural language '''DI'''ctionary for UNL-NL m'''A'''ppings) is devoted to the creation of NL dictionaries for entries derived from [[GD|generation dictionaries]].
  
 
== Goal ==
 
== Goal ==
 
The project NADIA has two main goals:
 
The project NADIA has two main goals:
#To provide several word-to-concept monolingual databases (i.e., encoding or reader's dictionaries). These dictionaries are expected to be used in [[UNLization|language-based shallow UNLization]], i.e., in generating UNL graphs out of natural language documents, especially through [[IAN]].
+
#To review UNL-NL mappings provided in generation dictionary projects (such as [[MIR]]).
#To find concepts that are not included in WordNet 3.0 and should be incorporated into the [[UNL Dictionary]].
+
#To provide morphological information (such as gender, number, inflectional paradigm, etc.) and syntactic information (transitivity, subcategorization frame, etc.) for natural language entries derived from generation dictionaries.
  
 
== The repository ==
 
== The repository ==
Line 35: Line 35:
  
 
== Methodology ==
 
== Methodology ==
NADIA is open to all languages except English, for which the work is expected to be already finished. As in any NLization project, we expect users to provide the features for natural language lemmas that were linked to UWs. This process must take the following into consideration:
+
NADIA is open to all languages except English, for which the work is expected to be already finished. As a derivative project, NADIA depends on the results of MIR: a NADIA level opens for a language only when that language reaches at least 90% of the corresponding level in MIR<ref>For instance, NADIA-A1 for French is open only after French achieves 90% in MIR-A1.</ref>. As in any NLization project, users are expected to provide the features for the natural language lemmas that were linked to UWs.
*A UW always represents an '''open-class category''' (noun, adjective, adverb or verb). Prepositions, conjunctions, articles, interjections, etc. are not mapped to UWs, but must be included (and treated) in the NL-UNL Dictionary. On the other hand, all nouns, adjectives, adverbs and verbs must be associated with at least one UW. If the UW does not exist yet, it should be proposed for incorporation into the UNL Dictionary.
+
 
*There should be as many lemmas as there are different '''morphological behaviors''' (part of speech, gender, number, inflections, etc.). The word "book", in English, should correspond to two lemmas: "book" as a noun, and "book" as a verb. The noun "livre", in French, should correspond to two lemmas: "livre" as a masculine noun (="book"), and "livre" as a feminine noun (="pound"). The verb "haver", in Portuguese, should correspond to two lemmas: "haver" (auxiliary verb inflected in all verb forms) and "haver" (main verb inflected only in the 3rd person, i.e., defective).
+
*The same lemma may be associated with '''more than one UW''', i.e., lemmas should not be multiplied according to their semantic value (but only according to their morphological behavior). The noun "book", in English, should correspond to one single lemma despite its several possible meanings, all of which must be associated with the same entry.
+
  
 
== Notes ==
 
== Notes ==
 
<references />
 
<references />

Revision as of 11:45, 7 August 2013

The project NADIA (NAtural language DIctionary for UNL-NL mAppings) is devoted to the creation of NL dictionaries for entries derived from generation dictionaries.

Goal

The project NADIA has two main goals:

  1. To review UNL-NL mappings provided in generation dictionary projects (such as MIR).
  2. To provide morphological information (such as gender, number, inflectional paradigm, etc.) and syntactic information (transitivity, subcategorization frame, etc.) for natural language entries derived from generation dictionaries.
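
As a rough illustration of the second goal, the record below sketches the kind of morphological and syntactic information an entry might carry. The class name, the field names, the list of "required" features and the UW placeholder are all assumptions made for this example; they are not the format actually used by NADIA or by the UNL dictionaries.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LemmaEntry:
    """Hypothetical record for one natural language lemma (not NADIA's actual schema)."""
    lemma: str
    pos: str                              # part of speech, e.g. "noun" or "verb"
    gender: Optional[str] = None          # morphological information
    number: Optional[str] = None
    paradigm: Optional[str] = None        # label of an inflectional paradigm
    transitivity: Optional[str] = None    # syntactic information
    subcat_frame: Optional[str] = None    # subcategorization frame
    uws: List[str] = field(default_factory=list)  # placeholder labels for linked UWs


def missing_features(entry: LemmaEntry) -> List[str]:
    """List the features that this sketch assumes are required and are still unset."""
    # Assumed requirement lists, chosen only for illustration.
    required = {
        "noun": ["gender", "number", "paradigm"],
        "verb": ["paradigm", "transitivity", "subcat_frame"],
    }
    return [name for name in required.get(entry.pos, []) if getattr(entry, name) is None]


# A French noun entry with only the gender filled in so far.
livre = LemmaEntry(lemma="livre", pos="noun", gender="masculine", uws=["<UW for 'book'>"])
print(missing_features(livre))
```

A helper like `missing_features` could point a contributor at the information an entry still lacks.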

The repository

NADIA is language dependent. Every language has its own set of entries to be addressed. The list of entries is derived from generation dictionaries.

Structure

NADIA is divided into 6 different subprojects according to the source of the entries.

Subproject   Source
NADIA-A1     MIR-A1
NADIA-A2     MIR-A2
NADIA-B1     MIR-B1
NADIA-B2     MIR-B2
NADIA-C1     MIR-C1
NADIA-C2     MIR-C2

Methodology

NADIA is open to all languages except English, for which the work is expected to be already finished. As a derivative project, NADIA depends on the results of MIR: a NADIA level opens for a language only when that language reaches at least 90% of the corresponding level in MIR[1]. As in any NLization project, users are expected to provide the features for the natural language lemmas that were linked to UWs.
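
The opening rule can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `is_open` and the shape of the `mir_progress` mapping are hypothetical, while the subproject pairing and the 90% threshold come from the text above.

```python
# Pairing of NADIA subprojects with their MIR sources, as listed under "Structure".
NADIA_SOURCES = {
    "NADIA-A1": "MIR-A1",
    "NADIA-A2": "MIR-A2",
    "NADIA-B1": "MIR-B1",
    "NADIA-B2": "MIR-B2",
    "NADIA-C1": "MIR-C1",
    "NADIA-C2": "MIR-C2",
}

OPENING_THRESHOLD = 0.90  # "at least 90% of the corresponding level in MIR"


def is_open(language, subproject, mir_progress):
    """Return True if the NADIA subproject is open for the given language.

    mir_progress is a hypothetical mapping {(language, mir_level): fraction completed}.
    """
    if language == "English":
        return False  # English is excluded: it is expected to be already finished
    source = NADIA_SOURCES[subproject]
    # A language with no recorded MIR progress is treated as 0% complete.
    return mir_progress.get((language, source), 0.0) >= OPENING_THRESHOLD


# Made-up progress figures, for illustration only.
progress = {("French", "MIR-A1"): 0.92, ("French", "MIR-A2"): 0.40}
print(is_open("French", "NADIA-A1", progress))
print(is_open("French", "NADIA-A2", progress))
```

With these made-up figures, French would be open for NADIA-A1 but not yet for NADIA-A2.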


Notes

  1. For instance, NADIA-A1 for French is open only after French achieves 90% in MIR-A1.
Software