FoR-UNL

From UNL Wiki
Revision as of 20:34, 17 September 2012 by Martins (Talk | contribs)

FRAU (FRAmework of reference for UNL) is a guideline for describing how far a natural language has progressed in relation to UNL. Inspired by the Common European Framework of Reference for Languages (CEFR), its main goal is to provide a method for assessing the availability and quality of natural language resources within the UNL framework.

Reference Levels

FRAU divides languages into three broad divisions, each subdivided into two levels, for six levels in total:

  • A - Basic Level
    • A1 - Breakthrough or beginner
    • A2 - Waystage or elementary
  • B - Intermediate Level
    • B1 - Threshold or intermediate
    • B2 - Vantage or upper intermediate
  • C - Advanced Level
    • C1 - Effective Operational
    • C2 - Mastery

Descriptors

The descriptors below specify what a language must provide to be classified at each level:

Level  UNL-NL Dictionary  NL-UNL Dictionary  UNL-NL Grammar  NL-UNL Grammar
       (entries)          (entries)          (sentences)     (sentences)
A1     2,000              2,000              RC-A1           500
A2     5,000              5,000              RC-A2           1,000
B1     10,000             10,000             RC-B1           2,000
B2     20,000             20,000             RC-B2           3,000
C1     35,000             35,000             RC-C1           5,000
C2     50,000             50,000             RC-C2           8,000

Where:

  • The numbers are cumulative. The entries and sentences required for level C2 include those of the lower levels (i.e., to go from A1 to A2, a language must provide 3,000 new entries for each dictionary and address 500 new sentences in each direction).
  • UNL-NL Dictionary is the number of UWs addressed in the UNL-NL dictionary, in order of frequency of use. For instance, to achieve level A1, a language must have addressed the 2,000 most frequent UWs of the UNL Dictionary (i.e., it should have completed MIR A1).
  • NL-UNL Dictionary is the number of natural language lemmas addressed in the NL-UNL dictionary, in order of frequency of use. For instance, to achieve level A1, a language must have addressed its 2,000 most frequent lemmas (i.e., it should have completed BRUNO A1).
  • UNL-NL Grammar is the UNL Reference Corpus that the language should be able to generate. For instance, to achieve level A1, a language must have succeeded in generating the sentences of the corpus RC-A1 with EUGENE.
  • NL-UNL Grammar is the number of sentences of the NL Reference Corpus that the language is able to analyze. For instance, to achieve level A1, a language must have succeeded in analyzing, with IAN, sentences representing the 500 most frequent syntactic constructions.
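
The cumulative thresholds can be made concrete with a short sketch. The Python below is only an illustration, not part of FRAU or of any UNL tool: the names Requirement, LEVELS, increment and highest_level are hypothetical, and it assumes that a level is reached only when all four descriptors are satisfied for that level and for every lower one.

```python
from typing import NamedTuple, Optional, Set, Tuple


class Requirement(NamedTuple):
    unl_nl_entries: int      # UWs addressed in the UNL-NL dictionary
    nl_unl_entries: int      # lemmas addressed in the NL-UNL dictionary
    generation_corpus: str   # reference corpus to generate (UNL-NL grammar)
    analysis_sentences: int  # sentences analyzed (NL-UNL grammar)


# Cumulative thresholds copied from the descriptor table.
LEVELS = {
    "A1": Requirement(2_000, 2_000, "RC-A1", 500),
    "A2": Requirement(5_000, 5_000, "RC-A2", 1_000),
    "B1": Requirement(10_000, 10_000, "RC-B1", 2_000),
    "B2": Requirement(20_000, 20_000, "RC-B2", 3_000),
    "C1": Requirement(35_000, 35_000, "RC-C1", 5_000),
    "C2": Requirement(50_000, 50_000, "RC-C2", 8_000),
}


def increment(lower: str, upper: str) -> Tuple[int, int, int]:
    """New UNL-NL entries, NL-UNL entries and analysis sentences needed
    to move from `lower` to `upper`."""
    lo, hi = LEVELS[lower], LEVELS[upper]
    return (
        hi.unl_nl_entries - lo.unl_nl_entries,
        hi.nl_unl_entries - lo.nl_unl_entries,
        hi.analysis_sentences - lo.analysis_sentences,
    )


def highest_level(
    unl_nl_entries: int,
    nl_unl_entries: int,
    corpora_generated: Set[str],
    analysis_sentences: int,
) -> Optional[str]:
    """Highest level whose requirements, and those of all lower levels,
    are met; None if even A1 is not reached.
    Assumption: all four descriptors must hold at once."""
    reached = None
    for name, req in LEVELS.items():  # dicts keep insertion order (A1..C2)
        if (
            unl_nl_entries >= req.unl_nl_entries
            and nl_unl_entries >= req.nl_unl_entries
            and req.generation_corpus in corpora_generated
            and analysis_sentences >= req.analysis_sentences
        ):
            reached = name
        else:
            break
    return reached
```

For example, increment("A1", "A2") yields the 3,000 new entries per dictionary and the 500 new sentences mentioned above, and a language with 5,000 UWs, 6,000 lemmas, the corpora RC-A1 and RC-A2 generated and 1,200 sentences analyzed would be placed at A2 under these assumptions.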

Assessment

A dictionary entry is considered valid if it has been verified by at least one editor.
A grammar is considered valid if its F-measure for the reference corpus is equal to or higher than 0.8.
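
The grammar criterion can be sketched as a simple check. The page does not say how the F-measure is computed, so the Python below assumes the common balanced form (F1, the harmonic mean of precision and recall). The function names and the way precision and recall would be obtained are hypothetical.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.
    Assumption: FRAU does not specify the variant; F1 is used here."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when nothing is correct
    return 2 * precision * recall / (precision + recall)


def grammar_is_valid(precision: float, recall: float,
                     threshold: float = 0.8) -> bool:
    """True if the F-measure reaches the 0.8 threshold stated above."""
    return f_measure(precision, recall) >= threshold
```

Under this assumption, a grammar with precision 0.9 and recall 0.75 on the reference corpus would be accepted (F-measure of about 0.82), while one with precision 0.9 and recall 0.6 would not (F-measure of 0.72).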

Software