CORNELIA
- Open-Class Word List (3,000 word forms)
- Corpus NC-A1
- Original corpus: 5-10 original articles from Wikipedia about culture-specific subjects (minimum of 5,000 words), in separate files, in plain text format with UTF-8 encoding
- List of at least 1,000 noun phrases appearing in the corpus with the following characteristics:
- the length of the NP must be equal to or greater than 2 words (one-word NP's must be excluded). Not accepted: "Geneva"
- NP's must not contain foreign words. Not accepted: "the city of Genève" (note that "the city of Geneva" is OK)
- NP's must be continuous (there cannot be any extra content, e.g., parentheses, inside the NP). Not accepted: "the second most populous city in Switzerland (after Zurich)" (note that the NP will be "the second most populous city in Switzerland")
- NP's must not contain verbs, even when used as nouns, adjectives or adverbs. Not accepted: "French-speaking part of Switzerland"; "numerous international organizations, including the headquarters of many of the agencies of the United Nations and the Red Cross" (in the latter case, there will be 2 NP's: "numerous international organizations" and "the headquarters... Red Cross")
- NP's must be original (no change should be made to the original text from Wikipedia)
- NP's must ignore nesting (only the longest NP must be considered): "the headquarters of many of the agencies of the United Nations and the Red Cross" must be treated as a single NP (the inner NP's, such as "the agencies of the United Nations and the Red Cross", must not be extracted from the longer NP)
- NP's must be unique (repetitions must be ignored)
- NP's must be provided one per line in a plain text file, with UTF-8 encoding.
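The mechanical constraints in the list above (minimum length, continuity, uniqueness, nesting, and the one-per-line UTF-8 output) can be checked with a small script before submission. The following is only a minimal sketch: the function names `clean_candidates` and `write_np_file` are hypothetical, the nesting check is a plain substring approximation, and the constraints on foreign words and verbs are not covered because they need linguistic judgment.

```python
from pathlib import Path


def clean_candidates(candidates):
    """Filter candidate NPs using only the mechanical rules from the task list."""
    kept = []
    seen = set()
    for phrase in candidates:
        phrase = phrase.strip()
        # Length: one-word NPs are excluded.
        if len(phrase.split()) < 2:
            continue
        # Continuity: no parenthetical material inside the NP.
        if "(" in phrase or ")" in phrase:
            continue
        # Uniqueness: repetitions are ignored.
        if phrase in seen:
            continue
        seen.add(phrase)
        kept.append(phrase)

    # Nesting (approximation): drop an NP that occurs, on word boundaries,
    # inside a longer NP that was also kept.
    def is_nested(phrase):
        return any(
            phrase != other and f" {phrase} " in f" {other} "
            for other in kept
        )

    return [phrase for phrase in kept if not is_nested(phrase)]


def write_np_file(phrases, path):
    """Write one NP per line, as plain text with UTF-8 encoding."""
    Path(path).write_text("\n".join(phrases) + "\n", encoding="utf-8")
```

Because the nesting check works on strings rather than on positions in the corpus, it would also drop an NP that appears on its own elsewhere in the text; a manual review of the dropped items is still advisable.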
Completing the post-workshop tasks is not mandatory, but intermediate-level workshops will only accept candidates who have finished all A1 activities described in FoR-UNL.
FOLLOW-UP
The following projects will open upon completion of the post-workshop tasks:
- BRUNO-A1 (open only for languages where the number of subcategorization frames (all languages) > 15 and the number of paradigms (inflectional languages) > 15): 2,000 entries (around 4,000 UNLdots)
- NC-A1: 1,000 entries (3,000 UNLdots)
ADDITIONAL MATERIAL
Open Class Word List
Extracted from the most frequent words in Wikipedia
Language | File |
---|---|
Arabic | ar_words.xls |
Armenian | hy_words.xls |
Bulgarian | bg_words.xls |
Chinese | zh_words.xls |
Kannada | kn_words.xls |
Khmer | km_words.xls |
Malay | ms_words.xls |
Punjabi | pa_words.xls |
Ukrainian | uk_words.xls |
NP Examples
original text | NP |
---|---|
Geneva is the second most populous city in Switzerland (after Zurich) and is the most populous city of Romandy, the French-speaking part of Switzerland. Situated where the Rhone exits Lake Geneva, it is the capital of the Republic and Canton of Geneva. The municipality (ville de Genève) has a population (as of March 2013) of 194,245, and the canton (République et Canton de Genève, which includes the city) has 472,530 residents. In 2007, the urban area, or agglomération franco-valdo-genevoise (Great Geneva or Grand Genève in French) had 1,240,000 inhabitants in 189 municipalities in both Switzerland and France. | the second most populous city in Switzerland |
SSS Examples
sentence | SSS |
---|---|
book | NH(book) |
the book | NS(book;the) |
beautiful book | NA(book;beautiful) |
book of John | NA(book;:01) PC:01(of;John) |
the book of John | NS(book;the) NA(book;:01) PC:01(of;John) |
the beautiful book of John | NS(book;the) NA(book;beautiful) NA(book;:01) PC:01(of;John) |
the book of Math of John | NS(book;the) NA(book;:01) PC:01(of;Math) NA(book;:02) PC:02(of;John) |
the book about the construction of Babel | NS(book;the) NA(book;:01) PC:01(about;:02) NS:02(construction;the) NA:02(construction;:03) PC:03(of;Babel) |
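
The SSS column above follows a regular pattern: a label, an optional scope index after a colon, and one or two arguments in parentheses, where an argument such as `:01` points to another scope. A minimal sketch of reading that notation into records might look as follows. The names `Relation` and `parse_sss` are hypothetical, the neutral field names `first` and `second` are an assumption, and the pattern is inferred only from the examples in the table, not from a formal grammar.

```python
import re
from typing import NamedTuple, Optional


class Relation(NamedTuple):
    label: str             # e.g. "NS", "NA", "PC", "NH"
    scope: Optional[str]   # e.g. "01" in "PC:01(...)", or None
    first: str             # first argument
    second: Optional[str]  # second argument, absent in "NH(book)"


# Label, optional ":NN" scope, then one or two ";"-separated arguments.
SSS_PATTERN = re.compile(r"([A-Z]+)(?::(\d+))?\(([^;()]*)(?:;([^;()]*))?\)")


def parse_sss(text):
    """Return the relations found in an SSS string, in order of appearance."""
    return [Relation(*match.groups()) for match in SSS_PATTERN.finditer(text)]
```

For instance, `parse_sss("NS(book;the) NA(book;:01) PC:01(of;John)")` yields three records, the last of which carries the scope `"01"` that the second record refers to as `":01"`.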
UNL Simplified Examples
sentence | UNL |
---|---|
book | book |
the book | book.@def |
beautiful book | mod(book;beautiful) |
book of John | pos(book;John) |
the book of John | pos(book.@def;John) |
the beautiful book of John | mod(book.@def;beautiful) pos(book.@def;John) |
the book of Math of John | cnt(book.@def;Math) pos(book.@def;John) |
the book about the construction of Babel | cnt(book.@def;:01) obj(construction.@def;Babel) |
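
The simplified UNL strings above combine binary relations such as `pos(...)` with attributes attached to a node by `.@`, as in `book.@def`. As a rough illustration, the sketch below splits such a string into relations and separates each node from its attributes. The names `split_node` and `parse_unl` are hypothetical, and the pattern covers only the forms shown in the table.

```python
import re

# A lowercase relation label followed by two ";"-separated nodes.
UNL_RELATION = re.compile(r"([a-z]+)\(([^;()]+);([^;()]+)\)")


def split_node(node):
    """Split 'book.@def' into ('book', ['def']); plain nodes get an empty list."""
    word, *attributes = node.split(".@")
    return word, attributes


def parse_unl(text):
    """Return (label, source, target) triples with attributes separated out."""
    return [
        (label, split_node(source), split_node(target))
        for label, source, target in UNL_RELATION.findall(text)
    ]
```

A string with no relation, such as `book.@def`, produces an empty list from `parse_unl`; its attributes can still be read with `split_node` directly.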