Lexical structure

From UNL Wiki

Revision as of 14:53, 11 July 2013

Lexical structure is a category that indicates the internal structure of a lexical item.

In the UNLarium framework, word forms fall into three types, depending on their internal structure:

Subwords

Subwords (SBW) are structures that have no independent existence in the language and only appear together with other morphemes to form a lexeme. Subwords include affixes (such as "un-", "re-", "-ful", "-ness") and roots that do not occur alone (such as "rupt" in "interrupt", "disrupt", "corrupt", "rupture", etc.).

Simple words

Simple Words (WRD) are the smallest (indivisible) lexemes in the semantic system of a language. They may consist of:

  • one single free morpheme (such as "happy", "break");
  • one single free morpheme and bound morphemes ("unhappy", "happiness", "happily", "unbreakable", "unbreakableness"); and
  • compounds of bound morphemes (such as "interrupt", "disrupt", "corrupt").

Simple words may also include abbreviations (such as "ad" for "advertisement", "dr." and "St.").

In some languages, a given inflection may have more than one form. The feature ALT must be used to mark an alternative form.
In English, for instance, the word 'volcano' may have two different plural forms:

  • PLR:=volcanos;
  • PLR&ALT:=volcanoes;

When there is more than one alternative form, the numbered features ALT1, ALT2 and ALT3 must be used instead of ALT.
For instance, in Arabic the word 'elephant' has three plural forms, as indicated below:

  • PLR:=فِيَلة;
  • PLR&ALT1:=فُيُول;
  • PLR&ALT2:=أفْيال;
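The ALT convention above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `inflection_entries` is hypothetical and not part of UNLarium, and the sketch assumes that the first form in the list is the primary one and all others are alternatives.

```python
def inflection_entries(feature, forms):
    """Build dictionary lines for one inflection feature (hypothetical helper).

    forms[0] is taken as the primary form; the remaining forms are alternatives.
    One alternative is tagged ALT; several are tagged ALT1, ALT2, ALT3.
    """
    if not forms:
        raise ValueError("at least one form is required")
    if len(forms) > 4:
        # The text names only ALT1, ALT2 and ALT3, so stop at three alternatives.
        raise ValueError("at most three alternative forms are supported")

    entries = [f"{feature}:={forms[0]};"]
    alternatives = forms[1:]
    if len(alternatives) == 1:
        entries.append(f"{feature}&ALT:={alternatives[0]};")
    else:
        for index, form in enumerate(alternatives, start=1):
            entries.append(f"{feature}&ALT{index}:={form};")
    return entries


# English 'volcano': one alternative plural, so plain ALT is used.
print(inflection_entries("PLR", ["volcanos", "volcanoes"]))
# → ['PLR:=volcanos;', 'PLR&ALT:=volcanoes;']
```

Passing three forms, as in the Arabic example, would yield one untagged line followed by ALT1 and ALT2 lines.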

Multiword expressions

Multiword Expressions (MTW) are lexical structures made up of a sequence of two or more lexemes. Their parts can be concatenated ("darkroom", "skinhead") or separated by hyphens ("blue-green", "African-American") or blank spaces ("round table", "part of speech"). Multiword expressions can be continuous ("get over") or discontinuous ("get <something> together"). They correspond to compounds ("fireman", "hardware"), phrases ("in spite of", "take into account"), idioms ("kick the bucket", "play cat and mouse"), fragments of sentences ("and so on", "whatever the case") or sentences ("Every evil is followed by some good", "No flies enter a mouth that is shut").
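The difference between continuous and discontinuous expressions can be illustrated with a small matcher. This is a rough sketch, not how UNLarium handles them: the name `contains_mtw`, the `SLOT` constant, and the assumption that the slot stands for one or more tokens are all introduced here for illustration. Matching is by exact token, with no inflection handling.

```python
SLOT = "<something>"  # placeholder for the gap in a discontinuous expression


def contains_mtw(pattern, tokens):
    """Return True if the token list contains the multiword pattern.

    pattern is a list of words; SLOT matches one or more intervening tokens.
    """
    def match_from(p, t):
        if p == len(pattern):
            return True   # every pattern element has been matched
        if t >= len(tokens):
            return False  # ran out of tokens before the pattern ended
        if pattern[p] == SLOT:
            # Let the slot consume one or more tokens, then match the rest.
            return any(match_from(p + 1, k)
                       for k in range(t + 1, len(tokens) + 1))
        return tokens[t] == pattern[p] and match_from(p + 1, t + 1)

    return any(match_from(0, start) for start in range(len(tokens)))


# Continuous: the parts are adjacent.
print(contains_mtw(["get", "over"], "she will get over it".split()))
# Discontinuous: other material sits between the parts.
print(contains_mtw(["get", SLOT, "together"],
                   "they get the band together".split()))
```

Both calls print `True`; the second pattern would not match "get together", because the slot is assumed to require at least one token.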

Multiword expressions may also include acronyms (such as "UNESCO"), multiple-word contractions (such as "don't") and blends (such as "sitcom") that are still analysable (unlike "radar" and "motel", which are represented as simple words).

Classical compounds ("agriculture", "photograph") and their derivations ("agricultural", "photographically") are to be treated as simple words if they do not include more than one free morpheme. Phrasal verbs ("give in", "come across") are treated as multiword expressions.
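Taken together, the three categories can be read as a rule over the free and bound morphemes of an item. The sketch below encodes one such reading; it is an assumption, not an official UNLarium procedure. The function name `lexical_structure` is hypothetical, the morpheme segmentation is supplied by the caller, and abbreviations, acronyms, contractions and blends are not covered.

```python
def lexical_structure(morphemes):
    """Classify an item as 'SBW', 'WRD' or 'MTW' (illustrative heuristic).

    morphemes is a list of (form, is_free) pairs, already segmented by hand.
    """
    if not morphemes:
        raise ValueError("at least one morpheme is required")

    free_count = sum(1 for _, is_free in morphemes if is_free)
    if free_count > 1:
        return "MTW"  # more than one free morpheme: compound or phrase
    if free_count == 0 and len(morphemes) == 1:
        return "SBW"  # a lone affix or bound root
    return "WRD"      # one free morpheme (plus affixes), or bound morphemes only


print(lexical_structure([("un-", False)]))                       # SBW
print(lexical_structure([("un-", False), ("happy", True)]))      # WRD
print(lexical_structure([("inter", False), ("rupt", False)]))    # WRD
print(lexical_structure([("fire", True), ("man", True)]))        # MTW
```

Under this reading, a classical compound such as "agriculture" stays a simple word as long as its segmentation contains at most one free morpheme.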
