Subcategorization rules

From UNL Wiki
Revision as of 12:22, 18 September 2009

In the UNL framework, Subcategorization Rules are rules for generating multi-word expressions out of lemmas. They can also be used for creating exceptions to subcategorization frames.

When to use subcategorization rules

Subcategorization rules must be used in four cases:

  • Separable words (such as "take into account");
  • Inflectional compounds (such as "part-of-speech");
  • Inflectional multi-word expressions (such as "give in"); and
  • Exceptions to the subcategorization frames.

Consider, for instance, "take into account", which is both inflectional and separable, and which is the English expression corresponding to the UW "take into account(agt>thing, obj>thing)". In ordinary English dictionaries, "take into account" is not an independent entry but a sub-entry of the verb "take". The verb "take" should therefore be treated as the lemma, so that the inflectional paradigm already created for all instances of "take" ("takes", "taking", "took", "taken") can be reused. However, because the verb is actually "take into account" and not just "take", a subcategorization rule with the following format should be created:

VA(>"into account");

which means that the string "into account" is an adjunct of the verb (VA) and should be generated after it (">"). Note that the alternative, which is to represent the whole string "take into account" as the lemma and to provide a dedicated inflectional paradigm ("takes into account", "taking into account", "took into account", "taken into account"), is not only much more costly but also insufficient, because it does not allow any string to come between the verb and the particle (as in "take something into account").
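The reasoning above can be illustrated with a minimal Python sketch. It is not part of the UNL tooling: the `TAKE_FORMS` table, the form labels and the `generate` function are hypothetical names introduced only for illustration, and the sketch simply joins words with single spaces rather than modelling the spacing conventions of the rule notation.

```python
# Hypothetical inflection table for the lemma "take". In practice the forms
# would come from the inflectional paradigm stored in the dictionary.
TAKE_FORMS = {
    "base": "take",
    "3sg": "takes",
    "gerund": "taking",
    "past": "took",
    "participle": "taken",
}


def generate(forms, form, adjunct, intervening=""):
    """Inflect the lemma, then place the adjunct string to its right.

    `intervening` is optional material (e.g. an object) that may come
    between the verb and the adjunct, which a single-string lemma
    could not accommodate.
    """
    parts = [forms[form]]
    if intervening:
        parts.append(intervening)
    parts.append(adjunct)
    return " ".join(parts)


print(generate(TAKE_FORMS, "past", "into account"))
print(generate(TAKE_FORMS, "3sg", "into account", intervening="something"))
```

Because the lemma is only "take", the same five forms serve every expression built on that verb, and the adjunct is attached afterwards.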

When not to use subcategorization rules

Subcategorization rules should be avoided for expressions that are neither inflectional nor separable, such as many English prepositional phrases ("in accordance with"), adverbial phrases ("once upon a time") and conjunctional phrases ("on the contrary"). In those cases, there is no need to split the expression at the level of the lemma.

Subcategorization rules should also be avoided for inflectional compounds and multiword expressions that do not pose any problem for generating inflected forms. The plural of the English noun phrase "acid rain" or of the compound "skinhead", for instance, can be formed by simply adding an "s" to the end of the string, as with many single-word entries, so there is no need to create a subcategorization rule for those cases.

Note that, if a multiword expression is not represented by a subcategorization rule, the lemma should be the multiword expression itself.
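The criteria from the two preceding sections can be summarized as a small decision helper. This is only a sketch of the guidance in the text, not an official procedure: the function name and its three boolean parameters are assumptions introduced here, and deciding their values for a given expression remains a linguistic judgment.

```python
def needs_subcategorization_rule(separable, inflects_inside, frame_exception=False):
    """Return True if the text's criteria call for a subcategorization rule.

    separable       -- other material may come between the parts
                       (e.g. "take something into account")
    inflects_inside -- inflection applies to a word that is not at the end
                       of the string (e.g. "gives in"), so the whole string
                       cannot be inflected like a single word
    frame_exception -- the entry is an exception to a subcategorization frame
    """
    return separable or inflects_inside or frame_exception


# "take into account": separable and inflected on the first word
print(needs_subcategorization_rule(separable=True, inflects_inside=True))
# "in accordance with": neither inflectional nor separable
print(needs_subcategorization_rule(separable=False, inflects_inside=False))
# "acid rain": the plural is formed at the end of the string
print(needs_subcategorization_rule(separable=False, inflects_inside=False))
```

When the helper returns False, the lemma is the whole multiword expression.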

Syntax of subcategorization rules

Subcategorization rules must comply with the S-Rule formalism for representing syntactic information in the UNLarium framework.

Templates

| Action | Rule | Example (English) | Example (Rule) | Gloss |
| Add X to the head | <CATEGORY>P(<DIRECTION><ADDED><FEATURES>); | give in (lemma="give") | VP(>>[in],PRE); | "Give in" is a phrasal verb, i.e., "in" must come immediately after the verb |
| Add X as a complement to the head | <CATEGORY>C(<DIRECTION><ADDED><FEATURES>); | make a mistake (lemma="make") | VC(>>[mistake],NOU); | "mistake" is the complement of the verb and may be modified ("to make a terrible mistake") |
| Add X as an adjunct to the head | <CATEGORY>A(<DIRECTION><ADDED><FEATURES>); | go on foot (lemma="go") | VA(>>[on foot],PP); | "on foot" is an adjunct to the verb |
| Add X as a specifier to the head | <CATEGORY>S(<DIRECTION><ADDED><FEATURES>); | Le Caire (lemma="Caire") | NS(<<[le],ART); | |

Where:

<CATEGORY> should be replaced by the lexical category (N, V, P, A, etc.).

<DIRECTION> should be replaced by the direction of the insertion, as follows:

  • > (insert to the right, no blank space)
  • >> (insert to the right, after a blank space)
  • < (insert to the left, no blank space)
  • << (insert to the left, separated by a blank space)

<ADDED> should be replaced by the item to be inserted, as follows:

  • "string" (strings must come between double quotation marks)
  • [word] (words must come between square brackets)

<FEATURES> is optional and should be replaced by a list of features of the item to be inserted (according to the UNL Dictionary Tagset). Tags must be separated by commas.
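The template can be sketched as a small Python helper that assembles rule strings. This is an illustration under stated assumptions, not part of the UNLarium software: `build_rule` and its parameters are hypothetical names, the role letters (P, C, A, S) are inferred from the example rules in the table (VP, VC, VA, NS), and features are appended with a leading comma because that is how they appear in those examples.

```python
DIRECTIONS = {">", ">>", "<", "<<"}
ROLES = {"P", "C", "A", "S"}  # assumed from the examples VP, VC, VA, NS


def build_rule(category, role, direction, item, is_string=False, features=()):
    """Assemble a subcategorization rule string from the template parts.

    category  -- lexical category letter, e.g. "V" or "N"
    role      -- second letter of the rule name, as seen in the examples
    direction -- one of ">", ">>", "<", "<<"
    item      -- the word or string to insert
    is_string -- True to write the item as "string", False for [word]
    features  -- optional tags for the inserted item
    """
    if role not in ROLES:
        raise ValueError(f"unknown role: {role!r}")
    if direction not in DIRECTIONS:
        raise ValueError(f"unknown direction: {direction!r}")
    added = f'"{item}"' if is_string else f"[{item}]"
    tags = "".join("," + tag for tag in features)
    return f"{category}{role}({direction}{added}{tags});"


print(build_rule("V", "P", ">>", "in", features=["PRE"]))
print(build_rule("N", "S", "<<", "le", features=["ART"]))
print(build_rule("V", "A", ">", "into account", is_string=True))
```

The three calls reproduce rules that appear in this page: the "give in" and "Le Caire" examples from the table and the "take into account" rule discussed earlier.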

Observations
