Subcategorization rules

From UNL Wiki
(Difference between revisions)
(Templates)
m (When not to use subcategorization frames)
 
(24 intermediate revisions by 3 users not shown)
In the UNL framework, '''Subcategorization Rules''' are rules for generating multi-word expressions out of lemmas. They can also be used for creating exceptions to [[subcategorization frames]].  
+
'''Subcategorization rules''' are sets of rules used to generate particular syntactic structures out of the [[base form]].  
  
== When to use subcategorization rules ==
+
== What are subcategorization rules ==
  
Subcategorization rules must be used in four cases:
+
Subcategorization rules are rules for describing the necessary constituents for a form to project its corresponding maximal projection.
*Separable words (such as "take into account");
+
*Inflectional compounds (such as "part-of-speech");
+
*Inflectional multi-word expressions (such as "give in"); and
+
*Exceptions to the subcategorization frames.
+
  
Consider, for instance, "take into account", which is both inflectional and separable, and which is the English word corresponding to the UW "take into account(agt>thing, obj>thing)". In ordinary English dictionaries, "take into account" is not an independent entry but a sub-entry of the verb "take". The verb "take" should therefore be treated as the [[lemma]], so that the inflectional paradigm already created for all instances of "take" ("takes", "taking", "took", "taken") can be reused. However, because the verb is actually "take into account" and not just "take", a subcategorization rule with the following format is needed:
+
== When to use subcategorization rules ==
  
VA(>"into account");
+
Subcategorization rules are used in case of '''valent''' words that have an '''irregular syntactic behaviour'''.
 
+
which means that the string "into account" is an adjunct of the verb (VA), after which (">") it should be generated. The alternative, which is to represent the whole string "take into account" as the lemma and to provide a very specific inflectional paradigm ("takes into account", "taking into account", "took into account", "taken into account"), is not only much more expensive but also insufficient, because it does not allow any string to come between the verb and the particle (as in "take something into account").
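The argument above can be illustrated with a minimal Python sketch. It is not part of the UNL tooling: the paradigm table, the form labels ("3sg", "past", etc.) and the function name `realize` are hypothetical and chosen only for illustration. The sketch shows that keeping "take" as the lemma lets the existing paradigm be reused while the adjunct "into account" is generated after the verb, leaving room for an intervening object.

```python
# Hypothetical inflectional paradigm for the lemma "take" (illustrative only;
# the form labels are assumptions, not UNL tags).
TAKE_FORMS = {
    "base": "take",
    "3sg": "takes",
    "gerund": "taking",
    "past": "took",
    "participle": "taken",
}


def realize(forms, form, adjunct, obj=None):
    """Generate the inflected verb, an optional object, then the adjunct.

    Mirrors the idea of VA(>"into account"): the adjunct is generated to the
    right of the verb, and other material may come in between.
    """
    parts = [forms[form]]
    if obj is not None:
        parts.append(obj)
    parts.append(adjunct)
    return " ".join(parts)


print(realize(TAKE_FORMS, "past", "into account"))
print(realize(TAKE_FORMS, "base", "into account", obj="something"))
```

Had the whole string "take into account" been stored as the lemma, the second call would have no place to put "something".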
+
  
 
== When not to use subcategorization rules ==
 
  
Subcategorization rules should be avoided in case of expressions that are not inflectional or separable, such as many English prepositional phrases ("in accordance with"), adverbial phrases ("once upon a time") and conjunctional phrases ("on the contrary"). In those cases, there is no need for splitting the expression at the level of lemma.
+
Subcategorization rules are not used in case of '''avalent words''' or in case of valent words that have a regular syntactic behaviour (i.e., which may be described by [[subcategorization frames]]).
  
Subcategorization rules should also be avoided in case of inflectional compounds and multiword expressions that do not pose any problem to generating inflected forms. The plural of the English noun phrase "acid rain" or of the compound "skinhead", for instance, can be formed by simply adding an "s" to the string, as with many other single-word entries, so there is no actual need to create a subcategorization rule for those cases.
+
== Syntax ==
  
It should be noticed that, if the multiword expression is not to be represented by a subcategorization rule, the lemma should be the multiword expression itself.
+
Subcategorization rules are expressed by [[S-rule]]s, a special formalism for representing the syntactic structure of phrases.
 +
 +
<SYNTACTIC ROLE>(<REQUIRED>);
  
== Syntax of subcategorization rules ==
+
Where:<br/>
 +
<SYNTACTIC ROLE> is the [[Syntactic roles|syntactic role]] (VA, VC, VS, VH, etc.) of the term required by the base form; and<br />
 +
<REQUIRED> is the term required by the base form to saturate its syntactic structure, to be expressed as:
 +
*the maximal projection (NP, VP, JP, AP, PP, DP), in case of general phrases, or a specific head, in case of particular cases;
 +
*the order, if not default;
 +
*the adjacency, if not default;
 +
*other features, when pertinent.
 +
The head is represented between "quotes", if a string, or between [brackets], if a lemma.
  
Subcategorization rules must comply with the '''[[S-Rule]]''' formalism for representing syntactic information in the UNLarium framework.
+
== Examples ==
  
== Templates ==
+
{| border="1" cellpadding="5"
 
+
!Rules
{| border="1" cellpadding="5" align="center"
+
!Description
!Action
+
!Examples
!Template
+
!Example (English)
+
!Example (Rule)
+
 
|-
 
|-
|Add X to the head
+
|VS(NP)VC(NP)VA(PP("into account"));
|<CATEGORY>'''P'''(<DIRECTION><ADDED><FEATURES>);
+
|The verbal phrase requires three arguments: a specifier (NP), a complement (NP) and an adjunct (the fixed PP "into account")
|give in (lemma="give")
+
|take into account
|VP(>[in],PRE);
+
 
|-
 
|-
|Add X as a '''complement''' to the head
+
|VS(NP)VC(NP,HUM)VA(PP("to the lions"));
|<CATEGORY>'''C'''(<DIRECTION><ADDED><FEATURES>);  
+
|The verbal phrase requires three arguments: a specifier (NP), a complement (NP with the feature HUM = human) and an adjunct (the fixed PP "to the lions")
|make a mistake (lemma="make")
+
|throw someone to the lions
|VC(>[mistake],NOU);
+
 
|-
 
|-
|Add X as an '''adjunct''' to the head
 
|<CATEGORY>'''A'''(<DIRECTION><ADDED><FEATURES>);
 
|go blind (lemma="go")
 
|VA(>[blind],ADJ);
 
|-
 
|Add X as a '''specifier''' to the head
 
|<CATEGORY>'''S'''(<DIRECTION><ADDED><FEATURES>);
 
|Le Caire (lemma="Caire")
 
|NS(<[le],ART);
 
 
|}
 
|}
 
Where
 
*<CATEGORY> is to be replaced by the lexical category (N, V, P, A, etc)
 
*<DIRECTION> is to be replaced by the direction of the insertion, as follows:
 
**>  (insert to the right)
 
**<nowiki><</nowiki>  (insert to the left)
 
**<nowiki><<</nowiki> (insert to the left, blank space)
 
*<ADDED> is to be replaced by the item to be inserted, as follows:
 
** "string" (strings must come between quotation marks)
 
** [word] (words must come between square brackets)
 
** a phrase (NP, VP, PP, etc), to be detailed in the same rule
 
*<FEATURES> is optional and is to be replaced by a list of features of the item to be inserted (according to the UNL Dictionary Tagset). Tags must come separated by commas.
 
 
== Observations ==
 
 
*The difference between '''strings''' and '''words''' has to do with their lexical status. Words are supposed to be included in the dictionary as independent entries, while strings are not.
 
*The structure of '''subordinate phrases''', if variable, must be detailed. 
 
** Word: "draw someone's attention to something"
 
** Lemma: "draw"
 
** Subcategorization rules: VC(>>NP([attention])),VA(>>PP([to]));
 
** Gloss: The complement of the verb is a noun phrase (NP) whose head is the word "attention" and the adjunct to the verb a prepositional phrase (PP) whose head is the word "to".
 
*The structure of '''subordinate phrases''', if fixed, must not be detailed. 
 
** Word: "go on foot"
 
** Lemma: "go"
 
** Subcategorization rules: VA(>>"on foot",PP);
 
** Gloss: The adjunct to the verb is the string "on foot" which plays the role of a prepositional phrase (PP).
 

Latest revision as of 16:16, 14 March 2014

Subcategorization rules are sets of rules used to generate particular syntactic structures out of the base form.


What are subcategorization rules

Subcategorization rules are rules for describing the necessary constituents for a form to project its corresponding maximal projection.

When to use subcategorization rules

Subcategorization rules are used in case of valent words that have an irregular syntactic behaviour.

When not to use subcategorization rules

Subcategorization rules are not used in case of avalent words or in case of valent words that have a regular syntactic behaviour (i.e., which may be described by subcategorization frames).

Syntax

Subcategorization rules are expressed by S-rules, a special formalism for representing the syntactic structure of phrases.

<SYNTACTIC ROLE>(<REQUIRED>);

Where:
<SYNTACTIC ROLE> is the syntactic role (VA, VC, VS, VH, etc.) of the term required by the base form; and
<REQUIRED> is the term required by the base form to saturate its syntactic structure, to be expressed as:

  • the maximal projection (NP, VP, JP, AP, PP, DP), in case of general phrases, or a specific head, in case of particular cases;
  • the order, if not default;
  • the adjacency, if not default;
  • other features, when pertinent.

The head is represented between "quotes", if a string, or between [brackets], if a lemma.
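
The notation described above can be read mechanically. The following Python sketch is only an illustration under stated assumptions and is not the S-rule implementation: the function names `split_rule` and `head_kind` are hypothetical, and the sketch assumes that each term has the shape ROLE(...), that parentheses are balanced, and that no parentheses occur inside quoted strings.

```python
def split_rule(rule):
    """Split a rule such as VS(NP)VC(NP)VA(PP("into account")); into
    (syntactic role, required term) pairs.

    Assumption: terms look like ROLE(...), optionally separated by commas.
    """
    text = rule.strip().rstrip(";").strip()
    terms = []
    i = 0
    while i < len(text):
        open_at = text.index("(", i)          # raises ValueError if no term follows
        role = text[i:open_at].strip(" ,")
        depth = 0
        for j in range(open_at, len(text)):
            if text[j] == "(":
                depth += 1
            elif text[j] == ")":
                depth -= 1
                if depth == 0:
                    break
        else:
            raise ValueError("unbalanced parentheses in rule")
        terms.append((role, text[open_at + 1:j]))
        i = j + 1
    return terms


def head_kind(required):
    """Classify the required term: quoted string, bracketed lemma, or phrase."""
    if '"' in required:
        return "string"
    if "[" in required:
        return "lemma"
    return "phrase"


for role, required in split_rule('VS(NP)VC(NP,HUM)VA(PP("to the lions"));'):
    print(role, required, head_kind(required))
```

The loop tracks parenthesis depth rather than using a simple split, because a required term may itself contain a parenthesized head, as in PP("to the lions").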

Examples

{| border="1" cellpadding="5"
!Rules
!Description
!Examples
|-
|VS(NP)VC(NP)VA(PP("into account"));
|The verbal phrase requires three arguments: a specifier (NP), a complement (NP) and an adjunct (the fixed PP "into account")
|take into account
|-
|VS(NP)VC(NP,HUM)VA(PP("to the lions"));
|The verbal phrase requires three arguments: a specifier (NP), a complement (NP with the feature HUM = human) and an adjunct (the fixed PP "to the lions")
|throw someone to the lions
|}
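
As a rough illustration of how such a rule constrains a phrase, the Python sketch below fills the three argument slots of "throw someone to the lions". It is a simplification under loud assumptions: the helper names `fixed_string` and `linearize` are hypothetical, the specifier-verb-complement-adjunct order is simply assumed to be the default order for English, and feature checking (such as HUM) is not modelled.

```python
def fixed_string(required):
    """Return the quoted string inside a term such as PP("to the lions"),
    or None when the term has no fixed string."""
    start = required.find('"')
    end = required.rfind('"')
    return required[start + 1:end] if 0 <= start < end else None


def linearize(verb, specifier, complement, adjunct_required):
    """Assemble specifier + verb + complement + fixed adjunct.

    Assumption: English default order; the adjunct must be a fixed string.
    """
    adjunct = fixed_string(adjunct_required)
    if adjunct is None:
        raise ValueError("this sketch only handles fixed-string adjuncts")
    return " ".join([specifier, verb, complement, adjunct])


print(linearize("throws", "the emperor", "the prisoner", 'PP("to the lions")'))
```

The specifier and complement are free noun phrases supplied by the caller, while the adjunct comes verbatim from the rule, which is what makes the PP "fixed".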