Subcategorization rules
In the UNL framework, Subcategorization Rules are rules for generating multi-word expressions out of base forms. They can also be used for creating exceptions to subcategorization frames.
When to use subcategorization rules
Subcategorization rules must be used in two cases:
- Multi-word expressions that include more than one maximal syntactic projection, such as separable multi-word expressions ("behind <someone's> back") or idioms ("play cat and mouse"); and
- Exceptions to the subcategorization frames.
Consider, for instance, the case of "take into account", which is separable and involves three maximal projections (VP, NP and PP), as depicted below:
              VP
              |
              VB
             /  \
            /     \
           /        \
         VB           \
        /  \            \
       /     \            \
      V       NP           PP
      |       |            |
    take <something> into account
In ordinary English dictionaries, "take into account" is not an independent entry but a sub-entry of the verb "take". The verb "take" should therefore be treated as the base form, so that the inflectional paradigm already created for all other instances of "take" ("takes", "taking", "took", "taken") can be reused. However, because the verb is actually "take into account" and not only "take", it is necessary to create a subcategorization rule, which will have the following format:
VA("into account");
which means that the string "into account" should be generated as an adjunct of the verb (VA). The alternative, which is to represent the whole string "take into account" as the base form and to provide a very specific inflectional paradigm ("takes into account", "taking into account", "took into account", "taken into account"), is not only much more expensive but also insufficient, because it does not allow any string to come between the verb and the adjunct (as in "take something into account").
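The trade-off described above can be illustrated with a minimal Python sketch. It is not part of the UNL tooling: the table `TAKE_FORMS`, its tag names and the function `realize` are hypothetical stand-ins for a real inflectional paradigm and generator. The point is only that keeping "take" as the base form lets one paradigm serve every form of the expression, and leaves room for an object between the verb and the adjunct.

```python
# Hypothetical inflection table for the base form "take".
# The tag names are illustrative, not taken from any UNL tagset.
TAKE_FORMS = {
    "base": "take",
    "3sg": "takes",
    "gerund": "taking",
    "past": "took",
    "participle": "taken",
}


def realize(verb_form, adjunct, obj=None):
    """Join an inflected verb form, an optional object, and the adjunct string."""
    parts = [verb_form]
    if obj:
        parts.append(obj)  # the object is placed between verb and adjunct
    parts.append(adjunct)
    return " ".join(parts)


# One paradigm plus one adjunct string covers every inflected form.
forms = {tag: realize(verb, "into account") for tag, verb in TAKE_FORMS.items()}

# The separable use is also covered, which a whole-string lemma would not allow.
separated = realize(TAKE_FORMS["past"], "into account", obj="the delay")
```

Here `forms["past"]` is "took into account" and `separated` is "took the delay into account".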
When not to use subcategorization rules
Subcategorization rules should be avoided for expressions that do not involve infixation or are not separable, such as many English phrasal verbs ("give in"), prepositional phrases ("in accordance with"), adverbial phrases ("once upon a time") and conjunctional phrases ("on the contrary"). In those cases, there is no need to split the expression at the level of the lemma.
Subcategorization rules should also be avoided for inflectional compounds and multiword expressions that do not pose any problem for generating inflected forms. The plural of the English noun phrase "acid rain" or of the compound "skinhead", for instance, can be formed by simply adding an "s" to the string, as with many single-word entries, and there is no actual need to create a subcategorization rule for those cases.
It should be noticed that, if the multiword expression is not to be represented by a subcategorization rule, the lemma should be the multiword expression itself.
Syntax of subcategorization rules
Subcategorization rules must comply with the S-Rule formalism for representing syntactic information in the UNLarium framework.
Templates
Action | Template | Example (English) | Example (Rule) |
---|---|---|---|
Add X to the head | <CATEGORY>H(<DIRECTION>,<ADDED>,<FEATURES>); | give in (lemma="give") | VH(>,[in],PRE); |
Add X as a complement to the head | <CATEGORY>C(<DIRECTION>,<ADDED>,<FEATURES>); | make a mistake (lemma="make") | VC(>,[mistake],NOU); |
Add X as an adjunct to the head | <CATEGORY>A(<DIRECTION>,<ADDED>,<FEATURES>); | go blind (lemma="go") | VA(>,[blind],ADJ); |
Add X as a specifier to the head | <CATEGORY>S(<DIRECTION>,<ADDED>,<FEATURES>); | Le Caire (lemma="Caire") | NS(<,[le],ART); |
Where
- <CATEGORY> is to be replaced by the lexical category (N, V, P, A, etc)
- <DIRECTION> is to be replaced by the direction of the insertion, as follows:
  - > (insert to the right)
  - < (insert to the left)
- <ADDED> is to be replaced by the item to be inserted, as follows:
  - "string" (strings must come between double quotes)
  - [word] (words must come between square brackets)
  - a phrase (NP, VP, PP, etc), to be detailed in the same rule
- <FEATURES> is optional and is to be replaced by a list of features of the item to be inserted (according to the UNDLF Tagset). Tags must come separated by commas.
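As an illustration of the template above, the following Python sketch parses a flat rule (one whose <ADDED> item is a quoted string or a bracketed word) into its parts. It is only a reading of the template as described here, not the official S-Rule parser: the names `SubcatRule`, `parse_rule` and `RULE_RE` are hypothetical, and phrase-valued items such as NP(...) are deliberately not handled.

```python
import re
from dataclasses import dataclass

ROLES = {"H": "head", "C": "complement", "A": "adjunct", "S": "specifier"}
DIRECTIONS = {">": "right", "<": "left"}

# <CATEGORY><ROLE>(<DIRECTION>,<ADDED>[,<FEATURES>]);
RULE_RE = re.compile(
    r'^(?P<category>[A-Z]+)(?P<role>[HCAS])\('
    r'(?P<direction>[<>]),'
    r'(?P<added>\[[^\]]+\]|"[^"]+")'
    r'(?:,(?P<features>[A-Z0-9]+(?:,[A-Z0-9]+)*))?'
    r'\);?$'
)


@dataclass
class SubcatRule:
    category: str    # lexical category of the head, e.g. "V" or "N"
    role: str        # head, complement, adjunct or specifier
    direction: str   # "right" or "left"
    added: str       # inserted item without its delimiters
    added_kind: str  # "word" for [word], "string" for "string"
    features: tuple  # optional feature tags, possibly empty


def parse_rule(text):
    """Parse one flat rule; raise ValueError if it does not fit the template."""
    match = RULE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a flat subcategorization rule: {text!r}")
    raw = match.group("added")
    features = match.group("features")
    return SubcatRule(
        category=match.group("category"),
        role=ROLES[match.group("role")],
        direction=DIRECTIONS[match.group("direction")],
        added=raw[1:-1],  # strip the brackets or quotes
        added_kind="word" if raw.startswith("[") else "string",
        features=tuple(features.split(",")) if features else (),
    )


rule = parse_rule('VH(>,[in],PRE);')
```

For the first row of the table, `rule` describes the word "in", tagged PRE, added to the right of a verbal head.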
Observations
- The difference between strings and words has to do with their lexical status. Words are supposed to be included in the dictionary as independent entries, while strings are not.
- The structure of variable subordinate phrases must be detailed.
  - Word: "draw someone's attention to something"
  - Lemma: "draw"
  - Subcategorization rules: VC(>,NP([attention])),VA(>,PP([to]));
  - Gloss: The complement of the verb is a noun phrase (NP) whose head is the word "attention", and the adjunct to the verb is a prepositional phrase (PP) whose head is the word "to".
- The structure of fixed subordinate phrases must not be detailed.
  - Word: "go on foot"
  - Lemma: "go"
  - Subcategorization rules: VA(>,"on foot",PP);
  - Gloss: The adjunct to the verb is the string "on foot", which plays the role of a prepositional phrase (PP).
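The contrast between variable and fixed subordinate phrases can be sketched in Python as follows. This is an illustrative model only: the classes `Phrase` and `Fixed`, the function `linearize` and the `NP[attention]` output notation are all hypothetical, and the assumption that later insertions land further from the lemma is made for the sketch rather than taken from the S-Rule formalism.

```python
from dataclasses import dataclass


@dataclass
class Phrase:
    """Variable phrase: only the head word is fixed, the rest is filled in later."""
    label: str  # e.g. "NP" or "PP"
    head: str


@dataclass
class Fixed:
    """Fixed string: inserted verbatim, with no internal structure."""
    text: str


def linearize(lemma, additions):
    """Place each (direction, item) pair to the left or right of the lemma.

    Assumption of this sketch: items are applied in order, so a later
    insertion ends up further from the lemma than an earlier one.
    """
    tokens = [lemma]
    for direction, item in additions:
        shown = item.text if isinstance(item, Fixed) else f"{item.label}[{item.head}]"
        if direction == ">":
            tokens.append(shown)
        else:
            tokens.insert(0, shown)
    return " ".join(tokens)


# "draw someone's attention to something": two variable phrases.
draw = linearize("draw", [(">", Phrase("NP", "attention")), (">", Phrase("PP", "to"))])

# "go on foot": one fixed string.
go = linearize("go", [(">", Fixed("on foot"))])
```

With these inputs, `draw` is "draw NP[attention] PP[to]", where each bracketed slot still has to be expanded into a full phrase, while `go` is simply "go on foot".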