Subcategorization

From UNL Wiki
Revision as of 17:17, 23 March 2010 by Martins

Subcategorization is the definition of the number and types of the syntactic arguments that co-occur with the base form in order to form a multi-word expression or a phrase.

Subcategorization rules and subcategorization frames

In the UNLarium framework, subcategorization is indicated by a set of transformations carried over the base form. This set of transformations can be represented by:

  • subcategorization frames, in case of regular behaviour (i.e., a set of transformations that is followed by several different words);
  • subcategorization rules, in case of irregular behaviour (i.e., a set of transformations that is followed by very few words); or
  • subcategorization frames and subcategorization rules, in case of quasi-regular behaviour (i.e., when the word is mainly regular but has some subcategorization particularities).

For instance, the rule "VS(NP)VC(NP);" (= the verb takes a noun phrase as the subject and a noun phrase as a complement) is associated with all direct transitive verbs of English (to buy, to make, to do, etc.) and should therefore be defined as a subcategorization frame. The same holds for the rule "VS(NP)VC(PP([on]));" (= the verb takes a noun phrase as the subject and a prepositional phrase headed by "on" as a complement), which is less general but still quite comprehensive: it applies to all indirect transitive verbs that select the preposition "on" (such as to depend, to insist, to operate, etc.).

Examples of subcategorization frames
Intransitive verbs: VS(NP);
Direct transitive verbs: VS(NP)VC(NP);
Indirect transitive verbs selecting prepositional phrases headed by "on": VS(NP)VC(PP([on]));
Indirect transitive verbs selecting prepositional phrases headed by "in": VS(NP)VC(PP([in]));
Ditransitive verbs: VS(NP)VC(NP)VC(PP([to]));
Nouns selecting prepositional phrases headed by "of": NC(PP([of]));
Adjectives selecting prepositional phrases headed by "in": JC(PP([in]));
Adjectives selecting prepositional phrases headed by "of": JC(PP([of]));
Adverbs selecting prepositional phrases headed by "to": AC(PP([to]));
etc.
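
The frames listed above share a simple shape: a sequence of role labels (VS, VC, NC, JC, AC), each followed by a parenthesized argument, closed by a semicolon. As a minimal sketch, assuming only that shape and not the full S-rule formalism, the following Python function splits a frame into (role, argument) pairs and rejects unbalanced parentheses. The function name `split_frame` is hypothetical and is not part of any UNL tool.

```python
from typing import List, Tuple


def split_frame(frame: str) -> List[Tuple[str, str]]:
    """Split a frame such as 'VS(NP)VC(NP);' into (role, argument) pairs.

    Assumption: a frame is a sequence of ROLE(ARGUMENT) items ending in ';'.
    Nested parentheses inside an argument are kept as part of that argument.
    """
    body = frame.strip()
    if not body.endswith(";"):
        raise ValueError("frame must end with ';'")
    body = body[:-1]

    pairs: List[Tuple[str, str]] = []
    depth = 0       # current parenthesis nesting level
    start = 0       # index where the current role label starts
    arg_start = 0   # index where the current top-level argument starts
    role = ""
    for i, ch in enumerate(body):
        if ch == "(":
            if depth == 0:
                role = body[start:i]
                arg_start = i + 1
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError(f"unbalanced ')' at position {i}")
            if depth == 0:
                pairs.append((role, body[arg_start:i]))
                start = i + 1
    if depth != 0:
        raise ValueError("unbalanced '('")
    if start != len(body):
        raise ValueError("trailing text after the last argument")
    return pairs


print(split_frame("VS(NP)VC(PP([on]));"))
```

For the frame of indirect transitive verbs selecting "on", this yields the subject role VS with argument NP and the complement role VC with argument PP([on]). A frame with a missing or extra parenthesis raises a ValueError instead of being silently accepted.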

The number and type of arguments, however, are often not as regular as described above. Consider, for instance, the Latin expression "lingua franca", whose base form is "lingua" because of the case system of Latin ("lingua franca", "linguae francae", "linguam francam", "linguas francas", etc.). The lemma "lingua franca" will then require a subcategorization rule to generate "lingua franca" out of the base form (BF) "lingua". This subcategorization rule, which would be "NA([franca]);" (i.e., the noun takes the lemma "franca" as an adjunct), is too specific and will probably be associated only with the lemma "lingua franca". Therefore, it should be defined as a subcategorization rule instead of a subcategorization frame. In fact, subcategorization rules are mainly used to form compounds, i.e., to form new words by combining lexemes, which is normally a very specific behaviour.

Examples of subcategorization rules
NA("franca"); (as in lingua > lingua franca)
NA("of war"); (as in man > man of war)
NA("of intent"); (as in letter > letter of intent)
etc.
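
The way such a rule derives a lemma from its base form can be sketched in Python. This is an illustration only: it assumes that the adjunct string is simply appended to the base form after a single space, and it handles only the quoted NA("...") notation used in the examples above. The function name `apply_adjunct_rule` is hypothetical.

```python
import re

# Matches rules of the form NA("some string"); and captures the string.
_NA_RULE = re.compile(r'^NA\("([^"]*)"\);$')


def apply_adjunct_rule(base_form: str, rule: str) -> str:
    """Build a lemma from a base form and an NA("...") rule.

    Assumption: the adjunct follows the base form, separated by one space.
    """
    match = _NA_RULE.match(rule.strip())
    if match is None:
        raise ValueError(f"unsupported rule: {rule!r}")
    return f"{base_form} {match.group(1)}"


print(apply_adjunct_rule("lingua", 'NA("franca");'))
print(apply_adjunct_rule("man", 'NA("of war");'))
print(apply_adjunct_rule("letter", 'NA("of intent");'))
```

The three calls reproduce the pairs listed above: lingua > lingua franca, man > man of war, letter > letter of intent.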

The main difference between subcategorization rules and subcategorization frames is that the former are stored in the dictionary (and hence are activated only when the entry is found in a given corpus), whereas the latter are stored in the grammar and are always processed. Subcategorization frames are thus much more expensive than subcategorization rules and must be reserved for general patterns.
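
The cost argument can be made concrete with a small Python sketch. It is a toy model and not the actual UNLarium architecture: the names `GRAMMAR_FRAMES`, `DICTIONARY_RULES` and `active_patterns` are hypothetical, and the only behaviour modelled is the one stated above, namely that frames are always processed while rules are activated only for entries found in the corpus.

```python
from typing import Dict, Iterable, List

# Frames live in the grammar: every one of them is processed on every run.
GRAMMAR_FRAMES: List[str] = ["VS(NP);", "VS(NP)VC(NP);"]

# Rules live in the dictionary, attached to individual entries.
DICTIONARY_RULES: Dict[str, str] = {
    "lingua franca": 'NA("franca");',
    "man of war": 'NA("of war");',
    "letter of intent": 'NA("of intent");',
}


def active_patterns(entries_found: Iterable[str]) -> List[str]:
    """Return the patterns processed for a corpus containing the given entries."""
    active = list(GRAMMAR_FRAMES)  # unconditional cost
    for entry in entries_found:
        rule = DICTIONARY_RULES.get(entry)
        if rule is not None:
            active.append(rule)  # cost paid only when the entry occurs
    return active


print(len(active_patterns([])))
print(len(active_patterns(["man of war"])))
```

In this model, adding a frame increases the work done for every corpus, whereas adding a rule increases it only for corpora that contain the corresponding entry.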

In any case, subcategorization frames and subcategorization rules may be combined to avoid redundancy. Consider, for instance, the case of "take into account", whose subcategorization schema would be "VS(NP)VC(NP)VA("into account");" (= the verb takes a noun phrase as the subject, a noun phrase as a complement, and the string "into account" as an adjunct). A significant part of the schema ("VS(NP)VC(NP)") is perfectly regular, because "take into account" is still a transitive verb, like "to buy", "to make", etc. The other part ("VA("into account")") is very specific and applies only to "take into account". It is therefore possible to split this subcategorization schema into two parts: the entry "take into account" is associated with the subcategorization frame of transitive verbs and, additionally, its particular syntactic behaviour is described by a specific subcategorization rule. The same applies to all phrasal verbs and prepositional verbs in English:

Examples of subcategorization frames + subcategorization rules
  • take into account (base form = take)
    • subcategorization frame: VS(NP)VC(NP); (DIRECT TRANSITIVE)
    • subcategorization rule: VA("into account");
  • come true (base form = come)
    • subcategorization frame: VS(NP); (INTRANSITIVE)
    • subcategorization rule: VC("true");
  • come to an end (base form = come)
    • subcategorization frame: VS(NP); (INTRANSITIVE)
    • subcategorization rule: VA("to an end");
  • etc.
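
The split can be reversed mechanically: concatenating the frame and the rule of an entry gives back its full subcategorization schema. The following Python sketch assumes that both parts are plain strings terminated by a semicolon and that the rule follows the frame, as in the "take into account" schema quoted above. The function name `full_schema` is hypothetical.

```python
def full_schema(frame: str, rule: str) -> str:
    """Join a subcategorization frame and a subcategorization rule.

    Assumption: both parts end in ';' and the rule comes after the frame.
    """
    frame = frame.strip()
    rule = rule.strip()
    for part in (frame, rule):
        if not part.endswith(";"):
            raise ValueError(f"missing ';' in {part!r}")
    # Drop the frame's terminator; the rule's terminator closes the schema.
    return frame[:-1] + rule


print(full_schema("VS(NP)VC(NP);", 'VA("into account");'))
print(full_schema("VS(NP);", 'VC("true");'))
```

The first call rebuilds the schema given for "take into account". Storing the two parts separately means that the regular frame is written once and shared by every entry that uses it.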

Syntax

Subcategorization frames and subcategorization rules are expressed by S-rules, a special formalism for representing the syntactic structure of the phrase.

Software