C-rule

From UNL Wiki

Revision as of 20:16, 8 December 2011

Compounding or composition is the word-formation process of creating compounds by combining or putting together lexemes. This process is performed by Composition rules (CPWR), which are used to generate compounds out of the base form.

When to use composition rules

In the UNLarium framework, compounds are treated as ordinary simple words unless they are discontinuous multiword expressions or inflect by infixation (such as "give in" or "take into account"). In these cases, the lemma is different from the base form, and the compound-formation process is expected to be defined through special rules.
Composition rules must be created when, and only when, the base form is different from the lemma.
This situation arises only for the following kinds of multiword expressions:

  • when inflections are formed by infixation (as opposed to simple suffixation or prefixation); or
  • when the multiword expression is discontinuous.

For instance:
The English multiword expression "call for" has the following inflections: "call for", "calls for", "called for", "calling for", etc. These inflections are formed by infixation, in the sense that they apply in the middle of the expression (between "call" and "for"). If we simply associate this expression with the inflectional paradigm of "call", the inflections are attached to the end of the whole expression and we get the wrong results: "call for", "call fors", "call fored", "call foring", etc. In order to prevent this problem, and to avoid the unnecessary proliferation of rules in the grammar, we split the multiword expression into two segments: the base form (BF), i.e., the term to which the inflections are directly applied; and the composition rule (CPWR), which is the rule used to rebuild the lemma out of the base form. In the case of "call for", the lemma is "call for", the base form is "call" and the composition rule is "VH([for],P,M0);".
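The difference between inflecting the whole expression and inflecting only the base form can be sketched in a few lines of Python. This is an illustration only: the function names and the toy suffix list are assumptions made for this example and are not part of the UNLarium.

```python
def inflect_then_compose(base_form, added, suffixes):
    """Inflect the base form first, then append the added term."""
    return [f"{base_form}{suffix} {added}" for suffix in suffixes]


def inflect_whole_lemma(lemma, suffixes):
    """Naive approach: attach each suffix to the end of the whole expression."""
    return [f"{lemma}{suffix}" for suffix in suffixes]


# Toy paradigm for "call" (hypothetical, for illustration only).
SUFFIXES = ["", "s", "ed", "ing"]

good_forms = inflect_then_compose("call", "for", SUFFIXES)
bad_forms = inflect_whole_lemma("call for", SUFFIXES)

print(good_forms)  # inflections land between "call" and "for"
print(bad_forms)   # inflections land after "for"
```

The first list reproduces the expected forms ("calls for", "called for", ...), while the second reproduces the faulty ones ("call fors", "call fored", ...).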

When not to use composition rules

Composition rules must not be used in the following circumstances:

  • When the word is not a multiword expression;
  • When the multiword expression is invariant;
  • When the inflections of the multiword expression are formed by prefixation or suffixation (such as in "call center" > "call centers").

Examples

Lemma | Base Form | Compound | Description
give in | give | VH([in]) | the string "in" is to be added to the base form as part of the head of the verb (VH)
take into account | take | VA("into account") | the string "into account" is to be added to the base form as an adjunct to the verb (VA)
throw <person> to the lions | throw | VA("to the lions") | the string "to the lions" is to be added to the base form as an adjunct to the verb (VA)

Syntax

The syntax for composition rules is the following:

<SYNTACTIC ROLE>(<ADDED>,<FEATURES>);

Where:

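  • <SYNTACTIC ROLE> is the syntactic role (VA, VC, VS, VH, etc.) of the term to be added to the base form;
  • <ADDED> is the term to be added to the base form to form the compound. It must be enclosed in [brackets] if it is a lemma (i.e., if it is an entry in the dictionary), or in "quotes" if it is a string (i.e., if it is not an entry in the dictionary);
  • <FEATURES> are the features of the term to be added to the base form. The following features are mandatory:
      • the lexical category (A, J, N, V, C, P, D) of the term to be added
      • the inflectional properties (paradigm and/or inflectional rules) of the term to be added
      • the distribution (i.e., the position) of the term to be added, if not default
      • the adjacency of the term to be added, if not default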

Several different composition rules can be associated with the same base form in order to form complex expressions such as "give up the ghost", where there are at least three constituents: "give", which is the base form; and "up" and "the ghost", which will be part of different composition rules:

  • VH([up],P,M0);VC("the ghost",N,M0);
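To make the syntax concrete, the following Python sketch splits a chain of composition rules into its parts. It is not the UNLarium parser: the names `CompositionRule` and `parse_rules` are hypothetical, and the pattern assumes the form shown above, where the added term comes first and the features follow as a comma-separated list. Rules written in any other layout are rejected.

```python
import re
from typing import NamedTuple


class CompositionRule(NamedTuple):
    role: str        # syntactic role, e.g. "VH" or "VC"
    added: str       # the term to be added to the base form
    is_lemma: bool   # True for [brackets], False for "quotes"
    features: tuple  # remaining comma-separated features, e.g. ("P", "M0")


# One rule: ROLE( [lemma] or "string" , feature, feature, ... ) optionally followed by ";"
_RULE = re.compile(r'''
    \s*(?P<role>[A-Z]+)\(
    (?:\[(?P<lemma>[^\]]*)\]|"(?P<string>[^"]*)")
    (?P<features>(?:,[^,()]+)*)
    \)\s*;?
''', re.VERBOSE)


def parse_rules(text):
    """Parse a chain such as 'VH([up],P,M0);VC("the ghost",N,M0);'."""
    text = text.strip()
    rules = []
    pos = 0
    while pos < len(text):
        match = _RULE.match(text, pos)
        if match is None:
            raise ValueError(f"cannot parse composition rule at position {pos}")
        is_lemma = match.group("lemma") is not None
        added = match.group("lemma") if is_lemma else match.group("string")
        features = tuple(
            f.strip() for f in match.group("features").split(",") if f.strip()
        )
        rules.append(CompositionRule(match.group("role"), added, is_lemma, features))
        pos = match.end()
    return rules


rules = parse_rules('VH([up],P,M0);VC("the ghost",N,M0);')
for rule in rules:
    print(rule)
```

For "give up the ghost" this yields two rules: a head particle taken from the dictionary (`[up]`) and a complement given as a plain string (`"the ghost"`).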

Composition rules in the dictionary

In the UNLarium framework, composition rules may be expressed in two different formats:

  • As complex structures, such as[1]
[[sub-NLW][sub-NLW]...[sub-NLW]]  {ID}  “UW”  (ATTR , ..., #01(ATTR, ...), #02(ATTR, ...), ...)  < FLG , FRE , PRI >; COMMENTS
  • As simple structures, such as
[NLW]  {ID}  “UW”  (ATTR , ..., BF=<BASE FORM>, CPWR=MTW(<COMPOSITION RULE>) )  < FLG , FRE , PRI >; COMMENTS

Where <COMPOSITION RULE> is the rule or set of composition rules used to form the lemma out of the base form, and <BASE FORM> is the base form. Notice that the composition rule must be given as the value of the attribute CPWR and should be preceded by the multiword tag MTW, because several different composition rules can be associated with the same entry.

Examples of dictionary entries containing composition rules

  • [[bring] [back]] {12343} "202078294" (pos=VER, #01(IFX(ET0:=4>"ought")), #02(pos=PRE)) <eng, 0, 0>;
  • [bring back] {12343} "202078294" (pos=VER, BF=bring, CPWR=MTW(VA([back],A,M0);)) <eng, 0, 0>;
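As a rough illustration of the simple-structure format, the Python sketch below assembles such an entry from its parts. The function `simple_entry` and its parameter names are hypothetical and exist only for this example. The layout follows the simple-structure template given above.

```python
def simple_entry(lemma, entry_id, uw, attrs, base_form, rules,
                 flag="eng", frequency=0, priority=0):
    """Build a simple-structure dictionary entry with BF and CPWR attributes.

    `rules` is a list of composition rules, each already terminated by ";".
    """
    cpwr = "".join(rules)  # several rules may share the same MTW(...) wrapper
    attr_part = ", ".join(list(attrs) + [f"BF={base_form}", f"CPWR=MTW({cpwr})"])
    return (f'[{lemma}] {{{entry_id}}} "{uw}" ({attr_part}) '
            f'<{flag}, {frequency}, {priority}>;')


entry = simple_entry(
    lemma="bring back",
    entry_id=12343,
    uw="202078294",
    attrs=["pos=VER"],
    base_form="bring",
    rules=["VA([back],A,M0);"],
)
print(entry)
```

With these arguments the function reproduces the second example entry above.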

Observations

Phrasal verbs
Particles of phrasal verbs must be represented as part of the head, if non separable, or as adjuncts, if separable:
  • give in = VH([in]); ("give in something" but not "give something in")
  • give back = VA([back]); ("give back something" or "give something back")
General syntactic roles (NP, PP, XP) must not be defined in composition rules but inside the subcategorization frame:
  • throw <person> to the lions =+VA("to the lions"); (and not "VA("to the lions")VC(NP);". The lemma should be associated with the transitive frame instead)
"Quotes" or [brackets]?
In the compound-formation process, the UNLarium distinguishes between strings (to be represented between "") and lemmas (to be represented between [ ]). The difference between strings and lemmas has to do with the dictionary status: lemmas (but not strings) are expected to be dictionary entries.
  • VA("into account"); (the string "into account" is not expected to be a dictionary entry)
  • VC([sense]); (the term "sense" is expected to be a dictionary entry).
Complex compounds
Compounds must include as many terms as there are different syntactic roles. One single "+" must be provided at the beginning of the rule:
  • give up the ghost = VH([up])VC("the ghost"); (and not +VH("up the ghost") or +VC("up the ghost"))
Order is to be represented by the distribution features (">", ">>", "<", "<<", ...), if not default
  • VC([love]); (order must not be specified, because in English complements come at the right side by default: make > make love)
  • NS([the]); (order must not be specified, because in English specifiers come at the left side by default: Netherlands > the Netherlands)
  • NA(>>,[available]); (order must be specified, because in English nominal adjuncts come at the left side by default: table > new table)
Adjacency is to be represented by the adjacency features (AJ0, AJ1, AJ2, ...), if not default
  • VC([love]); (adjacency must not be specified, because in English complements come after the head by default: make > make love)
  • VH([up])VC("the ghost"); (adjacency must not be specified, because in English head particles come before complements by default: give > give up the ghost)
  • VA([home],AJ1)VC("the bacon",AJ2); (adjacency must be specified, because in English the complement is normally generated before the adjunct: bring the bacon home)
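The effect of the adjacency features can be sketched as follows. This is a simplified model built on several assumptions that the text above does not state: every added term is placed to the right of the base form, an explicit feature AJn is read as "n-th position after the base form", and terms without an adjacency feature fall back to a default ranking (head particles, then complements, then adjuncts). The names `DEFAULT_RANK`, `adjacency_rank` and `assemble` are hypothetical.

```python
import re

# Assumed default ordering after the base form: head particle, complement, adjunct.
DEFAULT_RANK = {"VH": 1, "VC": 2, "VA": 3}


def adjacency_rank(role, adjacency=None):
    """Return the position of a term relative to the base form."""
    if adjacency is not None:
        match = re.fullmatch(r"AJ(\d+)", adjacency)
        if match is None:
            raise ValueError(f"not an adjacency feature: {adjacency!r}")
        return int(match.group(1))
    return DEFAULT_RANK[role]


def assemble(base_form, terms):
    """Build the compound from (role, added term, adjacency or None) triples."""
    ordered = sorted(terms, key=lambda term: adjacency_rank(term[0], term[2]))
    return " ".join([base_form] + [term[1] for term in ordered])


default_order = assemble("give", [("VC", "the ghost", None), ("VH", "up", None)])
explicit_order = assemble("bring", [("VC", "the bacon", "AJ2"), ("VA", "home", "AJ1")])
print(default_order)
print(explicit_order)
```

Under these assumptions, "give" needs no adjacency features to become "give up the ghost", whereas "bring" needs AJ1 and AJ2 to place the adjunct "home" before the complement "the bacon".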

Notes

  1. For further information on complex structures inside the dictionary, refer to Dictionary Specs#Complex structures as NLW*