Default grammar

From UNL Wiki

Revision as of 21:36, 26 October 2012

The Default grammar is expected to be language-independent and is normally loaded, after the language-specific grammars, in order to handle phenomena that are not covered by them. The default grammar is used only in transformation (t-grammar) and is unidirectional: there is a default grammar for UNLization, and a different default grammar for NLization.

Files

NL>UNL Default Grammar (UNLization)

The NL>UNL Default Grammar is divided into 7 sections:

  1. Pre-processing (prepares the input for the processing)
  2. Normalization (standardizes the feature structure)
  3. Parsing (converts the input list structure into a tree structure)
  4. Transformation (converts the surface tree structure into the deep tree structure)
  5. Dearborization (converts the tree structure into a network structure)
  6. Interpretation (converts the syntactic network into a semantic network)
  7. Post-processing (adjusts the final output)

Pre-processing

The pre-processing module prepares the input for processing. It includes rules such as the following:

(TEMP,%x)(BLK,%y)(TEMP,%z):=(%x&%y&%z,-BLK); merges temporary nodes

If two temporary nodes (TEMP) are separated by a blank space (BLK), they become a single node.

("asdfgh")(" ")("asdfgh") > ("asdfgh asdfgh")

(PPN,%x)(BLK,%y)(PPN,%z):=(%x&%y&%z,+TEMP,-BLK); merges sequences of proper names

If two proper names (PPN) are separated by a blank space (BLK), they become a single node.

("John")(" ")("Smith") > ("John Smith")

(BLK):=; deletes the blank space

Deletes all remaining blank spaces.

("a")(" ")("b") > ("a")("b")
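The effect of these three rules can be simulated outside the UNL engine. The following is a minimal Python sketch, not the engine's implementation: the `Node` class, the function names and the choice to merge features by set union are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A list-structure node: a surface string plus a set of features."""
    text: str
    feats: set = field(default_factory=set)


def merge_around_blank(nodes, feat, add=()):
    """Merge every (feat)(BLK)(feat) triple into one node, left to right.

    Assumption: %x&%y&%z concatenates the strings and unites the features;
    BLK is then removed and the features in `add` are added.
    """
    out = list(nodes)
    i = 0
    while i + 2 < len(out):
        x, y, z = out[i], out[i + 1], out[i + 2]
        if feat in x.feats and "BLK" in y.feats and feat in z.feats:
            feats = (x.feats | y.feats | z.feats | set(add)) - {"BLK"}
            out[i:i + 3] = [Node(x.text + y.text + z.text, feats)]
            # Stay at i so that a longer run ("A B C") keeps merging.
        else:
            i += 1
    return out


def delete_blanks(nodes):
    """Drop every remaining blank node, as (BLK):=; does."""
    return [n for n in nodes if "BLK" not in n.feats]


def preprocess(nodes):
    """Apply the three rules in the order they are listed."""
    nodes = merge_around_blank(nodes, "TEMP")
    nodes = merge_around_blank(nodes, "PPN", add={"TEMP"})
    return delete_blanks(nodes)


tokens = [Node("John", {"PPN"}), Node(" ", {"BLK"}), Node("Smith", {"PPN"}),
          Node(" ", {"BLK"}), Node("left", {"V"})]
result = preprocess(tokens)
print([n.text for n in result])  # ['John Smith', 'left']
```

The order matters in this sketch: blanks are deleted last, because the two merging rules rely on the blank node to recognize adjacent words.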

Normalization

The normalization section is divided into three modules:

  • Standardization, where isolated features are rewritten in the attribute-value format.

This is used when the features of dictionary entries are not represented in the attribute-value format, or as a cross-check for the feature assignment operations performed by the grammar itself. An example of a standardization rule is:

(CAU,^ASP):=(-CAU,+ASP=CAU);

if a node has the feature "CAU" (= causative) but does not have the attribute "ASP" (aspect), then rewrite CAU as ASP=CAU

  • Propagation, where the features of top categories are copied to their children.

This is used to avoid proliferating rules. For instance, every word having the feature SNGT (singulare tantum) is also SNG (singular). This information is not stated in the dictionary and must be made explicit in the grammar, so that the rules dealing with SNG do not all have to be duplicated. This generalization is performed by rules such as:

(SNGT,^SNG):=(-NUM,-SNGT,+NUM=SNG,+NUM=SNGT);

if a node has the feature SNGT (singulare tantum) and does not have the feature SNG (singular), then copy the feature SNG to it

  • Other normalization rules, to deal with special cases such as temporary UW's, pronouns and numbers, such as:

(TEMP,^LEX):=(+LEX=N,+POS=PPN); treats all temporary words as proper nouns

Temporary UW's, which are absent from the dictionary, do not have any information other than the feature TEMP. In order to manipulate them inside the grammar, we assign them the feature PPN (proper name) (i.e., all temporary words are interpreted as proper names).
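The three kinds of normalization can be pictured as conditional rewrites over a node's features. The sketch below is only an illustration in Python: storing bare features in a set and attribute-value pairs in a dictionary of sets is an assumption, as are the function names `has` and `normalize`.

```python
def has(bare, attrs, feat):
    """True if `feat` is present either as a bare feature or as a value."""
    return feat in bare or any(feat in vals for vals in attrs.values())


def normalize(bare, attrs):
    """Return normalized copies of a node's bare features and attributes."""
    bare = set(bare)
    attrs = {name: set(vals) for name, vals in attrs.items()}

    # Standardization: a bare CAU becomes ASP=CAU unless ASP is already set.
    if "CAU" in bare and "ASP" not in attrs:
        bare.discard("CAU")
        attrs["ASP"] = {"CAU"}

    # Propagation: a singulare tantum (SNGT) node is also singular (SNG).
    if has(bare, attrs, "SNGT") and not has(bare, attrs, "SNG"):
        bare.discard("SNGT")
        attrs["NUM"] = {"SNG", "SNGT"}

    # Other: a temporary word with no lexical category is a proper noun.
    if "TEMP" in bare and "LEX" not in attrs:
        attrs["LEX"] = {"N"}
        attrs["POS"] = {"PPN"}

    return bare, attrs


print(normalize({"SNGT"}, {}))
```

Each rewrite is guarded by a negative condition (for example, "ASP is not set yet"), which is what keeps a rule from firing again on a node it has already normalized.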

Parsing

The parsing module performs the syntactic analysis of the normalized input. It converts the list structure (a sequence of tokens) into a tree structure (a hierarchy of tokens). The parsing follows general procedures derived from X-bar theory and results in a tree structure with binary branching. Language-specific parsing rules are described in the language grammar; the parsing performed by the Default Grammar covers only very general structures. It is done in five steps:

  • Complementation, when the system tries to create intermediate projections (XB) out of the combination of the existing intermediate projections with possible complement candidates (XB + COMP > XB, or COMP + XB > XB)
  • Adjunction, when the system tries to create intermediate projections (XB) out of the combination of the existing intermediate projections with possible adjunct candidates (XB + ADJT > XB, or ADJT + XB > XB)
  • Specification, when the system tries to create maximal projections (XP) out of the combination of the existing intermediate projections with possible specifier candidates (SPEC + XB > XP, or XB + SPEC > XP)
  • Maximal projection, when the system tries to create maximal projections (XP) out of intermediate projections when there is no possible combination (no complement, adjunct or specifier)
  • Intermediate projection, when the system tries to create intermediate projections (XB) out of the heads when there is no possible combination (no complement or adjunct)

As seen above, the parsing module of the Default Grammar is highly dependent on the X-bar configuration. Consider, for instance, the sentence "John killed Mary yesterday", supposing that the constituents have already been analyzed as:

  • John = NP
  • killed = V
  • Mary = NP
  • yesterday = AP

The following rules are applied:

  1. First projection: (V,^proj,^XB,^XP):=(+XB=VB);
    V projects VB, whatever the case: V[killed] > VB[killed]
  2. Complementation: (VB,%vb)(NP,%np):=(VB(%vb,+proj;%np,+comp,+proj),+XB=VB,+LEX=V,%new);
    VB + NP, i.e., VB[killed] NP[Mary], projects another VB: VB[killed Mary]
  3. Adjunction: (VB,%vb)(AP,%ap):=(VB(%vb,+proj;%ap,+adjt,+proj),+XB=VB,+LEX=V,%new);
    VB + AP, i.e., VB[killed Mary] AP[yesterday], projects another VB: VB[killed Mary yesterday]
  4. Specification: (NP,%np)(VB,%vb):=(VP(%vb,+proj;%np,+spec,+proj),+XP=VP,+LEX=V,%new);
    NP + VB, i.e., NP[John] VB[killed Mary yesterday], projects a VP: VP[John killed Mary yesterday]
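The four steps above can be traced with a small simulation. This is a simplified Python sketch, not the UNL engine: the `Tree` class and the `project` function are hypothetical, features such as +proj and %new are left out, and only the verbal projections used in this example are covered.

```python
from dataclasses import dataclass


@dataclass
class Tree:
    label: str              # category: V, VB, VP, NP, AP
    text: str               # the words covered by this node
    children: tuple = ()    # (head, dependent) for binary projections
    role: str = ""          # role of the dependent: comp, adjt or spec


def project(items):
    """Apply the four rules, in order, to a list of analyzed constituents."""
    # 1. First projection: every V projects a VB.
    items = [Tree("VB", t.text, (t,)) if t.label == "V" else t for t in items]

    # 2 and 3. Complementation (VB + NP > VB), then adjunction (VB + AP > VB).
    for dep_label, role in (("NP", "comp"), ("AP", "adjt")):
        i = 0
        while i + 1 < len(items):
            left, right = items[i], items[i + 1]
            if left.label == "VB" and right.label == dep_label:
                merged = Tree("VB", f"{left.text} {right.text}", (left, right), role)
                items[i:i + 2] = [merged]
            else:
                i += 1

    # 4. Specification: NP + VB > VP.
    i = 0
    while i + 1 < len(items):
        left, right = items[i], items[i + 1]
        if left.label == "NP" and right.label == "VB":
            merged = Tree("VP", f"{left.text} {right.text}", (right, left), "spec")
            items[i:i + 2] = [merged]
        else:
            i += 1
    return items


sentence = [Tree("NP", "John"), Tree("V", "killed"),
            Tree("NP", "Mary"), Tree("AP", "yesterday")]
parsed = project(sentence)
print(parsed[0].label, "|", parsed[0].text)  # VP | John killed Mary yesterday
```

Every merge replaces two adjacent nodes with one, so the result is a binary-branching tree, as X-bar theory requires.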

In practice, the rules cannot be this direct: they are applied from left to right, and the system must wait until all possible candidates have been processed before it starts building a projection. That is why in most cases the rules are context-sensitive, as follows:

(DP,%dp)(NB,%nb)({^D,^J,^N,^P|PUT|STAIL|CTAIL},%right):=(NP(%nb,+proj;%dp,+spec,+proj),+XP=NP,+LEX=N,%new)(%right);
DP + NB forms an NP only if the NB is not followed by a determiner, adjective, noun or preposition (because they would still be part of the NB), or if the NB is followed by a punctuation sign (PUT), the end of the sentence (STAIL) or the end of the scope (CTAIL)
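The right-context test in this rule can be sketched as a one-node lookahead. The Python below is an illustration under stated assumptions: each node is reduced to a set of features, and the names `BLOCKING`, `CLOSING`, `right_context_allows_np` and `specify_np` are hypothetical.

```python
BLOCKING = {"D", "J", "N", "P"}        # categories that may still extend the NB
CLOSING = {"PUT", "STAIL", "CTAIL"}    # punctuation, end of sentence, end of scope


def right_context_allows_np(right_feats):
    """Mirror {^D,^J,^N,^P|PUT|STAIL|CTAIL} for the node to the right."""
    feats = set(right_feats)
    return not (feats & BLOCKING) or bool(feats & CLOSING)


def specify_np(nodes):
    """Rewrite DP NB as NP wherever the following node allows it."""
    out = [set(n) for n in nodes]
    i = 0
    while i + 2 < len(out):
        dp, nb, right = out[i], out[i + 1], out[i + 2]
        if "DP" in dp and "NB" in nb and right_context_allows_np(right):
            out[i:i + 2] = [{"NP"}]   # the right-context node is kept as is
        i += 1
    return out


print(specify_np([{"DP"}, {"NB"}, {"V"}]))   # [{'NP'}, {'V'}]
print(specify_np([{"DP"}, {"NB"}, {"P"}]))   # unchanged: the NB may still grow
```

Note that the context node is only inspected, never consumed: it reappears unchanged on the right-hand side of the rule, which is what makes the rule context-sensitive.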

Transformation

The transformation module converts a surface syntactic (tree) structure into a deep syntactic (tree) structure.
The output of the parsing is a surface (tree) structure. This surface structure does not yet contain some dependency relations that were not represented directly inside the list and which are important in the UNLization process. For instance, in the case of "John did not kill Mary yesterday", the NP "John" will be represented at the position of specifier of the IP "did not kill Mary yesterday", but it is important to move it to the position of specifier of the VP "kill Mary yesterday". In order to do that, we have to convert the surface structure into a deep structure. This is done by movement rules such as the following:

IP(IB(;VP(;e));%s,^e):=IP(IB(;VP(;%s));%e,+e); 
Moves the non-empty (^e) node %s, which is at the position of specifier of IP, to the position of specifier of VP that is a complement of this IP, if this position is empty (e).
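The movement can be illustrated on a simplified tree for "John did not kill Mary yesterday". The following Python sketch is an assumption-laden illustration, not the engine's behavior: the `Phrase` class, the `move_subject` function and the internal shape of the IB and VP nodes are hypothetical, and the movement is modeled as a swap of the two specifier nodes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Phrase:
    label: str
    head: Optional["Phrase"] = None
    dep: Optional["Phrase"] = None   # complement, adjunct or specifier
    text: str = ""
    empty: bool = False              # stands for the feature "e"


def move_subject(ip):
    """Move the specifier of IP into the empty specifier of its VP complement.

    Returns True if the movement applied, False if the pattern did not match.
    """
    ib = ip.head
    if ip.label != "IP" or ib is None or ib.label != "IB":
        return False
    vp, spec = ib.dep, ip.dep
    if vp is None or vp.label != "VP" or vp.dep is None or spec is None:
        return False
    if spec.empty or not vp.dep.empty:
        return False
    # Swap: the subject lands in Spec-VP and the empty node stays in Spec-IP.
    ip.dep, vp.dep = vp.dep, spec
    return True


vp = Phrase("VP", head=Phrase("VB", text="kill Mary yesterday"),
            dep=Phrase("NP", empty=True))
ib = Phrase("IB", head=Phrase("I", text="did not"), dep=vp)
ip = Phrase("IP", head=ib, dep=Phrase("NP", text="John"))

moved = move_subject(ip)
print(moved, vp.dep.text, ip.dep.empty)  # True John True
```

Once the rule has applied, the specifier of IP is empty, so the pattern no longer matches and the movement cannot apply a second time.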

Dearborization

Interpretation

Post-processing

UNL>NL Default Grammar (NLization)

The UNL>NL Default Grammar is divided into 6 sections:

  • Pre-processing (prepares the input for the processing)
  • Normalization (standardizes the feature structure)
  • Arborization (converts the syntactic network into a syntactic tree)
  • Transformation (converts the deep syntactic structure into the surface syntactic structure)
  • Linearization (converts the syntactic structure into a list structure)
  • Post-processing (adjusts the final output)

Pre-processing

Normalization

Arborization

Transformation

Linearization

Post-processing

Software