Default grammar

Latest revision as of 23:14, 7 November 2012

The Default Grammar is expected to be language-independent and is normally loaded after the language-specific grammars, in order to handle phenomena that they do not cover. It is used only in transformation (t-grammar) and is unidirectional: there is one default grammar for UNLization and a different one for NLization.

Files

NL>UNL Default Grammar (UNLization)

The NL>UNL Default Grammar is divided into 6 sections

  1. Pre-processing (prepares the input for the processing)
  2. Parsing (converts the input list structure into a tree structure)
  3. Transformation (converts the surface tree structure into the deep tree structure)
  4. Dearborization (converts the tree structure into a network structure)
  5. Interpretation (converts the syntactic network into a semantic network)
  6. Post-processing (adjusts the final output)

Pre-processing

The pre-processing module prepares the input for processing. It includes rules such as the following:

(TEMP,%x)(BLK,%y)(TEMP,%z):=(%x&%y&%z,-BLK); merges temporary nodes

if two temporary nodes (TEMP) are separated by a blank space (BLK), they become one single node

("asdfgh")(" ")("asdfgh") > ("asdfgh asdfgh")

(PPN,%x)(BLK,%y)(PPN,%z):=(%x&%y&%z,+TEMP,-BLK); merges sequences of proper names

if two proper names (PPN) are separated by a blank space (BLK), they become one single node

("John")(" ")("Smith") > ("John Smith")

(BLK):=; deletes the blank space

deletes all blank spaces

("a")(" ")("b") > ("a")("b")
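The three pre-processing rules above can be mimicked outside the UNL engine. The following is only a minimal Python sketch, assuming that each node is a (text, feature set) pair; the function name `preprocess` and this data layout are hypothetical and are not part of the grammar formalism.

```python
def preprocess(nodes):
    """Merge TEMP/PPN sequences split by blanks, then delete blanks.

    nodes: list of (text, features) pairs, where features is a set of
    strings such as {"TEMP"}, {"PPN"} or {"BLK"} (assumed layout).
    """
    merged = []
    for text, feats in nodes:
        # X BLK X pattern, where X is TEMP or PPN on both sides
        if (len(merged) >= 2
                and "BLK" in merged[-1][1]
                and any(tag in feats and tag in merged[-2][1]
                        for tag in ("TEMP", "PPN"))):
            blank_text, _ = merged.pop()
            left_text, left_feats = merged.pop()
            # %x&%y&%z: concatenate the three strings and merge features;
            # +TEMP is added as in the PPN rule, BLK is never kept
            new_feats = (left_feats | set(feats) | {"TEMP"}) - {"BLK"}
            merged.append((left_text + blank_text + text, new_feats))
        else:
            merged.append((text, set(feats)))
    # (BLK):=;  delete every remaining blank node
    return [(t, f) for t, f in merged if "BLK" not in f]
```

Calling it on the nodes for "John", " ", "Smith" (both names tagged PPN) yields a single "John Smith" node, mirroring the second example above.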

Parsing

Parsing.png

The parsing module performs the syntactic analysis of the normalized input. It converts the list structure (a sequence of tokens) into a tree structure (a hierarchy of tokens). The parsing follows some general procedures coming from the X-bar theory and results in a tree structure with binary branching. Language-specific parsing rules are described in the language grammar; the parsing performed by the Default Grammar covers only very general structures. It is done in five steps:

  • Complementation, when the system tries to create intermediate projections (XB) out of the combination of the existing intermediate projections with possible complement candidates (XB + COMP > XB, or COMP + XB > XB)
  • Adjunction, when the system tries to create intermediate projections (XB) out of the combination of the existing intermediate projections with possible adjunct candidates (XB + ADJT > XB, or ADJT + XB > XB)
  • Specification, when the system tries to create maximal projections (XP) out of the combination of the existing intermediate projections with possible specifier candidates (SPEC + XB > XP, or XB + SPEC > XP)
  • Maximal projection, when the system tries to create maximal projections (XP) out of intermediate projections when there is no possible combination (no complement, adjunct or specifier)
  • Intermediate projection, when the system tries to create intermediate projections (XB) out of the heads when there is no possible combination (no complement or adjunct)

As seen above, the parsing module of the Default Grammar is highly dependent on the X-bar configuration. Consider, for instance, the sentence "John killed Mary yesterday", supposing that the constituents have already been analyzed as:

  • John = NP
  • killed = V
  • Mary = NP
  • yesterday = AP

The following rules are applied:

  1. First projection: (V,^proj,^XB,^XP):=(+XB=VB);
    V projects VB, whatever the case: V[killed] > VB[killed]
  2. Complementation: (VB,%vb)(NP,%np):=(VB(%vb,+proj;%np,+comp,+proj),+XB=VB,+LEX=V,%new);
    VB + NP, i.e., VB[killed] NP[Mary], projects another VB: VB[killed Mary]
  3. Adjunction: (VB,%vb)(AP,%ap):=(VB(%vb,+proj;%ap,+adjt,+proj),+XB=VB,+LEX=V,%new);
    VB + AP, i.e., VB[killed Mary] AP[yesterday], projects another VB: VB[killed Mary yesterday]
  4. Specification: (NP,%np)(VB,%vb):=(VP(%vb,+proj;%np,+spec,+proj),+XP=VP,+LEX=V,%new);
    NP + VB, i.e., NP[John] VB[killed Mary yesterday], projects a VP: VP[John killed Mary yesterday]

Unfortunately, the rules cannot be so direct, because they are applied from left to right, and the system must wait for the processing of all possible candidates before starting this arborization process. That is why in most cases the rules are context-sensitive, as follows:

(DP,%dp)(NB,%nb)({^D,^J,^N,^P|PUT|STAIL|CTAIL},%right):=(NP(%nb,+proj;%dp,+spec,+proj),+XP=NP,+LEX=N,%new)(%right);
DP + NB will form an NP if, and only if, the NB is not followed by any determiner, adjective, noun or preposition (because they would still be part of the NB), or if the NB is followed by a punctuation sign (PUT), the end of the sentence (STAIL) or the end of the scope (CTAIL)
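The four rules applied to "John killed Mary yesterday" can be traced with a small Python sketch. It is only an illustration: the rule table, the `parse` and `bracket` helpers and the tuple layout are hypothetical, and the fixed rule priority (complementation, then adjunction, then specification) is an assumption that stands in for the context-sensitive conditions of the real rules.

```python
# (left category, right category, label of the new projection)
RULES = [
    ("VB", "NP", "VB"),  # complementation
    ("VB", "AP", "VB"),  # adjunction
    ("NP", "VB", "VP"),  # specification
]

def parse(tokens):
    """tokens: list of (category, word). Returns the list of remaining nodes."""
    # First projection: V projects VB, whatever the case
    nodes = [("VB" if cat == "V" else cat, word) for cat, word in tokens]
    reduced = True
    while reduced:
        reduced = False
        for left, right, result in RULES:
            for i in range(len(nodes) - 1):
                if nodes[i][0] == left and nodes[i + 1][0] == right:
                    # Replace the two adjacent nodes by their projection
                    nodes[i:i + 2] = [(result, (nodes[i], nodes[i + 1]))]
                    reduced = True
                    break
            if reduced:
                break  # restart from the highest-priority rule
    return nodes

def bracket(node):
    """Render a node in the labelled-bracket notation used above."""
    label, content = node
    if isinstance(content, str):
        return f"{label}[{content}]"
    return f"{label}[{bracket(content[0])} {bracket(content[1])}]"
```

Without the restart after each reduction, specification could fire before complementation and adjunction have consumed their candidates, which is the problem the context-sensitive rules are meant to avoid.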


Transformation

Transformation.png

The transformation module converts a surface syntactic (tree) structure into a deep syntactic (tree) structure.
The output of the parsing is a surface (tree) structure. This surface structure does not yet contain some dependency relations that were not represented directly inside the list and which are important in the UNLization process. For instance, in the case of "John did not kill Mary", the NP "John" will be represented at the position of specifier of the IP "did not kill Mary", but it is important to move it to the position of specifier of the VP "kill Mary". In order to do that, we have to convert the surface structure into a deep structure. This is done by movement rules such as the following:

IP(IB(;VP(;e));%s,^e):=IP(IB(;VP(;%s));%e,+e); 
Moves the non-empty (^e) node %s, which is at the position of specifier of IP, to the position of specifier of VP that is a complement of this IP, if this position is empty (e).
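The effect of this movement rule can be pictured with a short Python sketch. The tuple layout (label, head part, specifier), the `EMPTY` marker and the function name `lower_subject` are assumptions made for illustration only.

```python
EMPTY = "e"  # marker for an empty position (assumed representation)

def lower_subject(ip):
    """Move the specifier of IP into an empty specifier of its VP complement.

    ip = ("IP", ("IB", infl, ("VP", vb, vp_spec)), ip_spec)
    """
    label, (ib_label, infl, vp), ip_spec = ip
    vp_label, vb, vp_spec = vp
    if label == "IP" and vp_label == "VP" and vp_spec == EMPTY and ip_spec != EMPTY:
        # %s goes down to the VP; the IP specifier becomes empty (+e)
        return ("IP", (ib_label, infl, ("VP", vb, ip_spec)), EMPTY)
    return ip  # rule does not apply
```

For "John did not kill Mary", the input has "John" as specifier of IP and an empty VP specifier; the output has the two positions swapped.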


Dearborization

Dearborization.png

The UNL graph is a network rather than a tree. In order to be converted to UNL, the deep syntactic structure must be "dearborized", i.e., transformed into a network structure. This is done by rewriting X-bar relations (XP,XB) as head-driven syntactic relations (XS,XC,XA) as indicated in X-bar, i.e.:

XP(XB(XB(HEAD;COMP);ADJT);SPEC) = XC(HEAD;COMP)XA(HEAD;ADJT)XS(HEAD;SPEC)

In the Default Grammar, dearborization is performed by rules of the type:

XP(XB(%x;%y);%z):=XB(%x;%y)XS(%x;%z);
If there are three nodes %x, %y and %z, such that %x and %y project an XB, and this XB, along with %z, projects an XP, then rewrite this as two relations having %x as first argument: XB(%x;%y) and XS(%x;%z)
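The equivalence XP(XB(XB(HEAD;COMP);ADJT);SPEC) = XC(HEAD;COMP)XA(HEAD;ADJT)XS(HEAD;SPEC) can be reproduced by a recursive walk down the head path. This Python sketch assumes that every projection is stored as a (label, child, dependent, role) tuple with the role ("comp", "adjt" or "spec") already marked, and that dependents are plain strings; both are simplifications, and `dearborize` is a hypothetical name.

```python
ROLE_TO_RELATION = {"comp": "XC", "adjt": "XA", "spec": "XS"}

def dearborize(node):
    """Return (head, relations) for an X-bar tree.

    A leaf is a string (the head); a projection is
    (label, child, dependent, role).
    """
    if isinstance(node, str):
        return node, []
    _label, child, dependent, role = node
    head, relations = dearborize(child)
    # Each projection level becomes one head-driven binary relation
    return head, relations + [(ROLE_TO_RELATION[role], head, dependent)]
```

The innermost relation is emitted first, so the output order follows the tree from the head outwards.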


Interpretation

Interpretation.png

The interpretation module simply maps the syntactic network into a semantic network by analyzing the arguments of each relation. It is formed by rules of the type:

/[ACDIJNPV][ACS]/(%x;%y,tim):=tim(%x;%y);
if there is any syntactic relation (AA,AC,AS,CA,CC,CS,...) between two nodes %x and %y, and %y has the feature "tim", then rewrite this (syntactic) relation as a semantic relation of the type "tim" between these two nodes.
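The regular expression in the rule name matches any two-letter syntactic relation. A minimal Python sketch of the same mapping follows, assuming relations are (name, first argument, second argument) triples and node features are kept in a separate dictionary; `interpret` is a hypothetical helper, not part of the UNL tools.

```python
import re

# Same pattern as in the rule: a head category followed by A, C or S
SYNTACTIC = re.compile(r"[ACDIJNPV][ACS]")

def interpret(relations, features):
    """Rewrite syntactic relations whose second argument has the feature "tim"."""
    result = []
    for name, x, y in relations:
        if SYNTACTIC.fullmatch(name) and "tim" in features.get(y, set()):
            result.append(("tim", x, y))
        else:
            result.append((name, x, y))  # left for other rules
    return result
```

Only the relation whose second argument carries "tim" is rewritten; the remaining relations would be handled by analogous rules for other semantic relations.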


Post-processing

The post-processing module simply adjusts the resulting graph to the UNL standards, in order to eliminate contradictions and redundancies. For instance, the rule

(@pl,{@multal|@paucal|@all|@both}):=(-@pl); eliminates the redundancy of @pl

is used to eliminate the redundancy between @pl and @multal. As @multal already conveys the idea of plural, @multal.@pl is redundant and should be fixed.
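The same check is easy to state over a plain set of attributes. The Python sketch below assumes the attributes of a node are available as a set of strings; the function name is hypothetical.

```python
# Attributes that already imply plurality, as listed in the rule
PLURAL_IMPLYING = {"@multal", "@paucal", "@all", "@both"}

def drop_redundant_plural(attributes):
    """Remove @pl when another attribute already conveys plurality."""
    attributes = set(attributes)
    if "@pl" in attributes and attributes & PLURAL_IMPLYING:
        attributes.discard("@pl")
    return attributes
```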

UNL>NL Default Grammar (NLization)

The UNL>NL Default Grammar is divided into 6 sections

  1. Pre-processing (prepares the input for the processing)
  2. Arborization (converts the syntactic network into a syntactic tree)
  3. Transformation (converts the deep syntactic structure into the surface syntactic structure)
  4. Linearization (converts the syntactic structure into a list structure)
  5. Morphological generation (inflects the words that need to be inflected)
  6. Post-processing (adjusts the final output)

Pre-processing

Arborization

Arborization.png

The arborization module converts a syntactic network into a syntactic tree. It is the same as the dearborization module, but in the opposite direction. It converts the network into a tree because trees are much more suitable for linearization than networks. Arborization rules are of the following type:

VB(%x;%y)VS(%x;%z):=VP(VB(%x;%y);%z);
VB(%x;%y)VA(%x;%z):=VB(VB(%x;%y);%z);
VB(%x;%y)VC(%x;%z):=VB(VB(%x;%y);%z);
VC(%x;%y):=VB(%x;%y);
VA(%x;%y):=VB(%x;%y);
VS(%x;%y):=VP(%x;%y); 
For instance, according to the first rule, if there is a VB and a VS sharing the same head, a VP is created between the VB and the second argument of VS.
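The six rules above can be approximated by folding the relations of one verbal head into a single tree. The Python sketch below is an assumption-laden illustration: it handles only relations that share one head, and it fixes the attachment order as complement, then adjunct, then specifier; `arborize` is a hypothetical name.

```python
ATTACH_ORDER = {"VC": 0, "VA": 1, "VS": 2}  # comp, then adjunct, then spec

def arborize(relations):
    """Build a tree from (name, head, dependent) relations sharing one head.

    VC and VA create VB projections; VS creates the VP.
    """
    tree = None
    for name, head, dependent in sorted(relations, key=lambda r: ATTACH_ORDER[r[0]]):
        label = "VP" if name == "VS" else "VB"
        # The first relation attaches to the bare head, later ones to the tree
        tree = (label, head if tree is None else tree, dependent)
    return tree
```

A lone VS(%x;%y) yields VP(%x;%y) directly, which corresponds to the last rule in the list.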


Transformation

Transformation2.png

In the UNL>NL Default Grammar, the transformation module transforms the deep syntactic structure derived from the arborization into a surface structure. If, in the UNLization process, the specifier of IP was moved to the position of specifier of VP, it now goes back to its original position, in order to be generated in the right place (otherwise, in English for instance, the subject would be generated after the auxiliary).

Linearization

Linearization.png

The linearization module converts a tree structure into a list structure. It is formed by rules of the following type:

VP(%x;%y):=(%y,+>BLK)(%x);
if there is a VP between %x and %y, remove this relation and write %y in front of %x.
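A recursive traversal gives the same result as repeated application of linearization rules. In the Python sketch below, only the VP case (specifier written in front of the head part) comes from the rule above; placing complements and adjuncts after the head is an assumption that suits English, and the >BLK marking is left out for brevity. The function name and tuple layout are hypothetical.

```python
def linearize(node):
    """Flatten a (label, head_part, dependent) tree into a list of words."""
    if isinstance(node, str):
        return [node]
    label, head_part, dependent = node
    if label.endswith("P"):
        # Maximal projection: the specifier is written in front
        return linearize(dependent) + linearize(head_part)
    # Intermediate projection: dependent follows the head (assumed for English)
    return linearize(head_part) + linearize(dependent)
```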


Morphological generation

The morphological generation module comprises one single rule:

(%x,^inflected,FLX):=(%x,!FLX,+inflected);

This rule triggers the rules extracted from the inflectional paradigms (grammar) or from the inflectional rules (dictionary). This is done through the operator "!" which, in this case, applies all the rules contained in the field FLX. For instance, the English grammar contains the following rule, which is copied to the entry during the NLization process:

(%x,M2):=(%x,-M2,+FLX(SNG:=0>""; PLR:=0>"s";));
if the word belongs to the paradigm M2, then copy to this word the rule FLX(SNG:=0>""; PLR:=0>"s";), i.e., to form the singular, add nothing, and to form the plural, add "s" to the end of the word.
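The interplay between the paradigm and the single generation rule can be sketched in Python. The dictionary `M2` below encodes the paradigm as a feature-to-suffix mapping, which is a simplification of the FLX notation (it covers only suffixing of the form 0>"..."); the names `M2` and `inflect` are chosen for illustration.

```python
# Paradigm M2 from the example: SNG adds nothing, PLR adds "s"
M2 = {"SNG": "", "PLR": "s"}

def inflect(word, features, flx):
    """Apply the first matching suffix rule once, then mark the word as inflected."""
    if "inflected" in features:
        return word, features  # ^inflected fails: the rule must not fire again
    for feature, suffix in flx.items():
        if feature in features:
            return word + suffix, features | {"inflected"}
    return word, features  # no applicable inflection rule
```

The "inflected" feature plays the same role as in the rule: it prevents the paradigm from being applied a second time to the same word.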

Post-processing

The post-processing module adjusts the list to the standards of natural language output. It contains rules to eliminate remaining scopes, to add blank spaces, to capitalize the first word in a sentence, etc.

((%x)(%y)(%z)):=(%x)(%y)(%z);
((%x)(%y)):=(%x)(%y);
removes remaining scopes

(%x,>BLK)(%y,^BLK,^STAIL):=(%x,->BLK)(" ",+BLK)(%y);
inserts blank spaces after the words requiring a blank space (the feature >BLK was inserted during the linearization step)

(%x,>COMMA):=(%x,->COMMA)(",",PUT=COMMA);
inserts a comma after the words requiring a comma (the feature >COMMA was inserted in previous steps)
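The blank-insertion rule can be illustrated with a short Python sketch. It assumes the list is a sequence of (text, feature set) pairs and that the sentence-final node carries STAIL; the function name `insert_blanks` is hypothetical.

```python
def insert_blanks(nodes):
    """Join nodes into a string, adding a blank after each node marked >BLK.

    As in the rule, no blank is added when the next node is itself a blank
    (BLK) or the end of the sentence (STAIL).
    """
    pieces = []
    for i, (text, features) in enumerate(nodes):
        pieces.append(text)
        # Treat the position after the last node as end of sentence
        next_features = nodes[i + 1][1] if i + 1 < len(nodes) else {"STAIL"}
        if (">BLK" in features
                and "BLK" not in next_features
                and "STAIL" not in next_features):
            pieces.append(" ")
    return "".join(pieces)
```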