Grammar Specs

From UNL Wiki

Revision as of 23:07, 17 April 2009

UNL-NL grammars are sets of rules for translating UNL expressions into natural language (NL) sentences and NL sentences into UNL expressions. They are normally unidirectional, i.e., the enconversion grammar (NL-to-UNL) is different from the deconversion grammar (UNL-to-NL), even though they share the same basic syntax. In order to standardize the language resources in the UNL framework, the UNDL Foundation recommends the adoption of the following specifications for both UNL-to-NL and NL-to-UNL grammars. This formalism is not supported by the UNL Centre's tools, however; it is required only for use with the UNDL Foundation's tools.

Types of rules

In the UNL Grammar there are two basic types of rules:

Transformation rules
Used to generate natural language sentences out of UNL graphs and vice-versa.
Disambiguation rules
Used to improve the performance of transformation rules by constraining their applicability.

The Transformation Rules follow the very general formalism

α:=β;

where the left side α is a condition statement, and the right side β is an action to be performed over α.

The Disambiguation Rules, which were directly inspired by the UNL Centre's former co-occurrence dictionary and knowledge base, follow a slightly different formalism:

α=P;

where the left side α is a statement and the right side P is an integer from 0 to 255 that indicates the probability of occurrence of α.
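
The two formalisms can be told apart by their operator alone. The following Python sketch is only an illustration and not part of the UNDL Foundation's tools: the names `parse_rule`, `TransformationRule` and `DisambiguationRule` are hypothetical, and the sketch assumes that `:=` never occurs inside a disambiguation statement.

```python
import re
from typing import NamedTuple, Union


class TransformationRule(NamedTuple):
    condition: str  # left side (alpha)
    action: str     # right side (beta)


class DisambiguationRule(NamedTuple):
    statement: str    # left side (alpha)
    probability: int  # right side (P), an integer from 0 to 255


def parse_rule(text: str) -> Union[TransformationRule, DisambiguationRule]:
    """Split one rule into its two sides (hypothetical helper)."""
    body = text.strip()
    if not body.endswith(";"):
        raise ValueError("a rule must end with ';'")
    body = body[:-1]
    # Assumption: ':=' marks a transformation rule and nothing else.
    if ":=" in body:
        condition, action = body.split(":=", 1)
        return TransformationRule(condition.strip(), action.strip())
    # Otherwise expect 'statement=P', where P is the text after the last '='.
    statement, separator, value = body.rpartition("=")
    value = value.strip()
    if not separator or not re.fullmatch(r"[0-9]+", value):
        raise ValueError("expected 'statement=P;' with an integer P")
    probability = int(value)
    if not 0 <= probability <= 255:
        raise ValueError("P must be between 0 and 255")
    return DisambiguationRule(statement.strip(), probability)
```

For instance, `parse_rule("a:=b;")` yields a transformation rule with condition `a` and action `b`, whereas `parse_rule("(a,b)=200;")` yields a disambiguation rule with probability 200.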

We present both types of rules and their role in the UNL System. We introduce, first, the basic symbols that are used both by transformation and disambiguation rules; next, we present the transformation rules and their several subtypes; and finally we present the disambiguation rules.

Basic symbols

Both transformation and disambiguation rules use the same set of basic symbols:

Basic symbols used in UNL grammar rules

Symbol | Definition | Example
? | any letter or digit | ?b = 1b, 2b, ab, bb
$ | any letter | $b = ab, bb, cb, db
# | any digit | #b = 1b, 2b, 3b, 4b
* | any sequence of letters or digits | *b = 1b, 11b, 111b, …
“ ” | string | “buy” = “buy”
( , ) | complex expression |
, | and | a,b = a and b
^ | not | ^a = not a
( ) | optional | a(b)c = ac or abc
{ } | or | {a,b} = a or b
* | repeated one or more times | a* = a, aa, aaa, …
& and + | blank space | a&b = a b
+ | add | +a = add a
- | remove | -a = remove a
%## | placeholder for nodes | %01
&## | placeholder for attributes | &01
# | placeholder for NLWs | #01
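
As a rough illustration of the four wildcard symbols (`?`, `$`, `#` and prefix `*`), the Python sketch below translates a pattern into a regular expression. It is not the matcher used by the UNDL Foundation's tools. The helper names are hypothetical, letters are assumed to be ASCII only, `*` is assumed to stand for at least one character (the examples in the table never show an empty match), and the remaining operators are not handled.

```python
import re

# Assumed translation of each wildcard into a regular-expression fragment.
_WILDCARDS = {
    "?": "[A-Za-z0-9]",   # any letter or digit
    "$": "[A-Za-z]",      # any letter (ASCII only in this sketch)
    "#": "[0-9]",         # any digit
    "*": "[A-Za-z0-9]+",  # any sequence of letters or digits (assumed non-empty)
}


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Compile a wildcard pattern; every other character is taken literally."""
    parts = [_WILDCARDS.get(ch, re.escape(ch)) for ch in pattern]
    return re.compile("".join(parts))


def matches(pattern: str, text: str) -> bool:
    """Return True if the whole text fits the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(text) is not None
```

Under these assumptions, `matches("#b", "4b")` holds, while `matches("$b", "1b")` does not, because `$` accepts letters only.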



The following symbols are related to the structure of the UNL Dictionary, as illustrated below:

[NLW] {} “UW” (aa=AA, bb=BB, …) <L,F,P>;
[book] {} “book(icl>document)” (pos=NOU) <EN,0,0>;
[foot] {} “foot(icl>vertebrate foot)” (pl=“feet”) <EN,0,0>;
[baby] {} “baby(icl>child)” (pl=“y”>“ies”) <EN,0,0>;
[[bring] [back]] {} “to bring back(icl>to bring)” (pos=VER) <EN,0,0>;


Basic symbols used for dictionary entries

Symbol | Definition | Example
[ ] | NLW | [book]
[[ ]] | UW | [[book(icl>document)]]
lower-case string (LCS) | Attribute | pos
upper-case string (UCS) | Value | NOU
LCS=UCS | Feature | pos=NOU
LCS:=“nlwv” | variant of the NLW in the case of LCS | pl:=“feet”
LCS:=mod | modification of the NLW in the case of LCS | pl:=“y”>“ies”
[[NLW][NLW]] | segmentation of NLW for infixation | [[bring] [back]]
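
The dictionary entry layout shown above can be split into its fields with a single regular expression. The Python sketch below is a simplified illustration under several assumptions: `parse_entry` is a hypothetical helper, only simple entries of the form `[NLW] {} "UW" (features) <L,F,P>;` are handled (segmented NLWs such as `[[bring] [back]]` are not), typographic quotes are treated as plain double quotes, and the fields `{}`, `L`, `F` and `P` are returned as uninterpreted strings because their meaning is not defined in this section.

```python
import re

# One simple entry: [NLW] {} "UW" (aa=AA, bb=BB, ...) <L,F,P>;
_ENTRY = re.compile(
    r'''\[(?P<nlw>[^\[\]]*)\]\s*           # [NLW]
        \{(?P<braces>[^{}]*)\}\s*          # {} field, kept as-is
        "(?P<uw>[^"]*)"\s*                 # "UW"
        \((?P<features>[^()]*)\)\s*        # (aa=AA, bb=BB, ...)
        <(?P<l>[^,<>]*),(?P<f>[^,<>]*),(?P<p>[^,<>]*)>\s*;  # <L,F,P>;
    ''',
    re.VERBOSE,
)

# Assumption: curly quotes are equivalent to straight double quotes.
_QUOTES = {ord("\u201c"): '"', ord("\u201d"): '"'}


def parse_entry(line: str) -> dict:
    """Split a simple dictionary entry into its fields (hypothetical helper)."""
    match = _ENTRY.fullmatch(line.strip().translate(_QUOTES))
    if match is None:
        raise ValueError("not a simple dictionary entry")
    features = {}
    for item in match.group("features").split(","):
        name, separator, value = item.partition("=")
        if separator:
            features[name.strip()] = value.strip()
    return {
        "nlw": match.group("nlw"),
        "uw": match.group("uw"),
        "features": features,
        "l": match.group("l").strip(),
        "f": match.group("f").strip(),
        "p": match.group("p").strip(),
    }
```

Applied to the `[book]` entry above, the sketch returns the NLW `book`, the UW `book(icl>document)` and the single feature `pos=NOU`.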

Transformation rules

Natural language sentences and UNL graphs are supposed to convey the same amount of information in different structures: whereas the former arranges data as an ordered list of words, the latter organizes it as a hypergraph. In that sense, translating from natural language into UNL and from UNL into natural language is ultimately a matter of transforming lists into networks and vice-versa.

Both EUGENE and IAN, the UNDLF generation and analysis tools respectively, assume that this transformation should be carried out progressively, through a transitional data structure: the tree, which serves as an interface between lists and networks. Accordingly, the UNL Grammar defines seven types of rules (LL, TT, NN, LT, TL, TN, NT), as indicated below:

  • ANALYSIS (NL-UNL)
    • LL - List Processing (list-to-list)
    • LT - Surface-Structure Formation (list-to-tree)
    • TT - Syntactic Processing (tree-to-tree)
    • TN - Deep-Structure Formation (tree-to-network)
    • NN - Semantic Processing (network-to-network)
  • GENERATION (UNL-NL)
    • NN - Semantic Processing (network-to-network)
    • NT - Deep-Structure Formation (network-to-tree)
    • TT - Syntactic Processing (tree-to-tree)
    • TL - Surface-Structure Formation (tree-to-list)
    • LL - List Processing (list-to-list)

The whole process is illustrated in Diagram 1 below. The original NL sentence is first preprocessed by the LL rules to become an ordered list. The resulting List Structure is then parsed with the LT rules to unveil its Surface Syntactic Structure, which is already a tree. This Tree Structure is further processed by the TT rules to expose its inner organization, the Deep Syntactic Structure, which is expected to be more suitable for semantic interpretation. The Deep Syntactic Structure is then projected into a semantic network by the TN rules. Finally, the resulting semantic network is post-edited by the NN rules to comply with the UNL Standards and generate the UNL Graph.

The reverse process is carried out during natural language generation. The UNL graph is preprocessed by the NN rules in order to become a more easily tractable semantic network. The resulting Network Structure is converted, by the NT rules, into a syntactic structure, which is still distant from the surface structure, as it is directly derived from the semantic arrangement. This Deep Syntactic Structure is subsequently transformed into a Surface Syntactic Structure by the TT rules. The Surface Syntactic Structure then undergoes further changes according to the TL rules, which generate an NL-like List Structure. This List Structure is finally realized as a natural language sentence by the LL rules.
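
The ordering of rule types in both directions can be checked mechanically, since each two-letter rule name gives its input and output structure. The Python sketch below is only an illustration of that ordering and does not reflect how EUGENE or IAN are implemented; the names `ANALYSIS`, `GENERATION` and `trace` are hypothetical.

```python
# Rule types in the order described above; each name is <input><output>.
ANALYSIS = ["LL", "LT", "TT", "TN", "NN"]    # NL -> UNL
GENERATION = ["NN", "NT", "TT", "TL", "LL"]  # UNL -> NL

STRUCTURES = {"L": "list", "T": "tree", "N": "network"}


def trace(pipeline, start):
    """Return the data structure held before and after each rule type.

    Raises ValueError if a rule type does not accept the structure
    produced by the previous one.
    """
    current = start
    states = [STRUCTURES[current]]
    for rule_type in pipeline:
        source, target = rule_type  # e.g. "LT" -> ("L", "T")
        if source != current:
            raise ValueError(
                f"{rule_type} expects a {STRUCTURES[source]}, "
                f"got a {STRUCTURES[current]}"
            )
        current = target
        states.append(STRUCTURES[current])
    return states
```

Calling `trace(ANALYSIS, "L")` walks from a list through a tree to a network, and `trace(GENERATION, "N")` walks the same structures in reverse. Together the two pipelines use exactly the seven rule types listed above.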

Software