Grammar Specs

From UNL Wiki
UNL grammars are sets of rules for translating UNL expressions into natural language (NL) sentences and NL sentences into UNL expressions. They are normally unidirectional, i.e., the [[UNL-ization]] grammar (NL-to-UNL) is different from the [[NL-ization]] grammar (UNL-to-NL), even though they share the same basic syntax.
== Basic symbols ==

{| border="1" cellpadding="2" align=center
|+Basic symbols used in UNL grammar rules
!Symbol
!Definition
!Example
|-
|align=center|<nowiki>^</nowiki>
|not
|^a = not a
|-
|align=center|{ | }
|or
|<nowiki>{a|b}</nowiki> = a or b
|-
|align=center|%
|index for nodes, attributes and values
|%x (see [[#Indexes|below]])
|-
|align=center|#
|index for sub-NLWs
|#01 (see [[#Indexes|below]])
|-
|align=center|=
|attribute-value assignment
|POS=NOU
|-
|align=center|!
|rule trigger
|!PLR
|-
|align=center|&
|merge operator
|%x&%y
|-
|align=center|?
|dictionary lookup operator
|?[a]
|-
|align=center|" "
|string
|"went"
|-
|align=center|[ ]
|natural language entry (headword)
|[go]
|-
|align=center|<nowiki>[[ ]]</nowiki>
|UW
|<nowiki>[[to go(icl>to move)]]</nowiki>
|-
|align=center|( )
|node
|(a)
|-
|align=center|/ /
|regular expression
|/a{2,3}/ = aa, aaa
|}

;The differences between "", [] and <nowiki>[[]]</nowiki>
:Double quotes always represent strings: "a" matches only the string "a".
:Single square brackets always represent natural language entries (headwords) in the dictionary: [a] matches the node associated with the entry [a] retrieved from the dictionary, regardless of its current realization, which may have been affected by other rules (the original [a] may have been replaced, for instance, by "b", but the node is still indexed to the entry [a]).
:Double square brackets always represent UWs: <nowiki>[[a]]</nowiki> matches the node associated with the UW <nowiki>[[a]]</nowiki>.
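
The distinction can be illustrated with a minimal Python sketch. The <code>Node</code> class and the <code>matches</code> function are hypothetical names introduced only for this illustration; they are not part of any UNL tool. The sketch assumes that a rule may rewrite the string of a node while its headword and UW stay unchanged.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical in-memory node; the field names are illustrative only."""
    string: Optional[str] = None    # current surface form, written "..."
    headword: Optional[str] = None  # dictionary entry, written [...]
    uw: Optional[str] = None        # Universal Word, written [[...]]


def matches(node, *, string=None, headword=None, uw=None):
    """Return True if the node satisfies every constraint that was given."""
    if string is not None and node.string != string:
        return False
    if headword is not None and node.headword != headword:
        return False
    if uw is not None and node.uw != uw:
        return False
    return True


# A node retrieved from the dictionary entry [a] ...
node = Node(string="a", headword="a", uw="a")
# ... whose surface string is later replaced by "b" (assumed rule effect).
node.string = "b"

print(matches(node, string="a"))    # False: the current string is "b"
print(matches(node, headword="a"))  # True: still indexed to the entry [a]
print(matches(node, uw="a"))        # True: the UW is unchanged
```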

;Predefined values (assigned by default)
:SCOPE - Scope
:SHEAD - Sentence head (the beginning of a sentence)
:STAIL - Sentence tail (the end of a sentence)
:CHEAD - Scope head (the beginning of a scope)
:CTAIL - Scope tail (the end of a scope)
:TEMP - Temporary entry (entry not found in the dictionary)
:DIGIT - Any sequence of digits (i.e., 0, 1, 2, 3, 4, 5, 6, 7, 8, 9)

== Basic concepts ==
=== Nodes ===
A node is the most elementary unit in the grammar. It is the result of the [[tokenization]] process and corresponds to the notion of "lexical item", as represented by dictionary entries. At the surface level, a natural language sentence is considered a list of nodes, and a UNL graph a set of relations between nodes. Every node is a vector (one-dimensional array) containing the following elements:
*a string, represented between "quotes", which expresses the current state of the node;
*a headword, represented between [square brackets], which expresses the original value of the node in the dictionary;
*a UW, represented between <nowiki>[[double square brackets]]</nowiki>, which expresses the UW value of the node;
*a feature or set of features, which express the properties of the node;
*an [[#Indexes|index]], preceded by the symbol %, which is used to reference the node.
Examples of nodes are:
*("ing") (a node making reference only to its current string value)
*([book]) (a node making reference only to its headword, i.e., its original state in the dictionary)
*(<nowiki>[[book(icl>document)]]</nowiki>) (a node making reference only to its UW value)
*(NUM) (a node making reference only to one of its features)
*(POS=NOU) (a node making reference only to one of its features in the attribute-value pair format)
*(%x) (a node making reference only to its unique index)
*("string",[headword],<nowiki>[[UW]]</nowiki>,feature1,feature2,...,attribute1=value1,attribute2=value2,...,%x) (complete node)
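
The node notation can be read mechanically. The following Python sketch is one possible reading of it, not the parser used by any UNL engine: <code>split_top_level</code> and <code>parse_node</code> are hypothetical helpers, and storing attribute-value pairs in a separate dictionary is an assumption made for the illustration. Hyper-nodes are not handled.

```python
def split_top_level(text, sep=","):
    """Split on `sep`, ignoring separators inside quotes or brackets."""
    parts, current, depth, in_quotes = [], [], 0, False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes and ch in "([{":
            depth += 1
        elif not in_quotes and ch in ")]}":
            depth -= 1
        if ch == sep and depth == 0 and not in_quotes:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def parse_node(text):
    """Parse a simple node such as ("a",[a],[[a]],A,B,A=C,%a) into a dict."""
    text = text.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError("a node must be enclosed in parentheses")
    node = {"string": None, "headword": None, "uw": None,
            "index": None, "features": set(), "attributes": {}}

    def set_once(key, value):
        # Only one string, headword, UW and index are allowed per node.
        if node[key] is not None:
            raise ValueError(f"a node may not contain more than one {key}")
        node[key] = value

    for item in split_top_level(text[1:-1]):
        if item.startswith('"') and item.endswith('"'):
            set_once("string", item[1:-1])
        elif item.startswith("[[") and item.endswith("]]"):
            set_once("uw", item[2:-2])
        elif item.startswith("[") and item.endswith("]"):
            set_once("headword", item[1:-1])
        elif item.startswith("%"):
            set_once("index", item)
        elif "=" in item:
            attribute, value = item.split("=", 1)
            node["attributes"][attribute] = value
        elif item:
            node["features"].add(item)
    return node


print(parse_node('("a",[a],[[a]],A,B,A=C,%a)'))
```

Because features are kept in a set and the other elements in named slots, two nodes written with their elements in a different order parse to equal dictionaries.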
==== Properties of nodes ====
;Nodes are enclosed between (parentheses)
:("a") is a node
:"a" is not a node
;The elements of a node are separated by commas
:("a",[a],<nowiki>[[a]]</nowiki>,A,B,A=C,%a)
;The order of elements inside a node is not relevant
:("a",[a],<nowiki>[[a]]</nowiki>,A,B,A=C,%a) is the same as (<nowiki>[[a]]</nowiki>,B,A,"a",[a],A=C,%a)
;A node may have only one string, one headword, one UW and one index, but as many features as necessary
:<strike>("a","b")</strike> (a node may not contain more than one string)
:<strike>([a],[b])</strike> (a node may not contain more than one headword)
:<strike>(<nowiki>[[a]]</nowiki>,<nowiki>[[b]]</nowiki>)</strike> (a node may not contain more than one UW)
:<strike>(%a,%b)</strike> (a node may not contain more than one index)
:(A,B,C,D,...,Z) (a node may contain as many features as necessary)
;A node may be referred to by any of its elements
:("a") refers to all nodes where current string = "a"
:([a]) refers to all nodes where headword = [a]
:(<nowiki>[[a]]</nowiki>) refers to all nodes where UW = <nowiki>[[a]]</nowiki>
:(A) refers to all nodes having the feature A
:("a",[a],<nowiki>[[a]]</nowiki>,A) refers to all nodes having the feature A where string = "a" and headword = [a] and UW = <nowiki>[[a]]</nowiki>
;Nodes are automatically indexed according to a position-based system if no explicit index is provided (see [[#Indexes|Index]])
:("a")("b") is actually ("a",%01)("b",%02)
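
A minimal Python sketch of position-based indexing follows. The function name <code>auto_index</code> and the dictionary representation of nodes are hypothetical; the two-digit format is taken from the example above, and keeping explicit indexes untouched is an assumption.

```python
def auto_index(nodes):
    """Give every node without an explicit index a position-based one."""
    indexed = []
    for position, node in enumerate(nodes, start=1):
        node = dict(node)  # copy, so the input list is left untouched
        if node.get("index") is None:
            # Two-digit, 1-based position, as in ("a",%01)("b",%02)
            node["index"] = f"%{position:02d}"
        indexed.append(node)
    return indexed


nodes = [{"string": "a"}, {"string": "b"}]
print(auto_index(nodes))
# [{'string': 'a', 'index': '%01'}, {'string': 'b', 'index': '%02'}]
```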
;[[Regular expressions]] may be used to make reference to any element of the node, except the index
:("/a{2,3}/") refers to all nodes where string is a sequence of 2 to 3 characters "a"
:([/a{2,3}/]) refers to all nodes where headword is a sequence of 2 to 3 characters "a"
:(<nowiki>[[/a{2,3}/]]</nowiki>) refers to all nodes where UW is a sequence of 2 to 3 characters "a"
:(/a{2,3}/) refers to all nodes having a feature that is a sequence of 2 to 3 characters "a"
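
The effect of such a pattern can be checked with Python's <code>re</code> module. The sketch below assumes that the pattern must cover the whole element, which is what the descriptions above suggest; whether a given UNL engine anchors its patterns this way is not confirmed here. The helper name <code>element_matches</code> is hypothetical.

```python
import re

# The pattern written between slashes in the rule: /a{2,3}/
pattern = re.compile(r"a{2,3}")


def element_matches(pattern, value):
    """True if the whole value is matched by the pattern (assumed anchoring)."""
    return pattern.fullmatch(value) is not None


for candidate in ["a", "aa", "aaa", "aaaa", "baa"]:
    print(candidate, element_matches(pattern, candidate))
# a False
# aa True
# aaa True
# aaaa False
# baa False
```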
;Nodes may contain disjunctive features enclosed between {braces} and separated by a vertical bar (|)
:({A|B}) refers to all nodes having the feature A OR B
;Node features may be expressed as simple attributes or as attribute-value pairs
:(MCL) - feature as an attribute: refers to all nodes having the feature MCL
:(GEN=MCL) - feature as an attribute-value pair, which is the same as (GEN,MCL): refers to all nodes having the features GEN and MCL
Attribute-value pairs may be used to create co-reference between different nodes (as in agreement):
:(%x,GEN)(%y,GEN=%x) - the value of the attribute GEN of the node %y is the same as that of the attribute GEN of the node %x (see [[#Indexes|Index]])
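
The agreement condition can be sketched in Python as follows. The function <code>agrees</code>, the attribute tables and the value SNG are hypothetical and serve only to illustrate the co-reference; they do not describe how an engine stores features.

```python
def agrees(nodes, source, target, attribute):
    """Check that `target` carries the same value of `attribute` as `source`."""
    source_value = nodes[source].get(attribute)
    target_value = nodes[target].get(attribute)
    return source_value is not None and source_value == target_value


# Hypothetical attribute tables keyed by node index.
nodes = {
    "%x": {"GEN": "MCL", "NUM": "SNG"},
    "%y": {"GEN": "MCL", "NUM": "PLR"},
}

print(agrees(nodes, "%x", "%y", "GEN"))  # True: both nodes have GEN=MCL
print(agrees(nodes, "%x", "%y", "NUM"))  # False: SNG differs from PLR
```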

=== Relations ===
In order to form a natural language sentence or a UNL graph, nodes are connected by relations. In the UNL framework, there are three different types of relations:
*the '''linear''' relation L expresses the surface structure of natural language sentences;
*'''syntactic''' relations express the deep (tree) structure of natural language sentences;
*'''semantic''' relations express the structure of UNL graphs.
==== Properties of relations ====
;The linear relation is always binary and is represented in two possible formats:
*L(%x;%y), where L is the invariant name of the linear relation, and %x and %y are nodes; or
*(%x)(%y)
;Syntactic relations are not predefined, although we have been using a set of binary relations based on the [[X-bar theory]].
;Semantic relations constitute a predefined and closed set that can be found [[relations|here]].
;Syntactic and semantic relations are represented in the same way:
*rel(%x;%y), where "rel" is the name of the relation, %x is the source node, and %y is the target node
;Arguments of linear, syntactic and semantic relations are not commutative.
:The order of the elements in a relation affects the result:
::(%x)(%y) is different from (%y)(%x)
::relation(%x;%y) is different from relation(%y;%x)
;Linear and semantic relations are always binary; syntactic relations may be n-ary:
:L(%x;%y) - linear relation
:agt(%x;%y) - semantic relation
:VH(%x) - unary syntactic relation
:VC(%x;%y) - binary syntactic relation
:XX(%x;%y;%z) - possible ternary syntactic relation
;Inside each relation, nodes are separated by semicolons (;).
:VC(%x;%y)
:<strike>VC(%x,%y)</strike>
;Inside each relation, a node may be referenced by any of its elements, separated by commas (,):
:("a")([b]) - linear relation between a node where string = "a" and another node where headword = [b]
:L(<nowiki>[[c]]</nowiki>;D) - linear relation between a node where UW = <nowiki>[[c]]</nowiki> and another node having the feature D
:VC(%a;%b) - syntactic relation between a node where index = %a and another node where index = %b
:agt("a",[a],<nowiki>[[a]]</nowiki>,A;"b",[b],<nowiki>[[b]]</nowiki>,B) - semantic relation between a node having the feature A where string = "a" AND headword = [a] AND UW = <nowiki>[[a]]</nowiki> AND another node having the feature B where string = "b" AND headword = [b] AND UW = <nowiki>[[b]]</nowiki>
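
The role of the two separators can be seen in a small Python sketch that splits a flat relation into its name and arguments. The function <code>parse_relation</code> is a hypothetical helper, the restriction of relation names to letters is an assumption, and nested relations or semicolons inside quoted strings are not handled.

```python
import re

# Assumed shape of a flat relation: a name made of letters, then (...)
RELATION = re.compile(r"^([A-Za-z]+)\((.*)\)$")


def parse_relation(text):
    """Parse a flat relation such as VC(%x;%y) into (name, arguments)."""
    match = RELATION.match(text.strip())
    if match is None:
        raise ValueError(f"not a relation: {text!r}")
    name, body = match.groups()
    # Semicolons separate nodes; commas stay inside each node reference.
    arguments = [argument.strip() for argument in body.split(";")]
    return name, arguments


print(parse_relation("VC(%x;%y)"))     # ('VC', ['%x', '%y'])
print(parse_relation("XX(%x;%y;%z)"))  # ('XX', ['%x', '%y', '%z'])
print(parse_relation("VC(%x,%y)"))     # ('VC', ['%x,%y'])
```

In the last call the comma does not separate arguments, so the relation is read as having a single node that carries two indexes, which the node properties above do not allow.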
;Relations may be conjoined through juxtaposition:
:("a")("b")("c") - two linear relations: one between ("a") and ("b") AND another between ("b") and ("c")
:agt(%x;%y)obj(%x;%z) - two semantic relations: one between (%x) and (%y) AND another between (%x) and (%z)
:<strike>VC([a];[b]),VC([a];[c])</strike> - conjoined relations must not be separated by commas
;Relations may be disjoined through {braces}
:{("a")|("b")}("c") - either ("a")("c") or ("b")("c")
:{agt(%x;%y)|exp(%x;%y)}obj(%x;%z) - either agt(%x;%y)obj(%x;%z) or exp(%x;%y)obj(%x;%z)
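
A disjunction can be understood as shorthand for a list of alternative patterns. The Python sketch below expands the braces textually; <code>expand</code> is a hypothetical helper, and the sketch assumes that the pattern contains no regular expressions, whose quantifiers (as in a{2,3}) also use braces.

```python
def expand(pattern):
    """Expand {x|y} groups into the list of alternative patterns."""
    start = pattern.find("{")
    if start == -1:
        return [pattern]

    # Find the brace that closes the first group.
    depth = 0
    for end in range(start, len(pattern)):
        if pattern[end] == "{":
            depth += 1
        elif pattern[end] == "}":
            depth -= 1
            if depth == 0:
                break
    else:
        raise ValueError("unbalanced braces")

    # Split the group on the vertical bars that are not nested deeper.
    alternatives, current, depth = [], [], 0
    for ch in pattern[start + 1:end]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == "|" and depth == 0:
            alternatives.append("".join(current))
            current = []
        else:
            current.append(ch)
    alternatives.append("".join(current))

    # Substitute each alternative and expand any remaining groups.
    results = []
    for alternative in alternatives:
        rest = pattern[:start] + alternative + pattern[end + 1:]
        results.extend(expand(rest))
    return results


print(expand('{("a")|("b")}("c")'))
# ['("a")("c")', '("b")("c")']
print(expand("{agt(%x;%y)|exp(%x;%y)}obj(%x;%z)"))
# ['agt(%x;%y)obj(%x;%z)', 'exp(%x;%y)obj(%x;%z)']
```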
;Syntactic and semantic relations may be replaced by regular expressions
:/.{2,3}/(%x;%y) - any relation whose name consists of two or three characters between %x and %y

=== Hyper-nodes ===
Nodes may contain one or more relations. In this case, they are called "hyper-nodes" and represent scopes or sub-graphs. Like any node, a hyper-node contains a string, a headword, a UW, an index and features, of which the internal relations are a special type. Examples of hyper-nodes are the following:
*(("a")("b")) - a hyper-node containing a linear relation between the nodes ("a") and ("b")
*(VC(%x;%y)VA(%x;%z)) - a hyper-node containing two syntactic relations: VC(%x;%y) AND VA(%x;%z)
*(agt([a];[b])obj([a];[c])) - a hyper-node containing two semantic relations: agt([a];[b]) AND obj([a];[c])
*(([kick],V)([the],D)([bucket],N),V,NTST) - a hyper-node having the features V and NTST and containing two linear relations: one between the nodes ([kick],V) and ([the],D), and another between ([the],D) and ([bucket],N)
*(([kick],V)([the],D)([bucket],N),"kick the bucket",<nowiki>[[die]]</nowiki>,V,NTST) - the same as before, except that the hyper-node has string = "kick the bucket" and UW = <nowiki>[[die]]</nowiki>
Hyper-nodes may also contain internal hyper-nodes:
*((("a")("b"))("c")) - a hyper-node containing a linear relation between the hyper-node (("a")("b")) and the node ("c")
==== Properties of hyper-nodes ====
;Like any node, hyper-nodes are expressed between (parentheses)
:(("a")("b"))
;Like any node, a hyper-node may have only one string, one headword and one UW, but as many features and internal relations as necessary
:(([kick],V)([the],D)([bucket],N),"kick the bucket",[kick the bucket],<nowiki>[[die]]</nowiki>,V,NTST)
;Like any node, a hyper-node may be referenced by any of its elements, including internal relations
:(([kick],V)) - refers to any hyper-node containing the node ([kick],V)
:(([the],D)([bucket],N)) - refers to any hyper-node containing a linear relation between ([the],D) AND ([bucket],N)
:(([kick],V),([bucket],N)) - refers to any hyper-node containing the nodes ([kick],V) AND ([bucket],N)
;When a hyper-node is deleted, all its internal relations are deleted as well
:(([kick],V)([the],D)([bucket],N)):=; (the hyper-node is deleted, as well as the relations ([kick],V)([the],D) AND ([the],D)([bucket],N))

=== Hyper-relations ===
Relations may have relations as arguments. In this case, they are called "hyper-relations". Examples of hyper-relations are the following:
*XP(XB(%a;%b);%c) - a syntactic relation XP between the syntactic relation XB(%a;%b) and the node %c
*and(agt([a];[b]);agt([a];[c])) - a semantic relation "and" between the semantic relations agt([a];[b]) AND agt([a];[c])
==== Properties of hyper-relations ====
;Each argument of a hyper-relation may contain at most one relation
*XP(XB(%a;%b);%c) - the source argument of the hyper-relation XP is a relation
*XP(%a;XB(%b;%c)) - the target argument of the hyper-relation XP is a relation
*XP(VC(%a;%b);VA(%a;%c)) - the source and the target arguments of the hyper-relation XP are relations
*<strike>XP(VC(%a;%b)VA(%a;%c);VS(%a;%d))</strike> - a hyper-relation may not have more than one relation as a single argument (in this case, the source argument of the hyper-relation XP contains two relations)
;Relations do not have strings, UWs, headwords or any features
*<strike>XP(XB(%a;%b),"ab",[ab],<nowiki>[[ab]]</nowiki>,A,B;%c)</strike> (the relation XB(%a;%b) may not have strings, UWs, headwords or any features)
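
The two restrictions can be checked with a short Python sketch. All function names are hypothetical, and the check is deliberately simplified: it only counts parenthesised groups in each argument and ignores parentheses that might occur inside quoted strings.

```python
def split_arguments(body):
    """Split the arguments of a relation on semicolons at depth 0."""
    parts, current, depth = [], [], 0
    for ch in body:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == ";" and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def count_top_level_groups(argument):
    """Count the parenthesised groups that open at depth 0."""
    count, depth = 0, 0
    for ch in argument:
        if ch == "(":
            if depth == 0:
                count += 1
            depth += 1
        elif ch == ")":
            depth -= 1
    return count


def is_valid_hyper_relation(text):
    """Apply the two restrictions stated above to each argument."""
    name, _, rest = text.partition("(")
    if not name or not rest.endswith(")"):
        return False
    for argument in split_arguments(rest[:-1]):
        groups = count_top_level_groups(argument)
        if groups > 1:
            return False  # more than one relation in a single argument
        if groups == 1 and not argument.endswith(")"):
            return False  # a relation followed by strings or features
    return True


print(is_valid_hyper_relation("XP(XB(%a;%b);%c)"))
print(is_valid_hyper_relation("XP(VC(%a;%b)VA(%a;%c);VS(%a;%d))"))
```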

== Types of rules ==

In the UNL Grammar there are three basic types of rules:

=== Normalization rules ===
(main article: [[N-Rule]]s)
Used to normalize the natural language input and to segment natural language texts into sentences.

=== Transformation rules ===
(main article: [[T-Rule]]s)
Used to generate natural language sentences out of UNL graphs and vice-versa.

=== Disambiguation rules ===
(main article: [[D-rule]]s)
Used to improve the performance of transformation rules by constraining their applicability.

The Normalization Rules and Transformation Rules follow the very general formalism

α:=β;

where the left side α is a condition statement, and the right side β is an action to be performed over α.

The Disambiguation Rules, which were directly inspired by the UNL Centre's former co-occurrence dictionary and knowledge base, follow a slightly different formalism:

α=P;

where the left side α is a statement and the right side P is an integer from 0 to 255 that indicates the probability of occurrence of α.
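
The two formalisms can be told apart mechanically. The Python sketch below is a simplified illustration, not the behaviour of any UNL engine: <code>classify_rule</code> and the labels "action" and "disambiguation" are hypothetical, the sample disambiguation rule is invented for the example, and the operator := is assumed not to occur inside quoted strings.

```python
def classify_rule(rule):
    """Split a rule into its left and right sides according to its operator."""
    rule = rule.strip()
    if not rule.endswith(";"):
        raise ValueError("a rule must end with a semicolon")
    body = rule[:-1]

    # Condition-action formalism: alpha := beta;
    if ":=" in body:
        condition, action = body.split(":=", 1)
        return ("action", condition.strip(), action.strip())

    # Disambiguation formalism: alpha = P; with P an integer from 0 to 255.
    # The last "=" is used, because alpha itself may contain pairs like POS=NOU.
    statement, separator, value = body.rpartition("=")
    value = value.strip()
    if separator and value.isdigit() and 0 <= int(value) <= 255:
        return ("disambiguation", statement.strip(), int(value))

    raise ValueError(f"unrecognized rule: {rule!r}")


# Deletion rule taken from the hyper-node section (empty right side).
print(classify_rule("(([kick],V)([the],D)([bucket],N)):=;"))
# Invented disambiguation rule, for illustration only.
print(classify_rule("(POS=NOU)(POS=VER)=0;"))
```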

Revision as of 17:17, 31 May 2013
