Grammar Specs: Difference between revisions

From UNLwiki
imported>Martins
No edit summary
 
UNL grammars are sets of rules for translating UNL expressions into natural language (NL) sentences and NL sentences into UNL expressions. They are normally unidirectional, i.e., the [[UNL-ization]] grammar (NL-to-UNL) is different from the [[NL-ization]] grammar (UNL-to-NL), even though they share the same basic syntax.
#REDIRECT [[Grammar]]
 
== Basic symbols ==
 
{| border="1" cellpadding="2" align=center
|+Basic symbols used in UNL grammar rules
!Symbol
!Definition
!Example
|-
|align=center|<nowiki>^</nowiki>
|not
|^a = not a
|-
|align=center|<nowiki>{ | }</nowiki>
|or
|<nowiki>{a|b}</nowiki> = a or b
|-
|align=center|%
|index for nodes, attributes and values
|%x (see [[#Indexes|below]])
|-
|align=center|#
|index for sub-NLWs
|#01 (see [[#Indexes|below]])
|-
|align=center|=
|attribute-value assignment
|POS=NOU
|-
|align=center|!
|rule trigger
|!PLR
|-
|align=center|&
|merge operator
|%x&%y
|-
|align=center|?
|dictionary lookup operator
|?[a]
|-
|align=center|" "
|string
|"went"
|-
|align=center|[ ]
|natural language entry (headword)
|[go]
|-
|align=center|<nowiki>[[ ]]</nowiki>
|UW
|<nowiki>[[to go(icl>to move)]]</nowiki>
|-
|align=center|( )
|node
|(a)
|-
|align=center|//
|regular expression
|/a{2,3}/ = aa or aaa
|}
 
;The differences between <nowiki>"", [] and [[]]</nowiki>
:Double quotes always represent strings: "a" matches only the string "a".
:Single square brackets always represent natural language entries (headwords) in the dictionary: [a] matches the node associated with the entry [a] retrieved from the dictionary, regardless of its current realization, which may have been changed by other rules (for instance, the original [a] may have been replaced by "b", but the node is still indexed to the entry [a]).
:Double square brackets always represent UWs: <nowiki>[[a]]</nowiki> matches the node associated with the UW <nowiki>[[a]]</nowiki>.
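The three kinds of delimiters can be illustrated with a small Python sketch. It is not part of any UNL tool: the class Node and the function matches are hypothetical names, and a node is reduced here to its string, headword and UW. It shows why a node whose string has been rewritten to "b" no longer matches "a" but still matches [a].

```python
from dataclasses import dataclass

@dataclass
class Node:
    string: str    # current realization; rules may rewrite it
    headword: str  # dictionary entry the node came from; never rewritten
    uw: str        # UW associated with the node

def matches(node: Node, pattern: str) -> bool:
    """Match a single element written as "a", [a] or [[a]] (hypothetical helper)."""
    if pattern.startswith("[[") and pattern.endswith("]]"):
        return node.uw == pattern[2:-2]
    if pattern.startswith("[") and pattern.endswith("]"):
        return node.headword == pattern[1:-1]
    if pattern.startswith('"') and pattern.endswith('"'):
        return node.string == pattern[1:-1]
    raise ValueError(f"unsupported pattern: {pattern}")

# Node retrieved from the entry [a] whose string was later rewritten to "b".
node = Node(string="b", headword="a", uw="a")
print(matches(node, '"a"'))    # False: the current string is "b"
print(matches(node, "[a]"))    # True: still indexed to the entry [a]
print(matches(node, "[[a]]"))  # True
```

The check for double brackets comes first because "[[a]]" also starts with a single bracket.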
 
;Predefined values (assigned by default)
:SCOPE - Scope
:SHEAD - Sentence head (the beginning of a sentence)
:STAIL - Sentence tail (the end of a sentence)
:CHEAD - Scope head (the beginning of a scope)
:CTAIL - Scope tail (the end of a scope)
:TEMP - Temporary entry (entry not found in the dictionary)
:DIGIT - Any sequence of digits (i.e., of the characters 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9)
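A minimal sketch of how a tokenizer might assign some of these predefined values, assuming a toy dictionary: the sentence is framed by SHEAD and STAIL, a token made only of digits receives DIGIT, and a token missing from the dictionary receives TEMP. The dictionary contents, the function name tag_tokens and the choice to give a digit token only DIGIT are assumptions for illustration, not the behaviour of an actual UNL engine.

```python
import re

# Hypothetical toy dictionary mapping headwords to features.
DICTIONARY = {"the": {"D"}, "book": {"N"}}

def tag_tokens(tokens):
    """Frame a tokenized sentence with SHEAD/STAIL and apply DIGIT/TEMP."""
    tagged = [("", {"SHEAD"})]          # beginning of the sentence
    for tok in tokens:
        if re.fullmatch(r"[0-9]+", tok):
            feats = {"DIGIT"}           # any sequence of digits
        elif tok in DICTIONARY:
            feats = set(DICTIONARY[tok])
        else:
            feats = {"TEMP"}            # entry not found in the dictionary
        tagged.append((tok, feats))
    tagged.append(("", {"STAIL"}))      # end of the sentence
    return tagged

for tok, feats in tag_tokens(["the", "2", "books"]):
    print(repr(tok), sorted(feats))
```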
 
== Basic concepts ==
=== Nodes ===
A node is the most elementary unit in the grammar. It is the result of the [[tokenization]] process, and corresponds to the notion of "lexical item", to be represented by dictionary entries. At the surface level, a natural language sentence is considered a list of nodes, and a UNL graph a set of relations between nodes. Any node is a vector (one-dimensional array) containing the following necessary elements:
*a string, to be represented between "quotes", which expresses the actual state of the node;
*a headword, to be represented between [square brackets], which expresses the original value of the node in the dictionary;
*a UW, to be represented between <nowiki>[[double square brackets]]</nowiki>, which expresses the UW value of the node;
*a feature or set of features, which describe the node;
*an [[#Indexes|index]], preceded by the symbol %, which is used to reference the node.
Examples of nodes are:
*("ing") (a node making reference only to its actual string value)
*([book]) (a node making reference only to its headword, i.e., its original state in the dictionary)
*(<nowiki>[[book(icl>document)]]</nowiki>) (a node making reference only to its UW value)
*(NUM) (a node making reference only to one of its features)
*(POS=NOU) (a node making reference only to one of its features in the attribute-value pair format)
*(%x) (a node making reference only to its unique index)
*("string",[headword],<nowiki>[[UW]]</nowiki>,feature1,feature2,...,attribute1=value1,attribute2=value2,...,%x) (complete node)
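The node as a vector of elements can be sketched as a Python dataclass in which every element is optional, so that a partially specified node works as a pattern. This is an illustration under assumptions: the names Node and refers_to are hypothetical, features are held as a set of strings, and an attribute-value pair such as POS=NOU is stored as the two features POS and NOU, following the properties of nodes listed in the next subsection.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

@dataclass
class Node:
    string: Optional[str] = None     # "string": actual (current) state
    headword: Optional[str] = None   # [headword]: original dictionary value
    uw: Optional[str] = None         # [[UW]]: UW value
    features: Set[str] = field(default_factory=set)  # POS=NOU kept as POS, NOU
    index: Optional[str] = None      # %x

def refers_to(pattern: Node, node: Node) -> bool:
    """A pattern node refers to a node if every element it specifies holds."""
    for name in ("string", "headword", "uw", "index"):
        wanted = getattr(pattern, name)
        if wanted is not None and getattr(node, name) != wanted:
            return False
    return pattern.features <= node.features  # all required features present

book = Node(string="books", headword="book", uw="book(icl>document)",
            features={"POS", "NOU", "NUM"}, index="%x")

print(refers_to(Node(headword="book"), book))   # True, like ([book])
print(refers_to(Node(features={"NUM"}), book))  # True, like (NUM)
print(refers_to(Node(string="ing"), book))      # False, like ("ing")
```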
==== Properties of nodes ====
;Nodes are enclosed between (parentheses)
:("a") is a node
:"a" is not a node
;The elements of a node are separated by commas
:("a",[a],<nowiki>[[a]]</nowiki>,A,B,A=C,%a)
;The order of elements inside a node is not relevant.
:("a",[a],<nowiki>[[a]]</nowiki>,A,B,A=C,%a) is the same as (<nowiki>[[a]]</nowiki>,B,A,"a",[a],A=C,%a)
;A node may have only one string, one headword, one UW and one index, but as many features as necessary
:<strike>("a","b")</strike> (a node may not contain more than one string)
:<strike>([a],[b])</strike> (a node may not contain more than one headword)
:<strike>(<nowiki>[[a]]</nowiki>,<nowiki>[[b]]</nowiki>)</strike> (a node may not contain more than one UW)
:<strike>(%a,%b)</strike> (a node may not contain more than one index)
:(A,B,C,D,...,Z) (a node may contain as many features as necessary)
;A node may be referred to by any of its elements
:("a") refers to all nodes where actual string = "a"
:([a]) refers to all nodes where headword = [a]
:(<nowiki>[[a]]</nowiki>) refers to all nodes where UW = <nowiki>[[a]]</nowiki>
:(A) refers to all nodes having the feature A
:("a",[a],<nowiki>[[a]]</nowiki>,A) refers to all nodes having the feature A where string = "a" and headword = [a] and UW = <nowiki>[[a]]</nowiki>
;Nodes are automatically indexed according to a position-based system if no explicit index is provided (see [[#Indexes|Index]])
:("a")("b") is actually ("a",%01)("b",%02)
;[[Regular expressions]] may be used to make reference to any element of the node, except the index
:("/a{2,3}/") refers to all nodes where string is a sequence of 2 to 3 characters "a"
:([/a{2,3}/]) refers to all nodes where headword is a sequence of 2 to 3 characters "a"
:(<nowiki>[[/a{2,3}/]]</nowiki>) refers to all nodes where UW is a sequence of 2 to 3 characters "a"
:(/a{2,3}/) refers to all nodes having a feature that is a sequence of 2 to 3 characters "a"
;Nodes may contain alternative (disjunctive) features enclosed between {braces} and separated by a vertical bar (|)
:({A|B}) refers to all nodes having the feature A OR B
;Node features may be expressed as simple attributes, or attribute-value pairs:
:(MCL) - feature as an attribute: refers to all nodes having the feature MCL
:(GEN=MCL) - feature as an attribute-value pair, which is the same as (GEN,MCL): refers to all nodes having the features GEN and MCL.
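The matching properties above (regular expressions, alternative features in braces, attribute-value pairs and position-based indexes) can be sketched as follows. The function names are hypothetical, Python's re module stands in for the regular-expression dialect of the UNL grammar (which may differ in detail), and the two-digit index format simply follows the %01, %02 example.

```python
import re

def element_matches(pattern: str, value: str) -> bool:
    """Compare one element; a pattern written /.../ is a regular expression."""
    if len(pattern) >= 2 and pattern.startswith("/") and pattern.endswith("/"):
        return re.fullmatch(pattern[1:-1], value) is not None
    return pattern == value

def has_feature(spec: str, features: set) -> bool:
    """spec may be A, /regex/, {A|B} or ATTR=VALUE."""
    if len(spec) >= 2 and spec.startswith("/") and spec.endswith("/"):
        return any(element_matches(spec, f) for f in features)
    if spec.startswith("{") and spec.endswith("}"):
        return any(has_feature(alt, features) for alt in spec[1:-1].split("|"))
    if "=" in spec:  # GEN=MCL is the same as (GEN,MCL)
        return all(has_feature(part, features) for part in spec.split("="))
    return spec in features

def auto_index(strings):
    """Position-based indexing: ("a")("b") becomes ("a",%01)("b",%02)."""
    return [(s, f"%{i:02d}") for i, s in enumerate(strings, start=1)]

print(element_matches("/a{2,3}/", "aaa"))      # True
print(element_matches("/a{2,3}/", "aaaa"))     # False
print(has_feature("{A|B}", {"B", "C"}))        # True
print(has_feature("GEN=MCL", {"GEN", "MCL"}))  # True
print(auto_index(["a", "b"]))  # [('a', '%01'), ('b', '%02')]
```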
Attribute-value pairs may be used to create co-reference between different nodes (as in agreement):
:(%x,GEN)(%y,GEN=%x) - the attribute GEN of the node %y has the same value as the attribute GEN of the node %x (see [[#Indexes|Index]])
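Co-reference can be read as a constraint between two adjacent nodes. In this hypothetical sketch each node stores its features as attribute-to-value pairs, because the constraint has to read the value of GEN; the function name agreeing_pairs and the values FEM and ADJ are invented for the example.

```python
def agreeing_pairs(sentence, attr):
    """Yield adjacent (%x, %y) pairs that satisfy (%x,ATTR)(%y,ATTR=%x)."""
    for (ix, fx), (iy, fy) in zip(sentence, sentence[1:]):
        if attr in fx and fy.get(attr) == fx[attr]:
            yield ix, iy

# Hypothetical nodes: index plus attribute -> value features.
sentence = [
    ("%01", {"POS": "D", "GEN": "FEM"}),
    ("%02", {"POS": "N", "GEN": "FEM"}),
    ("%03", {"POS": "ADJ", "GEN": "MCL"}),
]
print(list(agreeing_pairs(sentence, "GEN")))  # [('%01', '%02')]
```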
 
=== Relations ===
In order to form a natural language sentence or a UNL graph, nodes are inter-related by relations. In the UNL framework, there are three types of relations:
*the '''linear''' relation L expresses the surface structure of natural language sentences
*'''syntactic''' relations express the deep (tree) structure of natural language sentences
*'''semantic''' relations express the structure of UNL graphs
==== Properties of relations ====
;The linear relation is always binary and is represented in two possible formats:
*L(%x;%y), where L is the invariant name of the linear relation, and %x and %y are nodes; or
*(%x)(%y)
;Syntactic relations are not predefined, although we have been using a set of binary relations based on the [[X-bar theory]].
;Semantic relations constitute a predefined and closed set that can be found [[relations|here]].
;Syntactic and semantic relations are represented in the same way:
*rel(%x;%y), where "rel" is the name of the relation, %x is the source node, and %y is the target node
;Arguments of linear, syntactic and semantic relations are not commutative.
:The order of the elements in a relation affects the result:
::(%x)(%y) is different from (%y)(%x)
::relation(%x;%y) is different from relation(%y;%x)
;Linear and semantic relations are always binary; syntactic relations may be n-ary:
:L(%x;%y) - linear relation
:agt(%x;%y) - semantic relation
:VH(%x) - unary syntactic relation
:VC(%x;%y) - binary syntactic relation
:XX(%x;%y;%z) - possible ternary syntactic relation
;Inside each relation, nodes are separated by a semicolon (;).
:VC(%x;%y)
:<strike>VC(%x,%y)</strike>
;Inside each relation, nodes may be referenced by any of their elements, separated by commas (,):
:("a")([b]) - linear relation between a node where string = "a" and another node where headword = [b]
:L(<nowiki>[[c]]</nowiki>;D) - linear relation between a node where UW = <nowiki>[[c]]</nowiki> and another node having the feature D
:VC(%a;%b) - syntactic relation between a node where index = %a and another node where index = %b
:agt("a",[a],<nowiki>[[a]]</nowiki>,A;"b",[b],<nowiki>[[b]]</nowiki>,B) - semantic relation between a node having the feature A where string = "a" AND headword = [a] AND UW = <nowiki>[[a]]</nowiki>, AND another node having the feature B where string = "b" AND headword = [b] AND UW = <nowiki>[[b]]</nowiki>
;Relations may be conjoined through juxtaposition:
:("a")("b")("c") - two linear relations: one between ("a") and ("b") AND another between ("b") and ("c")
:agt(%x;%y)obj(%x;%z) - two semantic relations: one between (%x) and (%y) AND another between (%x) and (%z)
:<strike>VC([a];[b]),VC([a];[c])</strike> - conjoined relations must not be separated by commas
;Relations may be disjoined through {braces}
:{("a")|("b")}("c") - either ("a")("c") or ("b")("c")
:{agt(%x;%y)|exp(%x;%y)}obj(%x;%z) - either agt(%x;%y)obj(%x;%z) or exp(%x;%y)obj(%x;%z)
;Syntactic and semantic relations may be replaced by regular expressions
:/.{2,3}/(%x;%y) - any relation made of two or three characters between %x and %y
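The relation properties can be sketched by storing a graph as a set of (relation, source, target) triples. The sketch covers non-commutative arguments, conjunction by juxtaposition, disjunction in braces, regular-expression relation names and the expansion of a chain of nodes into binary L relations. It handles binary relations only, and all names are hypothetical.

```python
import re

# Hypothetical UNL graph stored as (relation, source, target) triples.
graph = {("agt", "%x", "%y"), ("obj", "%x", "%z")}

def is_regex(name: str) -> bool:
    return len(name) >= 2 and name.startswith("/") and name.endswith("/")

def holds(rel, graph) -> bool:
    """True if the graph contains rel; the relation name may be /regex/."""
    name, src, tgt = rel
    for g_name, g_src, g_tgt in graph:
        if (g_src, g_tgt) != (src, tgt):
            continue  # arguments are ordered: rel(%x;%y) is not rel(%y;%x)
        if is_regex(name) and re.fullmatch(name[1:-1], g_name):
            return True
        if g_name == name:
            return True
    return False

def all_of(rels, graph) -> bool:
    """Juxtaposition, as in agt(%x;%y)obj(%x;%z): every relation must hold."""
    return all(holds(r, graph) for r in rels)

def any_of(alternatives, graph) -> bool:
    """Braces, as in {agt(...)|exp(...)}: at least one alternative must hold."""
    return any(all_of(alt, graph) for alt in alternatives)

def linear(nodes):
    """("a")("b")("c") stands for L(a;b) and L(b;c)."""
    return [("L", a, b) for a, b in zip(nodes, nodes[1:])]

print(holds(("agt", "%x", "%y"), graph))       # True
print(holds(("agt", "%y", "%x"), graph))       # False: order matters
print(any_of([[("agt", "%x", "%y"), ("obj", "%x", "%z")],
              [("exp", "%x", "%y"), ("obj", "%x", "%z")]], graph))  # True
print(holds(("/.{2,3}/", "%x", "%y"), graph))  # True: "agt" has 3 characters
print(linear(["a", "b", "c"]))  # [('L', 'a', 'b'), ('L', 'b', 'c')]
```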
 
=== Hyper-nodes ===
Nodes may contain one or more relations. In this case, they are said to be "hyper-nodes", and represent scopes or sub-graphs. Like any node, hyper-nodes contain a string, a headword, a UW, an index and features; their internal relations are an additional, special type of element. Examples of hyper-nodes are the following:
*(("a")("b")) - a hyper-node containing a linear relation between the nodes ("a") and ("b")
*(VC(%x;%y)VA(%x;%z)) - a hyper-node containing two syntactic relations: VC(%x;%y) AND VA(%x;%z)
*(agt([a];[b])obj([a];[c])) - a hyper-node containing two semantic relations: agt([a];[b]) AND obj([a];[c])
*(([kick],V)([the],D)([bucket],N),V,NTST) - a hyper-node having the features V and NTST and containing two linear relations: one between the nodes ([kick],V) and ([the],D), and another between ([the],D) and ([bucket],N)
*(([kick],V)([the],D)([bucket],N),"kick the bucket",<nowiki>[[die]]</nowiki>,V,NTST) - the same as before, except for the fact that the hyper-node has string = "kick the bucket" and UW = <nowiki>[[die]]</nowiki>
Hyper-nodes may also contain internal hyper-nodes:
*((("a")("b"))("c")) - a hyper-node containing a linear relation between the hyper-node (("a")("b")) and the node ("c")
==== Properties of hyper-nodes ====
;Like any node, hyper-nodes are expressed between (parentheses)
:(("a")("b"))
;Like any node, hyper-nodes may have only one string, one headword and one UW, but as many features and internal relations as necessary
:(([kick],V)([the],D)([bucket],N),"kick the bucket",[kick the bucket],<nowiki>[[die]]</nowiki>,V,NTST)
;Like any node, hyper-nodes may be referenced by any of their elements, including internal relations
:(([kick],V)) - refers to any hyper-node containing the node ([kick],V)
:(([the],D)([bucket],N)) - refers to any hyper-node containing a linear relation between ([the],D) AND ([bucket],N)
:(([kick],V),([bucket],N)) - refers to any hyper-node containing the nodes ([kick],V) AND ([bucket],N)
;When a hyper-node is deleted, all its internal relations are deleted as well
:(([kick],V)([the],D)([bucket],N)):=; (the hyper-node is deleted, as well as the relations ([kick],V)([the],D) AND ([the],D)([bucket],N))
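One way to obtain this cascading deletion is to record, for every relation, the hyper-node that contains it. The following sketch makes that assumption; the store layout and the functions contains and delete_hypernode are hypothetical, and the indexes %kick, %the and %bucket stand in for the nodes of the "kick the bucket" example.

```python
# Hypothetical store: each relation records the hyper-node that contains it
# (None for relations at the top level).
hypernodes = {
    "%h": {"string": "kick the bucket", "uw": "die", "features": {"V", "NTST"}},
}
relations = [
    ("%h", "L", "%kick", "%the"),
    ("%h", "L", "%the", "%bucket"),
    (None, "L", "%a", "%b"),
]

def contains(hyper_id, node, relations):
    """True if node takes part in a relation internal to hyper_id."""
    return any(owner == hyper_id and node in (src, tgt)
               for owner, _name, src, tgt in relations)

def delete_hypernode(hyper_id, hypernodes, relations):
    """Delete a hyper-node together with all of its internal relations."""
    del hypernodes[hyper_id]
    return [rel for rel in relations if rel[0] != hyper_id]

print(contains("%h", "%kick", relations))  # True
relations = delete_hypernode("%h", hypernodes, relations)
print(relations)                           # [(None, 'L', '%a', '%b')]
```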
 
=== Hyper-relations ===
Relations may have relations as arguments. In this case, they are said to be "hyper-relations". Examples of hyper-relations are the following:
*XP(XB(%a;%b);%c) - a syntactic relation XP between the syntactic relation XB(%a;%b) and the node %c
*and(agt([a];[b]);agt([a];[c])) - a semantic relation "and" between the semantic relations agt([a];[b]) AND agt([a];[c])
==== Properties of hyper-relations ====
;A hyper-relation may have only a single relation as each argument
*XP(XB(%a;%b);%c) - the source argument of the hyper-relation XP is a relation
*XP(%a;XB(%b;%c)) - the target argument of the hyper-relation XP is a relation
*XP(VC(%a;%b);VA(%a;%c)) - the source and the target argument of the hyper-relation XP are relations
*<strike>XP(VC(%a;%b)VA(%a;%c);VS(%a;%d))</strike> - a hyper-relation may not have more than one relation as a single argument (in this case, the hyper-relation XP contains two relations as the source argument)
;Relations do not have strings, UWs, headwords or any features
*<strike>XP(XB(%a;%b),"ab",[ab],<nowiki>[[ab]]</nowiki>,A,B;%c)</strike> (the relation XB(%a;%b) may not have strings, UWs, headwords or any features)
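The two constraints on hyper-relations can be checked recursively if a relation is represented as a (name, source, target) tuple and a node as an index string. This is a hypothetical sketch limited to binary relations: a list of several relations as one argument, or a relation carrying extra elements, is rejected.

```python
def well_formed(rel) -> bool:
    """Check the hyper-relation constraints on a (name, source, target) tuple.

    Nodes are index strings such as "%a"; a relation is a 3-tuple. A list of
    relations, or a relation carrying extra elements, is rejected.
    """
    if not (isinstance(rel, tuple) and len(rel) == 3 and isinstance(rel[0], str)):
        return False
    _name, src, tgt = rel
    return all(isinstance(arg, str) or well_formed(arg) for arg in (src, tgt))

print(well_formed(("XP", ("XB", "%a", "%b"), "%c")))                # True
print(well_formed(("XP", ("VC", "%a", "%b"), ("VA", "%a", "%c"))))  # True
print(well_formed(("XP", [("VC", "%a", "%b"), ("VA", "%a", "%c")],
                   ("VS", "%a", "%d"))))                             # False
print(well_formed(("XP", ("XB", "%a", "%b", "ab"), "%c")))          # False
```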
 
== Types of rules ==
 
In the UNL Grammar there are three basic types of rules:
 
=== Normalization Rules ===
(main article: [[N-Rule]]s)
Used to normalize the natural language input and to segment natural language texts into sentences.
 
=== Transformation rules ===
(main article: [[T-Rule]]s)
Used to generate natural language sentences out of UNL graphs and vice-versa.
 
=== Disambiguation rules ===
(main article: [[D-rule]]s)
Used to improve the performance of transformation rules by constraining their applicability.
 
The Normalization Rules (which also segment texts into sentences) and the Transformation Rules follow the general formalism
 
α:=β;
 
where the left side α is a condition statement, and the right side β is an action to be performed over α.
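As a minimal sketch, a rule in this formalism can be split into its condition and action at the first :=, after checking the final semicolon. The function parse_rule is hypothetical, and it ignores the case where := or ; appears inside a quoted string.

```python
def parse_rule(text: str):
    """Split a rule written as α:=β; into (condition, action)."""
    body = text.strip()
    if not body.endswith(";"):
        raise ValueError("a rule must end with ';'")
    condition, sep, action = body[:-1].partition(":=")
    if not sep:
        raise ValueError("a rule must contain ':='")
    return condition.strip(), action.strip()

# The deletion example from the hyper-node section: the action is empty.
print(parse_rule("(([kick],V)([the],D)([bucket],N)):=;"))
```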
 
The Disambiguation Rules, which were directly inspired by the UNL Centre's former co-occurrence dictionary and knowledge base, follow a slightly different formalism:
 
α=P;
 
where the left side α is a statement and the right side P is an integer from 0 to 255 that indicates the probability of occurrence of α.
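A similar hypothetical sketch for disambiguation rules splits at the last =, since the statement α may itself contain attribute-value pairs such as POS=NOU, and checks that P is in the range 0 to 255. The function name parse_drule and the statement in the example are invented for illustration.

```python
def parse_drule(text: str):
    """Split a disambiguation rule α=P; into (statement, P)."""
    body = text.strip()
    if not body.endswith(";"):
        raise ValueError("a rule must end with ';'")
    # Split on the LAST '=', because α itself may contain pairs such as POS=NOU.
    statement, sep, value = body[:-1].rpartition("=")
    if not sep:
        raise ValueError("a rule must contain '='")
    p = int(value)
    if not 0 <= p <= 255:
        raise ValueError("P must be an integer from 0 to 255")
    return statement.strip(), p

print(parse_drule("(%x,POS=NOU)(%y,POS=NOU)=128;"))  # hypothetical statement
```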
 

Latest revision as of 17:11, 19 August 2013
