@@ Line 1: / Line 1: @@
-UNL grammars are sets of rules for translating UNL expressions into natural language (NL) sentences and NL sentences into UNL expressions. They are normally unidirectional, i.e., the [[UNL-ization]] grammar (NL-to-UNL) is different from the [[NL-ization]] grammar (UNL-to-NL), even though they share the same basic syntax.
+#REDIRECT [[Grammar]]
-== Basic symbols ==
-{| border="1" cellpadding="2" align=center
-|+Basic symbols used in UNL grammar rules
-!Symbol
-!Definition
-!Example
-|-
-|align=center|<nowiki>^</nowiki>
-|not
-|^a = not a
-|-
-|align=center|{ | }
-|or
-|<nowiki>{a|b}</nowiki> = a or b
-|-
-|align=center|%
-|index for nodes, attributes and values
-|%x (see [[#Indexes|below]])
-|-
-|align=center|#
-|index for sub-NLWs
-|#01 (see [[#Indexes|below]])
-|-
-|align=center|=
-|attribute-value assignment
-|POS=NOU
-|-
-|align=center|!
-|rule trigger
-|!PLR
-|-
-|align=center|&
-|merge operator
-|%x&%y
-|-
-|align=center|?
-|dictionary lookup operator
-|?[a]
-|-
-|align=center|" "
-|string
-|"went"
-|-
-|align=center|[ ]
-|natural language entry (headword)
-|[go]
-|-
-|align=center|[[ ]]
-|UW
-|[[to go(icl>to move)]]
-|-
-|align=center|( )
-|node
-|(a)
-|-
-|align=center|//
-|regular expression
-|/a{2,3}/ = aa,aaa
-|}
-;The differences between "", [] and [[]]
-:Double quotes are always used to represent strings: "a" will match only the string "a"
-:Simple square brackets are always used to represent natural language entries (headwords) in the dictionary: [a] will match the node associated to the entry [a] retrieved from the dictionary, no matter its current realization, which may be affected by other rules (the original [a] may have been replaced, for instance, by "b", but will still be indexed to the entry [a])
-:Double square brackets are always used to represent UWs: <nowiki>[[a]]</nowiki> will match the node associated to the UW <nowiki>[[a]]</nowiki>
-;Predefined values (assigned by default)
-:SCOPE - Scope
-:SHEAD - Sentence head (the beginning of a sentence)
-:STAIL - Sentence tail (the end of a sentence)
-:CHEAD - Scope head (the beginning of a scope)
-:CTAIL - Scope tail (the end of a scope)
-:TEMP - Temporary entry (entry not found in the dictionary)
-:DIGIT - Any sequence of digits (i.e.: 0,1,2,3,4,5,6,7,8,9)
-== Basic concepts ==
-=== Nodes ===
-A node is the most elementary unit in the grammar. It is the result of the [[tokenization]] process, and corresponds to the notion of "lexical item", to be represented by dictionary entries. At the surface level, a natural language sentence is considered a list of nodes, and a UNL graph a set of relations between nodes. Any node is a vector (one-dimensional array) containing the following necessary elements:
-*a string, to be represented between "quotes", which expresses the actual state of the node;
-*a headword, to be represented between [square brackets], which expresses the original value of the node in the dictionary;
-*a UW, to be represented between <nowiki>[[double square brackets]]</nowiki>, which expresses the UW value of the node;
-*a feature or set of features, which express the features of the node;
-*an [[#Indexes|index]], preceded by the symbol %, which is used to reference the node;
-Examples of nodes are
-*("ing") (a node making reference only to its actual string value)
-*([book]) (a node making reference only to its headword, i.e., its original state in the dictionary)
-*([[book(icl>document)]]) (a node making reference only to its UW value)
-*(NUM) (a node making reference only to one of its features)
-*(POS=NOU) (a node making reference only to one of its features in the attribute-value pair format)
-*(%x) (a node making reference only to its unique index)
-*("string",[headword],<nowiki>[[UW]]</nowiki>,feature1,feature2,...,attribute1=value1,attribute2=value2,...,%x) (complete node)
-==== Properties of nodes ====
-;Nodes are enclosed between (parentheses)
-:("a") is a node
-:"a" is not a node
-;The elements of a node are separated by commas
-:("a",[a],<nowiki>[[a]]</nowiki>,A,B,A=C,%a)
-;The order of elements inside a node is not relevant.
-:("a",[a],<nowiki>[[a]]</nowiki>,A,B,A=C,%a) is the same as (<nowiki>[[a]]</nowiki>,B,A,"a",[a],A=C,%a)
-;Nodes may have one single string, headword, UW and index, but may have as many features as necessary
-:<strike>("a","b")</strike> (a node may not contain more than one string)
-:<strike>([a],[b])</strike> (a node may not contain more than one headword)
-:<strike>(<nowiki>[[a]]</nowiki>,<nowiki>[[b]]</nowiki>)</strike> (a node may not contain more than one UW)
-:<strike>(%a,%b)</strike> (a node may not contain more than one index)
-:(A,B,C,D,...,Z) (a node may contain as many features as necessary)
-;A node may be referred to by any of its elements
-:("a") refers to all nodes where actual string = "a"
-:([a]) refers to all nodes where headword = [a]
-:(<nowiki>[[a]]</nowiki>) refers to all nodes where UW = <nowiki>[[a]]</nowiki>
-:(A) refers to all nodes having the feature A
-:("a",[a],<nowiki>[[a]]</nowiki>,A) refers to all nodes having the feature A where string = "a" and headword = [a] and UW = <nowiki>[[a]]</nowiki>
-;Nodes are automatically indexed according to a position-based system if no explicit index is provided (see [[#Indexes|Index]])
-:("a")("b") is actually ("a",%01)("b",%02)
-;[[Regular expressions]] may be used to make reference to any element of the node, except the index
-:("/a{2,3}/") refers to all nodes where string is a sequence of 2 to 3 characters "a"
-:([/a{2,3}/]) refers to all nodes where headword is a sequence of 2 to 3 characters "a"
-:([[/a{2,3}/]]) refers to all nodes where UW is a sequence of 2 to 3 characters "a"
-:(/a{2,3}/) refers to all nodes having a feature that is a sequence of 2 to 3 characters "a"
-;Nodes may contain disjoint (alternative) features enclosed between {braces} and separated by a vertical bar (|)
-:({A|B}) refers to all nodes having the feature A OR B
-;Node features may be expressed as simple attributes, or attribute-value pairs:
-:(MCL) - feature as an attribute: refers to all nodes having the feature MCL
-:(GEN=MCL) - feature as an attribute-value pair, which is the same as (GEN,MCL): refers to all nodes having the features GEN and MCL.
-Attribute-value pairs may be used to create co-reference between different nodes (as in agreement):
-:(%x,GEN)(%y,GEN=%x) - the value of the attribute GEN of the node %y is the same as that of the attribute GEN of the node %x (see [[#Indexes|Index]])
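The node properties above can be sketched in Python. This is only an illustrative model, not part of any UNL tool: the names `Node` and `matches` are hypothetical, and the sketch assumes that a pattern matches a node when every element the pattern specifies (string, headword, UW, index, features) is also present in the node. Regular expressions, disjunctions and attribute-value co-reference are left out.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical model: at most one string, headword, UW and index; any number of features."""
    string: Optional[str] = None      # "..."   actual state of the node
    headword: Optional[str] = None    # [...]   original value in the dictionary
    uw: Optional[str] = None          # [[...]] UW value
    index: Optional[str] = None       # %x
    features: FrozenSet[str] = field(default_factory=frozenset)


def matches(pattern: Node, node: Node) -> bool:
    """Return True when every element given in the pattern is found in the node."""
    for attr in ("string", "headword", "uw", "index"):
        wanted = getattr(pattern, attr)
        if wanted is not None and wanted != getattr(node, attr):
            return False
    # All features named in the pattern must be among the node's features.
    return pattern.features <= node.features


# A complete node, roughly ("books",[book],[[book(icl>document)]],NOU,PLR,%01)
book = Node(string="books", headword="book", uw="book(icl>document)",
            index="01", features=frozenset({"NOU", "PLR"}))

print(matches(Node(headword="book"), book))             # pattern ([book])
print(matches(Node(features=frozenset({"NOU"})), book)) # pattern (NOU)
print(matches(Node(string="book"), book))               # pattern ("book")
```

Because the elements are stored by name rather than by position, the order in which they are written inside the node has no effect on matching, as stated in the properties above.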
-=== Relations ===
-In order to form a natural language sentence or a UNL graph, nodes are inter-related by relations. In the UNL framework, there can be three different types of relations:
-*the '''linear''' relation L expresses the surface structure of natural language sentences
-*'''syntactic''' relations express the deep (tree) structure of natural language sentences
-*'''semantic''' relations express the structure of UNL graphs
-==== Properties of relations ====
-;The linear relation is always binary and is represented in two possible formats:
-*L(%x;%y), where L is the invariant name of the linear relation, and %x and %y are nodes; or
-*(%x)(%y)
-;Syntactic relations are not predefined, although we have been using a set of binary relations based on the [[X-bar theory]].
-;Semantic relations constitute a predefined and closed set that can be found [[relations|here]].
-;Syntactic and semantic relations are represented in the same way:
-*rel(%x;%y), where "rel" is the name of the relation, %x is the source node, and %y is the target node
-;Arguments of linear, syntactic and semantic relations are not commutative.
-:The order of the elements in a relation affects the result:
-::(%x)(%y) is different from (%y)(%x)
-::relation(%x;%y) is different from relation(%y;%x)
-;Linear and semantic relations are always binary; syntactic relations may be n-ary:
-:L(%x;%y) - linear relation
-:agt(%x;%y) - semantic relation
-:VH(%x) - unary syntactic relation
-:VC(%x;%y) - binary syntactic relation
-:XX(%x;%y;%z) - possible ternary syntactic relation
-;Inside each relation, nodes are isolated by semicolon (;).
-:VC(%x;%y)
-:<strike>VC(%x,%y)</strike>
-;Inside each relation, nodes may be referenced by any of its elements, isolated by comma (,):
-:("a")([b]) - linear relation between a node where string = "a" and another node where headword = [b]
-:L(<nowiki>[[c]]</nowiki>;D) - linear relation between a node where UW = <nowiki>[[c]]</nowiki> and another node having the feature D
-:VC(%a;%b) - syntactic relation between a node where index = %a and another node where index = %b
-:agt("a",[a],<nowiki>[[a]]</nowiki>,A;"b",[b],<nowiki>[[b]]</nowiki>,B) - semantic relation between a node having the feature A where string = "a" AND headword = [a] AND UW = <nowiki>[[a]]</nowiki> AND another node having the feature B where string = "b" AND headword = [b] AND UW = <nowiki>[[b]]</nowiki>
-;Relations may be conjoined through juxtaposition:
-:("a")("b")("c") - two linear relations: one between ("a") and ("b") AND another between ("b") and ("c")
-:agt(%x;%y)obj(%x;%z) - two semantic relations: one between (%x) and (%y) AND another between (%x) and (%z)
-:<strike>VC([a];[b]),VC([a];[c])</strike> - conjoined relations must not be isolated by comma
-;Relations may be disjoined through {braces}
-:{("a")|("b")}("c") - either ("a")("c") or ("b")("c")
-:{agt(%x;%y)|exp(%x;%y)}obj(%x;%z) - either agt(%x;%y)obj(%x;%z) or exp(%x;%y)obj(%x;%z)
-;Syntactic and semantic relations may be replaced by regular expressions
-:/.{2,3}/(%x;%y) - any relation made of two or three characters between %x and %y
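The regular-expression examples above can be checked with a short Python sketch. It assumes that a pattern must match the whole value (the symbol table gives /a{2,3}/ = aa,aaa, which suggests full rather than partial matching) and that Python's `re` syntax is close enough to the one intended here. The helper name `value_matches` is hypothetical.

```python
import re


def value_matches(pattern: str, value: str) -> bool:
    """Return True when the whole value matches the regular expression (assumed semantics)."""
    return re.fullmatch(pattern, value) is not None


# /.{2,3}/(%x;%y): any relation name made of two or three characters
for name in ("agt", "VC", "L", "XXXX"):
    print(name, value_matches(r".{2,3}", name))

# /a{2,3}/: a sequence of two to three characters "a"
for text in ("a", "aa", "aaa", "aaaa"):
    print(text, value_matches(r"a{2,3}", text))
```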
-=== Hyper-nodes ===
-Nodes may contain one or more relations. In this case, they are said to be "hyper-nodes", and represent scopes or sub-graphs. As any node, hyper-nodes contain a string, a headword, a UW, an index and features, of which the internal relations are a special type. Examples of hyper-nodes are the following:
-*(("a")("b")) - a hyper-node containing a linear relation between the nodes ("a") and ("b")
-*(VC(%x;%y)VA(%x;%z)) - a hyper-node containing two syntactic relations: VC(%x;%y) AND VA(%x;%z)
-*(agt([a];[b])obj([a];[c])) - a hyper-node containing two semantic relations: agt([a];[b]) AND obj([a];[c])
-*(([kick],V)([the],D)([bucket],N),V,NTST) - a hyper-node having the features V and NTST and containing two linear relations: one between the nodes ([kick],V) and ([the],D), and another between ([the],D) and ([bucket],N)
-*(([kick],V)([the],D)([bucket],N),"kick the bucket",<nowiki>[[die]]</nowiki>,V,NTST) - the same as before, except for the fact that the hyper-node has string = "kick the bucket" and UW = <nowiki>[[die]]</nowiki>
-Hyper-nodes may also contain internal hyper-nodes:
-*((("a")("b"))("c")) - a hyper-node containing a linear relation between the hyper-node (("a")("b")) and the node ("c")
-==== Properties of hyper-nodes ====
-;As any node, hyper-nodes are expressed between (parentheses)
-:(("a")("b"))
-;As any node, hyper-nodes may have one single string, one single headword and one single UW, but may have as many features and internal relations as necessary
-:(([kick],V)([the],D)([bucket],N),"kick the bucket",[kick the bucket],<nowiki>[[die]]</nowiki>,V,NTST)
-;As any node, hyper-nodes may be referenced by any of their elements, including internal relations
-:(([kick],V)) - refers to any hyper-node containing the node ([kick],V)
-:(([the],D)([bucket],N)) - refers to any hyper-node containing a linear relation between ([the],D) AND ([bucket],N)
-:(([kick],V),([bucket],N)) - refers to any hyper-node containing the nodes ([kick],V) AND ([bucket],N)
-;When a hyper-node is deleted, all its internal relations are deleted as well
-:(([kick],V)([the],D)([bucket],N)):=; (the hyper-node is deleted, as well as the relations ([kick],V)([the],D) AND ([the],D)([bucket],N))
-=== Hyper-relations ===
-Relations may have relations as arguments. In this case, they are said to be "hyper-relations". Examples of hyper-relations are the following:
-*XP(XB(%a;%b);%c) - a syntactic relation XP between the syntactic relation XB(%a;%b) and the node %c
-*and(agt([a];[b]);agt([a];[c])) - a semantic relation "and" between the semantic relations agt([a];[b]) AND agt([a];[c])
-==== Properties of hyper-relations ====
-;A hyper-relation may have one single relation as each argument
-*XP(XB(%a;%b);%c) - the source argument of the hyper-relation XP is a relation
-*XP(%a;XB(%b;%c)) - the target argument of the hyper-relation XP is a relation
-*XP(VC(%a;%b);VA(%a;%c)) - the source and the target argument of the hyper-relation XP are relations
-*<strike>XP(VC(%a;%b)VA(%a;%c);VS(%a;%d))</strike> - a hyper-relation may not have more than one relation as one single argument (in this case, the hyper-relation XP contained two relations as the source argument)
-;Relations do not have strings, UWs, headwords or any features
-*<strike>XP(XB(%a;%b),"ab",[ab],<nowiki>[[ab]]</nowiki>,A,B;%c)</strike> (the relation XB(%a;%b) may not have strings, UWs, headwords or any features)
-== Types of rules ==
-In the UNL Grammar there are three basic types of rules:
-=== Normalization Rules ===
-(main article: [[N-Rule]]s)
-Used to normalize the natural language input and to segment natural language texts into sentences.
-=== Transformation rules ===
-(main article: [[T-Rule]]s)
-Used to generate natural language sentences out of UNL graphs and vice-versa.
-=== Disambiguation rules ===
-(main article: [[D-rule]]s)
-Used to improve the performance of transformation rules by constraining their applicability.
-The Normalization Rules (which include segmentation) and the Transformation Rules follow the very general formalism
- α:=β;
-where the left side α is a condition statement, and the right side β is an action to be performed over α.
-The Disambiguation Rules, which were directly inspired by the UNL Centre's former co-occurrence dictionary and knowledge base, follow a slightly different formalism:
- α=P;
-where the left side α is a statement and the right side P is an integer from 0 to 255 that indicates the probability of occurrence of α.
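The two formalisms can be told apart mechanically. The Python sketch below is a hypothetical helper (`classify_rule` is not part of any UNL tool). It assumes every rule ends with a semicolon, that ":=" appears only in condition-action rules, and that in a disambiguation rule the last "=" separates the statement from the integer P. Under these assumptions attribute-value pairs such as POS=NOU inside the statement do not confuse the split.

```python
def classify_rule(rule: str):
    """Split a rule into (kind, left side, right side) under the assumptions stated above."""
    body = rule.strip()
    if not body.endswith(";"):
        raise ValueError("a rule must end with ';'")
    body = body[:-1]
    if ":=" in body:
        # Condition-action formalism: α:=β;
        left, right = body.split(":=", 1)
        return ("condition-action", left.strip(), right.strip())
    # Disambiguation formalism: α=P; where P is an integer from 0 to 255
    left, sep, right = body.rpartition("=")
    if not sep or not right.strip().isdigit():
        raise ValueError("expected ':=' or a trailing '=P'")
    p = int(right.strip())
    if not 0 <= p <= 255:
        raise ValueError("P must be between 0 and 255")
    return ("disambiguation", left.strip(), p)


print(classify_rule("X(%a;%b):=Y(%b;%a);"))
print(classify_rule('(POS=NOU)("a")=255;'))  # hypothetical disambiguation rule
```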
-== Indexes ==
-;Indexes (%) are used for co-indexing nodes, attributes and values inside and between the left and the right side of transformation rules.
-:X(%a;)Y(%a;) (the first node of X is also the first node of Y)
-:X(%a;%b):=Y(%b;%a); (the first node of X becomes the second node of Y, and the second node of X becomes the first node of Y)
-:X(%a;)Y(%a;):=Z(%a); (if the first node of X is the first node of Y then make it the single node of Z)
-<blockquote>Any co-indexation is made by the use of indexes and not by the repetition of features. In that sense, '''X(A;)Y(A;)''' is different from '''X(%a;)Y(%a;)'''. In the former case, the first node of X is not necessarily the first node of Y, they only share the same feature A; in the latter case, the first node of X is necessarily the first node of Y.</blockquote>
-;Indexes are made of any sequence of alphanumeric characters and underscore:
-:%index
-:%a
-:%first_index
-:%a1
-:<strike>%first index</strike> (no blank spaces are allowed)
-<blockquote>%01 (numbers are used for default indexation and must be avoided - see below)</blockquote>
-;Default indexation
-:If omitted, indexes are assigned by default, according to the following rules:
-:Default indexes are assigned from left to right in each side of the rule according to the position of the nodes:
-::X(A;B)Y(C;D) is the same as X('''%01''',A;'''%02''',B)Y('''%03''',C;'''%04''',D)
-:Default indexation is done only for non-indexed nodes (i.e., user-defined indexes prevail over indexes assigned by default):
-::X(A,%A;B)Y(C,%C;D) is the same as X(A,%A;B,'''%02''')Y(C,%C;'''%04''',D)
-:::(Notice that the user-defined indexes %A and %C are preserved and not replaced by default indexes)
-:In default indexation, left-side nodes are automatically co-indexed with right-side nodes '''if and only if''' their position and number are the same:
-::X(A;B):=Y(C;D); is the same as X('''%01''',A;'''%02''',B):=Y('''%01''',C;'''%02''',D);
-::X(A;B):=Y(C;D;E); is the same as X('''%01''',A;'''%02''',B):=Y('''%03''',C;'''%04''',D;'''%05''',E);
-:::(there is no co-indexation between the left and the right side in the latter case, because the number of the nodes is not the same)
-:Default indexes are also assigned to hyper-nodes and sub-nodes
-::(((A))):=(((B))); is the same as (%01(%01%01(%01%01%01,A))):=(%01(%01%01(%01%01%01,B)));
-:In default indexation, sub-nodes are informed by the syntax <PARENT NODE><CHILD NODE>, where <PARENT NODE> may be, itself, a sub-node:
-::X(Y(A;B);C) is the same as X('''%01''',Y('''%01%01''',A;'''%01%02''',B);'''%02''')
-:::%01 = Y(A;B), %02 = C, %01%01 = A, %01%02 = B
-::X(Y(Z(A;B);C);D) is the same as X('''%01''',Y('''%01%01''',Z('''%01%01%01''',A;'''%01%01%02''',B);'''%01%02''',C);'''%02''',D)
-:::%01 = Y(Z(A;B);C), %02 = D, %01%01 = Z(A;B), %01%02 = C, %01%01%01 = A, %01%01%02 = B
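The <PARENT NODE><CHILD NODE> scheme described above can be reproduced with a short recursive Python sketch. The representation is an assumption made for illustration only: a node is a plain string, a nested relation is a (name, arguments) tuple, and the function names are hypothetical. The sketch covers a single top-level relation. It does not model the continuous numbering across juxtaposed relations or the co-indexation between the two sides of a rule.

```python
def render(arg):
    """Write an argument back in rule notation, e.g. ("Y", ["A", "B"]) -> "Y(A;B)"."""
    if isinstance(arg, tuple):
        name, sub_args = arg
        return f"{name}({';'.join(render(a) for a in sub_args)})"
    return arg


def default_indexes(args, prefix=""):
    """Map each default index to the argument it designates, recursing into sub-relations."""
    table = {}
    for position, arg in enumerate(args, start=1):
        index = f"{prefix}%{position:02d}"   # parent index followed by the child position
        table[index] = render(arg)
        if isinstance(arg, tuple):           # the argument is itself a relation
            table.update(default_indexes(arg[1], prefix=index))
    return table


# Arguments of X(Y(A;B);C)
print(default_indexes([("Y", ["A", "B"]), "C"]))
# Arguments of X(Y(Z(A;B);C);D)
print(default_indexes([("Y", [("Z", ["A", "B"]), "C"]), "D"]))
```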
-;Non-indexed nodes in the right side mean ADDITION, whereas left-side nodes that are not referred to in the right side mean DELETION
-:X(%a;%b):=Y(%a;X;%b); is the same as X(%a;%b):=Y(%a;'''%02''',X;%b); (it means that a new node with the feature X will be created for the relation Y)
-:X(%a;%b;%c):=Y(%a;%c); (it means that the second node of X will be deleted from the relation Y)
-;Indexes may also be used to transfer attribute values expressed in the format ATTRIBUTE=VALUE:
-:X(A,%a,ATT1=VAL1;B,%b):=X(%a;%b,ATT1=%a); (the value "VAL1" of "ATT1" of %a is copied to the node %b)
-;Special indexes (#) are used to make reference to the internal structure of the field <NLW> in the dictionary
-:(X)(Y):=(X,#02)(Y)(X,#01);
-::The rule above is used for complex dictionary entries such as:
-:::[[A][B]] "uw" (X, #01(ATT=AAA), #02(ATT=BBB)) <flg,fre,pri>;
-::It means that, given (X)(Y), the output should be (B)(Y)(A).
-== Notes ==
-<references />

Grammar Specs: Difference between revisions

Latest revision as of 17:11, 19 August 2013
