| UNL grammars are sets of rules for translating UNL expressions into natural language (NL) sentences and NL sentences into UNL expressions. They are normally unidirectional, i.e., the [[UNL-ization]] grammar (NL-to-UNL) is different from the [[NL-ization]] grammar (UNL-to-NL), even though they share the same basic syntax.
| | |
| == Basic symbols ==
| |
| | |
| {| border="1" cellpadding="2" align=center
| |
| |+Basic symbols used in UNL grammar rules
| |
| !Symbol
| |
| !Definition
| |
| !Example
| |
| |-
| |
| |align=center|<nowiki>^</nowiki>
| |
| |not
| |
| |^a = not a
| |
| |-
| |
| |align=center|{ | }
| |
| |or
| |
| |<nowiki>{a|b}</nowiki> = a or b
| |
| |-
| |
| |align=center|%
| |
| |index for nodes, attributes and values
| |
| |%x (see [[#Indexes|below]])
| |
| |-
| |
| |align=center|#
| |
| |index for sub-NLWs
| |
| |#01 (see [[#Indexes|below]])
| |
| |-
| |
| |align=center|=
| |
| |attribute-value assignment
| |
| |POS=NOU
| |
| |-
| |
| |align=center|!
| |
| |rule trigger
| |
| |!PLR
| |
| |-
| |
| |align=center|&
| |
| |merge operator
| |
| |%x&%y
| |
| |-
| |
| |align=center|?
| |
| |dictionary lookup operator
| |
| |?[a]
| |
| |-
| |
| |align=center|" "
| |
| |string
| |
| |"went"
| |
| |-
| |
| |align=center|[ ]
| |
| |natural language entry (headword)
| |
| |[go]
| |
| |-
| |
| |align=center|[[ ]]
| |
| |UW
| |
| |[[to go(icl>to move)]]
| |
| |-
| |
| |align=center|( )
| |
| |node
| |
| |(a)
| |
| |-
| |
| |align=center|/ /
| |
| |regular expression
| |
| |/a{2,3}/ = aa,aaa
| |
| |}
| |
| | |
| ;The differences between "", [] and [[]]
| |
| :Double quotes are always used to represent strings: "a" will match only the string "a"
| |
| :Simple square brackets are always used to represent natural language entries (headwords) in the dictionary: [a] will match the node associated with the entry [a] retrieved from the dictionary, regardless of its current realization, which may have been changed by other rules (the original [a] may have been replaced, for instance, by "b", but the node is still indexed to the entry [a])
| |
| :Double square brackets are always used to represent UWs: <nowiki>[[a]]</nowiki> will match the node associated to the UW <nowiki>[[a]]</nowiki>
| |
| | |
| ;Predefined values (assigned by default)
| |
| :SCOPE - Scope
| |
| :SHEAD - Sentence head (the beginning of a sentence)
| |
| :STAIL - Sentence tail (the end of a sentence)
| |
| :CHEAD - Scope head (the beginning of a scope)
| |
| :CTAIL - Scope tail (the end of a scope)
| |
| :TEMP - Temporary entry (entry not found in the dictionary)
| |
| :DIGIT - Any sequence of digits (i.e., of the characters 0 to 9)
| |
| | |
| == Basic concepts ==
| |
| === Nodes ===
| |
| A node is the most elementary unit in the grammar. It is the result of the [[tokenization]] process, and corresponds to the notion of "lexical item", to be represented by dictionary entries. At the surface level, a natural language sentence is considered a list of nodes, and a UNL graph a set of relations between nodes. Any node is a vector (one-dimensional array) containing the following necessary elements:
| |
| *a string, to be represented between "quotes", which expresses the actual state of the node;
| |
| *a headword, to be represented between [square brackets], which expresses the original value of the node in the dictionary;
| |
| *a UW, to be represented between <nowiki>[[double square brackets]]</nowiki>, which expresses the UW value of the node;
| |
| *a feature or set of features, which describe the properties of the node;
| |
| *an [[#Indexes|index]], preceded by the symbol %, which is used to reference the node.
| |
| Examples of nodes are
| |
| *("ing") (a node making reference only to its actual string value)
| |
| *([book]) (a node making reference only to its headword, i.e., its original state in the dictionary)
| |
| *([[book(icl>document)]]) (a node making reference only to its UW value)
| |
| *(NUM) (a node making reference only to one of its features)
| |
| *(POS=NOU) (a node making reference only to one of its features in the attribute-value pair format)
| |
| *(%x) (a node making reference only to its unique index)
| |
| *("string",[headword],<nowiki>[[UW]]</nowiki>,feature1,feature2,...,attribute1=value1,attribute2=value2,...,%x) (complete node)
| |
| ==== Properties of nodes ====
| |
| ;Nodes are enclosed between (parentheses)
| |
| :("a") is a node
| |
| :"a" is not a node
| |
| ;The elements of a node are separated by commas
| |
| :("a",[a],<nowiki>[[a]]</nowiki>,A,B,A=C,%a)
| |
| ;The order of elements inside a node is not relevant.
| |
| :("a",[a],<nowiki>[[a]]</nowiki>,A,B,A=C,%a) is the same as (<nowiki>[[a]]</nowiki>,B,A,"a",[a],A=C,%a)
| |
| ;A node may have only one string, one headword, one UW and one index, but as many features as necessary
| |
| :<strike>("a","b")</strike> (a node may not contain more than one string)
| |
| :<strike>([a],[b])</strike> (a node may not contain more than one headword)
| |
| :<strike>(<nowiki>[[a]]</nowiki>,<nowiki>[[b]]</nowiki>)</strike> (a node may not contain more than one UW)
| |
| :<strike>(%a,%b)</strike> (a node may not contain more than one index)
| |
| :(A,B,C,D,...,Z) (a node may contain as many features as necessary)
| |
| ;A node may be referred to by any of its elements
| |
| :("a") refers to all nodes where actual string = "a"
| |
| :([a]) refers to all nodes where headword = [a]
| |
| :(<nowiki>[[a]]</nowiki>) refers to all nodes where UW = <nowiki>[[a]]</nowiki>
| |
| :(A) refers to all nodes having the feature A
| |
| :("a",[a],<nowiki>[[a]]</nowiki>,A) refers to all nodes having the feature A where string = "a" and headword = [a] and UW = <nowiki>[[a]]</nowiki>
| |
| ;Nodes are automatically indexed according to a position-based system if no explicit index is provided (see [[#Indexes|Index]])
| |
| :("a")("b") is actually ("a",%01)("b",%02)
| |
| ;[[Regular expressions]] may be used to make reference to any element of the node, except the index
| |
| :("/a{2,3}/") refers to all nodes where string is a sequence of 2 to 3 characters "a"
| |
| :([/a{2,3}/]) refers to all nodes where headword is a sequence of 2 to 3 characters "a"
| |
| :([[/a{2,3}/]]) refers to all nodes where UW is a sequence of 2 to 3 characters "a"
| |
| :(/a{2,3}/) refers to all nodes having a feature that is a sequence of 2 to 3 characters "a"
| |
| ;Nodes may contain alternative features enclosed between {braces} and separated by a vertical bar (<nowiki>|</nowiki>)
| |
| :({A|B}) refers to all nodes having the feature A OR B
| |
| ;Node features may be expressed as simple attributes, or attribute-value pairs:
| |
| :(MCL) - feature as an attribute: refers to all nodes having the feature MCL
| |
| :(GEN=MCL) - feature as an attribute-value pair, which is the same as (GEN,MCL): refers to all nodes having the features GEN and MCL.
| |
| Attribute-value pairs may be used to create co-reference between different nodes (as in agreement):
| |
| :(%x,GEN)(%y,GEN=%x) - the value of the attribute GEN of the node %x is the same as that of the attribute GEN of the node %y (see [[#Indexes|Index]])
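
The node properties above can be illustrated with a small Python sketch. It is not part of any UNL tool: the class <code>Node</code> and the functions <code>expand_features</code> and <code>matches</code> are hypothetical names chosen for this example, and whole-string regular-expression matching is an assumption based on the descriptions above.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical node: one string, one headword, one UW, one index,
    and any number of features."""
    string: Optional[str] = None
    headword: Optional[str] = None
    uw: Optional[str] = None
    index: Optional[str] = None
    features: frozenset = frozenset()


def expand_features(*items: str) -> frozenset:
    """Treat ATTRIBUTE=VALUE as the two features ATTRIBUTE and VALUE,
    so that GEN=MCL is the same as GEN,MCL."""
    expanded = set()
    for item in items:
        expanded.update(part for part in item.split("=") if part)
    return frozenset(expanded)


def _field_matches(pattern: Optional[str], value: Optional[str]) -> bool:
    """An element missing from the pattern matches anything; a pattern
    written as /.../ is treated as a regular expression (assumption:
    it must match the whole value)."""
    if pattern is None:
        return True
    if value is None:
        return False
    if len(pattern) >= 2 and pattern.startswith("/") and pattern.endswith("/"):
        return re.fullmatch(pattern[1:-1], value) is not None
    return pattern == value


def matches(pattern: Node, node: Node) -> bool:
    """True if the node has every element mentioned in the pattern.
    The order of elements is irrelevant and the index is ignored."""
    return (
        _field_matches(pattern.string, node.string)
        and _field_matches(pattern.headword, node.headword)
        and _field_matches(pattern.uw, node.uw)
        and pattern.features <= node.features
    )


node = Node(string="aa", headword="a", uw="a",
            features=expand_features("A", "GEN=MCL"))
print(matches(Node(string="/a{2,3}/"), node))               # True
print(matches(Node(features=expand_features("MCL")), node))  # True
print(matches(Node(string="a"), node))                       # False
```

Features are kept in a set because their order inside a node is not relevant, while the string, headword, UW and index are single-valued fields.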
| |
| | |
| === Relations ===
| |
| In order to form a natural language sentence or a UNL graph, nodes are inter-related by relations. In the UNL framework, there can be three different types of relations:
| |
| *the '''linear''' relation L expresses the surface structure of natural language sentences
| |
| *'''syntactic''' relations express the deep (tree) structure of natural language sentences
| |
| *'''semantic''' relations express the structure of UNL graphs
| |
| ==== Properties of relations ====
| |
| ;The linear relation is always binary and is represented in two possible formats:
| |
| *L(%x;%y), where L is the invariant name of the linear relation, and %x and %y are nodes; or
| |
| *(%x)(%y)
| |
| ;Syntactic relations are not predefined, although we have been using a set of binary relations based on the [[X-bar theory]].
| |
| ;Semantic relations constitute a predefined and closed set that can be found [[relations|here]].
| |
| ;Syntactic and semantic relations are represented in the same way:
| |
| *rel(%x;%y), where "rel" is the name of the relation, %x is the source node, and %y is the target node
| |
| ;Arguments of linear, syntactic and semantic relations are not commutative.
| |
| :The order of the elements in a relation affects the result:
| |
| ::(%x)(%y) is different from (%y)(%x)
| |
| ::relation(%x;%y) is different from relation(%y;%x)
| |
| ;Linear and semantic relations are always binary; syntactic relations may be n-ary:
| |
| :L(%x;%y) - linear relation
| |
| :agt(%x;%y) - semantic relation
| |
| :VH(%x) - unary syntactic relation
| |
| :VC(%x;%y) - binary syntactic relation
| |
| :XX(%x;%y;%z) - possible ternary syntactic relation
| |
| ;Inside each relation, nodes are separated by semicolons (;).
| |
| :VC(%x;%y)
| |
| :<strike>VC(%x,%y)</strike>
| |
| ;Inside each relation, nodes may be referenced by any of their elements, separated by commas (,):
| |
| :("a")([b]) - linear relation between a node where string = "a" and another node where headword = [b]
| |
| :L(<nowiki>[[c]]</nowiki>;D) - linear relation between a node where UW = <nowiki>[[c]]</nowiki> and another node having the feature D
| |
| :VC(%a;%b) - syntactic relation between a node where index = %a and another node where index = %b
| |
| :agt("a",[a],<nowiki>[[a]]</nowiki>,A;"b",[b],<nowiki>[[b]]</nowiki>,B) - semantic relation between a node having the feature A where string = "a" AND headword = [a] AND UW = <nowiki>[[a]]</nowiki> AND another node having the feature B where string = "b" AND headword = [b] AND UW = <nowiki>[[b]]</nowiki>
| |
| ;Relations may be conjoined through juxtaposition:
| |
| :("a")("b")("c") - two linear relations: one between ("a") and ("b") AND another between ("b") and ("c")
| |
| :agt(%x;%y)obj(%x;%z) - two semantic relations: one between (%x) and (%y) AND another between (%x) and (%z)
| |
| :<strike>VC([a];[b]),VC([a];[c])</strike> - conjoined relations must not be separated by commas
| |
| ;Relations may be disjoined through {braces}
| |
| :{("a")|("b")}("c") - either ("a")("c") or ("b")("c")
| |
| :{agt(%x;%y)|exp(%x;%y)}obj(%x;%z) - either agt(%x;%y)obj(%x;%z) or exp(%x;%y)obj(%x;%z)
| |
| ;Syntactic and semantic relations may be replaced by regular expressions
| |
| :/.{2,3}/(%x;%y) - any relation made of two or three characters between %x and %y
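
As a rough illustration of conjunction by juxtaposition and disjunction with braces, the following Python sketch derives the linear relations implied by a sequence of nodes and expands alternatives into concrete sequences. The function names <code>linear_relations</code> and <code>expand_alternatives</code> are hypothetical, and nodes are reduced to plain strings for brevity.

```python
from itertools import product


def linear_relations(nodes):
    """Return the ordered pairs (x, y) standing for L(x;y) between
    each node and the node that follows it."""
    return list(zip(nodes, nodes[1:]))


def expand_alternatives(slots):
    """Each slot lists the alternatives allowed at one position, as in
    {("a")|("b")}("c"); return every concrete node sequence."""
    return [list(choice) for choice in product(*slots)]


# ("a")("b")("c") implies two linear relations.
print(linear_relations(["a", "b", "c"]))
# [('a', 'b'), ('b', 'c')]

# {("a")|("b")}("c") stands for either ("a")("c") or ("b")("c").
print(expand_alternatives([["a", "b"], ["c"]]))
# [['a', 'c'], ['b', 'c']]
```

Pairs are kept as ordered tuples because the arguments of a relation are not commutative.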
| |
| | |
| === Hyper-nodes ===
| |
| Nodes may contain one or more relations. In this case, they are said to be "hyper-nodes", and represent scopes or sub-graphs. Like any node, a hyper-node contains a string, a headword, a UW, an index and features; its internal relations are treated as a special type of feature. Examples of hyper-nodes are the following:
| |
| *(("a")("b")) - a hyper-node containing a linear relation between the nodes ("a") and ("b")
| |
| *(VC(%x;%y)VA(%x;%z)) - a hyper-node containing two syntactic relations: VC(%x;%y) AND VA(%x;%z)
| |
| *(agt([a];[b])obj([a];[c])) - a hyper-node containing two semantic relations: agt([a];[b]) AND obj([a];[c])
| |
| *(([kick],V)([the],D)([bucket],N),V,NTST) - a hyper-node having the features V and NTST and containing two linear relations: one between the nodes ([kick],V) and ([the],D), and another between ([the],D) and ([bucket],N)
| |
| *(([kick],V)([the],D)([bucket],N),"kick the bucket",<nowiki>[[die]]</nowiki>,V,NTST) - the same as before, except for the fact that the hyper-node has string = "kick the bucket" and UW = <nowiki>[[die]]</nowiki>
| |
| Hyper-nodes may also contain internal hyper-nodes:
| |
| *((("a")("b"))("c")) - a hyper-node containing a linear relation between the hyper-node (("a")("b")) and the node ("c")
| |
| ==== Properties of hyper-nodes ====
| |
| ;Like any node, hyper-nodes are expressed between (parentheses)
| |
| :(("a")("b"))
| |
| ;Like any node, a hyper-node may have only one string, one headword and one UW, but as many features and internal relations as necessary
| |
| :(([kick],V)([the],D)([bucket],N),"kick the bucket",[kick the bucket],<nowiki>[[die]]</nowiki>,V,NTST)
| |
| ;Like any node, hyper-nodes may be referenced by any of their elements, including internal relations
| |
| :(([kick],V)) - refers to any hyper-node containing the node ([kick],V)
| |
| :(([the],D)([bucket],N)) - refers to any hyper-node containing a linear relation between ([the],D) AND ([bucket],N)
| |
| :(([kick],V),([bucket],N)) - refers to any hyper-node containing the nodes ([kick],V) AND ([bucket],N)
| |
| ;When a hyper-node is deleted, all its internal relations are deleted as well
| |
| :(([kick],V)([the],D)([bucket],N)):=; (the hyper-node is deleted, as well as the relations ([kick],V)([the],D) AND ([the],D)([bucket],N))
| |
| | |
| === Hyper-relations ===
| |
| Relations may have relations as arguments. In this case, they are said to be "hyper-relations". Examples of hyper-relations are the following:
| |
| *XP(XB(%a;%b);%c) - a syntactic relation XP between the syntactic relation XB(%a;%b) and the node %c
| |
| *and(agt([a];[b]);agt([a];[c])) - a semantic relation "and" between the semantic relations agt([a];[b]) AND agt([a];[c])
| |
| ==== Properties of hyper-relations ====
| |
| ;Each argument of a hyper-relation may contain at most one relation
| |
| *XP(XB(%a;%b);%c) - the source argument of the hyper-relation XP is a relation
| |
| *XP(%a;XB(%b;%c)) - the target argument of the hyper-relation XP is a relation
| |
| *XP(VC(%a;%b);VA(%a;%c)) - the source and the target argument of the hyper-relation XP are relations
| |
| *<strike>XP(VC(%a;%b)VA(%a;%c);VS(%a;%d))</strike> - a hyper-relation may not have more than one relation as one single argument (in this case, the hyper-relation XP contained two relations as the source argument)
| |
| ;Relations do not have strings, UWs, headwords or any features
| |
| *<strike>XP(XB(%a;%b),"ab",[ab],<nowiki>[[ab]]</nowiki>,A,B;%c)</strike> (the relation XB(%a;%b) may not have strings, UWs, headwords or any features)
| |
| | |
| == Types of rules ==
| |
| | |
| In the UNL Grammar there are three basic types of rules:
| |
| | |
| === Normalization Rules ===
| |
| (main article: [[N-Rule]]s)
| |
| Used to normalize the natural language input and to segment natural language texts into sentences.
| |
| | |
| === Transformation rules ===
| |
| (main article: [[T-Rule]]s)
| |
| Used to generate natural language sentences out of UNL graphs and vice-versa.
| |
| | |
| === Disambiguation rules ===
| |
| (main article: [[D-rule]]s)
| |
| Used to improve the performance of transformation rules by constraining their applicability.
| |
| | |
| Normalization rules and transformation rules follow a very general formalism:
| |
| | |
| α:=β;
| |
| | |
| where the left side α is a condition statement, and the right side β is an action to be performed over α.
| |
| | |
| Disambiguation rules, which were directly inspired by the UNL Centre's former co-occurrence dictionary and knowledge base, follow a slightly different formalism:
| |
| | |
| α=P;
| |
| | |
| where the left side α is a statement and the right side P is an integer from 0 to 255 that indicates the probability of occurrence of α.
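
The two rule shapes can be told apart mechanically. The Python sketch below is only an illustration of the formalisms α:=β; and α=P; described above; the function name <code>parse_rule</code> and the labels <code>"condition_action"</code> and <code>"probability"</code> are hypothetical and do not come from any UNL tool.

```python
def parse_rule(rule: str):
    """Split a rule into its two sides.

    Returns ("condition_action", alpha, beta) for rules of the form
    alpha:=beta; and ("probability", alpha, P) for rules of the form
    alpha=P; where P is an integer from 0 to 255.
    """
    text = rule.strip()
    if not text.endswith(";"):
        raise ValueError("a rule must end with ';'")
    body = text[:-1]

    if ":=" in body:
        condition, action = body.split(":=", 1)
        return ("condition_action", condition.strip(), action.strip())

    # The statement may itself contain '=' (as in POS=NOU), so the
    # probability is taken from the last '=' in the rule.
    statement, separator, value = body.rpartition("=")
    value = value.strip()
    if not separator or not value.isdigit():
        raise ValueError("expected 'alpha:=beta;' or 'alpha=P;'")
    probability = int(value)
    if not 0 <= probability <= 255:
        raise ValueError("P must be an integer from 0 to 255")
    return ("probability", statement.strip(), probability)


print(parse_rule("X(%a;%b):=Y(%b;%a);"))
# ('condition_action', 'X(%a;%b)', 'Y(%b;%a)')
print(parse_rule("(POS=NOU)(POS=VER)=200;"))
# ('probability', '(POS=NOU)(POS=VER)', 200)
```

An empty right side, as in a deletion rule ending with <code>:=;</code>, is returned as an empty action string.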
| |
| | |
| == Indexes ==
| |
| ;Indexes (%) are used for co-indexing nodes, attributes and values inside and between the left and the right side of transformation rules.
| |
| :X(%a;)Y(%a;) (the first node of X is also the first node of Y)
| |
| :X(%a;%b):=Y(%b;%a); (the first node of X becomes the second node of Y, and the second node of X becomes the first node of Y)
| |
| :X(%a;)Y(%a;):=Z(%a); (if the first node of X is the first node of Y then make it the single node of Z)
| |
| <blockquote>Any co-indexation is made by the use of indexes and not by the repetition of features. In that sense, '''X(A;)Y(A;)''' is different from '''X(%a;)Y(%a;)'''. In the former case, the first node of X is not necessarily the first node of Y, they only share the same feature A; in the latter case, the first node of X is necessarily the first node of Y.</blockquote>
| |
| ;Indexes are made of any sequence of alphanumeric characters and underscore:
| |
| :%index
| |
| :%a
| |
| :%first_index
| |
| :%a1
| |
| :<strike>%first index</strike> (no blank spaces are allowed)
| |
| <blockquote>%01 (numbers are used for default indexation and must be avoided - see below)</blockquote>
| |
| ;Default indexation
| |
| :If omitted, indexes are assigned by default, according to the following rules:
| |
| :Default indexes are assigned from left to right in each side of the rule according to the position of the nodes:
| |
| ::X(A;B)Y(C;D) is the same as X('''%01''',A;'''%02''',B)Y('''%03''',C;'''%04''',D)
| |
| :Default indexation is done only for non-indexed nodes (i.e., user-defined indexes prevail over indexes assigned by default):
| |
| ::X(A,%A;B)Y(C,%C;D) is the same as X(A,%A;B,'''%02''')Y(C,%C;'''%04''',D)
| |
| :::(Notice that the user-defined indexes %A and %C are preserved and not replaced by default indexes)
| |
| :In default indexation, left-side nodes are automatically co-indexed with right-side nodes '''if and only if''' their position and number are the same:
| |
| ::X(A;B):=Y(C;D); is the same as X('''%01''',A;'''%02''',B):=Y('''%01''',C;'''%02''',D);
| |
| ::X(A;B):=Y(C;D;E); is the same as X('''%01''',A;'''%02''',B):=Y('''%03''',C;'''%04''',D;'''%05''',E);
| |
| :::(there is no co-indexation between the left and the right side in the latter case, because the number of the nodes is not the same)
| |
| :Default indexes are also assigned to hyper-nodes and sub-nodes
| |
| ::(((A))):=(((B))); is the same as (%01(%01%01(%01%01%01,A))):=(%01(%01%01(%01%01%01,B)));
| |
| :In default indexation, sub-nodes are identified by the syntax <PARENT NODE><CHILD NODE>, where <PARENT NODE> may be, itself, a sub-node:
| |
| ::X(Y(A;B);C) is the same as X('''%01''',Y('''%01%01''',A;'''%01%02''',B);'''%02''',C)
| |
| :::%01 = Y(A;B), %02 = C, %01%01 = A, %01%02 = B
| |
| ::X(Y(Z(A;B);C);D) is the same as X('''%01''',Y('''%01%01''',Z('''%01%01%01''',A;'''%01%01%02''',B);'''%01%02''',C);'''%02''',D)
| |
| :::%01 = Y(Z(A;B);C), %02 = D, %01%01 = Z(A;B), %01%02 = C, %01%01%01 = A, %01%01%02 = B
| |
| ;Non-indexed nodes on the right side mean ADDITION, whereas left-side nodes that are not referred to on the right side mean DELETION
| |
| :X(%a;%b):=Y(%a;X;%b); is the same as X(%a;%b):=Y(%a;'''%02''',X;%b); (it means that a new node with the feature X will be created for the relation Y)
| |
| :X(%a;%b;%c):=Y(%a;%c); (it means that the second node of X, which is not referred to on the right side, will be deleted)
| |
| ;Indexes may also be used to transfer attribute values expressed in the format ATTRIBUTE=VALUE:
| |
| :X(A,%a,ATT1=VAL1;B,%b):=X(%a;%b,ATT1=%a); (the value "VAL1" of "ATT1" of %a is copied to the node %b)
| |
| ;Special indexes (#) are used to make reference to the internal structure of the field <NLW> in the dictionary
| |
| :(X)(Y):=(X,#02)(Y)(X,#01);
| |
| ::The rule above is used for complex dictionary entries such as:
| |
| :::[[A][B]] "uw" (X, #01(ATT=AAA), #02(ATT=BBB)) <flg,fre,pri>;
| |
| ::It means that, given (X)(Y), the output should be (B)(Y)(A).
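
The default indexation of flat (non-nested) nodes can be sketched in Python as follows. This is a simplified reading of the examples given under "Default indexation", not a reference implementation: the function name <code>default_indexes</code> is hypothetical, each side of a rule is reduced to a list holding the explicit index of each node (or <code>None</code> when there is none), and hyper-nodes, sub-nodes and right sides that mix explicit and default indexes are not covered.

```python
def default_indexes(left, right=None):
    """Assign position-based default indexes to nodes without one.

    `left` and `right` list the explicit index of each node, in order,
    with None for non-indexed nodes. Right-side nodes are co-indexed
    with left-side nodes only if both sides have the same number of
    nodes; otherwise numbering continues after the left side.
    """
    def fill(nodes, offset):
        # User-defined indexes prevail over default ones.
        return [
            index if index is not None else f"%{offset + position:02d}"
            for position, index in enumerate(nodes, start=1)
        ]

    left_filled = fill(left, 0)
    if right is None:
        return left_filled, None
    offset = 0 if len(right) == len(left) else len(left)
    return left_filled, fill(right, offset)


# X(A;B)Y(C;D)
print(default_indexes([None, None, None, None])[0])
# ['%01', '%02', '%03', '%04']

# X(A,%A;B)Y(C,%C;D)
print(default_indexes(["%A", None, "%C", None])[0])
# ['%A', '%02', '%C', '%04']

# X(A;B):=Y(C;D;E);
print(default_indexes([None, None], [None, None, None]))
# (['%01', '%02'], ['%03', '%04', '%05'])
```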
| |
| | |
| |