Grammar Specs

From UNL Wiki
The differences between "", [] and [[]]
Double quotes are always used to represent strings: "a" will match only the string "a".
Simple square brackets are always used to represent natural language entries (headwords) in the dictionary: [a] will match the node associated with the entry [a] retrieved from the dictionary, regardless of its current realization, which may be affected by other rules (the original [a] may have been replaced, for instance, by "b", but will still be indexed to the entry [a]).
Double square brackets are always used to represent UWs: [[a]] will match the node associated with the UW [[a]].
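
The three kinds of match can be sketched in Python. The `Node` class, its fields (`string`, `entry`, `uw`) and the `matches_*` helpers are hypothetical names chosen for illustration only; they are not part of the UNDL Foundation tools, and the real matching logic may differ.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical node holding the three things a rule can match against."""
    string: str  # current realization, which rules may rewrite
    entry: str   # headword of the dictionary entry the node is indexed to
    uw: str      # UW associated with the node


def matches_string(node: Node, s: str) -> bool:
    # "a": compares against the current string only
    return node.string == s


def matches_entry(node: Node, headword: str) -> bool:
    # [a]: compares against the dictionary entry, whatever the current string is
    return node.entry == headword


def matches_uw(node: Node, uw: str) -> bool:
    # [[a]]: compares against the UW
    return node.uw == uw


# A node retrieved from the entry [a] whose string was later replaced by "b".
node = Node(string="b", entry="a", uw="a")
```

In this sketch the node no longer matches the string "a", but it still matches the entry [a], which mirrors the replacement case described above.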

Predefined values (assigned by default)
SCOPE - Scope
SHEAD - Sentence head (the beginning of a sentence)
STAIL - Sentence tail (the end of a sentence)
CHEAD - Scope head (the beginning of a scope)
CTAIL - Scope tail (the end of a scope)
TEMP - Temporary entry (entry not found in the dictionary)
DIGIT - Any sequence of digits (i.e., 0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
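
As a rough illustration of how some of these default values could be assigned, the Python sketch below frames a token list with SHEAD and STAIL and tags tokens as DIGIT or TEMP. The function `tag_sentence`, the `(string, tag)` pair layout, and the choice to test DIGIT before TEMP are all assumptions made for this example; the actual tools may assign these values differently.

```python
import re


def tag_sentence(tokens, dictionary):
    """Return (string, tag) pairs framed by SHEAD and STAIL markers.

    Hypothetical helper: `dictionary` is a plain set of known headwords.
    """
    tagged = [("", "SHEAD")]  # beginning of the sentence
    for tok in tokens:
        if re.fullmatch(r"[0-9]+", tok):
            tagged.append((tok, "DIGIT"))  # any sequence of digits 0-9
        elif tok not in dictionary:
            tagged.append((tok, "TEMP"))   # entry not found in the dictionary
        else:
            tagged.append((tok, None))     # found: no predefined value shown here
    tagged.append(("", "STAIL"))  # end of the sentence
    return tagged


result = tag_sentence(["I", "have", "42", "zorbs"], {"I", "have"})
```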

Revision as of 20:32, 16 August 2013

The following Grammar Specs are used for writing rules for the UNDL Foundation tools (IAN, EUGENE, SEAN, NORMA, etc.).

Basic symbols

Basic symbols used in UNL grammar rules

Symbol   Definition                               Example
^        not                                      ^a = not a
{ | }    or                                       {a|b} = a or b
%        index for nodes, attributes and values   %x (see below)
#        index for sub-NLWs                       #01 (see below)
=        attribute-value assignment               POS=NOU
!        rule trigger                             !PLR
&        merge operator                           %x&%y
?        dictionary lookup operator               ?[a]
" "      string                                   "went"
[ ]      natural language entry (headword)        [go]
[[ ]]    UW                                       [[to go(icl>to move)]]
( )      node                                     (a)
//       regular expression                       /a{2,3}/ = aa, aaa
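
The regular-expression example in the table can be checked with a short Python sketch. It assumes that the `{2,3}` quantifier has the usual meaning (two to three repetitions), as the table's own example suggests, and it uses whole-string matching; whether the UNL tools match whole strings or substrings is not stated here.

```python
import re

# /a{2,3}/ from the table: "a" repeated two or three times
pattern = re.compile(r"a{2,3}")

candidates = ["a", "aa", "aaa", "aaaa"]

# fullmatch requires the entire candidate to match the pattern
matched = [s for s in candidates if pattern.fullmatch(s)]
print(matched)  # ['aa', 'aaa']
```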

Basic concepts

Node
A node is the most elementary unit in the graph. It is the result of the tokenization process, and corresponds to the notion of "lexical item". At the surface level, a natural language sentence is considered a list of nodes, and a UNL graph a set of relations between nodes.
Relation
Nodes are linked by relations to form a natural language sentence or a UNL graph. The UNL framework distinguishes three types of relations: linear, syntactic and semantic.
Hyper-Node
A hyper-node is a sub-graph, i.e., a node containing relations between nodes.
Hyper-Relation
A hyper-relation is a relation between relations.
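
The four concepts above can be pictured as a small data model. The Python sketch below is only an illustration: the class names, the field names, and the `is_hyper_relation` helper are hypothetical, and the relation kinds used in the example are arbitrary picks from the three types named above.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    """Most elementary unit: one lexical item produced by tokenization."""
    label: str


# Anything a relation may connect in this sketch
Argument = Union["Node", "HyperNode", "Relation"]


@dataclass
class Relation:
    """Link between two arguments; kind is 'linear', 'syntactic' or 'semantic'."""
    kind: str
    source: "Argument"
    target: "Argument"


@dataclass
class HyperNode:
    """Sub-graph: a node that contains relations between nodes."""
    relations: List[Relation] = field(default_factory=list)


def is_hyper_relation(rel: Relation) -> bool:
    # A hyper-relation is a relation whose arguments are themselves relations
    return isinstance(rel.source, Relation) and isinstance(rel.target, Relation)


john, loves, mary = Node("John"), Node("loves"), Node("Mary")
r1 = Relation("linear", john, loves)   # relation between two nodes
r2 = Relation("linear", loves, mary)
scope = HyperNode([r1, r2])            # hyper-node wrapping both relations
hyper = Relation("semantic", r1, r2)   # relation between relations
```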
Software