Rule

From UNL Wiki
  
N-rules (normalization rules) and T-rules (transformation rules) follow a very general formalism:
  
α:=β;
 
  
where the left side α is a condition statement, and the right side β is an action to be performed over α.
 
 
D-rules (disambiguation rules) follow a slightly different formalism:
 
 
α=P;
 
 
where the left side α is a statement and the right side P is an integer from 0 to 255 that indicates the probability of occurrence of α.
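As a rough illustration of the two rule shapes above, the following Python sketch splits a rule string into its parts. The function name `split_rule` and the tuples it returns are hypothetical and not part of any UNL tool. The sketch assumes only what is stated above: `:=` separates the condition from the action, a D-rule ends in `=P;`, and P is an integer from 0 to 255.

```python
def split_rule(rule: str):
    """Split a rule string into (kind, left side, right side).

    Hypothetical helper: 'NT' stands for an N-rule or T-rule (α:=β;),
    'D' stands for a D-rule (α=P;).
    """
    text = rule.strip()
    if not text.endswith(";"):
        raise ValueError("a rule must end with ';'")
    body = text[:-1]

    # N-rules and T-rules: condition := action
    if ":=" in body:
        condition, action = body.split(":=", 1)
        return ("NT", condition.strip(), action.strip())

    # D-rules: statement = P, where P is an integer from 0 to 255.
    # The statement itself may contain '=', so split on the last one.
    statement, sep, weight = body.rpartition("=")
    weight = weight.strip()
    if not sep or not weight.isdecimal():
        raise ValueError("a D-rule must end with '=P;' where P is an integer")
    p = int(weight)
    if not 0 <= p <= 255:
        raise ValueError("P must be between 0 and 255")
    return ("D", statement.strip(), p)


print(split_rule("α:=β;"))   # ('NT', 'α', 'β')
print(split_rule("α=255;"))  # ('D', 'α', 255)
```

Splitting a D-rule on the last `=` is a design choice of this sketch, made so that a statement containing an attribute-value assignment is not cut in the wrong place.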
 
  
  

Revision as of 20:26, 16 August 2013

Basic concepts

Node
A node is the most elementary unit in the graph. It is the result of the tokenization process, and corresponds to the notion of "lexical item". At the surface level, a natural language sentence is considered a list of nodes, and a UNL graph a set of relations between nodes.
Relation
In order to form a natural language sentence or a UNL graph, nodes are inter-related by relations. In the UNL framework, there can be three different types of relations: linear, syntactic or semantic.
Hyper-Node
A hyper-node is a sub-graph, i.e., a node containing relations between nodes.
Hyper-Relation
A hyper-relation is a relation between relations.
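The four concepts above can be pictured with a small data model. This is a minimal Python sketch, not the representation used by any UNL software: the class names, field names, and the `is_hyper` property are assumptions made for illustration. Only the three relation types and the definitions of hyper-node and hyper-relation come from the text.

```python
from dataclasses import dataclass, field
from typing import List, Union

# The three relation types named in the text.
RELATION_TYPES = {"linear", "syntactic", "semantic"}


@dataclass
class Node:
    """The most elementary unit in the graph (a lexical item)."""
    label: str


@dataclass
class Relation:
    """A relation between two nodes, or between two relations."""
    kind: str
    source: Union["Node", "Relation"]
    target: Union["Node", "Relation"]

    def __post_init__(self):
        if self.kind not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {self.kind}")

    @property
    def is_hyper(self) -> bool:
        # A hyper-relation is a relation between relations.
        return isinstance(self.source, Relation) and isinstance(self.target, Relation)


@dataclass
class HyperNode(Node):
    """A node that contains relations between nodes (a sub-graph)."""
    relations: List[Relation] = field(default_factory=list)


# A sentence at the surface level: a list of nodes (labels are placeholders).
a, b, c = Node("a"), Node("b"), Node("c")
sentence = [a, b, c]

r1 = Relation("linear", a, b)
r2 = Relation("linear", b, c)
hyper = Relation("syntactic", r1, r2)   # relation between relations
scope = HyperNode("scope", [r1, r2])    # node containing relations

print(r1.is_hyper, hyper.is_hyper)  # False True
```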



Basic symbols

Basic symbols used in UNL grammar rules
Symbol   Definition                                Example
^        not                                       ^a = not a
{ | }    or                                        {a|b} = a or b
%        index for nodes, attributes and values    %x (see below)
#        index for sub-NLWs                        #01 (see below)
=        attribute-value assignment                POS=NOU
!        rule trigger                              !PLR
&        merge operator                            %x&%y
?        dictionary lookup operator                ?[a]
" "      string                                    "went"
[ ]      natural language entry (headword)         [go]
[[ ]]    UW                                        [[to go(icl>to move)]]
( )      node                                      (a)
/ /      regular expression                        /a{2,3}/ = aa, aaa
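The regular-expression example in the last row can be reproduced with Python's `re` module. This sketch assumes that the pattern between the slashes must match the whole string and that the `{2,3}` quantifier means the same as in Python; neither assumption is confirmed by the table. The helper name `matches_regex` is hypothetical.

```python
import re


def matches_regex(pattern: str, text: str) -> bool:
    """Check text against a pattern written between slashes, e.g. '/a{2,3}/'."""
    if len(pattern) < 2 or not (pattern.startswith("/") and pattern.endswith("/")):
        raise ValueError("pattern must be written between slashes")
    # Assumption: the whole string has to match, not just a part of it.
    return re.fullmatch(pattern[1:-1], text) is not None


candidates = ["a", "aa", "aaa", "aaaa"]
accepted = [s for s in candidates if matches_regex("/a{2,3}/", s)]
print(accepted)  # ['aa', 'aaa']
```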
The differences between "", [] and [[]]
Double quotes are always used to represent strings: "a" will match only the string "a".
Simple square brackets are always used to represent natural language entries (headwords) in the dictionary: [a] will match the node associated with the entry [a] retrieved from the dictionary, whatever its current realization. Other rules may have changed that realization (the original [a] may have been replaced, for instance, by "b"), but the node is still indexed to the entry [a].
Double square brackets are always used to represent UWs: [[a]] will match the node associated with the UW [[a]].
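The distinction can be sketched in Python by giving each node three separate fields and letting the delimiter decide which one is compared. The class `IndexedNode`, its field names, and the function `matches` are hypothetical. The values reuse the examples from the symbol table and assume that "went" is the current realization of the entry [go].

```python
from dataclasses import dataclass


@dataclass
class IndexedNode:
    surface: str   # current realization, matched by "..."
    headword: str  # dictionary entry the node is indexed to, matched by [...]
    uw: str        # UW associated with the node, matched by [[...]]


def matches(pattern: str, node: IndexedNode) -> bool:
    """Compare a delimited pattern against the field its delimiter selects."""
    # Test double brackets first, since they also start with '['.
    if len(pattern) >= 4 and pattern.startswith("[[") and pattern.endswith("]]"):
        return node.uw == pattern[2:-2]
    if len(pattern) >= 2 and pattern.startswith("[") and pattern.endswith("]"):
        return node.headword == pattern[1:-1]
    if len(pattern) >= 2 and pattern.startswith('"') and pattern.endswith('"'):
        return node.surface == pattern[1:-1]
    raise ValueError("pattern must be delimited by \"\", [] or [[]]")


node = IndexedNode(surface="went", headword="go", uw="to go(icl>to move)")

print(matches('"went"', node))                  # True
print(matches('"go"', node))                    # False
print(matches('[go]', node))                    # True
print(matches('[[to go(icl>to move)]]', node))  # True
```

Changing `node.surface` to another string would make `"went"` stop matching while `[go]` keeps matching, which is the behavior described for headwords above.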
Predefined values (assigned by default)
SCOPE - Scope
SHEAD - Sentence head (the beginning of a sentence)
STAIL - Sentence tail (the end of a sentence)
CHEAD - Scope head (the beginning of a scope)
CTAIL - Scope tail (the end of a scope)
TEMP - Temporary entry (entry not found in the dictionary)
DIGIT - Any sequence of digits (i.e.: 0,1,2,3,4,5,6,7,8,9)