Rule

From UNL Wiki
(Difference between revisions)
m (Examples of T-rules)
 
(6 intermediate revisions by one user not shown)
== Basic concepts ==
;[[Node]]
:A node is the most elementary unit in the graph. It is the result of the [[tokenization]] process, and corresponds to the notion of "lexical item". At the surface level, a natural language sentence is considered a list of nodes, and a UNL graph a set of relations between nodes.
;[[Relation]]
:In order to form a natural language sentence or a UNL graph, nodes are inter-related by relations. In the UNL framework, there can be three different types of relations: linear, syntactic or semantic.
;[[Hyper-Node]]
:A hyper-node is a sub-graph, i.e., a node containing relations between nodes.
;[[Hyper-Relation]]
:A hyper-relation is a relation between relations.

== Basic symbols ==

{| border="1" cellpadding="2" align=center
|+Basic symbols used in UNL grammar rules
!Symbol
!Definition
!Example
|-
|align=center|<nowiki>^</nowiki>
|not
|^a = not a
|-
|align=center|{ | }
|or
|<nowiki>{a|b}</nowiki> = a or b
|-
|align=center|%
|index for nodes, attributes and values
|%x (see [[#Indexes|below]])
|-
|align=center|#
|index for sub-NLWs
|#01 (see [[#Indexes|below]])
|-
|align=center|=
|attribute-value assignment
|POS=NOU
|-
|align=center|!
|rule trigger
|!PLR
|-
|align=center|&
|merge operator
|%x&%y
|-
|align=center|?
|dictionary lookup operator
|?[a]
|-
|align=center|" "
|string
|"went"
|-
|align=center|[ ]
|natural language entry (headword)
|[go]
|-
|align=center|[[ ]]
|UW
|[[to go(icl>to move)]]
|-
|align=center|//
|regular expression
|/a{2,3}/ = aa, aaa
|}

;The differences between "", [] and [[]]
:Double quotes are always used to represent strings: "a" will match only the string "a"
:Simple square brackets are always used to represent natural language entries (headwords) in the dictionary: [a] will match the node associated to the entry [a] retrieved from the dictionary, no matter its current realization, which may be affected by other rules (the original [a] may have been replaced, for instance, by "b", but will still be indexed to the entry [a])
:Double square brackets are always used to represent UWs: <nowiki>[[a]]</nowiki> will match the node associated to the UW <nowiki>[[a]]</nowiki>

;Predefined values (assigned by default)
:SCOPE - Scope
:SHEAD - Sentence head (the beginning of a sentence)
:STAIL - Sentence tail (the end of a sentence)
:CHEAD - Scope head (the beginning of a scope)
:CTAIL - Scope tail (the end of a scope)
:TEMP - Temporary entry (entry not found in the dictionary)
:DIGIT - Any sequence of digits (i.e.: 0,1,2,3,4,5,6,7,8,9)

Latest revision as of 20:53, 16 December 2013

Grammars are sets of rules used to go from UNL into natural language, or from natural language into UNL. In the UNL framework, there are two types of rules:

  • T-rules, or transformation rules, are used to perform changes to nodes or relations
  • D-rules, or disambiguation rules, are used to control changes over nodes or relations


T-rules

main article: T-rule

T-rules are used to perform actions and follow the general formalism:

α:=β;

where the left side α is a condition statement, and the right side β is an action to be performed over α.
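
As an illustration of the α:=β; shape only, here is a minimal Python sketch that splits a rule string into its condition and action. The function name `split_t_rule` is hypothetical and is not part of any UNL tool; the sketch assumes the rule is a single string ending in `;` and does not interpret the contents of either side.

```python
from __future__ import annotations


def split_t_rule(rule: str) -> tuple[str, str]:
    """Split a T-rule of the form 'α:=β;' into (condition, action).

    Hypothetical helper for illustration; it checks only the outer
    shape of the rule, not the syntax of α or β.
    """
    body = rule.strip()
    if not body.endswith(";"):
        raise ValueError("a T-rule must end with ';'")
    # Split on the first ':=' so the left side is the condition α.
    condition, sep, action = body[:-1].partition(":=")
    if not sep:
        raise ValueError("a T-rule must contain ':='")
    return condition.strip(), action.strip()


condition, action = split_t_rule('PLR:=0>"s";')
print("condition:", condition)
print("action:", action)
```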

There are several special types of T-rules:

  • A-rule is a specific type of T-rule used for affixation (prefixation, infixation, suffixation)
  • C-rule is a specific type of T-rule used for composition (word formation in case of compounds and multiword expressions)
  • L-rule is a specific type of T-rule used for handling word order
  • N-rule is a specific type of T-rule used for segmenting sentences and normalizing the input text
  • S-rule is a specific type of T-rule used for handling syntactic structures

Examples of T-rules

  • PLR:=0>"s"; (A-rule: add "s" to form the plural, as in book>books)
  • MTW:=+VA("into account",PP); (C-rule: add the prepositional phrase "into account" as an adjunct to the verbal phrase (VA) in order to form the multiword expression, as in take>take into account)
  • (ART,%x)(QUA,%y):=(%y)(%x); (L-rule: reverse the order ART+QUA to QUA+ART, as in the all>all the)
  • ("don't"):=("do not"); (N-rule: replace the contraction "don't" with "do not")
  • (V,%x)(N,%y):=VC(%x;%y); (S-rule: replace the linear relation between a verb and a noun with the syntactic relation VC between them)
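
The effect of three of the examples above can be mimicked in plain Python. This is a toy sketch, not an implementation of the UNL rule engine: the representation of a node as a `(feature, string)` pair and the function names are assumptions made for illustration, and each function hard-codes one rule instead of interpreting rule syntax.

```python
def l_rule_art_qua(nodes):
    """Mimic (ART,%x)(QUA,%y):=(%y)(%x); on a list of (feature, string) pairs."""
    out = list(nodes)
    i = 0
    while i < len(out) - 1:
        if out[i][0] == "ART" and out[i + 1][0] == "QUA":
            # Reverse the adjacent ART+QUA pair to QUA+ART.
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair just swapped
        else:
            i += 1
    return out


def a_rule_plural(word):
    """Mimic PLR:=0>"s"; by appending "s" to the word."""
    return word + "s"


def n_rule_expand(text):
    """Mimic ("don't"):=("do not"); by expanding the contraction."""
    return text.replace("don't", "do not")


print(l_rule_art_qua([("ART", "the"), ("QUA", "all"), ("N", "books")]))
print(a_rule_plural("book"))
print(n_rule_expand("I don't know"))
```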

D-rules

main article: D-rule

D-rules control the action of T-rules: they guide dictionary retrieval (during tokenization) and prevent or induce the application of rules during transformation.

D-rules follow the syntax:

α=P;

where the left side α is a statement and the right side P is an integer from 0 to 255 that indicates the probability of occurrence of α; a value of 0 rules α out.
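
The α=P; shape and the 0 to 255 range can be checked with a few lines of Python. This is a sketch under assumptions: `split_d_rule` is a hypothetical name, and the split is made on the last `=` on the assumption that the statement α may itself contain `=`.

```python
from __future__ import annotations


def split_d_rule(rule: str) -> tuple[str, int]:
    """Split a D-rule of the form 'α=P;' into (statement, P).

    Hypothetical helper for illustration; α is returned as an
    uninterpreted string.
    """
    body = rule.strip()
    if not body.endswith(";"):
        raise ValueError("a D-rule must end with ';'")
    # Split on the last '=' so any '=' inside α stays in the statement.
    statement, sep, value = body[:-1].rpartition("=")
    if not sep:
        raise ValueError("a D-rule must contain '='")
    p = int(value)  # raises ValueError if P is not an integer
    if not 0 <= p <= 255:
        raise ValueError("P must be an integer from 0 to 255")
    return statement.strip(), p


print(split_d_rule("(ART)(VER)=0;"))
print(split_d_rule("(D)(N)=1;"))
```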

Examples of D-rules

  • (ART)(VER)=0; (there cannot be any article before a verb)
  • agt(^V,^J;)=0; (an agent relation whose source node is neither a verb nor an adjective is ruled out, i.e., the source node must be either a verb or an adjective)
  • (D)(N)=1; (determiners may come before nouns)
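
To show how a D-rule with value 0 can block a configuration, the following toy Python sketch checks a sequence of part-of-speech tags against the two linear examples above. The dictionary `D_RULES`, the function `is_ruled_out`, and the representation of a sentence as a plain list of tags are all assumptions for illustration. Only the value 0 is interpreted here, following the glosses of the examples; how nonzero values are weighed against each other is not modelled.

```python
# Hypothetical encoding of two linear D-rules as (left tag, right tag) -> P.
D_RULES = {
    ("ART", "VER"): 0,  # (ART)(VER)=0;
    ("D", "N"): 1,      # (D)(N)=1;
}


def is_ruled_out(tags, d_rules=D_RULES):
    """Return True if any adjacent pair of tags matches a D-rule with P = 0."""
    return any(d_rules.get(pair) == 0 for pair in zip(tags, tags[1:]))


print(is_ruled_out(["ART", "VER", "N"]))
print(is_ruled_out(["D", "N", "VER"]))
```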