Node

From UNL Wiki
Martins (Talk | contribs)

Revision as of 15:49, 16 August 2013

A node is the most elementary unit in the grammar. It is the result of the tokenization process, and corresponds to the notion of "lexical item", to be represented by dictionary entries. At the surface level, a natural language sentence is considered a list of nodes, and a UNL graph a set of relations between nodes.

Elements

Any node is a vector (one-dimensional array) containing the following necessary elements:

  • a string, to be represented between "quotes", which expresses the actual state of the node;
  • a headword, to be represented between [square brackets], which expresses the original value of the node in the dictionary;
  • a UW, to be represented between [[double square brackets]], which expresses the UW value of the node;
  • a feature or set of features, expressed either as simple attributes or as attribute-value pairs;
  • an index, preceded by the symbol %, which is used to reference the node.

Examples of nodes are:

  • ("ing") (a node making reference only to its actual string value)
  • ([book]) (a node making reference only to its headword, i.e., its original state in the dictionary)
  • ([[book(icl>document)]]) (a node making reference only to its UW value)
  • (NUM) (a node making reference only to one of its features)
  • (POS=NOU) (a node making reference only to one of its features in the attribute-value pair format)
  • (%x) (a node making reference only to its unique index)
  • ("string",[headword],[[UW]],feature1,feature2,...,attribute1=value1,attribute2=value2,...,%x) (complete node)
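The notation above can be read mechanically. The following is a minimal sketch, not part of the UNL specification, of how a node written in this notation might be split into its five kinds of elements. The names Node, split_elements and parse_node are hypothetical, and the sketch assumes that strings do not themselves contain the quote character.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    string: Optional[str] = None        # actual state, written "..."
    headword: Optional[str] = None      # dictionary value, written [...]
    uw: Optional[str] = None            # UW value, written [[...]]
    features: frozenset = frozenset()   # simple features and attribute=value pairs
    index: Optional[str] = None         # reference index, written %x


def split_elements(body: str) -> list:
    """Split on commas that lie outside quotes, brackets and parentheses."""
    parts, current, depth, in_quotes = [], "", 0, False
    for ch in body:
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes and ch in "[(":
            depth += 1
        elif not in_quotes and ch in "])":
            depth -= 1
        if ch == "," and depth == 0 and not in_quotes:
            parts.append(current.strip())
            current = ""
        else:
            current += ch
    if current.strip():
        parts.append(current.strip())
    return parts


def parse_node(text: str) -> Node:
    """Parse one node such as ("a",[a],[[a]],A,B,A=C,%a)."""
    text = text.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError("a node must be enclosed between parentheses")
    node = Node()
    features = set()
    for element in split_elements(text[1:-1]):
        if element.startswith('"') and element.endswith('"'):
            node.string = element[1:-1]
        elif element.startswith("[[") and element.endswith("]]"):
            node.uw = element[2:-2]
        elif element.startswith("[") and element.endswith("]"):
            node.headword = element[1:-1]
        elif element.startswith("%"):
            node.index = element[1:]
        else:
            features.add(element)
    node.features = frozenset(features)
    return node
```

For instance, parse_node('("ing")') yields a node whose only filled element is the string, while a complete node fills all five fields.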

Properties of nodes

Nodes are enclosed between (parentheses)
("a") is a node
"a" is not a node
The elements of a node are separated by commas
("a",[a],[[a]],A,B,A=C,%a)
The order of elements inside a node is not relevant.
("a",[a],[[a]],A,B,A=C,%a) is the same as ([[a]],B,A,"a",[a],A=C,%a)
A node may have at most one string, one headword, one UW and one index, but as many features as necessary
("a","b") (a node may not contain more than one string)
([a],[b]) (a node may not contain more than one headword)
([[a]],[[b]]) (a node may not contain more than one UW)
(%a,%b) (a node may not contain more than one index)
(A,B,C,D,...,Z) (a node may contain as many features as necessary)
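The two properties above (order is irrelevant; string, headword, UW and index are unique) can be illustrated together. The sketch below is only an illustration under the assumption that a node has already been split into a list of element strings; kind and normalize are hypothetical helper names.

```python
def kind(element: str) -> str:
    """Classify an element by its delimiters."""
    if element.startswith('"'):
        return "string"
    if element.startswith("[["):
        return "UW"
    if element.startswith("["):
        return "headword"
    if element.startswith("%"):
        return "index"
    return "feature"


def normalize(elements):
    """Return an order-independent form, rejecting repeated unique elements."""
    unique = {}
    features = set()
    for element in elements:
        k = kind(element)
        if k == "feature":
            features.add(element)          # any number of features is allowed
        elif k in unique:
            raise ValueError(f"a node may not contain more than one {k}")
        else:
            unique[k] = element
    return frozenset(unique.items()), frozenset(features)
```

Two nodes that differ only in the order of their elements normalize to the same value, and a node such as ("a","b") is rejected.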
A node may be referred to by any of its elements
("a") refers to all nodes where actual string = "a"
([a]) refers to all nodes where headword = [a]
([[a]]) refers to all nodes where UW = [[a]]
(A) refers to all nodes having the feature A
("a",[a],[[a]],A) refers to all nodes having the feature A where string = "a" and headword = [a] and UW = [[a]]
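Reference by elements amounts to a subset test: a reference selects every node that has all the elements it mentions, whatever else the node contains. A minimal sketch, assuming nodes are held as plain dictionaries (the keys "string", "headword", "uw" and "features" and the function name matches are hypothetical):

```python
def matches(pattern: dict, node: dict) -> bool:
    """True if the node has every element mentioned in the pattern."""
    for key in ("string", "headword", "uw"):
        if key in pattern and pattern[key] != node.get(key):
            return False
    # Every feature named in the pattern must be present on the node.
    return set(pattern.get("features", ())) <= set(node.get("features", ()))


nodes = [
    {"string": "a", "headword": "a", "uw": "a", "features": {"A", "B"}},
    {"string": "a", "headword": "b", "uw": "b", "features": {"B"}},
]

# ("a") selects both nodes; ("a",[a],[[a]],A) selects only the first one.
by_string = [n for n in nodes if matches({"string": "a"}, n)]
by_all = [n for n in nodes
          if matches({"string": "a", "headword": "a", "uw": "a", "features": {"A"}}, n)]
```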
Nodes are automatically indexed according to a position-based system if no explicit index is provided (see Index)
("a")("b") is actually ("a",%01)("b",%02)
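The automatic indexing can be sketched as follows. The two-digit, one-based format follows the %01, %02 example above; that a node with an explicit index still occupies a position in the count is an assumption of this sketch, and auto_index is a hypothetical name.

```python
def auto_index(nodes):
    """Append a position-based index to every node that lacks an explicit one."""
    indexed = []
    for position, elements in enumerate(nodes, start=1):
        if any(e.startswith("%") for e in elements):
            indexed.append(list(elements))                     # keep explicit index
        else:
            indexed.append(list(elements) + [f"%{position:02d}"])
    return indexed
```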
Regular expressions may be used to make reference to any element of the node, except the index
("/a{2,3}/") refers to all nodes where the string is a sequence of two to three "a" characters
([/a{2,3}/]) refers to all nodes where the headword is a sequence of two to three "a" characters
([[/a{2,3}/]]) refers to all nodes where the UW is a sequence of two to three "a" characters
(/a{2,3}/) refers to all nodes having a feature that is a sequence of two to three "a" characters
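A sketch of how a regular-expression reference might be evaluated, using Python's re module as a stand-in for whatever regular expression dialect the grammar actually adopts. Treating the expression as a match against the whole value (rather than a search inside it) is an assumption based on the wording of the examples; element_matches is a hypothetical name.

```python
import re


def element_matches(pattern: str, value: str) -> bool:
    """Compare a value against a literal or a /regular expression/ pattern."""
    if len(pattern) >= 2 and pattern.startswith("/") and pattern.endswith("/"):
        # Assumption: the expression must cover the whole value.
        return re.fullmatch(pattern[1:-1], value) is not None
    return pattern == value
```

Under this reading, /a{2,3}/ accepts "aa" and "aaa" but rejects "a" and "aaaa".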
Nodes may contain disjoint (alternative) features enclosed between {braces} and separated by a vertical bar
({A|B}) refers to all nodes having the feature A OR B
Node features may be expressed as simple attributes, or attribute-value pairs
(MCL) - feature as an attribute: refers to all nodes having the feature MCL
(GEN=MCL) - feature as an attribute-value pair, which is the same as (GEN,MCL): refers to all nodes having the features GEN and MCL.
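The feature rules above (alternatives in braces, and attribute-value pairs counting as both the attribute and the value) can be sketched as a single test. This is an illustration only; expand and has_feature are hypothetical names, and the alternatives are split on the vertical bar as in the ({A|B}) example.

```python
def expand(feature: str) -> set:
    """GEN=MCL is treated as the two features GEN and MCL."""
    return set(feature.split("="))


def has_feature(node_features, pattern: str) -> bool:
    """Test a node's features against a simple, paired or {A|B} pattern."""
    if pattern.startswith("{") and pattern.endswith("}"):
        alternatives = pattern[1:-1].split("|")
        return any(has_feature(node_features, alt) for alt in alternatives)
    available = set()
    for feature in node_features:
        available |= expand(feature)
    return expand(pattern) <= available
```

With this reading, a node carrying GEN=MCL is selected by (MCL), by (GEN) and by (GEN=MCL), while a node carrying only MCL is not selected by (GEN=MCL).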

Attribute-value pairs may be used to create co-reference between different nodes (as in agreement):

(%x,GEN)(%y,GEN=%x) - the value of the attribute GEN of the node %y is the same as the value of the attribute GEN of the node %x (see Index)
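Co-reference can be pictured as following a pointer from one node's attribute to another node's. The sketch below assumes nodes are stored in a dictionary keyed by index, with attribute values that are either concrete (such as MCL) or references of the form %x; this storage layout and the name resolve are hypothetical.

```python
def resolve(nodes: dict, index: str, attribute: str):
    """Follow attribute=%ref links until a concrete value is found."""
    seen = set()
    while True:
        if index in seen:
            raise ValueError("circular co-reference")
        seen.add(index)
        value = nodes[index].get(attribute)
        if isinstance(value, str) and value.startswith("%"):
            index = value[1:]          # jump to the referenced node
        else:
            return value


# (%x,GEN=MCL)(%y,GEN=%x): node y takes its gender from node x.
agreement = {"x": {"GEN": "MCL"}, "y": {"GEN": "%x"}}
```

Here resolve(agreement, "y", "GEN") gives the same value as resolve(agreement, "x", "GEN"), which is what agreement between the two nodes requires.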
Software