Node
From UNL Wiki
A node is the most elementary unit in the grammar. It is the result of the tokenization process, and corresponds to the notion of "lexical item". At the surface level, a natural language sentence is considered a list of nodes, and a UNL graph a set of relations between nodes.
Elements
Any node is a vector (one-dimensional array) containing the following necessary elements:
- a string, to be represented between "quotes", which expresses the actual state of the node;
- a headword, to be represented between [square brackets], which expresses the original value of the node in the dictionary;
- a UW, to be represented between [[double square brackets]], which expresses the UW value of the node;
- a feature or set of features, which express the linguistic properties of the node;
- an index, preceded by the symbol %, which is used to reference the node.
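As an illustration, the five elements can be modelled as a small record. The sketch below is not part of any UNL tool: the class name `Node`, its field names and the `render` method are hypothetical, chosen only to mirror the list above. Every element is optional here because a node reference may mention only some of its elements.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical model of a node: one string, headword, UW and index, any number of features."""
    string: Optional[str] = None      # actual state of the node, written "..."
    headword: Optional[str] = None    # original dictionary value, written [...]
    uw: Optional[str] = None          # UW value, written [[...]]
    features: FrozenSet[str] = frozenset()  # simple features or attribute=value pairs
    index: Optional[str] = None       # reference to the node, written %...

    def render(self) -> str:
        """Write the node in the parenthesised notation, features in alphabetical order."""
        parts = []
        if self.string is not None:
            parts.append(f'"{self.string}"')
        if self.headword is not None:
            parts.append(f"[{self.headword}]")
        if self.uw is not None:
            parts.append(f"[[{self.uw}]]")
        parts.extend(sorted(self.features))
        if self.index is not None:
            parts.append(f"%{self.index}")
        return "(" + ",".join(parts) + ")"


book = Node(string="book", headword="book", uw="book(icl>document)",
            features=frozenset({"NOU", "POS=NOU"}), index="01")
print(book.render())  # ("book",[book],[[book(icl>document)]],NOU,POS=NOU,%01)
```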
Examples
Examples of nodes:
- ("ing") (a node making reference only to its actual string value)
- ([book]) (a node making reference only to its headword, i.e., its original state in the dictionary)
- ([[book(icl>document)]]) (a node making reference only to its UW value)
- (NUM) (a node making reference only to one of its features)
- (POS=NOU) (a node making reference only to one of its features in the attribute-value pair format)
- (%x) (a node making reference only to its unique index)
- ("string",[headword],[[UW]],feature1,feature2,...,attribute1=value1,attribute2=value2,...,%x) (complete node)
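The notation used in these examples can be read mechanically. The following is a minimal sketch of such a reader, not the parser of any UNL engine: the names `TOKEN` and `parse_node` and the dictionary layout are assumptions made for illustration. It handles literal elements only (no regular expressions inside the node) and rejects a second string, headword, UW or index, in line with the constraints listed under Properties.

```python
import re

# One alternative per element type; the UW pattern is tried before the headword
# pattern so that [[...]] is not mistaken for [...].
TOKEN = re.compile(r'''
    "(?P<string>[^"]*)"            |
    \[\[(?P<uw>.*?)\]\]            |
    \[(?P<headword>[^\[\]]*)\]     |
    %(?P<index>[^,()\s]+)          |
    (?P<feature>[^,()\s"\[\]%]+)
''', re.VERBOSE)


def parse_node(text):
    """Parse '(elem,elem,...)' into a dict; the order of the elements does not matter."""
    text = text.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError("a node must be enclosed between parentheses")
    body = text[1:-1]
    node = {"string": None, "headword": None, "uw": None,
            "features": set(), "index": None}
    pos = 0
    while pos < len(body):
        if body[pos] in ", ":          # skip separators
            pos += 1
            continue
        match = TOKEN.match(body, pos)
        if match is None:
            raise ValueError(f"unexpected text at position {pos}: {body[pos:]!r}")
        kind = match.lastgroup
        value = match.group(kind)
        if kind == "feature":
            node["features"].add(value)
        elif node[kind] is not None:
            raise ValueError(f"a node may not contain more than one {kind}")
        else:
            node[kind] = value
        pos = match.end()
    return node


complete = parse_node('("a",[a],[[a]],A,B,A=C,%a)')
print(complete["uw"], sorted(complete["features"]))  # a ['A', 'A=C', 'B']
```

Because features are collected in a set and the other elements are stored by type, two nodes that list the same elements in a different order parse to equal dictionaries.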
Properties
- Nodes are enclosed between (parentheses)
- ("a") is a node
- "a" is not a node
- The elements of a node are separated by commas
- ("a",[a],[[a]],A,B,A=C,%a)
- The order of elements inside a node is not relevant.
- ("a",[a],[[a]],A,B,A=C,%a) is the same as ([[a]],B,A,"a",[a],A=C,%a)
- A node may have at most one string, one headword, one UW and one index, but as many features as necessary
- ("a","b") (a node may not contain more than one string)
- ([a],[b]) (a node may not contain more than one headword)
- ([[a]],[[b]]) (a node may not contain more than one UW)
- (%a,%b) (a node may not contain more than one index)
- (A,B,C,D,...,Z) (a node may contain as many features as necessary)
- A node may be referred to by any of its elements, but only the index makes it unique
- ("a") refers to all nodes where actual string = "a"
- ([a]) refers to all nodes where headword = [a]
- ([[a]]) refers to all nodes where UW = [[a]]
- (A) refers to all nodes having the feature A
- ("a",[a],[[a]],A) refers to all nodes having the feature A where string = "a" and headword = [a] and UW = [[a]]
- (%a) refers to the specific node with the index %a
- Nodes are automatically indexed according to a position-based system if no explicit index is provided (see Index)
- ("a")("b") is actually ("a",%01)("b",%02)
- Regular expressions may be used to make reference to any element of the node, except the index
- ("/a{2,3}/") refers to all nodes where string is a sequence of 2 to 3 characters "a"
- ([/a{2,3}/]) refers to all nodes where headword is a sequence of 2 to 3 characters "a"
- ([[/a{2,3}/]]) refers to all nodes where UW is a sequence of 2 to 3 characters "a"
- (/a{2,3}/) refers to all nodes having a feature that is a sequence of 2 to 3 characters "a"
- Nodes may contain alternative (disjunctive) features enclosed between {braces} and separated by a vertical bar (|)
- ({A|B}) refers to all nodes having the feature A OR B
- Node features may be expressed as simple attributes, or attribute-value pairs
- (MCL) - feature as an attribute: refers to all nodes having the feature MCL
- (GEN=MCL) - feature as an attribute-value pair, which is the same as (GEN,MCL): refers to all nodes having the features GEN and MCL.
- Attribute-value pairs may be used to create co-reference between different nodes (as in agreement)
- (%x,GEN)(%y,GEN=%x) - the value of the attribute GEN of the node %y is the same as the value of the attribute GEN of the node %x (see Index)
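The reference rules above can be illustrated with a small matcher. This is a sketch under stated assumptions rather than the behaviour of an actual UNL engine: nodes and patterns are plain dictionaries, the function names (`auto_index`, `value_matches`, `feature_matches`, `matches`, `find`) are hypothetical, a regular expression is assumed to have to match the whole element, and automatic indexes are assumed to be two-digit positions as in the %01, %02 example. Co-reference through indexes and the equivalence between (GEN=MCL) and (GEN,MCL) are not modelled.

```python
import re


def auto_index(nodes):
    """Give each node without an explicit index a position-based one (01, 02, ...)."""
    return [dict(n, index=n.get("index") or f"{i:02d}")
            for i, n in enumerate(nodes, start=1)]


def value_matches(pattern, value):
    """Literal comparison, or a whole-value regular expression when written /.../."""
    if value is None:
        return False
    if len(pattern) >= 2 and pattern.startswith("/") and pattern.endswith("/"):
        return re.fullmatch(pattern[1:-1], value) is not None
    return pattern == value


def feature_matches(pattern, features):
    """A feature pattern is a literal, a /regex/, or a disjunction {A|B}."""
    if pattern.startswith("{") and pattern.endswith("}"):
        return any(feature_matches(alt, features)
                   for alt in pattern[1:-1].split("|"))
    return any(value_matches(pattern, f) for f in features)


def matches(pattern, node):
    """True if the node has every element mentioned in the pattern."""
    for key in ("string", "headword", "uw"):
        if key in pattern and not value_matches(pattern[key], node.get(key)):
            return False
    # The index is compared literally: regular expressions do not apply to it.
    if "index" in pattern and pattern["index"] != node.get("index"):
        return False
    return all(feature_matches(p, node.get("features", ()))
               for p in pattern.get("features", ()))


sentence = auto_index([
    {"string": "aa", "headword": "a", "features": {"NOU", "MCL"}},
    {"string": "b", "headword": "b", "features": {"VER"}},
    {"string": "aaaa", "headword": "a", "features": {"NOU", "FEM"}},
])


def find(pattern):
    """Return the indexes of all nodes in the sentence that the pattern refers to."""
    return [n["index"] for n in sentence if matches(pattern, n)]


print(find({"headword": "a"}))           # ['01', '03']
print(find({"string": "/a{2,3}/"}))      # ['01']
print(find({"features": ["{MCL|VER}"]})) # ['01', '02']
print(find({"index": "03"}))             # ['03']
```

A pattern that names only a headword or a feature may select several nodes, whereas a pattern that names an index selects exactly one.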