UNL document

From UNL Wiki

Revision as of 20:04, 6 December 2010

UNL documents are documents written in UNL. They are plain text files that include UNL Sentences and some special tags. They are the output of the enconversion process and the input of the deconversion process.

Syntax

A UNL document is enclosed in the tags “[D:<id>]” and “[/D]”. Within these tags, each paragraph is enclosed in a pair of tags “[P:<id>]” and “[/P]”, and each sentence in a pair of tags “[S:<id>]” and “[/S]”. Inside a sentence, the text of the original sentence is enclosed in “{org:<lang>}” and “{/org}”, and its UNL expression in “{unl:<id>}” and “{/unl}”. Sentences in target languages can also be stored in the UNL document: each target sentence is enclosed in a pair of language tags “{<lang>}” and “{/<lang>}” and follows the UNL expression of the sentence it renders.
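The nesting described above can be sketched as a small Python serializer. This is only an illustration: the function name `render_unl_document`, the dictionary keys, the choice to put every tag on its own line, and the sample UNL expression are assumptions made for the example, not part of the UNL specification.

```python
def render_unl_document(doc_id, paragraphs):
    """Serialize nested data into the tag layout described above.

    `paragraphs` is a list of (paragraph_id, sentences) pairs; each sentence
    is a dict with the hypothetical keys "id", "lang", "org", "unl" and an
    optional "targets" dict mapping a language tag to a target sentence.
    """
    lines = [f"[D:{doc_id}]"]
    for paragraph_id, sentences in paragraphs:
        lines.append(f"[P:{paragraph_id}]")
        for sentence in sentences:
            lines.append(f"[S:{sentence['id']}]")
            # Original (source) sentence, tagged with its language.
            lines.append(f"{{org:{sentence['lang']}}}")
            lines.append(sentence["org"])
            lines.append("{/org}")
            # UNL expression; the optional information after "unl" is omitted.
            lines.append("{unl}")
            lines.append(sentence["unl"])
            lines.append("{/unl}")
            # Target sentences follow the UNL expression.
            for lang, text in sentence.get("targets", {}).items():
                lines.append(f"{{{lang}}}")
                lines.append(text)
                lines.append(f"{{/{lang}}}")
            lines.append("[/S]")
        lines.append("[/P]")
    lines.append("[/D]")
    return "\n".join(lines)


example = render_unl_document(
    "doc1",
    [("1", [{
        "id": "1",
        "lang": "en",
        "org": "The dog barks.",
        "unl": "agt(bark.@entry, dog)",  # illustrative expression only
        "targets": {"fr": "Le chien aboie."},
    }])],
)
print(example)
```

Closing tags are emitted in the reverse order of the opening ones, so the output always keeps the document, paragraph and sentence levels properly nested.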

Tags used in UNL Documents

Tag                    Description
[D:<id>]               indicates the beginning of a document and gives the necessary information about the document
[/D]                   indicates the end of a document
[P:<id>]               indicates the beginning of a paragraph
[/P]                   indicates the end of a paragraph
[S:<id>]               indicates the beginning of a sentence and gives the sentence number
[/S]                   indicates the end of a sentence
{org:<lang>=<code>}    indicates the beginning of an original (source) sentence and gives its language and character code; “=<code>” can be omitted
{/org}                 indicates the end of an original sentence
{unl:<id>}             indicates the beginning of the UNL expressions of a sentence and gives the necessary information; “:<id>” can be omitted
{/unl}                 indicates the end of the UNL expressions of a sentence
{<lang>}               indicates the beginning of a target sentence in the language indicated by <lang>
{/<lang>}              indicates the end of a target sentence in the language indicated by <lang>
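As a rough illustration of how the tags in this table might be read back, the following Python sketch parses a document into nested dictionaries. It assumes that every tag sits on a line of its own and that the input is well formed; the function name `parse_unl_document`, the output layout and the sample text are all hypothetical.

```python
import re

# Structural tags: [D:...], [P:...], [S:...] and their closing forms.
BLOCK_OPEN = re.compile(r"^\[([DPS])(?::(.*))?\]$")
BLOCK_CLOSE = re.compile(r"^\[/([DPS])\]$")
# Sentence-level tags: {org:...}, {unl:...}, {<lang>} and their closing forms.
FIELD_OPEN = re.compile(r"^\{([^/:{}]+)(?::([^{}]*))?\}$")
FIELD_CLOSE = re.compile(r"^\{/([^{}]+)\}$")


def parse_unl_document(text):
    """Parse a well-formed UNL document into nested dicts (no validation)."""
    doc = paragraph = sentence = None
    field = None  # [name, info, collected lines] while inside {...} ... {/...}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if field is not None:
            closing = FIELD_CLOSE.match(line)
            if closing and closing.group(1) == field[0]:
                sentence["fields"][field[0]] = {
                    "info": field[1],
                    "text": "\n".join(field[2]),
                }
                field = None
            else:
                field[2].append(line)
            continue
        opening = BLOCK_OPEN.match(line)
        if opening:
            kind, ident = opening.groups()
            if kind == "D":
                doc = {"id": ident, "paragraphs": []}
            elif kind == "P":
                paragraph = {"id": ident, "sentences": []}
                doc["paragraphs"].append(paragraph)
            else:
                sentence = {"id": ident, "fields": {}}
                paragraph["sentences"].append(sentence)
            continue
        if BLOCK_CLOSE.match(line):
            continue
        opening = FIELD_OPEN.match(line)
        if opening and sentence is not None:
            field = [opening.group(1), opening.group(2), []]
    return doc


SAMPLE = """\
[D:doc1]
[P:1]
[S:1]
{org:en}
The dog barks.
{/org}
{unl}
agt(bark.@entry, dog)
{/unl}
{fr}
Le chien aboie.
{/fr}
[/S]
[/P]
[/D]
"""

parsed = parse_unl_document(SAMPLE)
print(parsed["paragraphs"][0]["sentences"][0]["fields"]["org"])
```

Lines inside a `{...}` block are collected verbatim until the matching closing tag, so a UNL expression spanning several lines stays together as one field.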

Semantics

For the time being, a UNL document is simply a collection of UNL sentences. It can, however, also be treated as a hypergraph in its own right, made up of several subhypergraphs (the UNL sentences) interrelated by a special relation "nxt" (for "next"), which indicates sequential order. In the XUNL Project, we have proposed other strategies for representing cross-sentential relations, but these are still under discussion.
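The document-level view can be sketched in a few lines of Python: each sentence is treated as a single node, and consecutive sentences are linked by "nxt". The function name `link_sentences`, the tuple layout and the argument order (earlier sentence first) are assumptions made for illustration; the text above only states that "nxt" indicates sequential order.

```python
def link_sentences(sentence_ids):
    """Return one ("nxt", earlier, later) edge per pair of consecutive sentences."""
    return [
        ("nxt", earlier, later)
        for earlier, later in zip(sentence_ids, sentence_ids[1:])
    ]


edges = link_sentences(["S1", "S2", "S3"])
for relation, earlier, later in edges:
    print(f"{relation}({earlier}, {later})")
```

A document with n sentences yields n - 1 such edges, so a single-sentence document has none.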

Software