Segmentation

From UNL Wiki

Latest revision as of 01:43, 28 July 2012

Segmentation is the process of splitting the input into processing units. In UNLization with IAN, the natural language input document is split into sentences; in UNLization with SEAN, the natural language input is split into texts; in NLization with EUGENE, the UNL input is split into graphs.

IAN

In IAN, segmentation is done using a set of predefined* sentence boundaries:

  • punctuation signs: ".", ";", "!", "?", "..."
  • special characters: end-of-line, end-of-paragraph

* This process is expected to be replaced by a user-defined system in the coming releases of IAN.
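
The boundary rules above can be illustrated with a minimal Python sketch. It is not IAN's implementation: the function name split_sentences is hypothetical, and keeping the punctuation sign attached to the sentence it closes is an assumption, since the page does not say how IAN treats the boundary character itself. End-of-line and end-of-paragraph are both modelled as runs of newline characters.

```python
import re

# Boundaries taken from the list above: ".", ";", "!", "?", "..."
# The three-dot form is tried first so it is not split into three periods.
# Runs of newlines stand in for end-of-line and end-of-paragraph (assumption).
BOUNDARY = re.compile(r'(\.\.\.|[.;!?])|\n+')

def split_sentences(text):
    """Split text into sentences at the predefined boundaries."""
    sentences = []
    start = 0
    for match in BOUNDARY.finditer(text):
        # Keep a punctuation sign with its sentence; drop newline boundaries.
        end = match.end() if match.group(1) else match.start()
        chunk = text[start:end].strip()
        if chunk:
            sentences.append(chunk)
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences

if __name__ == "__main__":
    for sentence in split_sentences("Hello world. How are you? Fine; thanks...\nNext line"):
        print(sentence)
```

Because the boundary set is fixed, abbreviations such as "e.g." would also be split in this sketch, which is one reason a user-defined system may be preferable.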

EUGENE

In EUGENE, segmentation is done using the UNL document tags.

  • The tag [S] defines the beginning of a sentence, and the tag [/S] defines the end of a sentence
  • The tag {org} defines the beginning of the source sentence, and the tag {/org} defines the end of the source sentence
  • The tag {unl} defines the beginning of the UNL graph, and the tag {/unl} defines the end of the UNL graph
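
As a rough illustration of tag-based segmentation, the sketch below extracts one unit per [S] ... [/S] block, together with its source sentence and UNL graph. It is not EUGENE's code: the function name split_graphs and the returned dictionary layout are hypothetical, the tags are matched exactly as written in the list above, and the sample graph text is a placeholder rather than a real UNL graph.

```python
import re

# Tag pairs exactly as listed above; DOTALL lets the content span several lines.
SENTENCE = re.compile(r'\[S\](.*?)\[/S\]', re.DOTALL)
ORG = re.compile(r'\{org\}(.*?)\{/org\}', re.DOTALL)
UNL = re.compile(r'\{unl\}(.*?)\{/unl\}', re.DOTALL)

def split_graphs(document):
    """Return one dict per [S] block with its source sentence and UNL graph."""
    units = []
    for sentence in SENTENCE.finditer(document):
        body = sentence.group(1)
        org = ORG.search(body)
        unl = UNL.search(body)
        units.append({
            # A missing {org} or {unl} section is reported as None.
            "org": org.group(1).strip() if org else None,
            "unl": unl.group(1).strip() if unl else None,
        })
    return units

if __name__ == "__main__":
    sample = (
        "[S]\n{org}\nThe dog runs.\n{/org}\n"
        "{unl}\nplaceholder graph text\n{/unl}\n[/S]\n"
    )
    for unit in split_graphs(sample):
        print(unit["org"], "->", unit["unl"])
```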