Questions/Remarks about the Tags : LEX vs. POS

kobac · Guest

I used the export_tagset.php document (http://www.unlweb.net/unlarium/dictionary/export_tagset.php), the Tagset wiki page (http://www.unlweb.net/wiki/index.php/Tagset), and the wiki pages of some tags.

I had to change the 1PP, 1PS, ... tags to P1P, P1S, ... because an XML ID may not begin with a digit.
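The renaming can be sketched in a few lines of Python. The function name `to_xml_id` is hypothetical. Only the two mappings 1PP to P1P and 1PS to P1S come from this post; applying the same swap of the first two characters to every other tag that starts with a digit is an assumption.

```python
def to_xml_id(tag: str) -> str:
    """Return a tag name that can serve as an XML ID.

    An XML ID may not begin with a digit, so a leading digit is swapped
    with the character that follows it (1PP -> P1P, 1PS -> P1S).
    Assumption: the same swap is acceptable for all digit-initial tags.
    """
    if len(tag) >= 2 and tag[0] in "0123456789":
        return tag[1] + tag[0] + tag[2:]
    return tag  # tags such as PPR are already valid and stay unchanged


renamed = [to_xml_id(t) for t in ["1PP", "1PS", "PPR"]]
```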

What is the difference between LEX and POS? POS seems to be more complete. There are inconsistencies in the POS subtags between the Tagset wiki page and the Part_of_speech wiki page (namely in the names of the subtags, the adjectives, the common noun, the punctuation and the verbal). I did not use LEX at all, and I used the Part_of_speech wiki page for the POS subtags.

Regards,
Maxime Lefrançois

martins

I couldn't understand why you had to define the values of the attribute person as XML IDs. Normally, we would define "person" as an XML attribute and "1PS", "1PP", etc. as its values:

<word lex="R" pos="PPR" person="1PP">we</word>

In this case, there would be no problem in using "1PP", "1PS", and so on.
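As a quick check of this point, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. It parses the element above and reads the attribute back. Ordinary attribute values are plain strings, so the naming rule for XML IDs only applies if the attribute is declared with type ID in a DTD or schema.

```python
import xml.etree.ElementTree as ET

# The <word> element from the post, with "person" as an ordinary attribute.
word = ET.fromstring('<word lex="R" pos="PPR" person="1PP">we</word>')

# The value is returned as a plain string; a leading digit is not a problem.
person = word.get("person")
surface = word.text
```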

LEX means lexical category, while POS means part of speech. Although closely related, these attributes are not the same. The attribute LEX follows the general idea that the words of a language may be classified either as lexical heads or as functional heads, and that the lexical categories are actually derived from the combination of two primitive categorial features (N, for nominal, and V, for verbal, as described in Chomsky, N. (1981), Lectures on Government and Binding), as follows:

N (nominal) = +N,-V
V (verbal) = -N,+V
J (adjective) = +N,+V
A (adpositions) = -N,-V
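The four combinations above can be written as a small lookup table. This is only an illustrative sketch: the names `LEX_FEATURES` and `lex_category` are hypothetical, and the category letters are the ones used in this post.

```python
# Each lexical category is a pair of binary features: (nominal, verbal).
LEX_FEATURES = {
    "N": (True, False),   # nominal:     +N, -V
    "V": (False, True),   # verbal:      -N, +V
    "J": (True, True),    # adjective:   +N, +V
    "A": (False, False),  # adpositions: -N, -V
}


def lex_category(nominal: bool, verbal: bool) -> str:
    """Return the category letter for a given feature combination."""
    for category, features in LEX_FEATURES.items():
        if features == (nominal, verbal):
            return category
    raise ValueError("unknown feature combination")
```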

In addition to these open-class lexical heads, there are also closed-class functional heads (such as D, for determiner; I, for inflection; and C, for complementizer), which are used to describe aspects of the grammatical structure of a given language. In the UNLarium framework, we have also added some semi-lexical categories (such as pronouns and numerals) and we have differentiated between prepositions (P) and adverbs (A), which are normally understood as belonging to the same lexical category (A). This was necessary because we plan to derive POS (which has mainly to do with morphological and syntactic behavior) from LEX (which is associated rather with semantics). For the time being, however, LEX is normally used to classify UWs, whereas POS is used to classify natural language words.

---------------------------------------------
Ronaldo MARTINS
Language Resources Manager
UNDL Foundation
48, route de Chancy
CH-1213 - Geneva - Switzerland
+41 22 879 8090
http://www.undlfoundation.org
---------------------------------------------

kobac · Guest

Thank you for your answer.

Just a short reply about the use of XML IDs to deal with the values:
The model I am designing is based on the Resource Description Framework (RDF). The atom of an RDF document is a subject-predicate-object triple, where subjects, predicates and objects are "Resources" that may themselves be described. Combined with RDF Schema and the Web Ontology Language (OWL), this modeling strategy makes it possible to define a knowledge domain, to constrain "positively" its use, and to reason over it. Existing reasoners can thus be used to: a) infer new knowledge; b) validate the knowledge base.
Existing query and update languages (SPARQL Query, SPARQL Update) can also be used as high-level graph query and update languages (I use them to define the grammar rules).

Although RDF documents can be written in XML, this modeling language has no attribute-value pairs.
In OWL, resources are divided into Classes, Instances that populate these classes, ObjectProperties that can link these instances, and DatatypeProperties that can link an instance to a literal value.
Each of these resources can be given an XML ID. The base URL of the document concatenated with the XML IDs forms Uniform Resource Identifiers (URIs), which one can use to assert axioms in the ontology (axioms form the terminology box, or TBox), to assert facts in the ontology (facts form the assertion box, or ABox), to exchange knowledge, and to reason.

Concepts and unary relations (tags, PER, 1PP, NLWs, a NLW, UWs, a UW, UNL attributes, ...) are represented by "Classes";
Every instance of the classes (a NLW in a sentence, a UW in a graph) is represented by an "Instance" of one or more classes;
Every binary relation (aoj, obj, mod, ...) is an "ObjectProperty";
Labels, comments and number values are "DatatypeProperties".
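To make this mapping concrete, here is a minimal sketch in plain Python, without any RDF library, that stores triples as tuples. The base URL `http://example.org/unl`, the identifiers `w1`, `n1`, `n2` and the datatype property `label` are hypothetical and only serve the illustration. The `rdf:type` and OWL URIs are the standard ones, and the `#` separator is the one RDF/XML uses when it resolves an ID against the base URL.

```python
BASE = "http://example.org/unl"  # hypothetical base URL of the document
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL = "http://www.w3.org/2002/07/owl#"


def uri(xml_id: str) -> str:
    """Build a URI from the base URL and an XML ID."""
    return f"{BASE}#{xml_id}"


# Each triple is (subject, predicate, object).
triples = {
    (uri("P1P"), RDF_TYPE, OWL + "Class"),               # a tag is a Class
    (uri("aoj"), RDF_TYPE, OWL + "ObjectProperty"),      # a binary relation
    (uri("label"), RDF_TYPE, OWL + "DatatypeProperty"),  # links to a literal
    (uri("w1"), RDF_TYPE, uri("P1P")),                   # an instance of P1P
    (uri("n2"), uri("aoj"), uri("n1")),                  # aoj links two instances
    (uri("w1"), uri("label"), "we"),                     # object is a literal string
}


def instances_of(class_id: str) -> list[str]:
    """Return the URIs of all resources typed with the given class."""
    return sorted(s for s, p, o in triples if p == RDF_TYPE and o == uri(class_id))
```

Since `P1P` is itself a resource with its own URI, it needs an identifier that is valid as an XML ID, which is why the tag could not keep the name 1PP in this model.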

Regards,
Maxime Lefrançois