UCI

From UNL Wiki
Latest revision as of 16:07, 20 September 2012

A Uniform Concept Identifier (UCI) is used to identify a concept. It is a URI (Uniform Resource Identifier) for UW's. In the UNL framework, a UCI is represented either as a UCL (Uniform Concept Locator) or as a UCN (Uniform Concept Name).

Structure

The UCI follows the generic syntax defined for URI's:

<scheme name> : <hierarchical part>

Where:

  • <scheme name> determines the syntax and semantics of the hierarchical part. In the UNL framework, there are two schemes:
    • ucl, which is used for uniform concept locators
    • ucn, which is used for uniform concept names
  • <hierarchical part> holds the identification information.

UCL (Uniform Concept Locator)

Uniform Concept Locators (UCL), as URL's, provide a method for finding the concept in the UNL Knowledge Base. They are represented as:

ucl://<AUTHORITY>/<ID>

Where:

  • ucl is the scheme name for uniform concept locators
  • <AUTHORITY> is the authority (knowledge base) responsible for the concept (unlkb.unlweb.net, by default)
  • <ID> is the index of the concept in the knowledge base

For instance, the concept "a piece of furniture having a smooth flat top that is usually supported by one or more vertical legs", which is lexicalized in English through the noun "table", may be located through:

ucl://unlkb.unlweb.net/104379964

This address is expected to return all the information concerning the concept, i.e., its definition in UNL, which may be used by languages in which the concept is not lexicalized.
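As an illustration, a UCL of the form above can be split into its authority and ID with a short Python sketch. The function name `parse_ucl` is hypothetical, and the pattern assumes that the authority contains no "/" and that the ID is a run of decimal digits; it only parses the string and does not contact any knowledge base.

```python
import re

# Follows the form ucl://<AUTHORITY>/<ID>.
# Assumptions: the authority contains no "/" and the ID is made of digits 0-9.
UCL_PATTERN = re.compile(r"ucl://(?P<authority>[^/]+)/(?P<id>[0-9]+)")


def parse_ucl(ucl):
    """Return (authority, concept_id) for a UCL string, or raise ValueError."""
    match = UCL_PATTERN.fullmatch(ucl)
    if match is None:
        raise ValueError(f"not a valid UCL: {ucl!r}")
    return match.group("authority"), int(match.group("id"))


authority, concept_id = parse_ucl("ucl://unlkb.unlweb.net/104379964")
print(authority, concept_id)  # unlkb.unlweb.net 104379964
```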

UCN (Uniform Concept Name)

Uniform Concept Names (UCN) use the ucn scheme and, as URN's, do not imply availability of the identified resource. They are represented as:

ucn:<LID>:<NSS>

Where:

  • ucn is the scheme name for uniform concept names
  • <LID> is the namespace identifier, which corresponds to the three-character ISO 639-2 code for languages
  • <NSS> is the namespace-specific string

For instance, the concept "a piece of furniture having a smooth flat top that is usually supported by one or more vertical legs", which is lexicalized in English through the noun "table", may be associated with several different names:

ucn:eng:table(icl>furniture)
ucn:fra:table(icl>mobilier)
ucn:spa:mesa(icl>mobiliario)
ucn:deu:Tisch(icl>Möbel)
ucn:rus:стол(icl>мебель)

UCN's must be unique. The namespace-specific string is normally split into two parts, a root and a suffix, as exemplified above. The root can be a word or a multi-word expression. The suffix, which is always introduced by a UNL relation, is used to disambiguate the root.
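The split into language identifier, root and suffix can be sketched in Python as follows. The names `UCN` and `parse_ucn` are hypothetical, and the pattern rests on two assumptions not stated above: that roots contain no parentheses and that relation labels consist of lowercase ASCII letters.

```python
import re
from typing import NamedTuple, Optional


class UCN(NamedTuple):
    lid: str                  # three-letter language code
    root: str                 # word or multi-word expression
    relation: Optional[str]   # relation introducing the suffix, e.g. "icl"
    direction: Optional[str]  # ">" or "<"
    argument: Optional[str]   # root inside the suffix


# Assumptions: roots contain no parentheses; relations are lowercase ASCII.
UCN_PATTERN = re.compile(
    r"ucn:(?P<lid>[a-z]{3}):(?P<root>[^()]+)"
    r"(?:\((?P<relation>[a-z]+)(?P<direction>[<>])(?P<argument>[^()]+)\))?"
)


def parse_ucn(ucn):
    """Split a UCN into its parts; the suffix fields are None when absent."""
    match = UCN_PATTERN.fullmatch(ucn)
    if match is None:
        raise ValueError(f"not a valid UCN: {ucn!r}")
    return UCN(**match.groupdict())


print(parse_ucn("ucn:deu:Tisch(icl>Möbel)"))
```

A name without a suffix, such as `ucn:eng:table`, parses with `relation`, `direction` and `argument` set to `None`.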

UCL or UCN?

UCL and UCN are both UCI's (i.e., uniform concept identifiers). They are both used to identify UW's. The difference is that a UCL is an address pointing to the position of the UW in the UNL Knowledge Base, whereas a UCN is only the name of the UW. The same address (i.e., UCL) may be associated with different UCN's, but a single UCN may not have more than one UCL. A UCL always describes an available UW, i.e., a UW that has already been defined in the UNL KB, whereas a UCN is not necessarily linked to an address. In that sense, UCL's are more "official" than UCN's, which are normally used to preserve the readability of the UNL code.

Simplified Notation

In the UNL Document Structure, UCI's are always abbreviated to the last part, because the scheme, the authority and the namespace may be inferred from the document header. For instance:

  • 104379964 instead of ucl://unlkb.unlweb.net/104379964
  • table(icl>furniture) instead of ucn:eng:table(icl>furniture)
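Going the other way, an abbreviated identifier can be expanded back to its full form once the language and authority are known. The sketch below is only illustrative: the function `expand_uci` and its parameters are hypothetical, the values that a real document header would supply are passed in by hand, and an all-digit string is assumed to be a UCL index while anything else is treated as a UCN.

```python
import re

# Default authority as given in the UCL section.
DEFAULT_AUTHORITY = "unlkb.unlweb.net"


def expand_uci(short, lid, authority=DEFAULT_AUTHORITY):
    """Expand an abbreviated UCI into its full ucl:// or ucn: form.

    `lid` and `authority` stand in for values that would normally be
    inferred from the document header.
    """
    if re.fullmatch(r"[0-9]+", short):
        # All digits: treated as the index of a concept in the knowledge base.
        return f"ucl://{authority}/{short}"
    # Anything else: treated as a namespace-specific string.
    return f"ucn:{lid}:{short}"


print(expand_uci("104379964", "eng"))
print(expand_uci("table(icl>furniture)", "eng"))
```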

Formal Syntax

<UCI>        ::= <UCL>|<UCN>
<UCL>        ::= "ucl://"<AUTHORITY>"/"<ID>
<AUTHORITY>  ::= <UTF-8 character>+
<ID>         ::= [0123456789]+
<UCN>        ::= "ucn:"<LID>":"<NSS>
<LID>        ::= [a-z]{3}
<NSS>        ::= <root>[<suffix>]
<root>       ::= <UTF-8 character>+
<suffix>     ::= "("<relation>{">","<"}<root>")"
<relation>   ::= {"agt","and","aoj",...}

where:
+ to be repeated 1 or more times
< > variable
" " terminal symbol
::= ... is defined as ...
| or
[ ] optional element
{ } alternative element
... to be repeated more than 0 times
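The grammar can be approximated with regular expressions to check whether a string is a UCL, a UCN, or neither. This is a rough sketch rather than a reference validator: it uses the ucn:<LID>:<NSS> form shown in the examples, treats the last component of a locator as <ID>, and adds assumptions the grammar does not state (no "/" in the authority, no parentheses in a root, relations written in lowercase ASCII letters instead of being checked against the full relation list). The name `classify_uci` is hypothetical.

```python
import re

RELATION = r"[a-z]+"   # assumption: relation labels are lowercase ASCII letters
ROOT = r"[^()]+"       # assumption: roots contain no parentheses
SUFFIX = rf"\({RELATION}[<>]{ROOT}\)"
UCL = r"ucl://[^/]+/[0-9]+"                    # assumption: no "/" in authority
UCN = rf"ucn:[a-z]{{3}}:{ROOT}(?:{SUFFIX})?"


def classify_uci(text):
    """Return "ucl", "ucn", or None if the text matches neither form."""
    if re.fullmatch(UCL, text):
        return "ucl"
    if re.fullmatch(UCN, text):
        return "ucn"
    return None


for candidate in ("ucl://unlkb.unlweb.net/104379964",
                  "ucn:rus:стол(icl>мебель)",
                  "ucn:english:table"):
    print(candidate, "->", classify_uci(candidate))
```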
