NL Memory

From UNLwiki

The NL Memory is a list of syntactic (subcategorization) frames linking natural-language words or terms that co-occur more often than would be expected by chance. These frames are used to represent collocations, i.e., partly or fully fixed expressions that become established through repeated, context-dependent use.

The NL Memory may be provided in two different formats:


Extended format

NL Memory entries in extended format must have the following structure:

<relation name="RNAME" frequency="RFREQ">
  <source id="SID" attribute="ATT" lang="<LID>">SOURCE</source>
  <target id="TID" attribute="ATT" lang="<LID>">TARGET</target>
</relation>

Where:
RNAME is the name of a syntactic relation ("NA", "NC", "NS", etc.);
RFREQ is the frequency of the relation RNAME between the SOURCE and the TARGET in the corpus;
SID is a number used to identify the SOURCE;
TID is a number used to identify the TARGET;
ATT is a set of attribute-value pairs that apply to the SOURCE or to the TARGET ("POS=NOU", "GEN=NEU", etc.);
SOURCE is the source node of the syntactic relation;
TARGET is the target node of the syntactic relation;
<LID> is the ISO 639-2 three-letter code for the language (e.g., "eng" for English).
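A minimal Python sketch of reading extended-format entries with the standard `xml.etree.ElementTree` module follows. It assumes the `relation` elements are wrapped in the `nlm` root element declared in the XML Schema; the sample entry (its ids, frequency and attribute values) is made up for illustration, and the `Node`, `Relation` and `parse_nl_memory` names are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    id: int                   # SID or TID
    text: str                 # SOURCE or TARGET word/term
    attribute: Optional[str]  # ATT, e.g. "POS=NOU" (optional)
    lang: Optional[str]       # ISO 639-2 code, e.g. "eng" (optional)


@dataclass
class Relation:
    name: str                 # RNAME, e.g. "NS"
    frequency: Optional[int]  # RFREQ (optional)
    source: Node
    target: Node


def _node(elem: ET.Element) -> Node:
    """Convert a <source> or <target> element into a Node."""
    return Node(
        id=int(elem.attrib["id"]),  # required attribute
        text=(elem.text or "").strip(),
        attribute=elem.get("attribute"),
        lang=elem.get("lang"),
    )


def parse_nl_memory(xml_text: str) -> List[Relation]:
    """Read every <relation> under the <nlm> root into a Relation record."""
    root = ET.fromstring(xml_text)
    relations = []
    for rel in root.iter("relation"):
        freq = rel.get("frequency")
        relations.append(Relation(
            name=rel.attrib["name"],
            frequency=int(freq) if freq is not None else None,
            source=_node(rel.find("source")),
            target=_node(rel.find("target")),
        ))
    return relations


# Illustrative data only (hypothetical ids and frequency).
SAMPLE = """<nlm>
  <relation name="NS" frequency="120">
    <source id="1" attribute="POS=NOU" lang="eng">United States</source>
    <target id="2" lang="eng">the</target>
  </relation>
</nlm>"""

for r in parse_nl_memory(SAMPLE):
    print(r.name, r.frequency, r.source.text, "->", r.target.text)
    # prints: NS 120 United States -> the
```

Keeping SOURCE and TARGET as separate records with their own optional `attribute` and `lang` mirrors the extended template, where each node carries its own identifier and features.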

XML Schema

<?xml version="1.0" encoding="utf-16"?>
<xsd:schema attributeFormDefault="unqualified" elementFormDefault="qualified" version="1.0" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 <xsd:element name="nlm">
   <xsd:complexType>
     <xsd:sequence>
       <xsd:element maxOccurs="unbounded" name="relation">
         <xsd:complexType>
           <xsd:sequence>
             <xsd:element name="source">
               <xsd:complexType>
                 <xsd:simpleContent>
                   <xsd:extension base="xsd:string">
                     <xsd:attribute name="id" type="xsd:unsignedLong" use="required" />
                     <xsd:attribute name="attribute" type="xsd:string" use="optional" />
                     <xsd:attribute name="lang" type="xsd:string" use="optional" />
                   </xsd:extension>
                 </xsd:simpleContent>
               </xsd:complexType>
             </xsd:element>
             <xsd:element name="target">
               <xsd:complexType>
                 <xsd:simpleContent>
                   <xsd:extension base="xsd:string">
                     <xsd:attribute name="id" type="xsd:unsignedLong" use="required" />
                     <xsd:attribute name="attribute" type="xsd:string" use="optional" />
                     <xsd:attribute name="lang" type="xsd:string" use="optional" />
                   </xsd:extension>
                 </xsd:simpleContent>
               </xsd:complexType>
             </xsd:element>
           </xsd:sequence>
           <xsd:attribute name="name" type="xsd:string" use="required"/>
           <xsd:attribute name="frequency" type="xsd:int" use="optional"/>
         </xsd:complexType>
       </xsd:element>
     </xsd:sequence>
   </xsd:complexType>
 </xsd:element>
</xsd:schema>
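Python's standard library has no XSD validator, so a schema-aware tool (for example, `lxml.etree.XMLSchema` from the third-party `lxml` package) is the proper way to validate an NL Memory file. As a rough, dependency-free sketch, the hypothetical `check_nl_memory` function below re-implements by hand only the main constraints of the schema above: an `nlm` root holding at least one `relation`, no undeclared attributes, a required `name`, an optional `frequency` within the `xsd:int` range, exactly one `source` followed by one `target`, and a required `id` within the `xsd:unsignedLong` range. It does not inspect text content and is not a substitute for real validation.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

INT_RE = re.compile(r"^[+-]?[0-9]+$")     # plain decimal integer
RELATION_ATTRS = {"name", "frequency"}    # attributes declared on <relation>
NODE_ATTRS = {"id", "attribute", "lang"}  # attributes declared on <source>/<target>


def _in_range(raw: str, lo: int, hi: int) -> bool:
    """True if raw is a decimal integer within [lo, hi]."""
    s = raw.strip()
    return bool(INT_RE.match(s)) and lo <= int(s) <= hi


def _check_node(elem: ET.Element, tag: str, where: str, errors: List[str]) -> None:
    """Check one <source> or <target> element, appending any problems."""
    if elem.tag != tag:
        errors.append(f"{where}: expected <{tag}>, found <{elem.tag}>")
        return
    for extra in sorted(set(elem.attrib) - NODE_ATTRS):
        errors.append(f"{where}/{tag}: undeclared attribute '{extra}'")
    raw_id = elem.get("id")
    if raw_id is None:
        errors.append(f"{where}/{tag}: missing required 'id'")
    elif not _in_range(raw_id, 0, 2**64 - 1):  # xsd:unsignedLong
        errors.append(f"{where}/{tag}: 'id' is not an xsd:unsignedLong")


def check_nl_memory(xml_text: str) -> List[str]:
    """Return a list of problems; an empty list means none of these checks failed."""
    root = ET.fromstring(xml_text)
    if root.tag != "nlm":
        return [f"root element must be <nlm>, found <{root.tag}>"]
    errors: List[str] = []
    relations = list(root)
    if not relations:
        errors.append("<nlm> must contain at least one <relation>")
    for i, rel in enumerate(relations, start=1):
        where = f"relation #{i}"
        if rel.tag != "relation":
            errors.append(f"{where}: unexpected element <{rel.tag}>")
            continue
        for extra in sorted(set(rel.attrib) - RELATION_ATTRS):
            errors.append(f"{where}: undeclared attribute '{extra}'")
        if "name" not in rel.attrib:
            errors.append(f"{where}: missing required 'name'")
        freq = rel.get("frequency")
        if freq is not None and not _in_range(freq, -2**31, 2**31 - 1):  # xsd:int
            errors.append(f"{where}: 'frequency' is not an xsd:int")
        children = list(rel)
        if len(children) != 2:
            errors.append(f"{where}: expected exactly one <source> followed by one <target>")
            continue
        _check_node(children[0], "source", where, errors)
        _check_node(children[1], "target", where, errors)
    return errors


GOOD = """<nlm><relation name="NS" frequency="120">
  <source id="1" lang="eng">United States</source><target id="2" lang="eng">the</target>
</relation></nlm>"""
print(check_nl_memory(GOOD))  # prints: []
```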

Simplified format

NL Memory entries in simplified format must have the structure of network disambiguation rules, as follows:

RELATION(SOURCE;TARGET)=DC;

Where:
RELATION is the name of a syntactic relation ("NA", "NC", "NS", etc.);
SOURCE is the source node of the syntactic relation, together with its attributes, if necessary;
TARGET is the target node of the syntactic relation, together with its attributes, if necessary;
DC is the degree of certainty (i.e., the likelihood of the relation between the SOURCE and the TARGET), ranging from 0 (impossible) to 255 (necessary).
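The rule syntax above can be split into its four parts with a short Python sketch. It assumes that a rule fits on one line and that the first `;` outside square brackets and double quotes separates SOURCE from TARGET; the `Rule` and `parse_rule` names, and the degree of certainty used in the usage line, are illustrative only.

```python
import re
from typing import NamedTuple, Tuple

# RELATION(SOURCE;TARGET)=DC; the greedy (.*) stops at the last ')' before '='
RULE_RE = re.compile(r"^\s*([A-Za-z][A-Za-z0-9_]*)\((.*)\)\s*=\s*([0-9]+)\s*;\s*$")


class Rule(NamedTuple):
    relation: str  # e.g. "NS"
    source: str    # SOURCE node, as written
    target: str    # TARGET node, as written
    dc: int        # degree of certainty: 0 (impossible) .. 255 (necessary)


def _split_top_level(args: str) -> Tuple[str, str]:
    """Split at the first ';' that is not inside [...] or "..."."""
    depth, in_quotes = 0, False
    for i, ch in enumerate(args):
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes and ch == "[":
            depth += 1
        elif not in_quotes and ch == "]":
            depth -= 1
        elif ch == ";" and depth == 0 and not in_quotes:
            return args[:i].strip(), args[i + 1:].strip()
    raise ValueError(f"no top-level ';' between SOURCE and TARGET in {args!r}")


def parse_rule(line: str) -> Rule:
    """Parse one simplified-format rule into a Rule tuple."""
    m = RULE_RE.match(line)
    if m is None:
        raise ValueError(f"not of the form RELATION(SOURCE;TARGET)=DC; : {line!r}")
    relation, args, dc_text = m.groups()
    dc = int(dc_text)
    if not 0 <= dc <= 255:
        raise ValueError(f"degree of certainty {dc} is outside 0..255")
    source, target = _split_top_level(args)
    return Rule(relation, source, target, dc)


print(parse_rule("NS([United States];[the])=255;"))
# prints: Rule(relation='NS', source='[United States]', target='[the]', dc=255)
```

Tracking brackets and quotes, rather than splitting on every `;`, keeps a string constant such as `"a;b"` intact.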
The SOURCE and TARGET nodes may be specified as:

  • constants (i.e., specific natural-language words), written between square brackets if they are lemmas or between double quotes if they are strings: [United States] and "United States";
  • a feature (an attribute, a value, or an attribute-value pair) or a set of features shared by a group of natural-language words: LEX=NOU, GEN=MCL, etc.
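A node reference can be classified along the lines of the list above. This sketch assumes, as a hypothetical convention not stated here, that several items in one node are separated by commas outside brackets and quotes; the `classify_node` name is likewise illustrative.

```python
from typing import List, Tuple


def classify_node(node: str) -> List[Tuple[str, str]]:
    """Split a node at top-level commas (assumed separator) and label each
    item as a lemma ([...]), a string ("..."), or a feature (anything else)."""
    items, buf, depth, in_quotes = [], [], 0, False
    for ch in node:
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes and ch == "[":
            depth += 1
        elif not in_quotes and ch == "]":
            depth -= 1
        if ch == "," and depth == 0 and not in_quotes:
            items.append("".join(buf).strip())  # close the current item
            buf = []
        else:
            buf.append(ch)
    items.append("".join(buf).strip())

    result = []
    for item in items:
        if len(item) >= 2 and item[0] == "[" and item[-1] == "]":
            result.append(("lemma", item[1:-1]))
        elif len(item) >= 2 and item[0] == '"' and item[-1] == '"':
            result.append(("string", item[1:-1]))
        elif item:
            result.append(("feature", item))  # attribute, value, or attribute=value
    return result


print(classify_node("[United States]"))  # prints: [('lemma', 'United States')]
print(classify_node("LEX=NOU,GEN=MCL"))  # prints: [('feature', 'LEX=NOU'), ('feature', 'GEN=MCL')]
```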

Examples

NS([United States];[the])=255; (The lemma [United States] requires the specifier [the].)