English grammar

From UNL Wiki
== Conjunctions ==
+
The English grammars follow, in general, the [[X-bar]] approach, with some adaptations. They are used for transforming English sentences into UNL ([[UNLization]]) and for generating English sentences out of UNL graphs ([[NLization]]). They follow the syntax defined at the [[UNL Grammar Specs]] and the tags described at the [[Tagset]].
<nowiki>*</nowiki> indicates optional representation
+
  
{|border="1" cellpadding="5"
+
== Files ==
!Conjunction
+
 
!Attribute
+
{|border=1 cellpadding=5 align=center
!English
+
|+UNLization
!UNL
+
!Corpus
 +
!Dictionary<ref>Two dictionaries are necessary for each language: the language-specific dictionary, and the [[Default Dictionary]], which contains language-independent entries, such as punctuation signs and regular expressions. The default dictionary must be loaded after the language-specific dictionary.</ref>
 +
!T-Grammar<ref>Three t-grammars are necessary for each language: the [[Standardization grammar]], the language-specific grammar, and the [[Default grammar]]. The standardization grammar and the default grammar are language-independent. The grammars must be loaded in this order: 1) standardization, 2) language-specific, and 3) default.</ref>
 +
!D-Grammar
 +
!Output
 +
!F-Measure
 
|-
 
|-
|after||@after||The books will be sent to the library after I have read them.||tim(send, read.@after)
+
|[http://www.unlweb.net/resources/corpus/UCA1/UCA1_eng.txt UC-A1 in English]
 +
|[http://www.unlweb.net/resources/dic/UCA1/eng_unl_dic.txt ENG-UNL Dictionary]<br />[http://www.unlweb.net/resources/dic/default_dic.txt Default Dictionary]
 +
|[http://www.unlweb.net/resources/grammar/s-grammar.txt Standardization Grammar]<br />[http://www.unlweb.net/resources/grammar/UCA1/eng_unl_tgrammar.txt ENG-UNL T-Grammar]<br />[http://www.unlweb.net/resources/grammar/nl_unl_tgrammar.txt Default T-Grammar]
 +
|[http://www.unlweb.net/resources/grammar/UCA1/eng_unl_dgrammar.txt ENG-UNL D-Grammar]
 +
|[http://www.unlweb.net/resources/output/UCA1/eng_unl.txt ENG>UNL]
 +
|1.000
 
|-
 
|-
|although||@although||Although they have arrived early they could not enter.||seq(enter, arrive.@although)
+
|[http://www.unlweb.net/resources/corpus/UCA2/UCA2_eng.txt UC-A2 in English]
 +
|[http://www.unlweb.net/resources/dic/UCA2/eng_unl_dic.txt ENG-UNL Dictionary]<br />[http://www.unlweb.net/resources/dic/default_dic.txt Default Dictionary]
 +
|[http://www.unlweb.net/resources/grammar/s-grammar.txt Standardization Grammar]<br />[http://www.unlweb.net/resources/grammar/UCA2/eng_unl_tgrammar.txt ENG-UNL T-Grammar]<br />[http://www.unlweb.net/resources/grammar/nl_unl_tgrammar.txt Default T-Grammar]
 +
|[http://www.unlweb.net/resources/grammar/UCA2/eng_unl_dgrammar.txt ENG-UNL D-Grammar]
 +
|[http://www.unlweb.net/resources/output/UCA2/eng_unl.txt ENG>UNL]
 +
|1.000
 
|-
 
|-
|and||@and*||He sold an apartment and bought a country-house.||and(buy, sell.@and) or and(buy, sell)  
+
|[http://www.unlweb.net/resources/corpus/UCB1/UCB1_eng.txt UC-B1 in English]
 +
|[http://www.unlweb.net/resources/dic/UCB1/eng_unl_dic.txt ENG-UNL Dictionary]<br />[http://www.unlweb.net/resources/dic/default_dic.txt Default Dictionary]
 +
|[http://www.unlweb.net/resources/grammar/s-grammar.txt Standardization Grammar]<br />[http://www.unlweb.net/resources/grammar/UCB1/eng_unl_tgrammar.txt ENG-UNL T-Grammar]<br />[http://www.unlweb.net/resources/grammar/nl_unl_tgrammar.txt Default T-Grammar]
 +
|[http://www.unlweb.net/resources/grammar/UCB1/eng_unl_dgrammar.txt ENG-UNL D-Grammar]
 +
|[http://www.unlweb.net/resources/output/UCB1/eng_unl.txt ENG>UNL]
 +
|1.000
 +
|}
 +
<br />
 +
{|border=1 cellpadding=5 align=center
 +
|+NLization
 +
!Corpus
 +
!Dictionary<ref>Two dictionaries are necessary for each language: the language-specific dictionary, and the [[Default Dictionary]], which contains language-independent entries, such as punctuation signs and regular expressions. The default dictionary must be loaded after the language-specific dictionary.</ref>
 +
!T-Grammar<ref>Three t-grammars are necessary for each language: the [[Standardization grammar]], the language-specific grammar, and the [[Default grammar]]. The standardization grammar and the default grammar are language-independent. The grammars must be loaded in this order: 1) standardization, 2) language-specific, and 3) default.</ref>
 +
!D-Grammar
 +
!Output
 +
!F-Measure
 
|-
 
|-
|as||@as*||The situation is not so bad as you suggest.||man(bad, suggest.@as) or man(bad, suggest)
+
|[http://www.unlweb.net/resources/corpus/UCA1/UCA1_unl.txt UC-A1 in UNL]
 +
|[http://www.unlweb.net/resources/dic/UCA1/unl_eng_dic.txt UNL-ENG Dictionary]<br />[http://www.unlweb.net/resources/dic/default_dic.txt Default Dictionary]
 +
|[http://www.unlweb.net/resources/grammar/s-grammar.txt Standardization Grammar]<br />[http://www.unlweb.net/resources/grammar/UCA1/unl_eng_tgrammar.txt UNL-ENG T-Grammar]<br />[http://www.unlweb.net/resources/grammar/unl_nl_tgrammar.txt Default T-Grammar]
 +
|[http://www.unlweb.net/resources/grammar/UCA1/unl_eng_dgrammar.txt UNL-ENG D-Grammar]
 +
|[http://www.unlweb.net/resources/output/UCA1/unl_eng.txt UNL>ENG]
 +
|
 
|-
 
|-
|as||@because*||It`s empty, as it had been upside down||rsn(empty, upside.@because) or  rsn(empty, upside)
+
|[http://www.unlweb.net/resources/corpus/UCA2/UCA2_unl.txt UC-A2 in UNL]
 +
|[http://www.unlweb.net/resources/dic/UCA2/unl_eng_dic.txt UNL-ENG Dictionary]<br />[http://www.unlweb.net/resources/dic/default_dic.txt Default Dictionary]
 +
|[http://www.unlweb.net/resources/grammar/s-grammar.txt Standardization Grammar]<br />[http://www.unlweb.net/resources/grammar/UCA2/unl_eng_tgrammar.txt UNL-ENG T-Grammar]<br />[http://www.unlweb.net/resources/grammar/unl_nl_tgrammar.txt Default T-Grammar]
 +
|[http://www.unlweb.net/resources/grammar/UCA2/unl_eng_dgrammar.txt UNL-ENG D-Grammar]
 +
|[http://www.unlweb.net/resources/output/UCA2/unl_eng.txt UNL>ENG]
 +
|
 
|-
 
|-
|as||@when*||His hands trembled as he spoke.||tim(tremble, speak.@when) or tim(tremble, speak)
+
|[http://www.unlweb.net/resources/corpus/UCB1/UCB1_unl.txt UC-B1 in UNL]
|-
+
|[http://www.unlweb.net/resources/dic/UCB1/unl_eng_dic.txt UNL-ENG Dictionary]<br />[http://www.unlweb.net/resources/dic/default_dic.txt Default Dictionary]
|as if||@as.@if||He gives orders as if he were the master of the house.||man(give, master.@as.@if)
+
|[http://www.unlweb.net/resources/grammar/s-grammar.txt Standardization Grammar]<br />[http://www.unlweb.net/resources/grammar/UCB1/unl_eng_tgrammar.txt UNL-ENG T-Grammar]<br />[http://www.unlweb.net/resources/grammar/unl_nl_tgrammar.txt Default T-Grammar]
|-
+
|[http://www.unlweb.net/resources/grammar/UCB1/unl_eng_dgrammar.txt UNL-ENG D-Grammar]
|as though||@as.@if||He treated me as though I were a stranger.|| man(treat, stranger.@as.@if)
+
|[http://www.unlweb.net/resources/output/UCB1/unl_eng.txt UNL>ENG]
|-
+
|
|as well as||@and*||Robert, as well as Smith, deserves punishment.||and(Smith.@and, Robert) or and(Smith, Robert)
+
|-
+
|because||@because*||He said he cannot go because he is very busy.||rsn(go, busy.@because) or rsn(go, busy)
+
|-
+
|before||@before||Look before you leap.||tim(look, leap.@before)
+
|-
+
|both... and||@and*||He both speaks and writes perfectly.||and(write.@and, speak) or and(write, speak)
+
|-
+
|but||@but||He is young but sensible.||and(sensible.@but, young)
+
|-
+
|either... or||@or*||He will arrive either tomorrow or the day after tomorrow.|| or(day.@or, tomorrow) or or(day, tomorrow)
+
|-
+
|even if||@even.@if||I intend to go even if it rains.||con(go, rain.@even.@if) or con(go, rain.@even)
+
|-
+
|even though||@although||Even though I was really tired, I couldn`t sleep.||seq(sleep, tired.@although)
+
|-
+
|except if||@except.@if||I intend to go except if it rains.||con(go, rain.@except.@if) or con(go, rain.@except)
+
|-
+
|for||@because*||We must go, for it is late.||rsn(go, late.@because) or rsn(go, late)
+
|-
+
|if||-||I asked him if he intended to travel this month.||obj(ask, intend)
+
|-
+
|if||@if*||If you promise to come, I will wait for you.||con(wait, promise.@if) or con(wait, promise)
+
|-
+
|in case (that)||@in_case||He wears two watches in case one of them stops.||rsn(wear, stop.@in_case)
+
|-
+
|neither... nor||@and*||He drinks neither tea nor milk.||and(milk.@and.@not, tea.@not) or and(milk.@not, tea.@not)
+
|-
+
|nor||@and*||He would not buy it nor would he accept it as a gift.||and(accept.@and.@not, buy.@not) or and(accept.@not, buy.@not)
+
|-
+
|not only... but also||@and*||Not only does Sue raise money for the symphony, but she also ushers at all of their concerts.||and(usher.@and, raise) or and(usher, raise)
+
|-
+
|only if||@if.@only||Only if you promise to come, I will wait for you.||con(wait, promise.@if.@only) or con(wait, promise.@only)
+
|-
+
|or||@or*||You must prove that you are right or apologize.||or(apologize.@or, prove) or or(apologize, prove)
+
|-
+
|since||@because*||Since you don`t like this model, I`ll show you another one|| rsn(show, like.@because) or rsn(show, like)
+
|-
+
|since||@since*||What have you been doing since I last saw you?||tmf(do, see.@since) or tmf(do, see)
+
|-
+
|so||@so*||We are late for the train, so we must take a taxi.||seq(take, late.@so) or seq(take, late)
+
|-
+
|so that||@so*||He preferred to work in the morning, so that he might be free in the afternoon.||seq(free, prefer.@so) or seq(free, prefer)
+
|-
+
|than||-||You are taller than he (is).||bas(tall, he)
+
|-
+
|that||-||I know that is impossible.||obj(know, impossible)
+
|-
+
|that||@because*||He is so hoarse that we can hardly hear what he says.||rsn(hear, hoarse.@because) or rsn(hear, hoarse)
+
|-
+
|that||@so*||He ran that he might arrive in time.||seq(arrive.@so, run) or seq(arrive, run)
+
|-
+
|then||@so*||Our expenses will be very heavy: we will have to buy a great number of books; then we`ll have to pay several debts.||seq(pay.@so, buy) or seq(pay, buy)
+
|-
+
|though||@although||||
+
|-
+
|till||@until*||Wait till the day breaks.|| tmt(wait, break.@until) or tmt(wait, break)
+
|-
+
|unless||@unless||I shall go unless it rains.||con(go, rain.@unless)
+
|-
+
|until||@until*||Wait until the day breaks.||tmt(wait, break.@until) or tmt(wait, break)
+
|-
+
|when||@when*||When I arrive I will write.||tim(write, arrive.@when) or tim(write, arrive)
+
|-
+
|whereas||@whereas*||You didn`t work yesterday, whereas he worked till midnight.||coo(work, work.@whereas) or coo(work, work)
+
|-
+
|whether||-||I asked him whether he intended to travel this month.||obj(ask, travel)
+
|-
+
|whether... or||@or*||I would like to know whether he is in France or in England.||or(England, France)
+
|-
+
|while||@whereas*||You were punctual, while he is always late.||coo(punctual, late.@whereas) or coo(punctual, late)
+
|-
+
|while||@while*||Remain standing while they sing.||dur(stand, sing.@while) or dur(stand, sing)
+
|-
+
|yet||@but||He is extremely poor, yet he is as happy as a king.||and(king.@but, poor)
+
 
|}
 
|}
  
== Determiners ==  
+
== Structure ==
 +
The English grammars are '''unidirectional'''. There is a grammar for UNLization (the ENG->UNL Analysis Grammar) and another grammar for NLization (the UNL->ENG Generation Grammar). The former takes natural language sentences as inputs and provides the corresponding UNL graphs as outputs; the latter takes UNL graphs as inputs and provides the corresponding English sentences as outputs.
  
{| border="1"  cellpadding="5"
+
The English grammars are of two types: the '''transformation grammar''', or simply [[t-grammar]], which is used to manipulate data structures (i.e., to convert lists into trees, trees into networks, networks into trees, trees into lists); and the '''disambiguation grammar''', or simply [[d-grammar]], which is used to control the behavior of the t-grammar (by prohibiting or inducing some of its possibilities).
!Determiner
+
!Attribute
+
!English
+
!UNL
+
|-
+
|a, an||@indef||a book||book.@indef
+
|-
+
|a few||@paucal||a few books||book.@paucal
+
|-
+
|a little||@paucal||a little amount||amount.@paucal
+
|-
+
|a lot of||@multal||a lot of books||book.@multal
+
|-
+
|all||@all||all books||book.@all
+
|-
+
|another||@other||another book||book.@other
+
|-
+
|any||@any||any book||book.@any
+
|-
+
|both||@both||both books||book.@both
+
|-
+
|each||@each||each book||book.@each
+
|-
+
|either||@either||either book||book.@either
+
|-
+
|else||@other||something else||something.@other
+
|-
+
|every||@every||every book||book.@every
+
|-
+
|few||@paucal||few books||book.@paucal
+
|-
+
|little||@paucal||little damage||damage.@paucal
+
|-
+
|many||@multal||many books||book.@multal
+
|-
+
|most||@most||most books||book.@most
+
|-
+
|much||@multal||much effort||effort.@multal
+
|-
+
|neither||@neither||neither foot||foot.@neither
+
|-
+
|no||@not||no book||book.@not
+
|-
+
|other||@other||other book||book.@other
+
|-
+
|own||@own||own book||book.@own
+
|-
+
|same||@same||same book||book.@same
+
|-
+
|several||@multal||several books||book.@multal
+
|-
+
|some||@paucal||some books||book.@paucal
+
|-
+
|such||@such||such books||book.@such
+
|-
+
|that, those||@distal||that book||book.@distal
+
|-
+
|the||@def||the book||book.@def
+
|-
+
|this, these||@proximal||this book||book.@proximal
+
|-
+
|what||@wh||what book||book.@wh
+
|-
+
|whatever||@wh||whatever book||book.@wh
+
|-
+
|which||@wh||which book||book.@wh
+
|-
+
|whichever||@wh||whichever book||book.@wh
+
|}
+
  
== Prepositions ==
+
The grammars used to UNLize English sentences and to English-ize UNL graphs are actually made of three modules:
*[[for]]
+
*the [[Standardization grammar]], which is used to standardize the feature structure;
*[[in]]
+
*the '''English Grammar''' itself, which contains rules that are specific to English; and
*[[of]]
+
*the [[Default grammar]], which contains language-independent transformation rules.
*[[to]]
+
The Standardization Grammar and the Default Grammar are used by all languages, and not only English. <br />
 +
The Standardization Grammar is bidirectional, i.e., the same grammar is used both in UNLization and NLization. The other two grammars are unidirectional.<br />
 +
The Standardization Grammar must be loaded first, because the other grammars depend on the normalized feature structure; the English Grammar must be loaded after the Standardization Grammar; and the Default Grammar must be loaded after the other two.
  
== Verbs ==
+
== Features ==
* [[be]]
+
The grammars operate on a set of features that come from three different sources:
 +
*'''Dictionary features''' are the features ascribed to the entries in the dictionary, and appear as attribute-value pairs (LEX=N,GEN=MCL,NUM=SNG).
 +
*'''System-defined features''' are features automatically assigned by EUGENE and IAN during the processing. They are the following:
 +
**SHEAD = beginning of the sentence (system-defined feature assigned automatically by the machine)
 +
**CHEAD = beginning of a scope (system-defined feature assigned automatically by the machine)
 +
**STAIL = end of the sentence (system-defined feature assigned automatically by the machine)
 +
**CTAIL = end of a scope (system-defined feature assigned automatically by the machine)
 +
**TEMP = temporary entry (system-defined feature assigned to the strings that are not present in the dictionary)
 +
**SCOPE = scopes entry (system-defined feature assigned to hyper-nodes)
 +
**DIGIT = digits (system-defined feature assigned to digits)
 +
*'''Grammar features''' are features created inside the grammar in any of its intermediate states between the input and the output.
 +
The dictionary and system-defined features are described at the [[Tagset]].
  
=== Verb forms ===
+
== UNLization (ENG->UNL) ==
{| border=1 cellpadding=2
+
The UNLization process is performed in three different steps:
!Form
+
<ol>
!Tag
+
<li>[[Segmentation]] of English sentences is done automatically by the machine. It uses some punctuation signs (such as ".","?","!") and special characters (end of line, end of paragraph) as sentence boundaries. As the sentences are provided one per line, this step does not require any action from the grammar developer.</li>
!UNL
+
<li>[[Tokenization]] of each sentence is done against the dictionary entries, from left to right, following the principle of the longest first. As there are several lexical ambiguities, some disambiguation rules are required to induce the correct lexical choice. The tokenization is done with the [[English Disambiguation Grammar]].</li>
!Example
+
<li>[[Transformation]] applies after tokenization and is divided in three modules:</li>
|-
+
<ol>
|Simple present
+
<li>Standardization, which is simply the standardization of the feature structure, carried out by the [[Standardization grammar]]</li>
|PRS
+
<li>English-specific transformation is performed by the ENG->UNL T-Grammar and is divided in two steps:
|@present
+
<ol>
|He speaks = speak.@present
+
<li>'''Morphology''', where English features (such as PLR, PAS and [not]) are mapped into attributes (@pl, @past and @not, respectively).</li>
|-
+
<li>'''Syntax''', where structures that are specific to English (such as determiners, compounds and coordination) are mapped into UNL.</li>
|Present progressive
+
</ol>
|PRS&PGS
+
<li>General transformation is performed by the [[Default grammar]] and is divided into six steps:
|@present.@progressive
+
#'''Pre-processing''' (prepares the input for the processing)
|He is speaking = speak.@present.@progressive
+
#'''Parsing''' (converts the input list structure into a tree structure)
|-
+
#'''Transformation''' (converts the surface tree structure into the deep tree structure)
|Simple past
+
#'''Dearborization''' (converts the tree structure into a network structure)
|PAS
+
#'''Interpretation''' (converts the syntactic network into a semantic network)
|@past
+
#'''Post-processing''' (adjusts the final output)
|He spoke = speak.@past
+
</ol>
|-
+
</ol>
|Past progressive
+
=== Examples of ENG->UNL Transformation Rules ===
|PAS&PGS
+
(N,PLR,^@pl,^@multal,^@paucal,^@all):=(+att=@pl);
|@past.@progressive
+
:assigns the attribute @pl to plural nouns (books > book.@pl). To avoid redundancy, the rule applies only if the word does not already carry @pl or another plural attribute (such as @multal, @paucal and @all).
|He was speaking = speak.@past.@progressive
+
(MOV,%x)(V,%y):=(%y,+att=%x);
|-
+
:copies the attributes from the modal verb (%x) to the main verb (%y) and deletes the modal verb (must.@obligation kill > kill.@obligation). Attributes of modal verbs are assigned in the dictionary.
|rowspan="3"|Present Perfect Simple
+
(VB,%x)(FPR):=(%x,+att=@reflexive);
|rowspan="3"|PRS&PFC
+
:assigns the feature @reflexive to the verb if followed by a reflexive pronoun, and deletes the reflexive pronoun (kill himself > kill.@reflexive)
|@past (finished action that has an influence on the present)
+
(D,att,%x)(NB,%y)({^N|PUT|STAIL|CTAIL},%right):=(%y,+att=%x)(%right);
|He has spoken many times about that = speak.@past
+
:copies the attributes of the determiner to the noun phrase (the.@def book > book.@def). Attributes of determiners are assigned in the dictionary. The rule only applies if the noun phrase is not followed by a noun or if it is followed by a punctuation sign, the end of sentence or the end of scope.
|-
+
(XP,%x)([and])(XP=%x,%y):=(and(%y;%x),+LEX=%x,+XP=%x,+rel=and,%xy);
|@present.@perfect (action that is still going on)
+
:creates the relation "and" between two maximal projections of the same category separated by the conjunction "and" (John and Mary > and(Mary,John)).
|He has spoken since yesterday = speak.@present.@perfect
+
 
|-
+
== NLization (UNL->ENG) ==
|@past.@recent (action that stopped recently)
+
The NLization process is performed in three different steps:
|He has just spoken = speak.@past.@recent
+
<ol>
|-
+
<li>[[Segmentation]] of UNL sentences is done automatically by the machine. It uses the [[UNL document structure]] to split the input UNL document into a set of sentences to be processed one at a time.</li>
|rowspan="3"|Present Perfect Progressive
+
<li>[[Tokenization]] of each sentence is done against the dictionary entries, following the principle of the highest priority first. As there are several lexical ambiguities, some disambiguation rules are required to induce the correct lexical choice. The tokenization is done with the [[English Disambiguation Grammar]].</li>
|rowspan="3"|PRS&PFC&PGS
+
<li>[[Transformation]] applies after tokenization and is divided into three modules:</li>
|@past.@progressive (finished action that has an influence on the present)
+
<ol>
|He has been speaking many times about that = speak.@past.@progressive
+
<li>Standardization, which is simply the standardization of the feature structure, carried out by the [[Standardization grammar]]</li>
|-
+
<li>English-specific transformation is performed by the UNL->ENG T-Grammar and is divided in three steps:
|@present.@perfect.@progressive (action that is still going on)
+
<ol>
|He has been speaking since yesterday = speak.@present.@perfect.@progressive
+
<li>'''Semantics''', where relations and attributes of UNL are mapped into English structures.</li>
|-
+
<li>'''Morphology''', where the paradigms are copied from the grammar to each entry.</li>
|@past.@recent.@progressive (action that stopped recently)
+
<li>'''Post-processing''', where the output list is adjusted to the English standards.</li>
|He has just been speaking = speak.@past.@recent.@progressive
+
</ol>
|-
+
<li>General transformation is performed by the [[Default grammar]] and is divided in six steps:
|Past Perfect Simple
+
#'''Pre-processing''' (prepares the input for the processing)
|PAS&RPT
+
#'''Arborization''' (converts the syntactic network into a syntactic tree)
|@past.@anterior
+
#'''Transformation''' (converts the deep syntactic structure into the surface syntactic structure)
|He had spoken = speak.@past.@anterior
+
#'''Linearization''' (converts the syntactic structure into a list structure)
|-
+
#'''Morphological generation''' (inflects the words that need to be inflected)
|Past Perfect Progressive
+
#'''Post-processing''' (adjusts the final output)
|PAS&RPT&PGS
+
</ol>
|@past.@anterior.@progressive
+
</ol>
|He had been speaking = speak.@past.@anterior.@progressive
+
=== Examples of UNL->ENG Transformation Rules ===
|-
+
agt(%x,V;%y,N):=VS(%x,PER=%y;%y,-CAS,+CAS=NOM);
|Future Simple
+
:transforms the agent relation between a verb and a noun into a verb specifier relation between the verb and the noun: agt(kill,he) > VS(kill,he)
|FUT
+
(%x,N,@def):=(NS(%x,-@def;%y,[the],LEX=D,POS=ART),+LEX=N);
|@future
+
:transforms the attribute @def into a noun specifier relation between the noun and the determiner "the": book.@def > NS(book,the)
|He will speak = speak.@future
+
(%x,@pl):=(%x,-@pl,-NUM,+NUM=PLR);
|-
+
:assigns the feature NUM=PLR to the words containing the attribute @pl
|Near future
+
(%x,>AND):=(%x,->AND,+>BLK)([and],LEX=C,POS=CCJ,+>BLK);
|FUN
+
:generates the conjunction "and" to the right of the words containing the feature ">AND"
|@future.@recent
+
(D,%d)([all],%all):=(%all)(%d);
|He is going to speak = speak.@future.@recent
+
:reverses the order of determiners and "all": the all books > all the books, my all books > all my books
|-
+
 
|Future Progressive
+
== Notes ==
|FUT&PGS
+
<references />
|@future.@progressive
+
|He will be speaking = speak.@future.@progressive
+
|-
+
|Future Perfect
+
|FUT&RPT
+
|@future.@anterior
+
|He will have spoken = speak.@future.@anterior
+
|-
+
|Future Perfect Progressive
+
|FUT&RPT&PGS
+
|@future.@anterior.@progressive
+
|He will have been speaking = speak.@future.@anterior.@progressive
+
|-
+
|Conditional
+
|CON
+
|@past.@posterior
+
|He would speak = speak.@past.@posterior
+
|-
+
|Conditional Progressive
+
|CON&PGS
+
|@past.@posterior.@progressive
+
|He would be speaking = speak.@past.@posterior.@progressive
+
|-
+
|Conditional Perfect
+
|CON&PFC
+
|@past.@posterior.@perfective
+
|He would have spoken = speak.@past.@posterior.@perfective
+
|-
+
|Conditional Perfect Progressive
+
|CON&RPT&PGS
+
|@past.@posterior.@perfective.@progressive
+
|He would have been speaking = speak.@past.@posterior.@perfective.@progressive
+
|}
+

Latest revision as of 21:02, 14 August 2013

The English grammars follow, in general, the X-bar approach, with some adaptations. They are used for transforming English sentences into UNL (UNLization) and for generating English sentences out of UNL graphs (NLization). They follow the syntax defined at the UNL Grammar Specs and the tags described at the Tagset.

Files

UNLization
Corpus | Dictionary[1] | T-Grammar[2] | D-Grammar | Output | F-Measure
UC-A1 in English | ENG-UNL Dictionary; Default Dictionary | Standardization Grammar; ENG-UNL T-Grammar; Default T-Grammar | ENG-UNL D-Grammar | ENG>UNL | 1.000
UC-A2 in English | ENG-UNL Dictionary; Default Dictionary | Standardization Grammar; ENG-UNL T-Grammar; Default T-Grammar | ENG-UNL D-Grammar | ENG>UNL | 1.000
UC-B1 in English | ENG-UNL Dictionary; Default Dictionary | Standardization Grammar; ENG-UNL T-Grammar; Default T-Grammar | ENG-UNL D-Grammar | ENG>UNL | 1.000


NLization
Corpus | Dictionary[3] | T-Grammar[4] | D-Grammar | Output | F-Measure
UC-A1 in UNL | UNL-ENG Dictionary; Default Dictionary | Standardization Grammar; UNL-ENG T-Grammar; Default T-Grammar | UNL-ENG D-Grammar | UNL>ENG |
UC-A2 in UNL | UNL-ENG Dictionary; Default Dictionary | Standardization Grammar; UNL-ENG T-Grammar; Default T-Grammar | UNL-ENG D-Grammar | UNL>ENG |
UC-B1 in UNL | UNL-ENG Dictionary; Default Dictionary | Standardization Grammar; UNL-ENG T-Grammar; Default T-Grammar | UNL-ENG D-Grammar | UNL>ENG |

Structure

The English grammars are unidirectional. There is a grammar for UNLization (the ENG->UNL Analysis Grammar) and another grammar for NLization (the UNL->ENG Generation Grammar). The former takes natural language sentences as inputs and provides the corresponding UNL graphs as outputs; the latter takes UNL graphs as inputs and provides the corresponding English sentences as outputs.

The English grammars are of two types: the transformation grammar, or simply t-grammar, which is used to manipulate data structures (i.e., to convert lists into trees, trees into networks, networks into trees, trees into lists); and the disambiguation grammar, or simply d-grammar, which is used to control the behavior of the t-grammar (by prohibiting or inducing some of its possibilities).

The grammars used to UNLize English sentences and to English-ize UNL graphs are actually made of three modules:

  • the Standardization grammar, which is used to standardize the feature structure;
  • the English Grammar itself, which contains rules that are specific to English; and
  • the Default grammar, which contains language-independent transformation rules.

The Standardization Grammar and the Default Grammar are used by all languages, and not only English.
The Standardization Grammar is bidirectional, i.e., the same grammar is used both in UNLization and NLization. The other two grammars are unidirectional.
The Standardization Grammar must be loaded first, because the other grammars depend on the normalized feature structure; the English Grammar must be loaded after the Standardization Grammar; and the Default Grammar must be loaded after the other two.
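
The loading order described above can be sketched in a few lines of Python. This is only an illustration: the function name and the role labels are hypothetical and are not part of any UNL tool; the file names are taken from the Files tables.

```python
# Hypothetical helper: arranges grammar files in the required load order.
# The role labels below are illustrative names, not UNL tool identifiers.
GRAMMAR_LOAD_ORDER = ("standardization", "language-specific", "default")


def ordered_grammars(grammars: dict[str, str]) -> list[str]:
    """Return the grammar files in the order they must be loaded."""
    missing = [role for role in GRAMMAR_LOAD_ORDER if role not in grammars]
    if missing:
        raise ValueError(f"missing grammar(s): {', '.join(missing)}")
    return [grammars[role] for role in GRAMMAR_LOAD_ORDER]


if __name__ == "__main__":
    files = {
        "default": "nl_unl_tgrammar.txt",
        "language-specific": "eng_unl_tgrammar.txt",
        "standardization": "s-grammar.txt",
    }
    print(ordered_grammars(files))
```

Whatever order the files are supplied in, the standardization grammar comes out first and the default grammar last.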

Features

The grammars operate on a set of features that come from three different sources:

  • Dictionary features are the features ascribed to the entries in the dictionary, and appear as attribute-value pairs (LEX=N,GEN=MCL,NUM=SNG).
  • System-defined features are features automatically assigned by EUGENE and IAN during the processing. They are the following:
    • SHEAD = beginning of the sentence (system-defined feature assigned automatically by the machine)
    • CHEAD = beginning of a scope (system-defined feature assigned automatically by the machine)
    • STAIL = end of the sentence (system-defined feature assigned automatically by the machine)
    • CTAIL = end of a scope (system-defined feature assigned automatically by the machine)
    • TEMP = temporary entry (system-defined feature assigned to the strings that are not present in the dictionary)
    • SCOPE = scopes entry (system-defined feature assigned to hyper-nodes)
    • DIGIT = digits (system-defined feature assigned to digits)
  • Grammar features are features created inside the grammar in any of its intermediate states between the input and the output.

The dictionary and system-defined features are described at the Tagset.
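
As a rough illustration of how the attribute-value notation above can be read, the following Python sketch parses a dictionary feature string into a mapping. The function name is hypothetical, and storing a bare feature with an empty value is an assumption made for this example, not documented behavior of EUGENE or IAN.

```python
# System-defined features listed above; they are assigned by the machine,
# not written in the dictionary.
SYSTEM_FEATURES = {"SHEAD", "CHEAD", "STAIL", "CTAIL", "TEMP", "SCOPE", "DIGIT"}


def parse_features(spec: str) -> dict[str, str]:
    """Parse a string such as 'LEX=N,GEN=MCL,NUM=SNG' into a dict."""
    features: dict[str, str] = {}
    for pair in spec.split(","):
        pair = pair.strip()
        if not pair:
            continue
        name, sep, value = pair.partition("=")
        # Assumption for this sketch: a bare feature (no '=') gets an empty value.
        features[name.strip()] = value.strip() if sep else ""
    return features


if __name__ == "__main__":
    entry = parse_features("LEX=N,GEN=MCL,NUM=SNG")
    print(entry)
    print("TEMP" in SYSTEM_FEATURES)
```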

UNLization (ENG->UNL)

The UNLization process is performed in three different steps:

  1. Segmentation of English sentences is done automatically by the machine. It uses some punctuation signs (such as ".","?","!") and special characters (end of line, end of paragraph) as sentence boundaries. As the sentences are provided one per line, this step does not require any action from the grammar developer.
  2. Tokenization of each sentence is done against the dictionary entries, from left to right, following the principle of the longest first. As there are several lexical ambiguities, some disambiguation rules are required to induce the correct lexical choice. The tokenization is done with the English Disambiguation Grammar.
  3. Transformation applies after tokenization and is divided into three modules:
    1. Standardization, which is simply the standardization of the feature structure, carried out by the Standardization grammar
    2. English-specific transformation is performed by the ENG->UNL T-Grammar and is divided into two steps:
      1. Morphology, where English features (such as PLR, PAS and [not]) are mapped into attributes (@pl, @past and @not, respectively).
      2. Syntax, where structures that are specific to English (such as determiners, compounds and coordination) are mapped into UNL.
    3. General transformation is performed by the Default grammar and is divided into six steps:
      1. Pre-processing (prepares the input for the processing)
      2. Parsing (converts the input list structure into a tree structure)
      3. Transformation (converts the surface tree structure into the deep tree structure)
      4. Dearborization (converts the tree structure into a network structure)
      5. Interpretation (converts the syntactic network into a semantic network)
      6. Post-processing (adjusts the final output)
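
The left-to-right, longest-first tokenization in step 2 can be illustrated with a small Python sketch. It is a simplified approximation, not the actual algorithm: the function name, the toy dictionary, and the "DICT" label are hypothetical, and disambiguation rules are ignored. Strings not found in the dictionary are labelled TEMP, following the system-defined feature described under Features.

```python
def tokenize(sentence: str, dictionary: set[str]) -> list[tuple[str, str]]:
    """Greedy left-to-right tokenization that prefers the longest dictionary entry."""
    tokens: list[tuple[str, str]] = []
    i = 0
    while i < len(sentence):
        if sentence[i] == " ":
            i += 1  # skip blanks between tokens
            continue
        match = None
        # Try the longest candidate starting at position i first.
        for j in range(len(sentence), i, -1):
            if sentence[i:j] in dictionary:
                match = sentence[i:j]
                break
        if match is None:
            # No entry matches: take everything up to the next blank as a temporary entry.
            end = sentence.find(" ", i)
            if end == -1:
                end = len(sentence)
            match = sentence[i:end]
            tokens.append((match, "TEMP"))
        else:
            tokens.append((match, "DICT"))
        i += len(match)
    return tokens


if __name__ == "__main__":
    toy_dictionary = {"Robert", "Smith", "as", "well", "as well as"}
    print(tokenize("Robert as well as Smith", toy_dictionary))
```

With this toy dictionary, "as well as" is selected as a single token instead of three separate entries, which is the effect of the longest-first principle.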

Examples of ENG->UNL Transformation Rules

(N,PLR,^@pl,^@multal,^@paucal,^@all):=(+att=@pl); 
assigns the attribute @pl to plural nouns (books > book.@pl). To avoid redundancy, the rule applies only if the word does not already carry @pl or another plural attribute (such as @multal, @paucal and @all).
(MOV,%x)(V,%y):=(%y,+att=%x); 
copies the attributes from the modal verb (%x) to the main verb (%y) and deletes the modal verb (must.@obligation kill > kill.@obligation). Attributes of modal verbs are assigned in the dictionary.
(VB,%x)(FPR):=(%x,+att=@reflexive);
assigns the feature @reflexive to the verb if followed by a reflexive pronoun, and deletes the reflexive pronoun (kill himself > kill.@reflexive)
(D,att,%x)(NB,%y)({^N|PUT|STAIL|CTAIL},%right):=(%y,+att=%x)(%right); 
copies the attributes of the determiner to the noun phrase (the.@def book > book.@def). Attributes of determiners are assigned in the dictionary. The rule only applies if the noun phrase is not followed by a noun or if it is followed by a punctuation sign, the end of sentence or the end of scope.
(XP,%x)([and])(XP=%x,%y):=(and(%y;%x),+LEX=%x,+XP=%x,+rel=and,%xy);
creates the relation "and" between two maximal projections of the same category separated by the conjunction "and" (John and Mary > and(Mary,John)).
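
To make the behavior of the first rule above more concrete, here is a Python sketch that mimics the plural rule (N,PLR,^@pl,^@multal,^@paucal,^@all):=(+att=@pl). The Node class and function name are hypothetical; the sketch models only the condition and action of this one rule, not the rule engine.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node: a lemma with grammatical features and UNL attributes."""
    lemma: str
    features: set[str] = field(default_factory=set)
    attributes: set[str] = field(default_factory=set)


# Attributes that block the rule (the ^-negated conditions in the rule).
BLOCKING_ATTRIBUTES = {"@pl", "@multal", "@paucal", "@all"}


def apply_plural_rule(node: Node) -> bool:
    """Add @pl to a plural noun unless a plural-like attribute is already present."""
    is_plural_noun = {"N", "PLR"} <= node.features
    if is_plural_noun and not (BLOCKING_ATTRIBUTES & node.attributes):
        node.attributes.add("@pl")
        return True
    return False


if __name__ == "__main__":
    books = Node("book", {"N", "PLR"})
    apply_plural_rule(books)
    print(books.lemma, sorted(books.attributes))
```

A node such as "many books", which already carries @multal, is left unchanged, matching the redundancy check described for the rule.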

NLization (UNL->ENG)

The NLization process is performed in three different steps:

  1. Segmentation of UNL sentences is done automatically by the machine. It uses the UNL document structure to split the input UNL document into a set of sentences to be processed one at a time.
  2. Tokenization of each sentence is done against the dictionary entries, following the principle of the highest priority first. As there are several lexical ambiguities, some disambiguation rules are required to induce the correct lexical choice. The tokenization is done with the English Disambiguation Grammar.
  3. Transformation applies after tokenization and is divided into three modules:
    1. Standardization, which is simply the standardization of the feature structure, carried out by the Standardization grammar
    2. English-specific transformation is performed by the UNL->ENG T-Grammar and is divided into three steps:
      1. Semantics, where relations and attributes of UNL are mapped into English structures.
      2. Morphology, where the paradigms are copied from the grammar to each entry.
      3. Post-processing, where the output list is adjusted to the English standards.
    3. General transformation is performed by the Default grammar and is divided into six steps:
      1. Pre-processing (prepares the input for the processing)
      2. Arborization (converts the syntactic network into a syntactic tree)
      3. Transformation (converts the deep syntactic structure into the surface syntactic structure)
      4. Linearization (converts the syntactic structure into a list structure)
      5. Morphological generation (inflects the words that need to be inflected)
      6. Post-processing (adjusts the final output)

Examples of UNL->ENG Transformation Rules

agt(%x,V;%y,N):=VS(%x,PER=%y;%y,-CAS,+CAS=NOM); 
transforms the agent relation between a verb and a noun into a verb specifier relation between the verb and the noun: agt(kill,he) > VS(kill,he)
(%x,N,@def):=(NS(%x,-@def;%y,[the],LEX=D,POS=ART),+LEX=N); 
transforms the attribute @def into a noun specifier relation between the noun and the determiner "the": book.@def > NS(book,the)
(%x,@pl):=(%x,-@pl,-NUM,+NUM=PLR);
assigns the feature NUM=PLR to the words containing the attribute @pl
(%x,>AND):=(%x,->AND,+>BLK)([and],LEX=C,POS=CCJ,+>BLK);
generates the conjunction "and" to the right of the words containing the feature ">AND"
(D,%d)([all],%all):=(%all)(%d); 
reverses the order of determiners and "all": the all books > all the books, my all books > all my books
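
The effect of the last rule above, (D,%d)([all],%all):=(%all)(%d), can be approximated on a plain list of words. This Python sketch is only an analogy: the function name and the determiner set are hypothetical, and the real rule operates on feature-bearing nodes, not bare strings.

```python
def move_all_before_determiner(tokens: list[str], determiners: set[str]) -> list[str]:
    """Swap each 'determiner + all' pair so that 'all' comes first."""
    result = list(tokens)
    i = 0
    while i < len(result) - 1:
        if result[i] in determiners and result[i + 1] == "all":
            result[i], result[i + 1] = result[i + 1], result[i]
            i += 2  # skip past the pair that was just reordered
        else:
            i += 1
    return result


if __name__ == "__main__":
    toy_determiners = {"the", "my"}
    print(" ".join(move_all_before_determiner(["the", "all", "books"], toy_determiners)))
    print(" ".join(move_all_before_determiner(["my", "all", "books"], toy_determiners)))
```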

Notes

  1. Two dictionaries are necessary for each language: the language-specific dictionary, and the Default Dictionary, which contains language-independent entries, such as punctuation signs and regular expressions. The default dictionary must be loaded after the language-specific dictionary.
  2. Three t-grammars are necessary for each language: the Standardization grammar, the language-specific grammar, and the Default grammar. The standardization grammar and the default grammar are language-independent. The grammars must be loaded in this order: 1) standardization, 2) language-specific, and 3) default.
  3. Two dictionaries are necessary for each language: the language-specific dictionary, and the Default Dictionary, which contains language-independent entries, such as punctuation signs and regular expressions. The default dictionary must be loaded after the language-specific dictionary.
  4. Three t-grammars are necessary for each language: the Standardization grammar, the language-specific grammar, and the Default grammar. The standardization grammar and the default grammar are language-independent. The grammars must be loaded in this order: 1) standardization, 2) language-specific, and 3) default.