Issues

Latest revision as of 15:55, 3 February 2014

List of pending features and known bugs.

IAN and EUGENE (VERSION 1.1)

Back-end

Parsing
The parsing of rules needs to be improved. IAN and EUGENE were accepting rules with unbalanced parentheses. There is also a problem with extra commas in rules. The engines' syntactic check should be stricter. EUGENE and IAN must detect the following syntactic error:
  • (%a,A,B,C):=((%a,+E); (This rule is being accepted by the system)
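The kind of check requested here could look like the sketch below. It is a minimal Python illustration, not the engines' actual parser. The function name `check_parentheses` is hypothetical, and the sketch assumes that parentheses inside double-quoted strings are literal and should not be counted.

```python
def check_parentheses(rule: str) -> bool:
    """Return True if the structural parentheses in `rule` are balanced.

    Assumption (not from the wiki): characters inside double-quoted
    strings are literal and are skipped.
    """
    depth = 0
    in_string = False
    for ch in rule:
        if ch == '"':
            in_string = not in_string
        elif in_string:
            continue
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ")" with no matching "("
                return False
    return depth == 0 and not in_string


print(check_parentheses("(%a,A,B,C):=((%a,+E);"))  # False: the rule quoted above
print(check_parentheses("(%a,A,B,C):=(%a,+E);"))   # True
```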
Encoding
EUGENE and IAN should reject invalid UTF-8 encoding. From the user's perspective, the rule was perfect and the string was displayed clearly and correctly, but the machine was replacing it with an empty string.
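A strict decoding step is one way to reject such input instead of silently emptying it. The Python sketch below only illustrates the idea; the function name `is_valid_utf8` is hypothetical and says nothing about how the engines read their files.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if `raw` decodes as UTF-8 without errors."""
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8("maçã".encode("utf-8")))    # True
print(is_valid_utf8("maçã".encode("latin-1")))  # False: not valid UTF-8 bytes
```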
Consistency of graphs
Rules leading to impossible graphs are being accepted and applied. The example below generates an impossible graph.
(NB(N,%n;JB(%j;%j2),{and|or},%adjc),%m):= (JB(%j;%j2),rel=%adjc) (NB(N,%n;%j),rel=%m)(NB(N,%n;%j2),rel=%m);
This rule is putting the same node %j in two different positions in the node list. This should not be possible. A node cannot be inside two different nodes in a list structure.
Preprocessing module
A module for preprocessing is needed in IAN. It will serve for sentence segmentation and morphological preprocessing. Rules of the preprocessing module will be only of the LL type, will only deal with strings and will apply before any dictionary search. They will be used to assign STAIL and SHEAD. Regular expressions should be admitted. The unit of processing will be the paragraph (i.e., any string between \n and \r). Examples of possible rules:
  • (" .",%x):=(%x)(+STAIL,%y);
  • (".",%x)(/[ABCDEFGHIJKLMNOPQRSTUVWXYZ]/,%y):=(%z,+SHEAD)(%x)(%y);
  • ("an ",%x)(/[aeiouy]/,%y):=("a ",%x)(%y);
Observations:
  • +STAIL automatically creates SHEAD (in addition to STAIL itself), and +SHEAD automatically creates STAIL.
  • The preprocessing module should be provided in a separate tab (S-Rules, for segmentation rules)
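As a rough analogy, the third example rule behaves like a regular-expression substitution applied to the raw paragraph before any dictionary lookup. The Python sketch below mirrors only that one rule; it does not reproduce the rule syntax, node indexes or STAIL/SHEAD handling, and the function name `apply_example_rule` is hypothetical. Like the rule as written, it matches the literal string "an " wherever it occurs.

```python
import re


def apply_example_rule(paragraph: str) -> str:
    """Mirror ("an ",%x)(/[aeiouy]/,%y):=("a ",%x)(%y); on a plain string.

    "an " is replaced by "a " only when the next character is one of
    a, e, i, o, u, y; the lookahead leaves that character untouched.
    """
    return re.sub(r"an (?=[aeiouy])", "a ", paragraph)


print(apply_example_rule("an apple"))  # a apple
print(apply_example_rule("an pear"))   # an pear (no change)
```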
Mathematical operations (574)
Mathematical operations inside nodes
  • (%x):=(%x-1); (i.e., subtract 1 from %x)
  • (%x):=(%x+1); (i.e., add 1 to %x)
  • (%x):=(%x*2); (i.e., multiply %x by 2)
  • (%x):=(%x/2); (i.e., divide %x by 2)
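The four operations above can be illustrated with a small evaluator. This is a sketch under assumptions, not the engines' implementation: it assumes each node index maps to a numeric value, it accepts only rules of exactly the form listed above, and it uses true division for "/" (whether the engines should use integer division is not stated). The names `apply_math_rule`, `RULE` and `OPS` are hypothetical.

```python
import operator
import re

# Matches rules such as (%x):=(%x+1); with the same index on both sides.
RULE = re.compile(r"^\((%\w+)\):=\(\1([-+*/])(\d+)\);$")

OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,  # assumption: true division
}


def apply_math_rule(rule: str, values: dict) -> dict:
    """Return a copy of `values` with the rule's operation applied."""
    match = RULE.match(rule)
    if match is None:
        raise ValueError(f"not a supported arithmetic rule: {rule!r}")
    index, op, operand = match.group(1), match.group(2), int(match.group(3))
    updated = dict(values)
    updated[index] = OPS[op](values[index], operand)
    return updated


print(apply_math_rule("(%x):=(%x*2);", {"%x": 3}))  # {'%x': 6}
```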

Indexation of relations (postponed)
Relations should admit an index, as nodes do. This would avoid ambiguity when dealing with relations in different scopes:
XB:%a(%x;%y)XB:%b(%x;%z):=XB:%a(XB(%x;%y);%z); the relation XB(%x;%y) will be created as a scope inside %a
XB:%a(%x;%y)XB:%b(%x;%z):=XB:%b(XB(%x;%y);%z); the relation XB(%x;%y) will be created as a scope inside %b
In any case, the indexation should comply with a possible graph structure.

Discontinuous multiword expressions (706)
Headwords, UWs and strings used as values of attributes:
  • (%x,ATTRIBUTE=[%y])(%y):=ACTION; the system checks whether the value of the attribute ATTRIBUTE is the HEADWORD of %y
  • (%x,ATTRIBUTE=[[%y]])(%y):=ACTION; the system checks whether the value of the attribute ATTRIBUTE is the UW of %y
  • (%x,ATTRIBUTE="%y")(%y):=ACTION; the system checks whether the value of the attribute ATTRIBUTE is the STRING of %y
  • CONDITION:=(%x,ATTRIBUTE=[%y])(%y): the system assigns the attribute ATTRIBUTE to %x with the value of the HEADWORD of %y
  • CONDITION:=(%x,ATTRIBUTE=[[%y]])(%y): the system assigns the attribute ATTRIBUTE to %x with the value of the UW of %y
  • CONDITION:=(%x,ATTRIBUTE="%y")(%y): the system assigns the attribute ATTRIBUTE to %x with the value of the STRING of %y
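The three notations differ only in which property of %y is compared with (or copied into) the attribute of %x. The Python sketch below illustrates the checking case. The `Node` class, its field names and the `kind` labels are hypothetical stand-ins, not the engines' data model.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    string: str    # surface string, written "%y" in the rules
    headword: str  # dictionary headword, written [%y]
    uw: str        # Universal Word, written [[%y]]
    attributes: dict = field(default_factory=dict)


def attribute_matches(x: Node, y: Node, attribute: str, kind: str) -> bool:
    """Check whether x's attribute equals the chosen property of y.

    `kind` is one of "headword", "uw" or "string".
    """
    target = {"headword": y.headword, "uw": y.uw, "string": y.string}[kind]
    return x.attributes.get(attribute) == target


# Illustrative values only.
verb = Node("gave", "give", "give(icl>do)", {"PARTICLE": "up"})
particle = Node("up", "up", "up(icl>how)")
print(attribute_matches(verb, particle, "PARTICLE", "headword"))  # True
print(attribute_matches(verb, particle, "PARTICLE", "uw"))        # False
```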
Rules with discontinuous nodes
  • (%x)(ANY SEQUENCE OF NODES, %z)(%y):=(%x)(%y)(%z);
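On a plain node list, the effect of this rule is to move %y next to %x and push the intervening sequence %z after it. The Python sketch below shows that reordering on a list of strings; the function name `join_discontinuous` and the use of list positions instead of node indexes are assumptions made for illustration.

```python
def join_discontinuous(nodes: list, i: int, j: int) -> list:
    """Move nodes[j] directly after nodes[i], where i < j.

    The nodes between positions i and j (the "any sequence" %z)
    keep their order and follow the moved node.
    """
    if not 0 <= i < j < len(nodes):
        raise ValueError("expected 0 <= i < j < len(nodes)")
    return nodes[: i + 1] + [nodes[j]] + nodes[i + 1 : j] + nodes[j + 1 :]


print(join_discontinuous(["take", "the", "cat", "out", "now"], 0, 3))
# ['take', 'out', 'the', 'cat', 'now']
```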

Front-end

Drag-and-drop
To include the possibility of using "drag-and-drop" to reorder dictionaries and dictionary entries, and grammars and grammar rules (in addition to the current method).

Test sets
To improve the test sets: they should show only the differences, and the results should be exportable and importable.
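A minimal sketch of both requirements, in Python, is given below. It assumes a test set can be represented as a mapping from a test item to its output, and it uses JSON as the exchange format; neither assumption comes from the existing front-end, and all function names are hypothetical.

```python
import json


def diff_results(expected: dict, actual: dict) -> dict:
    """Return only the items whose actual output differs from the expected one."""
    return {
        key: {"expected": expected.get(key), "actual": actual.get(key)}
        for key in sorted(set(expected) | set(actual))
        if expected.get(key) != actual.get(key)
    }


def export_results(results: dict) -> str:
    """Serialize results to a JSON string that can be saved to a file."""
    return json.dumps(results, ensure_ascii=False, indent=2, sort_keys=True)


def import_results(text: str) -> dict:
    """Load results previously produced by export_results."""
    return json.loads(text)


expected = {"s1": "agt(eat,John)", "s2": "obj(eat,apple)"}
actual = {"s1": "agt(eat,John)", "s2": "obj(eat,pear)"}
print(diff_results(expected, actual))
# {'s2': {'expected': 'obj(eat,apple)', 'actual': 'obj(eat,pear)'}}
```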
Trace
The trace must be thoroughly revised. The desired structure is presented at http://www.unlweb.net/forum/viewtopic.php?t=575

Groups
Groups should be collapsible/expandable, and a single file may participate in several groups (grouping must be done using tags, instead of exclusive categories)
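The difference between tags and exclusive categories can be sketched as follows: each file carries a set of tags, and the groups are derived from them, so one file can appear in several groups. The Python function name `group_by_tags` and the file names are hypothetical.

```python
from collections import defaultdict


def group_by_tags(file_tags: dict) -> dict:
    """Build {tag: [filenames]} from {filename: [tags]}."""
    groups = defaultdict(list)
    for filename, tags in file_tags.items():
        for tag in tags:
            groups[tag].append(filename)
    return dict(groups)


files = {
    "eng_dict.txt": ["english", "dictionaries"],
    "eng_grammar.txt": ["english", "grammars"],
}
print(group_by_tags(files)["english"])  # ['eng_dict.txt', 'eng_grammar.txt']
```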

Shared resources
It must be possible to reorder shared resources (currently, they cannot be reordered).
NL and UNL documents
NL inputs should be shareable (currently, it is only possible to send them, and later changes are not propagated). They should also work like dictionaries and grammars: we should have the option of grouping them and loading more than one at a time.
IAN/EUGENE communication
It should be possible to use a given output of IAN as the input for EUGENE, and vice versa, using the loaded resources.

Update
Dictionary and grammar updates should replace the current files instead of appending the resources to the end of the existing files.
Range
The trace level of the option "range" should be defined by the user. It's OK to use NONE as the default, but the user should also be able to get more detailed results for more than one sentence.

IAN and EUGENE (VERSION 1.2)

1. To include the option UNDO for the deletion of files and entries
2. Selecting a file should be the same as loading it (changed to: indicating clearly that a file has been loaded)
3. The range interval should also be user-defined. For the time being, it's only possible to select the interval from the drop-down list.
4. Users should have the possibility of uploading more than one file at once in a single .zip file
5. Users should have the possibility of visualizing the output of IAN as a graph
6. Backtracking (top-down approach)
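For item 4, unpacking an uploaded archive could be sketched as below with Python's standard `zipfile` module. The function name `read_uploaded_zip` is hypothetical, and the sketch assumes the uploaded files are UTF-8 text; how the front-end actually receives uploads is not described here.

```python
import io
import zipfile


def read_uploaded_zip(data: bytes) -> dict:
    """Return {member name: decoded text} for each file in a .zip archive."""
    files = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries
            # Assumption: every member is a UTF-8 encoded text file.
            files[info.filename] = archive.read(info).decode("utf-8")
    return files


# Build a small archive in memory to show the round trip.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("dictionary.txt", "[child] ...")
    archive.writestr("grammar.txt", "(%x):=(%x+1);")
print(sorted(read_uploaded_zip(buffer.getvalue())))
# ['dictionary.txt', 'grammar.txt']
```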

IAN and EUGENE 2.0

1. SDK
2. Stand-alone version of IAN and EUGENE

LILY (VERSION 1.1)

1. Localization of the interface should be done through uploading a localization file (directly by admin).
2. Include LILY in the UNLdev. The user should have the option of seeing the results of Lily for his/her own data.
3. Alternative translations. The user should have the option of selecting other possible results according to the grammar.
4. Mobile (app) version.

KEYS (VERSION 1.0)

1. Graphic output (as fancy as possible and with support for touch screen).
2. Localizable interface.
3. Another design for the interface (cleaner and simpler).
4. Mobile (app) version.
5. Integration with EUGENE.

UNL Tool Kit (VERSION BETA)

1. Corpus processing: given a set of documents, the system should clean it (of HTML tags, for instance), segment it (according to a user-defined set of symbols), tokenize it (according to the dictionary), extract the word list (with frequency of occurrence), lemmatize it (according to the dictionary), POS tag it (according to the dictionary) and extract the POS patterns (with frequency of occurrence). The system should also include search facilities (concordance).
2. Dictionary builder: given a word list, the system should lemmatize it (according to the dictionary) and POS tag it.
3. Grammar builder: given a set of POS-tagged sentences, the system should build the corresponding trees in order to form a tree-bank, either by hand (i.e., through a user-friendly tree-builder interface) or automatically (using a grammar provided according to the Grammar Specs). The tree-bank will be used to induce a grammar (reverse engineering).
4. Graph builder: given a set of trees, the system should build the corresponding graphs in order to form a graph-bank, either by hand (i.e., through a user-friendly graph-builder interface) or automatically (using a grammar provided according to the Grammar Specs). The graph-bank will be used to induce a grammar (reverse engineering).
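Two of the corpus-processing steps in item 1, the word list with frequencies and the concordance search, can be sketched in Python as below. The regular-expression tokenizer is only a stand-in for the dictionary-based tokenization described above, and the function names are hypothetical. Punctuation is kept as separate tokens rather than discarded.

```python
import re
from collections import Counter

# Stand-in tokenizer: runs of word characters, or single punctuation marks.
TOKEN = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> list:
    return TOKEN.findall(text)


def word_frequencies(text: str) -> Counter:
    """Count lowercased word tokens, ignoring punctuation tokens."""
    return Counter(tok.lower() for tok in tokenize(text) if tok[0].isalnum())


def concordance(text: str, query: str, width: int = 2) -> list:
    """Return (left context, match, right context) for each hit of `query`."""
    tokens = tokenize(text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == query.lower():
            left = tokens[max(0, i - width) : i]
            right = tokens[i + 1 : i + 1 + width]
            hits.append((left, tok, right))
    return hits


sample = "Children, I say plainly. The children play."
print(word_frequencies(sample)["children"])  # 2
print(concordance(sample, "children", width=1))
# [([], 'Children', [',']), (['The'], 'children', ['play'])]
```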
