Issues

From UNL Wiki

Revision as of 19:54, 30 October 2012

List of pending features and known bugs.

IAN and EUGENE (VERSION 1.1)

Back-end

Parsing
Parsing of rules needs to be improved. IAN and EUGENE are accepting rules with unbalanced parentheses. There is also a problem with an extra comma in the rules. The syntactic check of the engines should be stricter. EUGENE and IAN must detect the following syntactic error:
  • (%a,A,B,C):=((%a,+E); (This rule is being accepted by the system)
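
As an illustration of the kind of check requested, here is a minimal sketch of a parenthesis-balance test. It is not the engines' actual parser: the function name `check_parentheses` is hypothetical, and the sketch assumes that parentheses inside double-quoted strings should be ignored. It does not handle parentheses inside regular expressions.

```python
def check_parentheses(rule: str) -> bool:
    """Return True if every '(' in the rule has a matching ')'."""
    depth = 0
    in_string = False
    for ch in rule:
        if ch == '"':
            in_string = not in_string  # assumption: quoted strings are opaque
        elif in_string:
            continue
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' appeared before its '('
                return False
    # Balanced only if all parentheses and quotes are closed.
    return depth == 0 and not in_string


print(check_parentheses("(%a,A,B,C):=((%a,+E);"))  # False (one '(' is never closed)
print(check_parentheses("(%a,A,B,C):=(%a,+E);"))   # True
```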
Encoding
EUGENE and IAN should reject invalid UTF-8 encoding. From the perspective of the user, the rule was perfect, and the string was clearly and correctly displayed; but the machine was replacing it with an empty string.
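
A possible way to reject badly encoded input, instead of silently replacing it, is to decode strictly and report the failure. The sketch below only illustrates the idea; `is_valid_utf8` is a hypothetical helper and the sample strings are invented.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True only if the byte string is well-formed UTF-8."""
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


good = "café".encode("utf-8")    # valid UTF-8 bytes
bad = "café".encode("latin-1")   # same text, but in a different encoding

print(is_valid_utf8(good))  # True
print(is_valid_utf8(bad))   # False
```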
Consistency of graphs
Rules that lead to impossible graphs are being accepted. The example below generates an impossible graph.
(NB(N,%n;JB(%j;%j2),{and|or},%adjc),%m):= (JB(%j;%j2),rel=%adjc) (NB(N,%n;%j),rel=%m)(NB(N,%n;%j2),rel=%m);
This rule is putting the same node %j in two different positions in the node list. This should not be possible. A node cannot be inside two different nodes in a list structure.
Preprocessing module
A module for preprocessing is needed in IAN. It will serve for sentence segmentation and morphological preprocessing. Rules of the preprocessing module will be only of the LL type, will only deal with strings and will apply before any dictionary search. They will be used to assign STAIL and SHEAD. Regular expressions should be admitted. The unit of processing will be the paragraph (i.e., any string between \n and \r). Examples of possible rules:
  • (" .",%x):=(%x)(+STAIL,%y);
  • (".",%x)(/[ABCDEFGHIJKLMNOPQRSTUVWXYZ]/,%y):=(%z,+SHEAD)(%x)(%y);
  • ("an ",%x)(/[aeiouy]/,%y):=("a ",%x)(%y);
Observations:
  • +STAIL automatically creates SHEAD (in addition to STAIL itself), and +SHEAD automatically creates STAIL.
  • The preprocessing module should be provided in a separate tab (S-Rules, for segmentation rules)
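
The intended behaviour can be approximated outside IAN with ordinary regular expressions. The sketch below is only a loose analogue of the rules above, not their actual semantics: the function names and the boundary pattern (a period, optional spaces, then an uppercase letter) are assumptions made for illustration.

```python
import re

# Assumption: paragraphs are delimited by line-break characters.
PARAGRAPH_BREAK = re.compile(r"[\r\n]+")
# Assumption: a sentence ends at a period that is followed by an uppercase letter.
SENTENCE_BOUNDARY = re.compile(r"(?<=\.)\s*(?=[A-Z])")


def paragraphs(text: str) -> list[str]:
    """Split raw text into non-empty paragraphs."""
    return [p for p in PARAGRAPH_BREAK.split(text) if p]


def segment(paragraph: str) -> list[str]:
    """Split one paragraph into sentences at the assumed boundary."""
    return [s for s in SENTENCE_BOUNDARY.split(paragraph) if s]


for paragraph in paragraphs("It rains. We stay.Home now.\r\nSecond paragraph."):
    print(segment(paragraph))
```

Splitting on a pattern that can match the empty string requires Python 3.7 or later.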
Mathematical operations (574)
Mathematical operations inside nodes
  • (%x):=(%x-1); (i.e., reduce the value of %x by 1)
  • (%x):=(%x+1); (i.e., add 1 to %x)
  • (%x):=(%x*2); (i.e., multiply %x by 2)
  • (%x):=(%x/2); (i.e., divide %x by 2)
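
A minimal sketch of how such expressions could be evaluated is given below. It assumes that the operand is a non-negative integer literal and that division is not rounded; neither point is settled above. The names `apply_operation`, `OPS` and `EXPR` are hypothetical.

```python
import operator
import re

OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,  # assumption: true division, no rounding
}
# Matches expressions such as "%x-1" or "%x * 2".
EXPR = re.compile(r"^%(\w+)\s*([+\-*/])\s*(\d+)$")


def apply_operation(expr: str, values: dict) -> float:
    """Evaluate '%name <op> <integer>' against the current node values."""
    match = EXPR.match(expr)
    if match is None:
        raise ValueError(f"unsupported expression: {expr!r}")
    name, op, operand = match.groups()
    return OPS[op](values[name], int(operand))


values = {"x": 5}
print(apply_operation("%x-1", values))  # 4
print(apply_operation("%x*2", values))  # 10
```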
Indexation of relations
Relations should admit an index, as nodes do. This would avoid ambiguity when dealing with relations in different scopes:
XB:%a(%x;%y)XB:%b(%x;%z):=XB:%a(XB(%x;%y);%z); the relation XB(%x;%y) will be created as a scope inside %a
XB:%a(%x;%y)XB:%b(%x;%z):=XB:%b(XB(%x;%y);%z); the relation XB(%x;%y) will be created as a scope inside %b
In any case, the indexation should comply with a possible graph structure

Front-end

Drag-and-drop
To include the possibility of using "drag-and-drop" to reorder dictionaries and dictionary entries, and grammars and grammar rules (in addition to the current reordering method).
Test sets
To improve the test sets: they should show only the differences, and the results should be exportable and importable.
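
One way to picture the request is sketched below: only the cases whose output differs from the expected one are kept, and the result is serialised so that it can be exported and imported again. The function names and the JSON format are assumptions, not the existing test-set format.

```python
import json


def differences(expected: dict, actual: dict) -> dict:
    """Return only the test cases whose actual output differs from the expected one."""
    return {
        case: {"expected": want, "actual": actual.get(case)}
        for case, want in expected.items()
        if actual.get(case) != want
    }


def export_results(diffs: dict) -> str:
    """Serialise the differences as JSON text."""
    return json.dumps(diffs, ensure_ascii=False, indent=2, sort_keys=True)


def import_results(text: str) -> dict:
    """Load differences previously written by export_results."""
    return json.loads(text)


expected = {"s1": "agt(read, child)", "s2": "obj(read, book)"}
actual = {"s1": "agt(read, child)", "s2": "obj(read, books)"}
diffs = differences(expected, actual)
print(export_results(diffs))  # only "s2" is listed
```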
Trace
The trace must be thoroughly revised. The desired structure is presented at [1]
Groups
Groups should be collapsible/expandable, and a single file may participate in several groups (grouping must be done using tags, instead of exclusive categories)
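
The difference between tags and exclusive categories can be shown with a small data-structure sketch. The file names and tag names below are invented for illustration, and `group_by_tag` is a hypothetical helper.

```python
from collections import defaultdict


def group_by_tag(file_tags: dict) -> dict:
    """Build tag -> set of files; a file appears under every tag it carries."""
    groups = defaultdict(set)
    for filename, tags in file_tags.items():
        for tag in tags:
            groups[tag].add(filename)
    return dict(groups)


# Hypothetical files, each carrying one or more tags.
file_tags = {
    "en_analysis.dic": {"english", "analysis"},
    "en_generation.dic": {"english", "generation"},
    "fr_analysis.dic": {"french", "analysis"},
}
groups = group_by_tag(file_tags)
print(sorted(groups["english"]))   # ['en_analysis.dic', 'en_generation.dic']
print(sorted(groups["analysis"]))  # ['en_analysis.dic', 'fr_analysis.dic']
```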
Shared resources
It must be possible to reorder shared resources (currently, we cannot reorder them).
NL and UNL documents
NL inputs should be shareable (currently, it is only possible to send them, and later changes are not propagated). They should also work like dictionaries and grammars (we should have the option of grouping them and loading more than one at a time).
IAN/EUGENE communication
A given output of IAN could be used as the input for EUGENE and vice-versa - using the loaded resources
Update
Dictionary and grammar update should replace the current files instead of adding the resources to the end of the existing files
Range
The trace level of the option "range" should be defined by the user. It's OK to use NONE as default, but the user should also be able to get more detailed results for more than one sentence.

LILY

End-user interface

  1. (BETA) To remove the option "compiled resources" from the interface. There will be an admin page where the configuration will be set.
  2. (BETA) LILY is not accepting Arabic input.
  3. (BETA) To replace the localization file (translations to be provided by the UNDL Foundation)
  4. (BETA) To replace the logos of the UNDLF and UNL with higher-resolution ones (to be provided by the UNDL Foundation)
  5. (BETA) To replace the copyright (to be provided by the UNDL Foundation)
  6. (BETA) To remove the login in the end-user final version
  7. (BETA) To replace the contact address with info@undlfoundation.org
  8. (BETA) Background images are not being aligned in zoom in and zoom out (CSS). The application is not working in IE.
  9. (BETA) To reduce the size of the logos
  10. (BETA) To synchronize users and passwords with the UNLweb
  11. (1.0) Feedback from users (source, UNL and target must be stored with the corresponding evaluation)

Test interface

  1. To have IAN's and EUGENE's dictionary, t-rules and d-rules tabs in the same interface inside the UNLdev.

SEAN

These bugs have been extracted from [2]

1. The interface of SEAN still uses the old model of IAN/EUGENE. This should be standardized. All the systems in the Dev should have the same appearance (i.e., the one used by IAN and EUGENE). Besides that, there are some functions available for IAN and EUGENE (such as renaming files) that are not available for SEAN. (OK)

2. There seems to be a problem with the HTML cleaning. I uploaded the front page of the English Wikipedia, but got some line breaks that were not in the original text. Take a look at the file html_issue.txt attached for an example of the problem.

3. The tab "PROCESS" is refreshed (i.e., cleaned) every time we click on another tab. This prevents us from checking issues in the grammar or in the dictionary and coming back. The results should be preserved, as in IAN and EUGENE.

4. I've not been able to export the traces generated by SEAN. It seems that the "export trace" button is not working.

The observations below are valid for the file lpp.txt (attached): 17K words, 90K characters (with spaces):

CONCORDANCE

5. The results of the search are not complete. In the original file, there were 10 occurrences of the string "children", but SEAN brought only 7.

6. SEAN is ignoring commas. With concordance = 2, the result for the segment
  • through my reserve. "Children," I say plainly,
was
  • " Children " I
Note that the comma disappeared. It should not. Any character that is not considered a sentence boundary should count as a token. The result, in this case, should have been "Children,"

7. SEAN is adding blank spaces between tokens even when they do not appear in the source. Take the example of "children" above. There was no blank space between the quote and the word "children", but SEAN added it. This affects the analysis.

8. SEAN is not accepting queries with blank spaces. The result for "the children" is "this word doesn`t exist".
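
The behaviour expected in items 5 to 8 can be sketched as follows. This is not SEAN's implementation: the token pattern, the function names and the sample sentence are assumptions. Punctuation marks are kept as tokens, character offsets are recorded so that no blank space has to be invented when the text is displayed, and queries may contain blank spaces.

```python
import re

# Assumption: a token is either a run of word characters or one punctuation mark.
TOKEN = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> list:
    """Return (token, start, end) triples; whitespace is skipped, never added."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN.finditer(text)]


def count_occurrences(text: str, query: str) -> int:
    """Count case-insensitive occurrences of a word or multi-word phrase."""
    words = [re.escape(word) for word in query.split()]
    pattern = r"\b" + r"\s+".join(words) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


# Invented sample text, not taken from lpp.txt.
sample = '"Children," I say plainly, "watch out for the children."'
print([token for token, _, _ in tokenize(sample)][:4])  # ['"', 'Children', ',', '"']
print(count_occurrences(sample, "children"))            # 2
print(count_occurrences(sample, "the children"))        # 1
```

Because each token carries its offsets, two tokens are adjacent in the source exactly when the end of one equals the start of the next, so the original spacing can be reproduced.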

USING THE ENGLISH ANALYSIS DICTIONARY (400 K)

9. There were some issues with the structure of queries described in section 3.2.1 of the SRS (the searches were done with the analysis dictionary, where there was [children]):
  a) "?hildren", "ch?ldren" and "childre?" are presenting the same results several times.
  b) "children -about" is not working (the system says that this word doesn't exist, but it should bring instances of "children" without "about").
  c) "/child(ren)?/" is not working. I'm not sure whether I'm using the correct syntax for regular expressions or whether regular expressions have not been implemented.
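
For reference, the three query forms could behave as in the sketch below. The interpretation is inferred only from the examples in this item ("?" as a single-character wildcard, a leading "-" as exclusion, slashes around a regular expression), not from the SRS itself. All names and sample sentences are hypothetical, and a regular expression containing blank spaces is not supported by this sketch.

```python
import re


def compile_term(term: str):
    """Turn one query term into a compiled, case-insensitive pattern."""
    if len(term) >= 2 and term.startswith("/") and term.endswith("/"):
        return re.compile(term[1:-1], re.IGNORECASE)  # /regex/ form
    # Escape everything, then let "?" stand for exactly one character.
    return re.compile(re.escape(term).replace(r"\?", "."), re.IGNORECASE)


def search(query: str, sentences: list) -> list:
    """Return each matching sentence once, in its original order."""
    include, exclude = [], []
    for term in query.split():
        if term.startswith("-") and len(term) > 1:
            exclude.append(compile_term(term[1:]))
        else:
            include.append(compile_term(term))
    hits = []
    for sentence in sentences:
        words = re.findall(r"\w+", sentence)

        def has(pattern):
            return any(pattern.fullmatch(word) for word in words)

        if all(has(p) for p in include) and not any(has(p) for p in exclude):
            if sentence not in hits:  # avoid repeated results
                hits.append(sentence)
    return hits


sentences = [
    "The children play.",
    "A story about children.",
    "One child sleeps.",
    "The children play.",
]
print(search("ch?ldren", sentences))
print(search("children -about", sentences))
print(search("/child(ren)?/", sentences))
```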

USING THE ENGLISH GENERATION DICTIONARY (200K)

10. The lemmatization is working ("child" brings "children" with the generation dictionary), but the results are being repeated.

UNL CORPUS

11. In the tab "UNL Corpus", all the sentences received the same number [1].

ANALYSIS (I would rather rename this tab to "KNOWLEDGE BASE")

12. The knowledge base is full of noise in case of sentences that have not been fully processed. See the file kb_extraction_issue.txt attached for an example. I could not understand where most of the relations and attributes come from. They are not visible in the UNL Corpus.

TRACE

13. The trace seems to be detached from the UNL corpus. In the Corpus, the first sentence was "indulgence of the children who may read." In the trace, however, the first sentence seems to be "" Children " I say ", which is actually the last one to be processed. But this only refers to the dictionary lookup and to the trace, because the final result, for the first sentence in the corpus, is for "indulgence of the children who may read".

14. In any case, I've not been able to expand any button in the trace other than the ones provided for the first sentence. The other "+" buttons do not work. Because of that, I could not check the results of SEAN against IAN (with the same resources).

Software