TUT

What is it?
What is in there?
What is the use of that?
How do I use the system?
What if the book is not there?
What if the book has not been UNL-ized yet?
What if there is no result in a given language?
What if the result is not that good?
How does the system work?
What is the difference between "translation" and "UNL-plication"?
What is UNL?
Who are you?

What is it?

TUT (Text-to-Text through UNL) is a digital library of texts represented in the Universal Networking Language (UNL). It comprises links to the full text of more than 30,000 titles and, whenever available, the UNL version of the text, along with three possible realizations of it (summarized, simplified and rephrased) in any of the languages available in the UNL System.

What is in there?

TUT comprises links to more than 30,000 titles, written by more than 10,000 different authors in more than 30 different languages, most of them hosted by Project Gutenberg. As we are targeting reading material in the public domain, the repository consists mainly of books published before 1923. For the time being, the collection of UNL documents is still very small (see a sample of Le Petit Prince), but you may help us either by UNL-izing texts or by supporting our initiative.

What is the use of that?

Our main goal is to increase and extend the semantic accessibility of texts. In that sense, TUT contributes to education and to the diffusion of knowledge, as:

it renders the text understandable for those who do not speak the language of the original (by generating a version of it in any of the languages available in the UNL system);
it renders the text comprehensible for those who have reading difficulties (by generating a simplified version of the text in any of the languages available in the UNL system); and
it identifies the most important information in a text (by generating a summarized version of the text in any of the languages available in the UNL system).

How do I use the system?

TUT comprises two basic functions:

to search metadata (authors, titles or subjects); and
to display the search results in any of their possible formats: original, UNL-ized, summarized, simplified or rephrased.

The summarized, the simplified and the rephrased versions of the text may be displayed in any of the languages available in the UNL System. You may also explore the knowledge structure of the text, or the UNL Knowledge Base, by clicking on any word of any UNL-derived version of the original.

What if the book is not there?

You must first consider that our database includes only books that have been made available in plain text format, in the public domain, by a digitization project (such as Project Gutenberg). It is therefore not possible to have everything there, either because many works are still under copyright or because millions of books have not been digitized yet. If you know of (or have) a plain text version of a public domain book that is not in our database yet, you may add it, and we will greatly appreciate it. Fill in the form and choose one of the two registration options: sending us the URL of the plain text version of the book, or sending the plain text file itself as an attachment. In either case, you must check whether the book has already entered the public domain. If you send us the URL, the copyright must have already expired in the country where the website is hosted; if you send us the file itself, the work should already be in the public domain in Switzerland, where the UNDL Foundation is located.

What if the book has not been UNL-ized yet?

Unfortunately, this is the most frequent situation, since we have just started the project. And that is exactly why your participation is so important. If you need or would like to promote a given title in UNL, you have three options:

You may UNL-ize the book yourself, and we will do our best to help you;
You may sponsor the UNL-ization of the book; or
You may make a donation to the UNDL Foundation to help us keep UNL-izing books.

What if there is no result in a given language?

Unfortunately, we do not yet have all the resources (dictionaries and grammars) necessary to provide results in every language. The UNDL Foundation has been investing heavily in creating resources for UNL-based projects, but there is always much more to be done to include every language, and you may be able to help us. If you would like to have results for a given language, you have three options:

You may join the UNLarium, the UNDL Foundation's language resources management system, and help us create dictionaries and grammars;
You may sponsor a given language; or
You may make a donation to the UNDL Foundation to help us keep increasing and extending our natural language resources, which are actually available to anyone (and not only to the UNL community).

What if the result is not that good?

Many results are not yet satisfactory, but we believe they are already quite promising. You have to consider that, for the time being, our main goal is not to "translate" a text, but to "UNL-ply" it (see below). As you may know from other systems, the state of the art in natural language processing is not yet sufficient to replicate the results produced by a human. We have therefore decided to lower our expectations (or to postpone our ultimate goals) and to provide a facility that contributes to advancing the technology incrementally. As we use several different techniques (rule-based, memory-based, corpus-based), the more data we have, the better we get. In that sense, TUT is still "learning" how to treat texts, which will take some time. But results, even though still disappointing, have been improving consistently.

How does the system work?

TUT stores and exhibits the visible part of a quite complex and intricate process which we call "UNL-plication". UNL-plication is the process of transforming natural language texts (deriving paraphrases, summaries, etc.) through UNL. It consists of three different but integrated subprocesses:

UNL-ization, i.e., mapping the original text into UNL in order to provide a language-independent representation of the source text (the U-print);
Normalization, i.e., normalizing the resulting graph (the U-print) to eliminate redundancies, saturate its semantic valences and generate a cleaner and more machine-friendly version of the original graph (the U-text); and
NL-ization, i.e., recasting the U-text into a natural language structure according to different generation algorithms, so as to produce:
- a new full rephrased version of the original text;
- a new version of the text with higher readability scores according to the Flesch–Kincaid test;
- an abstract of the text.
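The Flesch–Kincaid test mentioned above consists of two fixed formulas over sentence, word and syllable counts, so a simplified output can be compared with the original mechanically. The following is a minimal sketch for illustration only, not TUT's actual scoring code: it assumes plain English text, splits sentences on `.`, `!` and `?`, and approximates syllables by counting vowel groups, which is only a heuristic. The function names and the two sample sentences are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough English syllable count: runs of vowels, minus a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # a final silent 'e' ("rose", "make") usually adds no syllable; keep "-le" ("little")
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def _counts(text: str) -> tuple[int, int, int]:
    """Return (sentences, words, syllables) for a plain English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), syllables


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher means easier to read."""
    s, w, syl = _counts(text)
    return 206.835 - 1.015 * (w / s) - 84.6 * (syl / w)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level: approximate U.S. school grade; lower means easier."""
    s, w, syl = _counts(text)
    return 0.39 * (w / s) + 11.8 * (syl / w) - 15.59


# hypothetical "original" and "simplified" versions of the same content
original = ("The inhabitants of the asteroid possessed extraordinarily "
            "complicated administrative responsibilities.")
simplified = "The people on the star had a lot of work."

print(round(flesch_reading_ease(original), 1), round(flesch_reading_ease(simplified), 1))
print(round(flesch_kincaid_grade(original), 1), round(flesch_kincaid_grade(simplified), 1))
```

Under this test, "higher readability" means a higher Reading Ease score (equivalently, a lower Grade Level) for the simplified version than for the original.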

What is the difference between “translation” and “UNL-plication”?

In A Textbook of Translation (Prentice Hall, 1995), Peter Newmark identifies eight types of translation approaches, namely (from the most source-oriented to the most target-oriented): word-for-word translation, literal translation, faithful translation, semantic translation, communicative translation, idiomatic translation, free translation, and adaptation. In common understanding, however, and despite the pervasiveness of the other approaches, translation is normally restricted to the notion of "fidelity" (or faithfulness), i.e., any translated version of a text is expected to be a replica (of both the content and the form) of the original in another language. This transfer process, however, is, in Nietzsche's words, “all too human” to be replicated by currently existing technology, which is not prepared to deal with several language phenomena, such as vagueness, ambiguity, metaphor, ellipsis, implicature and so on. This does not mean that automatic natural language processing, and therefore machine translation, is impracticable; it just means that it is not yet possible to do it completely without humans or in the same way humans do. The results, in any case, are likely to be different from the ones produced by humans. In order to avoid false expectations and unrealizable hopes, we have decided to coin a new term, "UNL-plication", from "UNL" + "plicare" ("to fold"), to designate the process of mapping a text into UNL (“UNL-ization”), reorganizing the internal structure of the resulting graph (“normalization”), and mapping the UNL graph back into a natural language structure (“NL-ization”). As this allows for generating several different versions (summarized, simplified, rephrased) of the same graph, each in several different languages, we believe this can properly be called a multiplication of the source text by means of UNL.

What is UNL (Universal Networking Language)?

The Universal Networking Language (UNL) has been, since 1996, a unique initiative to reduce language barriers and strengthen cross-cultural communication in the framework of the United Nations. It is a knowledge representation language that has been used for several different tasks in natural language engineering, such as machine translation, multilingual document generation, summarization, information retrieval and semantic reasoning. It has three main features: it is language-independent, it is concept-driven and it is hypergraph-shaped. [read more in the UNL Wiki]
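To make "concept-driven" and "hypergraph-shaped" a little more concrete, the sketch below stores a UNL-style graph as labeled, directed relations between concept nodes, where a node may itself stand for a whole subgraph (a scope); in this sketch, that is what turns the graph into a hypergraph. It is a simplified, hypothetical data structure, not the official UNL format: node names are bare English headwords rather than full Universal Words with their constraint lists, the relation labels `agt`, `obj`, `aoj` and `mod` are borrowed from the UNL relation set but used loosely, and `normalized()` implements only one trivial kind of redundancy elimination (dropping duplicate relations), not the full normalization step of UNL-plication.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relation:
    """One labeled, directed relation between two nodes: label(source, target)."""
    label: str       # relation label, e.g. "agt" (agent) or "obj" (object)
    source: str      # node id: a concept headword or a scope id such as ":01"
    target: str
    scope: str = ""  # "" = top-level graph; otherwise the scope this relation belongs to


@dataclass
class UNLGraph:
    relations: list[Relation] = field(default_factory=list)

    def add(self, label: str, source: str, target: str, scope: str = "") -> None:
        self.relations.append(Relation(label, source, target, scope))

    def nodes(self) -> set[str]:
        """All node ids, including scope ids used as ordinary nodes."""
        return {n for r in self.relations for n in (r.source, r.target)}

    def scope_members(self, scope_id: str) -> list[Relation]:
        """The relations that make up one scope (a subgraph treated as a node)."""
        return [r for r in self.relations if r.scope == scope_id]

    def normalized(self) -> "UNLGraph":
        """Drop exact duplicate relations, keeping first-seen order (hypothetical, simplified)."""
        return UNLGraph(list(dict.fromkeys(self.relations)))


# "The little prince says that the rose is beautiful."
g = UNLGraph()
g.add("agt", "say", "prince")
g.add("mod", "prince", "little")
g.add("obj", "say", ":01")                # what is said is the whole scope :01
g.add("aoj", "beautiful", "rose", ":01")  # inside scope :01: the rose is beautiful
g.add("agt", "say", "prince")             # redundant duplicate

clean = g.normalized()
print(len(g.relations), len(clean.relations))  # 5 4
print(sorted(clean.nodes()))  # [':01', 'beautiful', 'little', 'prince', 'rose', 'say']
print(clean.scope_members(":01"))
```

Keeping relations as hashable, frozen records makes duplicate removal a one-liner; a real normalizer would also need to merge equivalent structures and fill in missing arguments, which this sketch does not attempt.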

Who are you?

The UNL was originally proposed by the Institute of Advanced Studies of the United Nations University, in Tokyo, and is currently promoted by the UNDL Foundation. The UNDL Foundation is a non-profit organization based in Geneva, Switzerland, which has received from the United Nations the mandate to implement the Universal Networking Language (UNL). [read more on the UNDL Foundation website]