TUT
TUT (Text-to-Text through UNL) is a digital library of
texts represented in the Universal Networking Language (UNL). It comprises links
to the full versions of more than 30,000 titles and, whenever available, the
UNL version of each text, along with three possible realizations of it (summarized,
simplified and rephrased) in any of the languages available in the UNL System.

The more than 30,000 titles linked from TUT were written by more than
10,000 different authors in more than 30 different languages, and most of them are
hosted by Project Gutenberg. As we are targeting public-domain reading material,
the repository mainly comprises books published before 1923. For the time being,
the collection of UNL documents is still very small (see a sample of
Le Petit Prince), but you may help us either by UNL-izing texts or by supporting
our initiative.

Our main goal is to increase and extend the semantic
accessibility of texts. In that sense, TUT contributes to education and to the
diffusion of knowledge because:
- it makes the text understandable to those who do not speak the language of the original (by generating a version of it in any of the languages available in the UNL System);
- it makes the text comprehensible to those who have reading difficulties (by generating a simplified version of the text in any of the languages available in the UNL System); and
- it identifies the most important information in a text (by generating a summarized version of the text in any of the languages available in the UNL System).

TUT comprises two basic functions:
- searching the metadata (authors, titles or subjects); and
- displaying the search results in any of their possible formats: original, UNL-ized, summarized, simplified or rephrased.
The summarized, simplified and rephrased versions of
the text may be displayed in any of the languages available in the UNL System.
You may also explore the knowledge structure of the text, or the UNL Knowledge
Base, by clicking on any word of any UNL-derived version of the original.
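The search function can be pictured as a filter over a catalogue of metadata records. The following Python sketch is purely illustrative and rests on assumptions of our own: the `Record` fields, the case-insensitive substring matching and the sample entries are hypothetical and do not describe how TUT actually stores or queries its catalogue.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical catalogue entry; the field names are illustrative only."""
    title: str
    author: str
    subjects: list = field(default_factory=list)


def search(records, author=None, title=None, subject=None):
    """Return the records that match every criterion given.

    Matching is a case-insensitive substring test; criteria left as None are ignored.
    """
    def matches(value, query):
        return query is None or query.lower() in value.lower()

    return [
        r for r in records
        if matches(r.author, author)
        and matches(r.title, title)
        and (subject is None or any(matches(s, subject) for s in r.subjects))
    ]


# Example usage with two hypothetical sample entries.
catalogue = [
    Record("Pride and Prejudice", "Jane Austen", ["Fiction", "Love stories"]),
    Record("Moby-Dick", "Herman Melville", ["Fiction", "Sea stories"]),
]
print([r.title for r in search(catalogue, author="austen")])  # ['Pride and Prejudice']
```

Combining criteria narrows the result, since a record must satisfy every criterion that is not left empty.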


Note first that our database includes only public-domain books that have been made available in plain-text format by a digitization project (such as Project Gutenberg). It therefore cannot contain everything, either because many works are still under copyright or because millions of books have not been digitized yet.
If you know of (or have) a plain-text version of a public-domain book that is not yet in our database, you may add it, and we will greatly appreciate it. To do so, fill in
the form and choose one of the two submission options: send us the
URL of the plain-text version of the book, or send the
plain-text file itself as an attachment. In either case, you must check whether the
book has already entered the public domain. If you send us the URL, the
copyright must have expired in the country where the website is hosted; if you
send us the file itself, the work should be in the public domain in Switzerland,
where the UNDL Foundation is located.
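Which country's copyright status must be checked depends only on how the book is submitted. A minimal Python sketch, assuming a hypothetical `Submission` record that carries either a URL (together with the country where its website is hosted) or an attached file; it only tells which jurisdiction to check and does not decide whether the copyright has actually expired there.

```python
from dataclasses import dataclass
from typing import Optional

# The UNDL Foundation is located in Switzerland (as stated above).
FOUNDATION_COUNTRY = "Switzerland"


@dataclass
class Submission:
    """Hypothetical submission record: exactly one of `url` or `attachment` is set."""
    url: Optional[str] = None
    attachment: Optional[bytes] = None
    hosting_country: Optional[str] = None  # where the website behind `url` is hosted


def copyright_jurisdiction(sub: Submission) -> str:
    """Return the country in which the work must already be in the public domain."""
    if (sub.url is None) == (sub.attachment is None):
        raise ValueError("send either a URL or a plain-text file, not both or neither")
    if sub.url is not None:
        if sub.hosting_country is None:
            raise ValueError("the country hosting the URL must be known")
        return sub.hosting_country
    return FOUNDATION_COUNTRY


print(copyright_jurisdiction(Submission(attachment=b"plain text of the book")))  # Switzerland
```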


Unfortunately, this is the most frequent situation, since we
have just started the project, and that is exactly why your participation is so important.
If you need a given title in UNL, or would like to promote it, you have three options:
- You may UNL-ize the book yourself, and we will do our best to help you;
- You may sponsor the UNL-ization of the book; or
- You may make a donation to the UNDL Foundation to help us keep UNL-izing books.

Unfortunately, we do not yet have all the resources (dictionaries and grammars) needed to provide results in every language.
The UNDL Foundation has been investing heavily in creating resources for UNL-based
projects, but much more remains to be done to include every
language, and perhaps you can help us. If you would like to have results for a given language, you have three
options:
- You may join the UNLarium, the UNDL Foundation's language resource management system, and help us create dictionaries and grammars;
- You may sponsor a given language; or
- You may make a donation to the UNDL Foundation to help us keep increasing and extending our natural language resources, which are available to anyone (not only to the UNL community).


Many results are not satisfactory yet, but we do believe they are already quite promising. Bear in mind that, for the time being, our main goal is not to "translate" a text but to "UNL-ply" it (see below). As you may know from other systems, the state of the art in natural language processing is not yet able to replicate the results produced by a human. We have therefore decided to lower our expectations (or to postpone our ultimate goals) and to provide a facility that contributes to advancing the technology incrementally. As we use several different techniques (rule-based, memory-based and corpus-based), the more material we have, the better the results get. In that sense, we believe that TUT is still "learning" how to treat texts, which will still take some time. Nevertheless, the results, although still disappointing, have improved consistently.


TUT stores and displays the visible part of a rather complex and intricate process which we call "UNL-plication". UNL-plication is the process of transforming natural language texts through UNL (deriving paraphrases, summaries, etc.). It consists of three distinct but integrated subprocesses:
- UNL-ization, i.e., mapping the original text into UNL in order to provide a language-independent representation of the source text (the U-print);
- Normalization, i.e., normalizing the resulting graph (the U-print) to eliminate redundancies, saturate its semantic valences and generate a cleaner and more machine-friendly version of the original graph (the U-text); and
- NL-ization, i.e., recasting the U-text into a natural language structure according to different generation algorithms, so as to produce:
  - a new, fully rephrased version of the original text (the rephrased version);
  - a new version of the text with higher readability scores according to the Flesch–Kincaid test (the simplified version);
  - an abstract of the text (the summarized version).
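The text does not specify which of the two Flesch–Kincaid formulas is used. Flesch Reading Ease (higher scores mean easier text) and Flesch–Kincaid Grade Level (lower scores mean easier text) are both computed from the average sentence length and the average number of syllables per word. The Python sketch below takes the counts as input, because syllable counting is language-dependent and is left out here; the coefficients were calibrated for English, and the sample counts are hypothetical.

```python
def _averages(words: int, sentences: int, syllables: int) -> tuple:
    """Return (words per sentence, syllables per word)."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return words / sentences, syllables / words


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    wps, spw = _averages(words, sentences, syllables)
    return 206.835 - 1.015 * wps - 84.6 * spw


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate US school grade; lower means easier."""
    wps, spw = _averages(words, sentences, syllables)
    return 0.39 * wps + 11.8 * spw - 15.59


# Hypothetical counts for an original passage and a simplified version of it.
original = (100, 4, 150)     # words, sentences, syllables
simplified = (100, 8, 130)

print(flesch_reading_ease(*original), flesch_reading_ease(*simplified))
print(flesch_kincaid_grade(*original), flesch_kincaid_grade(*simplified))
```

Shorter sentences and fewer syllables per word raise the Reading Ease score and lower the grade level, which is the sense in which a simplified version has higher readability.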

In
A Textbook of Translation (Prentice Hall, 1995),
Peter Newmark identifies eight types of translation approaches, namely (from the
most source-oriented to the most target-oriented): word-for-word translation,
literal translation, faithful translation, semantic translation, communicative
translation, idiomatic translation, free translation, and adaptation. In the
common view, though, and despite the pervasiveness of the other
possibilities, translation is usually reduced to the notion of "fidelity"
(or faithfulness), i.e., any translated version of a text is expected to be a
replica (of both the content and the form) of the original in another language.
This transfer process, however, is "all too human", as Nietzsche said, to be
replicated by currently existing technology, which cannot yet deal
with many linguistic phenomena, such as vagueness, ambiguity, metaphor,
ellipsis, implicature and so on. This does not mean that automatic natural language
processing, and therefore machine translation, is impracticable; it
only means that it cannot yet be done entirely without humans, or
in the same way humans do it. The results, in any case, are likely to differ
from those produced by humans. In order to avoid false expectations and
unrealizable hopes, we have decided to coin a new term:
"UNL-plication", from "UNL"
+ "plicare" ("to fold"), to designate the process of mapping a text into UNL ("UNL-ization"),
reorganizing the internal structure of the resulting graph ("normalization"), and
mapping the UNL graph back into a natural language structure ("NL-ization"). As
this allows for generating several different versions (summarized, simplified,
rephrased) of the same graph, each in several different languages, we
believe the process can properly be described as a multiplication of the source text by
means of UNL.
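The "multiplication" can be sketched as a data flow in which UNL-ization and normalization run once per source text, while NL-ization runs once per (version, language) pair. In this Python sketch, `unl_ize`, `normalize` and `nl_ize` are hypothetical, caller-supplied stand-ins for the three subprocesses, and the language codes in the example are arbitrary; it shows only how the outputs fan out, not how any step works.

```python
from itertools import product

# The three realizations named in the text.
VERSIONS = ("summarized", "simplified", "rephrased")


def unl_ply(source_text, languages, unl_ize, normalize, nl_ize):
    """Sketch of UNL-plication as data flow.

    unl_ize(text) -> U-print, normalize(U-print) -> U-text and
    nl_ize(U-text, version, language) -> text are caller-supplied stand-ins.
    """
    u_print = unl_ize(source_text)   # UNL-ization: done once per source text
    u_text = normalize(u_print)      # normalization: done once per U-print
    # NL-ization: one output per (version, language) pair
    return {
        (version, lang): nl_ize(u_text, version, lang)
        for version, lang in product(VERSIONS, languages)
    }


# Example with trivial stand-ins, just to show the fan-out.
outputs = unl_ply(
    "Once upon a time...",
    ["en", "fr", "pt"],
    unl_ize=lambda text: ("U-print", text),
    normalize=lambda u_print: ("U-text", u_print[1]),
    nl_ize=lambda u_text, version, lang: f"{version} version in {lang}",
)
print(len(outputs))  # 9, i.e. 3 versions x 3 languages
```

Adding one more language to the list adds three outputs without repeating the UNL-ization or normalization steps.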


The Universal Networking Language (UNL) has been, since 1996, a unique initiative to reduce
language barriers and strengthen cross-cultural communication in the framework
of the United Nations. It is a knowledge representation
language that has been used for several different tasks in natural language
engineering, such as machine translation, multilingual document generation,
summarization, information retrieval and semantic reasoning. It has three main features:
it is language-independent, it is concept-driven and it is hypergraph-shaped.
[read more in the UNL Wiki]
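To picture what "concept-driven" and "hypergraph-shaped" mean, the Python sketch below models a graph as a set of labelled binary relations between concepts, where a node may itself be a nested subgraph (a scope, or hypernode). This is a simplification of our own: actual UNL documents use universal words with restrictions and attributes, written in their own textual syntax. The relation labels `agt` (agent) and `obj` (object) are UNL relations; the example sentence is hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Scope:
    """A hypernode: a node that is itself a subgraph (a set of relations)."""
    relations: frozenset


# A node is either a concept label or a nested subgraph.
Node = Union[str, Scope]


@dataclass(frozen=True)
class Relation:
    label: str    # semantic relation, e.g. "agt" (agent) or "obj" (object)
    source: Node
    target: Node


def concepts(relations: frozenset) -> set:
    """Collect every concept label, descending into nested scopes."""
    found = set()
    for rel in relations:
        for node in (rel.source, rel.target):
            if isinstance(node, Scope):
                found |= concepts(node.relations)
            else:
                found.add(node)
    return found


# "Mary said that John ate an apple": the object of "say" is a whole subgraph.
john_ate_apple = Scope(frozenset({
    Relation("agt", "eat", "John"),
    Relation("obj", "eat", "apple"),
}))
graph = frozenset({
    Relation("agt", "say", "Mary"),
    Relation("obj", "say", john_ate_apple),
})
print(sorted(concepts(graph)))  # ['John', 'Mary', 'apple', 'eat', 'say']
```

Nothing in the graph depends on the words of a particular language: the same relations could be generated as English, French or any other language, which is what makes the representation language-independent.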

The UNL was originally proposed by the Institute of Advanced Studies of the United Nations
University, in Tokyo, and is currently promoted by the UNDL Foundation. The UNDL Foundation is a non-profit organization based in
Geneva, Switzerland, which has received from the United Nations the mandate
to implement the Universal Networking Language (UNL).
[read more on the UNDL Foundation website]