Normalization

From UNL Wiki
Revision as of 16:00, 16 July 2014 by Martins (Talk | contribs)

Normalization is the process of rewriting the input document into a standard form so that it can be processed more reliably. It is carried out by N-rules and includes:

  • replacing abbreviations by their corresponding extended forms
  • replacing short forms by their corresponding long forms
  • replacing periphrases by their corresponding direct forms
  • replacing contractions by their components
  • defining processing units

Replacement

Replacement is carried out by N-rules written as follows:

({SHEAD|" "})("don’t")({STAIL|" "}):=()("do not")();
({SHEAD|" "})("art. ")({STAIL|" "}):=()("article")();
({SHEAD|" "})("aux")({STAIL|" "}):=()("à les")();

Where:

  • SHEAD = beginning of the sentence
  • STAIL = end of the sentence
  • ({SHEAD|" "}) indicates left context (i.e., either SHEAD or blank space)
  • ({STAIL|" "}) indicates right context (i.e., either STAIL or blank space)
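
The effect of these rules can be approximated outside the UNL tools with a short Python sketch. This is an illustration only, not the N-rule engine itself: the names RULES and normalize are hypothetical, the empty parentheses on the right-hand side of each rule are assumed to leave the context unchanged, and the trailing space in "art. " is dropped because the right-context check already accounts for the following space.

```python
import re

# Hypothetical rule table mirroring the three N-rules above: (target, replacement).
# The trailing space of the original "art. " is dropped in this sketch because
# the lookahead below already checks for the following space.
RULES = [
    ("don’t", "do not"),
    ("art.", "article"),
    ("aux", "à les"),
]


def normalize(sentence, rules=RULES):
    """Apply each replacement only where the target is delimited by a blank
    space or by the sentence boundary on both sides."""
    for target, replacement in rules:
        # (?<![^ ]) : left context is the start of the sentence or a space
        # (?![^ ])  : right context is the end of the sentence or a space
        pattern = r"(?<![^ ])" + re.escape(target) + r"(?![^ ])"
        # A function is used as the replacement so that the replacement
        # string is inserted literally (Assumption: contexts stay unchanged).
        sentence = re.sub(pattern, lambda m, r=replacement: r, sentence)
    return sentence


if __name__ == "__main__":
    print(normalize("I don’t know"))
    print(normalize("parler aux enfants"))
```

Because both contexts are checked, a target embedded in a longer word (such as "aux" inside "faux") is left alone, which is the purpose of the left and right context in the rules above.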