Appendix 1: General flowchart of Tag Assignment Program


  1. If the word ends in "s apostrophe" then strip the apostrophe; if the word ends in "apostrophe s" then strip both characters (and any preceding full-stop).
  2. "Non-words" are the following:
    a letter followed by zero or more digits (0 to 9), possibly followed by a single,
    double, or triple prime, tagged ZZ
    a number* followed by "st", "nd", "rd" or "th", tagged OD
    a number followed by "s" tagged CDS
    a number containing "-", tagged CD-CD
    a number followed by "apostrophe s", tagged CD$
    a number followed (possibly) by a letter, tagged CD
    a word containing a superscript or subscript, tagged &FO
    a word containing letters and digits, but no hyphen, tagged &FO

    *In this context, a "number" means a sequence of digits (0-9) perhaps also including ".", "," and "/".
  3. The "standard" prefixes include "a-", "co-", "counter-", "de-", "hyper-", "mis-", "out-", "over-", "re-", "retro-", "super-", and "trans-".
  4. Words ending in "ches", "shes", "sses", "zzes", "oes", "xes" have the "es" removed; words with or more letters and ending in "ies" have the "ies" changed to "y"; words ending in "full-stop s" have both characters removed; other words ending in "s" (unless they end in "ss") have it removed.
  5. Tags that take -s are VB (becoming VBZ) and CD, NN, NNP, NNU, NP, NPL, NPT, NR (becoming CDS, NNS, NNPS, NNUS, NPS, NPLS, NPTS, NRS).
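
The notes above can be sketched in Python as follows. This is an illustrative reading of notes 1, 2, 4 and 5, not the Tag Assignment Program itself. All function and variable names (strip_apostrophe, classify_nonword, strip_s, add_s_tags, NUM, NONWORD_RULES, S_TAGS) are hypothetical. Further assumptions: a "number" begins with a digit, primes are typed as apostrophes, the non-word patterns are tried in the order shown, and the superscript/subscript rule is omitted because the notes do not say how such characters are encoded. The minimum word length for the "ies" rule is not legible in note 4, so it is left as a required parameter (min_ies_len) rather than guessed.

```python
import re

NUM = r"[0-9][0-9.,/]*"  # assumption: a "number" starts with a digit

# Note 2: ordered (pattern, tag) pairs; more specific patterns come first.
NONWORD_RULES = [
    (re.compile(r"[A-Za-z][0-9]*'{0,3}"), "ZZ"),  # primes assumed typed as '
    (re.compile(NUM + r"(?:st|nd|rd|th)"), "OD"),
    (re.compile(NUM + r"s"), "CDS"),
    (re.compile(NUM + r"(?:-" + NUM + r")+"), "CD-CD"),
    (re.compile(NUM + r"'s"), "CD$"),
    (re.compile(NUM + r"[A-Za-z]?"), "CD"),
]

# Note 5: tags that take -s, mapped to their -s forms.
S_TAGS = {
    "VB": "VBZ", "CD": "CDS", "NN": "NNS", "NNP": "NNPS", "NNU": "NNUS",
    "NP": "NPS", "NPL": "NPLS", "NPT": "NPTS", "NR": "NRS",
}


def strip_apostrophe(word):
    """Note 1: reduce a possessive form to its stem."""
    if word.endswith("s'"):
        return word[:-1]               # strip only the apostrophe
    if word.endswith("'s"):
        stem = word[:-2]               # strip both characters
        return stem[:-1] if stem.endswith(".") else stem
    return word


def classify_nonword(word):
    """Note 2: return the tag for a "non-word", or None if none applies."""
    for pattern, tag in NONWORD_RULES:
        if pattern.fullmatch(word):
            return tag
    has_letter = any(c.isalpha() for c in word)
    has_digit = any(c.isdigit() for c in word)
    if has_letter and has_digit and "-" not in word:
        return "&FO"                   # letters and digits, no hyphen
    return None


def strip_s(word, min_ies_len):
    """Note 4: return the stem of an -s form, or None if no rule applies.

    min_ies_len is the minimum length for the "ies" rule; the source does
    not give a legible value, so the caller must supply one.
    """
    if word.endswith(("ches", "shes", "sses", "zzes", "oes", "xes")):
        return word[:-2]
    if word.endswith("ies") and len(word) >= min_ies_len:
        return word[:-3] + "y"
    if word.endswith(".s"):
        return word[:-2]
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return None


def add_s_tags(stem_tags):
    """Note 5: map the stem's tags to -s tags, dropping tags that take no -s."""
    return [S_TAGS[t] for t in stem_tags if t in S_TAGS]
```

For instance, classify_nonword("1939-45") gives "CD-CD" and classify_nonword("1960's") gives "CD$". The order of the rules matters: "1960s" must be tested against the CDS pattern before the more general CD pattern, which would otherwise accept the trailing "s" as its optional letter.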