some 'working definitions' (mind the ambiguity :-)...
head-word      first word in a dictionary/lexicon entry
lexeme         symbol sequence (string literal) of the head-word
               in a dictionary/lexicon entry
lemma          set of all forms subsumed under a dictionary/lexicon entry
lemmatization  mapping of a word-form to the lexeme of the lemma
               the word-form belongs to
lexicon        in computational linguistics: computer-readable form
               of a ->dictionary (i.e. a linguistic resource)
               in real life: a list of concepts in the world with
               explanations ordered by their names (i.e. a resource
               of world knowledge)
dictionary     in computational linguistics: human-readable form of
               a set of lemmata with annotations ordered alphabetically
               or phonetically by lexemes
word-form      sequence of characters that belongs to the language under
               consideration (warning: in formal language theory, this
               is called 'word')
Canonization (a generalization of lemmatization sometimes used in
information retrieval, IR) means mapping a string to a representative of
the class the string belongs to (e.g. according to phonetic similarity, as
in Russell & Odell's SOUNDEX algorithm).
Usage in linguistics itself is highly problematic: few people use internally
consistent terms, and there is little consensus across sub-communities,
which is why most authors begin by defining their own usage in the initial
chapters of their works.
> On Wed, 3 Nov 1999, Przemyslaw Kaszubski wrote:
>
[...]
> > Can anyone enlighten me definitively (or refer me to a source) on the
> > distinction between lemma and lexeme?
--
Jochen Leidner, M.A. <jochen.leidner@sap.com>
Software Engineer, Knowledge Warehouse <http://www.sap.com/>
SAP AG, Walldorf, Germany
phone +49 (6227) 7-63773, fax +49 6227 7-73773
All views expressed are my own.