Re: [Corpora-List] token clustering tool

From: Maarten Jansonius (jansonius@lige.ucl.ac.be)
Date: Mon May 24 2004 - 10:00:06 MET DST


    At 10:19 11-5-2004, you wrote:
    >At 09:24 11/05/2004, Murk Wuite wrote:
    >>Dear all,
    >>
    >>Does anyone know of a tool (or algorithm), preferably available freely
    >>for research purposes, that takes as its input a corpus only and
    >>produces as its output clusters of tokens that occur close to each other
    >>relatively often?
    >
    >It is possible that the document clustering toolkit CLUTO fits your
    >needs, perhaps with some adaptation.
    >http://www-users.cs.umn.edu/~karypis/cluto/

    WordSmith Tools (not free) has a Cluster function which takes a corpus and
    outputs word clusters based on co-occurrence statistics.
    http://www.lexically.net/wordsmith/
    Version 4, while still in beta, can be used freely for about a month.
    WordSmith can also be used with annotated corpora (it can either ignore or use the tags).
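
    As a rough illustration of what a cluster list looks like, here is a minimal
    Python sketch. It assumes that a "cluster" means a contiguous word sequence
    (an n-gram) that recurs in the corpus, ranked by raw frequency. The names
    `tokenize` and `word_clusters` and the `size`/`min_freq` parameters are
    hypothetical; this is not WordSmith's or AntConc's actual implementation,
    and it uses no association measure beyond simple counts.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase the text and keep runs of letters, digits
    # and apostrophes as tokens; real tools offer far richer options.
    return re.findall(r"[a-z0-9']+", text.lower())


def word_clusters(tokens, size=3, min_freq=2):
    """Count contiguous word sequences of length `size` and return those
    seen at least `min_freq` times, most frequent first."""
    counts = Counter(
        tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1)
    )
    return [
        (" ".join(gram), n)
        for gram, n in counts.most_common()
        if n >= min_freq
    ]


if __name__ == "__main__":
    corpus = (
        "At the end of the day we met in the middle of the town. "
        "At the end of the day the town was quiet."
    )
    for cluster, freq in word_clusters(tokenize(corpus), size=3, min_freq=2):
        print(freq, cluster)
```

    On the toy corpus, only the three-word sequences that recur ("at the end",
    "the end of", "end of the", "of the day") pass the frequency threshold.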

    The freeware AntConc program has a similar function for outputting word
    clusters.
    http://www.f.waseda.jp/anthony/

    And here's a further list of links to some similar programs:
    http://www.lboro.ac.uk/research/mmethods/research/software/stats.html

    Hope this helps,
    Maarten Jansonius

    _______________________________
    Maarten Jansonius
    FLTR / GERM / LIGE
    Université catholique de Louvain

    Collège Erasme, C468
    010 / 47.49.73
    _______________________________


