Re: [Corpora-List] token clustering tool

From: Maarten Jansonius (jansonius@lige.ucl.ac.be)
Date: Mon May 24 2004 - 10:00:06 MET DST

Next message: Julia B. Hirschberg: "(no subject)"

Previous message: Deepa Gupta: "[Corpora-List] looking for a postdoctoral position in NLP"
In reply to: Jose Maria Gomez Hidalgo: "Re: [Corpora-List] token clustering tool"

At 10:19 11-5-2004, you wrote:
>At 09:24 11/05/2004, Murk Wuite wrote:
>>Dear all,
>>
>>Does anyone know of a tool (or algorithm), preferably available freely
>>for research purposes, that takes as its input a corpus only and
>>produces as its output clusters of tokens that occur close to each other
>>relatively often?
>
>It is possible that the document clustering toolkit CLUTO fits your
>needs, perhaps with some adaptation.
>http://www-users.cs.umn.edu/~karypis/cluto/

WordSmith Tools (not free) has a Cluster function which takes a corpus and
outputs word clusters based on co-occurrence statistics.
http://www.lexically.net/wordsmith/
Version 4, while still in beta, can be used freely for about a month.
WordSmith can also be used with annotated corpora (it can ignore or use tags).

The freeware AntConc program has a similar function for outputting word
clusters.
http://www.f.waseda.jp/anthony/
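
For anyone who would rather script something than use a GUI tool, here is a
minimal Python sketch of the general idea: count contiguous word sequences
that recur in a corpus. It is only an illustration and not the algorithm
WordSmith or AntConc actually uses; the function name word_clusters, its
parameters size and min_freq, and the sample sentence are all my own
assumptions.

```python
import re
from collections import Counter

def word_clusters(text, size=2, min_freq=2):
    """Return (cluster, count) pairs for contiguous runs of `size` words
    that occur at least `min_freq` times. Hypothetical helper, for
    illustration only."""
    # Naive tokenisation: lowercase, keep runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    # Slide a window of `size` tokens over the corpus.
    windows = zip(*(tokens[i:] for i in range(size)))
    counts = Counter(windows)
    return [(" ".join(words), n)
            for words, n in counts.most_common()
            if n >= min_freq]

sample = "the cat sat on the mat and the cat sat on the sofa"
clusters = word_clusters(sample, size=3)
for cluster, n in clusters:
    print(n, cluster)
```

Raw frequency is the crudest possible measure; a real tool would probably
rank candidates with an association score and handle tags and punctuation
more carefully.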

And here's a further list of links to some similar programs:
http://www.lboro.ac.uk/research/mmethods/research/software/stats.html

Hope this helps,
Maarten Jansonius

_______________________________
Maarten Jansonius
FLTR / GERM / LIGE
Université catholique de Louvain

Collège Erasme, C468
010 / 47.49.73
_______________________________

