Corpora: Survey of the answers about IR info and tools

From: Patrick Ruch (ruch@dim.hcuge.ch)
Date: Fri Oct 27 2000 - 17:34:42 MET DST

    Some days ago, I asked (on several lists) about tools and information on
    vector distances and indexing strategies. My question was very general,
    but my main target was IR applications. I was expecting answers about
    packages for computing any kind of feature distance (vector, Boolean,
    Euclidean, Levenshtein...). I should have said that our system implements
    its own indexing strategy.
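
    To make the kinds of distances I had in mind concrete, here is a minimal
    Python sketch. It is only an illustration and is not taken from any of
    the packages listed below. The function names are my own, and two
    readings are assumptions: "vector" distance is taken as cosine distance,
    and "Boolean" distance as Hamming distance over Boolean feature vectors.

```python
import math


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity (assumed reading of 'vector' distance)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (norm_a * norm_b)


def hamming(a, b):
    """Number of positions where two Boolean vectors differ
    (assumed reading of 'Boolean' distance)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(1 for x, y in zip(a, b) if bool(x) != bool(y))


def levenshtein(s, t):
    """Edit distance between two strings, with unit cost for each
    insertion, deletion and substitution."""
    # prev[j] is the distance between the processed prefix of s and t[:j].
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]
```

    For example, euclidean([0, 0], [3, 4]) gives 5.0 and
    levenshtein("kitten", "sitting") gives 3.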

    I would like to thank:
    Romaric Besancon, Eric Gaussier, Paul Holmes-Higgin,
    Andrew MacFarlane, Ian Soboroff, Richard Boulton,
    Jian-Yun Nie, and Christian Boitet.

    Here is a survey of the available tools:

    Andrew McCallum's Bag Of Words library:
    Open source, seems complete.
    http://www.cs.cmu.edu/~mccallum/bow

    SMART: it is a very complete IR system (indexing, retrieval,
    stop words for English and Spanish...),
    totally open source.
    (ftp.cs.cornell.edu/pub/smart/).

    Muscat:
    http://open.muscat.com/
    The indexing portion of Muscat is still closed-source.

    I have started to install SMART.
    Thanks again,
    Patrick

    __________________________________
    Patrick Ruch
    HUG - Medical Informatics Division
    CH-1211 Geneva 14
    tel.: (+41 22) 372 61 64
    fax: (+41 22) 372 48 55
    email: Patrick.Ruch@dim.hcuge.ch


