Re: [Corpora-List] On tools for indexing and searching large corpora

From: Arne Fitschen (fitschen@ims.uni-stuttgart.de)
Date: Tue Nov 19 2002 - 14:04:56 MET

    mdavies@ilstu.edu wrote:
    >
    > This is a question that I've asked myself many times. I would love to see a
    > book that discussed the approach taken by the BNC, the BoE, CREA, corpora based
    > on the IMS Corpus Workbench (such as O Público), etc., to "look under the hood"
    > and see how each of these corpora and indexing schemes is organized. As you
    > mentioned, as more and more people start creating 100+ million word corpora, it
    > would be a shame if they all ended up having to re-invent the wheel.

    I don't know of such a book, but for the IMS Corpus Workbench I believe
    that some of the ideas concerning data storage and indexing schemes were
    taken from this book:

    Ian H. Witten, Alistair Moffat, and Timothy C. Bell
    Managing Gigabytes
    Compressing and Indexing Documents and Images
    May 1999

    (here's a link to the second edition of the book:
    http://www.cs.mu.oz.au/mg/).

    Regards,
    Arne Fitschen


