Re: [Corpora-List] On tools for indexing and searching large corpora

From: Arne Fitschen (fitschen@ims.uni-stuttgart.de)
Date: Tue Nov 19 2002 - 14:04:56 MET

Next message: Michael Goetze: "[Corpora-List] Corpora with annotated information structure?"

Previous message: mdavies@ilstu.edu: "Re: [Corpora-List] On tools for indexing and searching large corpora"
In reply to: mdavies@ilstu.edu: "Re: [Corpora-List] On tools for indexing and searching large corpora"
Next in thread: Olonichev Sergei: "Re: [Corpora-List] On tools for indexing and searching large corpora"
Reply: Olonichev Sergei: "Re: [Corpora-List] On tools for indexing and searching large corpora"

mdavies@ilstu.edu wrote:
>
> This is a question that I've asked myself many times. I would love to see a
> book that discussed the approach taken by the BNC, the BoE, CREA, corpora based
> on the IMS Corpus Workbench (such as O Público), etc to "look under the hood"
> and see how each of these corpora and indexing schemes is organized. As you
> mentioned, as more and more people start creating 100+ million word corpora, it
> would be a shame if they all ended up having to re-invent the wheel.

I don't know of such a book, but for the IMS Corpus Workbench I believe
that some of the ideas concerning data storage and indexing schemes were
taken from this book:

Ian H. Witten, Alistair Moffat, and Timothy C. Bell
Managing Gigabytes
Compressing and Indexing Documents and Images
May 1999

(here's a link to the second edition of the book:
http://www.cs.mu.oz.au/mg/).

Regards,
Arne Fitschen

This archive was generated by hypermail 2b29 : Tue Nov 19 2002 - 14:07:27 MET