In response to a posting from Lou Burnard, Wed, 20 Jun 2001:
> I am in the process of writing a brief guide to how this can be done,
I cannot refrain from making some remarks here. Sara is undoubtedly among the
most frequently used programs in this field (as the BNC plays an important role
in corpus linguistics).
Nonetheless, I doubt that using Sara to query other corpora is desirable. It is
well known that the software has been tailored to fit the structure of the BNC
(or vice versa, which does not really matter here). I am not sure that the
software is flexible enough to meet the requirements of many other corpora,
because corpus data per se is only loosely structured: consider the presence or
absence of POS tags and base forms, or of meta information such as titles of
sample texts, legal information, dates of publication, and lists of categories
the samples belong to.
And we must not forget that the Sara client, at least, has its weaknesses, such
as limits on the size of query results and being bound to a particular hardware
platform.
There are in fact a number of disadvantages and shortcomings of such proprietary
systems that I could address here (some of which I am going to address at a
conference soon), and they lead me to argue for a more flexible, more general
approach.
In the future, the user should be able to choose which corpus tool to use for
querying ANY corpus. Some programs are already available (though they have some
limitations, too). Others are being developed right now.
Regards
Thomas Künneth
---
Thomas Kuenneth M.A.
Universitaet Erlangen-Nuernberg
Institut fuer Germanistik
Abteilung Computerlinguistik
Bismarckstr. 6 * D-91054 Erlangen * Tel.: +49 9131 8529250
http://www.linguistik.uni-erlangen.de/~tommi