At 11:34 AM 3/23/00 GMT, you wrote:
>
>Hi. Can anyone help me with the following:
>
>I'm looking for software - preferably freeware or shareware - to
>use to download text from Web sites, for use in a corpus.
>Geoff Wilkins
By far the best spider (I have tested over a dozen commercial and
shareware programs) is HTTrack,
developed by Xavier Roche and Yann Philippot at CERN. The software is
freeware and is available for Unix, Linux, Solaris and Windows platforms. I
have archived sites up to 250 MB in size and over 40,000 files with no
difficulty at all. The spider is highly customizable, has extensive support
for JavaScript and can easily gather dynamic or database-driven (e.g. asp,
cfm) web sites.
The software and the documentation can be found at http://httrack.free.fr
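
Once a site has been mirrored to disk, the HTML still has to be reduced to
plain text before it is usable as a corpus. The following is only a minimal
sketch of that step, assuming the mirror is a directory of .htm/.html files
readable as UTF-8; it uses the Python standard library rather than anything
that ships with the spider, and the names TextExtractor, html_to_text and
mirror_to_corpus are made up for illustration.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0   # > 0 while inside <script> or <style>
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return "\n".join(self._chunks)


def html_to_text(html):
    """Return the visible text of one HTML document, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def mirror_to_corpus(mirror_dir, out_file):
    """Hypothetical helper: write the text of every *.htm* file under
    mirror_dir into out_file and return the number of files processed.
    Assumes UTF-8; undecodable bytes are replaced rather than raising."""
    count = 0
    with open(out_file, "w", encoding="utf-8") as out:
        for path in sorted(Path(mirror_dir).rglob("*.htm*")):
            html = path.read_text(encoding="utf-8", errors="replace")
            out.write(html_to_text(html) + "\n\n")
            count += 1
    return count
```

Calling mirror_to_corpus("mirror", "corpus.txt") would then walk a mirror
directory named "mirror" and produce a single text file. Older sites often
use other encodings (e.g. Latin-1), so the UTF-8 assumption may need
adjusting per site.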
Christian Coseru
This archive was generated by hypermail 2b29 : Mon Mar 27 2000 - 08:12:07 MET DST