If you do decide to do this yourself, you should consider
up-translating to SGML and then using our LT NSL suite of SGML tools:
http://www.ltg.ed.ac.uk/software/nsl/
We use it on the British National Corpus, which is the same size as
the corpus you have in mind.
ht
-----------
Henry S. Thompson, Human Communication Research Centre, University of Edinburgh
2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk
URL: http://www.cogsci.ed.ac.uk/~ht/