Przemyslaw Kaszubski wrote:
>
> Regards to all the subscribers,
>
> Two questions:
>
> 1. Can anyone explain (or point to a Web source or other easily
> available source, apart from Church, K.W., W. Gale, P. Hanks & D. Hindle,
> "Using Statistics in Lexical Analysis", in Lexical Acquisition: Using On-Line
> Resources to Build a Lexicon, ed. Uri Zernik. Hillsdale:
> Lawrence Erlbaum, 1991)
> the use of the t-score statistic in collocation retrieval? I mean the
> one used by Cobuild. How does the formula work? I am familiar with
> MI and Z-scores but the t-score seems to be
> in use only in the CobuildDirect service.
>
Try Jeremy Clear's explanation of the t-score (and of the MI, I think)
on the Cobuild site. The reference I gave for it in my bibliography is:
Clear, J 1995, 'COBUILD Bank of English explanation of stats'. Collins
COBUILD Collocation Concordancer
http://titania.cobuild.collins.co.uk/form.html
(accessed 24th April, 1999).
It's the clearest and most accessible explanation that I've found.
Church and Hanks also wrote:
Church, KW, and P Hanks 1990, 'Word association norms, mutual
information, and lexicography', Computational Linguistics vol 16, no 1
(March 1990), 22-29
You might also try:
Godby, J 1994(?), 'Two techniques for the identification of phrases in
full text'
http://www.oclc.org/oclc/research/publications/review94/part1/twotech.htm
(Accessed 15th July, 1998).
I don't remember much about it, but I think it was related.
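
In case it helps while you track those down, here is a rough sketch of the t-score as it is usually given in the collocation literature: observed co-occurrence count minus the count expected by chance, divided by the square root of the observed count. I am assuming this is essentially what Cobuild computes, but I have not checked it against their implementation. The function names, the `window` parameter and the frequency counts are all made up for illustration.

```python
import math


def expected_cooccurrence(f_node, f_collocate, corpus_size, window=1):
    """Co-occurrences expected by chance if the two words were independent.

    `window` (number of positions searched around the node) is an
    assumed adjustment; with window=1 it has no effect.
    """
    return f_node * f_collocate * window / corpus_size


def t_score(observed, expected):
    """t-score as commonly given: (O - E) / sqrt(O)."""
    return (observed - expected) / math.sqrt(observed)


def mutual_information(observed, expected):
    """MI as commonly given: log2(O / E)."""
    return math.log2(observed / expected)


# Invented counts for a 1,000,000-word corpus (not real data).
corpus_size = 1_000_000

# A rare pair: each word occurs 5 times, and they co-occur 4 times.
rare_expected = expected_cooccurrence(5, 5, corpus_size)
rare_mi = mutual_information(4, rare_expected)
rare_t = t_score(4, rare_expected)

# A frequent pair: 2000 and 3000 occurrences, 400 co-occurrences.
freq_expected = expected_cooccurrence(2000, 3000, corpus_size)
freq_mi = mutual_information(400, freq_expected)
freq_t = t_score(400, freq_expected)

print(f"rare pair:     MI = {rare_mi:.2f}, t = {rare_t:.2f}")
print(f"frequent pair: MI = {freq_mi:.2f}, t = {freq_t:.2f}")
```

With these invented counts the MI ranks the rare pair higher, while the t-score ranks the frequent pair higher, because dividing by the square root of the observed count keeps the score low when there is little evidence. That seems to be the property you are after.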
> 2. Do you know of corpus analysis
> packages available for researchers that employ this t-score?
I am attaching part of a posting by Oliver Mason from earlier this year.
I think the program uses seven(!) different scores for collocations, and it
was developed by the Cobuild lot, so I'm sure it would offer the t-score!
Oliver Mason wrote:
. . .I am pleased to announce the release of a corpus browser called
`Qwick', which is now available for download from our website at
http://www.clg.bham.ac.uk/QWICK/index.html.
Qwick allows you to
construct a working corpus from a set of corpora available on the
computer, retrieve concordance lines from this using a simple but
powerful query language, and to compute collocations with a variety of
adjustable parameter settings.
Qwick is implemented in Java and thus is fully platform independent; it
has been extensively tested on Windows and Solaris. . .
>
> I do small corpus research and I am basically after a tool with a statistic that does not favour rare words as much as the MI does. So far TACT's z-scores seem the best option.
>
> Przemek Kaszubski
> ==========================================
> Przemyslaw Kaszubski, M.A.
> przemka@amu.edu.pl
> http://elex.amu.edu.pl/ifa/staff/kaszubski.html
> MY (ENGLISH) (LEARNER) CORPORA PAGE: http://main.amu.edu.pl/~przemka
> School of English, Adam Mickiewicz University
> Al. Niepodleglosci 4, 61-874 Poznan, POLAND
> tel: +48 61 8528820 fax: +48 61 8523103
> ==========================================
--
Gordon Cain
Teacher of ESOL
TAFE International Education Centre
Liverpool (Sydney), Australia
gpcain@rivernet.com.au