Dear all,
Size, i.e. the number of words, is obviously not the only factor when
compiling a corpus for special investigations. Far more important, it
seems, is to get at least 400 cases of whatever you are looking for.
It can be shown that even in the worst case, a balanced distribution
of a variable with two values [e.g. ASPECT:
progressive/non-progressive --> 50%/50%], the results will be
significant at the alpha=0.05 level (n = (4*p*(1-p))/alpha^2, which
gives n = 400 for p = 0.5). I
wonder if anyone has done some work on this and can comment on the
number of cases needed if the variable has more than two values
(e.g. SUBJECT: 1PSG, 2PSG, etc.).
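
For anyone who wants to check the arithmetic for the two-valued case,
here is a minimal sketch in Python. The function name required_cases
and its parameter names are made up for illustration. It assumes that
the divisor in the formula can be read as a margin of error of 0.05,
and that the factor 4 stands for z^2 with z rounded from 1.96 to 2
(roughly 95% confidence); that reading is an assumption, not something
stated above.

```python
import math


def required_cases(p: float, margin: float = 0.05) -> int:
    """Return n = 4*p*(1-p) / margin^2, rounded up to whole cases.

    Assumption: `margin` plays the role of alpha in the formula quoted
    above, and the factor 4 is z^2 with z rounded from 1.96 to 2.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    if margin <= 0.0:
        raise ValueError("margin must be positive")
    n = 4.0 * p * (1.0 - p) / margin ** 2
    # Round before taking the ceiling so that floating-point noise
    # (e.g. 399.9999999999999) does not shift the result.
    return math.ceil(round(n, 9))


if __name__ == "__main__":
    # The balanced 50%/50% split is the worst case; skewed splits need fewer.
    for p in (0.5, 0.3, 0.1):
        print(f"p = {p:.1f}: n = {required_cases(p)}")
```

With p = 0.5 this reproduces the 400 cases mentioned above. Since
p*(1-p) is largest at p = 0.5, any less balanced two-way split needs
fewer cases under the same assumptions.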
Best, Norbert
------------------------
Norbert Schlüter
English Language Pedagogy
Freie Universität Berlin
nosch@zedat.fu-berlin.de