Cheers
Tony
In the last mail Marc Weeber said:
>
> Hello corpora people,
>
> At the moment, I'm trying to isolate certain areas in a corpus to
> extract area-specific keywords. The corpus consists of abstracts of
> medical articles concerning one drug. I'm interested in extracting
> the side effects of this drug. I have located the areas concerning
> side effects, and I want to compare these areas with the rest of the
> corpus. The method I'm using is the keyword program of the WordSmith
> Tools package. This program compares the frequencies of words between the
> subset and the complete corpus. Words that are significantly more
> frequent in the subset than in the complete set (tested with
> chi-square) are called `keywords' of the subset.
>
> Now I have two questions:
>
> 1. What exactly should I use as the reference corpus: the complete
> corpus of abstracts, or the complete corpus minus the subset? In the
> former case, words that occur in the subset are counted twice (in the
> subset and in the reference corpus). The results will be more
> conservative than in the latter case. However, I don't know which
> method to use, which leads to the second question:
> 2. Can someone give me more background on the use of keywords as a
> means of comparison between two sets (actually, lists of words)?
> Comments and references to books, articles, URLs, etc. would be much
> appreciated.
>
> thanks in advance,
>
> Marc Weeber
> marc@farm.rug.nl
>
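
To make the two reference-corpus options in question 1 concrete, here is a minimal sketch of a per-word chi-square comparison. It assumes a plain Pearson chi-square on a 2x2 table without continuity correction; whether WordSmith Tools computes its statistic in exactly this way is not stated in the message above, so treat this as an illustration only. The function name and all counts are invented for the example.

```python
def chi_square_2x2(word_in_study, study_size, word_in_ref, ref_size):
    """Pearson chi-square (no continuity correction) for one word.

    The 2x2 table is: word vs. all other tokens, study set vs. reference set.
    """
    a = word_in_study              # word tokens in the study set
    b = word_in_ref                # word tokens in the reference set
    c = study_size - a             # other tokens in the study set
    d = ref_size - b               # other tokens in the reference set
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:                 # degenerate table, no evidence either way
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


# Invented toy counts: the subset is part of the whole corpus.
subset_freq, subset_size = 30, 1000
whole_freq, whole_size = 50, 10000

# Option A: reference = complete corpus (subset tokens counted twice).
chi_whole = chi_square_2x2(subset_freq, subset_size, whole_freq, whole_size)

# Option B: reference = complete corpus minus the subset.
chi_rest = chi_square_2x2(
    subset_freq, subset_size,
    whole_freq - subset_freq, whole_size - subset_size,
)

print(f"reference = whole corpus:        chi-square = {chi_whole:.2f}")
print(f"reference = corpus minus subset: chi-square = {chi_rest:.2f}")
```

With these toy counts, option B gives the larger statistic, which fits the remark in the quoted message that using the complete corpus as reference is the more conservative choice.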
--
---------------------------------------------------
Tony Berber Sardinha     | tony1@liverpool.ac.uk
AELSU                    | Fax 44-51-794-2739
University of Liverpool  |
PO Box 147               | http://www.liv.ac.uk/
Liverpool L69 3BX        | ~tony1/homepage.html
UK                       |
---------------------------------------------------
My karma ran over my dogma ...... `' -o-o-
Everything should be as simple as possible but no simpler. (A Einstein)