Corpora: Re: co-occurrences

From: Bengt Dahlqvist (bengt.dahlqvist@ling.uu.se)
Date: Fri Apr 28 2000 - 10:01:11 MET DST

Next message: Lionel Clement: "Corpora: TAG+5 - Call for participation"

Previous message: Thorsten Brants: "Corpora: LINC-2000 Extended Deadline"
In reply to: Victoria Powers: "(no subject)"
Next in thread: leonel@lingapli.ciges.inf.cu: "Corpora: Pre-Seminars of the Symposium"
Next in thread: Victoria Powers: "(no subject)"
Reply: leonel@lingapli.ciges.inf.cu: "Corpora: Pre-Seminars of the Symposium"

At 23:34 2000-04-17 -0700, Victoria Powers wrote:
>I sent out an earlier Email about this issue and I haven't found what I
>needed so I thought if I explained what I was looking for better someone
>might have seen a program that would work. I am looking for something that
>will compute co-occurrences. I will be integrating this
>with Perl code on a unix box so I need something that will just output a
>text file of co-occurrences when I run the program on some corpus.

Briefly, try something like this:
A Korn shell script:
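   #!/bin/ksh
   # find.sh: count the words that follow the keyword given as $1
   # Join lines, split into clauses at punctuation, extract pairs, count them.
   tr '\n' ' ' < text_in | tr '.,:;?!' '\n' | ./co.pl "$1" | sort | uniq -c > list_out
   exit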
A Perl script:
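   #!/usr/bin/perl
   # co.pl: print "keyword next-word" for each occurrence of the keyword
   use strict;
   use warnings;
   my $keyword = $ARGV[0];
   while (<STDIN>) {
     chomp;
     # \Q...\E quotes any regex metacharacters in the keyword;
     # (?<!\S) also accepts a keyword at the start of a line.
     while (m/(?<!\S)\Q$keyword\E\s+(\S+)/g) {
        print "$keyword $1\n"; } }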
Then just invoke the script stating the desired keyword:
   ./find.sh for
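As a rough illustration of what ends up in list_out, the sketch below runs the same pipeline on a few lines of made-up text. To keep it self-contained, a small awk function (pairs, a hypothetical stand-in for co.pl, not the Perl script itself) prints the keyword followed by the next whitespace-separated token. The sample text and the function names pairs and cooc are assumptions for this example only.

```shell
#!/bin/sh
# Sketch only: pairs() is a hypothetical awk stand-in for co.pl.
pairs() {
  # Print "keyword next-token" for every field equal to the keyword.
  awk -v kw="$1" '{ for (i = 1; i < NF; i++) if ($i == kw) print kw, $(i + 1) }'
}

cooc() {
  # Join lines, split into clauses at punctuation, extract pairs, count them.
  tr '\n' ' ' | tr '.,:;?!' '\n' | pairs "$1" | sort | uniq -c
}

# Made-up sample text; the keyword occurs across a line break as well.
sample='We wait for the bus. They wait
for the train, but not for long.'

printf '%s\n' "$sample" | cooc for
```

This should list "for long" once and "for the" twice, each preceded by its count. The exact padding in front of the counts depends on the local uniq.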
Beware that one might want other clause/sentence delimiters and
maybe also a way to handle words within quotes and parentheses.
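One way to act on that caveat is to add more characters to the delimiter set. A minimal sketch, assuming that double quotes and parentheses should also end a clause (the function name split_clauses and the chosen character set are assumptions, not part of the scripts above):

```shell
#!/bin/sh
# Sketch only: split_clauses and its delimiter set are illustrative assumptions.
split_clauses() {
  # Join lines, then break at punctuation, double quotes, and parentheses.
  # The second tr relies on the shorter second set being padded with its
  # last character, which GNU tr does.
  tr '\n' ' ' | tr '.,:;?!"()' '\n'
}

printf '%s\n' 'He asked (for help) and "for time".' | split_clauses
```

Each quoted or parenthesized stretch then starts on its own line, so the keyword can be the very first token of a line. A matcher that requires whitespace before the keyword would miss those occurrences.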

Bengt Dahlqvist, Ph.D.
Uppsala University


This archive was generated by hypermail 2b29 : Fri Apr 28 2000 - 09:02:32 MET DST