Corpora: Re: co-occurrences

From: Bengt Dahlqvist (bengt.dahlqvist@ling.uu.se)
Date: Fri Apr 28 2000 - 10:01:11 MET DST

    At 23:34 2000-04-17 -0700, Victoria Powers wrote:
    >I sent out an earlier Email about this issue and I haven't found what I
    >needed so I thought if I explained what I was looking for better someone
    >might have seen a program that would work. I am looking for something that
    >will compute co-occurrences. I will be integrating this
    >with Perl code on a unix box so I need something that will just output a
    >text file of co-occurrences when I run the program on some corpus.

    Briefly, try something like this:
    A Korn shell script:
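       #!/bin/ksh
       # find.sh: count the words that directly follow a keyword in text_in.
       # Join all lines, split into clauses at punctuation, extract the
       # keyword/next-word pairs with co.pl, then count the distinct pairs.
       # '[\n*]' repeats the newline so that every delimiter maps to one.
       tr '\n' ' ' < text_in | tr '.,:;?!' '[\n*]' |
         ./co.pl "$1" | sort | uniq -c > list_out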
    A Perl script:
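       #!/usr/bin/perl
       # co.pl: print "keyword next-word" for each occurrence read from STDIN.
       $keyword = $ARGV[0];
       while (<STDIN>) {
         chomp;
         # (?<!\S) also accepts the keyword at the start of a clause; the
         # lookahead keeps the next word available for an overlapping match.
         while (m/(?<!\S)\Q$keyword\E\s+(?=(\S+))/g) {
            print "$keyword $1\n"; } }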
    Then just invoke the script stating the desired keyword:
       ./find.sh for
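    To illustrate roughly what ends up in list_out, here is a small
    self-contained sketch of the same pipeline. The sample sentence and the
    function name next_words are made up for this example, and an awk
    one-liner stands in for co.pl so that the snippet runs without the two
    script files; treat it as an approximation, not as the scripts themselves.

```shell
#!/bin/sh
# Hypothetical sample text (not from the original post).
sample='We wait for the bus. They wait for a train, and for the bus!'

# next_words KEYWORD: read text on stdin and print "KEYWORD next-word"
# for each occurrence of KEYWORD that is followed by a word in the same clause.
# Assumption: awk replaces co.pl here, matching KEYWORD as a whole token.
next_words() {
    tr '\n' ' ' | tr '.,:;?!' '[\n*]' |
    awk -v kw="$1" '{
        for (i = 1; i < NF; i++)
            if ($i == kw) print kw, $(i + 1)
    }'
}

# Count the distinct pairs, as find.sh does.
printf '%s\n' "$sample" | next_words for | sort | uniq -c
```

    For this sample the counts come out as one "for a" and two "for the";
    the exact column padding of uniq -c varies between systems.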
    Beware that one might want other clause/sentence delimiters and
    maybe also a way to handle words within quotes and parentheses.
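    One possible way to act on that caveat, sketched under assumptions: keep
    the delimiter set in a variable and delete quotes and parentheses before
    splitting, so that "for (the bus)" still yields the pair "for the". The
    names DELIMS, STRIP and split_clauses are hypothetical, and whether
    bracketed or quoted words should be kept, dropped, or treated as clause
    boundaries depends on the corpus.

```shell
#!/bin/sh
# Hypothetical knobs: characters that end a clause, and characters to delete.
DELIMS='.,:;?!'
STRIP='"()'

# split_clauses: read text on stdin, write one clause per line.
# Assumption: quotes and parentheses are simply removed, not used as boundaries.
split_clauses() {
    tr '\n' ' ' | tr -d "$STRIP" | tr "$DELIMS" '[\n*]'
}

printf 'Wait for (the bus); ask for "a map".\n' | split_clauses
```

    The output of split_clauses could then be piped into co.pl in place of
    the two tr calls in find.sh.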

    Bengt Dahlqvist, Ph.D.
    Uppsala University


