Corpora: bigram statistics package (v0.1)

From: ted pedersen (tpederse@d.umn.edu)
Date: Thu Dec 07 2000 - 23:36:15 MET

    I'd like to announce the availability of the Bigram Statistics Package.
    This is an easy-to-use tool for counting and analyzing bigram frequencies
    in text. It is free software (written in Perl) that you can download from:

    http://www.d.umn.edu/~tpederse/code.html

    The following statistical tests are currently supported:

    Fisher's exact test, the likelihood ratio, Pearson's chi-squared test,
    the Dice coefficient, and mutual information.
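To make the listed measures concrete, here is a minimal Python sketch of how four of them (Dice, pointwise mutual information, Pearson's chi-squared, and the log-likelihood ratio) can be computed from bigram counts. This is not BSP's Perl code: the function names `bigram_table` and `association_scores` and the count names `n11`, `n1p`, `np1`, `npp` are assumptions made for illustration. Mutual information is taken here to mean the pointwise variant, which is also an assumption. Fisher's exact test is left out to keep the sketch short.

```python
import math
from collections import Counter


def bigram_table(tokens):
    """Count bigrams plus how often each word fills the first or second slot."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    first = Counter()
    second = Counter()
    for (w1, w2), n in bigrams.items():
        first[w1] += n
        second[w2] += n
    return bigrams, first, second


def association_scores(n11, n1p, np1, npp):
    """Score one observed bigram (n11 > 0) from its 2x2 contingency table.

    n11: count of the bigram (w1, w2)
    n1p: count of bigrams with w1 in the first position
    np1: count of bigrams with w2 in the second position
    npp: total number of bigrams
    """
    # Observed cells in the order n11, n12, n21, n22.
    observed = [n11, n1p - n11, np1 - n11, npp - n1p - np1 + n11]
    # Expected cell counts if w1 and w2 occurred independently.
    expected = [
        n1p * np1 / npp,
        n1p * (npp - np1) / npp,
        (npp - n1p) * np1 / npp,
        (npp - n1p) * (npp - np1) / npp,
    ]
    dice = 2 * n11 / (n1p + np1)
    pmi = math.log2(n11 * npp / (n1p * np1))
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    g2 = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
    return {"dice": dice, "pmi": pmi, "chi2": chi2, "g2": g2}
```

Calling `bigram_table` on a token list and passing the four counts for one bigram to `association_scores` gives one score per measure. Sorting the bigrams by any one score produces the kind of ranked list the package compares.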

    BSP also provides:

    1) A tool for comparing ranked lists of bigrams from two different
    tests. This allows you to measure the difference in the rankings
    obtained from test X and test Y for a given corpus.

    2) The ability to easily implement and incorporate your own tests into the
    package. The package is designed so you can do so with minimal knowledge
    of Perl and our underlying implementation.
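As an illustration of item 1, one common way to measure how much two ranked lists differ is Spearman's rank correlation. The announcement does not say which measure BSP's comparison tool uses, so treat the Python sketch below as an assumption rather than a description of the package. The names `ranks` and `spearman` are hypothetical, and each input is assumed to be a dictionary mapping a bigram to its score under one test.

```python
import math


def ranks(scores):
    """Map each bigram to its rank (1 = highest score), averaging tied ranks."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    result = {}
    i = 0
    while i < len(ordered):
        # Extend j over the run of bigrams tied with position i.
        j = i
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[ordered[k]] = average
        i = j + 1
    return result


def spearman(scores_x, scores_y):
    """Spearman rank correlation over the bigrams scored by both tests.

    Returns 1.0 for identical rankings and -1.0 for reversed ones.
    Raises ZeroDivisionError if either test ties every bigram.
    """
    shared = sorted(set(scores_x) & set(scores_y))
    rx = ranks({b: scores_x[b] for b in shared})
    ry = ranks({b: scores_y[b] for b in shared})
    mean = (len(shared) + 1) / 2  # mean rank, also with averaged ties
    cov = sum((rx[b] - mean) * (ry[b] - mean) for b in shared)
    var_x = sum((rx[b] - mean) ** 2 for b in shared)
    var_y = sum((ry[b] - mean) ** 2 for b in shared)
    return cov / math.sqrt(var_x * var_y)
```

A value near 1 means test X and test Y order the bigrams of a corpus almost identically. A value near 0 means the two rankings are largely unrelated.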

    We would be very interested to hear if you find this code useful (or not).
    This is an ongoing project, so suggestions for improvements, fixes, etc.
    would be much appreciated.

    Enjoy!

    Ted

    -- 
    # Ted Pedersen                            http://www.d.umn.edu/~tpederse #
    # Department of Computer Science                      tpederse@d.umn.edu #
    # University of Minnesota Duluth                                         #
    # Duluth, MN 55812                                        (218) 726-8770 #
    


