Corpora: Summary: German word lists

From: Stefan Thomas Gries (StThGries@t-online.de)
Date: Wed Jul 12 2000 - 14:19:08 MET DST


    Dear colleagues

    Recently I posted a query about where to download German word lists. I would
    like to thank the following people (in alphabetical order) for their kind
    assistance:

    Anna Braasch
    Damon Allen Davison
    Pius ten Hacken
    Agnes Muehlmeyer-Mentzel
    Noemi Preissner
    Markus Schulze

    In what follows I provide a list of the sites that were pointed out to me
    with some additional comments:

    http://www.linguistik.uni-erlangen.de/LAPTDA/laptda.html
    These word lists were taken from seven domain-specific corpora (electronic
    data processing, geography, law, medicine, sports, linguistics, economics)
    and a representative German corpus (the LIMAS corpus). Each of these corpora
    contains roughly 1,000,000 word forms. Available for download are:
    o Frequency lists of morphemes, allomorphs and word forms for each corpus.
    o So-called "n-domain lists" of morphemes, allomorphs and word forms. An
    n-domain list is a list of the items that occurred in n of the
    domain-specific corpora mentioned above; e.g., the 2-domain list of
    medicine and law contains all morphemes / allomorphs / word forms that
    occurred in both corpora, together with their respective frequency
    information.
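
    To make the idea of an n-domain list concrete, here is a minimal Python
    sketch of how such a list could be derived from per-corpus frequency
    lists. It is not the procedure used by the Erlangen site; the function
    name n_domain_list and the toy frequencies are hypothetical and serve
    only as an illustration.

```python
from collections import Counter


def n_domain_list(freqs_by_domain, domains):
    """Return the items attested in every listed domain, with their frequencies.

    freqs_by_domain maps a domain name to a Counter of item frequencies.
    The result maps each shared item to a {domain: frequency} dict.
    """
    # Keep only the items that occur in all of the requested corpora.
    shared = set.intersection(*(set(freqs_by_domain[d]) for d in domains))
    return {
        item: {d: freqs_by_domain[d][item] for d in domains}
        for item in sorted(shared)
    }


# Invented toy frequency lists, not taken from the corpora described above.
freqs_by_domain = {
    "medicine": Counter({"Befund": 12, "Haftung": 3, "Arzt": 40}),
    "law": Counter({"Haftung": 55, "Arzt": 4, "Urteil": 70}),
}

two_domain = n_domain_list(freqs_by_domain, ["medicine", "law"])
for item, freqs in two_domain.items():
    print(item, freqs)
```

    With these toy data the 2-domain list of medicine and law contains only
    "Arzt" and "Haftung", each with one frequency per corpus.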

    http://www.loria.fr/~bonhomme/sw/
    A useful collection of lists for French, English and German (large word
    lists and smaller stop lists)

    http://services.canoo.com/MorphologyBrowser.html
    http://www.unibas.ch/LIlab/projects/wordmanager/wordmanager.html
    They offer not only a list of word forms, but also a morphological analysis
    module. In addition, word formation rules can be applied to recognise newly
    coined compounds and derivations, which is not a trivial advantage in
    German.
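
    As a rough illustration of why rule-based recognition of compounds
    matters for German, the following Python sketch splits a word into two
    known lexicon entries joined by an optional linking element. It is a
    deliberately naive stand-in and not the method used by Canoo or Word
    Manager; the function split_compound, the list of linking elements and
    the tiny lexicon are all assumptions made for this example.

```python
# A few common German linking elements (Fugenelemente); "" means none.
LINKING_ELEMENTS = ("", "s", "es", "n", "en")


def split_compound(word, lexicon):
    """Return (modifier, linking element, head) analyses of a two-part compound.

    lexicon is assumed to be a set of lowercase word forms.
    """
    word = word.lower()
    analyses = []
    for i in range(1, len(word)):
        modifier, rest = word[:i], word[i:]
        if modifier not in lexicon:
            continue
        # Try each linking element between the modifier and the head.
        for link in LINKING_ELEMENTS:
            head = rest[len(link):]
            if rest.startswith(link) and head in lexicon:
                analyses.append((modifier, link, head))
    return analyses


# Hypothetical mini-lexicon, for illustration only.
lexicon = {"arbeit", "markt", "wort", "liste"}

print(split_compound("Arbeitsmarkt", lexicon))
print(split_compound("Wortliste", lexicon))
```

    A plain word list would miss "Arbeitsmarkt" unless the form itself
    happened to be listed, whereas even this simple rule relates it to
    "Arbeit" and "Markt".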

    Finally, Agnes Muehlmeyer-Mentzel was kind enough to let me have a word
    list of 360,000 words (generated on the basis of the 1986 issues of the
    German weekly newspaper Die Zeit).

    Apart from the above-mentioned sites directly concerned with word lists, I
    was also directed to some sites with slightly different though related
    contents:
    http://www.kun.nl/celex/
    http://www.ldc.upenn.edu/Catalog/LDC96L14.html
    http://www.cis.uni-muenchen.de/projects/CISLEX.html

    Once again, thanks to all contributors.

    S t e f a n T h . G r i e s
    ----------------------------------------------------------------------------
    B u e r o / O f f i c e :
    Syddansk Universitet
    Institut for Erhvervssproglig Informatik og Kommunikation
    Grundtvigs Allé 150
    6400 Sonderborg
    Daenemark/Denmark


