Corpora: Automatic Word Categorisation - again

From: Klas (klas.prytz@ling.uu.se)
Date: Mon Nov 27 2000 - 12:09:09 MET


    Dear list members

    Some time ago I posted a request for references to work in the field of
    automatic word categorisation. I want to thank all who answered me. I have
    included the references in this mail.

    Yours sincerely

    /Klas Prytz

    From Jose Maria Gomez Hidalgo

    * David Lewis (http://www.research.att.com/~lewis/) presented in his
    dissertation (1992) an organization of text classification tasks in two
    tracks: document classification and term (word) classification. In the
    first chapter, he describes some term classification tasks oriented to
    Information Retrieval, like term clustering (thesaurus construction) and
    others. This organization and the references could be a good start.

    * Manning and Schuetze wrote a book, Foundations of Statistical Natural
    Language Processing, that includes descriptions of some word
    classification tasks: word sense disambiguation, part of speech tagging.
    The main value of this book is its good introduction to the techniques
    of the field.
    Companion website: http://www-nlp.Stanford.EDU/fsnlp/

    From E Tjong Kim Sang

    Jakub Zavrel and Jorn Veenstra, "Continuous Task-Specific Categories
    for Disambiguation: Putting Lexical Constraints (back) in the Lexicon",
    Conference on Architectures and Mechanisms for Language Processing
    (AMLaP-96). Torino, Italy, 1996.

    Jakub Zavrel, "Lexical Space: Learning and Using Continuous Linguistic
    Representations", Master's Thesis, Cognitive Artificial Intelligence,
    Department of Philosophy, Utrecht University, 1996.

    url: http://pcger39.uia.ac.be/~zavrel/

    From Eric Atwell

    Elliott J and Atwell E. 2000. Is anybody out there?: the detection of
    intelligent and generic language-like features. In Journal of the British
    Interplanetary Society, volume 53 no.1/2 pages 13-22, British
    Interplanetary Society, London. ISSN: 0007-084X.

    Elliott J, Atwell E and Whyte B. 2000. Language identification in unknown
    signals. In Proceedings of COLING'2000, 18th International Conference on
    Computational Linguistics, pages 1021-1026, Association for Computational
    Linguistics (ACL) and Morgan Kaufmann Publishers, San Francisco. ISBN:
    1-55860-717-X (2 volumes).

    Elliott J, Atwell E and Whyte B. 2000. Increasing our ignorance of
    language: identifying language structure in an unknown signal. In
    Daelemans W (ed) Proceedings of CoNLL-2000: International Conference on
    Computational Natural Language Learning, Lisbon, Portugal.

    Elliott J and Atwell E. 1999. Language in signals: the detection of
    generic species-independent intelligent language features in symbolic and
    oral communications. In Proceedings of the 50th International
    Astronautical Congress, paper IAA-99-IAA.9.1.08, Amsterdam. International
    Astronautical Federation, Paris.

    From Markus Schulze

    At the following URL you will find the key data of DMM - a system for
    morphological analysis (lemmatisation, categorisation, segmentation)
    of the German language:

    http://www.linguistik.uni-erlangen.de/~orlorenz/DMM/DMM.en.html

    This page also contains a link to an interactive demo of the system
    and a link to the full documentation (German only).

    From Alexander Clark

    @INPROCEEDINGS{chater-finch1,
      AUTHOR = {Finch, S. and Chater, N.},
      TITLE = {Bootstrapping syntactic categories},
      YEAR = {1992},
      BOOKTITLE = {Proceedings of the 14th Annual Meeting of the
                      Cognitive Science Society},
      PAGES = {820-825},
    }

    @INPROCEEDINGS{chater-finch2,
      AUTHOR = {Finch, S. and Chater, N.},
      TITLE = {Bootstrapping syntactic categories using statistical
                      methods},
      YEAR = {1992},
      BOOKTITLE = {Background and Experiments in Machine Learning of
                      Natural Language},
      PAGES = {229-235},
      EDITOR = {Daelemans, W. and Powers, D.},
      PUBLISHER = {Tilburg University: Institute for Language
                      Technology and AI}
    }

    @INPROCEEDINGS{chater-finch3,
      AUTHOR = {Finch, S. and Chater, N. and Redington, M.},
      TITLE = {Acquiring syntactic information from distributional
                      statistics},
      YEAR = {1995},
      EDITOR = {Levy, Joseph P. and Bairaktaris, Dimitrios and
                      Bullinaria, John A. and Cairns, Paul},
      BOOKTITLE = {Connectionist Models of Memory and Language},
      PUBLISHER = {UCL Press}
    }

    @ARTICLE{brown-92,
      AUTHOR = {Brown, Peter F. and Della Pietra, Vincent J. and de
                      Souza, Peter V. and Lai, Jenifer C. and Mercer,
                      Robert},
      TITLE = {Class-based n-gram models of natural language},
      YEAR = {1992},
      VOLUME = {18},
      PAGES = {467-479},
      JOURNAL = {Computational Linguistics}
    }

    @Article{ney-essen-kneser,
      author = {Ney, Hermann and Essen, Ute and Kneser, Reinhard},
      title = {On Structuring Probabilistic dependencies in
                      stochastic language modelling},
      journal = {Computer Speech and Language},
      year = {1994},
      volume = {8},
      pages = {1-28}
    }

    @INPROCEEDINGS{pereira-cluster,
      AUTHOR = {Pereira, Fernando and Tishby, Naftali and Lee,
                      Lillian},
      TITLE = "Distributional Clustering of {English} words",
      YEAR = {1993},
      BOOKTITLE = "Proceedings of the 31st annual meeting of the
                      {Association for Computational Linguistics}"
    }

    @InProceedings{clark-00,
      author = {Clark, Alexander},
      title = {Inducing Syntactic Categories by Context
                      Distribution Clustering},
      pages = {91-94},
      year = {2000},
      booktitle = {Proceedings of CoNLL-2000 and LLL-2000},
      address = {Lisbon, Portugal}
    }

    Klas Prytz
    Institutionen för lingvistik
    Uppsala universitet
    018-471 1174
    Home address:
    Nygården
    747 94, Alunda
    0174/133 01


