Corpora: Annotated Old English corpus now available

From: Susan Pintzuk (sp20@york.ac.uk)
Date: Mon Aug 28 2000 - 15:16:38 MET DST

          The Brooklyn-Geneva-Amsterdam-Helsinki Parsed Corpus
                            of Old English

    The Brooklyn-Geneva-Amsterdam-Helsinki Parsed Corpus of Old
    English (henceforth the Brooklyn Corpus) is a selection of
    texts from the Old English Section of the Helsinki Corpus of
    English Texts, annotated to facilitate searches on lexical
    items and syntactic structure. It is intended for the use of
    students and scholars of the history of the English
    language.

    The Brooklyn Corpus contains 106,210 words of Old English
    text; the samples from the longer texts are 5,000 to 10,000
    words in length. The texts represent a range of dates of
    composition, authors, and genres. The texts in the Brooklyn
    Corpus are syntactically and morphologically annotated, and
    each word is glossed. The size of the corpus is
    approximately 12 megabytes.

    The syntactic annotations enable the users to pose and answer
    questions about word order, constituent order, abstract
    structure, and syntactic and morphological characteristics of
    the texts in the corpus. The annotations are general-purpose
    and as theory-neutral as possible, while still incorporating
    the insights of modern linguistic theory; they can be used by
    scholars with widely varying research interests. The
    syntactic annotations mark constituents, both clausal and
    non-clausal, by labelled brackets, with some relations marked
    by empty categories. The structure assigned to a sentence by
    the labelled bracketing can be quite complex, but it is not a
    complete syntactic analysis: the function of the bracketing
    is not to assign a structure to Old English sentences but
    rather to facilitate searches.

    The Brooklyn Corpus is available without fee for educational
    and research purposes, but it is not in the public domain.
    More information about the Brooklyn Corpus and how to access
    it is available at
    http://www-users.york.ac.uk/~sp20/corpus.html. Downloading
    the Brooklyn Corpus Manual is unrestricted, but the corpus
    texts and search scripts are available only to users who
    agree formally to the conditions of use.

    Susan Pintzuk
    Department of Language and Linguistic Science
    University of York
    Heslington, York YO1 5DD
    United Kingdom
    sp20@york.ac.uk
    Telephone: +44 1904 432661


