Re: Corpora: Parallel corpora and French software

From: LDC Office (ldc@unagi.cis.upenn.edu)
Date: Thu Jun 08 2000 - 20:56:57 MET DST


    Noelle,

    It may be that the following two corpora are not entirely suitable
    for your research, because they are primarily political and
    legislative in their content. But they are available from the
    LDC, and you can check the LDC catalog web pages for further
    information:

    UN Parallel Text (English/Spanish/French)
    http://morph.ldc.upenn.edu/Catalog/LDC94T4A.html
     
    -- you can request just the English and French data, if you
    prefer; the full corpus is a 3-cdrom set, with one language per
    cdrom, one text document per data file, and alignment at the level
    of document/file only.

    Canadian Hansards (French/English)
    http://morph.ldc.upenn.edu/Catalog/LDC95T20.html

    -- a single cdrom containing two distinct sets of parallel text;
    one set is aligned at the sentence level, and the other (smaller)
    set is aligned at the paragraph level (with additional alignment
    data for individual word tokens within paragraphs).

    Please write to ldc@ldc.upenn.edu if you would like further
    information or are interested in purchasing either of these
    collections.

    Best,

    Shannon Sears
    Manager, Intellectual Property Rights and Membership
    ----------------------------------------------------------------------
    Linguistic Data Consortium     Phone: (215) 573-1275
    3615 Market Street             Fax:   (215) 573-2175
    Suite 200                      email: ssears@ldc.upenn.edu
    Philadelphia, PA 19104-2608    www:   http://www.ldc.upenn.edu

    > From: NOELLE-VERONIQUE SERPOLLET <n.serpollet@lancaster.ac.uk>
    > Subject: Corpora: Parallel corpora and French software
    > To: CORPORA@hd.uib.no
    > Date: Tue, 6 Jun 2000 15:07:16 +0100 (BST)
    >
    > Apologies if you receive multiple copies of this document
    > ***************************************************
    > Dear list members,
    >
    > I am a French PhD student doing research in Corpus Linguistics at
    > Lancaster University. My PhD deals with modality and the
    > subjunctive, and my aim is to carry out a contrastive analysis of
    > French and English.
    >
    > I have been working on the Lancaster-Oslo-Bergen corpus (LOB)
    > and on the Freiburg-LOB corpus (FLOB) for the English part
    > of my data.
    > Now I have started working on French corpora.
    > I already have some corpora (and I am aware of others) that I
    > can use, but I was wondering if you could send me a list of data
    > which I could have access to and on which I would be able to
    > carry out some analyses. Ideally, I would like to gather a
    > French/English parallel corpus (with the texts aligned if possible).
    >
    > I would appreciate any contributions and help.
    >
    > Furthermore, are you aware of any corpus tools (taggers/lemmatizers)
    > that I could use for my analyses of French?
    > (I know about Cordial 6 Universites and will probably purchase it,
    > and I am currently working with ParaConc (Barlow, 1995)).
    >
    > I would be grateful if you could tell me where I could obtain
    > a tagger/concordancer which would enable me to retrieve occurrences
    > of the French subjunctive.
    >
    > Thank you in advance for your help, your answers and suggestions.
    > Noelle
    >
    > ----------------------------
    > Noelle SERPOLLET
    > Department of Linguistics and MEL
    > Lancaster University,
    > LANCASTER, LA1 4YT, UK
    > n.serpollet@lancaster.ac.uk
    >
    >


