Re: Corpora: Parallel corpora and French software

From: LDC Office (ldc@unagi.cis.upenn.edu)
Date: Thu Jun 08 2000 - 20:56:57 MET DST

Next message: M M Van Zaanen: "Re: Corpora: programming languages for statistical language learning"

Previous message: John Colby: "Corpora: programming languages for statistical language learning"
In reply to: NOELLE-VERONIQUE SERPOLLET: "Corpora: Parallel corpora and French software"

Noelle,

It may be that the following two corpora are not entirely suitable
for your research, because they are primarily political and
legislative in their content. But they are available from the
LDC, and you can check the LDC catalog web pages for further
information:

UN Parallel Text (English/Spanish/French)
http://morph.ldc.upenn.edu/Catalog/LDC94T4A.html

-- you can request just the English and French data, if you
prefer; the full corpus is a 3-cdrom set, with one language per
cdrom, one text document per data file, and alignment at the level
of document/file only.
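
As a rough illustration of what document-level alignment means in
practice: if each language sits in its own directory and corresponding
documents can be matched by file name, the parallel pairs can be
collected as sketched below. The function name `pair_documents`, the
directory arguments, and the shared-file-stem convention are all
assumptions made for this example; the actual naming scheme on the
cdroms should be checked against the corpus documentation.

```python
from pathlib import Path


def pair_documents(en_dir, fr_dir):
    """Pair files from two language directories by shared file stem.

    ASSUMPTION (hypothetical layout): a document and its translation
    have the same file name stem in both directories. The real corpus
    may use a different convention.
    """
    # Index each directory by file stem (name without extension).
    en_files = {p.stem: p for p in Path(en_dir).iterdir() if p.is_file()}
    fr_files = {p.stem: p for p in Path(fr_dir).iterdir() if p.is_file()}

    # Keep only the stems present in both languages, in a stable order.
    shared = sorted(en_files.keys() & fr_files.keys())
    return [(en_files[stem], fr_files[stem]) for stem in shared]
```

Because the alignment stops at the document level, each returned pair
is a whole English file and a whole French file; any sentence-level
matching within a pair would have to be done separately.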

Canadian Hansards (French/English)
http://morph.ldc.upenn.edu/Catalog/LDC95T20.html

-- a single cdrom containing two distinct sets of parallel text;
one set is aligned at the sentence level, and the other (smaller)
set is aligned at the paragraph level (with additional alignment
data for individual word tokens within paragraphs).

Please write to ldc@ldc.upenn.edu if you would like further
information or are interested in purchasing either of these
collections.

Best,

Shannon Sears
Manager, Intellectual Property Rights and Membership
----------------------------------------------------------------------
Linguistic Data Consortium      Phone: (215) 573-1275
3615 Market Street              Fax:   (215) 573-2175
Suite 200                       email: ssears@ldc.upenn.edu
Philadelphia, PA 19104-2608     www:   http://www.ldc.upenn.edu

> From: NOELLE-VERONIQUE SERPOLLET <n.serpollet@lancaster.ac.uk>
> Subject: Corpora: Parallel corpora and French software
> To: CORPORA@hd.uib.no
> Date: Tue, 6 Jun 2000 15:07:16 +0100 (BST)
> MIME-Version: 1.0
> Precedence: bulk
>
> Apologies if you receive multiple copies of this document
> ***************************************************
> Dear list members,
>
> I am a French PhD student researching in Corpus Linguistics at
> Lancaster University. My PhD deals with modality and the
> subjunctive and my aim is to carry out a contrastive analysis on
> the French and English languages.
>
> I have been working on the Lancaster-Oslo-Bergen corpus (LOB)
> and on the Freiburg-LOB corpus (FLOB) for the English part
> of my data.
> Now I have started working on French corpora.
> I already have some corpora (and I am aware of others) that I
> can use, but I was wondering if you could send me a list of data
> which I could have access to and on which I would be able to carry
> out some analyses. Ideally, I would like to gather a French/English
> parallel corpus (with the texts being aligned if possible).
>
> I will appreciate any contribution and help.
>
> Furthermore, are you aware of corpus tools (taggers/lemmatizers)
> that I could use for my analyses of the French?
> (I know about Cordial 6 Universites and will probably purchase it,
> and I am currently working with ParaConc (Barlow, 1995)).
>
> I would be grateful if you could tell me where I could obtain
> a tagger/concordancer which would enable me to retrieve occurrences
> of the French subjunctive.
>
> Thank you in advance for your help, your answers and suggestions.
> Noelle
>
> ----------------------------
> Noelle SERPOLLET
> Department of Linguistics and MEL
> Lancaster University,
> LANCASTER, LA1 4YT, UK
> n.serpollet@lancaster.ac.uk
>
>

