Corpora: French corpora and software - Summary

From: NOELLE-VERONIQUE SERPOLLET (n.serpollet@lancaster.ac.uk)
Date: Thu Jun 15 2000 - 13:27:22 MET DST

Dear list members,

Having thanked the people who helped me with my query regarding
"Parallel corpora and French software", I am now posting a summary of
the results I obtained:

* software that I could use to tag/analyse my French data

Michael Barlow is currently developing ParaConc.
<The new version will be based on the code from MonoConc Pro and will
<be similar in functionality (but with more functions) to the one that
<you are using, [ParaConc, 1995], but the underlying code will be
<different.

http://jupiter.inalf.cnrs.fr/WinBrill/
(Maria José Ribeiro <mj.ribeiro@NETC.PT>)

* tagger/concordancer which would enable me to retrieve occurrences
of the French subjunctive

Cordial 6 Universités has a tagger/lemmatizer for French which does this:
1 Il il PPER3S
2 faut falloir VINDP3S
3 que que SUB
4 je je PPER1S
5 vienne venir VSUBP1S
6 . . PCTFORTE
(Jean Veronis, http://www.up.univ-mrs.fr/~veronis)
For more information, contact SYNAPSE Développement
www.synapse-fr.com
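
For anyone wanting to pull the subjunctive forms out of output like the
sample above, here is a minimal sketch in Python. It assumes that each
line holds exactly four whitespace-separated fields (index, token,
lemma, tag) and that subjunctive verb tags begin with "VSUB", as
VSUBP1S does in the sample. Both points are assumptions drawn from
these six lines only, not from the Cordial documentation, and the
function names are my own.

```python
# Sample copied from the message above: index, token, lemma, tag.
SAMPLE = """\
1 Il il PPER3S
2 faut falloir VINDP3S
3 que que SUB
4 je je PPER1S
5 vienne venir VSUBP1S
6 . . PCTFORTE
"""


def parse_tagged(text):
    """Split each line into an (index, token, lemma, tag) tuple."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # skip lines that do not fit the assumed layout
        index, token, lemma, tag = parts
        rows.append((int(index), token, lemma, tag))
    return rows


def find_subjunctives(rows, prefix="VSUB"):
    """Keep rows whose tag starts with the assumed subjunctive prefix."""
    return [row for row in rows if row[3].startswith(prefix)]


rows = parse_tagged(SAMPLE)
hits = find_subjunctives(rows)
for index, token, lemma, tag in hits:
    print(index, token, lemma, tag)
```

Note that the conjunction tag SUB on "que" is not matched, since the
test is on the start of the tag. Real output may contain multi-word
tokens or other tags, so the four-field assumption would need checking
against actual data.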

* gather a French/English parallel corpus (with the texts being
aligned if possible).

<ARCADE corpus of ca. 1.5M words of Fr/En texts aligned at sentence level:
<http://www.up.univ-mrs.fr/~veronis/arcade

<The corpus is distributed by ELRA:
<http://www.icp.grenet.fr/ELRA/home.html
(Jean Veronis, veronis@up.univ-mrs.fr)

Tim Johns' website: http://web.bham.ac.uk/johnstf/timconc.htm

<He's been working on parallel concordancing within the Lingua
<project on multilingual parallel concordancing. I'm not
<quite sure whether you'll find actual corpora there, but
<there may be something, plus probably useful links.
(Antoine Consigny, anconsig@liverpool.ac.uk, anconsig@yahoo.fr)

Two corpora, primarily political and legislative in content, are
available from the LDC:

<UN Parallel Text (English/Spanish/French)
<http://morph.ldc.upenn.edu/Catalog/LDC94T4A.html

<-- you can request just the English and French data, if you
<prefer; the full corpus is a 3-cdrom set, with one language per
<cdrom, one text document per data file, and alignment at the level
<of document/file only.

<Canadian Hansards (French/English)
<http://morph.ldc.upenn.edu/Catalog/LDC95T20.html

<-- a single cdrom containing
<two distinct sets of parallel text; one set is aligned at the
<sentence level, and the other (smaller) set is aligned at the
<paragraph level (with additional alignment data for individual
<word tokens within paragraphs).

Please write to ldc@ldc.upenn.edu if you would like further
information or are interested in purchasing either of these
collections.
(Shannon Sears, Linguistic Data Consortium, ssears@ldc.upenn.edu
www: http://www.ldc.upenn.edu)
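
To illustrate why sentence-level alignment matters for a query like
mine, here is a rough Python sketch of a parallel concordance lookup.
It assumes the corpus has already been loaded as a list of
(French, English) sentence pairs. The three pairs below are invented
for illustration and do not come from any of the corpora listed, whose
actual file formats I have not described here.

```python
import re

# Hypothetical sentence-aligned pairs (French, English), invented
# for illustration only.
PAIRS = [
    ("Il faut que je vienne.", "I have to come."),
    ("Je viens demain.", "I am coming tomorrow."),
    ("Il faut que tu viennes aussi.", "You have to come too."),
]


def parallel_concordance(pairs, pattern):
    """Return the aligned pairs whose French side matches the regex."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [(fr, en) for fr, en in pairs if regex.search(fr)]


hits = parallel_concordance(PAIRS, r"\bfaut que\b")
for fr, en in hits:
    print(fr, "|", en)
```

With alignment at document level only, as in the UN Parallel Text, a
lookup of this kind would return whole documents rather than matching
sentence pairs, so a sentence alignment step would be needed first.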

I hope this will be of interest to a lot of members.
Noelle
---------------------
Noëlle SERPOLLET
Department of Linguistics and MEL
Lancaster University,
LANCASTER, LA1 4YT, UK
e-mail: n.serpollet@lancaster.ac.uk

This archive was generated by hypermail 2b29 : Thu Jun 15 2000 - 13:26:02 MET DST