Dear list members,
Having thanked the people who helped me with my query regarding
"Parallel corpora and French software", I am now posting a summary of the
results I obtained:
* software that I could use to tag/analyse my French data
Michael Barlow is currently developing ParaConc.
<The new version will be based on the code from MonoConc Pro and will be
<similar in functionality (but with more functions) to the one that you
<are using, [ParaConc, 1995], but the underlying code will be different.
http://jupiter.inalf.cnrs.fr/WinBrill/
(Maria José Ribeiro <mj.ribeiro@NETC.PT>)
* tagger/concordancer which would enable me to retrieve occurrences
of the French subjunctive
Cordial 6 Universités has a tagger/lemmatizer for French which does this:
1  Il      il       PPER3S
2  faut    falloir  VINDP3S
3  que     que      SUB
4  je      je       PPER1S
5  vienne  venir    VSUBP1S
6  .       .        PCTFORTE
(Jean Veronis, http://www.up.univ-mrs.fr/~veronis)
For more information, contact SYNAPSE Développement
www.synapse-fr.com
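To illustrate how output of this kind could be searched for subjunctives, here is a minimal Python sketch. It is not part of Cordial. It assumes the tagged text is available as plain lines of four whitespace-separated fields (index, token, lemma, tag), as in the sample above, and that subjunctive tags begin with "VSUB", which is inferred only from the single tag VSUBP1S shown there. The function name find_subjunctives is hypothetical.

```python
# Sketch: pick out subjunctive verb forms from tagger output shaped like
# the sample above (index, token, lemma, tag, separated by whitespace).
# Assumption: subjunctive tags start with "VSUB", as VSUBP1S does in the sample.

SAMPLE = """\
1 Il il PPER3S
2 faut falloir VINDP3S
3 que que SUB
4 je je PPER1S
5 vienne venir VSUBP1S
6 . . PCTFORTE
"""


def find_subjunctives(tagged_text, prefix="VSUB"):
    """Return (index, token, lemma, tag) tuples whose tag starts with prefix."""
    hits = []
    for line in tagged_text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank lines and lines that do not have four fields
        index, token, lemma, tag = fields
        if tag.startswith(prefix):
            hits.append((int(index), token, lemma, tag))
    return hits


if __name__ == "__main__":
    for hit in find_subjunctives(SAMPLE):
        print(hit)
```

Note that the conjunction tag SUB on "que" is not matched, since only tags beginning with "VSUB" are kept. Tokens containing spaces would not fit the four-field assumption and are skipped by this sketch.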
* gather a French/English parallel corpus (with the texts being
aligned if possible).
<ARCADE corpus of ca. 1.5M words of Fr/En texts aligned at sentence
<level:
<http://www.up.univ-mrs.fr/~veronis/arcade
<The corpus is distributed by ELRA:
<http://www.icp.grenet.fr/ELRA/home.html
(Jean Veronis, veronis@up.univ-mrs.fr)
Tim Johns' website: http://web.bham.ac.uk/johnstf/timconc.htm
<He's been working on parallel concordancing within the Lingua
<project on multilingual parallel concordancing. I'm not
<quite sure whether you'll find actual corpora there, but
<there may be something, plus probably useful links.
(Antoine Consigny, anconsig@liverpool.ac.uk, anconsig@yahoo.fr)
Two corpora, primarily political and legislative in content,
are available from the LDC:
<UN Parallel Text (English/Spanish/French)
<http://morph.ldc.upenn.edu/Catalog/LDC94T4A.html
<-- you can request just the English and French data, if you
<prefer; the full corpus is a 3-cdrom set, with one language per
<cdrom, one text document per data file, and alignment at the level
<of document/file only.
<Canadian Hansards (French/English)
<http://morph.ldc.upenn.edu/Catalog/LDC95T20.html
<-- a single cdrom containing
<two distinct sets of parallel text; one set is aligned at the
<sentence level, and the other (smaller) set is aligned at the
<paragraph level (with additional alignment data for individual
<word tokens within paragraphs).
Please write to ldc@ldc.upenn.edu if you would like further
information or are interested in purchasing either of these
collections.
(Shannon Sears, Linguistic Data Consortium, ssears@ldc.upenn.edu
www: http://www.ldc.upenn.edu)
I hope this will be of interest to many list members.
Noelle
---------------------
Noëlle SERPOLLET
Department of Linguistics and MEL
Lancaster University,
LANCASTER, LA1 4YT, UK
e-mail: n.serpollet@lancaster.ac.uk
This archive was generated by hypermail 2b29 : Thu Jun 15 2000 - 13:26:02 MET DST