Corpora: ELRA News

From: Valerie Mapelli (mapelli@elda.fr)
Date: Tue Apr 25 2000 - 17:34:19 MET DST

    [ We apologise for the duplicate posting of this announcement ]
    ___________________________________________________________
                                    ELRA
                    European Language Resources Association
                                   ELRA News
    ___________________________________________________________

                         *** ELRA NEW RESOURCES ***

    We are happy to announce new resources available via ELRA:

    ELRA-S0083 ISLE Speech Corpus
    ELRA-W0015 "Le Monde" Text Corpus - Year 1999

    A description of each database is given below.

    _______________________________________
    ELRA-S0083 ISLE Speech Corpus
    _______________________________________

    This corpus contains approximately 20 minutes of speech
    (per speaker) from 23 German and 23 Italian intermediate
    learners of English. Each speaker recorded sentences from
    several blocks of various types (reading simple sentences,
    using minimal pairs, giving answers to multiple choice
    questions). The prompts were of varying perplexities.

    About 2/3 of the data for each speaker was annotated by a
    team of linguists. The files were corrected first at the word
    level, and an automatic recogniser was then used to produce
    phone-level annotations. The annotator then re-annotated
    each sentence to mark phone and stress errors (e.g.,
    substitutions, insertions, or deletions).

    Corpus details:
    · a total of 46 speakers (23 German and 23 Italian)
    · 11484 utterances
    · 1.92 gigabytes of WAV files (4 CDs)
    · 17 hours, 54 minutes, and 44 seconds of speech data
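
    As a rough consistency check, the figures above can be turned into
    per-speaker and per-utterance averages. The sketch below uses only
    the numbers quoted in this announcement; the variable names are
    illustrative, and the averages assume the total duration is spread
    evenly across speakers, which the announcement does not state.

```python
# Figures quoted in the corpus details of this announcement.
speakers = 46
utterances = 11484
total_seconds = 17 * 3600 + 54 * 60 + 44  # 17 h 54 min 44 s

# Simple averages; an even spread across speakers is an assumption.
minutes_per_speaker = total_seconds / speakers / 60
utterances_per_speaker = utterances / speakers
seconds_per_utterance = total_seconds / utterances

print(f"{minutes_per_speaker:.1f} minutes of audio per speaker")
print(f"{utterances_per_speaker:.1f} utterances per speaker")
print(f"{seconds_per_utterance:.1f} seconds per utterance")
```

    The average comes out at a little over 23 minutes per speaker,
    which is in line with the "approximately 20 minutes" stated above.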
     
    A much more detailed description of the ISLE corpus
    will be available in the proceedings of LREC 2000. An
    electronic copy of the paper may be obtained from ELRA
    (Reference: W. Menzel, E. Atwell, P. Bonaventura, D. Herron,
    P. Howarth, R. Morton, and C. Souter (in preparation). "The
    ISLE corpus of non-native spoken English", Proc. Second
    International Conference on Language Resources and Evaluation).

    _______________________________________
    ELRA-W0015 "Le Monde" Text Corpus - Year 1999
    _______________________________________

    Electronic archiving of "Le Monde" articles started on 1
    January 1987. Some 200 articles are added every day, making
    this the biggest archive of its kind among French daily
    newspapers. The corpus is available in ASCII text format.
    Each month consists
    of some 10 MB of data (circa 120 MB per year). Data ranging
    from 1987 until 1999 are available through ELRA (each buyer
    may purchase up to 5 years of data).
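
    For planning storage, the approximate sizes above can be combined
    as follows. This is only a sketch: the 10 MB per month figure is
    the rough value quoted above, and the helper name
    estimated_size_mb is hypothetical.

```python
MB_PER_MONTH = 10          # approximate figure quoted in the announcement
FIRST_YEAR, LAST_YEAR = 1987, 1999
MAX_YEARS_PER_BUYER = 5    # purchase limit stated in the announcement


def estimated_size_mb(years: int) -> int:
    """Approximate size in MB of `years` full years of the corpus."""
    return years * 12 * MB_PER_MONTH


# Number of years covered, counting both endpoints.
available_years = LAST_YEAR - FIRST_YEAR + 1

print(f"One year: about {estimated_size_mb(1)} MB")
print(f"Largest single purchase: about {estimated_size_mb(MAX_YEARS_PER_BUYER)} MB")
print(f"All {available_years} years: about {estimated_size_mb(available_years)} MB")
```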

    =====================================
    For further information, please contact:

         ELRA/ELDA
         55-57 rue Brillat-Savarin
         F-75013 Paris, France
         Tel: +33 01 43 13 33 33
         Fax: +33 01 43 13 33 30
         E-mail: mapelli@elda.fr

    or visit the online catalogue on our Web site:

         http://www.icp.grenet.fr/ELRA/home.html
         or http://www.elda.fr
    =====================================


