ICAME CORPUS COLLECTION - INFORMATION

 

Lancaster/IBM Spoken English Corpus


A corpus of approximately 52,000 words of contemporary spoken British English. The material is available in orthographic and prosodic transcription and in two grammatically tagged versions, horizontal and vertical (like those for the LOB Corpus). An accompanying manual is available. For further details, see ICAME Journal 12, pp. 76-77.

Example:
[001 SPOKEN ENGLISH CORPUS TEXT A01] 
[In Perspective] 
[Rosemary Hartill] 
[Broadcast notes: Radio 4, 07.45 a.m., 24th November, 1984]

Orthographic version:

Good morning. More news about the Reverend Sun Myung Moon, founder of the Unification church, who's currently in jail for tax evasion: he was awarded an honorary degree last week by

Prosodic version (the file is in 8-bit code; here #nnn stands for the character with decimal code nnn):

#143Good `morning || #143`more news about the Reverend _Sun Myung Moon |
_founder of the Unification Church | who's currently in jail | for
tax evasion || he was auwarded an _honorary deigree last week | by
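
The #nnn notation used in the sample above can be turned back into single 8-bit characters with a few lines of Python. This is only a sketch: the function name display_to_bytes is hypothetical, it assumes every "#" followed by three digits is such a code (a literal "#123" in the text would be misread), and it says nothing about what character 143 means in the corpus, which is a matter for the manual.

```python
import re

# Matches the "#nnn" display notation: '#' followed by exactly three decimal digits.
_ESCAPE = re.compile(rb"#(\d{3})")


def display_to_bytes(line: str) -> bytes:
    """Replace each '#nnn' code with the single byte whose decimal value is nnn.

    Assumption: nnn is in the range 000-255; a larger value raises ValueError.
    """
    raw = line.encode("latin-1")  # one byte per character
    return _ESCAPE.sub(lambda m: bytes([int(m.group(1))]), raw)


sample = "#143Good `morning || #143`more news"
decoded = display_to_bytes(sample)
print(decoded[0])    # 143
print(decoded[1:5])  # b'Good'
```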
 

Horizontal version:

A01   2 (_( In_IN Perspective_NP )_)  
A01   3 (_( Rosemary_NP Hartill_NP )_)  
A01   5 ^ good_JJ morning_NN ._. ^ more_AP news_NN about_IN the_ATI
A01   5 Reverend_NPT Sun_NP Myung_NP Moon_NP ,_, founder_NN
A01   6 of_IN the_ATI Unification_NNP church_NN ,_, who_WP 's_BEZ
A01   6 currently_RB in_IN jail_NN for_IN tax_NN evasion_NN :_:
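
A line of the horizontal version can be split into its parts along the following lines. This is a sketch based only on the sample above, not on the manual: it assumes that the first two fields are the text identifier and a line reference, that each remaining item is word_TAG with the tag after the last underscore, and that an item without an underscore (such as ^) is an untagged marker, presumably a sentence start. The names Token and parse_horizontal are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    word: str
    tag: str  # empty string for untagged markers such as '^'


def parse_horizontal(line: str) -> Tuple[str, int, List[Token]]:
    """Return (text_id, line_no, tokens) for one line of the horizontal version."""
    text_id, line_no, *items = line.split()
    tokens = []
    for item in items:
        if "_" not in item:
            # Untagged marker (assumed: '^' marks the start of a sentence).
            tokens.append(Token(item, ""))
            continue
        # Split on the LAST underscore so that '(_(' and '._.' work too.
        word, _, tag = item.rpartition("_")
        tokens.append(Token(word, tag))
    return text_id, int(line_no), tokens


text_id, line_no, tokens = parse_horizontal(
    "A01   6 of_IN the_ATI Unification_NNP church_NN ,_, who_WP 's_BEZ"
)
print(text_id, line_no)  # A01 6
print(tokens[-1])        # Token(word="'s", tag='BEZ')
```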

Vertical version:

A01   2 001 (     (                                   @ 
A01   2 010 IN    In 
A01   2 020 NP    Perspective 
A01   2 021 )     )                                   @ 
A01   3 001 (     (                                   @ 
A01   3 010 NP    Rosemary 
A01   3 020 NP    Hartill 
A01   3 021 )     )                                   @ 
A01   5 001 ----- ------------------------------------------- 
A01   5 010 JJ    good 
A01   5 020 NN    morning 
A01   5 021 .     . 
A01   5 022 ----- ------------------------------------------- 
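
The vertical version has one word per line and can be read field by field. The sketch below is inferred from the sample alone: it assumes whitespace-separated fields in the order text identifier, line reference, position within the line, tag, word, and an optional trailing flag (the @ seen above, whose meaning is not stated here). Rows whose tag consists only of hyphens are treated as separators, since they appear where the horizontal version has ^. The names Row, parse_vertical and is_separator are hypothetical.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    text_id: str
    line_no: int
    position: str  # kept as a string to preserve leading zeros, e.g. '010'
    tag: str
    word: str
    flag: str      # '@' when present, otherwise ''


def parse_vertical(line: str) -> Optional[Row]:
    """Parse one line of the vertical version; return None for short or blank lines."""
    fields = line.split()
    if len(fields) < 5:
        return None
    text_id, line_no, position, tag, word = fields[:5]
    flag = fields[5] if len(fields) > 5 else ""
    return Row(text_id, int(line_no), position, tag, word, flag)


def is_separator(row: Row) -> bool:
    """True for rows like '----- -------' (assumed to mark sentence boundaries)."""
    return set(row.tag) == {"-"}


row = parse_vertical("A01   2 010 IN    In ")
print(row.tag, row.word)  # IN In
```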
 


Conditions on the use of ICAME corpus material

The primary purposes of the International Computer Archive of Modern English (ICAME) are:

  1. collecting and distributing information on (i) English language material available for computer processing; and (ii) linguistic research completed or in progress on this material;
  2. compiling an archive of corpora to be located at the University of Bergen, from where copies of the material can be obtained at cost.

The following conditions govern the use of corpus material distributed through ICAME:

  1. No copies of corpora, or parts of corpora, are to be distributed under any circumstances without the written permission of ICAME.
  2. Print-outs of corpora, or parts thereof, are to be used for bona fide research of a non-profit nature. Holders of copies of corpora may not reproduce any texts, or parts of texts, for any purpose other than scholarly research without obtaining the written permission of the individual copyright holders, as listed in the manual or record sheet accompanying the corpus in question. (For material where there is no known copyright holder, the person(s) who originally prepared the material in computerized form will be regarded as the copyright holder(s).)
  3. Commercial publishers and other non-academic organizations wishing to make use of part or all of a corpus or a print-out thereof must obtain permission from all the individual copyright holders involved.
  4. The person(s) who originally prepared the material in computerized form must be acknowledged in every subsequent use of it.

Use of ICAME texts within an institution
Although ICAME texts may not be used or distributed outside the institution that placed the order, they may be used freely within that institution (department, faculty, university) for research and teaching. To prevent use of the material for commercial or profit-making purposes, it is advisable to limit access to registered computer users within the institution. How this is done may vary from one institution to another.