Re: Corpora: Phonemic Corpora

From: Bill Fisher (william.fisher@nist.gov)
Date: Tue Nov 14 2000 - 15:53:59 MET

    VSWarren@aol.com wrote:

    > Can anyone please suggest either a program to convert from orthographic to
    > phonemic or alternatively a large corpora where phonemic transcriptions are
    > given for such a large number of different words.

      You can download software that does a pretty good job
    of converting text to segmental phonemes from the NIST
    website: see http://www.nist.gov/speech/tools/index.htm.
    But you should be aware that its output consists of
    underlying phonemic forms, which are often realized differently
    in actual speech; for instance, what usually surfaces as
    a syllabic consonant is phonemicized as a sequence of
    (zero-stressed) schwa plus consonant (as in "button").
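
      To make the underlying-versus-surface point concrete, here
    is a minimal Python sketch. It assumes ARPAbet-style symbols
    with a trailing stress digit ("AH0" for zero-stressed schwa,
    "EL"/"EM"/"EN" for syllabic consonants); that notation and the
    function name are assumptions for illustration, not the output
    format of the NIST software.

```python
# Assumed notation: ARPAbet-style symbols; "AH0" is zero-stressed schwa.
# The NIST tool's real output format may differ from this.
SYLLABIC = {"L": "EL", "M": "EM", "N": "EN"}


def surface_syllabics(phonemes):
    """Collapse zero-stressed schwa + L/M/N into one syllabic consonant."""
    out = []
    i = 0
    while i < len(phonemes):
        nxt = phonemes[i + 1] if i + 1 < len(phonemes) else None
        if phonemes[i] == "AH0" and nxt in SYLLABIC:
            out.append(SYLLABIC[nxt])  # schwa and consonant merge
            i += 2
        else:
            out.append(phonemes[i])
            i += 1
    return out


underlying = ["B", "AH1", "T", "AH0", "N"]  # "button", underlying form
print(surface_syllabics(underlying))  # ['B', 'AH1', 'T', 'EN']
```

      The rule is applied everywhere it matches, which will
    over-generate; real speech is more context-dependent than this.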

      A good free lexicon of English is available from CMU;
    see http://www.speech.cs.cmu.edu/cgi-bin/pronounce.
    In addition, the LDC offers a high-accuracy one, but
    it's not free. And Joe Picone & Co. of Mississippi State
    are making available lexicons derived from their
    re-transcription of the Switchboard corpus along with
    phonetic transcriptions from ICSI; see
    http://www.isip.msstate.edu/projects/switchboard/index.html.
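
      Once you have a lexicon as a text file, looking words up is
    straightforward. The sketch below assumes one entry per line
    (a word followed by space-separated phonemes), comment lines
    starting with ";;;", and alternate pronunciations marked as
    WORD(1); check these assumptions against the file you actually
    download. The sample entries and function names are made up
    for illustration.

```python
import re
from collections import defaultdict

# Illustrative entries written for this sketch, not copied from a lexicon release.
SAMPLE = """\
;;; comment lines are skipped
BUTTON  B AH1 T AH0 N
READ  R EH1 D
READ(1)  R IY1 D
"""


def load_lexicon(text):
    """Map each word to a list of pronunciations (each a list of phonemes)."""
    lexicon = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue
        word, *phones = line.split()
        word = re.sub(r"\(\d+\)$", "", word)  # drop the variant marker, e.g. "(1)"
        lexicon[word.upper()].append(phones)
    return dict(lexicon)


def transcribe(sentence, lexicon):
    """Pair each word with its first listed pronunciation, or None if missing."""
    return [(w, lexicon.get(w.upper(), [None])[0]) for w in sentence.split()]


lexicon = load_lexicon(SAMPLE)
print(transcribe("read button xyzzy", lexicon))
```

      Words missing from the lexicon come back as None, which is
    where a text-to-phoneme program would have to take over.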

     - Bill F.


