Re: Corpora: Parallel corpus

From: Philip Resnik (resnik@umiacs.umd.edu)
Date: Tue Dec 19 2000 - 21:28:53 MET

    Yuliya Katsnelson asked:
    > >I am looking for a parallel corpus (news, etc.) in English and
    > >optimally, Eastern European languages.

    Mike Maxwell (Mike_Maxwell@sil.org) replied:
    > For nearly every written language, there is at least one parallel
    > corpus: the Bible (or at least the New Testament). There are
    > obvious shortcomings with such a source (the alignment is at the
    > verse level, which may be too broad for some purposes; much of the
    > vocabulary is likely to be in semantic domains not of wider
    > interest; there are issues of translation style; the corpus may be
    > too small; etc.). But it's there, and in many cases should be
    > available in electronic form, perhaps even on the web.

    At the University of Maryland we've done some work on systematizing
    the Bible as a parallel corpus using the Corpus Encoding Standard
    (CES), as well as investigating the properties of the text from a
    computational linguistics perspective. See the Web page at
    http://umiacs.umd.edu/~resnik/parallel/ for information and
    references.
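
    To illustrate the verse-level alignment mentioned above, here is a
    minimal sketch in Python. It is not the Maryland tooling and does
    not read CES markup; the function name align_by_verse and the
    (book, chapter, verse) key scheme are assumptions made only for
    this example. It pairs verses that carry the same reference in
    both versions and skips any verse missing from either side.

```python
from typing import Dict, List, Tuple

# Hypothetical key scheme for this sketch: (book, chapter, verse).
VerseKey = Tuple[str, int, int]


def align_by_verse(
    source: Dict[VerseKey, str],
    target: Dict[VerseKey, str],
) -> List[Tuple[VerseKey, str, str]]:
    """Pair verses that share the same reference in both versions.

    Verses present in only one version are dropped, so the result
    contains only fully aligned (key, source_text, target_text) triples,
    in the order the keys appear in `source`.
    """
    return [(key, text, target[key]) for key, text in source.items() if key in target]


# Tiny illustrative sample: two verses in English and Russian, plus one
# English verse with no counterpart in the (deliberately incomplete) target.
english = {
    ("Gen", 1, 1): "In the beginning God created the heaven and the earth.",
    ("Gen", 1, 2): "And the earth was without form, and void; ...",
    ("John", 11, 35): "Jesus wept.",
}
russian = {
    ("Gen", 1, 1): "В начале сотворил Бог небо и землю.",
    ("John", 11, 35): "Иисус прослезился.",
}

for key, en, ru in align_by_verse(english, russian):
    book, chapter, verse = key
    print(f"{book} {chapter}:{verse}\t{en}\t{ru}")
```

    Keying on the verse reference keeps the alignment trivial, which
    is also why it is no finer than a verse: anything at the sentence
    or word level would need a separate alignment step.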

      Philip

      ----------------------------------------------------------------
      Philip Resnik, Assistant Professor
      Department of Linguistics and Institute for Advanced Computer Studies

      1401 Marie Mount Hall            UMIACS phone: (301) 405-6760
      University of Maryland           Linguistics phone: (301) 405-8903
      College Park, MD 20742 USA       Fax: (301) 405-7104
      http://umiacs.umd.edu/~resnik    E-mail: resnik@umiacs.umd.edu


