Re: Corpora: non-alphabetic language databases

From: Thomas Schmidt (thomas.schmidt@uni-hamburg.de)
Date: Thu Nov 30 2000 - 12:59:38 MET


    The Unicode standard is indeed a promising solution for representing
    non-alphabetic characters of any kind. Concerning the original question: I
    don't know much about sign languages, but I wouldn't be surprised if the
    Unicode Consortium has taken or will take these into account. If it hasn't,
    the design of the Unicode standard leaves room for user-defined symbols
    (the Private Use Area), so it should be possible, for instance, to encode
    alphabetic and sign language symbols within one document.
    The Unicode homepage is at

            http://www.unicode.org/
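
    To illustrate the point about user-defined symbols, here is a minimal
    Python sketch. It assumes, purely for illustration, that the code point
    U+E000 (the first in the Basic Multilingual Plane's Private Use Area) has
    been privately agreed to stand for one sign language symbol; the name
    SIGN_HELLO and that assignment are hypothetical and not part of any
    standard.

```python
import unicodedata

# Hypothetical private agreement: U+E000 stands for one sign language symbol.
# Unicode itself assigns no meaning to Private Use Area code points.
SIGN_HELLO = "\ue000"


def is_private_use(ch: str) -> bool:
    """Return True if the character's general category is Co (private use)."""
    return unicodedata.category(ch) == "Co"


# Alphabetic text and the user-defined symbol live in the same string.
text = "hello " + SIGN_HELLO

# The mixed text survives a round trip through UTF-8 unchanged.
encoded = text.encode("utf-8")
decoded = encoded.decode("utf-8")

# Mark which characters are private-use symbols.
flags = [is_private_use(c) for c in decoded]
```

    Anyone exchanging such a document would still need to share the private
    mapping (and a font), since other software cannot know what U+E000 means.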

    -----Original Message-----
    From: Simon G. J. Smith [SMTP:smithsgj@eee.bham.ac.uk]
    Sent: Thursday, 30 November 2000 12:34
    To: corpora@hd.uib.no
    Subject: Re: Corpora: non-alphabetic language databases

    Paula

    Have a look at www.chinesecomputing.com

    Are you a student of one of these languages? Take a look at a website from
    one of the countries, without character-reading software running, and you
    will see that each character shows up as a pair of single-byte
    characters - usually obscure things like ^ or ` and others that are not
    on the QWERTY keyboard at all.

    My understanding is this: the ordering of entries in a database is not
    based on any phonetic system, nor on any arrangement of radicals or
    character components, but on a standard (for Chinese, usually either
    Big-5 or GB (Guobiao)) which maps each character onto an arbitrary pair
    of bytes. With the advent of the Unicode standard, a one-to-one mapping
    is also now possible, but implementations are rare.
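
    As a rough illustration of the mapping described above, the following
    Python sketch encodes one Chinese character under both standards and under
    Unicode. It relies on Python's built-in "gb2312" and "big5" codecs; the
    choice of character and the variable names are my own, and the Latin-1
    view only approximates what a browser without Chinese support would show.

```python
# The character "zhong" (middle), written as an escape to keep this file ASCII.
ch = "\u4e2d"

# Each legacy standard maps the character onto its own pair of bytes.
gb_bytes = ch.encode("gb2312")
big5_bytes = ch.encode("big5")

# Unicode assigns the character a single code point instead.
code_point = ord(ch)

# Without Chinese-aware software, each byte is displayed as a separate
# single-byte character; Latin-1 is used here as a stand-in for that view.
gb_as_latin1 = gb_bytes.decode("latin-1")
big5_as_latin1 = big5_bytes.decode("latin-1")

print(f"GB2312: {gb_bytes.hex()}  seen as {gb_as_latin1!r}")
print(f"Big5:   {big5_bytes.hex()}  seen as {big5_as_latin1!r}")
print(f"Unicode code point: U+{code_point:04X}")
```

    The same character gets a different byte pair under each standard, which
    is why a page must be read with the encoding it was written in.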

    I'm not an expert: perhaps there's one around who would care to add their
    comments?


