[Corpora-List] Second CfP: Pre-Conference Workshop on Multilingual Corpora

From: Silvia Hansen (hansen@CoLi.Uni-SB.DE)
Date: Mon Jan 06 2003 - 12:39:52 MET


          Apologies to those of you who receive this more than once

                            ** CALL FOR PAPERS **

                            Multilingual Corpora:
              Linguistic Requirements and Technical Perspectives

       A pre-conference workshop to be held at
                          Corpus Linguistics 2003

           Lancaster, 27 March 2003

               http://www.comp.lancs.ac.uk/ucrel/cl2003
        http://www.coli.uni-sb.de/mocu03

    ORGANIZED BY:

    Stella Neumann (Department of Applied Linguistics, Translation and
    Interpreting)
    Silvia Hansen (Department of Computational Linguistics)

    Saarland University, Saarbrücken, Germany

    TOPIC AND MOTIVATION:

    How do researchers go about building multilingual corpora? There seem to
    be two methods for developing a linguistically interpreted corpus based on
    more than one language. In the first, the multilingual corpus is split
    into monolingual sub-corpora, which are then annotated independently. In
    the second, one language serves as the basis for building and interpreting
    the multilingual corpus, whereas the other has to be adapted. Both
    methods, however, are rather problematic. They do not take sufficient
    account of the differences and commonalities between the languages in
    question at each stage of corpus-based research: the comparability of the
    corpus design, the different kinds of segmentation, the diverging
    annotation schemes, the corpus representations and, finally, querying
    across different languages, where the strands converge again. Mistakes or
    inconsistencies introduced at one stage of multilingual corpus development
    carry over to the following stages and result in worse mistakes or
    inconsistencies. These problems not only arise at each methodological
    step, they also multiply as the research design grows more complex. If the
    research aims at interpreting linguistic data on several levels,
    cross-linguistic comparability has to be taken into account on each level.

    The goal of the workshop is to bring together researchers who formulate
    specific requirements for working with corpora from a linguistic
    perspective and engineers who can offer technical solutions but need
    input from users to adapt their tools to the needs of linguists. Within
    this context, questions like the following are to be discussed:
    - What happens if the units under investigation diverge on the different
    levels?
    - At present, the preferred solution is to use XML at all stages and on all
    layers. But is this really practicable?
    - Do linguists get along with stand-off mark-up?
    - Is this maybe a technical compromise?

    The workshop should result in a catalogue of requirements combined with
    technical solutions. It could thus serve as a starting point for the
    development of an annotation typology which takes into account different
    languages as well as different annotation layers. On the basis of this
    typology, the comparability of a multilingual multi-layer annotated corpus
    can be guaranteed. With this in mind, a multilingual corpus builder should
    be able to cope with possible problems at each of the steps in corpus
    development explained above.

    Papers are invited on the following topics:
    - linguistic requirements in the different methodological steps
    - state-of-the-art technical solutions
    - international standards which facilitate the development and exchange of
    multilingual corpora

    WORKSHOP PROFILE:
    The workshop will run for a full day and comprise about 8-10 papers.
    Presentations are expected to be short, leaving enough time for discussion
    and assessment of the methodologies used, as well as for developing
    possible solutions. This already points to the workshop agenda: the first
    third will deal with linguistic fundamentals, the second third will
    discuss the technical aspects, and the last third will provide a platform
    for integrating both perspectives. Workshop proceedings will be produced.

    PROGRAMME COMMITTEE:

    Silvia Bernardini, Bologna
    Sabine Brants, Palo Alto
    Andreas Eisele, Saarbrücken
    Stefan Evert, Stuttgart
    Silvia Hansen, Saarbrücken
    Tony Hartley, Leeds
    Natalie Kübler, Paris
    Stella Neumann, Saarbrücken
    Mick O'Donnell, Madrid
    Maeve Olohan, Manchester
    Elke Teich, Saarbrücken
    Spela Vintar, Ljubljana
    Federico Zanettin, Bologna

    SCHEDULE:

    20 January 2003: Deadline for submitted papers
    21 February 2003: Notification of acceptance
    7 March 2003: Camera-ready copy
    27 March 2003: Workshop

    REGISTRATION:

    Please refer to the main conference web page
    (http://www.comp.lancs.ac.uk/ucrel/cl2003) for registration details.

    SUBMISSIONS:

    Please send submissions in English as RTF or plain text files (preferably
    by email) to the address below. Paper length should be 8-10 pages,
    formatted in the same way as for the main conference (see
    http://www.comp.lancs.ac.uk/ucrel/cl2003/style.html for paper format
    guidelines).

    Stella Neumann (st.neumann@mx.uni-saarland.de)
    Department of Applied Linguistics, Translation and Interpreting (FR 4.6)
    Saarland University
    Postfach 15 11 50
    66041 Saarbrücken
    Germany


