Hi there again.
About two weeks ago I posted a query about corpus metadata. I also
promised to post a summary. Thank you very much for the answers (8 in
total); here is the summary.
- Mikko
Here is the original query:
>From mlounela@kotus.fi Mon Jun 24 09:41:28 2002
>Date: Wed, 5 Jun 2002 13:36:14 +0300 (EET DST)
>From: Mikko Lounela <mlounela@kotus.fi>
>To: CORPORA@HD.UIB.NO
>Subject: Corpus metadata
>
>
>Hello everybody.
>
>I am currently trying to figure out what information to include in text
>corpora metadata. At this point, I'm trying to collect references. So, if
>you have any to share, I would be most grateful. Summary will follow.
>
> - Mikko Lounela
Here is a brief summary:
Paul Clough recommended two books:
Corpus Linguistics (1996), Tony McEnery and Andrew Wilson, Edinburgh
Textbooks in Empirical Linguistics, and
Corpus Annotation (1997), Roger Garside, Geoffrey Leech and Tony McEnery,
Longman.
Mickel Grönroos said that the Language Bank of Finland uses a metadata
set that resembles Dublin Core
(<http://www.dublincore.org/documents/1999/07/02/dces/>).
Lou Burnard pointed to the TEI Guidelines
(<http://www.tei-c.org/Guidelines>, in particular chapters 5 and 23).
Manne Miettinen suggested having a look at IMDI and OLAC
(<http://www.mpi.nl/ISLE/index.html>,
<http://www.language-archives.org/>).
Rita Simpson recommended an article by Simpson & Powell in the book
edited by Rita Simpson & John Swales, Corpus Linguistics in North
America: Selections from the 1999 Symposium, 2001, Univ. of Michigan
Press, and another article by Simpson, Lucka & Ovens in the proceedings
volume of TALC 1998, edited by Burnard & McEnery.
Sven Hartrumpf suggested the Corpus Encoding Standard
(<http://www.cs.vassar.edu/CES/>,
esp. <http://www.cs.vassar.edu/CES/CES1-3.html>).
Martin Wynne gave a few pointers: the TEI Guidelines, the BNC
User Reference Guide, section 8
(<http://www.hcu.ox.ac.uk/BNC/World/HTML/cdifhd.html>), and OLAC. He also
mentioned a seminar to be held at the Oxford Text Archive
(<http://www.oucs.ox.ac.uk/ltg/courses/summer/documents/corpora.htm>).
Truus Kruyt recommended Kruyt & Dutilh 1997, available at <www.inl.nl>
under Publications.
Here are all the answers (some in Finnish):
**************************************
From p.clough@dcs.shef.ac.uk Mon Jun 24 09:42:46 2002
Date: Wed, 5 Jun 2002 12:02:05 +0100
From: Paul Clough <p.clough@dcs.shef.ac.uk>
To: Mikko Lounela <mlounela@kotus.fi>
Subject: Re: Corpora: Corpus metadata

Mikko,
Two references for you:
Corpus Linguistics (1996), Tony McEnery and Andrew Wilson, Edinburgh
textbooks in empirical linguistics.
Corpus Annotation (1997), Roger Garside, Geoffrey Leech and Tony McEnery,
Longman.
These both mention meta-linguistic information.
Best,
Paul.
----------------------------------------------------------------------------
Paul Clough
Natural Language Processing Group,
Department of Computer Science,
University of Sheffield,
G35 Regent Court,
211 Portobello Street,
SHEFFIELD,
S1 4DP.
**************************************