English-Norwegian Parallel Corpus

A Research Project



The comparison of languages is of great interest from both a theoretical and an applied perspective. It reveals what is general and what is language specific, and is therefore important both for the understanding of language in general and for the study of the individual languages compared. The analysis has applications within lexicography, language teaching, and translation studies.

Recently there has been a revival of interest in contrastive studies, partially due to the increasing internationalization of society and the growing need for advanced bilingual and multilingual competence. At the same time, linguistics has become increasingly concerned with the study of language in context, with the emergence of fields like text linguistics, discourse analysis, and pragmatics. The time is ripe for text-based contrastive studies.

Text-based contrastive studies can benefit from the progress in computer processing of texts, which has been a major area of research at the Department of British and American Studies, University of Oslo, and the Norwegian Computing Centre for the Humanities, University of Bergen. The present project extends this work to computer processing of parallel texts.


The aim of the project is (1) to compile a parallel corpus of English and Norwegian texts for computer processing; (2) to develop tools for analysing parallel texts; and (3) to carry out studies of the structure and communicative use of the two languages on the basis of the corpus. Areas to be studied include:

Examples of more general questions to be addressed are: To what extent are there parallel differences in text genres across languages? In what respects do translated texts differ from comparable original texts in the same language? Are there any features in common among translated texts in different languages (and, if so, what are these features)?

The aim of studying translated texts is not to reveal translation mistakes, but rather to use the work of translators as a resource for contrastive analysis and the study of translation problems.

The corpus

The parallel corpus is planned as an open text bank and will be expanded as allowed by the resources available. It is intended as a general research tool, available beyond the present project for applied and theoretical linguistic research. There will be two main parts:

A core corpus consisting of original texts and their translations (English to Norwegian and Norwegian to English). Initially, the focus has been on novels and fairly general non-fictional books. In order to include material by a range of translators, the texts of the core corpus are limited to text extracts (chunks of 10,000 words or more). Provided that there is sufficient funding, the amount and variety of text will be increased to include more specialized material, including legal texts. The current size of the corpus (November 1997) is approximately 2.6 million words.

A supplementary corpus containing texts that are not translations but are comparable in genre and text type. The supplementary corpus will serve to control for "translationese" (that is, features typical of translated texts) and, more generally, to increase the amount and variety of the material.
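
As a rough illustration of how the supplementary corpus could act as a control, the sketch below compares the relative frequency of a word in translated texts and in original texts of the same language. The function names, the toy sentences, and the per-1,000-words normalisation are assumptions made for this example; they are not part of the project's own tools.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens (letters only, Unicode-aware)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def relative_frequency(texts, word, per=1000):
    """Return the number of occurrences of `word` per `per` tokens in `texts`."""
    tokens = [tok for text in texts for tok in tokenize(text)]
    if not tokens:
        return 0.0
    return Counter(tokens)[word] * per / len(tokens)


# Hypothetical toy data, invented for illustration only.
translated_english = [
    "However, he did not come.",
    "She was, however, quite sure.",
]
original_english = [
    "He did not come, though.",
    "She was quite sure about it.",
]

freq_translated = relative_frequency(translated_english, "however")
freq_original = relative_frequency(original_english, "however")
print("translated:", freq_translated, "per 1,000 words")
print("original:  ", freq_original, "per 1,000 words")
```

A marked difference between the two figures for comparable genres would point to a possible feature of translated text. With real material, the comparison would need far larger samples and a test of statistical significance.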

Project staff

Stig Johansson, Oslo, project leader (language)
Knut Hofland, Bergen, project leader (programming)
Jarle Ebeling, Oslo, research fellow
Signe Oksefjell, Oslo, research assistant

Associate researchers

Hilde Hasselgård, Oslo
Kay Wikberg, Oslo


The project is carried out in cooperation with a research group at the University of Lund (headed by Bengt Altenberg and Karin Aijmer) and with similar research teams in Belgium, Denmark, Finland, and Germany. The Nordic network "Språk i kontrast"/ "Languages in Contrast" is supported by Nordisk Forskerutdanningsakademi. Through the cooperation with other contrastive teams, the study can be extended to multilingual comparison. There are also important gains in corpus compilation.

Publication of results

The material will be used for theses at the M.A. and doctoral levels and for post-doctoral research. One doctoral thesis and several M.A. theses are in progress. Results from the project will be published in the form of articles and eventually in book form.


Hasselgård, Hilde. Forthcoming. 'Some methodological issues in a
     contrastive study of word order in English and Norwegian'. To
     appear in B. Altenberg and K. Aijmer (eds), Languages in
     Contrast. Papers from a Symposium on Text-based Cross-linguistic
     Studies in Lund, 4-5 March 1994.

Hofland, Knut. 1996. 'A program for aligning English and
     Norwegian sentences'. In S. Hockey, N. Ide, and G.
     Perissinotto (eds), Research in Humanities Computing 5. Oxford: Oxford
     University Press. 165-178.

Johansson, Stig and Knut Hofland. 1993. 'Towards an English-Norwegian
     parallel corpus'. In U. Fries, G. Tottie, and P. Schneider (eds),
     Creating and Using English Language Corpora. Amsterdam: Rodopi. 25-37.

Johansson, Stig, Knut Hofland, and Jarle Ebeling. Forthcoming.
     'Coding and aligning the English-Norwegian parallel corpus'.
     To appear in B. Altenberg and K. Aijmer (eds), Languages in
     Contrast. Papers from a Symposium on Text-based Cross-linguistic
     Studies in Lund, 4-5 March 1994.

Johansson, Stig and Jarle Ebeling. Forthcoming. 'Exploring the
     English-Norwegian parallel corpus'. To appear in the
     Proceedings of the Sixteenth ICAME Conference, Toronto, May

Wikberg, Kay. Forthcoming. 'Using the English-Norwegian parallel
     corpus: Questions in English and Norwegian'. To appear in the 
     Proceedings of the Sixteenth ICAME Conference, Toronto, May


Department of British and American Studies
University of Oslo
P.O. Box 1003, Blindern
N-0315 OSLO

Norwegian Computing Centre for the Humanities
University of Bergen
Harald Hårfagres gt. 31
N-5007 BERGEN

World Wide Web: http://gandalf.aksis.uib.no/enpc/
E-mail: Stig.Johansson@iba.uio.no