Centre for English Corpus Linguistics - CECL

International Corpus of Learner English - ICLE

S. Granger

Table of contents

  1. Some Background
     1.1. For further information contact...
     1.2. Project director
     1.3. Researchers
     1.4. ICLE partners/contributors

  2. Why join the project ?

  3. What ICLE can do for you !

  4. Guidelines for collecting sub-corpus
     4.1. Request students to fill in a learner profile
     4.2. Collect the right type of material
     4.3. Send material to Louvain
     4.4. Corpus format

  5. Suggested essay titles

(( 1. Some background ))


The Louvain Centre for English Corpus Linguistics has played a pioneering role in promoting computer learner corpora (CLC) and was among the first, if not the first, to begin compiling such a corpus. The Centre's computerised databank is known as the International Corpus of Learner English (ICLE). It is the result of over ten years of collaborative activity between a number of universities internationally and currently contains over 3 million words of writing by learners of English from 21 different mother tongue backgrounds. The writing has been contributed by advanced learners of English as a foreign language rather than as a second language. The corpus is made up of 21 distinct sub-corpora, each containing one language variety (E2French, E2German, E2Swedish, etc.). The type of writing being collected is essay writing (see below for fuller details). Advanced students can, for the purpose of the project, be broadly defined as university students of English in their 3rd or 4th year of study. In cases where the comparability of the level is in doubt, sample pieces of writing should be submitted beforehand.

  1.1. For further information contact:

    Centre for English Corpus Linguistics
    Université catholique de Louvain

    Place Blaise Pascal, 1
    B-1348 Louvain-la-Neuve

    Tel : +32 10 474034
    Fax : +32 10 474942

  1.2. Project director

    Sylviane Granger
    Tel : +32 10 474947
    E-mail :

  1.3. Researchers

    Fanny Meunier
    Tel : +32 10 474974
    E-mail :

    Estelle Dagneaux
    Tel : +32 10 474034
    E-mail :

    Magali Paquot
    Tel : +32 10 474034
    E-mail :

    Sylvie De Cock
    Tel : +32 10 474034
    E-mail :

  1.4. ICLE partners/contributors

National team   E-mail address   ICLE website   state of the

Arabic
  Sultan Moulay Slimane University, Morocco
    Jamal Koubali

  Sofia University
    Roumiana Blagoeva

  Catholic University in Sao Paulo (PUC-SP)
  University of Sao Paulo (USP)
    Tony Berber Sardinha
    Stella Tagnin

  Lingnan University of Hong Kong
  Hong Kong Polytechnic University
  The Chinese University of Hong Kong
    Lorrita Yeung
    Gene Adam
    Linda H.F. Lin
    Cameron Smart
    Joseph Hung

  University of Cambridge
    Szilvia Papp (complete)

  Jan Evangelista Purkyne University
    Vladimira Minovska

  Katholieke Universiteit Nijmegen
    Pieter de Haan
    Inge de Mönnink
    Jan Aarts

  Åbo Akademi University
  Växjö University
    Håkan Ringbom
    Tuija Virtanen

  Université catholique de Louvain
    Sylviane Granger
    Estelle Dagneaux
    Sylvie De Cock
    Fanny Meunier
    Stephanie Petch-Tyson

  Universität Augsburg
    Gunter Lorenz

Greek
  Aristotle University of Thessaloniki
    Anna-Maria Hatzitheodorou
    Maria Mattheoudakis

Hungarian
  Hungarian Academy of Sciences, Budapest
    Tamás Váradi (complete)

  Università di Torino
    Maria Teresa Prat-Zagrebelsky

  Showa Women's University
    Tomoko Kaneko

  Vilniaus Universitetas
    Jone Grigaliuniene

  Vytautas Magnus University
    Violeta Kaledaite

  University of Oslo
    Stig Johansson
    Lynell Chvala

Pakistani
  G.C. University Faisalabad, Pakistan
    Muhammad Asim Mahmood
    Rashid Mahmood

  Adam Mickiewicz University (Poznan)
    Przemek Kaszubski

  Escola Superior de Tecnologia de Viseu - Northumbria University
  Universidade Nova de Lisboa
  University of Aveiro
    John McKenny
    David Hardisty
    Georgina Hodges

Romanian
  Albert-Ludwig University of Freiburg (in collaboration with the following universities in Romania: University Transilvania of Brasov, University Alexandru Ioan Cuza of Iasi, University of Craiova, Universitatea Babes-Bolyai of Cluj, University of Oradea)
    Madalina Chitez

  Lomonosov Moscow State University
    Natalya Gvishiani

Slovene
  University of Primorska, Faculty of Humanities Koper
    Neva Cebron

  Universidad Complutense de Madrid
    JoAnne Neff

South African (Setswana)
  Potchefstroom University
    Johann Van Der Walt

  Lund University
  Göteborg University
  Växjö University
    Marie Källkvist
    Bengt Altenberg
    Karin Aijmer
    Tuija Virtanen
    Marie Tapper

  Cukurova University
    Abdurrahman Kilimci
    Cem Can


(( 2. Why join the project ? ))

Collecting the corpus is just the first step. Once the corpus is gathered, there are many opportunities for national and international research activities. At the moment several international teams are running research projects in the areas of interlanguage syntax, lexis and discourse features. A control corpus of native English (LOCNESS, the Louvain Corpus of Native English Essays) has also been gathered so that the ICLE material can be compared with native writing of the same type.

(( 3. What ICLE can do for you ! ))

If you contribute to the ICLE project, you will have access to other learner corpora for comparative studies. It should be stressed, however, that corpora are made available for academic research purposes only. You will have the opportunity to participate in collaborative research projects with other national teams as well as attend our symposia. Three have taken place already (January 1995 in Louvain, August 1996 in Åbo, Finland, and December 1998 in Hong Kong). Many ICLE participants have contributed to a book on learner corpora which was published in 1998 (S. Granger, ed., Learner English on Computer, Addison Wesley Longman). You will also have access to the ICLE tagger (a part-of-speech tagger) developed for the project by the TOSCA team. The TOSCA Research Group is a team of corpus linguists at the University of Nijmegen under the direction of Prof. Jan Aarts. Since the beginning of the 1980s the group has played a leading role in the field of corpus linguistics and has focused its research on the development of Tools for Syntactic Corpus Analysis (hence the acronym TOSCA).

It is also worth mentioning that, apart from the ICLE corpus, we are collecting and working with other types of corpora in Louvain: a spoken learner corpus (LINDSEI: Louvain International Database of Spoken English Interlanguage), a corpus of French as a Foreign Language (FRIDA: French Interlanguage Database) and an English-French bilingual corpus of journalistic texts compiled in collaboration with the University of Poitiers (PLECI: Poitiers-Louvain Echange de Corpus Informatisés).

(( 4. Guidelines for collecting sub-corpus ))

The target size for each sub-corpus is 200,000 words. It may be possible to gather the full 200,000 words in one university. Alternatively, if there is insufficient time or an insufficient number of students, collaboration with other universities (where the same language is spoken) may be sought. In order to reach the 200,000-word target, contributions from at least 200 students are needed, as each student may contribute no more than 1,000 words.
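As a quick check of this arithmetic, the sketch below estimates how many contributors a team needs for a given average essay length. The two constants come from the guidelines. The function name `students_needed` and the idea of planning around an average length are illustrative assumptions, not part of the ICLE procedure.

```python
import math

TARGET_WORDS = 200_000          # target size of one sub-corpus (from the guidelines)
MAX_WORDS_PER_STUDENT = 1_000   # upper limit per contributor (from the guidelines)


def students_needed(avg_words_per_student: int, target: int = TARGET_WORDS) -> int:
    """Return the minimum number of contributors needed to reach the target.

    Hypothetical helper: anything above the per-student limit is capped,
    because a student may contribute at most 1,000 words.
    """
    if avg_words_per_student <= 0:
        raise ValueError("average essay length must be positive")
    effective = min(avg_words_per_student, MAX_WORDS_PER_STUDENT)
    return math.ceil(target / effective)


print(students_needed(1_000))  # 200
print(students_needed(500))    # 400
```

In other words, 200 contributors is a lower bound that only holds if every essay reaches the 1,000-word limit. With essays at the 500-word minimum, twice as many students are needed.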

  4.1. Request students to fill in a learner profile

The ICLE learner profile has been created to provide researchers with information about contributors, so that meaningful conclusions can be drawn when the corpus is analysed. Using the profile, it will be possible both to draw general conclusions about advanced learner writing and to examine subsets of learners, e.g. Spanish mother tongue learners, learners who speak some English at home, or learners for whom German is the second language and English the third. It will also be possible to examine more sociolinguistic aspects, such as male/female comparisons. If the corpus is used as a basis for developing specifically adapted teaching tools, the potential advantages of this facility are clear.

  4.2. Collect the right type of material

The corpus will consist entirely of essay writing. Two types of essay writing are useful:

    • Argumentative essay writing

      Using titles such as the ones below :

         - "Crime does not pay"
         - "Feminism has done more harm to the cause of women than good"
         - "Pollution : a silent conspiracy"
         - ...

These essays may be written by students in their own time (untimed), using language reference tools (dictionaries, grammars, etc.), but they should be entirely the students' own work, i.e. students should not draw on articles or books for the essay and should not ask a native speaker of English for help. Alternatively, they may be written under examination conditions.

Descriptive, narrative or technical subjects are not useful for the corpus. For this reason, the following types of titles should be avoided if possible:

      - "The joys of the English countryside"
      - "The British Electoral System" (prefer a topic such as "The British Electoral System is no guarantee of democracy")
      - "My year in America"
      - "The position of the adverb in journalistic English"


    • Literature examination paper

These are in some ways easier to collect, but it should be remembered that they must be accompanied by relevant learner profiles. Literature examination papers should not amount to more than 25% of each national corpus.

Essays can be completed at home (untimed), should be at least 500 words long (up to 1,000), and should be handed in on disk (plain text). This reduces the time spent typing up student essays and minimises the risk of introducing errors into the text. Work should be entirely the students' own and no help should be sought from third parties, but students may use reference tools such as dictionaries and grammar books (use of reference tools should be indicated on the learner profile questionnaire).

 Important note : the essays should be at least 500 words long (up to 1,000).
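A team that receives essays as plain text could screen them against this length requirement before sending them on. The sketch below is only an illustration: the names `count_words` and `length_status` are hypothetical, and counting whitespace-separated tokens is a simplifying assumption, since the guidelines do not say how words are counted.

```python
MIN_WORDS = 500     # minimum essay length (from the guidelines)
MAX_WORDS = 1_000   # maximum essay length (from the guidelines)


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (assumed definition of a word)."""
    return len(text.split())


def length_status(text: str) -> str:
    """Classify an essay as 'too short', 'ok' or 'too long'."""
    n = count_words(text)
    if n < MIN_WORDS:
        return "too short"
    if n > MAX_WORDS:
        return "too long"
    return "ok"
```

For example, `length_status(essay_text)` returns "too short" for a 300-word essay, which could then be returned to the student rather than added to the sub-corpus.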

  4.3. Send material to Louvain

The essays should be handed in on disk (ideally text format, plus an original paper copy if the original is a manuscript). Learner profiles should also be sent together with the disk.

  4.4. Corpus format

    • Corpus editing

      - Leave all the spelling mistakes made by students. If you do not receive the essays in electronic form from the students, take care not to introduce spelling mistakes when keying in the data.
      - References in the text to authors, e.g. (Granger, 1995), have to be removed and replaced by the <R> symbol (R for reference).
      - Quotes have to be removed and replaced by the <*> symbol, except if they are very short and/or integrated in the sentence (e.g. as subject).
      - Illegible words have to be enclosed in angle brackets and question marks : <?sorglub?> (or simply <?>).

      N.B. We do not remove titles of books, journals, pop songs, etc., because they generally form part of the sentence and removing them would cause problems for the tagging.
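Part of this editing can be semi-automated. The sketch below covers only two of the conventions above: replacing author-year references with <R> and wrapping an illegible word. It assumes that references appear in parentheses in the form "(Name, year)". The regular expression and the names `replace_references` and `mark_illegible` are illustrative assumptions, not an ICLE tool. Quotes are left alone, because deciding whether a quote is short or integrated in the sentence needs human judgement.

```python
import re

# Assumed citation shape: "(Granger, 1995)", "(Granger and Tyson, 1996)",
# "(Granger, 1995; Meunier, 1998)". Other styles are not recognised.
CITATION = re.compile(r"\([A-Z][^()]*?,\s*\d{4}[a-z]?\)")


def replace_references(text: str) -> str:
    """Replace parenthesised author-year references with the <R> symbol."""
    return CITATION.sub("<R>", text)


def mark_illegible(guess: str = "") -> str:
    """Wrap a guessed reading of an illegible word, or return <?> if there is none."""
    return f"<?{guess}?>" if guess else "<?>"


sample = "Learner corpora are usefull (Granger, 1995) for teachers."
print(replace_references(sample))
# Learner corpora are usefull <R> for teachers.
```

Note that the misspelling "usefull" in the sample is left untouched, in line with the first rule. The output of any such script should still be checked by hand, since a pattern this simple can miss references or match ordinary parentheses.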

    • Corpus files

      Please create a subdirectory for each subcorpus and within this subdirectory put only 1 essay per file. Use the following system :

      a:\frucl1\0001.cor
      fr-  French mother tongue
      ucl-  from the UCL
      1  1st batch
      0001.cor  1st essay
      0002.cor  2nd essay
      0003.cor  3rd essay
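The naming system can be generated rather than typed by hand. The sketch below builds a path from the mother tongue code, the institution code, the batch number and the essay number, following the example above. The function name `essay_path`, the default drive and the 1 to 9999 range check are assumptions made for illustration.

```python
from pathlib import PureWindowsPath


def essay_path(language: str, institution: str, batch: int, essay_no: int,
               drive: str = "a:/") -> PureWindowsPath:
    """Build a path such as a:\\frucl1\\0001.cor (hypothetical helper).

    language    -- mother tongue code, e.g. "fr"
    institution -- institution code, e.g. "ucl"
    batch       -- batch number, e.g. 1 for the 1st batch
    essay_no    -- running essay number, written with four digits
    """
    if not 1 <= essay_no <= 9999:
        raise ValueError("essay number must fit in four digits")
    subdirectory = f"{language}{institution}{batch}"
    return PureWindowsPath(drive) / subdirectory / f"{essay_no:04d}.cor"


print(essay_path("fr", "ucl", 1, 1))  # a:\frucl1\0001.cor
print(essay_path("fr", "ucl", 1, 3))  # a:\frucl1\0003.cor
```

`PureWindowsPath` only manipulates the path as text, so the sketch runs on any operating system and does not create files.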

(( 5. Suggested essay titles ))

  1. Crime does not pay.
  2. The prison system is outdated. No civilised society should punish its criminals : it should rehabilitate them.
  3. Most university degrees are theoretical and do not prepare students for the real world. They are therefore of very little value.
  4. A man/woman's financial reward should be commensurate with their contribution to the society they live in.
  5. The role of censorship in Western society.
  6. Marx once said that religion was the opium of the masses. If he was alive at the end of the 20th century, he would replace religion with television.
  7. All armies should consist entirely of professional soldiers : there is no value in a system of military service.
  8. The Gulf War has shown us that it is still a great thing to fight for one's country.
  9. Feminists have done more harm to the cause of women than good.
  10. In his novel Animal Farm, George Orwell wrote "All men are equal : but some are more equal than others". How true is this today?
  11. In the words of the old song "Money is the root of all evil".
  12. Europe.
  13. In the 19th century, Victor Hugo said : "How sad it is to think that nature is calling out but humanity refuses to pay heed." Do you think it is still true nowadays ?
  14. Some people say that in our modern world, dominated by science, technology and industrialisation, there is no longer a place for dreaming and imagination. What is your opinion ?