

Centre for English Corpus Linguistics - CECL


The Louvain International Database of Spoken English Interlanguage - LINDSEI

(Last updated: December 2009)





1. Some background




In 1990, a project was launched at the University of Louvain in Belgium to compile a corpus of written learner language. The International Corpus of Learner English (ICLE) now contains over 3 million words of argumentative essay writing from university students of English from 21 different language backgrounds and is being used as a research tool for analysing features of written interlanguage grammar, lexis and discourse. For a bibliography of publications relating to ICLE and associated research click here.      

In 1995, a complementary project was launched in Louvain to compile a corpus of spoken learner language, the Louvain International Database of Spoken English Interlanguage (LINDSEI). The first component of LINDSEI contains transcripts of 50 interviews (30 female subjects, 20 male subjects) with French mother-tongue learners of English (c. 100,000 words of learner language). Research has already begun into the phraseology of this type of interlanguage (see list of publications on learner corpora: De Cock 1996, De Cock et al. 1998, De Cock 1998, De Cock 2000). A number of other components are currently being compiled for different mother-tongue backgrounds. Alongside these non-native varieties of English, a comparable corpus of interviews with native speakers of English has been compiled, so that interlanguage and native language can be compared and the universal and L1-specific features of oral interlanguage identified. The corpus needs to be extended still further, and we hope to attract more researchers working with students from different mother-tongue backgrounds to join the LINDSEI project.


Project director:
Prof. Sylviane Granger
Current project coordinator:
Dr. Gaëtanelle Gilquin (2006 - )
Past project coordinators:
Dr. Sylvie De Cock (1995 - 2006)
Stephanie Petch-Tyson (1995 - 2003)
The Louvain LINDSEI team:
Dr. Sylvie De Cock
Claire Hugon

LINDSEI partners/contributors:  



National team

E-mail address

State of the

Basque: Universidad del País Vasco (UPV / EHU) - University of Sheffield

Regina Weinert
María Basterrechea Lonzano
María del Pilar García Mayo  


Sofia University

Roumiana Blagoeva



South China Normal University, Guangzhou

He Anping



Universiteit Gent

Anne-Marie Vandenbergen
David Chan



Université catholique
de Louvain

Sylviane Granger
Sylvie De Cock
Stephanie Petch-Tyson



Justus-Liebig-Universitaet, Giessen

Joybrato Mukherjee
Christiane Brand
Sandra Goetz
Susanne Kaemmerer



Hellenic Air Force Academy

Ourania Hatzidaki



Universita' degli Studi di Torino

Virginia Pulcini



Showa Women’s University

Tomoko Kaneko


Norwegian: Hedmark University College

Susan Nacey


Adam Mickiewicz University, Poznan

Joanna Jendryczka



Universidad Autonoma de Madrid

Universidad de Murcia

Jesus Romero Trillo
Maria Fernandez
Pascual Perez-Paredes



Göteborg University

Karin Aijmer
Viktoria Börjesson



Turkish: Çukurova University

Abdurrahman Kilimci





2. What does compiling a spoken learner corpus involve?




  1. The data

 A key objective of the ICLE and LINDSEI projects has been to collect comparable data. It is only by examining data of a similar text type collected from similar subjects in similar environments that true comparisons can be made.

 Data for LINDSEI has been collected according to a specific format, details of which are given below.

    • The subjects are all non-native third/fourth year university students of English.
    • Subjects are "recruited" by the researcher, i.e. they know they are taking part in a project to collect interlanguage data and that their conversation is being recorded.
    • Interviews last about 15 minutes.
    • At the beginning of the "interview", subjects are requested to choose a topic and talk about it for a few minutes (they have a few minutes before the conversation to plan what they are going to say). They are requested not to make any written notes.
    • Once they have spoken for a short while, the researcher starts to become involved, first by asking questions related to what the subject has spoken about, then about more general topics (such as life at university, hobbies, what the subject hopes to do after university, etc.).         
    • Just before the interview ends, the subject is asked to look at four pictures which make up a short story. They are asked to look at the pictures and then to retell the story.
    • All subjects are asked to fill in a questionnaire which gives details about the learner (age, number of years of English, other languages, etc.) and about the interviewer.

    Researchers who are interested in becoming involved in the LINDSEI project will be expected to collect data which follows this format. It will also be necessary to have access to good quality recording equipment (although lab-type equipment with pause-measuring facilities has not so far been used).      

  2. Encoding the data

 Once the data has been recorded, it needs to be transcribed. LINDSEI will not contain phonological transcription, but even orthographic transcription of spoken data is a very time-consuming process: one 15-minute interview can take up to about five hours in total to transcribe (initial transcription, going over difficult passages and re-checking), so 50 interviews may take up to 250 hours.
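For teams planning a sub-corpus, the workload figures quoted above can be reproduced with a few lines of Python. This is only a rough planning sketch: the function names are hypothetical, and the default values (15 minutes of recording, up to five hours of transcription per interview, 50 interviews) are simply the upper-bound figures given on this page, not measured averages.

```python
# Rough transcription-workload estimate (illustrative; names are hypothetical).
# Defaults are the upper-bound figures quoted in the text above.

def transcription_hours(interviews: int = 50, hours_per_interview: float = 5.0) -> float:
    """Total transcription time, in hours, for a sub-corpus."""
    return interviews * hours_per_interview


def transcription_ratio(interview_minutes: float = 15.0, hours_per_interview: float = 5.0) -> float:
    """Minutes of transcription work needed per minute of recorded speech."""
    return (hours_per_interview * 60.0) / interview_minutes


if __name__ == "__main__":
    print(f"Total for 50 interviews: up to {transcription_hours():.0f} hours")
    print(f"Transcription time per minute of speech: up to {transcription_ratio():.0f} minutes")
```

A team whose interviews run longer or shorter than 15 minutes can pass its own values to get an adjusted estimate.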

Guidelines for transcription have been drawn up for the project and it is important that these are strictly adhered to.





3. Research based on LINDSEI




In view of the difficulties associated with compiling a spoken learner corpus, one might be tempted to ask "Why bother?". The attraction of this type of research is that it is relatively recent, so there is scope for exciting new research into a wide range of features of oral interlanguage. The advantages of a collaborative project are obvious: instead of many people doing different things on a small scale, it is possible to carry out research on comparable data on a large scale across a variety of language backgrounds and thus produce meaningful results.

For references of studies based on LINDSEI, click here





4. What do I get in return for contributing to LINDSEI?



  If you are interested in being involved in LINDSEI, you should be able, in the first instance, to compile a sub-corpus of 50 interviews. If you do not have access to this number of students, you could ask researchers from other universities with students from the same language background to contribute. We hope to enlarge these sub-corpora afterwards, but given the time taken to compile this type of corpus, 50 interviews seems to be a realistic first step.           

Once you have compiled your corpus, you will have access to other interlanguage sub-corpora.





5. Interested?




If you require additional information, contact Gaëtanelle Gilquin or Sylviane Granger, or write to us at:

    The Centre for English Corpus Linguistics
    Université catholique de Louvain
    Collège Erasme
    Place Blaise Pascal, 1
    B-1348 Louvain-la-Neuve