Graduate School of Language Technology in Finland

Kieliteknologian valtakunnallinen tutkijakoulu - Språkteknologiska forskarskolan i Finland

Course in Soft Computing: An introduction to R as a statistical programming environment for the analysis of quantitative linguistic data (1-2 credits)


Harald Baayen (Interfaculty Research Unit for Language and Speech, University of Nijmegen & Max Planck Institute for Psycholinguistics, Nijmegen)


Department of General Linguistics
Siltavuorenpenger 20 A


13-17 December 2004

Course description

R is an open-source implementation of S, a language and environment for data analysis originally developed at Bell Laboratories. R is the focus of this course for three reasons: it is an elegant object-oriented system with excellent graphical facilities; it offers a single, consistent syntax for specifying statistical models, whatever type of model is being fitted; and it is a programming language in which new ideas, or analyses tailored to specific data sets, are easy to implement.

The morning sessions of this course will consist of plenary presentations in which I will introduce R and a selection of techniques that are especially useful for the analysis of quantitative linguistic data. The afternoons will be practical sessions in which participants gain hands-on experience applying these techniques to full-scale, real linguistic data sets. The course has two main emphases: learning to use graphical tools to explore the quantitative structure of linguistic data sets, and learning which statistical tools suit a given data set and how to apply them in R.

This 5-day course is structured as follows:

Day 1: Introduction

Day 2: Multivariate analysis

Day 3: Classification methods

Day 4: Regression

Day 5: Advanced regression

The standard reference for the S language is Becker et al. (1988). Highly recommended books on statistical modeling in R (S) are Chambers and Hastie (1992) and Venables and Ripley (1994). An introduction to statistics in R is Dalgaard (2002).

Preparatory work

Browse through the R home page. If you have a particular data set on which you would like advice, bring it along to the course (on CD-ROM or floppy disk). The data should preferably be in tab-delimited ASCII text format.


