Using small and large corpora to investigate pragmatic markers in Irish English

Goodith White

University of Leeds, United Kingdom


The paper will explore the differences in occurrences of the pragmatic markers ‘like’, ‘so’ and ‘now’ in a small (31,000 word) corpus of spoken English collected in Cork and Kerry and a much larger (1 million word) corpus, ICE-Ireland, collected at roughly the same time. The larger corpus included examples of spoken language from both the north and south of Ireland, and a wider range of spoken genres. In the Cork/Kerry corpus, ‘like’ and ‘so’ in clause-final position and ‘now’ and ‘so’ in clause-initial position appear to serve functions of topic management and hedging/politeness which do not occur in the ICE-GB corpus and may therefore be a distinctive feature of Irish English. A preliminary analysis of the ICE-Ireland corpus suggests that the markers are not confined to spontaneous conversation but are also found in more formal genres such as lectures, and that they are found over a wide geographical area but are possibly more frequent in the speech of informants in the south-west of Ireland. In comparing results from different corpora, the problems of corpus size, design and reliability will be discussed. There may be advantages as well as disadvantages connected to smaller corpora such as the Cork/Kerry one, in that such a corpus has often been collected by one person who may have insights and contextual knowledge about the circumstances in which the language was produced. Larger corpora tend to provide the researcher with more limited information about context. Distinctive features may be easier to spot in a smaller corpus and can then be tested in the larger corpus. The two corpora in this study have been collected using the same methodology and for the same aim, i.e. to establish what is standard usage in Irish English, and a very small amount of the Cork/Kerry corpus has been incorporated into the ICE-Ireland corpus. Using the two corpora in tandem enables the researcher to combine intuition with mass observation of linguistic features, with the smaller corpus (the ‘micro’) providing clues and hints for investigation of the larger corpus (the ‘macro’).

