The C-value Method for the Comparison of Male and Female Politicians’ Use of Collocations

Frantzi, Katerina T.

University of the Aegean, Dept. of Mediterranean Studies, Greece


The paper presents a study on the extraction and comparison of the collocations used by male and female politicians in the Hellenic Parliament. Political discourse analysis is attracting growing research interest (Wilson, 2001; Tzampouras, 2005; Christidis, 1999), as is the study of language and gender (Tsokalidoy et al., 2007). We address the following research questions: Do male and female politicians use collocations to the same degree? Do female politicians use the same types of collocations as male politicians? Do female politicians prefer specific words where male politicians prefer others? We attempt to answer these questions using corpus linguistics, which, by applying automatic processing to real texts, offers precise, complete and fast extraction of linguistic information in a way that cannot be achieved by traditional (manual) means (Sinclair, 2004).

We have constructed a corpus of Hellenic Parliamentary Discourse, obtaining the text material from the Hellenic Parliament web page (…). The corpus is organized so that, for each research question, the appropriate subset of texts can be selected. It is accompanied by a database that stores information about the texts and their authors (text size, date of creation, author, author’s gender, affiliation, age, political profile, etc.). The corpus is dynamic and is continuously updated with new data.

We apply C-value to the corpus to extract the collocations used by male and female Greek politicians. C-value is a language- and domain-independent measure, originally proposed for the automatic identification of multi-word terms from special-language corpora and of collocations from both general and special-language corpora. As a collocation extraction tool it relies purely on statistical information, whereas as a term extraction tool it is a hybrid measure combining linguistic and statistical information. What distinguishes C-value from frequency of occurrence and similar measures is its handling of “nested” collocations, i.e. independent collocations that also occur within longer collocations, which it neither overestimates nor underestimates (Frantzi et al., 2000). We present lists of the collocations used by male and female Greek politicians, giving the first answers to the research questions set out above.
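The measure itself is not spelled out above. As defined in Frantzi et al. (2000), for a candidate string a of |a| words with frequency f(a), C-value(a) = log2|a| · f(a) when a is not nested in any longer candidate, and C-value(a) = log2|a| · (f(a) − (1/P(T_a)) · Σ_{b∈T_a} f(b)) otherwise, where T_a is the set of longer candidates that contain a and P(T_a) is their number. The Python sketch below evaluates this formula directly on gender-filtered texts. It is a minimal illustration under stated assumptions, not the paper’s implementation: whitespace tokenization, word n-grams of 2 to max_len words as candidates, a frequency threshold min_freq, and a Speech record with a single gender field standing in for the corpus database. All names (Speech, ngram_counts, is_nested, c_value, collocations_by_gender, min_freq, max_len) are hypothetical, and the formula is evaluated pairwise (quadratic in the number of candidates) rather than with the incremental longest-first algorithm of the original paper.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Speech:
    """A corpus text with one metadata field (hypothetical stand-in for the database)."""
    text: str
    gender: str  # hypothetical coding, e.g. "F" or "M"


def ngram_counts(texts, min_len=2, max_len=4):
    """Count contiguous word n-grams of min_len..max_len words (whitespace tokens)."""
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()
        for n in range(min_len, max_len + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def is_nested(a, b):
    """True if n-gram a occurs as a contiguous part of a strictly longer n-gram b."""
    n = len(a)
    return n < len(b) and any(b[i:i + n] == a for i in range(len(b) - n + 1))


def c_value(candidates):
    """Score each candidate with the C-value formula, evaluated pairwise."""
    scores = {}
    for a, f_a in candidates.items():
        # Frequencies of the candidates in T_a (longer candidates containing a).
        outer = [f_b for b, f_b in candidates.items() if is_nested(a, b)]
        if outer:
            # Subtract the mean frequency of the containing candidates.
            adjusted = f_a - sum(outer) / len(outer)
        else:
            adjusted = f_a
        # log2|a| is 0 for single words, hence candidates start at 2 words.
        scores[a] = math.log2(len(a)) * adjusted
    return scores


def collocations_by_gender(corpus, gender, min_freq=2, max_len=4, top=20):
    """Rank collocations by C-value in the texts of speakers of one gender."""
    texts = [s.text for s in corpus if s.gender == gender]
    counts = ngram_counts(texts, max_len=max_len)
    # Frequency threshold applied before scoring (hypothetical default).
    candidates = {g: f for g, f in counts.items() if f >= min_freq}
    ranked = sorted(c_value(candidates).items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top]


if __name__ == "__main__":
    corpus = [
        Speech("the prime minister spoke", "F"),
        Speech("the prime minister left", "F"),
        Speech("prime minister today", "F"),
        Speech("we will vote against the bill", "M"),
        Speech("we will vote for the bill", "M"),
    ]
    for gender in ("F", "M"):
        for gram, score in collocations_by_gender(corpus, gender, max_len=3):
            print(gender, " ".join(gram), round(score, 3))
```

On these toy speeches, “the prime” scores 0 because every one of its occurrences lies inside “the prime minister”, while “prime minister” keeps a score of 1.0 for its single independent occurrence. Ranking by raw frequency would instead place “prime minister” (3 occurrences) first and give “the prime” the same count as “the prime minister”.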

