Frequencies, quality and quantity – how best to analyse null subjects in English

Wagner, Susanne

University of Freiburg, Germany


While it is well known that (pronominal) subjects can be omitted not only in pro-drop languages such as Russian or Italian, but also in certain registers of English (e.g. diaries), non-overt subjects in casual spoken English have not (yet) received any substantial attention in the literature. One of the reasons for this is certainly the overall low frequency of null subjects in colloquial English, averaging a mere 5%.

Those studies that include data from English have generally reduced the number of overt subjects to a more manageable total by using an extraction procedure that creates an artificial distribution of one null subject to two overt subjects (e.g. Harvie 1998; Leroux & Jarmasz 2006). While this greatly simplifies extraction and coding, the side effects of such a procedure are immediate and should be carefully assessed.
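One such side effect can be illustrated with a minimal sketch. The counts below are hypothetical round figures (10,000 tokens, 5% of them null), and the random sampling in `downsample_overt` is an assumption made for illustration; the cited studies may have selected their overt tokens differently. The sketch only shows how a 1-to-2 extraction changes the apparent rate of null subjects.

```python
import random


def downsample_overt(tokens, overt_per_null=2, seed=0):
    """Keep every null token plus a random sample of overt tokens.

    The sample size is overt_per_null times the number of null tokens.
    Random sampling is an assumption, not the procedure of any cited study.
    """
    rng = random.Random(seed)
    nulls = [t for t in tokens if t == "null"]
    overts = [t for t in tokens if t == "overt"]
    keep = min(len(overts), overt_per_null * len(nulls))
    return nulls + rng.sample(overts, keep)


def null_rate(tokens):
    """Proportion of tokens that are null subjects."""
    return sum(t == "null" for t in tokens) / len(tokens)


# Hypothetical data set: 500 null and 9,500 overt subjects (5% null).
full = ["null"] * 500 + ["overt"] * 9500
sampled = downsample_overt(full)

print(f"full data:    {len(full):>6} tokens, null rate {null_rate(full):.3f}")
print(f"1:2 sample:   {len(sampled):>6} tokens, null rate {null_rate(sampled):.3f}")
```

Under these assumptions the null rate rises from 0.050 in the full data to 0.333 in the sample, and 8,500 overt tokens are left out of the analysis, so any statistic that depends on the overall rate is affected by the extraction procedure itself.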

This paper presents results from a study of all first person null and overt subjects in a corpus of 280,000 words of conversational Canadian English collected in 2006, some 10,000 tokens overall. A number of current statistical tools (primarily GoldVarb X and SPSS 13) were used to analyse the influence of 15 different (socio)linguistic factor groups on the dependent variable.

Clear results can be obtained in several areas. For example, the presence/absence of pronouns is not solely dependent on factors previously discussed in the literature on pro-drop languages. Rather, features such as VP length, which are known to have an impact on subject realisation in first language acquisition (cf. e.g. Bloom 1990), should also be taken into consideration. It can be shown that complexity in one part of the sentence, such as a complex multi-word/multi-morpheme verb phrase, favours simplicity, such as the use of a null subject, in another part. Moreover, the data show possible persistence effects to be at work, with one null subject triggering another in the next subject slot. It can also be shown that certain (groups of) lexical items favour deletion.
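The persistence effect mentioned above can be checked descriptively by comparing the rate of null subjects after a null subject with the rate after an overt one. The following is a minimal sketch; the function name `persistence_rates` and the toy sequence are invented for illustration and are not drawn from the corpus.

```python
def persistence_rates(sequence):
    """Return (rate of null after null, rate of null after overt).

    `sequence` lists subject slots in order, each "null" or "overt".
    """
    # For each preceding form: [count of following nulls, count of all following slots]
    after = {"null": [0, 0], "overt": [0, 0]}
    for prev, cur in zip(sequence, sequence[1:]):
        after[prev][1] += 1
        if cur == "null":
            after[prev][0] += 1

    def rate(form):
        nulls, total = after[form]
        return nulls / total if total else float("nan")

    return rate("null"), rate("overt")


# Invented toy sequence of twelve consecutive subject slots.
toy = ["overt", "overt", "null", "null", "null", "overt",
       "overt", "overt", "overt", "null", "null", "overt"]

after_null, after_overt = persistence_rates(toy)
print(f"null after null:  {after_null:.2f}")
print(f"null after overt: {after_overt:.2f}")
```

In this toy sequence a null subject follows a null subject in 3 of 5 cases (0.60) but follows an overt subject in only 2 of 6 cases (0.33). A difference in this direction is what a persistence effect would predict, although a descriptive comparison of this kind does not control for other factors.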

Statistical regression analyses also show, however, that GoldVarb in particular might not be the ideal tool when handling data with a 5% – 95% distribution. Results indicate interaction effects between factor groups which cannot be accounted for outside of GoldVarb, and low cell counts are a general problem. In addition, the proposed factors only account for about 20% of the overall variation encountered in the data, suggesting that the current explanatory value of the hypothesis is not very high. Given the ratio of variation overall, however, this result should not be underestimated.


Bloom, Paul. 1990. Subjectless sentences in child language. Linguistic Inquiry 21: 491–504.

Harvie, Dawn. 1998. Null subject in English: Wonder if it exists? Cahiers Linguistiques d'Ottawa 26: 15–25.

Leroux, Martine and Lidia-Gabriela Jarmasz. 2006. A study about nothing: null subjects as a diagnostic of convergence between English and French. University of Pennsylvania Working Papers in Linguistics 12 (2): 1–14.

Session: Paper session
Various Topics
Saturday, April 5, 2008, 11:00-12:30
room: 12