Michael Hoey, chief adviser on the Macmillan Dictionary of English for Advanced Learners, considers the consequences of changes in lexicography.

The following article is reproduced with the kind permission of English Teaching professional magazine.

This article reviews some recent advances in our understanding of the way words work and then looks closely at entries drawn from leading pre-corpus dictionaries and from the latest corpus-based dictionary. This comparison will reveal the scale of the improvements that have been made by modern lexicographical methods.

A revolution in lexicography

In the late 1970s, when the last major pre-corpus dictionaries were developed, dictionary publishers had just started using computers to help compile the works (although no electronic corpus was used). Thirty-five years on, the practice of lexicography has undergone a revolution. Dictionaries published by Cambridge, Collins COBUILD, Longman, Macmillan and Oxford have established the importance of computer corpora as the bases of all lexicographical decisions, whether such decisions concern the way a word might be defined or illustrated, the idioms in which it participates, or its right to be included in the dictionary at all. As computers have been used in ever more sophisticated ways, it has been possible to ensure that entries properly cross-refer, that examples are truly representative and that thesaural information is supplied. Definitional style has become more relaxed and usage notes have begun to appear alongside the definitions. From the point of view of the user, dictionary pages have been transformed; the average page is differently constructed and makes use of an increasingly wide range of typographical conventions.

This new generation of dictionaries is not perfect, however: entries can be wordy, and users may sometimes feel they are being told something solely because the lexicographer had a moment of discovery.

Innovations

The Macmillan Dictionary of English for Advanced Learners represents the (temporary) culmination of this process. It brings together the best practice of both those lexicographical traditions and adds to these a number of significant innovations of its own, particularly with regard to the provision of vital language learning opportunities (thesaural, collocational and grammatical) at the point of the user’s consultation of the dictionary. Just as 20th-century lexicography was in some senses defined and shaped by the appearance at the beginning of the last century of the monumental Oxford English Dictionary, so, more modestly, the 21st-century dictionary is forecast in and will be measured against the MDEAL. This is far from saying that the MDEAL is perfect, still less that it will not be bettered; one would suppose that the second edition would introduce further improvements, even if its competitors in the field were not to. But it does attempt to represent current best practice and to reflect the current state of knowledge about vocabulary and language.

What’s in a word?

There are five questions that linguists (and learners) need to ask about any word. These are:
1 What does the word mean?
2 What words does it associate with?
3 What meanings does it associate with?
4 What grammatical functions does it associate with?
5 What positions in the text does the word favour?

The first two of these questions are perhaps the most familiar. All mainstream dictionaries have concerned themselves with the meanings of words since dictionary-making began and, with regard to question 2, it has been recognised for some years that a proper description of any word will include an account of the lexical company the word keeps, its collocations. So, for example, the word consequence likes to co-occur with serious and the word consequences likes to co-occur with economic. (Notice straight away one complication that collocation brings to the dictionary entry: the collocations of the singular form of a word may be different from those of its plural form.)
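As a rough illustration of why singular and plural forms need separate treatment, the sketch below counts the word immediately to the left of each form. The function name `left_collocates` and the sample text are invented for illustration; real collocation studies work on large corpora and usually apply a statistical measure of association rather than raw counts.

```python
import re
from collections import Counter


def left_collocates(text, node):
    """Count the word immediately preceding each occurrence of `node`."""
    # Lower-case word tokens; hyphenated words such as "long-term" stay whole.
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    counts = Counter()
    for previous, word in zip(tokens, tokens[1:]):
        if word == node:
            counts[previous] += 1
    return counts


# Invented toy text standing in for a corpus (an assumption, not real data).
SAMPLE = (
    "The delay was a serious consequence of the strike. "
    "Ministers feared the economic consequences of the vote. "
    "A serious consequence followed. "
    "The economic consequences were severe, and the political consequences lasted."
)

# The two forms are counted separately because their collocates can differ.
singular = left_collocates(SAMPLE, "consequence")
plural = left_collocates(SAMPLE, "consequences")

print(singular.most_common(3))
print(plural.most_common(3))
```

Treating consequence and consequences as two separate search terms, rather than lemmatising them into one, is what lets the difference between the forms show up at all.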

Collocation and idiom

I am not going to say much about collocation in this article, except to note that there are probably no words in the English language (other than the most common grammatical items, and even this exception is arguable) that do not have their special preferences with regard to the words they like to occur with. Collocations merge into idioms, and there is probably no principled way of drawing the borderline between words that collocate and words that form an idiom (though it is easy enough to find examples which are clearly one or the other). Idioms are collocations where the meaning of the combination is not predictable from the separate meanings of the parts. Collocations and idioms are of the greatest importance to the language learner; one of the things that distinguishes an advanced learner’s language from that of a native speaker is that advanced learners often manifest grammatical correctness but collocational inappropriateness.

Semantic association

Of similar importance to collocation, though less well described, is what we can term semantic association, the phenomenon associated with question 3 above. (This is variously referred to in the linguistic literature as semantic prosody, semantic preference and semantic association. Since semantic prosody is often used to mean something rather different and I want to talk about a word having a preference for an association – which semantic preference will not allow me to do comfortably – I will use the term semantic association here.) We can define semantic association as the tendency of a word to keep company with a semantic set or class; some members of this set or class will usually be collocates.

As a detailed example of the way semantic association works, let us look at the word consequence. Consequence has semantic associations with concepts of logic (perhaps unsurprisingly since its core meaning is that of a logical relation), with (un)expectedness, with negative evaluation and with markers of (in)significance:

logic: unavoidable, inevitable, inexorable, inescapable, ineluctable, direct, ultimate, long-term, immediate
(un)expectedness: likely, possible, probable, natural, unintended, odd, strange, planned-for
negative evaluation: awful, dire, appalling, sad
significance: serious, important, dramatic, enduring, prominent

It will be seen that many of the adjectives in these semantic categories are also collocates of consequence. Importantly, though, they are not all so. I have no evidence in my own 100-million-word corpus of Guardian newspaper text and British National Corpus extracts to suggest that ineluctable, planned-for or prominent are collocates. So the notion of semantic association is needed in addition to collocation; it cannot be subsumed within it.

In my corpus, only nine per cent of adjectives occurring with consequence do not fit into one of the above semantic categories. This means that any combination of adjective and consequence falling within the nine per cent will stand out a little. That may well be a very good thing if the speaker or writer is trying to be creative with the language, but a learner needs to know when something sounds natural and when it sounds unusual.
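A figure of this kind can be approximated by assigning each adjective found before consequence to a semantic set and measuring the share left over. The sketch below is a minimal version under stated assumptions: the sets contain only the example adjectives listed in this article, the `observed` list is invented, and the helper names `classify` and `share_outside` are hypothetical. It is not the procedure behind the nine per cent figure.

```python
# Semantic sets taken from the adjective lists given in the article.
SEMANTIC_SETS = {
    "logic": {
        "unavoidable", "inevitable", "inexorable", "inescapable",
        "ineluctable", "direct", "ultimate", "long-term", "immediate",
    },
    "(un)expectedness": {
        "likely", "possible", "probable", "natural",
        "unintended", "odd", "strange", "planned-for",
    },
    "negative evaluation": {"awful", "dire", "appalling", "sad"},
    "significance": {"serious", "important", "dramatic", "enduring", "prominent"},
}


def classify(adjective):
    """Return the label of the first set containing the adjective, or None."""
    for label, members in SEMANTIC_SETS.items():
        if adjective in members:
            return label
    return None


def share_outside(adjectives):
    """Proportion of adjectives that fit none of the semantic sets."""
    if not adjectives:
        return 0.0
    outside = [a for a in adjectives if classify(a) is None]
    return len(outside) / len(adjectives)


# Invented list of adjectives observed before "consequence" (toy data).
observed = [
    "inevitable", "serious", "dire", "likely", "direct",
    "natural", "sad", "important", "purple", "unintended",
]

print(classify("dire"))
print(share_outside(observed))
```

In practice the sets would have to be much larger and built by inspecting concordance lines, since membership of a semantic set is a judgement and not something the corpus supplies.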

As with collocation, there are differences in the behaviour of the word depending on whether it is singular or plural. The word consequences, i.e. the plural form, shares a semantic association with adjectives of logic and negative evaluation (though it is more commonly associated with the latter than the former) but has in addition a semantic association with domain:

domain: biological, constitutional, economic, political, cultural

Colligation

Collocation and semantic association both have immediate implications for meaning. Other patterns that a word participates in, however, may have less obvious semantic implications but still be important in the use of the word. Question 4 in the list above asks what grammatical associations, or colligations, a word may have. The question needs asking because sometimes a word’s relationship is with a particular grammatical feature. As an example of a word’s colligations, let us look again at the word consequence; we find that it has a very low likelihood of appearing as the object of a clause (i.e. following an action or possession verb) unlike other abstract nouns such as preference or use. We do not (perhaps surprisingly) encounter many examples of sentences like the following: ‘Unfortunately it also had this tragic consequence that the baby became grossly bloated’, whereas sentences like ‘The homeless are asked if they have a preference’ and ‘The minister called on schools to make more use of the colleges’ vocational experience …’ are very common.

Consequence occurs as (part of) the object of a clause only four per cent of the time, whereas preference and use both occur in this grammatical position over a third of the time. On the other hand, consequence occurs as (part of) the complement (i.e. following the verb be or a closely related verb) much more often than is normal for abstract nouns. In fact it occurs in this grammatical position almost a quarter of the time, whereas preference and use occur with this function in less than one in 14 clauses. So a sentence on the pattern of ‘It is the natural consequence of a deep recession …’ is extremely common, but sentences such as ‘The main one was his preference for force’ or ‘This is an improper use of executive power’ are very much the exception rather than the rule. The aversion of consequence for occurring as an object is an example of negative colligation; its liking for complement position is an example of positive colligation. Colligations are particularly important to learners of the language because they explain why it is that a learner may feel he or she knows a word and yet produce a sentence that is grammatical but ‘not English’.

Textual colligation

We have not exhausted the colligations of consequence. One of the most obvious observations one can make about it is that it has a great liking for appearing in prepositional phrases that serve as Adjuncts in their clauses; the vast majority take the forms as a consequence or in consequence (or variants of these). These phrases in turn like to appear at the beginning of a sentence, much more so than prepositional phrases normally do. This preference for appearing at the beginning of sentences is an example of an answer to question 5 – what positions in a text does a word favour? The positional preferences of a word can be referred to as its textual colligations. The word consequence likes to occur in sentence-initial position, whether as part of Adjunct or as part of Subject. It occurs in such a position almost exactly 50 per cent of the time. Compare this with the word consequencia in Portuguese. Although in consequence translates exactly as em consequencia and as a consequence translates exactly as como consequencia, neither Portuguese phrase has any tendency to appear in sentence-initial position. We have here, therefore, a new class of ‘false friends’ – words or phrases that translate exactly but have different positional preferences. The same goes of course for words that mean the same but have differing collocations, semantic associations or colligations. Just as we saw a difference between consequence and consequences in respect of their collocations, so also there is a difference between the singular and the plural in respect of textual colligation. Consequences shows no preference for sentence-initial position but does like to occur in the first sentence of paragraphs!
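A positional preference of this sort can be estimated, very crudely, by checking how often a word falls within the first few tokens of its sentence. In the sketch below the function name `sentence_initial_share`, the three-token window and the sample text are all assumptions made for illustration. A proper study would identify the first Adjunct or Subject grammatically instead of counting tokens, and would use a real corpus.

```python
import re


def sentence_initial_share(text, node, window=3):
    """Share of occurrences of `node` among the first `window` tokens of a sentence."""
    # Naive sentence split on whitespace that follows ., ! or ?
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    total = 0
    initial = 0
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+", sentence.lower())
        for position, token in enumerate(tokens):
            if token == node:
                total += 1
                if position < window:
                    initial += 1
    return initial / total if total else 0.0


# Invented toy text (an assumption, not corpus data).
SAMPLE = (
    "As a consequence, the plan failed. "
    "In consequence, prices rose. "
    "The committee ignored this consequence. "
    "It was a natural consequence of the recession."
)

share = sentence_initial_share(SAMPLE, "consequence")
print(share)
```

A window of three tokens is just wide enough to catch as a consequence, in consequence and a short Subject such as the consequence; a wider window would begin to count occurrences that are not sentence-initial in any useful sense.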

What else is in a word?

All of this suggests that to know a word is to know a great deal more than its meaning. I would suggest that to know a word such as consequence is, at the very least, to know the answers to the following nine questions about the word:
1 What does consequence mean? (meaning)
2 How is consequence used grammatically? (grammar)
3 How is consequence pronounced? (pronunciation)
4 Are there restrictions on the use of consequence? (context and genre)
5 What are the collocations of consequence? (collocation)
6 What idioms and semi-fixed expressions does consequence appear in? (idiom)
7 What meanings does consequence associate with? (semantic association)
8 What grammatical constructions does consequence like to appear in or with? (colligation)
9 What positions in a text does consequence like to appear in? (textual colligation)

With these questions in mind, let us compare entries from pre-corpus advanced learners’ dictionaries and the most recent major corpus-based advanced learners’ dictionary, to see the difference that a corpus, and corresponding advances in lexicology and lexicography, can make.

Pre-corpus dictionaries

The typical pre-corpus dictionary would answer questions 1–4, giving the meaning, providing brief information on the count/uncount status of the noun, saying how the word is pronounced, and perhaps noting that in one of its uses, to convey importance, it is used in formal contexts. It might also provide information on the use of the word in phrases such as in consequence and as a consequence. Without corpus input, however, the information could be misleading. For example, one leading pre-corpus dictionary notes the expression in consequence (of). In fact, in consequence is rarely followed by of, particularly in first position in the clause; if you need to add of, you use as a consequence in the great majority of cases. The example it gives is, therefore, not representative of normal usage in English. Other than the phrases mentioned, the typical pre-corpus dictionary entry provides no information on the most common collocations of consequence, nor does it identify any of the idioms or semi-fixed expressions in which it participates. There is, moreover, rarely any clue given about the different behaviour of the singular and plural forms. There is the same absence of information concerning the word’s preferences for semantic association. No indication is generally given of any of the semantic associations of consequence. So we are not told, for example, that consequence in its meaning of ‘result’ tends to be logical or negative, nor, perhaps more importantly, that consequence in its meaning of ‘importance’ is almost always associated with denial. Typically, therefore, we say that something is of little consequence or of no consequence, but many pre-corpus dictionaries would appear to license a learner to talk of something having great consequence, a usage which is not supported by corpus data and might well be understood as reference to a result. Not unexpectedly there is in the typical pre-corpus dictionary no clue as to the colligations of consequence.

A corpus-based dictionary

Let us now look at the equivalent entry from a contemporary corpus-based dictionary, MDEAL, and ask two questions. How well does it reflect the features discussed above? To what extent is the dictionary out of date as soon as it is published? The answer to both questions is encouraging, as we shall see.

The first and most obvious point about this entry is that it is much longer than a pre-corpus dictionary entry would be. As with pre-corpus dictionaries, this entry provides information on meaning, pronunciation, basic grammar and context; like its predecessors, it notes formality for several uses. However, the entry is much fuller on the other aspects of the word. On collocation it is extremely strong. There is a separate section devoted to collocations (rather coyly referred to as ‘Words frequently used with consequence’). In addition, in the entry itself a number of key collocates are mentioned next to bullet marks. In the case of these the differences between singular and plural are clearly marked – each of these lists is attached to the plural noun. (The entry is not entirely consistent, though, in that all the verbs listed in the separate section also belong with the plural form, which is not stated explicitly; however, accept and face are also listed in the main entry and here it is indicated that they go with the plural form only.) Idioms are identified – face the consequences, for example – that were generally overlooked in pre-corpus dictionaries and definitions are given. The lists, both in the separate section and in the main entry, also serve the function of indicating the semantic associations of the word. Thus it is possible to infer that consequence and consequences have a semantic association with negative evaluation and that the latter word also has associations with domain adjectives (economic, etc) and with reaction verbs (accept, face, etc). Likewise with the ‘importance’ meaning of consequence, there can be no doubting its association with denial, though the entry may go a mite too far here in implying that of no consequence and of little consequence are idioms and therefore the only acceptable uses available for this meaning; my corpus suggests that this is an oversimplification. 
Nevertheless this way of representing the ‘importance’ meaning is an improvement on its predecessors’ in that the learner will not be tempted to construct unintelligible utterances – better a little too much restriction than serious overextension of the word’s use. Colligations are harder to represent in a standard dictionary entry without making the entry unwieldy and/or unreadable. However, the liking of consequence for Subject and Adjunct functions is well illustrated, as is the liking of consequences for Object function. The absence of examples of consequence in the Object function is compatible with its aversion for such a function. Only with regard to the preference of consequence for Complement function does the entry fall short. Textual colligations are even harder to illustrate in an entry, and there is for very obvious reasons no allusion to the tendency of consequences to appear at the beginning of paragraphs. But the tendency of consequence to come in initial position in sentence and clause is indicated by its being in such a position in both the examples of singular consequence.

So the newer corpus-based dictionary does reflect new work on vocabulary. It is strong on collocation, idioms and semantic association and good on colligation and textual colligation, though the dictionary reader has to read the entry carefully to get the full value of its reflection of the latter two categories. No doubt future dictionaries may be more explicit about some of this information, the way that the MDEAL entry is explicit about collocations and idioms, though it is doubtful whether any paper dictionary will ever have the space to illustrate textual colligation properly. Dictionaries and lexicologists are, I conclude, keeping pace with each other; indeed, in some respects, dictionaries are probably ahead of the descriptive linguist. It does not much matter whether dictionaries are ahead of lexicologists or the other way round; only descriptive linguists will ever care. It does, however, matter very much that a dictionary should be ahead of the learners at whom it is targeted. Teachers now have an urgent task to train learners to get the best out of the cornucopia of information contained in a corpus-based dictionary. The learner has to realise that while bilingual dictionaries give the meanings and sometimes list a few idioms, the new monolingual advanced learners’ dictionaries provide information essential to skilled use of the language. In short, we have a new generation of dictionaries; what we now need is a new generation of learners to use them.