How We Talk by N. J. Enfield

This book offers a thought-provoking introduction to the mechanics of conversation. It makes a forceful case for the importance of timing, repair, and the procedural utterances we use to construct successful conversations. Enfield methodically unveils the hidden and unconscious scaffolding we use as we talk to each other. Who knew "Huh?" was so important an element of spoken language? The book is well laid out and easy to read.

A small gripe is that the academic rigor of laying bare all the supporting points sometimes gets in the way of the fascinating insights from the research. If you are at all interested in the field of conversation design, it is well worth the read.

Published: Nov 2017
ISBN: 978-0465059942

Reading notes:

A thread running through the book is that standard linguistic theory focuses only on written forms of language, ruling out many of the elements of real conversation, with all its errors, corrections and collaborative flow controls. Enfield tasks himself with correcting this.

“Linguistic theory is concerned primarily with an ideal speaker-listener…”

Conversation Has Rules

The second chapter goes well beyond the concept of 'politeness' often used to describe the rules of conversation.

“By definition, joint action introduces rights and duties. As Gilbert says … each person (in a conversation) has a right to the other’s attention and corrective action. Each person has a moral duty to ensure they are doing their part.”  

Entering into a conversation is entering a contract for collaborative joint action, with well-defined rules and roles. The contract lays out many corrective measures to repair and realign a conversation if one party believes the other has strayed. I thought the examples of how people correct each other were fascinating. They border on questions of control and power in the relationship between those talking.

This contract, under which we allow others to correct us, could become a profound point of conflict for conversational interfaces between humans and AI once AI is able to join a turn-taking conversation.

Split-Second Timing

The third chapter looks at the importance of timing. It slowly builds a picture that timing is not just an end product of the thought process but is used to convey meaning and help us structure the conversational flow.

A study of Dutch found that 40% of turn-taking transitions in conversation occurred within a 200ms window on either side of zero, with 85% occurring within 750ms of zero. Similar results were found in studies of English and German.

The time to form a conversational response, “from intention to articulation”:
175ms             Retrieve concepts
75ms              Concepts to words
80ms              Words to sounds (phonological codes)
125ms             Forming the sounds into syllables
145ms             Executing the motor program to pronounce the words
600ms             Total

If the typical turn-taking gap in English is 200ms and a response takes about 600ms to form, then people must start forming their response well before the last speaker has finished. A measurable percentage of turns overlap the end of the previous one, although the total time during which two people are speaking at once is relatively small at 3.8%, meaning even overlaps are well-timed.
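To make that arithmetic concrete, here is a rough sketch in Python using the stage timings listed above. The names (STAGES_MS, planning_head_start_ms) are my own, not from the book, and the sketch assumes the stages run strictly one after another, which is a simplification.

```python
# Stage durations in milliseconds, taken from the reading notes above.
# Assumption: the stages run one after another with no overlap.
STAGES_MS = {
    "retrieve concepts": 175,
    "concepts to words": 75,
    "words to sounds": 80,
    "sounds to syllables": 125,
    "motor execution": 145,
}


def planning_head_start_ms(gap_ms: int) -> int:
    """Return how long before the current turn ends a reply must start
    forming if it is to be spoken gap_ms after that turn ends."""
    return sum(STAGES_MS.values()) - gap_ms


total_ms = sum(STAGES_MS.values())
print(total_ms)                     # 600
print(planning_head_start_ms(200))  # 400
```

On these numbers, a reply that lands 200ms after the previous turn has to be under way roughly 400ms before that turn ends.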

“In their 1974 paper on the rules of turn-taking, Sacks and colleagues identified this ability to tell in advance when a current speaker would finish, referring to the skill as projection.”

Among other signals, we use both pitch and the length of the last syllable to signal to others that the end of a turn is coming, i.e., prosody.

“…study shows that the signals for turn ending combine several features of sound of utterances, as well as the grammatical structure of the utterance. … the fact that grammar alone cannot be sufficient”

The One-Second Window

The preferred and dispreferred responses to the last turn have different time gaps. A preferred response to the question “Can you come out tonight?” would be “yes” or “no” and would be answered promptly. A dispreferred response could be “I don’t know, I will have to check my calendar.” These dispreferred responses are typically in the late zone, at about 750ms.

“In nearly half the dispreferred responses, the first sound one hears is not a word at all, but an inbreath (or click, that is a “tut” or “tsk” sound).”

“…people are now able to manipulate timing to send social signals about how a response is being packaged.”  

“…also see “well” and “um” playing a role in packaging and postponing certain kinds of response in conversation.”

The Traffic Signals

The chapter covers the use of “um” and “uh” in great detail. It concludes that these little words are part of the conversational machine, allowing us to signal a brief delay and to forestall handing over the turn when a gap runs longer than usual. These signals are very frequent, with men using them about once every 50 words and women about once every 80 words.

“In written language, the reader does not directly witness the act of production.”

Transcripts are often cleaned up, removing inevitable problems with the choice of words, pronunciation, and content. With conversation, this process of production is visible.

“The use of these little traffic signals such as “uh/um”, “uh-huh” and “okay” all illustrate ways in which bits of language are used for regulating language use itself.”

They are the procedural directional instructions of conversation. They form part of the joint commitment to a conversation.


On average, we need to repair an informal conversation every 84 seconds. These repairs are a normal part of our conversations, and we have ways of signaling a correction, much like we do for a small delay. Our ability to do this is what keeps a conversation on track and moving at speed.

“Hardly a minute goes by without some kind of hitch: a mishearing, wrong word, poor phrasing, a name not recognized.”
