This summer, a computer almost passed the Turing intelligence test. Dan Falk examines the narrowing gap between humans and machines.
Will this summer be remembered as a turning point in the story of man versus machine? On June 23, with little fanfare, a computer program came within a hair’s breadth of passing the Turing test, a kind of parlour game for evaluating machine intelligence devised by mathematician Alan Turing more than 60 years ago.
This wasn’t as dramatic as Skynet becoming self-aware in the Terminator films, or HAL killing off his human crew mates in 2001: A Space Odyssey. But it was still a sign that machines are getting better at the art of talking – something that comes naturally to humans, but has always been a formidable challenge for computers.
Turing proposed the test – he called it “the imitation game” – in a 1950 paper titled “Computing machinery and intelligence”.
Back then, computers were very simple machines, and the field known as Artificial Intelligence (AI) was in its infancy. But already scientists and philosophers were wondering where the new technology would lead. In particular, could a machine “think”?
Turing considered that question meaningless, so he proposed the imitation game as a way of sidestepping it.
Better, he argued, to focus on what the computer can actually do: can it talk? Can it hold a conversation well enough to pass for human? If so, Turing argued, we may as well grant that the machine is, at some level, intelligent.
In a Turing test, judges converse by text with unseen entities, which may be either human or artificial. (Turing imagined using teletype; today it’s done with chat software.) Each judge must determine, based on a five-minute conversation, whether the correspondent is a person or a machine.
Turing speculated that by 2000, “an average interrogator will not have more than a 70 per cent chance of making the right identification” – that is, computers would trick the judges 30 per cent of the time. For years, his prediction failed to come true, as software systems couldn’t match wits with their human interrogators. But in June, they came awfully close.
The event in question, billed as a “Turing test marathon”, was organised by the University of Reading as part of the centenary celebrations of the mathematician’s birth. It was held, appropriately enough, at Bletchley Park in Buckinghamshire, where he played a key role in cracking the Enigma code as part of the Allied code-breaking effort. I joined 29 other judges in chatting electronically with 25 “hidden humans” (ensconced in an adjacent room) and five sophisticated “chatbots” – computer programs designed to imitate human intelligence and conversational ability.
Altogether, some 150 separate conversations were held. The winning program, developed by a Russian team, was called “Eugene”.
Attempting to emulate the personality of a 13-year-old boy, Eugene fooled the judges 29.2 per cent of the time, just a smidgen below Turing’s 30 per cent threshold.
As a judge, I got a first-hand look at the strengths and weaknesses of the test. First of all, there’s the five-minute time limit – an arbitrary figure mentioned by Turing in his paper. The shorter the conversation, the greater the computer’s advantage; the longer the interrogation, the higher the probability that the computer will give itself away – typically by changing the subject for no reason, or by not being able to answer a question. The 30 per cent mark, too, is arbitrary.
But what about the nature of the test itself? Traditionally, language has been seen as the ultimate hallmark of intelligence, which is why Turing chose to focus on it. Yet while it may be our most impressive cognitive tool, it is certainly not the only one. In fact, what gives our species its edge may be the sheer variety of skills we have at our disposal, rather than our proficiency at any one task. “Human intelligence,” says Manuela Veloso, a computer scientist at Carnegie Mellon University, “has to do with the breadth of things that we can do.”
Not that we would necessarily want a machine that could “do it all”. Aside from being staggeringly ambitious, the idea of building an all-purpose robot – an “artificial human” – has never been a useful approach to AI, not least because it would simply replicate our own impressive capabilities.
Instead, the greatest progress has come when AI is applied to very specific tasks, such as the satellite navigation system in your car, the apps on your iPhone, or the search engines that pull needles out of the Internet’s haystack. Even AI’s most widely publicised achievements – the chess-playing skills of the computer Deep Blue, or the quiz knowledge of an IBM supercomputer called Watson, which last year triumphed in the American TV show Jeopardy! – are very narrow indeed. (Watson can answer difficult trivia questions with impressive skill, but it can’t do your taxes, fold your laundry, or make you a cup of tea.)