For artificial intelligence to thrive, it must explain itself

Article from The Economist magazine.

Science fiction is littered with examples of intelligent computers, from HAL 9000 in “2001: A Space Odyssey” to Eddie in “The Hitchhiker’s Guide to the Galaxy”. One thing such fictional machines have in common is a tendency to go wrong, to the detriment of the characters in the story. HAL murders most of the crew of a mission to Jupiter. Eddie obsesses about trivia, and thus puts the spacecraft he is in charge of in danger of destruction. In both cases, an attempt to build something useful and helpful has created a monster.

Successful science fiction necessarily plays on real hopes and fears. In the 1960s and 1970s, when HAL and Eddie were dreamed up, attempts to create artificial intelligence (AI) were floundering, so both hope and fear were hypothetical. But that has changed. The invention of deep learning, a technique which uses special computer programs called neural networks to churn through large volumes of data looking for and remembering patterns, means that technology which gives a good impression of being intelligent is spreading rapidly. Applications range from speech-to-text transcription to detecting early signs of blindness. AI now runs quality control in factories and cooling systems in data centres. Governments hope to employ it to recognise terrorist propaganda sites and remove them from the web. And it is central to attempts to develop self-driving vehicles. Of the ten most valuable quoted companies in the world, seven say they have plans to put deep-learning-based AI at the heart of their operations.

Real AI is nowhere near as advanced as its usual portrayal in fiction. It certainly lacks the apparently conscious motivation of the sci-fi stuff. But it does turn both hope and fear into matters for the present day, rather than an indeterminate future. And many worry that even today’s “AI-lite” has the capacity to morph into a monster. The fear is not so much of devices that stop obeying instructions and instead follow their own agenda, but rather of something that does what it is told (or, at least, attempts to do so), but does it in a way that is incomprehensible.

The reason for this fear is that deep-learning programs do their learning by rearranging their digital innards in response to patterns they spot in the data they are digesting. Specifically, they emulate the way neuroscientists think that real brains learn things, by changing within themselves the strengths of the connections between bits of computer code that are designed to behave like neurons. This means that even the designer of a neural network cannot know, once that network has been trained, exactly how it is doing what it does. Permitting such agents to run critical infrastructure or to make medical decisions therefore means trusting people’s lives to pieces of equipment whose operation no one truly understands.
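To make the idea of "changing the strengths of connections" concrete, here is a minimal sketch of a single artificial neuron learning the logical OR of two inputs. It is an illustration only, not the code of any system mentioned in this article; the names `train_neuron`, `predict` and `OR_EXAMPLES`, the learning rate and the number of passes are all assumptions chosen for the example.

```python
import math


def sigmoid(x):
    """Squash any number into the range 0..1."""
    return 1.0 / (1.0 + math.exp(-x))


def predict(weights, bias, inputs):
    """Output of one artificial neuron: a weighted sum passed through a sigmoid."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train_neuron(examples, epochs=2000, learning_rate=0.5):
    """Learn by repeatedly nudging connection strengths to reduce the error."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            error = predict(weights, bias, inputs) - target
            # Each connection is strengthened or weakened in proportion
            # to how much its input contributed to the error.
            weights = [w - learning_rate * error * x for w, x in zip(weights, inputs)]
            bias -= learning_rate * error
    return weights, bias


# Hypothetical training data: the logical OR of two inputs.
OR_EXAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

weights, bias = train_neuron(OR_EXAMPLES)
print(weights, bias)
```

What the trained neuron "knows" is stored entirely in the printed numbers. With two connections they can still be read by eye; with millions, they offer no human-readable account of why a given answer came out.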

If, however, AI agents could somehow explain why they did what they did, trust would increase and those agents would become more useful. And if things were to go wrong, an agent’s own explanation of its actions would make the subsequent inquiry far easier. Even as they acted up, both HAL and Eddie were able to explain their actions. Indeed, this was a crucial part of the plots of the stories they featured in. At a simpler level, such powers of self-explanation are something software engineers would like to emulate in real AI.

Open the box

One of the first formal research programmes to attempt to crack open the AI “black box” is the Explainable AI (XAI) project, which is being run by the Defence Advanced Research Projects Agency (DARPA), an organisation that does much of America’s military research. In particular, America’s armed forces would like to use AI to help with reconnaissance. Dave Gunning, XAI’s head, observes that monitoring places like North Korea from on high, by spy plane or satellite, creates a huge amount of data. Analysts looking at these data would certainly value something that alerted them automatically to suspicious activity. It would, though, also be valuable if such an agent could explain its decisions, so that the person being alerted was able to spot and ignore the inevitable false positives. Mr Gunning says that analysts from one of America’s spy agencies, the NSA, are already overwhelmed by the recommendations of old-fashioned pattern-recognition software pressing them to examine certain pieces of information. As AI adds to that deluge, it is more important than ever that computer programs should be able to explain why they are calling something to a human operator’s attention.

How the NSA is responding to this is, understandably, a secret. But civilian programmes are also trying to give neural networks the power to explain themselves by communicating their internal states in ways that human beings can comprehend. Trevor Darrell’s AI research group at the University of California, Berkeley, for example, has been working with software trained to recognise different species of birds in photographs. Instead of merely identifying, say, a Western Grebe, the software also explains that it thinks the image in question shows a Western Grebe because the bird in it has a long white neck, a pointy yellow beak and red eyes. The program does this by drawing on the assistance of a second neural network which has been trained to match the internal features of the agent doing the recognising (ie, the pattern of connections between its “neurons”) with sentences that people have written, describing what they see in a picture being examined. So, as one AI system learns to classify birds, the other learns simultaneously to classify the behaviour of the first system, in order to explain how that system has reached its decisions.

A team led by Mark Riedl at the Georgia Institute of Technology has employed a similar technique to encourage a game-playing AI to explain its moves. The team asked people to narrate their own experiences of playing an arcade game called Frogger. They then trained an AI agent to match these narratives to the internal features of a second agent that had already learned to play Frogger. The result is a system which provides snippets of human language that describe the way the second agent is playing the game.

Such ways of opening the black box of AI work up to a point. But they can go only as far as a human being can, since they are, in essence, aping human explanations. Because people can understand the intricacies of pictures of birds and arcade video games, and put them into words, so can machines that copy human methods. But the energy supply of a large data centre or the state of someone’s health are far harder for a human being to analyse and describe. AI already outperforms people at such tasks, so human explanations are not available to act as models.

Fortunately, other ways exist to examine and understand an AI’s output. Anupam Datta, a computer scientist at Carnegie Mellon University, in Pittsburgh, for instance, is not attempting to peer inside the black box directly, in the ways that Dr Darrell and Dr Riedl are. Rather, he is trying to do so obliquely, by “stress-testing” the outputs of trained systems—for example, those assessing job candidates.

Dr Datta feeds the system under test a range of input data and examines its output for dodgy, potentially harmful or discriminatory results. He gives the example of a removals firm that uses an automated system to hire new employees. The system might take a candidate’s age, sex, weightlifting ability, marital status and education, as described in the application, as its inputs, and churn out a score which indicates how likely that candidate is to be a good employee.

Clearly one of these pieces of information, the ability to lift heavy things, is both pertinent and likely to favour male candidates. So in this case, to test the system for bias against females, Dr Datta’s program alters randomly selected applications from women to make them appear to be from men and, separately, swaps the weightlifting abilities of female applicants, again at random, with those of applicants from both sexes. If the randomisation of sex produces no change in the number of women offered jobs by the AI, but randomising weightlifting ability increases it (because some women now appear to have “male” abilities to lift weights), then it is clear that weightlifting ability itself, not an applicant’s sex, is affecting the hiring process.
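The test just described can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Dr Datta’s actual program: the applicant records, the field names `sex` and `lift_kg`, the 50kg threshold and the stand-in model `lifting_only_model` are all hypothetical.

```python
import random


def offers_to(indices, applicants, model):
    """Count how many of the applicants at the given positions the model would hire."""
    return sum(model(applicants[i]) for i in indices)


def stress_test(applicants, model, seed=0):
    """Offers made to women: at baseline, with sex disguised, with lifting randomised."""
    rng = random.Random(seed)
    women = [i for i, a in enumerate(applicants) if a["sex"] == "F"]
    baseline = offers_to(women, applicants, model)

    # Intervention 1: make half of the women, chosen at random, appear to be men.
    disguised = [dict(a) for a in applicants]
    for i in rng.sample(women, k=len(women) // 2):
        disguised[i]["sex"] = "M"
    after_sex = offers_to(women, disguised, model)

    # Intervention 2: give each woman the lifting ability of a randomly
    # chosen applicant of either sex, leaving her stated sex unchanged.
    shuffled = [dict(a) for a in applicants]
    for i in women:
        shuffled[i]["lift_kg"] = rng.choice(applicants)["lift_kg"]
    after_lift = offers_to(women, shuffled, model)

    return baseline, after_sex, after_lift


def lifting_only_model(applicant):
    """Hypothetical black box that, unknown to the tester, looks only at lifting ability."""
    return 1 if applicant["lift_kg"] >= 50 else 0


# Hypothetical applicant pool, invented for this example.
APPLICANTS = [
    {"sex": "F", "lift_kg": 30}, {"sex": "F", "lift_kg": 35},
    {"sex": "F", "lift_kg": 40}, {"sex": "F", "lift_kg": 45},
    {"sex": "M", "lift_kg": 60}, {"sex": "M", "lift_kg": 65},
    {"sex": "M", "lift_kg": 70}, {"sex": "M", "lift_kg": 75},
]

baseline, after_sex, after_lift = stress_test(APPLICANTS, lifting_only_model)
print(baseline, after_sex, after_lift)
```

If the second number equals the first while the third is larger, the model is responding to lifting ability rather than to sex. A model that looked at sex directly would show the opposite pattern: disguising women as men would raise the number of offers, while randomising lifting ability would change nothing.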

Dr Datta’s approach does not get to the heart of how and why agents are making decisions, but, like stress testing an aircraft, it helps stop undesirable outcomes. It lets those who make and operate AI ensure they are basing decisions on the right inputs, and not harmful spurious correlations. And there are other ways still of trying to peer into machines’ minds. Some engineers, for example, are turning to the techniques of disciplines such as cognitive psychology, which human beings use to understand their own minds. They argue that, since artificial neural networks are supposed to work like brains, it makes sense to employ the tools of human psychology to investigate them.

One example of such an approach is research by DeepMind, an AI firm in London that is owned by Google’s parent company, Alphabet. This has yielded an intriguing insight into the behaviour of a piece of image-matching software the company has designed. A group of DeepMind’s engineers, led by David Barrett, showed the software sets of three images. The first of each set was a “probe” image of a certain shape and colour. Of the other two, one matched the probe in shape and the second matched it in colour. By measuring how often the system chose the shape match as opposed to the colour match, Dr Barrett and his team were able to deduce that DeepMind’s image matcher equates images in the way that people do—that is, according to shape rather than colour. Elucidating in this way the broader principles of how a particular AI makes decisions might be useful when preparing it for deployment in the world. It might also help accident investigators, by directing them towards the most likely sorts of explanation for a failure.
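The measurement at the core of that experiment is a simple proportion, sketched below. This is not DeepMind’s code: the function `shape_bias`, the stand-in `toy_matcher` and the representation of each image as a (shape, colour) pair are assumptions made for illustration. A real experiment would also vary the order in which the two candidates are shown.

```python
def shape_bias(matcher, trials):
    """Fraction of trials in which the matcher picks the shape match over the colour match."""
    shape_picks = 0
    for probe, shape_match, colour_match in trials:
        # The matcher returns the position of its chosen candidate: 0 or 1.
        if matcher(probe, [shape_match, colour_match]) == 0:
            shape_picks += 1
    return shape_picks / len(trials)


def toy_matcher(probe, candidates):
    """Hypothetical system under test; here it happens to match on shape."""
    for index, (shape, _colour) in enumerate(candidates):
        if shape == probe[0]:
            return index
    return 0


# Each trial: a probe, a candidate sharing its shape, a candidate sharing its colour.
TRIALS = [
    (("circle", "red"), ("circle", "blue"), ("square", "red")),
    (("square", "green"), ("square", "red"), ("triangle", "green")),
    (("triangle", "blue"), ("triangle", "green"), ("circle", "blue")),
]

print(shape_bias(toy_matcher, TRIALS))
```

A value near 1 indicates that the system groups images by shape, a value near 0 that it groups them by colour, and a value near 0.5 that it has no consistent preference.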

Those inclined to try to crack open the “minds” behind AI thus have many ways of doing so. Some people, however, think this whole approach wrongheaded. They observe that those decisions made by AI which are hardest to scrutinise are necessarily the most complex and thus likely to be the most useful. Easy-to-parse tasks, like playing video games and naming birds, are of limited value. Decisions made while balancing an electrical grid or managing a city’s traffic flow are harder to explain, especially as many of them are taken at levels beyond human processing capabilities. Yoshua Bengio, a computer scientist at the University of Montreal, calls this kind of processing artificial intuition.

Dr Bengio says such artificial intuition was on display during the most public demonstration of deep learning that has ever taken place. This was a Go match held in 2016 between an AI agent and Lee Sedol, one of the world’s strongest human players. The agent in question, AlphaGo, was trained by DeepMind. It sometimes made unexpected moves that human experts could not explain. At first those moves appeared to be errors. But AlphaGo then used the surprising position thus generated to dominate the rest of the match.

Intriguingly, moves like these are also sometimes made by human Go masters. They are known in Japanese as kami no itte (“the hand of God”, or “divine moves”). As the name suggests, a player who feels a move is divinely directed in this way usually cannot say how or why he placed a certain stone where he did. Indeed, the fact that players cannot explain the reasoning behind their best moves offers a hint as to why old-style Go-playing computers, based on formal logic, were never any good. Neural learning systems, both those that have evolved in brains and those now being put into computers, can handle the task of playing Go. But human language cannot describe it.

Pandora’s box?

There is, though, a crucial difference between the explanations that humans offer up for their own behaviour, and those available from machines. As Dan Sperber, a cognitive scientist at the Jean Nicod Institute, in Paris, observes, people tend to construct reasons for their behaviour which align with information mutually available to speaker and listener, and with their own interests, rather than describing accurately how their thoughts led to a decision. As he puts it, “the reason to give reasons is so that others will evaluate your actions and beliefs”. Today’s autonomous machines do not have their own interests to serve. Instead, their explanations are forged by and for human beings.

Some speculate that this may change in the future, if AI is developed which, like the fictional variety, seems to have motives of its own, rather than merely acting at human whim. Jacob Turner, a specialist in international law, suggests ascribing legal personhood to AI will then be necessary if those harmed by such advanced agents are to seek compensation and justice.

That is probably a long way off. But even today’s AI may raise ticklish legal questions. In particular, machine minds that cannot explain themselves, or whose detailed operation is beyond the realm of human language, pose a problem for criminal law. As Rebecca Williams, a legal scholar at Oxford University, observes, if machines lack the ability to explain their actions, current law might struggle to identify criminal intent in acts that arise because of decisions they have made. “In criminal law,” she says, “the thing that’s interesting is having the third party breaking the chain of causation that is not a human being. That is really new.”

This is not a matter of AI agents themselves acting in a criminal manner in the way Mr Turner speculates might one day happen. But if the process by which a machine made a decision cannot be subject to cross-examination, because neither the machine nor its creator is able to explain what went on, then deciding the guilt or innocence of a human being associated with that decision may be impossible.

For example, if a neural network that authorises loans cannot explain why it gives certain people certain scores that seem biased against one social group or another, it may be impossible to determine whether its operators had arranged this intentionally (which would be an offence in most jurisdictions), or whether lazy coding by its designers had led to accidental bias (which would probably be a matter for the civil courts rather than the criminal ones). Similarly, if the AI that ran the visual systems of a driverless taxi were a black box that could not be interrogated about its choices, it might be hard to know whether a death caused by that car was the fault of the manufacturer or of the firm responsible for maintaining the vehicle.

The world is still a few years from the moment a case involving a driverless car might come before the courts. A case of social bias, however, is eminently conceivable even now. It does not require the imaginations of Arthur C. Clarke or Douglas Adams, the inventors, respectively, of HAL and Eddie, to envisage the advantages of software that can not only act, but also explain the reasons behind its actions.


From the print edition