IBM Watson, or how not to manage stakeholder expectations.

sky-high stakeholder expectations
Photo by Jeremy Perkins on Unsplash

After IBM’s question-answering system Watson won the quiz show Jeopardy! on US television in February 2011, expectations about its commercial applicability rose sharply. In 2016 several hospitals in Asia adopted Watson to provide advice about oncology treatments. Watson would use millions of pages of medical textbooks plus data about a cancer patient to help specialists with medical diagnoses and treatment advice. However, two years later, there was news that 50% to 70% of Watson Health personnel were being laid off, and more recently there have even been reports of Watson giving wrong and sometimes dangerous medical advice. Watson is not meeting the high expectations that were raised by winning the Jeopardy! game. Stakeholder expectations about the new technology were so high that disappointment was bound to follow.

Stakeholders started pointing fingers at each other. The project leader blamed the doctors, who are said to have provided too few and only hypothetical training cases. Doctors blamed the technologists, who delivered a system that was dangerous to use. However, there are lessons to be learned from Watson’s failure without assigning blame. We list three lessons here.

Stakeholder expectations about new technology need to be managed

First, there is the old, if not ancient, lesson that expectations need to be managed in order to avoid deep disappointment. IBM Watson had been under development since 2005, specifically to play Jeopardy!. In 2006 it answered 35% of the questions correctly. By 2010, it won 65% of a series of test matches with former Jeopardy! players, a percentage high enough to give it a reasonable chance of winning a televised match.

Jeopardy! is a game in which players are given general-knowledge clues and must respond in the form of a question to which the clue could have been the answer. In the 2011 TV match, Watson had access to the most recent edition of Wikipedia but no internet connection.

The computer-supported physician
Photo by rawpixel on Unsplash

Knowing this, what could we expect if Watson’s technology is applied in a medical domain? A reasonable expectation would be that in a medical domain too, it will take about six years, the time it took from 2005 to 2011 for Jeopardy!, to develop a system that answers questions correctly often enough. Moreover, the knowledge used in Jeopardy! is encyclopedic but otherwise context-free. In other words, all knowledge needed to answer Jeopardy! questions can be summarized in a (large) set of encyclopedia entries. By contrast, the knowledge used in medical diagnosis and treatment decisions is context-sensitive. This means that, in addition to encyclopedic knowledge about the medical domain, applying that knowledge to an individual case requires the ability to identify the relevant variables. What is relevant may differ from case to case. Computers are notoriously bad at distinguishing relevant from irrelevant variables. This should lower expectations that Watson can be redeveloped for a medical domain in anything less than six years.

To manage expectations about technology, estimate the cost of introducing it

quantified cost estimations
Photo by rawpixel on Unsplash

Perhaps there is an even simpler lesson behind this: estimate the cost of introducing new technology. This is the second lesson to be drawn from the Watson example. The cost includes not only the duration of the project, but also the people and procedures needed to set up and manage the databases the technology depends on. Let me explain this in more detail.

Watson is an AI system. But what is today called AI could more accurately be called computation-intensive statistics with big data. For Jeopardy!, Watson needed Wikipedia, which is available online and is written by volunteers. They are not paid by the Watson project. But making medical literature and patient data available to an application of Watson in health care is a huge cost that cannot be ignored. Where do you get the training set for medical knowledge? How large is it? Can you ensure high quality of the data set? How much work is it to acquire it? Can you continuously refresh it? What data management infrastructure should be in place to maintain it at this quality? Can the privacy of patients be ensured? How much time do medical specialists have to judge the quality of the training set? Is that time on the budget of the project? Do they have the time after the implementation project is finished? An honest cost estimate is required to set realistic expectations.
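
To make these questions concrete, here is a minimal sketch of what such a cost estimate could look like. Everything in it is hypothetical: the cost items, hours, hourly rates and the two-year project duration are made-up placeholders, not figures from the Watson project. The point is only that people's time during the project and their time after it has finished are counted separately and both end up on the budget.

```python
from dataclasses import dataclass


@dataclass
class CostItem:
    name: str
    hours_per_year: float   # hypothetical effort estimate
    hourly_rate: float      # hypothetical rate, in any one currency
    recurring: bool         # True if the work continues after the project ends


def project_cost(items, project_years):
    """Cost of all items for the duration of the implementation project."""
    return sum(i.hours_per_year * i.hourly_rate * project_years for i in items)


def yearly_run_cost(items):
    """Cost per year that remains once the implementation project is finished."""
    return sum(i.hours_per_year * i.hourly_rate for i in items if i.recurring)


# Illustrative placeholder numbers only.
items = [
    CostItem("acquire and clean training cases", 1000, 80.0, recurring=False),
    CostItem("specialists review the training set", 200, 150.0, recurring=True),
    CostItem("data management and privacy", 500, 90.0, recurring=True),
]

if __name__ == "__main__":
    print("project cost over 2 years:", project_cost(items, 2))
    print("run cost per year afterwards:", yearly_run_cost(items))
```

Even a rough table like this forces the questions above to be answered with numbers, and it makes visible that the specialists' review time does not stop when the implementation project does.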

Consider sub-ideal scenarios too

broken plate
Photo by chuttersnap on Unsplash

Third, we should consider sub-ideal cases too. If Watson answers a question incorrectly in Jeopardy!, the worst that could happen is that it does not win the game. But in a medical case, a patient could die. Before investing in a medical application of Watson we should consider sub-ideal cases, and ask who is responsible in these cases. More generally, what happens if a human or technical actor does something wrong, stupid, malicious or otherwise unexpected in the sunny-day scenario? In many cases, when something fails, people start pointing fingers, as in the Watson failure. Technologists point at users, users point at the technology. However, in a medical diagnosis, doctors know they are responsible in case of failure. Someone may get sued. And they may feel guilty of negligence, even if they don’t say this in public. In general, when introducing new technology we should consider who is hurt and who is accountable when an actor, human or machine, does not act according to idealistic expectations.

In short, how do we manage stakeholder expectations about new technology? First, make a map of stakeholders and communicate realistic expectations to them. Second, quantify benefits and costs where you can. Third, consider what happens in sub-ideal cases, when technology or people do not behave according to the happy-day expectation that everyone follows the rules.
