Interpretable Machine Learning: Why?

Why should machine learning be interpretable?

Machine learning is being used everywhere, and intelligent systems are in just about every corner of our daily lives. Unfortunately, even in the best case, machine learning is only as good as the data that goes in, and that data has a tendency to be biased, unfair, or skewed toward the wrong goal. When we ask how a system arrived at an answer, we don’t want to be left with some hand-wavy black-box response.

We often want our machine learning systems to cooperate with us and be more transparent about what’s going on inside.

For the quick answer to why we want interpretable machine learning:

Interpretability in machine learning will improve fairness, safety, and utility in nearly all of our systems.

Fairness

For some grounded examples of why this matters, you only really need to look as far as the news. Apple’s credit card was recently reported to offer lower credit limits to women than to men. Algorithms used in hospitals, even when explicitly engineered not to use racial information, have shown racial bias. Wherever a machine learning system is out in the real world, there is a real risk that its decisions will be biased in a way that adversely affects a portion of the population. Researchers are looking to interpretability and explainability in machine learning as a way to combat bias and help promote fairness in our systems.

Safety

Interpretability also has implications for safety. As my advisor, Dr. Matthew Gombolay, points out in a recent podcast from IC@GT, if we don’t understand exactly how our systems work, they may end up working in dangerous and unexpected ways. In a search-and-rescue or fire-fighting scenario, robots offer enormous potential as partners. But if we don’t know how they work, miscommunications and misunderstandings between human and robot teammates could result in serious damage or harm to the people involved.

Utility

Finally, machine learning promises to find patterns, and perhaps even causal relationships, better than humans can. Whether it’s Go, chess, or triage in an emergency room, we don’t just want decisions; we want the rationale behind them, so we can teach new people and make use of that information on the fly. Not only does this promote safety and fairness (we might discover that our triage system is putting a certain race first and realize we need to redo it), it also gives humans useful information that helps them do their jobs better. If our robot says “Take this man back now,” that’s fine, but it doesn’t tell the nurses or doctors what’s going on. If it instead says “This man has acute abdominal pain, high fever, and nausea, so I think he has appendicitis and he should be seen immediately,” that’s suddenly very useful!
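
To make that concrete, here’s a minimal sketch in Python of the difference between a system that returns only a decision and one that returns a decision together with the findings that drove it. The symptom names and thresholds are entirely made up for illustration; they are not real triage criteria or anyone’s actual system.

```python
# A toy sketch: return a rationale alongside the decision, not the decision alone.
# All field names and thresholds here are hypothetical.

def triage(symptoms: dict) -> tuple:
    """Return (recommendation, rationale) instead of a recommendation alone."""
    if symptoms.get("abdominal_pain") == "acute" and symptoms.get("fever_c", 0) > 38.5:
        findings = "acute abdominal pain, high fever"
        if symptoms.get("nausea"):
            findings += ", nausea"
        return "see immediately", f"{findings} are consistent with appendicitis"
    return "standard queue", "no red-flag symptoms reported"


decision, rationale = triage({"abdominal_pain": "acute", "fever_c": 39.2, "nausea": True})
print(f"{decision}: {rationale}")
# -> see immediately: acute abdominal pain, high fever, nausea are consistent with appendicitis
```

The point isn’t the (deliberately simplistic) rules; it’s that the second return value is what makes the output actionable for the nurses and doctors on the receiving end.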

Wrapping Up

In his 2016 paper "The mythos of model interpretability," Dr. Zachary Lipton lays out five common reasons we might want interpretable systems:

  1. Trust: Humans should be comfortable handing over control to a machine learning system.

  2. Causality: Researchers want to be able to extract some form of causal relationship between input and output in the system. Rather than simply seeing a label, we’d like to see a reason for the label, as in the triage example above.

  3. Transferability: A system’s behavior should apply to the world more broadly (outside of the small sample it may have been trained on). As a weird example: if we have a loan-approval system that has only been trained on Americans and it gets a request from a Canadian, it should probably still work exactly the same way for the new person (even though they may be the first Canadian it’s seen).

  4. Informativeness: The label itself is not always the only thing we need; often an explanation or rationale is what we’re really after, as in the triage example above. We might not even want to deploy the machine learning system itself into the world; we might just want it to give us an optimized flowchart to aid our decision-making process (see the sketch after this list).

  5. Fair and Ethical Decision-Making: The broadest and perhaps most common reason for interpretable systems. We expect that any system deployed into the real world, whether for credit approval, criminal sentencing, emergency-room triage, or anything else, will treat everyone fairly and ethically.
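
Here’s a minimal sketch of the informativeness idea, assuming scikit-learn is available. The features, data, and labels below are a made-up toy example (not real triage data); the point is only that a simple model can be exported as a human-readable decision procedure rather than deployed as a black box.

```python
# A toy illustration of extracting a human-readable "flowchart" from a trained model.
# Features, data, and labels are made up purely for illustration.
from sklearn.tree import DecisionTreeClassifier, export_text

features = ["acute_abdominal_pain", "fever_over_38c", "nausea"]
X = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
y = ["see_immediately", "see_immediately", "standard_queue",
     "standard_queue", "standard_queue"]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# export_text renders the fitted tree as a set of nested if/else rules, the kind
# of flowchart a clinician could read and sanity-check before acting on it.
print(export_text(tree, feature_names=features))
```

A decision tree is just one stand-in here; the same motivation applies to any method that turns a model’s behavior into something a human can inspect and question.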

In future write-ups, I’ll cover ways to approach interpretability, problems with them, and potential paths forward!

References

  1. The Verge on the Apple Card: https://www.theverge.com/2019/11/11/20958953/apple-credit-card-gender-discrimination-algorithms-black-box-investigation

  2. Obermeyer, Ziad, et al. "Dissecting racial bias in an algorithm used to manage the health of populations." Science 366.6464 (2019): 447-453.

  3. Georgia Tech's School of Interactive Computing Interaction Hour Podcast: https://podcasts.apple.com/us/podcast/demystifying-machine-learning-with-matthew-gombolay/id1435564422?i=1000461427388

  4. AlphaGo Zero on Go: https://deepmind.com/blog/article/alphago-zero-starting-scratch

  5. AlphaZero on Chess: https://deepmind.com/blog/article/alphazero-shedding-new-light-grand-games-chess-shogi-and-go

  6. Brittany Bowers on predicting hospital admittance with ML: https://towardsdatascience.com/triage-to-ai-a-machine-learning-approach-to-hospital-admissions-classification-7d3a8d5df631

  7. Lipton, Zachary C. "The mythos of model interpretability." arXiv preprint arXiv:1606.03490 (2016).