Three ways to avoid bias in machine learning

Edd H. English

At this moment in history, when deep learning is being used to improve products and services in many industries, it is important to remember that AI is a tool. It's up to us as humans to decide how that tool should be used. We need to consider the impact of bias in the data, in the algorithms, and in the decisions made by machines.

A recent example of bias in machine learning comes from Microsoft, which was forced to apologize after Tay, an AI chat bot built by one of its teams, was "taught" some pretty racist ideas by Twitter users over a short period of time. Tay was designed to learn about conversation from human interlocutors on Twitter. Unfortunately, some of the people it learned from were pretty unsavory!

The researchers at Microsoft had trained Tay on a data set of anonymized public conversations, called a "corpus," but the bot also kept learning from live conversations among human users on Twitter, and that incoming content was not adequately filtered for offensive material. They didn't have a specific goal in mind for their experiments with Tay beyond having it learn conversational skills. By contrast, Google's DeepMind created AlphaGo by first training it on thousands of games played by human experts, a curated source, and then refining it through games played against itself (see the paper).

It's important for companies building artificial intelligence systems not just to build them but also to monitor how they function over time. Here are three ways to help avoid bias in machine learning:

Investigate quickly

Anomalous behavior should be investigated as soon as it appears. Even if a system isn't yet performing well enough for commercial use, researchers can treat it as a test case for bias-related issues while there is still time to correct those problems.
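
One way to notice anomalous behavior early is to track how often a system's outputs get flagged and to raise an alert when that rate climbs. The Python sketch below is only an illustration under stated assumptions: the class name FlagRateMonitor, the window size, and the alert threshold are all hypothetical, and it assumes some separate review step (human or automated) already decides whether each output is flagged.

```python
from collections import deque


class FlagRateMonitor:
    """Track the share of flagged outputs over a sliding window (hypothetical helper)."""

    def __init__(self, window_size=100, alert_rate=0.05):
        # Only the most recent `window_size` outputs are kept.
        self.window = deque(maxlen=window_size)
        self.alert_rate = alert_rate

    def record(self, flagged):
        """Record one output; return True if the recent flag rate calls for review."""
        self.window.append(bool(flagged))
        rate = sum(self.window) / len(self.window)
        # Wait until the window is full so a handful of early outputs
        # cannot trigger a noisy alert.
        return len(self.window) == self.window.maxlen and rate > self.alert_rate


# Illustrative data: eight clean outputs followed by four flagged ones.
monitor = FlagRateMonitor(window_size=10, alert_rate=0.2)
outputs = [False] * 8 + [True] * 4
alerts = [monitor.record(flagged) for flagged in outputs]
first_alert = alerts.index(True) if True in alerts else None
```

A fixed threshold is the simplest possible rule. A real deployment would likely tune the window and threshold to its own traffic and route each alert to a person who can investigate.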

Maintain data transparency

The second way to prevent machine learning systems from becoming biased is transparency about what they have learned and from which data. This means being able to distinguish between what has been learned through supervised training, on examples that people have labeled, and what has been learned through unsupervised training on unlabeled content. A few years ago Google reportedly began publishing research about this distinction on its DeepMind blog after it discovered that its systems were developing some troubling biases simply through exposure to too much unlabeled content on YouTube.
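
Transparency of this kind starts with recording where each training example came from and whether anyone labeled it. The Python sketch below shows one possible shape for such a record and a summary that could be published alongside a model. The names TrainingExample and provenance_report, and the source names "curated_corpus" and "live_feed", are hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingExample:
    text: str
    source: str    # hypothetical source name, e.g. "curated_corpus"
    labeled: bool  # True if a person supplied or reviewed a label


def provenance_report(examples):
    """Count examples per (source, labeled) pair and compute each pair's share."""
    counts = Counter((ex.source, ex.labeled) for ex in examples)
    total = sum(counts.values())
    return {
        key: {"count": n, "share": n / total}
        for key, n in counts.items()
    }


# Illustrative data: one labeled example and three unlabeled ones.
examples = [
    TrainingExample("hello there", "curated_corpus", True),
    TrainingExample("what's up", "live_feed", False),
    TrainingExample("good morning", "live_feed", False),
    TrainingExample("see you later", "live_feed", False),
]
report = provenance_report(examples)
```

A report like this makes it easy to see at a glance when most of what a system has learned comes from unlabeled, unreviewed content.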

Improve research responsibility

The third way we can avoid bias involves social responsibility on the part of the artificial intelligence researchers and engineers who are building these systems. It is incumbent upon them not only to avoid building biased AI, but also to make sure their systems are used appropriately and are not misused or abused once they are deployed commercially or otherwise adopted into existing environments.