Data Poisoning and How it Works

Apr 9, 2021

Data Poisoning

As machine learning and deep learning take a firmer grip on our society, it is paramount that we are aware of the data these systems collect on us. What is really surprising is that these algorithms collect huge amounts of data on us every day without us ever realising. IDC estimated the global cyber security market to be worth $107 billion in 2019, growing to $151 billion by 2023, so it is no wonder that data is so valuable nowadays.

Data poisoning is an unconventional way of combatting this. Instead of trying to prevent these algorithms from taking your information, the information delivered to the algorithm is ‘poisoned’: it contains a surplus of fictional data from the user, which the algorithm then uses to produce its targeted content.

Types of Data Poisoning

Input Attacks

Input attacks involve manipulating the data fed to an AI algorithm, for example by mislabelling images or files, so that the algorithm produces incorrect answers. An example of an input attack could be a program that tells the algorithm that the user has clicked on, or taken an interest in, every possible advertisement on the webpage. This would, in turn, mean that the algorithm could no longer target advertisements at this user effectively. Figure 1 illustrates this “garbage in, garbage out” concept.
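To make the click-on-everything idea concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that the advertiser builds an interest profile from each category's share of a user's clicks. The category names and the functions `interest_profile`, `poison_clicks` and `top_interest_share` are hypothetical and not taken from any real ad platform.

```python
from collections import Counter

# Hypothetical ad categories; a real ad network would have its own taxonomy.
AD_CATEGORIES = ["shoes", "travel", "gaming", "cooking", "finance"]


def interest_profile(clicks):
    """Return each category's share of the recorded clicks."""
    counts = Counter(clicks)
    total = sum(counts.values())
    return {cat: counts[cat] / total for cat in AD_CATEGORIES}


def poison_clicks(clicks, copies=100):
    """Append `copies` fake clicks on every category to the real history."""
    return clicks + AD_CATEGORIES * copies


def top_interest_share(profile):
    """Share held by the strongest category; 1/len(AD_CATEGORIES) means no signal."""
    return max(profile.values())


# A user who genuinely clicks mostly on gaming ads.
real_clicks = ["gaming"] * 8 + ["travel"] * 2

clean = interest_profile(real_clicks)
poisoned = interest_profile(poison_clicks(real_clicks))

print(f"clean top share:    {top_interest_share(clean):.2f}")     # 0.80
print(f"poisoned top share: {top_interest_share(poisoned):.2f}")  # 0.21
```

In this toy model the real preference for gaming (80% of clicks) is diluted to roughly 21%, close to the 20% that a user with no preference at all would show across five categories, so the profile tells the advertiser very little.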

Poisoning Attacks

The second method of data poisoning involves corrupting the data before it is introduced into the AI training process. This method can be easier for “poisoners” to carry out, as it does not require a hacker to breach the security systems of a third party. Infecting the wells of data used to train AI systems is perhaps the area of greatest concern, as it may be very difficult or even impossible to detect. By the time such a poisoning attack is discovered, it may be too late, as the AI systems could already be deployed.
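The effect of corrupted training data can be sketched with a toy example. The code below is an illustration only: it assumes a very simple nearest-centroid classifier on one-dimensional data, and the dataset and helper names (`train_centroids`, `flip_labels` and so on) are made up for this sketch. It trains the same model twice, once on clean labels and once after some labels have been flipped, and compares accuracy on the same test points.

```python
def train_centroids(samples):
    """Nearest-centroid 'training': average the feature value for each label."""
    sums, counts = {}, {}
    for value, label in samples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, value):
    """Return the label whose centroid is closest to `value`."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


def accuracy(centroids, samples):
    """Fraction of `samples` whose label is predicted correctly."""
    correct = sum(predict(centroids, v) == y for v, y in samples)
    return correct / len(samples)


def flip_labels(samples, indices):
    """Return a copy of `samples` with the binary labels at `indices` flipped."""
    flipped = set(indices)
    return [(v, 1 - y) if i in flipped else (v, y)
            for i, (v, y) in enumerate(samples)]


# Toy data: small values belong to class 0, large values to class 1.
train = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0),
         (6, 1), (7, 1), (8, 1), (9, 1), (10, 1)]
test = [(1, 0), (3, 0), (7, 1), (9, 1)]

clean_model = train_centroids(train)
# The "poisoner" flips the labels of the most extreme points before training.
poisoned_model = train_centroids(flip_labels(train, [0, 1, 2, 7, 8, 9]))

print("clean accuracy:   ", accuracy(clean_model, test))     # 1.0
print("poisoned accuracy:", accuracy(poisoned_model, test))  # 0.0
```

Nothing about the training code changes between the two runs; only the labels do. That is why this kind of attack can be so hard to spot once the model is deployed.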

Why Data Poisoning is Useful in eCommerce

Although it can sometimes be useful to have an algorithm save your interests so that it can show you more relevant products when shopping, there are also a number of things to be concerned about.

In the context of online shopping, political security is of the greatest concern. AI can automate tasks associated with surveillance and with persuasion, for example targeted propaganda, or deception, for example manipulated videos. As a result, a user could be duped into buying something they would never otherwise have bought. It can also give rise to a race between companies, in which each competes, financially or otherwise, to have its ads displayed more often than the others’.

So, in essence, data poisoning in eCommerce can help to rebalance this market in favour of the consumer, because far less happens “behind the scenes” without the consumer’s knowledge. Data poisoning can help to reduce the controlling power of online companies and return some of that power to the user.

By Scott Talbot, 17326508

Comiter, M. (2019). Attacking Artificial Intelligence: AI’s Security Vulnerability and What Policymakers Can Do About It (Belfer Center Paper). Belfer Center for Science and International Affairs, Harvard Kennedy School.

Brundage, M., Avin, S., Clark, J., Toner, H., Eckersley, P., Garfinkel, B., Dafoe, A., Scharre, P., Zeitzoff, T., & Filar, B. (2018). The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation. arXiv preprint arXiv:1802.07228.