Depth-2 neural networks under a data-poisoning attack

Abstract

In this work, we study the possibility of defending against data-poisoning attacks while training a shallow neural network in a regression setup. We focus on supervised learning with realizable labels for a class of depth-2, finite-width neural networks that includes single-filter convolutional networks. Within this class, we attempt to learn the true network weights that generate the labels in the presence of a malicious oracle that applies stochastic, bounded, additive adversarial distortions to the true labels during training. For the gradient-free stochastic algorithm that we construct, we prove worst-case near-optimal trade-offs among the magnitude of the adversarial attack, the weight approximation accuracy, and the confidence achieved by the proposed algorithm. Because our algorithm uses mini-batching, we analyze how the mini-batch size affects convergence. We also show how the scaling of the outer-layer weights can be used to counter data-poisoning attacks on the true labels, depending on the probability of attack. Lastly, we give experimental evidence that our algorithm outperforms stochastic gradient descent under different input data distributions, including heavy-tailed ones.
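
To make the label-poisoning setup in the abstract concrete, the following is a minimal Python sketch of the data model only, not of the paper's algorithm. Everything specific in it is an assumption made for illustration: the network averages ReLU activations of one shared filter over non-overlapping patches, the names `beta` (attack probability), `theta` (attack bound) and `patch_size` are hypothetical, and uniform noise stands in for an adversary that could in general choose any distortion within the bound.

```python
import random


def relu(z):
    return max(0.0, z)


def depth2_net(w, x, patch_size):
    """Depth-2 single-filter network (illustrative form, assumed here).

    The same filter w is applied to each non-overlapping patch of x,
    and the ReLU activations are averaged (outer weights fixed to 1/k).
    """
    patches = [x[i:i + patch_size] for i in range(0, len(x), patch_size)]
    acts = [relu(sum(wi * xi for wi, xi in zip(w, p))) for p in patches]
    return sum(acts) / len(acts)


def poisoned_label(w_true, x, patch_size, beta, theta, rng):
    """Return the realizable label, distorted with probability beta.

    The distortion is additive and bounded in magnitude by theta.
    Uniform noise is a placeholder for the adversary's choice.
    """
    y = depth2_net(w_true, x, patch_size)
    if rng.random() < beta:
        y += rng.uniform(-theta, theta)
    return y


rng = random.Random(0)
w_true = [0.5, -1.0]  # hypothetical true filter
batch = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(4)]
clean = [depth2_net(w_true, x, 2) for x in batch]
labels = [poisoned_label(w_true, x, 2, beta=0.3, theta=0.1, rng=rng)
          for x in batch]
```

Each entry of `labels` differs from the matching entry of `clean` by at most `theta`, and setting `beta=0` recovers the clean labels exactly. A learner in this setting sees only `batch` and `labels` and tries to recover `w_true`.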

Publication
In Neurocomputing
Theodore Papamarkou
Professor in maths of data science
