PROVERO

Scalable Quantitative Verification for Deep Neural Nets

What and why? (abstract)

Despite the functional success of deep neural networks (DNNs), their trustworthiness remains a crucial open challenge. To address this challenge, both testing and verification techniques have been proposed. But these existing techniques provide either scalability to large networks or formal guarantees, not both. In this paper, we propose a scalable quantitative verification framework for deep neural networks, i.e., a test-driven approach that comes with formal guarantees that a desired probabilistic property is satisfied. Our technique performs enough tests until soundness of a formal probabilistic property can be proven. It can be used to certify properties of both deterministic and randomized DNNs. We implement our approach in a tool called PROVERO and apply it in the context of certifying adversarial robustness of DNNs. In this context, we first show a new attack-agnostic measure of robustness which offers an alternative to purely attack-based methodology of evaluating robustness being reported today. Second, PROVERO provides certificates of robustness for large DNNs, where existing state-of-the-art verification tools fail to produce conclusive results. Our work paves the way forward for verifying properties of distributions captured by real-world deep neural network, with provable guarantees, even where testers only have black-box access to the neural network.

Code and Benchmarks

Our prototype’s code and usage instructions are available on provero’s Github page.

Adversarial Hardness

The adversarial density is a measure of how many adversarial examples exist within a pertubation size in some norm with respect to a given input. Picking the base classifier with the better adversarial density, rather than minimum perturbation, will lead to better accuracy in a smoothening defense.

We define adversarial hardness as the highest perturbation size for which the adversarial density is below a given threshold. Adversarial hardness is a measure of the difficulty of finding an adversarial example by uniform sampling. Surprisingly, we find that this measure strongly correlates with perturbation bounds produced by prominent white-box attacks such as C&W and PGD attacks.

Attack Evaluation

We use Cleverhans to run the C&W and PGD attacks.