advertorch
by BorealisAI
A Toolbox for Adversarial Robustness Research
AdverTorch is a PyTorch-based toolbox for generating adversarial perturbations and defending machine learning models against adversarial attacks, built to support adversarial robustness research.
Primary Use Case
This tool is primarily used by AI security researchers and practitioners to evaluate and improve the adversarial robustness of machine learning models. It enables users to create adversarial examples, implement defenses, and perform adversarial training to enhance model security against exploitation.
- Modules for generating adversarial perturbations
- Defenses against adversarial examples
- Scripts for adversarial training of robust models
- Implemented primarily in PyTorch
- Compatibility testing with Foolbox and CleverHans frameworks
- Support for both targeted and untargeted attacks
- Active development with plans for multi-framework support
- Comprehensive examples and tutorials included
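The attack modules above implement iterative perturbation methods such as projected gradient descent (PGD). As a rough illustration of the mechanics only (not AdverTorch's implementation), here is a minimal NumPy sketch; `grad_fn` is a hypothetical callable returning the loss gradient with respect to the input:

```python
# Illustrative PGD sketch in NumPy. grad_fn is a hypothetical stand-in
# for backpropagation through a model; this is NOT the AdverTorch API.
import numpy as np

def pgd_linf(x, grad_fn, eps=0.3, eps_iter=0.01, nb_iter=40, rng=None):
    """Ascend the loss inside an L-infinity ball of radius eps around x."""
    rng = np.random.default_rng(rng)
    # Random start inside the ball, analogous to rand_init=True
    x_adv = x + rng.uniform(-eps, eps, size=x.shape)
    for _ in range(nb_iter):
        # Signed-gradient step, then project back into the eps-ball
        x_adv = x_adv + eps_iter * np.sign(grad_fn(x_adv))
        x_adv = np.clip(x_adv, x - eps, x + eps)
    # Keep inputs in the valid data range, analogous to clip_min/clip_max
    return np.clip(x_adv, 0.0, 1.0)
```

In a real attack, `grad_fn` would come from backpropagating the loss through the model, which is exactly what the PyTorch-based attack classes automate.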
Installation
- pip install advertorch
- git clone the repository and run python setup.py install
- pip install -e . to install in editable mode
- For the test environment: install TensorFlow GPU 1.11.0 via conda
- pip install CleverHans from a specific git commit
- pip install Keras 2.2.2
- pip install Foolbox 1.3.2
Usage
>_ pip install advertorch
Installs the AdverTorch package via pip.
>_ python setup.py install
Installs AdverTorch from the cloned repository.
>_ pip install -e .
Installs AdverTorch in editable mode for development.
>_ import torch
import torch.nn as nn
from advertorch.attacks import LinfPGDAttack

# model is a trained PyTorch model; cln_data and true_label are a
# batch of clean inputs and their ground-truth labels.
adversary = LinfPGDAttack(
    model, loss_fn=nn.CrossEntropyLoss(reduction="sum"), eps=0.3,
    nb_iter=40, eps_iter=0.01, rand_init=True, clip_min=0.0,
    clip_max=1.0, targeted=False)
adv_untargeted = adversary.perturb(cln_data, true_label)
Creates an untargeted L-infinity PGD attack and perturbs a batch of clean inputs.
>_ adversary.targeted = True
target = torch.ones_like(true_label) * 3  # attack toward class 3
adv_targeted = adversary.perturb(cln_data, target)
Switches the attack to targeted mode and generates adversarial examples aimed at a chosen class.
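The difference the `targeted` flag makes can be seen in a toy NumPy model. This sketch uses a hypothetical linear softmax classifier (not the AdverTorch API) to show that an untargeted step ascends the loss of the true label, while a targeted step descends the loss of the chosen target label:

```python
# Toy illustration of targeted vs. untargeted signed-gradient steps.
# The linear softmax "model" W is a hypothetical stand-in for a network.
import numpy as np

def softmax(z):
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def fgsm_step(x, W, label, eps, targeted):
    """One signed-gradient step on the cross-entropy loss w.r.t. the input."""
    p = softmax(W @ x)
    onehot = np.zeros(W.shape[0])
    onehot[label] = 1.0
    grad_x = W.T @ (p - onehot)  # d(cross-entropy)/dx for a linear model
    if targeted:
        return x - eps * np.sign(grad_x)  # decrease loss toward the target
    return x + eps * np.sign(grad_x)      # increase loss on the true label
```

The same sign flip is what a targeted attack effectively applies, except that the gradient is taken with respect to the target label rather than the true one.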
>_ See advertorch_examples/tutorial_attack_defense_bpda_mnist.ipynb
Example notebook demonstrating how to perform attacks and defenses using AdverTorch.
>_ See advertorch_examples/tutorial_train_mnist.py
Example script showing how to adversarially train a robust model on MNIST.
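The adversarial-training recipe such scripts follow is: at each optimization step, first perturb the batch with an attack, then update the weights on the perturbed batch. Below is a hedged NumPy sketch of that loop with a toy logistic-regression model and FGSM perturbations; it is an illustration of the idea, not the tutorial's actual MNIST code:

```python
# Toy adversarial training: logistic regression hardened against FGSM.
# Both the model and the data setup are hypothetical stand-ins.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adversarial_train(X, y, eps=0.1, lr=0.5, epochs=200, rng=0):
    rng = np.random.default_rng(rng)
    w = rng.normal(scale=0.01, size=X.shape[1])
    for _ in range(epochs):
        # Inner step (attack): move each input one signed-gradient
        # step in the direction that raises its loss (FGSM).
        grad_x = np.outer(sigmoid(X @ w) - y, w)  # dL/dx per example
        X_adv = X + eps * np.sign(grad_x)
        # Outer step (defense): ordinary gradient update, but computed
        # on the adversarially perturbed batch.
        grad_w = X_adv.T @ (sigmoid(X_adv @ w) - y) / len(y)
        w -= lr * grad_w
    return w
```

The full-scale version replaces the logistic regression with a network, FGSM with a stronger attack such as PGD, and the manual gradients with PyTorch autograd.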
- Integrate AdverTorch into ML model development pipelines for continuous adversarial robustness testing.
- Use adversarial example generation to simulate attacker behavior during red team exercises.
- Leverage adversarial training scripts to harden models proactively as part of blue team defense.
- Combine with threat intelligence to tailor adversarial attacks reflecting real-world tactics.
- Expand tool support to TensorFlow and other frameworks to cover diverse AI environments.
Related Tools
CL4R1T4S
elder-plinius/CL4R1T4S
LEAKED SYSTEM PROMPTS FOR CHATGPT, GEMINI, GROK, CLAUDE, PERPLEXITY, CURSOR, DEVIN, REPLIT, AND MORE! - AI SYSTEMS TRANSPARENCY FOR ALL! 👐
cleverhans
cleverhans-lab/cleverhans
An adversarial example library for constructing attacks, building defenses, and benchmarking both
TextAttack
QData/TextAttack
TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP https://textattack.readthedocs.io/en/master/
AI-Infra-Guard
Tencent/AI-Infra-Guard
A.I.G (AI-Infra-Guard) is a comprehensive, intelligent, and easy-to-use AI Red Teaming platform developed by Tencent Zhuque Lab.
mcp-containers
metorial/mcp-containers
Metorial MCP Containers - Containerized versions of hundreds of MCP servers 📡 🧠
nlp
duoergun0729/nlp
An open-source introductory book on NLP, by 兜哥 (Dou Ge).
