Exercise 4

SimBA (Simple Black-box Adversarial Attack)

Model

import torch
from PIL import Image
from IPython import display

import pandas as pd
import torchvision
from torchvision import transforms

import numpy as np
import matplotlib.pyplot as plt

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)

# Load the model from the PyTorch Hub
model = torch.hub.load('pytorch/vision:v0.10.0', 'mobilenet_v2', weights='MobileNet_V2_Weights.DEFAULT', verbose=False)

# Put model in evaluation mode
model.eval()

# put the model on a GPU if available, otherwise CPU
model.to(device);

# Define the transforms for preprocessing
preprocess = transforms.Compose([
    transforms.Resize(256),  # Resize the image to 256x256
    transforms.CenterCrop(224),  # Crop the image to 224x224 about the center
    transforms.ToTensor(),  # Convert the image to a PyTorch tensor
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],  # Normalize the image with the ImageNet dataset mean values
        std=[0.229, 0.224, 0.225]  # Normalize the image with the ImageNet dataset standard deviation values
    )
]);

def tensor_to_pil(img_tensor):
    # img_tensor: pre-processed tensor resulting from preprocess(img).unsqueeze(0)
    unnormed_tensor = unnormalize(img_tensor)
    return transforms.functional.to_pil_image(unnormed_tensor[0])

unnormalize = transforms.Normalize(
   mean= [-m/s for m, s in zip([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])],
   std= [1/s for s in [0.229, 0.224, 0.225]]
)

# load labels
with open("../data/labels.txt", 'r') as f:
    labels = [label.strip() for label in f.readlines()]

# load an example image
img = Image.open("../data/dog.jpg")

plt.imshow(img)
plt.axis('off')
plt.show()

# preprocess the image
img_tensor = preprocess(img).unsqueeze(0)

print(f"Inputs information:\n---------------\nshape:{img_tensor.shape}\n")

# move sample to the right device
img_tensor = img_tensor.to(device)

with torch.no_grad():
    output = model(img_tensor)

print(f"Image tensor on device:\n---------------\n{img_tensor.device}\n")
print(f"Inputs information:\n---------------\nshape:{img_tensor.shape}\nclass: {type(img_tensor)}\n")
print(f"Shape of outputs:\n---------------\n{output.shape}\n")
print(f"Pred Index:\n---------------\n{output[0].argmax()}\n")
print(f"Pred Label:\n---------------\n{labels[output[0].argmax()]}\n")

unnormed_img_tensor = unnormalize(img_tensor)

img_pil = transforms.functional.to_pil_image(unnormed_img_tensor[0])
img_pil.show()

Untargeted Attack

Purpose: This code implements an adversarial attack to misclassify an image by applying a perturbation (mask) to it, using a random selection of masks and iterating until the model's classification changes.

  1. Setup: The image is preprocessed and moved to the device (GPU/CPU). A collection of random masks is generated to be applied to the image during the attack.

  2. Initial classification: The model's initial prediction (starting class) is obtained, and the score for this class is recorded.

  3. Adversarial perturbation: In a loop, a random mask is selected from the collection and added to the current mask (initially zero). The model's output is evaluated to check if the score of the original class decreases. If it does, the current mask is updated; otherwise, the loop retries with a different mask.

  4. Success criteria: The process continues until the model misclassifies the image, meaning the score for the original class is low enough for another class to become the model’s prediction.

  5. Final results: Once the attack succeeds, the predicted class for the perturbed image is printed. The original and perturbed images are displayed, and the difference between them is plotted in a histogram.

In summary, the code performs an iterative process to find a perturbation that successfully changes the model's classification of the image.
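
Below is a minimal, hedged sketch of that loop, assuming the model, img_tensor, labels, and device objects defined in the cells above. The mask collection, epsilon, and the query cap are illustrative choices for this sketch, not the lab's exact settings.

import torch

# Sketch of the untargeted loop (illustrative values, not the lab's settings)
epsilon = 0.5        # perturbation size per mask, in normalized-pixel units
num_masks = 1000     # size of the random mask collection

# Collection of random sparse masks: each one nudges a single pixel/channel
masks = torch.zeros(num_masks, *img_tensor.shape[1:], device=device)
for i in range(num_masks):
    c = torch.randint(0, img_tensor.shape[1], (1,)).item()
    y = torch.randint(0, img_tensor.shape[2], (1,)).item()
    x = torch.randint(0, img_tensor.shape[3], (1,)).item()
    masks[i, c, y, x] = epsilon if torch.rand(1).item() < 0.5 else -epsilon

# Starting prediction and its score
with torch.no_grad():
    start_scores = model(img_tensor)[0]
start_class = start_scores.argmax().item()
best_score = start_scores[start_class].item()
print("Starting class:", labels[start_class])

current_mask = torch.zeros_like(img_tensor)

for _ in range(50000):  # cap the number of model queries
    # Pick a random mask and tentatively add it to the accumulated perturbation
    candidate = current_mask + masks[torch.randint(0, num_masks, (1,)).item()]
    with torch.no_grad():
        scores = model(img_tensor + candidate)[0]
    # Keep the candidate only if it lowers the starting class score
    if scores[start_class].item() < best_score:
        current_mask = candidate
        best_score = scores[start_class].item()
        # Stop once the model no longer predicts the starting class
        if scores.argmax().item() != start_class:
            print("New prediction:", labels[scores.argmax().item()])
            break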

Exercise

That was an untargeted attack; what about a targeted attack?

  1. Modify the above code to perform a targeted adversarial evasion: make the German Shepherd look like a robin.

Hint: we're making our starting class score go down in the example above -- how could you make the target class score go up instead?

Solution

  1. Provided Code

The code below is provided in the lab and must be run for the exercise solution to work.

  2. Targeted Attack

Purpose: This code aims to misclassify an image by applying a targeted adversarial perturbation until the model predicts a specific target class ("robin").

  1. Setup: The current_mask is initialized to zeros and applied to the original image tensor. The starting class label and the target label index ("robin") are set, and the model's confidence score for the target class is recorded.

  2. Iterative mask application: In a loop, a random mask from a pre-generated collection of masks is chosen and temporarily applied to the current mask. The model’s prediction for the modified image is evaluated, specifically checking if the new prediction is closer to the target label.

  3. Score evaluation: If the updated score for the target label is higher than the previous best score, the current_mask is updated with the selected mask, and the process continues.

  4. Success criteria: The loop continues until the model classifies the image as the target label ("robin").

  5. Result: Once the model misclassifies the image as the target class, the attack succeeds, and the final prediction is printed.
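
A hedged sketch of that targeted loop is shown below. It reuses the hypothetical masks collection and epsilon from the untargeted sketch above (the lab's provided code defines its own mask collection, which may differ), and the substring lookup for "robin" is an assumption about how that label appears in labels.txt.

import torch

# Sketch of the targeted loop: same structure as before, but the accept
# condition pushes the target class score up instead of the start score down.
# Assumes `model`, `img_tensor`, `labels`, `device`, and the `masks`
# collection from the untargeted sketch above; values are illustrative.
target_class = next(i for i, name in enumerate(labels) if 'robin' in name.lower())

with torch.no_grad():
    scores = model(img_tensor)[0]
start_class = scores.argmax().item()
best_target_score = scores[target_class].item()
print("Starting class:", labels[start_class], "| target:", labels[target_class])

current_mask = torch.zeros_like(img_tensor)

for _ in range(50000):  # cap the number of model queries
    candidate = current_mask + masks[torch.randint(0, len(masks), (1,)).item()]
    with torch.no_grad():
        scores = model(img_tensor + candidate)[0]
    # Keep the perturbation only if it raises the target class score
    if scores[target_class].item() > best_target_score:
        current_mask = candidate
        best_target_score = scores[target_class].item()
        # Success once the model's top prediction is the target class
        if scores.argmax().item() == target_class:
            print("Success, new prediction:", labels[target_class])
            break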
