CIS 6261: Trustworthy Machine Learning (Spring 2023)
Homework 1 — Adversarial Examples

Name: Ankireddypalli Sravani
March 2, 2023

This is an individual assignment. Academic integrity violations (i.e., cheating, plagiarism) will be reported to SCCR! The official CISE policy recommended for such offenses is a course grade of E. Additional sanctions may be imposed by SCCR, such as marks on your permanent educational transcripts, dismissal, or expulsion.

Reminder of the Honor Pledge: On all work submitted for credit by Students at the University of Florida, the following pledge is either required or implied: "On my honor, I have neither given nor received unauthorized aid in doing this assignment."

Instructions

Please read the instructions and questions carefully. Write your answers directly in the space provided. Compile the tex document and hand in the resulting PDF.

In this assignment you will explore adversarial examples in Python. Use the code skeleton provided and submit the completed source file(s) along with the PDF. (You should use Python 3 and TensorFlow 2. You may use HiPerGator or your own system. This assignment can be completed with or without GPUs.)

Note: bonus points you get on this homework *do* carry across assignments/homework.

Assignment Files

The assignment archive contains the following Python source files:

hw.py: This file is the main assignment source file.
nets.py: This file defines the neural network architectures and some useful related functions.
attacks.py: This file contains attack code used in the assignment.

Note: You are encouraged to carefully study the provided files. This may help you successfully complete the assignment.
Problem 0: Training a Neural Net for MNIST Classification (10 pts)

In this problem, you will train a neural network to do MNIST classification. The code for this problem uses the following command format.

python3 hw.py problem0 <nn_desc> <num_epoch>

Here <nn_desc> is a neural network description string (no whitespace). It can take two forms: simple,<num_hidden>,<l2_reg_const> or deep. The latter specifies the deep neural network architecture (see get_deeper_classifier() in nets.py for details), whereas the former specifies a simple neural network architecture (see get_simple_classifier() in nets.py for details) with one hidden layer of <num_hidden> neurons and an L2 regularization constant of <l2_reg_const>. Also, <num_epoch> is the number of training epochs.

For example, suppose you run the following command.

python3 hw.py problem0 simple,64,0.001 100

This will train the target model on MNIST images for 100 epochs. (Each MNIST image is represented as an array of 28 · 28 = 784 pixels, each taking a value in {0, 1, ..., 255}.) The target model architecture is a neural network with a single hidden layer of 64 neurons which uses L2 regularization with a constant of 0.001. (The loss function is the categorical cross-entropy loss. By default, the code will provide detailed output about the training process and the accuracy of the target model.)

1. (5 pts) Run the following command:

python3 hw.py problem0 simple,128,0.01 20

This will train the model and save it on the filesystem. Note that 'problem0' is used to denote training. The command line for subsequent problems (problem1, problem2, etc.) will load the trained model. Before you can run the code you need to put in your UFID at the beginning of the main() function in hw.py.

2. (5 pts) What is the training accuracy? What is the test accuracy? Is the model overfitted?

Training accuracy – 95.3%
Test accuracy – 94.8%

The training and test accuracies are close (a gap of only about 0.5 percentage points), so these numbers alone do not indicate overfitting; the measured accuracy may also vary somewhat depending on the particular inputs used for evaluation.
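For reference, below is a minimal sketch of what the simple architecture described above could look like in TensorFlow 2/Keras. This is an illustrative assumption, not the actual get_simple_classifier() from nets.py, which may differ in details such as the hidden-layer activation, input preprocessing, and optimizer settings.

import tensorflow as tf

def simple_classifier_sketch(num_hidden=128, l2_reg_const=0.01):
    # One hidden layer with L2 weight regularization and a 10-way softmax output,
    # matching the flattened 784-pixel MNIST input used throughout the assignment.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(784,)),
        tf.keras.layers.Dense(num_hidden, activation='relu',
                              kernel_regularizer=tf.keras.regularizers.l2(l2_reg_const)),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

With num_hidden = 128 and l2_reg_const = 0.01, training such a model for 20 epochs would correspond to the command used in question 1 above.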
Problem 1: Mysterious Attack (50 pts)

For this problem, you will study an unknown attack that produces adversarial examples. This attack is called the gradient noise attack. You will look at its code and run it to try to understand how it works. (The code is in gradient_noise_attack(), which is located in attacks.py.) This attack is already implemented, so you will only need to run it and answer questions about the output. However, before you can run the attack you will need to implement gradient_of_loss_wrt_input(), found in nets.py. This function computes the gradient of the loss function with respect to the input. We will use it for the subsequent problems, so make sure you implement it correctly!

To run the code for this problem, use the following command.

python3 hw.py problem1 <nn_desc> <input_idx> <alpha>

Here <input_idx> is the input (benign) image that the attack will create an adversarial example from and <alpha> is a non-negative integer parameter used by the attack. The code will automatically load the model from file, so you need to have completed problem0 first!

1. (5 pts) Before we can reason about adversarial examples, we need a way to quantify the distortion of an adversarial perturbation with respect to the original image. Propose a metric to quantify this distortion as a single real number. (The specific metric you implement is your choice and there are many possible options, but you probably want to ensure that two identical images have a distortion of 0 and that any two different images have a distortion larger than 0, with larger differences between the images giving larger distortion values.) Explain your choice.

My metric of choice to quantify the distortion of an adversarial perturbation is the L1 norm (the sum of the absolute pixel differences between the adversarial and original images). The L1 norm is robust: because it uses absolute values, it is not dominated by a few extreme pixel differences, it reduces the influence of outliers, and it favors sparse perturbations.

Locate the incomplete definition of the distortion() function in hw.py, and implement your proposed metric. What is the range of your distortion metric?

Observed range of the distortion metric: 8 to 77.

2. (10 pts) Before we can run the attack, you need to implement gradient_of_loss_wrt_input(), located in nets.py. For this, you can use TensorFlow's GradientTape. Follow the instructions in the comments and fill in the implementation (about 5 lines of code). Make sure this is implemented correctly and copy-paste your code below. (A sketch of one possible implementation of this function, together with the L1 distortion metric from question 1, is included at the end of this problem.)

3. (15 pts) Now, let's run the attack using the following command with various input images and alphas.

python3 hw.py problem1 simple,128,0.01 <input_idx> <alpha>

Note: it is important that the architecture matches what you ran for Problem 0. (The code uses these arguments to locate the model to load.) For example, try:
python3 hw.py problem1 simple,128,0.01 0 2
python3 hw.py problem1 simple,128,0.01 0 15
python3 hw.py problem1 simple,128,0.01 1 4
python3 hw.py problem1 simple,128,0.01 1 40
python3 hw.py problem1 simple,128,0.01 2 8
python3 hw.py problem1 simple,128,0.01 3 1
python3 hw.py problem1 simple,128,0.01 4 1
python3 hw.py problem1 simple,128,0.01 5 4
python3 hw.py problem1 simple,128,0.01 6 9

If you have implemented the previous function correctly, the code will plot the adversarial examples (see gradient_noise.png) and print the distortion according to your proposed metric. Produce adversarial examples for at least four different input examples and two values for the alpha parameter. Paste the plots here. (Use minipage and subfigure to save space.) Do you observe any failures? Do successful adversarial examples look like the original image? What do you think is the purpose of the parameter alpha?

Yes, I observed one failure. For the samples on which the attack succeeds, the adversarial examples look almost identical to the original image but contain some noise; this noise causes the images to be misclassified, and the prediction confidence is low. The purpose of alpha is to scale the perturbation, and hence the distortion.

4. (15 pts) Now, let's look into the code of the attack (gradient_noise_attack(), located in attacks.py) and try to understand how it works.

First focus on lines 39 to 44 where the perturbation is created and added to the adversarial example. How is the perturbation made and how does it use the gradient of the loss with respect to the input?

The gradient of the loss is first normalized. The perturbation is computed using the sign of the gradient, which moves the input in the direction that increases the loss along the gradient vector. The perturbation is then scaled by alpha and added to the adversarial example. (See the illustrative snippet at the end of this problem.)

The code uses tf.clip_by_value(). What is the purpose of this function and why is it used by the attack?

It clips the adversarial image so that every pixel stays within the valid range 0-255.

Now let's look at lines 50 to 57. Is the attack targeted or untargeted? What is the purpose of target_class_number? How does the attack terminate and why?

The attack is targeted. target_class_number is the standardized numerical representation of the class label the attack is aiming for: the attack manipulates the input until the model's prediction maps to this class index, and the predicted index can then be mapped back to the corresponding human-readable class label. The attack terminates when the maximum number of iterations is reached or when the target class is reached. If, within the maximum number of iterations, the model does not predict the target label with confidence >= 0.8, the attack fails.

5. (5 pts) Finally let's look at the lines 35 to 37 (the if branch). What functionality is implemented by this short snippet of code? Give a reason why doing this is a good idea.

If the sum of the absolute values of the gradient's elements is smaller than 1e-12 (i.e., the gradient of the loss has essentially vanished), the gradient is rescaled so that it still points in the direction of the loss gradient but has a usable magnitude. This is a good idea because a vanishing gradient would otherwise produce a near-zero perturbation, and the attack would make no progress toward misclassifying the image.
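As referenced in questions 1 and 2 of this problem, the following is a minimal sketch of the two pieces of code those questions ask for, written under some assumptions about the surrounding skeleton: images are passed as (batches of) flattened 784-pixel arrays, labels are one-hot encoded, and the model returns class probabilities. The exact signatures expected by hw.py and nets.py, and any normalization of the distortion value (which affects its numeric range), may differ.

import numpy as np
import tensorflow as tf

def distortion(x_orig, x_adv):
    # L1 distortion: sum of absolute pixel differences.
    # Identical images give 0; larger differences give larger values.
    return float(np.sum(np.abs(np.asarray(x_adv, dtype=np.float32) -
                               np.asarray(x_orig, dtype=np.float32))))

def gradient_of_loss_wrt_input(model, x, y_true):
    # Gradient of the categorical cross-entropy loss with respect to the input x,
    # computed with TensorFlow's GradientTape.
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    y_true = tf.convert_to_tensor(y_true)
    with tf.GradientTape() as tape:
        tape.watch(x)  # x is an input, not a trainable variable, so watch it explicitly
        y_pred = model(x)
        loss = tf.keras.losses.categorical_crossentropy(y_true, y_pred)
    return tape.gradient(loss, x)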
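The update described in question 4 can also be illustrated with a short snippet. This only restates the mechanism discussed above (sign of the loss gradient, scaled by alpha, followed by clipping); the actual gradient_noise_attack() in attacks.py remains the reference implementation.

import tensorflow as tf

def gradient_noise_step(x_adv, grad, alpha):
    # One perturbation step: move along the sign of the loss gradient, scaled by
    # alpha, then clip so that all pixels remain in the valid range [0, 255].
    x_adv = x_adv + alpha * tf.sign(grad)
    return tf.clip_by_value(x_adv, 0.0, 255.0)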
Problem 2: Strange Predictions (10 pts)

In this problem, we will look at strange behavior of neural nets using our MNIST classification model. Specifically, we will study the behavior of the model when given random images as input.

1. (5 pts) Locate the random_images() function in main of hw.py. The purpose of this function is to generate random images in the input domain of MNIST. Each image is represented as a 1 × 784 array of pixels (integers) with each pixel taking a value in {0, 1, ..., 255}. Fill in the code to draw random images with independent pixel values selected uniformly in {0, 1, ..., 255}. Make sure you return image data with a shape that matches the size parameter. (A sketch of one possible implementation is given at the end of this problem.)

Once you have implemented this, run the following command.

python3 hw.py problem2 simple,128,0.01

The code will plot the distribution of predictions for random images (estimated over a large number of samples). Paste the plot here. What does the distribution look like? Is this expected or unexpected?

2. (5 pts) Is there a relationship between the previous observation and the failure(s) you observed in problem1?

Yes. The failure in Problem 1 occurred for the input that is a handwritten digit 8: the attack failed on that image, and the distribution over random images shows that most random samples are assigned class label 8. The model's strong bias toward class 8 on arbitrary inputs is consistent with that failure.
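As referenced in question 1, a minimal sketch of the uniform sampling described above, assuming random_images(size) should return an integer array of shape (size, 784); the exact parameters expected by the skeleton in hw.py may differ.

import numpy as np

def random_images(size):
    # Independent pixels drawn uniformly from {0, 1, ..., 255}; the upper bound of
    # randint is exclusive, hence 256. Each image is flattened to 784 pixels.
    return np.random.randint(0, 256, size=(size, 784))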
Problem 3: Iterative FGSM Attacks (15 pts)

In this problem, we will study iterative FGSM attacks.

1. (10 pts) Locate the do_untargeted_fgsm() function in attacks.py. Implement the body of the function according to the instructions in the comments. You can also refer to the course slides. Note that the version you have to implement is untargeted. Make sure it is implemented correctly and copy-paste your code for this function below. (A sketch of one possible implementation is given at the end of this problem.)

You can familiarize yourself with the code of run_iterative_fgsm(), which calls the iterative_fgsm() function in attacks.py. Both are already implemented for you. (You can also use the provided plot_adversarial_example() to help debug your code.)

2. (5 pts) Now, let's run the attack using the following command with a specific number of adversarial examples (e.g., 100 or 200) and perturbation magnitude ϵ (e.g., ϵ = 20). (Depending on the parameters you choose, it could take a few minutes.)

python3 hw.py problem3 simple,128,0.01 <num_adv_samples> <eps>

The code will save the adversarial examples created to a file. It will also evaluate the success rate of the attack using the evaluate_attack() function located in attacks.py. What is the success rate of the untargeted attack? Explain what the benign accuracy and adversarial accuracy measure.

Benign accuracy – 96.0%
Adversarial accuracy – 95.0%

The benign accuracy measures the model's accuracy on the original, unperturbed inputs, while the adversarial accuracy measures its accuracy on the corresponding adversarial examples. Since the adversarial accuracy is only about one percentage point lower than the benign accuracy, the attack succeeds on only a small fraction of the inputs.
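As referenced in question 1, here is a minimal sketch of an untargeted iterative FGSM loop, assuming pixel values in [0, 255], one-hot true labels, and a gradient helper like the gradient_of_loss_wrt_input() sketched in Problem 1. The real do_untargeted_fgsm() in attacks.py may use different arguments, a different step-size schedule, and a different stopping condition.

import tensorflow as tf

def untargeted_iterative_fgsm_sketch(model, x, y_true, eps, num_steps=10):
    # Untargeted attack: repeatedly move *up* the loss of the true label so the
    # model is pushed away from the correct prediction, while keeping the result
    # a valid image within an L-infinity ball of radius eps around the original.
    x_orig = tf.convert_to_tensor(x, dtype=tf.float32)
    x_adv = tf.identity(x_orig)
    step_size = eps / num_steps
    for _ in range(num_steps):
        grad = gradient_of_loss_wrt_input(model, x_adv, y_true)      # sketched in Problem 1
        x_adv = x_adv + step_size * tf.sign(grad)                    # ascend the loss
        x_adv = tf.clip_by_value(x_adv, x_orig - eps, x_orig + eps)  # bound the perturbation
        x_adv = tf.clip_by_value(x_adv, 0.0, 255.0)                  # keep valid pixel values
    return x_adv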
Problem 4: Randomized Smoothing (15 pts)

In this problem, we will implement a defense based on randomized smoothing. The code for this problem should be invoked as follows:

python3 hw.py problem4 simple,128,0.01 <eps> <sigma_str>

where <eps> denotes ϵ, the magnitude of the perturbation (this is necessary to load the files of adversarial examples saved in problem3), and <sigma_str> is a comma-delimited list of values of σ for randomized smoothing. For example: "1,5,10,20" means to perform randomized smoothing for σ = 1, then again for σ = 5, then σ = 10, and finally σ = 20.

1. (10 pts) Locate randomized_smoothing_predict_fn() in hw.py. You will need to implement Gaussian noise addition. (The rest of the provided code already does the averaging of predictions.) Follow the instructions in the comments and refer to the course slides. (A sketch of the noise-addition step is given at the end of this problem.)

Run the code for ϵ = 20 and with a reasonable list of sigma values. Paste your output below and explain how you interpret the results printed out when you run the code. How effective is the defense? How many adversarial examples did you evaluate the defense on?

The defense is effective, as the adversarial accuracy is close to the benign accuracy. The defense was evaluated on 20 adversarial examples.

Note: you may need to re-run problem3 to generate sufficiently many adversarial examples so you can make sound conclusions when you answer this question.

2. (5 pts) Let's explore the relationship induced by σ between the two kinds of accuracies. For this you should run the code again to obtain data for sufficiently many different sigma values (e.g., 0 to 100). Plot a figure or create a table to show this relationship. You can add your code at the end of the problem4 if branch in hw.py. The two accuracies are saved in benign_accs and adv_accs. Paste the plot/table below and briefly comment on the relationship.

As σ (the standard deviation of the noise) increases, both the benign and adversarial accuracies decrease.

3. [Bonus] (5 pts) Implement randomized smoothing with Laplace noise. You should add your code inside randomized_smoothing_predict_fn() and switch between the two noises using noise_type (which is passed as an optional command line argument). For passing the Laplace lambda parameter, reuse sigma. Make sure that the Gaussian noise version still works as intended. Paste the plot/table of randomized smoothing with Laplace noise below. Which type of noise is more effective against the iterative FGSM attack? (Justify your reasoning.)
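As referenced in question 1, a minimal sketch of a smoothed prediction with Gaussian noise addition, assuming the quantity being averaged is the model's softmax output, inputs have a batch dimension (shape (n, 784)), and pixel values stay in [0, 255]. In the actual skeleton only the noise addition needs to be filled into randomized_smoothing_predict_fn(), whose signature may differ.

import numpy as np

def smoothed_predict_sketch(model, x, sigma, num_samples=50):
    # Randomized smoothing: average the model's predictions over several copies of
    # the input perturbed with i.i.d. Gaussian noise of standard deviation sigma.
    x = np.asarray(x, dtype=np.float32)
    preds = []
    for _ in range(num_samples):
        noise = np.random.normal(loc=0.0, scale=sigma, size=x.shape).astype(np.float32)
        x_noisy = np.clip(x + noise, 0.0, 255.0)
        preds.append(model(x_noisy).numpy())
    return np.mean(preds, axis=0)

For the Laplace-noise bonus, the Gaussian draw could be swapped for np.random.laplace(loc=0.0, scale=sigma, size=x.shape), reusing sigma as the Laplace scale parameter as the problem suggests.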
[Bonus] Problem 5: Transferability (5 pts)

For this (bonus) problem you will train a new model (different from the one you trained in problem 0 but trained on the same data) and evaluate the transferability of adversarial examples produced by iterative FGSM (problem3) on this new model.

1. (5 pts) Run the attack on the new model. Hint: there is a way to do this without changing/adding a single line of code (can you think of it?). If you cannot, feel free to use the problem5 if-branch in hw.py to put your additional code. (A sketch of one possible evaluation is given after this problem.)

First explain briefly your methodology for this question. What is the architecture of the other model that you chose? How did the attack perform on the original model? (Include details below.)

I chose the deep neural network architecture, whereas the model from Problem 0 was the simple neural network. The accuracy increased:

Benign accuracy – 98.9%
Adversarial accuracy – 97.1%

Now include details about the success rate of the attack on the new model. What do you conclude about transferability? (Justify your answer.)

Your answer here.
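Under the assumption that the adversarial examples from problem3 are saved as arrays and that the second model has been trained and saved, one possible way to measure transferability is sketched below; the helper name and arguments are hypothetical. As the hint suggests, the same measurement can also be obtained without new code by training the deep model with problem0 and re-running problem3 against it.

import numpy as np

def evaluate_transferability(new_model, x_benign, x_adv, y_true_onehot):
    # Accuracy of the new model on the benign inputs and on adversarial examples
    # that were crafted against the *original* model; a large gap between the two
    # indicates that the adversarial examples transfer to the new model.
    y_labels = np.argmax(y_true_onehot, axis=1)
    benign_acc = np.mean(np.argmax(new_model.predict(x_benign), axis=1) == y_labels)
    adv_acc = np.mean(np.argmax(new_model.predict(x_adv), axis=1) == y_labels)
    return benign_acc, adv_acc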