Practice Questions

School

Stevens Institute of Technology

Course

583

Subject

Computer Science

Date

Jan 9, 2024

CS 583 - Deep Learning Practice Questions

Question 1

[Figure: MDP diagram with nodes $s_{\text{start}}$, $s_{\text{ans}}$, $s_{\text{quit}}$, and $s_{\text{end}}$ and actions Answer and Quit. The values $V_{\text{Ans}}(s_{\text{ans}})$ and $V_{\text{Quit}}(s_{\text{quit}})$ are marked as unknown.]

For the MDP above (the same one we had in class), we randomly selected a policy and generated four (4) episodes. What will the values be after each episode if we use the model-free Monte Carlo method? For each part, write down the two (2) utility values, $V_{\text{Ans}}(s_{\text{ans}})$ and $V_{\text{Quit}}(s_{\text{quit}})$.

i) Policy = Ans, Data = $s_{\text{start}}$; Ans, 4, $s_{\text{start}}$; Ans, 4, $s_{\text{end}}$
ii) Policy = Quit, Data = $s_{\text{start}}$; Quit, 10, $s_{\text{end}}$
iii) Policy = Ans, Data = $s_{\text{start}}$; Ans, 4, $s_{\text{end}}$
iv) Policy = Ans, Data = $s_{\text{start}}$; Ans, 4, $s_{\text{start}}$; Ans, 4, $s_{\text{start}}$; Ans, 4, $s_{\text{end}}$
v) Policy = Quit, Data = $s_{\text{start}}$; Quit, 10, $s_{\text{end}}$
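
The bookkeeping behind Question 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the course's reference solution: it assumes a discount factor of 1 (the question gives none), uses every-visit averaging, and keys each estimate by a (state, action) pair, which stands in for the chance nodes $s_{\text{ans}}$ and $s_{\text{quit}}$. The name `monte_carlo_q` and the episode encoding are hypothetical.

```python
from collections import defaultdict


def monte_carlo_q(episodes, gamma=1.0):
    """Every-visit model-free Monte Carlo (gamma=1.0 is an assumption).

    Each episode is a list of (state, action, reward) steps. Yields the
    running average utility per (state, action) pair after each episode.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for episode in episodes:
        utility = 0.0
        # Walk backwards so `utility` accumulates the discounted return
        # observed from each step to the end of the episode.
        for state, action, reward in reversed(episode):
            utility = reward + gamma * utility
            totals[(state, action)] += utility
            counts[(state, action)] += 1
        yield {key: totals[key] / counts[key] for key in totals}


# The five data rows from Question 1; the terminal state is implicit.
episodes = [
    [("s_start", "Ans", 4), ("s_start", "Ans", 4)],
    [("s_start", "Quit", 10)],
    [("s_start", "Ans", 4)],
    [("s_start", "Ans", 4), ("s_start", "Ans", 4), ("s_start", "Ans", 4)],
    [("s_start", "Quit", 10)],
]

estimates = list(monte_carlo_q(episodes))
for number, estimate in enumerate(estimates, start=1):
    print(number, estimate)
```

A pair that has not yet been visited has no entry in the estimate, which mirrors the fact that Monte Carlo can say nothing about an action until the policy has tried it. A first-visit variant or a different discount factor would give different numbers.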

Question 2

[Figure: MDP with states $s_{\text{start}}$ and $s_{\text{end}}$ and three labelled edges: Ans: 3/4: \$5; Ans: 1/4: \$5; Quit: 1: \$20.]

The MDP for the game show (given in Question 1) can also be drawn as above without the chance nodes. The label on each edge denotes three things: the policy, the transition probability, and the reward. Suppose you start in state $s_{\text{start}}$ and stop when you reach state $s_{\text{end}}$, and the discount factor is $\gamma = 0.5$. Answer the following questions.

i) $V_{\text{Ans}}(s_{\text{end}}) = ?$
ii) $V_{\text{Ans}}(s_{\text{start}}) = ?$ Note: You may assume that you take a total of $n$ steps to reach $s_{\text{end}}$.
iii) $V_{\text{Quit}}(s_{\text{end}}) = ?$
iv) $V_{\text{Quit}}(s_{\text{start}}) = ?$
v) $V_{\text{opt}}(s_{\text{start}}) = ?$
vi) What is the optimal policy of the above MDP?
vii) Define the value function $V_{\pi}(s)$ in an MDP.

Question 3

i) What is the difference between the optimization functions for supervised multi-task learning and multi-task reinforcement learning?
ii) Give an example of domain shift when using transfer learning for reinforcement learning. What is a possible solution?
iii) There are two generators in CycleGANs. What loss function is each generator trying to minimize?
iv) Suppose you have trained a model that identifies the genre of a given movie plot. Now you plan to expand your model to include movies that can have multiple genres; e.g., ‘Back to the Future’ can be classified both as a Western
and a Sci-Fi. Propose a way to label the new movie data, where each input can simultaneously belong to multiple classes.
v) The following function updates the model parameter using gradient descent. Assuming the loss function is $\frac{1}{2}(wx - y)^2$, complete the missing code.

    def gradientDescent(X, Y, w_old, eta):
        total_gradient_loss = 0
        n = len(X)
        for i in range(n):
            loss = ???
            gradient_loss = ???
            total_gradient_loss += gradient_loss
        average_gradient_loss = total_gradient_loss / n
        w_new = ???
        return w_new

Question 4

[Figure: autoencoder with a single hidden layer; the first-layer weights are labelled $w_{11}, w_{12}, w_{13}, w_{14}$ and the second-layer weights $w_{21}, w_{22}, w_{23}, w_{24}$.]

Consider the above autoencoder with a single hidden layer and the same dimensions for each layer. The weight matrices are the following:

$$W_1 = \begin{bmatrix} w_{11} & w_{12} \\ w_{13} & w_{14} \end{bmatrix}, \qquad W_2 = \begin{bmatrix} w_{21} & w_{22} \\ w_{23} & w_{24} \end{bmatrix}$$

i) Given an input vector $x$, what is the expected output?
ii) Give example weight values for $W_1$ and $W_2$ such that the output $\tilde{x}$ is exactly the same as the input $x$.
iii) Give step-by-step instructions on how to train the autoencoder shown above as a denoising autoencoder. Include all the changes to the model architecture, the pre-processing steps, the loss function, and the evaluation criteria.
iv) Give two real-world applications where denoising autoencoders are used.
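
For Question 3 (v), one way the blanks in `gradientDescent` might be filled is sketched below. It assumes the scalar model $\hat{y} = wx$ implied by the loss $\frac{1}{2}(wx - y)^2$, whose derivative with respect to $w$ is $(wx - y)x$, and it averages the per-example gradients over the whole batch. The training data in the usage lines is made up for illustration.

```python
def gradientDescent(X, Y, w_old, eta):
    """One batch gradient-descent update for the scalar model y_hat = w * x."""
    total_gradient_loss = 0.0
    n = len(X)
    for i in range(n):
        error = w_old * X[i] - Y[i]
        loss = 0.5 * error ** 2       # per-example loss; not used by the update
        gradient_loss = error * X[i]  # d(loss)/dw = (w*x - y) * x
        total_gradient_loss += gradient_loss
    average_gradient_loss = total_gradient_loss / n
    w_new = w_old - eta * average_gradient_loss
    return w_new


# Hypothetical data generated by y = 2x; repeated updates should approach w = 2.
w = 0.0
for _ in range(200):
    w = gradientDescent([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], w, eta=0.1)
print(w)
```

The step moves `w` against the average gradient, so the learning rate `eta` controls how far each update travels; too large a value for the data at hand can make the iteration diverge rather than settle.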
v) Which of the following is the cost for the generator in GANs ($G$ is the generator and $D$ is the discriminator)?
(A) $J(G) = \frac{1}{m} \sum_{i=1}^{m} \log\left(1 - D(G(z^{(i)}))\right)$
(B) $J(G) = \frac{1}{m} \sum_{i=1}^{m} \log\left(D(G(z^{(i)}))\right)$
(C) $J(G) = \frac{1}{m} \sum_{i=1}^{m} \log\left(1 - G(D(z^{(i)}))\right)$
(D) $J(G) = \frac{1}{m} \sum_{i=1}^{m} \log\left(G(D(z^{(i)}))\right)$
vi) What does the K-L divergence compute? Explain with regard to the second loss term in the variational autoencoder (VAE) loss function, $D_{KL}(q_{\phi}(z \mid x) \,\|\, p(z))$.

Question 5

i) What is the difference between the binary cross-entropy loss function and the categorical cross-entropy loss function?
ii) Draw the computation graph for a one-hidden-layer neural network with ReLU activation for the hidden layer and sigmoid activation for the output layer. Write all the derivatives. You are given the following: the derivative of ReLU is

$$f'(x) = \begin{cases} 0 & \text{if } x \le 0 \\ 1 & \text{otherwise,} \end{cases}$$

and the derivative of the sigmoid is $\sigma'(x) = \sigma(x)(1 - \sigma(x))$.
iii) After training your neural network, you observe a gap between the training accuracy (100%) and the test accuracy (41%). Which of the following methods is used to reduce this gap?
(A) GANs
(B) Batch size tuning
(C) Adam optimizer
(D) Dropout
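
The two activation derivatives quoted in Question 5 (ii) can be sanity-checked numerically before they are used in a hand-derived backward pass. The sketch below assumes the common convention that the ReLU derivative is taken to be 0 for $x \le 0$ and 1 otherwise; the helper `numerical_derivative` is a hypothetical name for a central finite difference.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_prime(x):
    # sigma'(x) = sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)


def relu(x):
    return max(0.0, x)


def relu_prime(x):
    # Convention assumed here: derivative is 0 for x <= 0, 1 otherwise.
    return 0.0 if x <= 0 else 1.0


def numerical_derivative(f, x, h=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


# Compare the closed forms with finite differences away from the ReLU kink.
for x in (-2.0, 0.5, 3.0):
    print(x, sigmoid_prime(x), numerical_derivative(sigmoid, x))
    print(x, relu_prime(x), numerical_derivative(relu, x))
```

The check deliberately avoids $x = 0$ for ReLU, where the function is not differentiable and the finite difference would disagree with whichever convention is chosen.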