• Implement and test:
• Logistic regression (LR) with L1 regularization
• LR is differentiable
• But L1 norm is not
• Use proximal gradient descent
• For the L1 norm, the proximal step is soft-thresholding (a minimal sketch is given below)
• Use the tensorflow library
• Dataset – the same as in HW2:
• Classify two digits from MNIST dataset
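For reference, a minimal sketch of the soft-thresholding operator (the proximal operator of the L1 penalty) in TensorFlow could look like the following; the helper name soft_threshold and its arguments are our own choices, not part of the assignment's starter files:

import tensorflow as tf

def soft_threshold(w, t):
    # Proximal operator of t * ||w||_1: shrink each coordinate toward zero
    # by t, and set anything that lands inside [-t, t] exactly to zero.
    return tf.sign(w) * tf.maximum(tf.abs(w) - t, 0.0)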
• See: tensorflow_minimizeF.py
• Performs projected gradient descent on a simple function
• The function has global minimum at
• w1=-0.25, w2=2
• But the feasible set Q is: w1>=0, w2>=0
• For this function, the best feasible solution is w1=0, w2=2
• The code does the following, in a loop (a rough sketch is given below):
• Gradient step on the function, followed by a proximal step
• Here, the proximal step is just “make w nonnegative”: replace negative values with 0, the closest non-negative value
• The feasible set Q is the set of all vectors with nonnegative coordinates, i.e., for 2D, w1>=0, w2>=0
• In your actual code, you should use soft-thresholding instead
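Since the starter file is not reproduced here, the following is a hedged sketch of what such a projected-gradient loop could look like; the starting point, step size, and iteration count are placeholder assumptions, and the real tensorflow_minimizeF.py may be organized differently:

import tensorflow as tf

w = tf.Variable([1.0, 1.0])           # start inside the feasible set Q
step = 0.1                            # gradient step size (placeholder value)

def f(w):
    # Simple function whose unconstrained minimum is at w1 = -0.25, w2 = 2.
    return (w[0] + 0.25) ** 2 + (w[1] - 2.0) ** 2

for _ in range(200):
    with tf.GradientTape() as tape:
        loss = f(w)
    grad = tape.gradient(loss, w)
    w.assign(w - step * grad)         # gradient step
    w.assign(tf.maximum(w, 0.0))      # proximal step: project onto w1>=0, w2>=0

print(w.numpy())                      # should end up near [0, 2]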
• See: tensorflow_leastSquares.py
• Performs gradient descent on a function built from data (a rough sketch is given below)
• We have some fake data x, y, where y = w*x + b + small_gaussian_noise
• The code tries to find the best wbest, bbest that predict y
• It uses the loss: (y - ypredicted)^2
• ypredicted = wbest*x + bbest
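Again, since the starter file is not reproduced here, a hedged sketch of such a least-squares loop is given below; the fake-data parameters, step size, and iteration count are placeholder assumptions, not the contents of tensorflow_leastSquares.py:

import tensorflow as tf

true_w, true_b = 3.0, -1.0
x = tf.random.normal([200])
y = true_w * x + true_b + 0.05 * tf.random.normal([200])   # y = w*x + b + small gaussian noise

wbest = tf.Variable(0.0)
bbest = tf.Variable(0.0)
step = 0.05

for _ in range(500):
    with tf.GradientTape() as tape:
        ypredicted = wbest * x + bbest
        loss = tf.reduce_mean((y - ypredicted) ** 2)        # squared-error loss
    gw, gb = tape.gradient(loss, [wbest, bbest])
    wbest.assign_sub(step * gw)
    bbest.assign_sub(step * gb)

print(wbest.numpy(), bbest.numpy())   # should be close to 3.0 and -1.0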
• In your code (a hedged end-to-end sketch is given after this list):
• x, y will be taken from the MNIST dataset
• the loss should be the logistic loss
• you need to add the proximal step / soft-thresholding
• The constant L is unknown, so you should try several gradient step sizes
• The constant in front of the L1 penalty is unknown, so you should try several values
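Putting the pieces together, one possible end-to-end sketch is shown below: L1-regularized logistic regression trained with proximal gradient descent on two MNIST digits. The digit pair (3 vs. 8), the step size, the L1 constant, and the iteration count are placeholder assumptions that you should vary, not values prescribed by the assignment:

import numpy as np
import tensorflow as tf

# Load two MNIST digits and build a binary-labeled training set.
digit_a, digit_b = 3, 8                                   # placeholder digit pair
(x_tr, y_tr), _ = tf.keras.datasets.mnist.load_data()
mask = (y_tr == digit_a) | (y_tr == digit_b)
x = tf.constant(x_tr[mask].reshape(-1, 784).astype(np.float32) / 255.0)
y = tf.constant((y_tr[mask] == digit_b).astype(np.float32))  # labels in {0, 1}

w = tf.Variable(tf.zeros([784]))
b = tf.Variable(0.0)
step = 0.5      # gradient step size: try several, since L is unknown
lam = 0.001     # L1 penalty constant: try several values

def soft_threshold(v, t):
    # Proximal operator of t * ||v||_1.
    return tf.sign(v) * tf.maximum(tf.abs(v) - t, 0.0)

for _ in range(300):
    with tf.GradientTape() as tape:
        logits = tf.linalg.matvec(x, w) + b
        loss = tf.reduce_mean(
            tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits))
    gw, gb = tape.gradient(loss, [w, b])
    w.assign(soft_threshold(w - step * gw, step * lam))   # gradient + proximal step
    b.assign_sub(step * gb)                               # bias is typically not penalized

# Sparsity and training accuracy of the learned model.
pred = tf.cast(tf.linalg.matvec(x, w) + b > 0, tf.float32)
print("nonzero weights:", int(tf.math.count_nonzero(w)))
print("train accuracy:", float(tf.reduce_mean(tf.cast(pred == y, tf.float32))))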
A report in PDF
• Results of tests of the method on the MNIST dataset, for decreasing training set sizes (include your #, and state which two digits define your two-class problem).
• Code in python for solving the MNIST classification problem (for the full size of the training set).