Question
1. (Proximal Gradient Descent) In this problem, we will show the sublinear convergence for proximal gradient descent.
To be precise, we assume that the objective f(x) can be written as f(x) = g(x) + h(x), where
(a) $g$ is convex and differentiable, with $\mathrm{dom}(g) = \mathbb{R}^d$.
(b) $\nabla g$ is Lipschitz continuous with constant $L > 0$.
(c) $h$ is convex, not necessarily differentiable, and we take $\mathrm{dom}(h) = \mathbb{R}^d$ for simplicity.
Define the generalized gradient as
$$G(x_k) = \frac{x_k - x_{k+1}}{1/L},$$
where $x_{k+1}$ is the next iterate obtained by applying PGD to $x_k$. Show that
$$f(x_{k+1}) - f(x^*) \le \frac{L}{2}\left(\|x_k - x^*\|^2 - \|x_{k+1} - x^*\|^2\right),$$
where $x^*$ is the minimizer of $f$, and use it to conclude
$$f(x_k) - f(x^*) \le \frac{L\,\|x_0 - x^*\|^2}{2k}.$$
That is, the proximal descent method achieves O(1/k) accuracy at the k-th iteration.
Hint: You can freely use the following lemma, which shows that the PGD is also a "descent method":
Lemma 1 (Proximal Descent Lemma).
$$f(x_{k+1}) - f(z) \le G(x_k)^\top (x_k - z) - \frac{1}{2L}\,\|G(x_k)\|^2, \quad \forall z \in \mathbb{R}^d.$$
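The two bounds above can be checked numerically on a toy problem. The following is a minimal sketch, not part of the original question: it assumes a separable objective with $g(x) = \frac{1}{2}\sum_i a_i (x_i - c_i)^2$ and $h(x) = \lambda \|x\|_1$, a step size of $1/L$ with $L = \max_i a_i$, and made-up values for `a`, `c`, `lam`, and the starting point. This choice is convenient because the minimizer has a closed form (coordinate-wise soft-thresholding), so $x^*$ is known exactly. All function names are hypothetical.

```python
def soft_threshold(v, tau):
    """Proximal operator of tau * |.| applied to a scalar."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def objective(x, a, c, lam):
    """f(x) = g(x) + h(x) for the assumed toy problem."""
    g = 0.5 * sum(ai * (xi - ci) ** 2 for ai, xi, ci in zip(a, x, c))
    h = lam * sum(abs(xi) for xi in x)
    return g + h


def pgd_step(x, a, c, lam, L):
    """One PGD step: gradient step on g with step 1/L, then prox of h/L."""
    return [soft_threshold(xi - ai * (xi - ci) / L, lam / L)
            for xi, ai, ci in zip(x, a, c)]


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))


def check_pgd_bounds(num_iters=50, tol=1e-9):
    # Illustrative values only (assumptions, not from the question).
    a = [1.0, 0.5, 0.1]
    c = [2.0, -1.0, 3.0]
    lam = 0.2
    L = max(a)  # Lipschitz constant of the gradient of g

    # Exact minimizer of the separable problem, coordinate by coordinate.
    x_star = [soft_threshold(ci, lam / ai) for ai, ci in zip(a, c)]
    f_star = objective(x_star, a, c, lam)

    x = [5.0, 5.0, -5.0]
    d0 = sq_dist(x, x_star)
    per_step_ok, rate_ok = True, True
    gap = objective(x, a, c, lam) - f_star

    for k in range(1, num_iters + 1):
        x_next = pgd_step(x, a, c, lam, L)
        gap = objective(x_next, a, c, lam) - f_star
        # Per-step bound: f(x_{k+1}) - f* <= (L/2)(|x_k - x*|^2 - |x_{k+1} - x*|^2)
        rhs = 0.5 * L * (sq_dist(x, x_star) - sq_dist(x_next, x_star))
        per_step_ok = per_step_ok and gap <= rhs + tol
        # Rate bound: f(x_k) - f* <= L |x_0 - x*|^2 / (2k)
        rate_ok = rate_ok and gap <= L * d0 / (2 * k) + tol
        x = x_next

    return per_step_ok, rate_ok, gap


if __name__ == "__main__":
    print(check_pgd_bounds())
```

A numerical check like this does not replace the proof, but it is a quick way to catch a sign or constant error in a derived bound.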
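Given a (full-rank) matrix of $n$ data points $X \in \mathbb{R}^{n \times d}$ and labels $y \in \mathbb{R}^n$, consider the minimization problem for $f: \mathbb{R}^d \to \mathbb{R}$ defined as
$$\min_{w \in \mathbb{R}^d} \left( f(w) := \|Xw - y\|_2^2 \right)$$
(the transcription preserves only the fragment "Xw -" of the objective; the least-squares form above is assumed, and any constant factor in front of the norm is not recoverable).
1. Calculate the Hessian $\nabla^2 f(w)$ of $f(w)$ w.r.t. $w$.
2. Is $f(w)$ a convex function on $\mathbb{R}^d$?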
3. Prove or disprove the following statement: f(w) has L-Lipschitz-continuous gradients.
4. Assuming the Hessian of $f(w)$ is invertible and that the iterates are initialized at some $w_0 \in \mathbb{R}^d$, derive the update
rule of undamped Newton's method for minimizing $f(w)$, in terms of $X$ and $y$.
5. Write the exact form of the minimizer that Newton's method leads to. How many iterations does it take to reach such
a solution?
6. Now assume we change the initialization to $2w_0$. How does this affect your answer in part (5)?
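Parts (4) to (6) can be explored numerically. The sketch below is an illustration under stated assumptions, not the expected written answer: it assumes the objective is a least-squares loss of the form $f(w) = c\,\|Xw - y\|_2^2$ for some constant $c > 0$ (the exact form is unclear in the transcription), uses a small made-up data set with $d = 2$, and solves the $2 \times 2$ linear system by Cramer's rule. The helper names and the `scale` parameter (standing in for $c$) are hypothetical.

```python
def normal_equations(X, y):
    """Return (X^T X, X^T y) for an n x 2 matrix X given as a list of rows."""
    A = [[sum(r[i] * r[j] for r in X) for j in range(2)] for i in range(2)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(2)]
    return A, b


def solve_2x2(A, b):
    """Solve A z = b for a 2 x 2 invertible matrix A (Cramer's rule)."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]


def newton_step(w, X, y, scale=1.0):
    """One undamped Newton step for f(w) = scale * ||Xw - y||^2 (assumed form)."""
    A, b = normal_equations(X, y)
    # Gradient: 2 * scale * (X^T X w - X^T y); Hessian: 2 * scale * X^T X.
    grad = [2 * scale * (A[i][0] * w[0] + A[i][1] * w[1] - b[i]) for i in range(2)]
    hess = [[2 * scale * A[i][j] for j in range(2)] for i in range(2)]
    delta = solve_2x2(hess, grad)
    return [w[0] - delta[0], w[1] - delta[1]]


# Made-up data for illustration: an intercept column plus one feature.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 2.9, 5.1, 7.0]

# Solution of the normal equations X^T X w = X^T y.
A, b = normal_equations(X, y)
w_ls = solve_2x2(A, b)

w0 = [3.0, -4.0]
from_w0 = newton_step(w0, X, y)                      # start at w0
from_2w0 = newton_step([2 * v for v in w0], X, y)    # start at 2 * w0
rescaled = newton_step(w0, X, y, scale=0.5)          # different constant c

if __name__ == "__main__":
    print(w_ls, from_w0, from_2w0, rescaled)
```

On this example, a single step from either starting point lands on the solution of the normal equations, and the constant factor cancels between the gradient and the Hessian, so the ambiguity in the scaling of $f$ does not affect the Newton iterate.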