Question
1. (Proximal Gradient Descent) In this problem, we will show the sublinear convergence of proximal gradient descent (PGD).
To be precise, we assume that the objective $f(x)$ can be written as $f(x) = g(x) + h(x)$, where
(a) $g$ is convex, differentiable, and $\mathrm{dom}(g) = \mathbb{R}^d$.
(b) $\nabla g$ is Lipschitz, with constant $L > 0$.
(c) $h$ is convex, not necessarily differentiable, and we take $\mathrm{dom}(h) = \mathbb{R}^d$ for simplicity.
Define the generalized gradient as
$$G(x_k) = L\,(x_k - x_{k+1}),$$
where $x_{k+1}$ is the next iterate obtained by applying PGD with step size $1/L$ to $x_k$. Show that
$$f(x_{k+1}) - f(x^*) \le \frac{L}{2}\left(\|x_k - x^*\|^2 - \|x_{k+1} - x^*\|^2\right),$$
where $x^*$ is the minimizer of $f$, and use it to conclude
$$f(x_k) - f(x^*) \le \frac{L\,\|x_0 - x^*\|^2}{2k}.$$
That is, the proximal gradient method achieves $O(1/k)$ accuracy at the $k$-th iteration.
Hint: You can freely use the following lemma, which shows that PGD is also a "descent method":
Lemma 1 (Proximal Descent Lemma).
$$f(x_{k+1}) - f(z) \le G(x_k)^\top (x_k - z) - \frac{1}{2L}\,\|G(x_k)\|^2, \quad \forall z \in \mathbb{R}^d.$$
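The two inequalities in this problem can be checked numerically on a small instance. The sketch below is not part of the original question: it assumes a hypothetical separable objective with $g(x) = \tfrac12 \sum_i a_i (x_i - b_i)^2$ and $h(x) = \lambda \|x\|_1$, chosen because the proximal step (soft-thresholding) and the exact minimizer $x^*$ both have closed forms. The names `A_DIAG`, `B`, `LAM`, and `check_bounds` are illustrative assumptions.

```python
def soft_threshold(v, tau):
    """Proximal operator of tau * |.| applied to a scalar v."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0

# Hypothetical test problem (an assumption, not from the question):
#   g(x) = 0.5 * sum_i a_i * (x_i - b_i)^2,   h(x) = LAM * ||x||_1
A_DIAG = [1.0, 4.0, 10.0]
B = [3.0, -0.1, 0.5]
LAM = 0.5
L = max(A_DIAG)  # Lipschitz constant of grad g for this diagonal quadratic

def f(x):
    g = 0.5 * sum(a * (xi - b) ** 2 for a, xi, b in zip(A_DIAG, x, B))
    h = LAM * sum(abs(xi) for xi in x)
    return g + h

def pgd_step(x):
    # Gradient step on g with step size 1/L, followed by the prox of h / L.
    return [soft_threshold(xi - a * (xi - b) / L, LAM / L)
            for a, xi, b in zip(A_DIAG, x, B)]

def sq_dist(u, v):
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))

# Exact minimizer: each coordinate solves a 1-D lasso problem in closed form.
X_STAR = [soft_threshold(b, LAM / a) for a, b in zip(A_DIAG, B)]

def check_bounds(x0, num_iters=50, tol=1e-9):
    """Run PGD and test the per-step inequality and the O(1/k) bound."""
    f_star = f(X_STAR)
    d0 = sq_dist(x0, X_STAR)
    x = list(x0)
    per_step_ok, rate_ok = True, True
    for k in range(1, num_iters + 1):
        x_next = pgd_step(x)          # this is x_k; x holds x_{k-1}
        gap = f(x_next) - f_star
        per_step_ok &= gap <= 0.5 * L * (sq_dist(x, X_STAR) - sq_dist(x_next, X_STAR)) + tol
        rate_ok &= gap <= L * d0 / (2 * k) + tol
        x = x_next
    return per_step_ok, rate_ok, x

if __name__ == "__main__":
    print(check_bounds([-5.0, 5.0, 5.0]))
```

A passing check on one instance is only a sanity test of the statement, not a proof; the small tolerance `tol` absorbs floating-point rounding.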
Given a (full rank) matrix of $n$ data points $X \in \mathbb{R}^{n \times d}$ and labels $y \in \mathbb{R}^n$, consider the minimization problem of
$f: \mathbb{R}^d \to \mathbb{R}$ defined as
$$\min_{w \in \mathbb{R}^d} \left( f(w) := \|Xw - y\|_2^2 \right).$$
1. Calculate the Hessian $\nabla^2 f(w)$ of $f(w)$ w.r.t. $w$.
2. Is $f(w)$ a convex function on $\mathbb{R}^d$?
3. Prove or disprove the following statement: $f(w)$ has $L$-Lipschitz-continuous gradients.
4. Assuming the Hessian of $f(w)$ is invertible and that the iterates are initialized at some $w_0 \in \mathbb{R}^d$, derive the update rule for undamped Newton's method in terms of $X$ and $y$ for minimizing $f(w)$.
5. Write the exact form of the minimizer that Newton's method leads to. How many iterations does it take to reach such a solution?
6. Now assume we change the initialization to $2w_0$. How does it affect your answer in part (5)?
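Parts 4 to 6 can be explored with a small numerical experiment. The sketch below is an illustration under stated assumptions, not the expert solution: it takes $f(w) = \|Xw - y\|_2^2$, so that the gradient is $2X^\top(Xw - y)$ and the Hessian is $2X^\top X$, and it uses a made-up $3 \times 2$ data set. Any positive scaling of $f$ would give the same Newton step, since the factor cancels between the Hessian and the gradient. The helper names (`solve_2x2`, `newton_step`) and the data are hypothetical.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def solve_2x2(A, rhs):
    """Solve a 2x2 linear system with Cramer's rule (assumes A is invertible)."""
    (a, b), (c, d) = A
    e, f_ = rhs
    det = a * d - b * c
    return [(e * d - b * f_) / det, (a * f_ - c * e) / det]

def newton_step(X, y, w):
    """One undamped Newton step for f(w) = ||Xw - y||^2 with d = 2."""
    Xt = transpose(X)
    residual = [p - yi for p, yi in zip(matvec(X, w), y)]
    grad = [2.0 * g for g in matvec(Xt, residual)]             # 2 X^T (Xw - y)
    hess = [[2.0 * h for h in row] for row in matmul(Xt, X)]   # 2 X^T X
    step = solve_2x2(hess, grad)                               # H^{-1} grad
    return [wi - si for wi, si in zip(w, step)]

# Hypothetical data (an assumption): three points, two features, full column rank.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
Y = [1.0, 2.0, 2.0]

if __name__ == "__main__":
    w0 = [4.0, -3.0]
    print(newton_step(X, Y, w0))                    # one step from w0
    print(newton_step(X, Y, [2 * wi for wi in w0])) # one step from 2 * w0
```

Both starting points land on the same vector, which solves the normal equations $X^\top X w = X^\top y$ for this data; a second Newton step from there leaves the iterate unchanged up to rounding.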