Question
Solve please
11. Let $S : \mathbb{R}^K \to \mathbb{R}^K$ be the softmax function. Let $S_k$ be the $k$th component function of $S$:

$$S_k(u) = \frac{e^{u_k}}{\sum_{j=1}^{K} e^{u_j}}.$$

Let $p \in \mathbb{R}^K$ be a "probability vector", that is, a vector whose components are non-negative and sum to 1. If $u \in \mathbb{R}^K$, then $S(u)$ is also a probability vector, and we can compare $p$ with $S(u)$ by computing the cross-entropy

$$h(u) = \sum_{k=1}^{K} -p_k \log\big(S_k(u)\big).$$

Compute the gradient $\nabla h(u)$. (This calculation is a key step when training a multiclass logistic regression model using gradient descent.)
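
A sketch of the standard computation, using the softmax Jacobian $\partial S_k / \partial u_i = S_k(u)\,(\delta_{ki} - S_i(u))$ together with the chain rule:

```latex
% Gradient of the cross-entropy h(u) = -\sum_k p_k \log S_k(u).
% Softmax Jacobian: \partial S_k / \partial u_i = S_k(u) (\delta_{ki} - S_i(u)).
\frac{\partial h}{\partial u_i}
  = -\sum_{k=1}^{K} \frac{p_k}{S_k(u)} \,\frac{\partial S_k}{\partial u_i}
  = -\sum_{k=1}^{K} p_k \bigl(\delta_{ki} - S_i(u)\bigr)
  = -p_i + S_i(u) \sum_{k=1}^{K} p_k
  = S_i(u) - p_i.
```

Since $\sum_k p_k = 1$, the sum in the next-to-last expression collapses, giving the compact vector form $\nabla h(u) = S(u) - p$.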
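To sanity-check the closed form $\nabla h(u) = S(u) - p$ numerically, here is a minimal sketch in NumPy; the helper names (`analytic_grad`, `numeric_grad`) are illustrative, not from the problem. It compares the analytic gradient against a central finite-difference approximation of $h$.

```python
import numpy as np

def softmax(u):
    """Numerically stable softmax: shift by the max before exponentiating."""
    z = np.exp(u - np.max(u))
    return z / z.sum()

def cross_entropy(u, p):
    """h(u) = -sum_k p_k * log(S_k(u))."""
    return -np.sum(p * np.log(softmax(u)))

def analytic_grad(u, p):
    """Claimed closed form: grad h(u) = S(u) - p."""
    return softmax(u) - p

def numeric_grad(u, p, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    g = np.zeros_like(u)
    for i in range(len(u)):
        e = np.zeros_like(u)
        e[i] = eps
        g[i] = (cross_entropy(u + e, p) - cross_entropy(u - e, p)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
u = rng.normal(size=5)
p = rng.random(5)
p /= p.sum()  # normalize p into a probability vector

# Maximum discrepancy between the two gradients; should be ~1e-10 or smaller.
print(np.max(np.abs(analytic_grad(u, p) - numeric_grad(u, p))))
```

Agreement between the two gradients to roughly the precision of the finite-difference scheme confirms the derivation above.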