CS229 Lecture Notes
Andrew Ng (updated by Tengyu Ma)
Contents

Part I: Supervised learning

1 Linear regression
  1.1 LMS algorithm
  1.2 The normal equations
    1.2.1 Matrix derivatives
    1.2.2 Least squares revisited
  1.3 Probabilistic interpretation
  1.4 Locally weighted linear regression (optional reading)

2 Classification and logistic regression
  2.1 Logistic regression
  2.2 Digression: the perceptron learning algorithm
  2.3 Another algorithm for maximizing ℓ(θ)

3 Generalized linear models
  3.1 The exponential family
  3.2 Constructing GLMs
    3.2.1 Ordinary least squares
    3.2.2 Logistic regression
    3.2.3 Softmax regression

4 Generative learning algorithms
  4.1 Gaussian discriminant analysis
    4.1.1 The multivariate normal distribution
    4.1.2 The Gaussian discriminant analysis model
    4.1.3 Discussion: GDA and logistic regression
  4.2 Naive Bayes
    4.2.1 Laplace smoothing
    4.2.2 Event models for text classification (optional reading)
5 Kernel methods
  5.1 Feature maps
  5.2 LMS (least mean squares) with features
  5.3 LMS with the kernel trick
  5.4 Properties of kernels

6 Support vector machines
  6.1 Margins: intuition
  6.2 Notation (optional reading)
  6.3 Functional and geometric margins (optional reading)
  6.4 The optimal margin classifier (optional reading)
  6.5 Lagrange duality (optional reading)
  6.6 Optimal margin classifiers: the dual form (optional reading)
  6.7 Regularization and the non-separable case (optional reading)
  6.8 The SMO algorithm (optional reading)
    6.8.1 Coordinate ascent
    6.8.2 SMO

Part II: Deep learning

7 Deep learning
  7.1 Supervised learning with non-linear models
  7.2 Neural networks
  7.3 Backpropagation
    7.3.1 Preliminary: chain rule
    7.3.2 One-neuron neural networks
    7.3.3 Two-layer neural networks: a low-level unpacked computation
    7.3.4 Two-layer neural network with vector notation
    7.3.5 Multi-layer neural networks
  7.4 Vectorization over training examples

Part III: Generalization and regularization

8 Generalization
  8.1 Bias-variance tradeoff
    8.1.1 A mathematical decomposition (for regression)
  8.2 The double descent phenomenon
  8.3 Sample complexity bounds (optional readings)
    8.3.1 Preliminaries
    8.3.2 The case of finite H
    8.3.3 The case of infinite H

9 Regularization and model selection
  9.1 Regularization
  9.2 Implicit regularization effect
  9.3 Model selection via cross validation
  9.4 Bayesian statistics and regularization
Part I: Supervised learning
5 Let’s start by talking about a few examples of supervised learning prob- lems. Suppose we have a dataset giving the living areas and prices of 47 houses from Portland, Oregon: Living area (feet 2 ) Price (1000 $ s) 2104 400 1600 330 2400 369 1416 232 3000 540 . . . . . . We can plot this data: 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 0 100 200 300 400 500 600 700 800 900 1000 housing prices square feet price (in $1000) Given data like this, how can we learn to predict the prices of other houses in Portland, as a function of the size of their living areas? To establish notation for future use, we’ll use x ( i ) to denote the “input” variables (living area in this example), also called input features , and y ( i ) to denote the “output” or target variable that we are trying to predict (price). A pair ( x ( i ) , y ( i ) ) is called a training example , and the dataset that we’ll be using to learn—a list of n training examples { ( x ( i ) , y ( i ) ); i = 1 , . . . , n } —is called a training set . Note that the superscript “( i )” in the notation is simply an index into the training set, and has nothing to do with exponentiation. We will also use X denote the space of input values, and Y the space of output values. In this example, X = Y = R . To describe the supervised learning problem slightly more formally, our goal is, given a training set, to learn a function h : X 7→ Y so that h ( x ) is a “good” predictor for the corresponding value of y . For historical reasons, this
function h is called a hypothesis. Seen pictorially, the process is therefore like this:

[Figure: a training set is fed to a learning algorithm, which outputs a hypothesis h; h maps x (living area of a house) to a predicted y (predicted price of the house).]

When the target variable that we're trying to predict is continuous, such as in our housing example, we call the learning problem a regression problem. When y can take on only a small number of discrete values (such as if, given the living area, we wanted to predict if a dwelling is a house or an apartment, say), we call it a classification problem.
Chapter 1
Linear regression

To make our housing example more interesting, let's consider a slightly richer dataset in which we also know the number of bedrooms in each house:

    Living area (ft^2)   #bedrooms   Price ($1000s)
    2104                 3           400
    1600                 3           330
    2400                 3           369
    1416                 2           232
    3000                 4           540
    ...                  ...         ...

Here, the x's are two-dimensional vectors in R^2. For instance, x_1^{(i)} is the living area of the i-th house in the training set, and x_2^{(i)} is its number of bedrooms. (In general, when designing a learning problem, it will be up to you to decide what features to choose, so if you are out in Portland gathering housing data, you might also decide to include other features such as whether each house has a fireplace, the number of bathrooms, and so on. We'll say more about feature selection later, but for now let's take the features as given.)

To perform supervised learning, we must decide how we're going to represent functions/hypotheses h in a computer. As an initial choice, let's say we decide to approximate y as a linear function of x:

$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$$

Here, the θ_i's are the parameters (also called weights) parameterizing the space of linear functions mapping from X to Y. When there is no risk of
confusion, we will drop the θ subscript in h_θ(x), and write it more simply as h(x). To simplify our notation, we also introduce the convention of letting x_0 = 1 (this is the intercept term), so that

$$h(x) = \sum_{i=0}^{d} \theta_i x_i = \theta^T x,$$

where on the right-hand side above we are viewing θ and x both as vectors, and here d is the number of input variables (not counting x_0).

Now, given a training set, how do we pick, or learn, the parameters θ? One reasonable method seems to be to make h(x) close to y, at least for the training examples we have. To formalize this, we will define a function that measures, for each value of the θ's, how close the h(x^{(i)})'s are to the corresponding y^{(i)}'s. We define the cost function:

$$J(\theta) = \frac{1}{2} \sum_{i=1}^{n} (h_\theta(x^{(i)}) - y^{(i)})^2.$$

If you've seen linear regression before, you may recognize this as the familiar least-squares cost function that gives rise to the ordinary least squares regression model. Whether or not you have seen it previously, let's keep going, and we'll eventually show this to be a special case of a much broader family of algorithms.

1.1 LMS algorithm

We want to choose θ so as to minimize J(θ). To do so, let's use a search algorithm that starts with some "initial guess" for θ, and that repeatedly changes θ to make J(θ) smaller, until hopefully we converge to a value of θ that minimizes J(θ). Specifically, let's consider the gradient descent algorithm, which starts with some initial θ, and repeatedly performs the update:

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta).$$

(This update is simultaneously performed for all values of j = 0, ..., d.) Here, α is called the learning rate. This is a very natural algorithm that repeatedly takes a step in the direction of steepest decrease of J.

In order to implement this algorithm, we have to work out what is the partial derivative term on the right hand side. Let's first work it out for the
case of if we have only one training example (x, y), so that we can neglect the sum in the definition of J. We have:

$$\frac{\partial}{\partial \theta_j} J(\theta) = \frac{\partial}{\partial \theta_j} \frac{1}{2}(h_\theta(x) - y)^2$$
$$= 2 \cdot \frac{1}{2}(h_\theta(x) - y) \cdot \frac{\partial}{\partial \theta_j}(h_\theta(x) - y)$$
$$= (h_\theta(x) - y) \cdot \frac{\partial}{\partial \theta_j}\left( \sum_{i=0}^{d} \theta_i x_i - y \right)$$
$$= (h_\theta(x) - y)\, x_j$$

For a single training example, this gives the update rule:^1

$$\theta_j := \theta_j + \alpha\, (y^{(i)} - h_\theta(x^{(i)}))\, x_j^{(i)}.$$

The rule is called the LMS update rule (LMS stands for "least mean squares"), and is also known as the Widrow-Hoff learning rule. This rule has several properties that seem natural and intuitive. For instance, the magnitude of the update is proportional to the error term (y^{(i)} - h_θ(x^{(i)})); thus, for instance, if we are encountering a training example on which our prediction nearly matches the actual value of y^{(i)}, then we find that there is little need to change the parameters; in contrast, a larger change to the parameters will be made if our prediction h_θ(x^{(i)}) has a large error (i.e., if it is very far from y^{(i)}).

We'd derived the LMS rule for when there was only a single training example. There are two ways to modify this method for a training set of more than one example. The first is to replace it with the following algorithm:

Repeat until convergence {

$$\theta_j := \theta_j + \alpha \sum_{i=1}^{n} (y^{(i)} - h_\theta(x^{(i)}))\, x_j^{(i)}, \quad \text{(for every } j\text{)} \qquad (1.1)$$

}

^1 We use the notation "a := b" to denote an operation (in a computer program) in which we set the value of a variable a to be equal to the value of b. In other words, this operation overwrites a with the value of b. In contrast, we will write "a = b" when we are asserting a statement of fact, that the value of a is equal to the value of b.
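To make update (1.1) concrete, here is a minimal NumPy sketch of repeatedly applying it (this code is illustrative and not part of the original notes; the fixed iteration count and learning rate are assumptions standing in for "until convergence").

```python
import numpy as np

def fit_by_rule_1_1(X, y, alpha=1e-8, num_iters=1000):
    """Repeatedly apply update (1.1): theta_j += alpha * sum_i (y^(i) - h(x^(i))) * x_j^(i).

    X is assumed to already contain the intercept column x_0 = 1.
    """
    n, d = X.shape
    theta = np.zeros(d)                      # some "initial guess" for theta
    for _ in range(num_iters):               # "repeat until convergence" (fixed count here)
        errors = y - X @ theta               # y^(i) - h_theta(x^(i)) for every example
        theta = theta + alpha * X.T @ errors # simultaneous update of every theta_j
    return theta
```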
By grouping the updates of the coordinates into an update of the vector θ, we can rewrite update (1.1) in a slightly more succinct way:

$$\theta := \theta + \alpha \sum_{i=1}^{n} (y^{(i)} - h_\theta(x^{(i)}))\, x^{(i)}$$

The reader can easily verify that the quantity in the summation in the update rule above is just ∂J(θ)/∂θ_j (for the original definition of J). So, this is simply gradient descent on the original cost function J. This method looks at every example in the entire training set on every step, and is called batch gradient descent. Note that, while gradient descent can be susceptible to local minima in general, the optimization problem we have posed here for linear regression has only one global, and no other local, optima; thus gradient descent always converges (assuming the learning rate α is not too large) to the global minimum. Indeed, J is a convex quadratic function. Here is an example of gradient descent as it is run to minimize a quadratic function.

[Figure: contour plot of a quadratic function, with the trajectory taken by gradient descent marked by x's joined by straight lines.]

The ellipses shown above are the contours of a quadratic function. Also shown is the trajectory taken by gradient descent, which was initialized at (48, 30). The x's in the figure (joined by straight lines) mark the successive values of θ that gradient descent went through.

When we run batch gradient descent to fit θ on our previous dataset, to learn to predict housing price as a function of living area, we obtain θ_0 = 71.27, θ_1 = 0.1345. If we plot h_θ(x) as a function of x (area), along with the training data, we obtain the following figure:
[Figure: the fitted line h_θ(x) plotted over the housing-price training data; horizontal axis: square feet, vertical axis: price in $1000s.]

If the number of bedrooms were included as one of the input features as well, we get θ_0 = 89.60, θ_1 = 0.1392, θ_2 = -8.738.

The above results were obtained with batch gradient descent. There is an alternative to batch gradient descent that also works very well. Consider the following algorithm:

Loop {

  for i = 1 to n, {

$$\theta_j := \theta_j + \alpha\, (y^{(i)} - h_\theta(x^{(i)}))\, x_j^{(i)}, \quad \text{(for every } j\text{)} \qquad (1.2)$$

  }

}

By grouping the updates of the coordinates into an update of the vector θ, we can rewrite update (1.2) in a slightly more succinct way:

$$\theta := \theta + \alpha\, (y^{(i)} - h_\theta(x^{(i)}))\, x^{(i)}$$

In this algorithm, we repeatedly run through the training set, and each time we encounter a training example, we update the parameters according to the gradient of the error with respect to that single training example only. This algorithm is called stochastic gradient descent (also incremental gradient descent). Whereas batch gradient descent has to scan through the entire training set before taking a single step—a costly operation if n is large—stochastic gradient descent can start making progress right away, and
continues to make progress with each example it looks at. Often, stochastic gradient descent gets θ "close" to the minimum much faster than batch gradient descent. (Note however that it may never "converge" to the minimum, and the parameters θ will keep oscillating around the minimum of J(θ); but in practice most of the values near the minimum will be reasonably good approximations to the true minimum.^2) For these reasons, particularly when the training set is large, stochastic gradient descent is often preferred over batch gradient descent.

1.2 The normal equations

Gradient descent gives one way of minimizing J. Let's discuss a second way of doing so, this time performing the minimization explicitly and without resorting to an iterative algorithm. In this method, we will minimize J by explicitly taking its derivatives with respect to the θ_j's, and setting them to zero. To enable us to do this without having to write reams of algebra and pages full of matrices of derivatives, let's introduce some notation for doing calculus with matrices.

1.2.1 Matrix derivatives

For a function f : R^{n×d} → R mapping from n-by-d matrices to the real numbers, we define the derivative of f with respect to A to be:

$$\nabla_A f(A) = \begin{bmatrix} \frac{\partial f}{\partial A_{11}} & \cdots & \frac{\partial f}{\partial A_{1d}} \\ \vdots & \ddots & \vdots \\ \frac{\partial f}{\partial A_{n1}} & \cdots & \frac{\partial f}{\partial A_{nd}} \end{bmatrix}$$

Thus, the gradient ∇_A f(A) is itself an n-by-d matrix, whose (i, j)-element is ∂f/∂A_{ij}. For example, suppose

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$$

is a 2-by-2 matrix, and the function f : R^{2×2} → R is given by

$$f(A) = \frac{3}{2} A_{11} + 5 A_{12}^2 + A_{21} A_{22}.$$

^2 By slowly letting the learning rate α decrease to zero as the algorithm runs, it is also possible to ensure that the parameters will converge to the global minimum rather than merely oscillate around the minimum.
Here, A_{ij} denotes the (i, j) entry of the matrix A. We then have

$$\nabla_A f(A) = \begin{bmatrix} \frac{3}{2} & 10 A_{12} \\ A_{22} & A_{21} \end{bmatrix}.$$

1.2.2 Least squares revisited

Armed with the tools of matrix derivatives, let us now proceed to find in closed-form the value of θ that minimizes J(θ). We begin by re-writing J in matrix-vectorial notation.

Given a training set, define the design matrix X to be the n-by-d matrix (actually n-by-(d+1), if we include the intercept term) that contains the training examples' input values in its rows:

$$X = \begin{bmatrix} \text{---}\ (x^{(1)})^T\ \text{---} \\ \text{---}\ (x^{(2)})^T\ \text{---} \\ \vdots \\ \text{---}\ (x^{(n)})^T\ \text{---} \end{bmatrix}.$$

Also, let \vec{y} be the n-dimensional vector containing all the target values from the training set:

$$\vec{y} = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(n)} \end{bmatrix}.$$

Now, since h_θ(x^{(i)}) = (x^{(i)})^T θ, we can easily verify that

$$X\theta - \vec{y} = \begin{bmatrix} (x^{(1)})^T \theta \\ \vdots \\ (x^{(n)})^T \theta \end{bmatrix} - \begin{bmatrix} y^{(1)} \\ \vdots \\ y^{(n)} \end{bmatrix} = \begin{bmatrix} h_\theta(x^{(1)}) - y^{(1)} \\ \vdots \\ h_\theta(x^{(n)}) - y^{(n)} \end{bmatrix}.$$

Thus, using the fact that for a vector z, we have that z^T z = \sum_i z_i^2:

$$\frac{1}{2}(X\theta - \vec{y})^T (X\theta - \vec{y}) = \frac{1}{2} \sum_{i=1}^{n} (h_\theta(x^{(i)}) - y^{(i)})^2 = J(\theta)$$
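As a quick sanity check (illustrative code, not from the notes), the matrix form of J(θ) can be compared numerically against the summation form; the array shapes and random values below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.c_[np.ones(5), rng.normal(size=(5, 2))]   # design matrix with intercept column x_0 = 1
y = rng.normal(size=5)                            # target vector
theta = rng.normal(size=3)

J_sum = 0.5 * sum((X[i] @ theta - y[i]) ** 2 for i in range(len(y)))  # (1/2) sum_i (h(x^(i)) - y^(i))^2
J_mat = 0.5 * (X @ theta - y) @ (X @ theta - y)                        # (1/2)(X theta - y)^T (X theta - y)
assert np.isclose(J_sum, J_mat)
```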
Finally, to minimize J, let's find its derivatives with respect to θ. Hence,

$$\nabla_\theta J(\theta) = \nabla_\theta \frac{1}{2}(X\theta - \vec{y})^T (X\theta - \vec{y})$$
$$= \frac{1}{2} \nabla_\theta \left( (X\theta)^T X\theta - (X\theta)^T \vec{y} - \vec{y}^T (X\theta) + \vec{y}^T \vec{y} \right)$$
$$= \frac{1}{2} \nabla_\theta \left( \theta^T (X^T X)\theta - \vec{y}^T (X\theta) - \vec{y}^T (X\theta) \right)$$
$$= \frac{1}{2} \nabla_\theta \left( \theta^T (X^T X)\theta - 2 (X^T \vec{y})^T \theta \right)$$
$$= \frac{1}{2} \left( 2 X^T X \theta - 2 X^T \vec{y} \right)$$
$$= X^T X \theta - X^T \vec{y}$$

In the third step, we used the fact that a^T b = b^T a, and in the fifth step used the facts ∇_x b^T x = b and ∇_x x^T A x = 2Ax for symmetric matrix A (for more details, see Section 4.3 of "Linear Algebra Review and Reference"). To minimize J, we set its derivatives to zero, and obtain the normal equations:

$$X^T X \theta = X^T \vec{y}$$

Thus, the value of θ that minimizes J(θ) is given in closed form by the equation

$$\theta = (X^T X)^{-1} X^T \vec{y}.^3$$

^3 Note that in the above step, we are implicitly assuming that X^T X is an invertible matrix. This can be checked before calculating the inverse. If either the number of linearly independent examples is fewer than the number of features, or if the features are not linearly independent, then X^T X will not be invertible. Even in such cases, it is possible to "fix" the situation with additional techniques, which we skip here for the sake of simplicity.
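As a minimal illustration (not part of the original notes), the closed-form solution can be computed directly in NumPy; solving the linear system X^T X θ = X^T y is generally preferred over forming the inverse explicitly. The synthetic dataset below is an assumption made up for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
area = rng.uniform(1000, 4000, size=47)                 # made-up living areas
X = np.c_[np.ones(47), area]                            # intercept column plus living area
y = X @ np.array([70.0, 0.13]) + rng.normal(scale=20, size=47)  # synthetic prices with noise

# Normal equations: X^T X theta = X^T y, solved as a linear system.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)   # approximately recovers the coefficients used to generate y
```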
1.3 Probabilistic interpretation

When faced with a regression problem, why might linear regression, and specifically why might the least-squares cost function J, be a reasonable choice? In this section, we will give a set of probabilistic assumptions, under which least-squares regression is derived as a very natural algorithm.

Let us assume that the target variables and the inputs are related via the equation

$$y^{(i)} = \theta^T x^{(i)} + \epsilon^{(i)},$$

where ε^{(i)} is an error term that captures either unmodeled effects (such as if there are some features very pertinent to predicting housing price, but that we'd left out of the regression), or random noise. Let us further assume that the ε^{(i)} are distributed IID (independently and identically distributed) according to a Gaussian distribution (also called a Normal distribution) with mean zero and some variance σ². We can write this assumption as "ε^{(i)} ∼ N(0, σ²)." I.e., the density of ε^{(i)} is given by

$$p(\epsilon^{(i)}) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(\epsilon^{(i)})^2}{2\sigma^2} \right).$$

This implies that

$$p(y^{(i)} \mid x^{(i)}; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2} \right).$$

The notation "p(y^{(i)} | x^{(i)}; θ)" indicates that this is the distribution of y^{(i)} given x^{(i)} and parameterized by θ. Note that we should not condition on θ ("p(y^{(i)} | x^{(i)}, θ)"), since θ is not a random variable. We can also write the distribution of y^{(i)} as y^{(i)} | x^{(i)}; θ ∼ N(θ^T x^{(i)}, σ²).

Given X (the design matrix, which contains all the x^{(i)}'s) and θ, what is the distribution of the y^{(i)}'s? The probability of the data is given by p(\vec{y} | X; θ). This quantity is typically viewed as a function of \vec{y} (and perhaps X), for a fixed value of θ. When we wish to explicitly view this as a function of θ, we will instead call it the likelihood function:

$$L(\theta) = L(\theta; X, \vec{y}) = p(\vec{y} \mid X; \theta).$$

Note that by the independence assumption on the ε^{(i)}'s (and hence also the y^{(i)}'s given the x^{(i)}'s), this can also be written

$$L(\theta) = \prod_{i=1}^{n} p(y^{(i)} \mid x^{(i)}; \theta) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2} \right).$$

Now, given this probabilistic model relating the y^{(i)}'s and the x^{(i)}'s, what is a reasonable way of choosing our best guess of the parameters θ? The principle of maximum likelihood says that we should choose θ so as to make the data as high probability as possible. I.e., we should choose θ to maximize L(θ).
Instead of maximizing L(θ), we can also maximize any strictly increasing function of L(θ). In particular, the derivations will be a bit simpler if we instead maximize the log likelihood ℓ(θ):

$$\ell(\theta) = \log L(\theta)$$
$$= \log \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2} \right)$$
$$= \sum_{i=1}^{n} \log \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2} \right)$$
$$= n \log \frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{\sigma^2} \cdot \frac{1}{2} \sum_{i=1}^{n} (y^{(i)} - \theta^T x^{(i)})^2.$$

Hence, maximizing ℓ(θ) gives the same answer as minimizing

$$\frac{1}{2} \sum_{i=1}^{n} (y^{(i)} - \theta^T x^{(i)})^2,$$

which we recognize to be J(θ), our original least-squares cost function.

To summarize: Under the previous probabilistic assumptions on the data, least-squares regression corresponds to finding the maximum likelihood estimate of θ. This is thus one set of assumptions under which least-squares regression can be justified as a very natural method that's just doing maximum likelihood estimation. (Note however that the probabilistic assumptions are by no means necessary for least-squares to be a perfectly good and rational procedure, and there may—and indeed there are—other natural assumptions that can also be used to justify it.)

Note also that, in our previous discussion, our final choice of θ did not depend on what was σ², and indeed we'd have arrived at the same result even if σ² were unknown. We will use this fact again later, when we talk about the exponential family and generalized linear models.

1.4 Locally weighted linear regression (optional reading)

Consider the problem of predicting y from x ∈ R. The leftmost figure below shows the result of fitting a y = θ_0 + θ_1 x to a dataset. We see that the data doesn't really lie on a straight line, and so the fit is not very good.
[Figure: three panels plotting y against x for the same dataset; left: a linear fit, middle: a quadratic fit, right: a 5th-order polynomial fit.]

Instead, if we had added an extra feature x², and fit y = θ_0 + θ_1 x + θ_2 x², then we obtain a slightly better fit to the data. (See middle figure.) Naively, it might seem that the more features we add, the better. However, there is also a danger in adding too many features: The rightmost figure is the result of fitting a 5-th order polynomial y = \sum_{j=0}^{5} θ_j x^j. We see that even though the fitted curve passes through the data perfectly, we would not expect this to be a very good predictor of, say, housing prices (y) for different living areas (x). Without formally defining what these terms mean, we'll say the figure on the left shows an instance of underfitting—in which the data clearly shows structure not captured by the model—and the figure on the right is an example of overfitting. (Later in this class, when we talk about learning theory we'll formalize some of these notions, and also define more carefully just what it means for a hypothesis to be good or bad.)

As discussed previously, and as shown in the example above, the choice of features is important to ensuring good performance of a learning algorithm. (When we talk about model selection, we'll also see algorithms for automatically choosing a good set of features.) In this section, let us briefly talk about the locally weighted linear regression (LWR) algorithm which, assuming there is sufficient training data, makes the choice of features less critical. This treatment will be brief, since you'll get a chance to explore some of the properties of the LWR algorithm yourself in the homework.

In the original linear regression algorithm, to make a prediction at a query point x (i.e., to evaluate h(x)), we would:

1. Fit θ to minimize \sum_i (y^{(i)} - θ^T x^{(i)})².
2. Output θ^T x.

In contrast, the locally weighted linear regression algorithm does the following:

1. Fit θ to minimize \sum_i w^{(i)} (y^{(i)} - θ^T x^{(i)})².
2. Output θ^T x.
Here, the w^{(i)}'s are non-negative valued weights. Intuitively, if w^{(i)} is large for a particular value of i, then in picking θ, we'll try hard to make (y^{(i)} - θ^T x^{(i)})² small. If w^{(i)} is small, then the (y^{(i)} - θ^T x^{(i)})² error term will be pretty much ignored in the fit.

A fairly standard choice for the weights is^4

$$w^{(i)} = \exp\left( -\frac{(x^{(i)} - x)^2}{2\tau^2} \right)$$

Note that the weights depend on the particular point x at which we're trying to evaluate h(x). Moreover, if |x^{(i)} - x| is small, then w^{(i)} is close to 1; and if |x^{(i)} - x| is large, then w^{(i)} is small. Hence, θ is chosen giving a much higher "weight" to the (errors on) training examples close to the query point x. (Note also that while the formula for the weights takes a form that is cosmetically similar to the density of a Gaussian distribution, the w^{(i)}'s do not directly have anything to do with Gaussians, and in particular the w^{(i)} are not random variables, normally distributed or otherwise.) The parameter τ controls how quickly the weight of a training example falls off with distance of its x^{(i)} from the query point x; τ is called the bandwidth parameter, and is also something that you'll get to experiment with in your homework.

Locally weighted linear regression is the first example we're seeing of a non-parametric algorithm. The (unweighted) linear regression algorithm that we saw earlier is known as a parametric learning algorithm, because it has a fixed, finite number of parameters (the θ_i's), which are fit to the data. Once we've fit the θ_i's and stored them away, we no longer need to keep the training data around to make future predictions. In contrast, to make predictions using locally weighted linear regression, we need to keep the entire training set around. The term "non-parametric" (roughly) refers to the fact that the amount of stuff we need to keep in order to represent the hypothesis h grows linearly with the size of the training set.

^4 If x is vector-valued, this is generalized to be w^{(i)} = exp(-(x^{(i)} - x)^T (x^{(i)} - x)/(2τ²)), or w^{(i)} = exp(-(x^{(i)} - x)^T Σ^{-1} (x^{(i)} - x)/(2τ²)), for an appropriate choice of τ or Σ.
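To make the two-step LWR procedure concrete, here is a minimal sketch (not from the notes). It uses the vector-valued weight formula from footnote 4 and solves the weighted least-squares fit in closed form; the weighted normal equations X^T W X θ = X^T W y are a standard result assumed here rather than derived in the text.

```python
import numpy as np

def lwr_predict(X, y, x_query, tau):
    """Locally weighted linear regression prediction at a single query point.

    X: (n, d) design matrix (with intercept column), y: (n,) targets,
    x_query: (d,) query point, tau: bandwidth parameter.
    """
    # w^(i) = exp(-||x^(i) - x||^2 / (2 tau^2))
    diffs = X - x_query
    w = np.exp(-np.sum(diffs ** 2, axis=1) / (2 * tau ** 2))
    W = np.diag(w)
    # Step 1: fit theta to minimize sum_i w^(i) (y^(i) - theta^T x^(i))^2
    theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    # Step 2: output theta^T x
    return x_query @ theta
```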
Chapter 2
Classification and logistic regression

Let's now talk about the classification problem. This is just like the regression problem, except that the values y we now want to predict take on only a small number of discrete values. For now, we will focus on the binary classification problem in which y can take on only two values, 0 and 1. (Most of what we say here will also generalize to the multiple-class case.) For instance, if we are trying to build a spam classifier for email, then x^{(i)} may be some features of a piece of email, and y may be 1 if it is a piece of spam mail, and 0 otherwise. 0 is also called the negative class, and 1 the positive class, and they are sometimes also denoted by the symbols "-" and "+." Given x^{(i)}, the corresponding y^{(i)} is also called the label for the training example.

2.1 Logistic regression

We could approach the classification problem ignoring the fact that y is discrete-valued, and use our old linear regression algorithm to try to predict y given x. However, it is easy to construct examples where this method performs very poorly. Intuitively, it also doesn't make sense for h_θ(x) to take values larger than 1 or smaller than 0 when we know that y ∈ {0, 1}.

To fix this, let's change the form for our hypotheses h_θ(x). We will choose

$$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}},$$

where

$$g(z) = \frac{1}{1 + e^{-z}}$$
is called the logistic function or the sigmoid function. Here is a plot showing g(z):

[Figure: plot of the sigmoid g(z) against z, rising from 0 toward 1 as z goes from -5 to 5.]

Notice that g(z) tends towards 1 as z → ∞, and g(z) tends towards 0 as z → -∞. Moreover, g(z), and hence also h(x), is always bounded between 0 and 1. As before, we are keeping the convention of letting x_0 = 1, so that θ^T x = θ_0 + \sum_{j=1}^{d} θ_j x_j.

For now, let's take the choice of g as given. Other functions that smoothly increase from 0 to 1 can also be used, but for a couple of reasons that we'll see later (when we talk about GLMs, and when we talk about generative learning algorithms), the choice of the logistic function is a fairly natural one. Before moving on, here's a useful property of the derivative of the sigmoid function, which we write as g':

$$g'(z) = \frac{d}{dz}\, \frac{1}{1 + e^{-z}}$$
$$= \frac{1}{(1 + e^{-z})^2}\, (e^{-z})$$
$$= \frac{1}{1 + e^{-z}} \cdot \left( 1 - \frac{1}{1 + e^{-z}} \right)$$
$$= g(z)(1 - g(z)).$$

So, given the logistic regression model, how do we fit θ for it? Following how we saw least squares regression could be derived as the maximum likelihood estimator under a set of assumptions, let's endow our classification model with a set of probabilistic assumptions, and then fit the parameters via maximum likelihood.
Let us assume that

$$P(y = 1 \mid x; \theta) = h_\theta(x)$$
$$P(y = 0 \mid x; \theta) = 1 - h_\theta(x)$$

Note that this can be written more compactly as

$$p(y \mid x; \theta) = (h_\theta(x))^y (1 - h_\theta(x))^{1-y}$$

Assuming that the n training examples were generated independently, we can then write down the likelihood of the parameters as

$$L(\theta) = p(\vec{y} \mid X; \theta) = \prod_{i=1}^{n} p(y^{(i)} \mid x^{(i)}; \theta) = \prod_{i=1}^{n} \left( h_\theta(x^{(i)}) \right)^{y^{(i)}} \left( 1 - h_\theta(x^{(i)}) \right)^{1 - y^{(i)}}$$

As before, it will be easier to maximize the log likelihood:

$$\ell(\theta) = \log L(\theta) = \sum_{i=1}^{n} y^{(i)} \log h(x^{(i)}) + (1 - y^{(i)}) \log(1 - h(x^{(i)}))$$

How do we maximize the likelihood? Similar to our derivation in the case of linear regression, we can use gradient ascent. Written in vectorial notation, our updates will therefore be given by θ := θ + α ∇_θ ℓ(θ). (Note the positive rather than negative sign in the update formula, since we're maximizing, rather than minimizing, a function now.) Let's start by working with just one training example (x, y), and take derivatives to derive the stochastic gradient ascent rule:

$$\frac{\partial}{\partial \theta_j} \ell(\theta) = \left( y\, \frac{1}{g(\theta^T x)} - (1 - y)\, \frac{1}{1 - g(\theta^T x)} \right) \frac{\partial}{\partial \theta_j} g(\theta^T x)$$
$$= \left( y\, \frac{1}{g(\theta^T x)} - (1 - y)\, \frac{1}{1 - g(\theta^T x)} \right) g(\theta^T x)(1 - g(\theta^T x))\, \frac{\partial}{\partial \theta_j} \theta^T x$$
$$= \left( y\,(1 - g(\theta^T x)) - (1 - y)\, g(\theta^T x) \right) x_j$$
$$= (y - h_\theta(x))\, x_j$$
Above, we used the fact that g'(z) = g(z)(1 - g(z)). This therefore gives us the stochastic gradient ascent rule

$$\theta_j := \theta_j + \alpha\, (y^{(i)} - h_\theta(x^{(i)}))\, x_j^{(i)}$$

If we compare this to the LMS update rule, we see that it looks identical; but this is not the same algorithm, because h_θ(x^{(i)}) is now defined as a non-linear function of θ^T x^{(i)}. Nonetheless, it's a little surprising that we end up with the same update rule for a rather different algorithm and learning problem. Is this coincidence, or is there a deeper reason behind this? We'll answer this when we get to GLM models.

2.2 Digression: the perceptron learning algorithm

We now digress to talk briefly about an algorithm that's of some historical interest, and that we will also return to later when we talk about learning theory. Consider modifying the logistic regression method to "force" it to output values that are either 0 or 1 exactly. To do so, it seems natural to change the definition of g to be the threshold function:

$$g(z) = \begin{cases} 1 & \text{if } z \ge 0 \\ 0 & \text{if } z < 0 \end{cases}$$

If we then let h_θ(x) = g(θ^T x) as before but using this modified definition of g, and if we use the update rule

$$\theta_j := \theta_j + \alpha\, (y^{(i)} - h_\theta(x^{(i)}))\, x_j^{(i)},$$

then we have the perceptron learning algorithm. In the 1960s, this "perceptron" was argued to be a rough model for how individual neurons in the brain work. Given how simple the algorithm is, it will also provide a starting point for our analysis when we talk about learning theory later in this class. Note however that even though the perceptron may be cosmetically similar to the other algorithms we talked about, it is actually a very different type of algorithm than logistic regression and least squares linear regression; in particular, it is difficult to endow the perceptron's predictions with meaningful probabilistic interpretations, or derive the perceptron as a maximum likelihood estimation algorithm.
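The two updates above differ only in the choice of g, which the following minimal sketch makes explicit (illustrative code, not from the notes; the learning rate and number of passes are assumptions).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def threshold(z):
    return 1.0 if z >= 0 else 0.0

def stochastic_ascent(X, y, g, alpha=0.1, passes=10):
    """theta_j := theta_j + alpha * (y^(i) - g(theta^T x^(i))) * x_j^(i), one example at a time."""
    theta = np.zeros(X.shape[1])
    for _ in range(passes):
        for x_i, y_i in zip(X, y):
            theta += alpha * (y_i - g(x_i @ theta)) * x_i
    return theta

# With g = sigmoid this is logistic regression trained by stochastic gradient ascent;
# with g = threshold it is the perceptron learning algorithm.
```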
2.3 Another algorithm for maximizing ℓ(θ)

Returning to logistic regression with g(z) being the sigmoid function, let's now talk about a different algorithm for maximizing ℓ(θ).

To get us started, let's consider Newton's method for finding a zero of a function. Specifically, suppose we have some function f : R → R, and we wish to find a value of θ so that f(θ) = 0. Here, θ ∈ R is a real number. Newton's method performs the following update:

$$\theta := \theta - \frac{f(\theta)}{f'(\theta)}.$$

This method has a natural interpretation in which we can think of it as approximating the function f via a linear function that is tangent to f at the current guess θ, solving for where that linear function equals zero, and letting the next guess for θ be where that linear function is zero. Here's a picture of Newton's method in action:

[Figure: three panels plotting f(x) against x; left: f together with the line y = 0; middle: the tangent line to f at θ = 4.5 and where it crosses zero; right: the result of one more iteration.]

In the leftmost figure, we see the function f plotted along with the line y = 0. We're trying to find θ so that f(θ) = 0; the value of θ that achieves this is about 1.3. Suppose we initialized the algorithm with θ = 4.5. Newton's method then fits a straight line tangent to f at θ = 4.5, and solves for where that line evaluates to 0. (Middle figure.) This gives us the next guess for θ, which is about 2.8. The rightmost figure shows the result of running one more iteration, which updates θ to about 1.8. After a few more iterations, we rapidly approach θ = 1.3.

Newton's method gives a way of getting to f(θ) = 0. What if we want to use it to maximize some function ℓ? The maxima of ℓ correspond to points where its first derivative ℓ'(θ) is zero. So, by letting f(θ) = ℓ'(θ), we can use the same algorithm to maximize ℓ, and we obtain the update rule:

$$\theta := \theta - \frac{\ell'(\theta)}{\ell''(\theta)}.$$

(Something to think about: How would this change if we wanted to use Newton's method to minimize rather than maximize a function?)
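A minimal sketch of the scalar update θ := θ - f(θ)/f'(θ) (illustrative; the example function and starting point are assumptions chosen only to show the iteration):

```python
def newton_root(f, f_prime, theta=4.5, iters=10):
    """Newton's method for finding a zero of f: theta := theta - f(theta) / f'(theta)."""
    for _ in range(iters):
        theta = theta - f(theta) / f_prime(theta)
    return theta

# Example: find a zero of f(theta) = theta^3 - 2, starting from theta = 4.5.
root = newton_root(lambda t: t**3 - 2, lambda t: 3 * t**2)
print(root)   # converges to 2**(1/3), about 1.26
```

To maximize a function ℓ instead, pass ℓ' as f and ℓ'' as f_prime, matching the update θ := θ - ℓ'(θ)/ℓ''(θ) above.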
Lastly, in our logistic regression setting, θ is vector-valued, so we need to generalize Newton's method to this setting. The generalization of Newton's method to this multidimensional setting (also called the Newton-Raphson method) is given by

$$\theta := \theta - H^{-1} \nabla_\theta \ell(\theta).$$

Here, ∇_θ ℓ(θ) is, as usual, the vector of partial derivatives of ℓ(θ) with respect to the θ_i's; and H is a d-by-d matrix (actually, (d+1)-by-(d+1), assuming that we include the intercept term) called the Hessian, whose entries are given by

$$H_{ij} = \frac{\partial^2 \ell(\theta)}{\partial \theta_i\, \partial \theta_j}.$$

Newton's method typically enjoys faster convergence than (batch) gradient descent, and requires many fewer iterations to get very close to the minimum. One iteration of Newton's can, however, be more expensive than one iteration of gradient descent, since it requires finding and inverting a d-by-d Hessian; but so long as d is not too large, it is usually much faster overall. When Newton's method is applied to maximize the logistic regression log likelihood function ℓ(θ), the resulting method is also called Fisher scoring.
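As an illustration of the multidimensional update (not from the notes), here is a sketch of Newton's method applied to the logistic regression log likelihood. The gradient ∇_θ ℓ(θ) = X^T(y - h) follows from the per-example derivative derived earlier; the Hessian H = -X^T S X with S = diag(h(1 - h)) is a standard result assumed here, obtained using g'(z) = g(z)(1 - g(z)), and is not derived in the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_newton(X, y, iters=10):
    """Newton-Raphson updates theta := theta - H^{-1} grad_l(theta) for logistic regression."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        h = sigmoid(X @ theta)                     # h_theta(x^(i)) for every example
        grad = X.T @ (y - h)                       # gradient of the log likelihood
        H = -X.T @ (X * (h * (1 - h))[:, None])    # Hessian of the log likelihood (assumed form)
        theta = theta - np.linalg.solve(H, grad)   # theta := theta - H^{-1} grad
    return theta
```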
Chapter 3
Generalized linear models

So far, we've seen a regression example, and a classification example. In the regression example, we had y | x; θ ∼ N(μ, σ²), and in the classification one, y | x; θ ∼ Bernoulli(φ), for some appropriate definitions of μ and φ as functions of x and θ. In this section, we will show that both of these methods are special cases of a broader family of models, called Generalized Linear Models (GLMs).^1 We will also show how other models in the GLM family can be derived and applied to other classification and regression problems.

3.1 The exponential family

To work our way up to GLMs, we will begin by defining exponential family distributions. We say that a class of distributions is in the exponential family if it can be written in the form

$$p(y; \eta) = b(y) \exp(\eta^T T(y) - a(\eta)) \qquad (3.1)$$

Here, η is called the natural parameter (also called the canonical parameter) of the distribution; T(y) is the sufficient statistic (for the distributions we consider, it will often be the case that T(y) = y); and a(η) is the log partition function. The quantity e^{-a(η)} essentially plays the role of a normalization constant, that makes sure the distribution p(y; η) sums/integrates over y to 1.

A fixed choice of T, a and b defines a family (or set) of distributions that is parameterized by η; as we vary η, we then get different distributions within this family.

^1 The presentation of the material in this section takes inspiration from Michael I. Jordan, Learning in graphical models (unpublished book draft), and also McCullagh and Nelder, Generalized Linear Models (2nd ed.).
We now show that the Bernoulli and the Gaussian distributions are examples of exponential family distributions. The Bernoulli distribution with mean φ, written Bernoulli(φ), specifies a distribution over y ∈ {0, 1}, so that p(y = 1; φ) = φ; p(y = 0; φ) = 1 - φ. As we vary φ, we obtain Bernoulli distributions with different means. We now show that this class of Bernoulli distributions, ones obtained by varying φ, is in the exponential family; i.e., that there is a choice of T, a and b so that Equation (3.1) becomes exactly the class of Bernoulli distributions.

We write the Bernoulli distribution as:

$$p(y; \phi) = \phi^y (1 - \phi)^{1-y}$$
$$= \exp(y \log \phi + (1 - y) \log(1 - \phi))$$
$$= \exp\left( \left( \log \frac{\phi}{1 - \phi} \right) y + \log(1 - \phi) \right).$$

Thus, the natural parameter is given by η = log(φ/(1 - φ)). Interestingly, if we invert this definition for η by solving for φ in terms of η, we obtain φ = 1/(1 + e^{-η}). This is the familiar sigmoid function! This will come up again when we derive logistic regression as a GLM. To complete the formulation of the Bernoulli distribution as an exponential family distribution, we also have

$$T(y) = y$$
$$a(\eta) = -\log(1 - \phi) = \log(1 + e^{\eta})$$
$$b(y) = 1$$

This shows that the Bernoulli distribution can be written in the form of Equation (3.1), using an appropriate choice of T, a and b.
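As a small numerical check (illustrative, not from the notes), the exponential-family form b(y) exp(η T(y) - a(η)) with the Bernoulli choices above reproduces the Bernoulli probability mass function:

```python
import numpy as np

phi = 0.3
eta = np.log(phi / (1 - phi))        # natural parameter for Bernoulli(phi)
a = np.log(1 + np.exp(eta))          # log partition function a(eta)

for y in (0, 1):
    p_bernoulli = phi**y * (1 - phi)**(1 - y)
    p_exp_family = 1 * np.exp(eta * y - a)   # b(y) = 1, T(y) = y
    assert np.isclose(p_bernoulli, p_exp_family)
```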
Let's now move on to consider the Gaussian distribution. Recall that, when deriving linear regression, the value of σ² had no effect on our final choice of θ and h_θ(x). Thus, we can choose an arbitrary value for σ² without changing anything. To simplify the derivation below, let's set σ² = 1.^2 We then have:

$$p(y; \mu) = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{1}{2}(y - \mu)^2 \right)$$
$$= \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{1}{2} y^2 \right) \cdot \exp\left( \mu y - \frac{1}{2} \mu^2 \right)$$

Thus, we see that the Gaussian is in the exponential family, with

$$\eta = \mu$$
$$T(y) = y$$
$$a(\eta) = \mu^2/2 = \eta^2/2$$
$$b(y) = (1/\sqrt{2\pi}) \exp(-y^2/2).$$

^2 If we leave σ² as a variable, the Gaussian distribution can also be shown to be in the exponential family, where η ∈ R² is now a 2-dimension vector that depends on both μ and σ. For the purposes of GLMs, however, the σ² parameter can also be treated by considering a more general definition of the exponential family: p(y; η, τ) = b(y, τ) exp((η^T T(y) - a(η))/c(τ)). Here, τ is called the dispersion parameter, and for the Gaussian, c(τ) = σ²; but given our simplification above, we won't need the more general definition for the examples we will consider here.

There're many other distributions that are members of the exponential family: The multinomial (which we'll see later), the Poisson (for modelling count-data; also see the problem set); the gamma and the exponential (for modelling continuous, non-negative random variables, such as time-intervals); the beta and the Dirichlet (for distributions over probabilities); and many more. In the next section, we will describe a general "recipe" for constructing models in which y (given x and θ) comes from any of these distributions.

3.2 Constructing GLMs

Suppose you would like to build a model to estimate the number y of customers arriving in your store (or number of page-views on your website) in any given hour, based on certain features x such as store promotions, recent advertising, weather, day-of-week, etc. We know that the Poisson distribution usually gives a good model for numbers of visitors. Knowing this, how can we come up with a model for our problem? Fortunately, the Poisson is an exponential family distribution, so we can apply a Generalized Linear Model (GLM). In this section, we will describe a method for constructing GLM models for problems such as these.

More generally, consider a classification or regression problem where we would like to predict the value of some random variable y as a function of x. To derive a GLM for this problem, we will make the following three assumptions about the conditional distribution of y given x and about our model:
1. y | x; θ ∼ ExponentialFamily(η). I.e., given x and θ, the distribution of y follows some exponential family distribution, with parameter η.

2. Given x, our goal is to predict the expected value of T(y) given x. In most of our examples, we will have T(y) = y, so this means we would like the prediction h(x) output by our learned hypothesis h to satisfy h(x) = E[y | x]. (Note that this assumption is satisfied in the choices for h_θ(x) for both logistic regression and linear regression. For instance, in logistic regression, we had h_θ(x) = p(y = 1 | x; θ) = 0 · p(y = 0 | x; θ) + 1 · p(y = 1 | x; θ) = E[y | x; θ].)

3. The natural parameter η and the inputs x are related linearly: η = θ^T x. (Or, if η is vector-valued, then η_i = θ_i^T x.)

The third of these assumptions might seem the least well justified of the above, and it might be better thought of as a "design choice" in our recipe for designing GLMs, rather than as an assumption per se. These three assumptions/design choices will allow us to derive a very elegant class of learning algorithms, namely GLMs, that have many desirable properties such as ease of learning. Furthermore, the resulting models are often very effective for modelling different types of distributions over y; for example, we will shortly show that both logistic regression and ordinary least squares can both be derived as GLMs.

3.2.1 Ordinary least squares

To show that ordinary least squares is a special case of the GLM family of models, consider the setting where the target variable y (also called the response variable in GLM terminology) is continuous, and we model the conditional distribution of y given x as a Gaussian N(μ, σ²). (Here, μ may depend on x.) So, we let the ExponentialFamily(η) distribution above be the Gaussian distribution. As we saw previously, in the formulation of the Gaussian as an exponential family distribution, we had μ = η. So, we have

$$h_\theta(x) = \mathrm{E}[y \mid x; \theta] = \mu = \eta = \theta^T x.$$

The first equality follows from Assumption 2, above; the second equality follows from the fact that y | x; θ ∼ N(μ, σ²), and so its expected value is given
by μ; the third equality follows from Assumption 1 (and our earlier derivation showing that μ = η in the formulation of the Gaussian as an exponential family distribution); and the last equality follows from Assumption 3.

3.2.2 Logistic regression

We now consider logistic regression. Here we are interested in binary classification, so y ∈ {0, 1}. Given that y is binary-valued, it therefore seems natural to choose the Bernoulli family of distributions to model the conditional distribution of y given x. In our formulation of the Bernoulli distribution as an exponential family distribution, we had φ = 1/(1 + e^{-η}). Furthermore, note that if y | x; θ ∼ Bernoulli(φ), then E[y | x; θ] = φ. So, following a similar derivation as the one for ordinary least squares, we get:

$$h_\theta(x) = \mathrm{E}[y \mid x; \theta] = \phi = 1/(1 + e^{-\eta}) = 1/(1 + e^{-\theta^T x})$$

So, this gives us hypothesis functions of the form h_θ(x) = 1/(1 + e^{-θ^T x}). If you were previously wondering how we came up with the form of the logistic function 1/(1 + e^{-z}), this gives one answer: Once we assume that y conditioned on x is Bernoulli, it arises as a consequence of the definition of GLMs and exponential family distributions.

To introduce a little more terminology, the function g giving the distribution's mean as a function of the natural parameter (g(η) = E[T(y); η]) is called the canonical response function. Its inverse, g^{-1}, is called the canonical link function. Thus, the canonical response function for the Gaussian family is just the identity function; and the canonical response function for the Bernoulli is the logistic function.^3

3.2.3 Softmax regression

Let's look at one more example of a GLM. Consider a classification problem in which the response variable y can take on any one of k values, so y ∈ {1, 2, ..., k}. For example, rather than classifying email into the two classes

^3 Many texts use g to denote the link function, and g^{-1} to denote the response function; but the notation we're using here, inherited from the early machine learning literature, will be more consistent with the notation used in the rest of the class.
spam or not-spam—which would have been a binary classification problem—we might want to classify it into three classes, such as spam, personal mail, and work-related mail. The response variable is still discrete, but can now take on more than two values. We will thus model it as distributed according to a multinomial distribution.

Let's derive a GLM for modelling this type of multinomial data. To do so, we will begin by expressing the multinomial as an exponential family distribution.

To parameterize a multinomial over k possible outcomes, one could use k parameters φ_1, ..., φ_k specifying the probability of each of the outcomes. However, these parameters would be redundant, or more formally, they would not be independent (since knowing any k - 1 of the φ_i's uniquely determines the last one, as they must satisfy \sum_{i=1}^{k} φ_i = 1). So, we will instead parameterize the multinomial with only k - 1 parameters, φ_1, ..., φ_{k-1}, where φ_i = p(y = i; φ), and p(y = k; φ) = 1 - \sum_{i=1}^{k-1} φ_i. For notational convenience, we will also let φ_k = 1 - \sum_{i=1}^{k-1} φ_i, but we should keep in mind that this is not a parameter, and that it is fully specified by φ_1, ..., φ_{k-1}.

To express the multinomial as an exponential family distribution, we will define T(y) ∈ R^{k-1} as follows:

$$T(1) = \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad T(2) = \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad T(3) = \begin{bmatrix} 0 \\ 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}, \quad \cdots, \quad T(k-1) = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}, \quad T(k) = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}.$$

Unlike our previous examples, here we do not have T(y) = y; also, T(y) is now a (k-1)-dimensional vector, rather than a real number. We will write (T(y))_i to denote the i-th element of the vector T(y).

We introduce one more very useful piece of notation. An indicator function 1{·} takes on a value of 1 if its argument is true, and 0 otherwise (1{True} = 1, 1{False} = 0). For example, 1{2 = 3} = 0, and 1{3 = 5 - 2} = 1. So, we can also write the relationship between T(y) and y as (T(y))_i = 1{y = i}. (Before you continue reading, please make sure you understand why this is true!) Further, we have that E[(T(y))_i] = P(y = i) = φ_i.

We are now ready to show that the multinomial is a member of the
exponential family. We have:

$$p(y; \phi) = \phi_1^{1\{y=1\}} \phi_2^{1\{y=2\}} \cdots \phi_k^{1\{y=k\}}$$
$$= \phi_1^{1\{y=1\}} \phi_2^{1\{y=2\}} \cdots \phi_k^{1 - \sum_{i=1}^{k-1} 1\{y=i\}}$$
$$= \phi_1^{(T(y))_1} \phi_2^{(T(y))_2} \cdots \phi_k^{1 - \sum_{i=1}^{k-1} (T(y))_i}$$
$$= \exp\left( (T(y))_1 \log(\phi_1) + (T(y))_2 \log(\phi_2) + \cdots + \left( 1 - \sum_{i=1}^{k-1} (T(y))_i \right) \log(\phi_k) \right)$$
$$= \exp\left( (T(y))_1 \log(\phi_1/\phi_k) + (T(y))_2 \log(\phi_2/\phi_k) + \cdots + (T(y))_{k-1} \log(\phi_{k-1}/\phi_k) + \log(\phi_k) \right)$$
$$= b(y) \exp(\eta^T T(y) - a(\eta))$$

where

$$\eta = \begin{bmatrix} \log(\phi_1/\phi_k) \\ \log(\phi_2/\phi_k) \\ \vdots \\ \log(\phi_{k-1}/\phi_k) \end{bmatrix}, \quad a(\eta) = -\log(\phi_k), \quad b(y) = 1.$$

This completes our formulation of the multinomial as an exponential family distribution.

The link function is given (for i = 1, ..., k) by

$$\eta_i = \log \frac{\phi_i}{\phi_k}.$$

For convenience, we have also defined η_k = log(φ_k/φ_k) = 0. To invert the link function and derive the response function, we therefore have that

$$e^{\eta_i} = \frac{\phi_i}{\phi_k}$$
$$\phi_k\, e^{\eta_i} = \phi_i \qquad (3.2)$$
$$\phi_k \sum_{i=1}^{k} e^{\eta_i} = \sum_{i=1}^{k} \phi_i = 1$$

This implies that φ_k = 1/\sum_{i=1}^{k} e^{\eta_i}, which can be substituted back into Equation (3.2) to give the response function

$$\phi_i = \frac{e^{\eta_i}}{\sum_{j=1}^{k} e^{\eta_j}}$$
This function mapping from the η's to the φ's is called the softmax function.

To complete our model, we use Assumption 3, given earlier, that the η_i's are linearly related to the x's. So, we have η_i = θ_i^T x (for i = 1, ..., k-1), where θ_1, ..., θ_{k-1} ∈ R^{d+1} are the parameters of our model. For notational convenience, we can also define θ_k = 0, so that η_k = θ_k^T x = 0, as given previously. Hence, our model assumes that the conditional distribution of y given x is given by

$$p(y = i \mid x; \theta) = \phi_i = \frac{e^{\eta_i}}{\sum_{j=1}^{k} e^{\eta_j}} = \frac{e^{\theta_i^T x}}{\sum_{j=1}^{k} e^{\theta_j^T x}} \qquad (3.3)$$

This model, which applies to classification problems where y ∈ {1, ..., k}, is called softmax regression. It is a generalization of logistic regression. Our hypothesis will output

$$h_\theta(x) = \mathrm{E}[T(y) \mid x; \theta] = \mathrm{E}\left[ \left. \begin{bmatrix} 1\{y=1\} \\ 1\{y=2\} \\ \vdots \\ 1\{y=k-1\} \end{bmatrix} \right| x; \theta \right] = \begin{bmatrix} \phi_1 \\ \phi_2 \\ \vdots \\ \phi_{k-1} \end{bmatrix} = \begin{bmatrix} \frac{\exp(\theta_1^T x)}{\sum_{j=1}^{k} \exp(\theta_j^T x)} \\ \frac{\exp(\theta_2^T x)}{\sum_{j=1}^{k} \exp(\theta_j^T x)} \\ \vdots \\ \frac{\exp(\theta_{k-1}^T x)}{\sum_{j=1}^{k} \exp(\theta_j^T x)} \end{bmatrix}.$$

In other words, our hypothesis will output the estimated probability p(y = i | x; θ) for every value of i = 1, ..., k. (Even though h_θ(x) as defined above is only (k-1)-dimensional, clearly p(y = k | x; θ) can be obtained as 1 - \sum_{i=1}^{k-1} φ_i.)
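A minimal NumPy sketch of the softmax hypothesis (3.3) (not from the notes; the parameter matrix and input below are assumed example values, and subtracting the maximum before exponentiating is a common numerical-stability trick added here):

```python
import numpy as np

def softmax_hypothesis(Theta, x):
    """Return phi_i = exp(theta_i^T x) / sum_j exp(theta_j^T x) for all k classes.

    Theta: (k, d+1) matrix whose rows are theta_1, ..., theta_k (theta_k fixed to 0 by convention);
    x: (d+1,) input vector with intercept term x_0 = 1.
    """
    eta = Theta @ x                 # eta_i = theta_i^T x
    eta = eta - np.max(eta)         # numerical stability; does not change the ratios
    exp_eta = np.exp(eta)
    return exp_eta / exp_eta.sum()  # the softmax of the eta's

# Example with k = 3 classes and d = 2 features (plus intercept):
Theta = np.array([[0.5, 1.0, -1.0],
                  [0.1, -0.3, 0.2],
                  [0.0, 0.0, 0.0]])
x = np.array([1.0, 2.0, 0.5])
print(softmax_hypothesis(Theta, x))   # class probabilities summing to 1
```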
33 Lastly, let’s discuss parameter fitting. Similar to our original derivation of ordinary least squares and logistic regression, if we have a training set of n examples { ( x ( i ) , y ( i ) ); i = 1 , . . . , n } and would like to learn the parameters θ i of this model, we would begin by writing down the log-likelihood ( θ ) = n X i =1 log p ( y ( i ) | x ( i ) ; θ ) = n X i =1 log k Y l =1 e θ T l x ( i ) k j =1 e θ T j x ( i ) ! 1 { y ( i ) = l } To obtain the second line above, we used the definition for p ( y | x ; θ ) given in Equation (3.3). We can now obtain the maximum likelihood estimate of the parameters by maximizing ( θ ) in terms of θ , using a method such as gradient ascent or Newton’s method.
Chapter 4
Generative learning algorithms

So far, we've mainly been talking about learning algorithms that model p(y | x; θ), the conditional distribution of y given x. For instance, logistic regression modeled p(y | x; θ) as h_θ(x) = g(θ^T x) where g is the sigmoid function. In these notes, we'll talk about a different type of learning algorithm.

Consider a classification problem in which we want to learn to distinguish between elephants (y = 1) and dogs (y = 0), based on some features of an animal. Given a training set, an algorithm like logistic regression or the perceptron algorithm (basically) tries to find a straight line—that is, a decision boundary—that separates the elephants and dogs. Then, to classify a new animal as either an elephant or a dog, it checks on which side of the decision boundary it falls, and makes its prediction accordingly.

Here's a different approach. First, looking at elephants, we can build a model of what elephants look like. Then, looking at dogs, we can build a separate model of what dogs look like. Finally, to classify a new animal, we can match the new animal against the elephant model, and match it against the dog model, to see whether the new animal looks more like the elephants or more like the dogs we had seen in the training set.

Algorithms that try to learn p(y | x) directly (such as logistic regression), or algorithms that try to learn mappings directly from the space of inputs X to the labels {0, 1} (such as the perceptron algorithm), are called discriminative learning algorithms. Here, we'll talk about algorithms that instead try to model p(x | y) (and p(y)). These algorithms are called generative learning algorithms. For instance, if y indicates whether an example is a dog (0) or an elephant (1), then p(x | y = 0) models the distribution of dogs' features, and p(x | y = 1) models the distribution of elephants' features.

After modeling p(y) (called the class priors) and p(x | y), our algorithm
35 can then use Bayes rule to derive the posterior distribution on y given x : p ( y | x ) = p ( x | y ) p ( y ) p ( x ) . Here, the denominator is given by p ( x ) = p ( x | y = 1) p ( y = 1) + p ( x | y = 0) p ( y = 0) (you should be able to verify that this is true from the standard properties of probabilities), and thus can also be expressed in terms of the quantities p ( x | y ) and p ( y ) that we’ve learned. Actually, if were calculating p ( y | x ) in order to make a prediction, then we don’t actually need to calculate the denominator, since arg max y p ( y | x ) = arg max y p ( x | y ) p ( y ) p ( x ) = arg max y p ( x | y ) p ( y ) . 4.1 Gaussian discriminant analysis The first generative learning algorithm that we’ll look at is Gaussian discrim- inant analysis (GDA). In this model, we’ll assume that p ( x | y ) is distributed according to a multivariate normal distribution. Let’s talk briefly about the properties of multivariate normal distributions before moving on to the GDA model itself. 4.1.1 The multivariate normal distribution The multivariate normal distribution in d -dimensions, also called the multi- variate Gaussian distribution, is parameterized by a mean vector μ R d and a covariance matrix Σ R d × d , where Σ 0 is symmetric and positive semi-definite. Also written “ N ( μ, Σ)”, its density is given by: p ( x ; μ, Σ) = 1 (2 π ) d/ 2 | Σ | 1 / 2 exp - 1 2 ( x - μ ) T Σ - 1 ( x - μ ) . In the equation above, “ | Σ | ” denotes the determinant of the matrix Σ. For a random variable X distributed N ( μ, Σ), the mean is (unsurpris- ingly) given by μ : E[ X ] = Z x x p ( x ; μ, Σ) dx = μ The covariance of a vector-valued random variable Z is defined as Cov( Z ) = E[( Z - E[ Z ])( Z - E[ Z ]) T ]. This generalizes the notion of the variance of a
real-valued random variable. The covariance can also be defined as Cov(Z) = E[ZZ^T] − (E[Z])(E[Z])^T. (You should be able to prove to yourself that these two definitions are equivalent.) If X ∼ N(μ, Σ), then Cov(X) = Σ. Here are some examples of what the density of a Gaussian distribution looks like:

[Figure: three surface plots of two-dimensional Gaussian densities, each over the range −3 to 3 in both coordinates.]

The left-most figure shows a Gaussian with mean zero (that is, the 2x1 zero-vector) and covariance matrix Σ = I (the 2x2 identity matrix). A Gaussian with zero mean and identity covariance is also called the standard normal distribution. The middle figure shows the density of a Gaussian with zero mean and Σ = 0.6I; and the rightmost figure shows one with Σ = 2I. We see that as Σ becomes larger, the Gaussian becomes more "spread-out," and as it becomes smaller, the distribution becomes more "compressed." Let's look at some more examples.

[Figure: three more surface plots of zero-mean Gaussian densities.]

The figures above show Gaussians with mean 0, and with covariance matrices respectively

\Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}; \quad
\Sigma = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}; \quad
\Sigma = \begin{bmatrix} 1 & 0.8 \\ 0.8 & 1 \end{bmatrix}.

The leftmost figure shows the familiar standard normal distribution, and we see that as we increase the off-diagonal entry in Σ, the density becomes more
"compressed" towards the 45° line (given by x_1 = x_2). We can see this more clearly when we look at the contours of the same three densities:

[Figure: contour plots of the same three densities.]

Here's one last set of examples generated by varying Σ:

[Figure: three more contour plots.]

The plots above used, respectively,

\Sigma = \begin{bmatrix} 1 & -0.5 \\ -0.5 & 1 \end{bmatrix}; \quad
\Sigma = \begin{bmatrix} 1 & -0.8 \\ -0.8 & 1 \end{bmatrix}; \quad
\Sigma = \begin{bmatrix} 3 & 0.8 \\ 0.8 & 1 \end{bmatrix}.

From the leftmost and middle figures, we see that by decreasing the off-diagonal elements of the covariance matrix, the density now becomes "compressed" again, but in the opposite direction. Lastly, as we vary the parameters, more generally the contours will form ellipses (the rightmost figure showing an example). As our last set of examples, fixing Σ = I, by varying μ, we can also move the mean of the density around.

[Figure: three surface plots of Gaussian densities with identity covariance and different means.]

The figures above were generated using Σ = I, and respectively

\mu = \begin{bmatrix} 1 \\ 0 \end{bmatrix}; \quad
\mu = \begin{bmatrix} -0.5 \\ 0 \end{bmatrix}; \quad
\mu = \begin{bmatrix} -1 \\ -1.5 \end{bmatrix}.
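As a small computational companion to the density formula above, here is a NumPy sketch (the function name and test values are illustrative, not from the notes) that evaluates p(x; μ, Σ). It assumes Σ is strictly positive definite, so the determinant and the linear solve are well defined; the semi-definite case would need a pseudo-inverse and is not handled here.

```python
import numpy as np

def gaussian_density(x, mu, Sigma):
    """Density of N(mu, Sigma) at x, following the formula above.

    x, mu: (d,) vectors; Sigma: (d, d) symmetric positive-definite matrix.
    """
    d = mu.shape[0]
    diff = x - mu
    norm_const = 1.0 / ((2 * np.pi) ** (d / 2) * np.linalg.det(Sigma) ** 0.5)
    return norm_const * np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff))

# Standard normal in 2D evaluated at the origin: 1 / (2*pi), roughly 0.159.
print(gaussian_density(np.zeros(2), np.zeros(2), np.eye(2)))
```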
4.1.2 The Gaussian discriminant analysis model

When we have a classification problem in which the input features x are continuous-valued random variables, we can then use the Gaussian Discriminant Analysis (GDA) model, which models p(x|y) using a multivariate normal distribution. The model is:

y ∼ Bernoulli(φ)
x | y = 0 ∼ N(μ_0, Σ)
x | y = 1 ∼ N(μ_1, Σ)

Writing out the distributions, this is:

p(y) = φ^y (1 − φ)^{1−y}

p(x | y = 0) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu_0)^T \Sigma^{-1} (x - \mu_0) \right)

p(x | y = 1) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu_1)^T \Sigma^{-1} (x - \mu_1) \right)

Here, the parameters of our model are φ, Σ, μ_0 and μ_1. (Note that while there're two different mean vectors μ_0 and μ_1, this model is usually applied using only one covariance matrix Σ.) The log-likelihood of the data is given by

ℓ(φ, μ_0, μ_1, Σ) = \log \prod_{i=1}^{n} p(x^{(i)}, y^{(i)}; φ, μ_0, μ_1, Σ)
                  = \log \prod_{i=1}^{n} p(x^{(i)} \mid y^{(i)}; μ_0, μ_1, Σ) \, p(y^{(i)}; φ).
By maximizing ℓ with respect to the parameters, we find the maximum likelihood estimate of the parameters (see problem set 1) to be:

φ = \frac{1}{n} \sum_{i=1}^{n} 1\{y^{(i)} = 1\}

μ_0 = \frac{\sum_{i=1}^{n} 1\{y^{(i)} = 0\} \, x^{(i)}}{\sum_{i=1}^{n} 1\{y^{(i)} = 0\}}

μ_1 = \frac{\sum_{i=1}^{n} 1\{y^{(i)} = 1\} \, x^{(i)}}{\sum_{i=1}^{n} 1\{y^{(i)} = 1\}}

Σ = \frac{1}{n} \sum_{i=1}^{n} (x^{(i)} - \mu_{y^{(i)}})(x^{(i)} - \mu_{y^{(i)}})^T.

Pictorially, what the algorithm is doing can be seen as follows:

[Figure: a two-class training set with the contours of the two fitted Gaussians and the linear decision boundary.]

Shown in the figure are the training set, as well as the contours of the two Gaussian distributions that have been fit to the data in each of the two classes. Note that the two Gaussians have contours that are the same shape and orientation, since they share a covariance matrix Σ, but they have different means μ_0 and μ_1. Also shown in the figure is the straight line giving the decision boundary at which p(y = 1 | x) = 0.5. On one side of the boundary, we'll predict y = 1 to be the most likely outcome, and on the other side, we'll predict y = 0.
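For concreteness, here is a short NumPy sketch (the function and variable names are my own, not from the notes) that computes these closed-form estimates from a dataset; it assumes binary labels in {0, 1} and a single shared covariance matrix, exactly as in the model above:

```python
import numpy as np

def fit_gda(X, y):
    """Closed-form maximum likelihood estimates for GDA with a shared covariance.

    X: (n, d) real-valued features; y: (n,) labels in {0, 1}.
    Returns (phi, mu0, mu1, Sigma) matching the estimates above.
    """
    n = X.shape[0]
    phi = np.mean(y == 1)
    mu0 = X[y == 0].mean(axis=0)
    mu1 = X[y == 1].mean(axis=0)
    centered = X - np.where((y == 1)[:, None], mu1, mu0)   # x^(i) - mu_{y^(i)}
    Sigma = centered.T @ centered / n
    return phi, mu0, mu1, Sigma
```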
40 4.1.3 Discussion: GDA and logistic regression The GDA model has an interesting relationship to logistic regression. If we view the quantity p ( y = 1 | x ; φ, μ 0 , μ 1 , Σ) as a function of x , we’ll find that it can be expressed in the form p ( y = 1 | x ; φ, Σ , μ 0 , μ 1 ) = 1 1 + exp( - θ T x ) , where θ is some appropriate function of φ, Σ , μ 0 , μ 1 . 1 This is exactly the form that logistic regression—a discriminative algorithm—used to model p ( y = 1 | x ). When would we prefer one model over another? GDA and logistic regres- sion will, in general, give different decision boundaries when trained on the same dataset. Which is better? We just argued that if p ( x | y ) is multivariate gaussian (with shared Σ), then p ( y | x ) necessarily follows a logistic function. The converse, however, is not true; i.e., p ( y | x ) being a logistic function does not imply p ( x | y ) is multivariate gaussian. This shows that GDA makes stronger modeling as- sumptions about the data than does logistic regression. It turns out that when these modeling assumptions are correct, then GDA will find better fits to the data, and is a better model. Specifically, when p ( x | y ) is indeed gaus- sian (with shared Σ), then GDA is asymptotically efficient . Informally, this means that in the limit of very large training sets (large n ), there is no algorithm that is strictly better than GDA (in terms of, say, how accurately they estimate p ( y | x )). In particular, it can be shown that in this setting, GDA will be a better algorithm than logistic regression; and more generally, even for small training set sizes, we would generally expect GDA to better. In contrast, by making significantly weaker assumptions, logistic regres- sion is also more robust and less sensitive to incorrect modeling assumptions. There are many different sets of assumptions that would lead to p ( y | x ) taking the form of a logistic function. For example, if x | y = 0 Poisson( λ 0 ), and x | y = 1 Poisson( λ 1 ), then p ( y | x ) will be logistic. Logistic regression will also work well on Poisson data like this. But if we were to use GDA on such data—and fit Gaussian distributions to such non-Gaussian data—then the results will be less predictable, and GDA may (or may not) do well. To summarize: GDA makes stronger modeling assumptions, and is more data efficient (i.e., requires less training data to learn “well”) when the mod- eling assumptions are correct or at least approximately correct. Logistic 1 This uses the convention of redefining the x ( i ) ’s on the right-hand-side to be ( d + 1)- dimensional vectors by adding the extra coordinate x ( i ) 0 = 1; see problem set 1.
41 regression makes weaker assumptions, and is significantly more robust to deviations from modeling assumptions. Specifically, when the data is in- deed non-Gaussian, then in the limit of large datasets, logistic regression will almost always do better than GDA. For this reason, in practice logistic re- gression is used more often than GDA. (Some related considerations about discriminative vs. generative models also apply for the Naive Bayes algo- rithm that we discuss next, but the Naive Bayes algorithm is still considered a very good, and is certainly also a very popular, classification algorithm.) 4.2 Naive bayes In GDA, the feature vectors x were continuous, real-valued vectors. Let’s now talk about a different learning algorithm in which the x j ’s are discrete- valued. For our motivating example, consider building an email spam filter using machine learning. Here, we wish to classify messages according to whether they are unsolicited commercial (spam) email, or non-spam email. After learning to do this, we can then have our mail reader automatically filter out the spam messages and perhaps place them in a separate mail folder. Classifying emails is one example of a broader set of problems called text classification . Let’s say we have a training set (a set of emails labeled as spam or non- spam). We’ll begin our construction of our spam filter by specifying the features x j used to represent an email. We will represent an email via a feature vector whose length is equal to the number of words in the dictionary. Specifically, if an email contains the j -th word of the dictionary, then we will set x j = 1; otherwise, we let x j = 0. For instance, the vector x = 1 0 0 . . . 1 . . . 0 a aardvark aardwolf . . . buy . . . zygmurgy is used to represent an email that contains the words “a” and “buy,” but not
42 “aardvark,” “aardwolf” or “zygmurgy.” 2 The set of words encoded into the feature vector is called the vocabulary , so the dimension of x is equal to the size of the vocabulary. Having chosen our feature vector, we now want to build a generative model. So, we have to model p ( x | y ). But if we have, say, a vocabulary of 50000 words, then x ∈ { 0 , 1 } 50000 ( x is a 50000-dimensional vector of 0’s and 1’s), and if we were to model x explicitly with a multinomial distribution over the 2 50000 possible outcomes, then we’d end up with a (2 50000 - 1)-dimensional parameter vector. This is clearly too many parameters. To model p ( x | y ), we will therefore make a very strong assumption. We will assume that the x i ’s are conditionally independent given y . This assumption is called the Naive Bayes (NB) assumption , and the resulting algorithm is called the Naive Bayes classifier . For instance, if y = 1 means spam email; “buy” is word 2087 and “price” is word 39831; then we are assuming that if I tell you y = 1 (that a particular piece of email is spam), then knowledge of x 2087 (knowledge of whether “buy” appears in the message) will have no effect on your beliefs about the value of x 39831 (whether “price” appears). More formally, this can be written p ( x 2087 | y ) = p ( x 2087 | y, x 39831 ). (Note that this is not the same as saying that x 2087 and x 39831 are independent, which would have been written “ p ( x 2087 ) = p ( x 2087 | x 39831 )”; rather, we are only assuming that x 2087 and x 39831 are conditionally independent given y .) We now have: p ( x 1 , . . . , x 50000 | y ) = p ( x 1 | y ) p ( x 2 | y, x 1 ) p ( x 3 | y, x 1 , x 2 ) · · · p ( x 50000 | y, x 1 , . . . , x 49999 ) = p ( x 1 | y ) p ( x 2 | y ) p ( x 3 | y ) · · · p ( x 50000 | y ) = d Y j =1 p ( x j | y ) The first equality simply follows from the usual properties of probabilities, and the second equality used the NB assumption. We note that even though 2 Actually, rather than looking through an English dictionary for the list of all English words, in practice it is more common to look through our training set and encode in our feature vector only the words that occur at least once there. Apart from reducing the number of words modeled and hence reducing our computational and space requirements, this also has the advantage of allowing us to model/include as a feature many words that may appear in your email (such as “cs229”) but that you won’t find in a dictionary. Sometimes (as in the homework), we also exclude the very high frequency words (which will be words like “the,” “of,” “and”; these high frequency, “content free” words are called stop words ) since they occur in so many documents and do little to indicate whether an email is spam or non-spam.
43 the Naive Bayes assumption is an extremely strong assumptions, the resulting algorithm works well on many problems. Our model is parameterized by φ j | y =1 = p ( x j = 1 | y = 1), φ j | y =0 = p ( x j = 1 | y = 0), and φ y = p ( y = 1). As usual, given a training set { ( x ( i ) , y ( i ) ); i = 1 , . . . , n } , we can write down the joint likelihood of the data: L ( φ y , φ j | y =0 , φ j | y =1 ) = n Y i =1 p ( x ( i ) , y ( i ) ) . Maximizing this with respect to φ y , φ j | y =0 and φ j | y =1 gives the maximum likelihood estimates: φ j | y =1 = n i =1 1 { x ( i ) j = 1 y ( i ) = 1 } n i =1 1 { y ( i ) = 1 } φ j | y =0 = n i =1 1 { x ( i ) j = 1 y ( i ) = 0 } n i =1 1 { y ( i ) = 0 } φ y = n i =1 1 { y ( i ) = 1 } n In the equations above, the “ ” symbol means “and.” The parameters have a very natural interpretation. For instance, φ j | y =1 is just the fraction of the spam ( y = 1) emails in which word j does appear. Having fit all these parameters, to make a prediction on a new example with features x , we then simply calculate p ( y = 1 | x ) = p ( x | y = 1) p ( y = 1) p ( x ) = Q d j =1 p ( x j | y = 1) p ( y = 1) Q d j =1 p ( x j | y = 1) p ( y = 1) + Q d j =1 p ( x j | y = 0) p ( y = 0) , and pick whichever class has the higher posterior probability. Lastly, we note that while we have developed the Naive Bayes algorithm mainly for the case of problems where the features x j are binary-valued, the generalization to where x j can take values in { 1 , 2 , . . . , k j } is straightforward. Here, we would simply model p ( x j | y ) as multinomial rather than as Bernoulli. Indeed, even if some original input attribute (say, the living area of a house, as in our earlier example) were continuous valued, it is quite common to discretize it—that is, turn it into a small set of discrete values—and apply Naive Bayes. For instance, if we use some feature x j to represent living area, we might discretize the continuous values as follows:
Living area (sq. feet) | < 400 | 400-800 | 800-1200 | 1200-1600 | > 1600
x_j                    |   1   |    2    |    3     |     4     |    5

Thus, for a house with living area 890 square feet, we would set the value of the corresponding feature x_j to 3. We can then apply the Naive Bayes algorithm, and model p(x_j | y) with a multinomial distribution, as described previously. When the original, continuous-valued attributes are not well-modeled by a multivariate normal distribution, discretizing the features and using Naive Bayes (instead of GDA) will often result in a better classifier.

4.2.1 Laplace smoothing

The Naive Bayes algorithm as we have described it will work fairly well for many problems, but there is a simple change that makes it work much better, especially for text classification. Let's briefly discuss a problem with the algorithm in its current form, and then talk about how we can fix it.

Consider spam/email classification, and let's suppose that we are in the year of 20xx, and after completing CS229 and having done excellent work on the project, you decide around May 20xx to submit work you did to the NeurIPS conference for publication. (NeurIPS is one of the top machine learning conferences; the deadline for submitting a paper is typically in May-June.) Because you end up discussing the conference in your emails, you also start getting messages with the word "neurips" in it. But this is your first NeurIPS paper, and until this time, you had not previously seen any emails containing the word "neurips"; in particular "neurips" did not ever appear in your training set of spam/non-spam emails. Assuming that "neurips" was the 35000th word in the dictionary, your Naive Bayes spam filter therefore had picked its maximum likelihood estimates of the parameters φ_{35000|y} to be

φ_{35000|y=1} = \frac{\sum_{i=1}^{n} 1\{x^{(i)}_{35000} = 1 \wedge y^{(i)} = 1\}}{\sum_{i=1}^{n} 1\{y^{(i)} = 1\}} = 0

φ_{35000|y=0} = \frac{\sum_{i=1}^{n} 1\{x^{(i)}_{35000} = 1 \wedge y^{(i)} = 0\}}{\sum_{i=1}^{n} 1\{y^{(i)} = 0\}} = 0

I.e., because it has never seen "neurips" before in either spam or non-spam training examples, it thinks the probability of seeing it in either type of email is zero. Hence, when trying to decide if one of these messages containing
45 “neurips” is spam, it calculates the class posterior probabilities, and obtains p ( y = 1 | x ) = Q d j =1 p ( x j | y = 1) p ( y = 1) Q d j =1 p ( x j | y = 1) p ( y = 1) + Q d j =1 p ( x j | y = 0) p ( y = 0) = 0 0 . This is because each of the terms “ Q d j =1 p ( x j | y )” includes a term p ( x 35000 | y ) = 0 that is multiplied into it. Hence, our algorithm obtains 0 / 0, and doesn’t know how to make a prediction. Stating the problem more broadly, it is statistically a bad idea to esti- mate the probability of some event to be zero just because you haven’t seen it before in your finite training set. Take the problem of estimating the mean of a multinomial random variable z taking values in { 1 , . . . , k } . We can pa- rameterize our multinomial with φ j = p ( z = j ). Given a set of n independent observations { z (1) , . . . , z ( n ) } , the maximum likelihood estimates are given by φ j = n i =1 1 { z ( i ) = j } n . As we saw previously, if we were to use these maximum likelihood estimates, then some of the φ j ’s might end up as zero, which was a problem. To avoid this, we can use Laplace smoothing , which replaces the above estimate with φ j = 1 + n i =1 1 { z ( i ) = j } k + n . Here, we’ve added 1 to the numerator, and k to the denominator. Note that k j =1 φ j = 1 still holds (check this yourself!), which is a desirable property since the φ j ’s are estimates for probabilities that we know must sum to 1. Also, φ j 6 = 0 for all values of j , solving our problem of probabilities being estimated as zero. Under certain (arguably quite strong) conditions, it can be shown that the Laplace smoothing actually gives the optimal estimator of the φ j ’s. Returning to our Naive Bayes classifier, with Laplace smoothing, we therefore obtain the following estimates of the parameters: φ j | y =1 = 1 + n i =1 1 { x ( i ) j = 1 y ( i ) = 1 } 2 + n i =1 1 { y ( i ) = 1 } φ j | y =0 = 1 + n i =1 1 { x ( i ) j = 1 y ( i ) = 0 } 2 + n i =1 1 { y ( i ) = 0 }
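Putting the pieces together, here is a compact NumPy sketch (the names and the tiny dataset are my own, not from the notes) of the Laplace-smoothed Bernoulli Naive Bayes estimates above, together with the posterior p(y = 1 | x) computed from p(x | y) p(y) as in the prediction formula of the previous section:

```python
import numpy as np

def fit_naive_bayes(X, y):
    """Laplace-smoothed estimates for Bernoulli Naive Bayes.

    X: (n, d) binary matrix, X[i, j] = 1 if word j appears in email i.
    y: (n,) labels in {0, 1}.  Each x_j is Bernoulli, so k = 2 in the smoothing.
    """
    phi_j_y1 = (1 + X[y == 1].sum(axis=0)) / (2 + (y == 1).sum())
    phi_j_y0 = (1 + X[y == 0].sum(axis=0)) / (2 + (y == 0).sum())
    phi_y = np.mean(y == 1)            # typically left unsmoothed, as discussed just below
    return phi_y, phi_j_y0, phi_j_y1

def posterior(x, phi_y, phi_j_y0, phi_j_y1):
    """p(y = 1 | x) for a binary feature vector x, using the NB factorization."""
    p_x_y1 = np.prod(np.where(x == 1, phi_j_y1, 1 - phi_j_y1))
    p_x_y0 = np.prod(np.where(x == 1, phi_j_y0, 1 - phi_j_y0))
    num = p_x_y1 * phi_y
    return num / (num + p_x_y0 * (1 - phi_y))

# Tiny made-up example: 4 emails over a 3-word vocabulary.
X = np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]])
y = np.array([1, 1, 0, 0])
params = fit_naive_bayes(X, y)
print(posterior(np.array([1, 1, 0]), *params))   # 0.9 for this toy dataset
```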
46 (In practice, it usually doesn’t matter much whether we apply Laplace smooth- ing to φ y or not, since we will typically have a fair fraction each of spam and non-spam messages, so φ y will be a reasonable estimate of p ( y = 1) and will be quite far from 0 anyway.) 4.2.2 Event models for text classification (optional read- ing) To close off our discussion of generative learning algorithms, let’s talk about one more model that is specifically for text classification. While Naive Bayes as we’ve presented it will work well for many classification problems, for text classification, there is a related model that does even better. In the specific context of text classification, Naive Bayes as presented uses the what’s called the Bernoulli event model (or sometimes multi-variate Bernoulli event model ). In this model, we assumed that the way an email is generated is that first it is randomly determined (according to the class priors p ( y )) whether a spammer or non-spammer will send you your next message. Then, the person sending the email runs through the dictionary, deciding whether to include each word j in that email independently and according to the probabilities p ( x j = 1 | y ) = φ j | y . Thus, the probability of a message was given by p ( y ) Q d j =1 p ( x j | y ). Here’s a different model, called the Multinomial event model . To describe this model, we will use a different notation and set of features for representing emails. We let x j denote the identity of the j -th word in the email. Thus, x j is now an integer taking values in { 1 , . . . , | V |} , where | V | is the size of our vocabulary (dictionary). An email of d words is now rep- resented by a vector ( x 1 , x 2 , . . . , x d ) of length d ; note that d can vary for different documents. For instance, if an email starts with “A NeurIPS . . . ,” then x 1 = 1 (“a” is the first word in the dictionary), and x 2 = 35000 (if “neurips” is the 35000th word in the dictionary). In the multinomial event model, we assume that the way an email is generated is via a random process in which spam/non-spam is first deter- mined (according to p ( y )) as before. Then, the sender of the email writes the email by first generating x 1 from some multinomial distribution over words ( p ( x 1 | y )). Next, the second word x 2 is chosen independently of x 1 but from the same multinomial distribution, and similarly for x 3 , x 4 , and so on, until all d words of the email have been generated. Thus, the overall probability of a message is given by p ( y ) Q d j =1 p ( x j | y ). Note that this formula looks like the one we had earlier for the probability of a message under the Bernoulli event
47 model, but that the terms in the formula now mean very different things. In particular x j | y is now a multinomial, rather than a Bernoulli distribution. The parameters for our new model are φ y = p ( y ) as before, φ k | y =1 = p ( x j = k | y = 1) (for any j ) and φ k | y =0 = p ( x j = k | y = 0). Note that we have assumed that p ( x j | y ) is the same for all values of j (i.e., that the distribution according to which a word is generated does not depend on its position j within the email). If we are given a training set { ( x ( i ) , y ( i ) ); i = 1 , . . . , n } where x ( i ) = ( x ( i ) 1 , x ( i ) 2 , . . . , x ( i ) d i ) (here, d i is the number of words in the i -training example), the likelihood of the data is given by L ( φ y , φ k | y =0 , φ k | y =1 ) = n Y i =1 p ( x ( i ) , y ( i ) ) = n Y i =1 d i Y j =1 p ( x ( i ) j | y ; φ k | y =0 , φ k | y =1 ) ! p ( y ( i ) ; φ y ) . Maximizing this yields the maximum likelihood estimates of the parameters: φ k | y =1 = n i =1 d i j =1 1 { x ( i ) j = k y ( i ) = 1 } n i =1 1 { y ( i ) = 1 } d i φ k | y =0 = n i =1 d i j =1 1 { x ( i ) j = k y ( i ) = 0 } n i =1 1 { y ( i ) = 0 } d i φ y = n i =1 1 { y ( i ) = 1 } n . If we were to apply Laplace smoothing (which is needed in practice for good performance) when estimating φ k | y =0 and φ k | y =1 , we add 1 to the numerators and | V | to the denominators, and obtain: φ k | y =1 = 1 + n i =1 d i j =1 1 { x ( i ) j = k y ( i ) = 1 } | V | + n i =1 1 { y ( i ) = 1 } d i φ k | y =0 = 1 + n i =1 d i j =1 1 { x ( i ) j = k y ( i ) = 0 } | V | + n i =1 1 { y ( i ) = 0 } d i . While not necessarily the very best classification algorithm, the Naive Bayes classifier often works surprisingly well. It is often also a very good “first thing to try,” given its simplicity and ease of implementation.
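As a sketch of the multinomial event model estimates in code (the document representation and the names are my own choices; word indices are zero-based here as a coding convenience, whereas the notes index the vocabulary from 1):

```python
import numpy as np

def fit_multinomial_event_model(docs, y, V):
    """Laplace-smoothed estimates for the multinomial event model.

    docs: list of n documents, each a list/array of word indices in {0, ..., V-1};
    y: (n,) labels in {0, 1}; V: vocabulary size.
    Returns phi_y and the two (V,) vectors phi_{k|y=0} and phi_{k|y=1}.
    """
    counts = {0: np.zeros(V), 1: np.zeros(V)}
    total_words = {0: 0, 1: 0}
    for words, label in zip(docs, y):
        label = int(label)
        for w in words:
            counts[label][w] += 1          # numerator: word occurrences per class
        total_words[label] += len(words)   # denominator: total words per class
    phi_k_y0 = (1 + counts[0]) / (V + total_words[0])
    phi_k_y1 = (1 + counts[1]) / (V + total_words[1])
    phi_y = np.mean(np.asarray(y) == 1)
    return phi_y, phi_k_y0, phi_k_y1

# Toy example: 3 documents over a 4-word vocabulary.
docs = [[0, 1, 1], [2, 3], [0, 0, 2]]
y = [1, 0, 1]
print(fit_multinomial_event_model(docs, y, V=4))
```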
Chapter 5

Kernel methods

5.1 Feature maps

Recall that in our discussion about linear regression, we considered the problem of predicting the price of a house (denoted by y) from the living area of the house (denoted by x), and we fit a linear function of x to the training data. What if the price y can be more accurately represented as a non-linear function of x? In this case, we need a more expressive family of models than linear models.

We start by considering fitting cubic functions y = θ_3 x^3 + θ_2 x^2 + θ_1 x + θ_0. It turns out that we can view the cubic function as a linear function over a different set of feature variables (defined below). Concretely, let the function φ : R → R^4 be defined as

φ(x) = \begin{bmatrix} 1 \\ x \\ x^2 \\ x^3 \end{bmatrix} \in R^4.   (5.1)

Let θ ∈ R^4 be the vector containing θ_0, θ_1, θ_2, θ_3 as entries. Then we can rewrite the cubic function in x as:

θ_3 x^3 + θ_2 x^2 + θ_1 x + θ_0 = θ^T φ(x)

Thus, a cubic function of the variable x can be viewed as a linear function over the variables φ(x). To distinguish between these two sets of variables, in the context of kernel methods, we will call the "original" input value the input attributes of a problem (in this case, x, the living area). When the
49 original input is mapped to some new set of quantities φ ( x ), we will call those new quantities the features variables. (Unfortunately, different authors use different terms to describe these two things in different contexts.) We will call φ a feature map , which maps the attributes to the features. 5.2 LMS (least mean squares) with features We will derive the gradient descent algorithm for fitting the model θ T φ ( x ). First recall that for ordinary least square problem where we were to fit θ T x , the batch gradient descent update is (see the first lecture note for its deriva- tion): θ := θ + α n X i =1 ( y ( i ) - h θ ( x ( i ) ) ) x ( i ) := θ + α n X i =1 ( y ( i ) - θ T x ( i ) ) x ( i ) . (5.2) Let φ : R d R p be a feature map that maps attribute x (in R d ) to the features φ ( x ) in R p . (In the motivating example in the previous subsection, we have d = 1 and p = 4.) Now our goal is to fit the function θ T φ ( x ), with θ being a vector in R p instead of R d . We can replace all the occurrences of x ( i ) in the algorithm above by φ ( x ( i ) ) to obtain the new update: θ := θ + α n X i =1 ( y ( i ) - θ T φ ( x ( i ) ) ) φ ( x ( i ) ) (5.3) Similarly, the corresponding stochastic gradient descent update rule is θ := θ + α ( y ( i ) - θ T φ ( x ( i ) ) ) φ ( x ( i ) ) (5.4) 5.3 LMS with the kernel trick The gradient descent update, or stochastic gradient update above becomes computationally expensive when the features φ ( x ) is high-dimensional. For example, consider the direct extension of the feature map in equation (5.1) to high-dimensional input x : suppose x R d , and let φ ( x ) be the vector that
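To make the connection concrete, here is a small NumPy sketch using the cubic feature map from equation (5.1); the training data is made up for the example, and the final fit uses a direct least-squares solve for brevity (iterating the batch update (5.3) with a small enough step size converges to the same solution):

```python
import numpy as np

def phi(x):
    """Cubic feature map from equation (5.1): scalar x -> [1, x, x^2, x^3]."""
    return np.array([1.0, x, x ** 2, x ** 3])

def batch_update(theta, Phi, ys, alpha):
    """One step of the batch gradient-descent update (5.3) on the mapped features."""
    return theta + alpha * Phi.T @ (ys - Phi @ theta)

# Illustrative data generated from an exactly cubic target.
xs = np.linspace(0.0, 3.0, 7)
ys = 2 + 0.5 * xs - 0.1 * xs ** 3
Phi = np.vstack([phi(x) for x in xs])        # n x 4 matrix whose rows are phi(x^(i))

# Fitting theta^T phi(x) is ordinary linear regression in phi(x), so a direct
# least-squares solve recovers the cubic's coefficients.
theta, *_ = np.linalg.lstsq(Phi, ys, rcond=None)
print(theta)                                 # approximately [2, 0.5, 0, -0.1]
```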
50 contains all the monomials of x with degree 3 φ ( x ) = 1 x 1 x 2 . . . x 2 1 x 1 x 2 x 1 x 3 . . . x 2 x 1 . . . x 3 1 x 2 1 x 2 . . . . (5.5) The dimension of the features φ ( x ) is on the order of d 3 . 1 This is a pro- hibitively long vector for computational purpose — when d = 1000, each update requires at least computing and storing a 1000 3 = 10 9 dimensional vector, which is 10 6 times slower than the update rule for for ordinary least squares updates (5.2). It may appear at first that such d 3 runtime per update and memory usage are inevitable, because the vector θ itself is of dimension p d 3 , and we may need to update every entry of θ and store it. However, we will introduce the kernel trick with which we will not need to store θ explicitly, and the runtime can be significantly improved. For simplicity, we assume the initialize the value θ = 0, and we focus on the iterative update (5.3). The main observation is that at any time, θ can be represented as a linear combination of the vectors φ ( x (1) ) , . . . , φ ( x ( n ) ). Indeed, we can show this inductively as follows. At initialization, θ = 0 = n i =1 0 · φ ( x ( i ) ). Assume at some point, θ can be represented as θ = n X i =1 β i φ ( x ( i ) ) (5.6) 1 Here, for simplicity, we include all the monomials with repetitions (so that, e.g., x 1 x 2 x 3 and x 2 x 3 x 1 both appear in φ ( x )). Therefore, there are totally 1 + d + d 2 + d 3 entries in φ ( x ).
51 for some β 1 , . . . , β n R . Then we claim that in the next round, θ is still a linear combination of φ ( x (1) ) , . . . , φ ( x ( n ) ) because θ := θ + α n X i =1 ( y ( i ) - θ T φ ( x ( i ) ) ) φ ( x ( i ) ) = n X i =1 β i φ ( x ( i ) ) + α n X i =1 ( y ( i ) - θ T φ ( x ( i ) ) ) φ ( x ( i ) ) = n X i =1 ( β i + α ( y ( i ) - θ T φ ( x ( i ) ) ) ) | {z } new β i φ ( x ( i ) ) (5.7) You may realize that our general strategy is to implicitly represent the p - dimensional vector θ by a set of coefficients β 1 , . . . , β n . Towards doing this, we derive the update rule of the coefficients β 1 , . . . , β n . Using the equation above, we see that the new β i depends on the old one via β i := β i + α ( y ( i ) - θ T φ ( x ( i ) ) ) (5.8) Here we still have the old θ on the RHS of the equation. Replacing θ by θ = n j =1 β j φ ( x ( j ) ) gives i ∈ { 1 , . . . , n } , β i := β i + α y ( i ) - n X j =1 β j φ ( x ( j ) ) T φ ( x ( i ) ) ! We often rewrite φ ( x ( j ) ) T φ ( x ( i ) ) as h φ ( x ( j ) ) , φ ( x ( i ) ) i to emphasize that it’s the inner product of the two feature vectors. Viewing β i ’s as the new representa- tion of θ , we have successfully translated the batch gradient descent algorithm into an algorithm that updates the value of β iteratively. It may appear that at every iteration, we still need to compute the values of h φ ( x ( j ) ) , φ ( x ( i ) ) i for all pairs of i, j , each of which may take roughly O ( p ) operation. However, two important properties come to rescue: 1. We can pre-compute the pairwise inner products h φ ( x ( j ) ) , φ ( x ( i ) ) i for all pairs of i, j before the loop starts. 2. For the feature map φ defined in (5.5) (or many other interesting fea- ture maps), computing h φ ( x ( j ) ) , φ ( x ( i ) ) i can be efficient and does not
necessarily require computing φ(x^{(i)}) explicitly. This is because:

⟨φ(x), φ(z)⟩ = 1 + \sum_{i=1}^{d} x_i z_i + \sum_{i,j \in \{1,...,d\}} x_i x_j z_i z_j + \sum_{i,j,k \in \{1,...,d\}} x_i x_j x_k z_i z_j z_k
             = 1 + \sum_{i=1}^{d} x_i z_i + \left( \sum_{i=1}^{d} x_i z_i \right)^2 + \left( \sum_{i=1}^{d} x_i z_i \right)^3
             = 1 + ⟨x, z⟩ + ⟨x, z⟩^2 + ⟨x, z⟩^3   (5.9)

Therefore, to compute ⟨φ(x), φ(z)⟩, we can first compute ⟨x, z⟩ with O(d) time and then take another constant number of operations to compute 1 + ⟨x, z⟩ + ⟨x, z⟩^2 + ⟨x, z⟩^3.

As you will see, the inner products between the features ⟨φ(x), φ(z)⟩ are essential here. We define the Kernel corresponding to the feature map φ as a function that maps X × X → R (recall that X is the space of the input x; in our running example, X = R^d) satisfying:

K(x, z) ≜ ⟨φ(x), φ(z)⟩   (5.10)

To wrap up the discussion, we write down the final algorithm as follows:

1. Compute all the values K(x^{(i)}, x^{(j)}) ≜ ⟨φ(x^{(i)}), φ(x^{(j)})⟩ using equation (5.9) for all i, j ∈ {1, . . . , n}. Set β := 0.

2. Loop:

   ∀ i ∈ {1, . . . , n},  β_i := β_i + α \left( y^{(i)} - \sum_{j=1}^{n} β_j K(x^{(i)}, x^{(j)}) \right)   (5.11)

   Or in vector notation, letting K be the n × n matrix with K_{ij} = K(x^{(i)}, x^{(j)}), we have

   β := β + α(\vec{y} - Kβ)

With the algorithm above, we can update the representation β of the vector θ efficiently with O(n) time per update.
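Here is a sketch of steps 1 and 2 in code, using the degree-3 kernel from equation (5.9); the function names and the particular step size are illustrative choices only, and whether the iteration converges depends on scaling α appropriately for the data at hand:

```python
import numpy as np

def kernel(x, z):
    """K(x, z) = 1 + <x, z> + <x, z>^2 + <x, z>^3, as in equation (5.9)."""
    s = x @ z
    return 1 + s + s ** 2 + s ** 3

def kernelized_lms(X, y, alpha=1e-5, iters=1000):
    """Run update (5.11): precompute the kernel values, then iterate on beta.

    X: (n, d) input attributes; y: (n,) targets.  Returns the coefficients beta
    that implicitly represent theta = sum_i beta_i * phi(x^(i)).
    """
    n = X.shape[0]
    # Step 1: precompute the n x n matrix of pairwise kernel values.
    K = np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])
    beta = np.zeros(n)
    # Step 2: the vectorized update  beta := beta + alpha * (y_vec - K beta).
    for _ in range(iters):
        beta = beta + alpha * (y - K @ beta)
    return beta
```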
Finally, we need to show that the knowledge of the representation β suffices to compute the prediction θ^T φ(x). Indeed, we have

θ^T φ(x) = \sum_{i=1}^{n} β_i φ(x^{(i)})^T φ(x) = \sum_{i=1}^{n} β_i K(x^{(i)}, x)   (5.12)

You may realize that fundamentally all we need to know about the feature map φ(·) is encapsulated in the corresponding kernel function K(·, ·). We will expand on this in the next section.

5.4 Properties of kernels

In the last subsection, we started with an explicitly defined feature map φ, which induces the kernel function K(x, z) ≜ ⟨φ(x), φ(z)⟩. Then we saw that the kernel function is so intrinsic that, as long as the kernel function is defined, the whole training algorithm can be written entirely in the language of the kernel without referring to the feature map φ, and so can the prediction on a test example x (equation (5.12)). Therefore, it would be tempting to define some other kernel function K(·, ·) and run the algorithm (5.11). Note that the algorithm (5.11) does not need to explicitly access the feature map φ, and therefore we only need to ensure the existence of the feature map φ, but do not necessarily need to be able to explicitly write φ down.

What kinds of functions K(·, ·) can correspond to some feature map φ? In other words, can we tell if there is some feature mapping φ so that K(x, z) = φ(x)^T φ(z) for all x, z?

If we can answer this question by giving a precise characterization of valid kernel functions, then we can completely change the interface of selecting feature maps φ to the interface of selecting kernel functions K. Concretely, we can pick a function K, verify that it satisfies the characterization (so that there exists a feature map φ that K corresponds to), and then we can run the update rule (5.11). The benefit here is that we don't have to be able to compute φ or write it down analytically, and we only need to know its existence. We will answer this question at the end of this subsection after we go through several concrete examples of kernels.

Suppose x, z ∈ R^d, and let's first consider the function K(·, ·) defined as:

K(x, z) = (x^T z)^2.
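Before the derivation that follows, here is a quick numerical sanity check (the vectors are throwaway values, not from the notes) that this K really is an inner product between feature vectors built from the degree-2 monomials x_i x_j, which is exactly what the algebra below establishes:

```python
import numpy as np

def phi_quadratic(x):
    """Feature vector whose entries are all d^2 products x_i * x_j."""
    return np.outer(x, x).ravel()

x = np.array([1.0, 2.0, -0.5])
z = np.array([0.3, -1.0, 2.0])
print((x @ z) ** 2)                          # K(x, z) = (x^T z)^2, about 7.29
print(phi_quadratic(x) @ phi_quadratic(z))   # phi(x)^T phi(z): the same value
```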
54 We can also write this as K ( x, z ) = d X i =1 x i z i ! d X j =1 x j z j ! = d X i =1 d X j =1 x i x j z i z j = d X i,j =1 ( x i x j )( z i z j ) Thus, we see that K ( x, z ) = h φ ( x ) , φ ( z ) i is the kernel function that corre- sponds to the the feature mapping φ given (shown here for the case of d = 3) by φ ( x ) = x 1 x 1 x 1 x 2 x 1 x 3 x 2 x 1 x 2 x 2 x 2 x 3 x 3 x 1 x 3 x 2 x 3 x 3 . Revisiting the computational efficiency perspective of kernel, note that whereas calculating the high-dimensional φ ( x ) requires O ( d 2 ) time, finding K ( x, z ) takes only O ( d ) time—linear in the dimension of the input attributes. For another related example, also consider K ( · , · ) defined by K ( x, z ) = ( x T z + c ) 2 = d X i,j =1 ( x i x j )( z i z j ) + d X i =1 ( 2 cx i )( 2 cz i ) + c 2 . (Check this yourself.) This function K is a kernel function that corresponds
55 to the feature mapping (again shown for d = 3) φ ( x ) = x 1 x 1 x 1 x 2 x 1 x 3 x 2 x 1 x 2 x 2 x 2 x 3 x 3 x 1 x 3 x 2 x 3 x 3 2 cx 1 2 cx 2 2 cx 3 c , and the parameter c controls the relative weighting between the x i (first order) and the x i x j (second order) terms. More broadly, the kernel K ( x, z ) = ( x T z + c ) k corresponds to a feature mapping to an ( d + k k ) feature space, corresponding of all monomials of the form x i 1 x i 2 . . . x i k that are up to order k . However, despite working in this O ( d k )-dimensional space, computing K ( x, z ) still takes only O ( d ) time, and hence we never need to explicitly represent feature vectors in this very high dimensional feature space. Kernels as similarity metrics. Now, let’s talk about a slightly different view of kernels. Intuitively, (and there are things wrong with this intuition, but nevermind), if φ ( x ) and φ ( z ) are close together, then we might expect K ( x, z ) = φ ( x ) T φ ( z ) to be large. Conversely, if φ ( x ) and φ ( z ) are far apart— say nearly orthogonal to each other—then K ( x, z ) = φ ( x ) T φ ( z ) will be small. So, we can think of K ( x, z ) as some measurement of how similar are φ ( x ) and φ ( z ), or of how similar are x and z . Given this intuition, suppose that for some learning problem that you’re working on, you’ve come up with some function K ( x, z ) that you think might be a reasonable measure of how similar x and z are. For instance, perhaps you chose K ( x, z ) = exp - || x - z || 2 2 σ 2 . This is a reasonable measure of x and z ’s similarity, and is close to 1 when x and z are close, and near 0 when x and z are far apart. Does there exist
56 a feature map φ such that the kernel K defined above satisfies K ( x, z ) = φ ( x ) T φ ( z )? In this particular example, the answer is yes. This kernel is called the Gaussian kernel , and corresponds to an infinite dimensional feature mapping φ . We will give a precise characterization about what properties a function K needs to satisfy so that it can be a valid kernel function that corresponds to some feature map φ . Necessary conditions for valid kernels. Suppose for now that K is indeed a valid kernel corresponding to some feature mapping φ , and we will first see what properties it satisfies. Now, consider some finite set of n points (not necessarily the training set) { x (1) , . . . , x ( n ) } , and let a square, n -by- n matrix K be defined so that its ( i, j )-entry is given by K ij = K ( x ( i ) , x ( j ) ). This matrix is called the kernel matrix . Note that we’ve overloaded the notation and used K to denote both the kernel function K ( x, z ) and the kernel matrix K , due to their obvious close relationship. Now, if K is a valid kernel, then K ij = K ( x ( i ) , x ( j ) ) = φ ( x ( i ) ) T φ ( x ( j ) ) = φ ( x ( j ) ) T φ ( x ( i ) ) = K ( x ( j ) , x ( i ) ) = K ji , and hence K must be symmetric. More- over, letting φ k ( x ) denote the k -th coordinate of the vector φ ( x ), we find that for any vector z , we have z T Kz = X i X j z i K ij z j = X i X j z i φ ( x ( i ) ) T φ ( x ( j ) ) z j = X i X j z i X k φ k ( x ( i ) ) φ k ( x ( j ) ) z j = X k X i X j z i φ k ( x ( i ) ) φ k ( x ( j ) ) z j = X k X i z i φ k ( x ( i ) ) ! 2 0 . The second-to-last step uses the fact that i,j a i a j = ( i a i ) 2 for a i = z i φ k ( x ( i ) ). Since z was arbitrary, this shows that K is positive semi-definite ( K 0). Hence, we’ve shown that if K is a valid kernel (i.e., if it corresponds to some feature mapping φ ), then the corresponding kernel matrix K R n × n is symmetric positive semidefinite.
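The necessary condition above is easy to probe numerically. The sketch below (the point set and σ are arbitrary choices of mine) builds the kernel matrix of the Gaussian kernel on a random finite set of points and confirms it is symmetric with non-negative eigenvalues; a check on one point set can refute a candidate kernel, but of course never proves it valid.

```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    """K(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

rng = np.random.default_rng(0)
points = rng.normal(size=(6, 3))              # any finite set of points works here
K = np.array([[gaussian_kernel(a, b) for b in points] for a in points])

print(np.allclose(K, K.T))                    # symmetric
print(np.linalg.eigvalsh(K).min() >= -1e-10)  # eigenvalues >= 0, up to round-off
```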
57 Sufficient conditions for valid kernels. More generally, the condition above turns out to be not only a necessary, but also a sufficient, condition for K to be a valid kernel (also called a Mercer kernel). The following result is due to Mercer. 3 Theorem (Mercer). Let K : R d × R d 7→ R be given. Then for K to be a valid (Mercer) kernel, it is necessary and sufficient that for any { x (1) , . . . , x ( n ) } , ( n < ), the corresponding kernel matrix is symmetric pos- itive semi-definite. Given a function K , apart from trying to find a feature mapping φ that corresponds to it, this theorem therefore gives another way of testing if it is a valid kernel. You’ll also have a chance to play with these ideas more in problem set 2. In class, we also briefly talked about a couple of other examples of ker- nels. For instance, consider the digit recognition problem, in which given an image (16x16 pixels) of a handwritten digit (0-9), we have to figure out which digit it was. Using either a simple polynomial kernel K ( x, z ) = ( x T z ) k or the Gaussian kernel, SVMs were able to obtain extremely good perfor- mance on this problem. This was particularly surprising since the input attributes x were just 256-dimensional vectors of the image pixel intensity values, and the system had no prior knowledge about vision, or even about which pixels are adjacent to which other ones. Another example that we briefly talked about in lecture was that if the objects x that we are trying to classify are strings (say, x is a list of amino acids, which strung together form a protein), then it seems hard to construct a reasonable, “small” set of features for most learning algorithms, especially if different strings have dif- ferent lengths. However, consider letting φ ( x ) be a feature vector that counts the number of occurrences of each length- k substring in x . If we’re consid- ering strings of English letters, then there are 26 k such strings. Hence, φ ( x ) is a 26 k dimensional vector; even for moderate values of k , this is probably too big for us to efficiently work with. (e.g., 26 4 460000.) However, using (dynamic programming-ish) string matching algorithms, it is possible to ef- ficiently compute K ( x, z ) = φ ( x ) T φ ( z ), so that we can now implicitly work in this 26 k -dimensional feature space, but without ever explicitly computing feature vectors in this space. 3 Many texts present Mercer’s theorem in a slightly more complicated form involving L 2 functions, but when the input attributes take values in R d , the version given here is equivalent.
Application of kernel methods: We've seen the application of kernels to linear regression. In the next part, we will introduce the support vector machines, to which kernels can be directly applied, so we won't dwell too much longer on it here. In fact, the idea of kernels has significantly broader applicability than linear regression and SVMs. Specifically, if you have any learning algorithm that you can write in terms of only inner products ⟨x, z⟩ between input attribute vectors, then by replacing this with K(x, z) where K is a kernel, you can "magically" allow your algorithm to work efficiently in the high dimensional feature space corresponding to K. For instance, this kernel trick can be applied with the perceptron to derive a kernel perceptron algorithm. Many of the algorithms that we'll see later in this class will also be amenable to this method, which has come to be known as the "kernel trick."
Chapter 6 Support vector machines This set of notes presents the Support Vector Machine (SVM) learning al- gorithm. SVMs are among the best (and many believe are indeed the best) “off-the-shelf” supervised learning algorithms. To tell the SVM story, we’ll need to first talk about margins and the idea of separating data with a large “gap.” Next, we’ll talk about the optimal margin classifier, which will lead us into a digression on Lagrange duality. We’ll also see kernels, which give a way to apply SVMs efficiently in very high dimensional (such as infinite- dimensional) feature spaces, and finally, we’ll close off the story with the SMO algorithm, which gives an efficient implementation of SVMs. 6.1 Margins: intuition We’ll start our story on SVMs by talking about margins. This section will give the intuitions about margins and about the “confidence” of our predic- tions; these ideas will be made formal in Section 6.3. Consider logistic regression, where the probability p ( y = 1 | x ; θ ) is mod- eled by h θ ( x ) = g ( θ T x ). We then predict “1” on an input x if and only if h θ ( x ) 0 . 5, or equivalently, if and only if θ T x 0. Consider a positive training example ( y = 1). The larger θ T x is, the larger also is h θ ( x ) = p ( y = 1 | x ; θ ), and thus also the higher our degree of “confidence” that the label is 1. Thus, informally we can think of our prediction as being very confident that y = 1 if θ T x 0. Similarly, we think of logistic regression as confidently predicting y = 0, if θ T x 0. Given a training set, again informally it seems that we’d have found a good fit to the training data if we can find θ so that θ T x ( i ) 0 whenever y ( i ) = 1, and θ T x ( i ) 0 whenever y ( i ) = 0, since this would reflect a very confident (and correct) set of classifications for all the 59
60 training examples. This seems to be a nice goal to aim for, and we’ll soon formalize this idea using the notion of functional margins. For a different type of intuition, consider the following figure, in which x’s represent positive training examples, o’s denote negative training examples, a decision boundary (this is the line given by the equation θ T x = 0, and is also called the separating hyperplane ) is also shown, and three points have also been labeled A, B and C. B A C Notice that the point A is very far from the decision boundary. If we are asked to make a prediction for the value of y at A, it seems we should be quite confident that y = 1 there. Conversely, the point C is very close to the decision boundary, and while it’s on the side of the decision boundary on which we would predict y = 1, it seems likely that just a small change to the decision boundary could easily have caused out prediction to be y = 0. Hence, we’re much more confident about our prediction at A than at C. The point B lies in-between these two cases, and more broadly, we see that if a point is far from the separating hyperplane, then we may be significantly more confident in our predictions. Again, informally we think it would be nice if, given a training set, we manage to find a decision boundary that allows us to make all correct and confident (meaning far from the decision boundary) predictions on the training examples. We’ll formalize this later using the notion of geometric margins.
61 6.2 Notation (option reading) To make our discussion of SVMs easier, we’ll first need to introduce a new notation for talking about classification. We will be considering a linear classifier for a binary classification problem with labels y and features x . From now, we’ll use y ∈ {- 1 , 1 } (instead of { 0 , 1 } ) to denote the class labels. Also, rather than parameterizing our linear classifier with the vector θ , we will use parameters w, b , and write our classifier as h w,b ( x ) = g ( w T x + b ) . Here, g ( z ) = 1 if z 0, and g ( z ) = - 1 otherwise. This “ w, b ” notation allows us to explicitly treat the intercept term b separately from the other parameters. (We also drop the convention we had previously of letting x 0 = 1 be an extra coordinate in the input feature vector.) Thus, b takes the role of what was previously θ 0 , and w takes the role of [ θ 1 . . . θ d ] T . Note also that, from our definition of g above, our classifier will directly predict either 1 or - 1 (cf. the perceptron algorithm), without first going through the intermediate step of estimating p ( y = 1) (which is what logistic regression does). 6.3 Functional and geometric margins (op- tion reading) Let’s formalize the notions of the functional and geometric margins. Given a training example ( x ( i ) , y ( i ) ), we define the functional margin of ( w, b ) with respect to the training example as ˆ γ ( i ) = y ( i ) ( w T x ( i ) + b ) . Note that if y ( i ) = 1, then for the functional margin to be large (i.e., for our prediction to be confident and correct), we need w T x ( i ) + b to be a large positive number. Conversely, if y ( i ) = - 1, then for the functional margin to be large, we need w T x ( i ) + b to be a large negative number. Moreover, if y ( i ) ( w T x ( i ) + b ) > 0, then our prediction on this example is correct. (Check this yourself.) Hence, a large functional margin represents a confident and a correct prediction. For a linear classifier with the choice of g given above (taking values in {- 1 , 1 } ), there’s one property of the functional margin that makes it not a very good measure of confidence, however. Given our choice of g , we note that
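In code, the classifier of this section is simply a sign test on the affine score w^T x + b; the example values of w, b, and x below are arbitrary and only illustrate the {−1, 1} convention:

```python
import numpy as np

def g(z):
    """g(z) = 1 if z >= 0, and -1 otherwise."""
    return 1 if z >= 0 else -1

def h(w, b, x):
    """The linear classifier h_{w,b}(x) = g(w^T x + b), predicting a label in {-1, 1}."""
    return g(w @ x + b)

w = np.array([2.0, -1.0])
b = -0.5
print(h(w, b, np.array([1.0, 0.2])))   # w^T x + b = 1.3 >= 0, so it predicts 1
print(h(w, b, np.array([0.0, 1.0])))   # w^T x + b = -1.5 < 0, so it predicts -1
```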
62 if we replace w with 2 w and b with 2 b , then since g ( w T x + b ) = g (2 w T x +2 b ), this would not change h w,b ( x ) at all. I.e., g , and hence also h w,b ( x ), depends only on the sign, but not on the magnitude, of w T x + b . However, replacing ( w, b ) with (2 w, 2 b ) also results in multiplying our functional margin by a factor of 2. Thus, it seems that by exploiting our freedom to scale w and b , we can make the functional margin arbitrarily large without really changing anything meaningful. Intuitively, it might therefore make sense to impose some sort of normalization condition such as that || w || 2 = 1; i.e., we might replace ( w, b ) with ( w/ || w || 2 , b/ || w || 2 ), and instead consider the functional margin of ( w/ || w || 2 , b/ || w || 2 ). We’ll come back to this later. Given a training set S = { ( x ( i ) , y ( i ) ); i = 1 , . . . , n } , we also define the function margin of ( w, b ) with respect to S as the smallest of the functional margins of the individual training examples. Denoted by ˆ γ , this can therefore be written: ˆ γ = min i =1 ,...,n ˆ γ ( i ) . Next, let’s talk about geometric margins . Consider the picture below: w A γ B (i) The decision boundary corresponding to ( w, b ) is shown, along with the vector w . Note that w is orthogonal (at 90 ) to the separating hyperplane. (You should convince yourself that this must be the case.) Consider the point at A, which represents the input x ( i ) of some training example with label y ( i ) = 1. Its distance to the decision boundary, γ ( i ) , is given by the line segment AB. How can we find the value of γ ( i ) ? Well, w/ || w || is a unit-length vector pointing in the same direction as w . Since A represents x ( i ) , we therefore
63 find that the point B is given by x ( i ) - γ ( i ) · w/ || w || . But this point lies on the decision boundary, and all points x on the decision boundary satisfy the equation w T x + b = 0. Hence, w T x ( i ) - γ ( i ) w || w || + b = 0 . Solving for γ ( i ) yields γ ( i ) = w T x ( i ) + b || w || = w || w || T x ( i ) + b || w || . This was worked out for the case of a positive training example at A in the figure, where being on the “positive” side of the decision boundary is good. More generally, we define the geometric margin of ( w, b ) with respect to a training example ( x ( i ) , y ( i ) ) to be γ ( i ) = y ( i ) w || w || T x ( i ) + b || w || ! . Note that if || w || = 1, then the functional margin equals the geometric margin—this thus gives us a way of relating these two different notions of margin. Also, the geometric margin is invariant to rescaling of the parame- ters; i.e., if we replace w with 2 w and b with 2 b , then the geometric margin does not change. This will in fact come in handy later. Specifically, because of this invariance to the scaling of the parameters, when trying to fit w and b to training data, we can impose an arbitrary scaling constraint on w without changing anything important; for instance, we can demand that || w || = 1, or | w 1 | = 5, or | w 1 + b | + | w 2 | = 2, and any of these can be satisfied simply by rescaling w and b . Finally, given a training set S = { ( x ( i ) , y ( i ) ); i = 1 , . . . , n } , we also define the geometric margin of ( w, b ) with respect to S to be the smallest of the geometric margins on the individual training examples: γ = min i =1 ,...,n γ ( i ) . 6.4 The optimal margin classifier (option read- ing) Given a training set, it seems from our previous discussion that a natural desideratum is to try to find a decision boundary that maximizes the (ge- ometric) margin, since this would reflect a very confident set of predictions
on the training set and a good "fit" to the training data. Specifically, this will result in a classifier that separates the positive and the negative training examples with a "gap" (geometric margin).

For now, we will assume that we are given a training set that is linearly separable; i.e., that it is possible to separate the positive and negative examples using some separating hyperplane. How will we find the one that achieves the maximum geometric margin? We can pose the following optimization problem:
\[
\begin{aligned}
\max_{\gamma, w, b}\quad & \gamma \\
\text{s.t.}\quad & y^{(i)}(w^Tx^{(i)} + b) \geq \gamma, \quad i = 1, \ldots, n \\
& \|w\| = 1.
\end{aligned}
\]
I.e., we want to maximize $\gamma$, subject to each training example having functional margin at least $\gamma$. The $\|w\| = 1$ constraint moreover ensures that the functional margin equals the geometric margin, so we are also guaranteed that all the geometric margins are at least $\gamma$. Thus, solving this problem will result in $(w, b)$ with the largest possible geometric margin with respect to the training set.

If we could solve the optimization problem above, we'd be done. But the "$\|w\| = 1$" constraint is a nasty (non-convex) one, and this problem certainly isn't in any format that we can plug into standard optimization software to solve. So, let's try transforming the problem into a nicer one. Consider:
\[
\begin{aligned}
\max_{\hat\gamma, w, b}\quad & \frac{\hat\gamma}{\|w\|} \\
\text{s.t.}\quad & y^{(i)}(w^Tx^{(i)} + b) \geq \hat\gamma, \quad i = 1, \ldots, n
\end{aligned}
\]
Here, we're going to maximize $\hat\gamma/\|w\|$, subject to the functional margins all being at least $\hat\gamma$. Since the geometric and functional margins are related by $\gamma = \hat\gamma/\|w\|$, this will give us the answer we want. Moreover, we've gotten rid of the constraint $\|w\| = 1$ that we didn't like. The downside is that we now have a nasty (again, non-convex) objective function $\hat\gamma/\|w\|$; and, we still don't have any off-the-shelf software that can solve this form of an optimization problem.

Let's keep going. Recall our earlier discussion that we can add an arbitrary scaling constraint on $w$ and $b$ without changing anything. This is the key idea we'll use now. We will introduce the scaling constraint that the functional margin of $(w, b)$ with respect to the training set must be 1:
\[ \hat\gamma = 1. \]
Since multiplying $w$ and $b$ by some constant results in the functional margin being multiplied by that same constant, this is indeed a scaling constraint, and can be satisfied by rescaling $w, b$. Plugging this into our problem above, and noting that maximizing $\hat\gamma/\|w\| = 1/\|w\|$ is the same thing as minimizing $\|w\|^2$, we now have the following optimization problem:
\[
\begin{aligned}
\min_{w, b}\quad & \frac{1}{2}\|w\|^2 \\
\text{s.t.}\quad & y^{(i)}(w^Tx^{(i)} + b) \geq 1, \quad i = 1, \ldots, n
\end{aligned}
\]
We've now transformed the problem into a form that can be efficiently solved. The above is an optimization problem with a convex quadratic objective and only linear constraints. Its solution gives us the optimal margin classifier. This optimization problem can be solved using commercial quadratic programming (QP) code.¹

While we could call the problem solved here, what we will instead do is make a digression to talk about Lagrange duality. This will lead us to our optimization problem's dual form, which will play a key role in allowing us to use kernels to get optimal margin classifiers to work efficiently in very high dimensional spaces. The dual form will also allow us to derive an efficient algorithm for solving the above optimization problem that will typically do much better than generic QP software.

6.5 Lagrange duality (optional reading)

Let's temporarily put aside SVMs and maximum margin classifiers, and talk about solving constrained optimization problems. Consider a problem of the following form:
\[
\begin{aligned}
\min_{w}\quad & f(w) \\
\text{s.t.}\quad & h_i(w) = 0, \quad i = 1, \ldots, l.
\end{aligned}
\]
Some of you may recall how the method of Lagrange multipliers can be used to solve it. (Don't worry if you haven't seen it before.) In this method, we define the Lagrangian to be
\[ \mathcal{L}(w, \beta) = f(w) + \sum_{i=1}^{l} \beta_i h_i(w) \]

¹You may be familiar with linear programming, which solves optimization problems that have linear objectives and linear constraints. QP software is also widely available, which allows convex quadratic objectives and linear constraints.
Here, the $\beta_i$'s are called the Lagrange multipliers. We would then find and set $\mathcal{L}$'s partial derivatives to zero:
\[ \frac{\partial \mathcal{L}}{\partial w_i} = 0; \qquad \frac{\partial \mathcal{L}}{\partial \beta_i} = 0, \]
and solve for $w$ and $\beta$.

In this section, we will generalize this to constrained optimization problems in which we may have inequality as well as equality constraints. Due to time constraints, we won't really be able to do the theory of Lagrange duality justice in this class,² but we will give the main ideas and results, which we will then apply to our optimal margin classifier's optimization problem.

Consider the following, which we'll call the primal optimization problem:
\[
\begin{aligned}
\min_{w}\quad & f(w) \\
\text{s.t.}\quad & g_i(w) \leq 0, \quad i = 1, \ldots, k \\
& h_i(w) = 0, \quad i = 1, \ldots, l.
\end{aligned}
\]
To solve it, we start by defining the generalized Lagrangian
\[ \mathcal{L}(w, \alpha, \beta) = f(w) + \sum_{i=1}^{k} \alpha_i g_i(w) + \sum_{i=1}^{l} \beta_i h_i(w). \]
Here, the $\alpha_i$'s and $\beta_i$'s are the Lagrange multipliers. Consider the quantity
\[ \theta_{\mathcal{P}}(w) = \max_{\alpha, \beta \,:\, \alpha_i \geq 0} \mathcal{L}(w, \alpha, \beta). \]
Here, the "$\mathcal{P}$" subscript stands for "primal." Let some $w$ be given. If $w$ violates any of the primal constraints (i.e., if either $g_i(w) > 0$ or $h_i(w) \neq 0$ for some $i$), then you should be able to verify that
\begin{align}
\theta_{\mathcal{P}}(w) &= \max_{\alpha, \beta \,:\, \alpha_i \geq 0}\; f(w) + \sum_{i=1}^{k} \alpha_i g_i(w) + \sum_{i=1}^{l} \beta_i h_i(w) \tag{6.1} \\
&= \infty. \tag{6.2}
\end{align}
Conversely, if the constraints are indeed satisfied for a particular value of $w$, then $\theta_{\mathcal{P}}(w) = f(w)$. Hence,
\[ \theta_{\mathcal{P}}(w) = \begin{cases} f(w) & \text{if } w \text{ satisfies the primal constraints} \\ \infty & \text{otherwise.} \end{cases} \]

²Readers interested in learning more about this topic are encouraged to read, e.g., R. T. Rockafellar (1970), Convex Analysis, Princeton University Press.
Thus, $\theta_{\mathcal{P}}$ takes the same value as the objective in our problem for all values of $w$ that satisfy the primal constraints, and is positive infinity if the constraints are violated. Hence, if we consider the minimization problem
\[ \min_{w} \theta_{\mathcal{P}}(w) = \min_{w} \max_{\alpha, \beta \,:\, \alpha_i \geq 0} \mathcal{L}(w, \alpha, \beta), \]
we see that it is the same problem (i.e., has the same solutions as) our original, primal problem. For later use, we also define the optimal value of the objective to be $p^* = \min_w \theta_{\mathcal{P}}(w)$; we call this the value of the primal problem.

Now, let's look at a slightly different problem. We define
\[ \theta_{\mathcal{D}}(\alpha, \beta) = \min_{w} \mathcal{L}(w, \alpha, \beta). \]
Here, the "$\mathcal{D}$" subscript stands for "dual." Note also that whereas in the definition of $\theta_{\mathcal{P}}$ we were optimizing (maximizing) with respect to $\alpha, \beta$, here we are minimizing with respect to $w$.

We can now pose the dual optimization problem:
\[ \max_{\alpha, \beta \,:\, \alpha_i \geq 0} \theta_{\mathcal{D}}(\alpha, \beta) = \max_{\alpha, \beta \,:\, \alpha_i \geq 0} \min_{w} \mathcal{L}(w, \alpha, \beta). \]
This is exactly the same as our primal problem shown above, except that the order of the "max" and the "min" are now exchanged. We also define the optimal value of the dual problem's objective to be $d^* = \max_{\alpha, \beta \,:\, \alpha_i \geq 0} \theta_{\mathcal{D}}(\alpha, \beta)$.

How are the primal and the dual problems related? It can easily be shown that
\[ d^* = \max_{\alpha, \beta \,:\, \alpha_i \geq 0} \min_{w} \mathcal{L}(w, \alpha, \beta) \;\leq\; \min_{w} \max_{\alpha, \beta \,:\, \alpha_i \geq 0} \mathcal{L}(w, \alpha, \beta) = p^*.
\]
(You should convince yourself of this; this follows from the "max min" of a function always being less than or equal to the "min max.") However, under certain conditions, we will have
\[ d^* = p^*, \]
so that we can solve the dual problem in lieu of the primal problem. Let's see what these conditions are.

Suppose $f$ and the $g_i$'s are convex,³ and the $h_i$'s are affine.⁴ Suppose further that the constraints $g_i$ are (strictly) feasible; this means that there exists some $w$ so that $g_i(w) < 0$ for all $i$.

³When $f$ has a Hessian, then it is convex if and only if the Hessian is positive semi-definite. For instance, $f(w) = w^Tw$ is convex; similarly, all linear (and affine) functions are also convex. (A function $f$ can also be convex without being differentiable, but we won't need those more general definitions of convexity here.)
⁴I.e., there exist $a_i$, $b_i$, so that $h_i(w) = a_i^Tw + b_i$. "Affine" means the same thing as linear, except that we also allow the extra intercept term $b_i$.
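As a quick aside (an illustrative sketch of my own, not part of the notes), the "max min ≤ min max" fact mentioned above can be spot-checked numerically on any finite table of values of $\mathcal{L}(w, \alpha)$:

```python
import numpy as np

rng = np.random.default_rng(0)
L = rng.standard_normal((5, 7))   # rows indexed by w, columns indexed by (alpha, beta)

d_star = L.min(axis=0).max()      # "max min": max over columns of the column-wise min over w
p_star = L.max(axis=1).min()      # "min max": min over w of the row-wise max over (alpha, beta)
assert d_star <= p_star           # weak duality always holds
```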
Under our above assumptions, there must exist $w^*, \alpha^*, \beta^*$ so that $w^*$ is the solution to the primal problem, $\alpha^*, \beta^*$ are the solution to the dual problem, and moreover $p^* = d^* = \mathcal{L}(w^*, \alpha^*, \beta^*)$. Moreover, $w^*, \alpha^*$ and $\beta^*$ satisfy the Karush-Kuhn-Tucker (KKT) conditions, which are as follows:
\begin{align}
\frac{\partial}{\partial w_i}\mathcal{L}(w^*, \alpha^*, \beta^*) &= 0, \quad i = 1, \ldots, d \tag{6.3} \\
\frac{\partial}{\partial \beta_i}\mathcal{L}(w^*, \alpha^*, \beta^*) &= 0, \quad i = 1, \ldots, l \tag{6.4} \\
\alpha_i^*\, g_i(w^*) &= 0, \quad i = 1, \ldots, k \tag{6.5} \\
g_i(w^*) &\leq 0, \quad i = 1, \ldots, k \tag{6.6} \\
\alpha_i^* &\geq 0, \quad i = 1, \ldots, k \tag{6.7}
\end{align}
Moreover, if some $w^*, \alpha^*, \beta^*$ satisfy the KKT conditions, then it is also a solution to the primal and dual problems.

We draw attention to Equation (6.5), which is called the KKT dual complementarity condition. Specifically, it implies that if $\alpha_i^* > 0$, then $g_i(w^*) = 0$. (I.e., the "$g_i(w) \leq 0$" constraint is active, meaning it holds with equality rather than with inequality.) Later on, this will be key for showing that the SVM has only a small number of "support vectors"; the KKT dual complementarity condition will also give us our convergence test when we talk about the SMO algorithm.

6.6 Optimal margin classifiers: the dual form (optional reading)

Note: The equivalence of optimization problem (6.8) and optimization problem (6.12), and the relationship between the primal and dual variables in equation (6.10), are the most important take-home messages of this section.

Previously, we posed the following (primal) optimization problem for finding the optimal margin classifier:
\[
\begin{aligned}
\min_{w, b}\quad & \frac{1}{2}\|w\|^2 \\
\text{s.t.}\quad & y^{(i)}(w^Tx^{(i)} + b) \geq 1, \quad i = 1, \ldots, n
\end{aligned} \tag{6.8}
\]
We can write the constraints as
\[ g_i(w) = -y^{(i)}(w^Tx^{(i)} + b) + 1 \leq 0. \]
We have one such constraint for each training example. Note that from the KKT dual complementarity condition, we will have $\alpha_i > 0$ only for the training examples that have functional margin exactly equal to one (i.e., the ones corresponding to constraints that hold with equality, $g_i(w) = 0$). Consider the figure below, in which a maximum margin separating hyperplane is shown by the solid line.

[Figure: a maximum margin separating hyperplane (solid line), with dashed lines parallel to it passing through the closest positive and negative training examples.]

The points with the smallest margins are exactly the ones closest to the decision boundary; here, these are the three points (one negative and two positive examples) that lie on the dashed lines parallel to the decision boundary. Thus, only three of the $\alpha_i$'s—namely, the ones corresponding to these three training examples—will be non-zero at the optimal solution to our optimization problem. These three points are called the support vectors in this problem. The fact that the number of support vectors can be much smaller than the size of the training set will be useful later.

Let's move on. Looking ahead, as we develop the dual form of the problem, one key idea to watch out for is that we'll try to write our algorithm in terms of only the inner product $\langle x^{(i)}, x^{(j)} \rangle$ (think of this as $(x^{(i)})^T x^{(j)}$) between points in the input feature space. The fact that we can express our algorithm in terms of these inner products will be key when we apply the kernel trick.

When we construct the Lagrangian for our optimization problem we have:
\[ \mathcal{L}(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \alpha_i \left[ y^{(i)}(w^Tx^{(i)} + b) - 1 \right]. \tag{6.9} \]
Note that there are only "$\alpha_i$" but no "$\beta_i$" Lagrange multipliers, since the problem has only inequality constraints.
Let's find the dual form of the problem. To do so, we need to first minimize $\mathcal{L}(w, b, \alpha)$ with respect to $w$ and $b$ (for fixed $\alpha$), to get $\theta_{\mathcal{D}}$, which we'll do by setting the derivatives of $\mathcal{L}$ with respect to $w$ and $b$ to zero. We have:
\[ \nabla_w \mathcal{L}(w, b, \alpha) = w - \sum_{i=1}^{n} \alpha_i y^{(i)} x^{(i)} = 0 \]
This implies that
\[ w = \sum_{i=1}^{n} \alpha_i y^{(i)} x^{(i)}. \tag{6.10} \]
As for the derivative with respect to $b$, we obtain
\[ \frac{\partial}{\partial b} \mathcal{L}(w, b, \alpha) = \sum_{i=1}^{n} \alpha_i y^{(i)} = 0. \tag{6.11} \]
If we take the definition of $w$ in Equation (6.10) and plug that back into the Lagrangian (Equation 6.9), and simplify, we get
\[ \mathcal{L}(w, b, \alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i,j=1}^{n} y^{(i)}y^{(j)}\alpha_i\alpha_j (x^{(i)})^T x^{(j)} - b\sum_{i=1}^{n} \alpha_i y^{(i)}. \]
But from Equation (6.11), the last term must be zero, so we obtain
\[ \mathcal{L}(w, b, \alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i,j=1}^{n} y^{(i)}y^{(j)}\alpha_i\alpha_j (x^{(i)})^T x^{(j)}. \]
Recall that we got to the equation above by minimizing $\mathcal{L}$ with respect to $w$ and $b$. Putting this together with the constraints $\alpha_i \geq 0$ (that we always had) and the constraint (6.11), we obtain the following dual optimization problem:
\[
\begin{aligned}
\max_{\alpha}\quad & W(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i,j=1}^{n} y^{(i)}y^{(j)}\alpha_i\alpha_j \langle x^{(i)}, x^{(j)} \rangle \\
\text{s.t.}\quad & \alpha_i \geq 0, \quad i = 1, \ldots, n \\
& \sum_{i=1}^{n} \alpha_i y^{(i)} = 0.
\end{aligned} \tag{6.12}
\]
You should also be able to verify that the conditions required for $p^* = d^*$ and the KKT conditions (Equations 6.3–6.7) to hold are indeed satisfied in
our optimization problem. Hence, we can solve the dual in lieu of solving the primal problem. Specifically, in the dual problem above, we have a maximization problem in which the parameters are the $\alpha_i$'s. We'll talk later about the specific algorithm that we're going to use to solve the dual problem, but if we are indeed able to solve it (i.e., find the $\alpha$'s that maximize $W(\alpha)$ subject to the constraints), then we can use Equation (6.10) to go back and find the optimal $w$'s as a function of the $\alpha$'s. Having found $w^*$, by considering the primal problem, it is also straightforward to find the optimal value for the intercept term $b$ as
\[ b^* = -\frac{\displaystyle\max_{i : y^{(i)} = -1} w^{*T}x^{(i)} + \min_{i : y^{(i)} = 1} w^{*T}x^{(i)}}{2}. \tag{6.13} \]
(Check for yourself that this is correct.)

Before moving on, let's also take a more careful look at Equation (6.10), which gives the optimal value of $w$ in terms of (the optimal value of) $\alpha$. Suppose we've fit our model's parameters to a training set, and now wish to make a prediction at a new input point $x$. We would then calculate $w^Tx + b$, and predict $y = 1$ if and only if this quantity is bigger than zero. But using (6.10), this quantity can also be written:
\begin{align}
w^Tx + b &= \left( \sum_{i=1}^{n} \alpha_i y^{(i)} x^{(i)} \right)^T x + b \tag{6.14} \\
&= \sum_{i=1}^{n} \alpha_i y^{(i)} \langle x^{(i)}, x \rangle + b. \tag{6.15}
\end{align}
Hence, if we've found the $\alpha_i$'s, in order to make a prediction, we have to calculate a quantity that depends only on the inner product between $x$ and the points in the training set. Moreover, we saw earlier that the $\alpha_i$'s will all be zero except for the support vectors. Thus, many of the terms in the sum above will be zero, and we really need to find only the inner products between $x$ and the support vectors (of which there is often only a small number) in order to calculate (6.15) and make our prediction.

By examining the dual form of the optimization problem, we gained significant insight into the structure of the problem, and were also able to write the entire algorithm in terms of only inner products between input feature vectors. In the next section, we will exploit this property to apply kernels to our classification problem. The resulting algorithm, support vector machines, will be able to efficiently learn in very high dimensional spaces.
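To make the pieces above concrete, here is a small, hedged numpy sketch (not from the original notes; the array names `X`, `y`, `alpha`, `b` are illustrative assumptions, with `alpha` and `b` presumed already obtained from the dual and Equation (6.13)). It evaluates the dual objective $W(\alpha)$ of (6.12) and makes a prediction with Equation (6.15), summing only over the support vectors:

```python
import numpy as np

def dual_objective(alpha, X, y):
    """W(alpha) from (6.12): sum_i alpha_i - 1/2 sum_{i,j} y_i y_j alpha_i alpha_j <x_i, x_j>."""
    K = X @ X.T                          # Gram matrix of inner products <x^(i), x^(j)>
    return alpha.sum() - 0.5 * np.sum(np.outer(alpha * y, alpha * y) * K)

def predict(x, alpha, X, y, b, tol=1e-8):
    """Prediction via (6.15), using only the examples with alpha_i > 0 (the support vectors)."""
    sv = alpha > tol
    score = np.sum(alpha[sv] * y[sv] * (X[sv] @ x)) + b
    return 1 if score > 0 else -1
```

Note that both functions touch the training data only through inner products, which is exactly the property the kernel trick will exploit.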
6.7 Regularization and the non-separable case (optional reading)

The derivation of the SVM as presented so far assumed that the data is linearly separable. While mapping data to a high dimensional feature space via $\phi$ does generally increase the likelihood that the data is separable, we can't guarantee that it always will be so. Also, in some cases it is not clear that finding a separating hyperplane is exactly what we'd want to do, since that might be susceptible to outliers. For instance, the left figure below shows an optimal margin classifier, and when a single outlier is added in the upper-left region (right figure), it causes the decision boundary to make a dramatic swing, and the resulting classifier has a much smaller margin.

[Figure: two panels; left, an optimal margin classifier on separable data; right, the same data with one outlier in the upper-left region, which swings the decision boundary and shrinks the margin.]

To make the algorithm work for non-linearly separable datasets as well as be less sensitive to outliers, we reformulate our optimization (using $\ell_1$ regularization) as follows:
\[
\begin{aligned}
\min_{\gamma, w, b}\quad & \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n} \xi_i \\
\text{s.t.}\quad & y^{(i)}(w^Tx^{(i)} + b) \geq 1 - \xi_i, \quad i = 1, \ldots, n \\
& \xi_i \geq 0, \quad i = 1, \ldots, n.
\end{aligned}
\]
Thus, examples are now permitted to have (functional) margin less than 1, and if an example has functional margin $1 - \xi_i$ (with $\xi_i > 0$), we would pay a cost of the objective function being increased by $C\xi_i$. The parameter $C$ controls the relative weighting between the twin goals of making $\|w\|^2$ small (which we saw earlier makes the margin large) and of ensuring that most examples have functional margin at least 1.
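As a rough, hedged illustration of the role of $C$ (this uses scikit-learn, an external library not referenced by the notes, so treat the snippet as an assumption for illustration only), a linear soft-margin SVM can be fit with a few values of $C$ and the resulting geometric margins and support-vector counts compared:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (20, 2)), rng.normal(2, 1, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w = clf.coef_.ravel()
    # With the functional margin fixed to 1, the geometric margin is 1/||w||.
    print(f"C={C:7.2f}  margin={1.0 / np.linalg.norm(w):.3f}  #SV={len(clf.support_)}")
```

Smaller $C$ tolerates margin violations (larger margin, more support vectors); larger $C$ penalizes them heavily and approaches the hard-margin behavior.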
As before, we can form the Lagrangian:
\[ \mathcal{L}(w, b, \xi, \alpha, r) = \frac{1}{2}w^Tw + C\sum_{i=1}^{n} \xi_i - \sum_{i=1}^{n} \alpha_i \left[ y^{(i)}(w^Tx^{(i)} + b) - 1 + \xi_i \right] - \sum_{i=1}^{n} r_i\xi_i. \]
Here, the $\alpha_i$'s and $r_i$'s are our Lagrange multipliers (constrained to be $\geq 0$). We won't go through the derivation of the dual again in detail, but after setting the derivatives with respect to $w$ and $b$ to zero as before, substituting them back in, and simplifying, we obtain the following dual form of the problem:
\[
\begin{aligned}
\max_{\alpha}\quad & W(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i,j=1}^{n} y^{(i)}y^{(j)}\alpha_i\alpha_j \langle x^{(i)}, x^{(j)} \rangle \\
\text{s.t.}\quad & 0 \leq \alpha_i \leq C, \quad i = 1, \ldots, n \\
& \sum_{i=1}^{n} \alpha_i y^{(i)} = 0.
\end{aligned}
\]
As before, we also have that $w$ can be expressed in terms of the $\alpha_i$'s as given in Equation (6.10), so that after solving the dual problem, we can continue to use Equation (6.15) to make our predictions. Note that, somewhat surprisingly, in adding $\ell_1$ regularization, the only change to the dual problem is that what was originally a constraint that $0 \leq \alpha_i$ has now become $0 \leq \alpha_i \leq C$. The calculation for $b^*$ also has to be modified (Equation 6.13 is no longer valid); see the comments in the next section/Platt's paper.

Also, the KKT dual-complementarity conditions (which in the next section will be useful for testing for the convergence of the SMO algorithm) are:
\begin{align}
\alpha_i = 0 &\;\Rightarrow\; y^{(i)}(w^Tx^{(i)} + b) \geq 1 \tag{6.16} \\
\alpha_i = C &\;\Rightarrow\; y^{(i)}(w^Tx^{(i)} + b) \leq 1 \tag{6.17} \\
0 < \alpha_i < C &\;\Rightarrow\; y^{(i)}(w^Tx^{(i)} + b) = 1. \tag{6.18}
\end{align}
Now, all that remains is to give an algorithm for actually solving the dual problem, which we will do in the next section.

6.8 The SMO algorithm (optional reading)

The SMO (sequential minimal optimization) algorithm, due to John Platt, gives an efficient way of solving the dual problem arising from the derivation
of the SVM. Partly to motivate the SMO algorithm, and partly because it's interesting in its own right, let's first take another digression to talk about the coordinate ascent algorithm.

6.8.1 Coordinate ascent

Consider trying to solve the unconstrained optimization problem
\[ \max_{\alpha}\; W(\alpha_1, \alpha_2, \ldots, \alpha_n). \]
Here, we think of $W$ as just some function of the parameters $\alpha_i$'s, and for now ignore any relationship between this problem and SVMs. We've already seen two optimization algorithms, gradient ascent and Newton's method. The new algorithm we're going to consider here is called coordinate ascent:

Loop until convergence: {
    For $i = 1, \ldots, n$, {
        $\alpha_i := \arg\max_{\hat\alpha_i} W(\alpha_1, \ldots, \alpha_{i-1}, \hat\alpha_i, \alpha_{i+1}, \ldots, \alpha_n)$.
    }
}

Thus, in the innermost loop of this algorithm, we will hold all the variables except for some $\alpha_i$ fixed, and reoptimize $W$ with respect to just the parameter $\alpha_i$. In the version of this method presented here, the inner loop reoptimizes the variables in order $\alpha_1, \alpha_2, \ldots, \alpha_n, \alpha_1, \alpha_2, \ldots$. (A more sophisticated version might choose other orderings; for instance, we may choose the next variable to update according to which one we expect to allow us to make the largest increase in $W(\alpha)$.)

When the function $W$ happens to be of such a form that the "arg max" in the inner loop can be performed efficiently, then coordinate ascent can be a fairly efficient algorithm. Here's a picture of coordinate ascent in action:
[Figure: contours of a quadratic function over the range roughly $[-2, 2.5]$ in each coordinate, with the zig-zag path taken by coordinate ascent from the initialization $(2, -2)$ to the global maximum.]

The ellipses in the figure are the contours of a quadratic function that we want to optimize. Coordinate ascent was initialized at $(2, -2)$, and also plotted in the figure is the path that it took on its way to the global maximum. Notice that on each step, coordinate ascent takes a step that's parallel to one of the axes, since only one variable is being optimized at a time.

6.8.2 SMO

We close off the discussion of SVMs by sketching the derivation of the SMO algorithm. Here's the (dual) optimization problem that we want to solve:
\begin{align}
\max_{\alpha}\quad & W(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i,j=1}^{n} y^{(i)}y^{(j)}\alpha_i\alpha_j \langle x^{(i)}, x^{(j)} \rangle \tag{6.19} \\
\text{s.t.}\quad & 0 \leq \alpha_i \leq C, \quad i = 1, \ldots, n \tag{6.20} \\
& \sum_{i=1}^{n} \alpha_i y^{(i)} = 0. \tag{6.21}
\end{align}
Let's say we have a set of $\alpha_i$'s that satisfy the constraints (6.20–6.21). Now, suppose we want to hold $\alpha_2, \ldots, \alpha_n$ fixed, and take a coordinate ascent step and reoptimize the objective with respect to $\alpha_1$. Can we make any progress? The answer is no, because the constraint (6.21) ensures that
\[ \alpha_1 y^{(1)} = -\sum_{i=2}^{n} \alpha_i y^{(i)}. \]
Or, by multiplying both sides by $y^{(1)}$, we equivalently have
\[ \alpha_1 = -y^{(1)}\sum_{i=2}^{n} \alpha_i y^{(i)}. \]
(This step used the fact that $y^{(1)} \in \{-1, 1\}$, and hence $(y^{(1)})^2 = 1$.) Hence, $\alpha_1$ is exactly determined by the other $\alpha_i$'s, and if we were to hold $\alpha_2, \ldots, \alpha_n$ fixed, then we can't make any change to $\alpha_1$ without violating the constraint (6.21) in the optimization problem.

Thus, if we want to update some subset of the $\alpha_i$'s, we must update at least two of them simultaneously in order to keep satisfying the constraints. This motivates the SMO algorithm, which simply does the following:

Repeat till convergence {
    1. Select some pair $\alpha_i$ and $\alpha_j$ to update next (using a heuristic that tries to pick the two that will allow us to make the biggest progress towards the global maximum).
    2. Reoptimize $W(\alpha)$ with respect to $\alpha_i$ and $\alpha_j$, while holding all the other $\alpha_k$'s ($k \neq i, j$) fixed.
}

To test for convergence of this algorithm, we can check whether the KKT conditions (Equations 6.16–6.18) are satisfied to within some tol. Here, tol is the convergence tolerance parameter, and is typically set to around 0.01 to 0.001. (See the paper and pseudocode for details.)

The key reason that SMO is an efficient algorithm is that the update to $\alpha_i, \alpha_j$ can be computed very efficiently. Let's now briefly sketch the main ideas for deriving the efficient update.

Let's say we currently have some setting of the $\alpha_i$'s that satisfy the constraints (6.20–6.21), and suppose we've decided to hold $\alpha_3, \ldots, \alpha_n$ fixed, and want to reoptimize $W(\alpha_1, \alpha_2, \ldots, \alpha_n)$ with respect to $\alpha_1$ and $\alpha_2$ (subject to the constraints). From (6.21), we require that
\[ \alpha_1 y^{(1)} + \alpha_2 y^{(2)} = -\sum_{i=3}^{n} \alpha_i y^{(i)}. \]
Since the right hand side is fixed (as we've fixed $\alpha_3, \ldots, \alpha_n$), we can just let it be denoted by some constant $\zeta$:
\[ \alpha_1 y^{(1)} + \alpha_2 y^{(2)} = \zeta. \tag{6.22} \]
We can thus picture the constraints on $\alpha_1$ and $\alpha_2$ as follows:
[Figure: the box $[0, C] \times [0, C]$ in the $(\alpha_1, \alpha_2)$ plane, with the line $\alpha_1 y^{(1)} + \alpha_2 y^{(2)} = \zeta$ crossing it; the permissible values of $\alpha_2$ lie between a lower bound $L$ and an upper bound $H$.]

From the constraints (6.20), we know that $\alpha_1$ and $\alpha_2$ must lie within the box $[0, C] \times [0, C]$ shown. Also plotted is the line $\alpha_1 y^{(1)} + \alpha_2 y^{(2)} = \zeta$, on which we know $\alpha_1$ and $\alpha_2$ must lie. Note also that, from these constraints, we know $L \leq \alpha_2 \leq H$; otherwise, $(\alpha_1, \alpha_2)$ can't simultaneously satisfy both the box and the straight line constraint. In this example, $L = 0$. But depending on what the line $\alpha_1 y^{(1)} + \alpha_2 y^{(2)} = \zeta$ looks like, this won't always necessarily be the case; more generally, there will be some lower-bound $L$ and some upper-bound $H$ on the permissible values for $\alpha_2$ that will ensure that $\alpha_1, \alpha_2$ lie within the box $[0, C] \times [0, C]$.

Using Equation (6.22), we can also write $\alpha_1$ as a function of $\alpha_2$:
\[ \alpha_1 = (\zeta - \alpha_2 y^{(2)})\, y^{(1)}. \]
(Check this derivation yourself; we again used the fact that $y^{(1)} \in \{-1, 1\}$ so that $(y^{(1)})^2 = 1$.) Hence, the objective $W(\alpha)$ can be written
\[ W(\alpha_1, \alpha_2, \ldots, \alpha_n) = W((\zeta - \alpha_2 y^{(2)})y^{(1)}, \alpha_2, \ldots, \alpha_n). \]
Treating $\alpha_3, \ldots, \alpha_n$ as constants, you should be able to verify that this is just some quadratic function in $\alpha_2$. I.e., this can also be expressed in the form $a\alpha_2^2 + b\alpha_2 + c$ for some appropriate $a$, $b$, and $c$. If we ignore the "box" constraints (6.20) (or, equivalently, that $L \leq \alpha_2 \leq H$), then we can easily maximize this quadratic function by setting its derivative to zero and solving. We'll let $\alpha_2^{new,unclipped}$ denote the resulting value of $\alpha_2$. You should also be able to convince yourself that if we had instead wanted to maximize $W$ with respect to $\alpha_2$ but subject to the box constraint, then we can find the resulting optimal value simply by taking $\alpha_2^{new,unclipped}$ and "clipping" it to lie in the
$[L, H]$ interval, to get
\[ \alpha_2^{new} = \begin{cases} H & \text{if } \alpha_2^{new,unclipped} > H \\ \alpha_2^{new,unclipped} & \text{if } L \leq \alpha_2^{new,unclipped} \leq H \\ L & \text{if } \alpha_2^{new,unclipped} < L \end{cases} \]
Finally, having found $\alpha_2^{new}$, we can use Equation (6.22) to go back and find the optimal value of $\alpha_1^{new}$.

There are a couple more details that are quite easy but that we'll leave you to read about yourself in Platt's paper: one is the choice of the heuristics used to select the next $\alpha_i, \alpha_j$ to update; the other is how to update $b$ as the SMO algorithm is run.
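As a minimal, hedged sketch (illustrative only, not Platt's full algorithm), the clipping step above and the recovery of $\alpha_1$ from Equation (6.22) might look as follows in numpy; `alpha2_unclipped`, `L`, `H`, `zeta`, `y1`, `y2` are assumed to have been computed already by the caller:

```python
import numpy as np

def smo_clip_step(alpha2_unclipped, L, H, zeta, y1, y2):
    """Clip the unconstrained maximizer of the 1-D quadratic to [L, H],
    then recover alpha1 from the linear constraint alpha1*y1 + alpha2*y2 = zeta."""
    alpha2_new = np.clip(alpha2_unclipped, L, H)
    alpha1_new = (zeta - alpha2_new * y2) * y1   # uses y1 in {-1, +1}, so y1**2 == 1
    return alpha1_new, alpha2_new
```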
Part II

Deep learning
Chapter 7

Deep learning

We now begin our study of deep learning. In this set of notes, we give an overview of neural networks, discuss vectorization, and discuss training neural networks with backpropagation.

7.1 Supervised learning with non-linear models

In the supervised learning setting (predicting $y$ from the input $x$), suppose our model/hypothesis is $h_\theta(x)$. In the past lectures, we have considered the cases when $h_\theta(x) = \theta^\top x$ (in linear regression or logistic regression) or $h_\theta(x) = \theta^\top\phi(x)$ (where $\phi(x)$ is the feature map). A commonality of these two models is that they are linear in the parameters $\theta$. Next we will consider learning a general family of models that are non-linear in both the parameters $\theta$ and the inputs $x$. The most common non-linear models are neural networks, which we will define starting from the next section. For this section, it suffices to think of $h_\theta(x)$ as an abstract non-linear model.¹

Suppose $\{(x^{(i)}, y^{(i)})\}_{i=1}^{n}$ are the training examples. For simplicity, we start with the case where $y^{(i)} \in \mathbb{R}$ and $h_\theta(x) \in \mathbb{R}$.

Cost/loss function. We define the least squares cost function for the $i$-th example $(x^{(i)}, y^{(i)})$ as
\[ J^{(i)}(\theta) = \frac{1}{2}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 \tag{7.1} \]

¹If a concrete example is helpful, perhaps think about the model $h_\theta(x) = \theta_1^2x_1^2 + \theta_2^2x_2^2 + \cdots + \theta_d^2x_d^2$ in this subsection, even though it's not a neural network.
and define the mean-square cost function for the dataset as
\[ J(\theta) = \frac{1}{n}\sum_{i=1}^{n} J^{(i)}(\theta) \tag{7.2} \]
which is the same as in linear regression except that we introduce a constant $1/n$ in front of the cost function to be consistent with the convention. Note that multiplying the cost function by a scalar will not change the local minima or global minima of the cost function. Also note that the underlying parameterization for $h_\theta(x)$ is different from the case of linear regression, even though the form of the cost function is the same mean-squared loss. Throughout the notes, we use the words "loss" and "cost" interchangeably.

Optimizers (SGD). Commonly, people use gradient descent (GD), stochastic gradient descent (SGD), or their variants to optimize the loss function $J(\theta)$. GD's update rule can be written as²
\[ \theta := \theta - \alpha\nabla_\theta J(\theta) \tag{7.3} \]
where $\alpha > 0$ is often referred to as the learning rate or step size. Next, we introduce a version of SGD (Algorithm 1), which is slightly different from that in the first lecture notes.

Algorithm 1 Stochastic Gradient Descent
1: Hyperparameters: learning rate $\alpha$, number of total iterations $n_{\text{iter}}$.
2: Initialize $\theta$ randomly.
3: for $i = 1$ to $n_{\text{iter}}$ do
4:     Sample $j$ uniformly from $\{1, \ldots, n\}$, and update $\theta$ by
\[ \theta := \theta - \alpha\nabla_\theta J^{(j)}(\theta) \tag{7.4} \]

Oftentimes computing the gradient of $B$ examples simultaneously for the parameter $\theta$ can be faster than computing $B$ gradients separately due to hardware parallelization. Therefore, a mini-batch version of SGD is most

²Recall that, as defined in the previous lecture notes, we use the notation "$a := b$" to denote an operation (in a computer program) in which we set the value of a variable $a$ to be equal to the value of $b$. In other words, this operation overwrites $a$ with the value of $b$. In contrast, we will write "$a = b$" when we are asserting a statement of fact, that the value of $a$ is equal to the value of $b$.
commonly used in deep learning, as shown in Algorithm 2. There are also other variants of SGD or mini-batch SGD with slightly different sampling schemes.

Algorithm 2 Mini-batch Stochastic Gradient Descent
1: Hyperparameters: learning rate $\alpha$, batch size $B$, # iterations $n_{\text{iter}}$.
2: Initialize $\theta$ randomly
3: for $i = 1$ to $n_{\text{iter}}$ do
4:     Sample $B$ examples $j_1, \ldots, j_B$ (without replacement) uniformly from $\{1, \ldots, n\}$, and update $\theta$ by
\[ \theta := \theta - \frac{\alpha}{B}\sum_{k=1}^{B} \nabla_\theta J^{(j_k)}(\theta) \tag{7.5} \]

With these generic algorithms, a typical deep learning model is learned with the following steps.

1. Define a neural network parametrization $h_\theta(x)$, which we will introduce in Section 7.2, and
2. write the backpropagation algorithm to compute the gradient of the loss function $J^{(j)}(\theta)$ efficiently, which will be covered in Section 7.3, and
3. run SGD or mini-batch SGD (or other gradient-based optimizers) with the loss function $J(\theta)$.

7.2 Neural networks

Neural networks refer to a broad type of non-linear models/parametrizations $h_\theta(x)$ that involve combinations of matrix multiplications and other entry-wise non-linear operations. We will start small and slowly build up a neural network, step by step.

A Neural Network with a Single Neuron. Recall the housing price prediction problem from before: given the size of the house, we want to predict the price. We will use it as a running example in this subsection.

Previously, we fit a straight line to the graph of size vs. housing price. Now, instead of fitting a straight line, we wish to prevent negative housing prices by setting the absolute minimum price as zero. This produces a "kink" in the graph as shown in Figure 7.1. How do we represent such a function with a single kink as $h_\theta(x)$ with unknown parameters? (After doing so, we can invoke the machinery in Section 7.1.)
We define a parameterized function $h_\theta(x)$ with input $x$, parameterized by $\theta$, which outputs the price of the house $y$. Formally, $h_\theta : x \to y$. Perhaps one of the simplest parametrizations would be
\[ h_\theta(x) = \max(wx + b,\, 0), \quad \text{where } \theta = (w, b) \in \mathbb{R}^2 \tag{7.6} \]
Here $h_\theta(x)$ returns a single value: $(wx + b)$ or zero, whichever is greater. In the context of neural networks, the function $\max\{t, 0\}$ is called a ReLU (pronounced "ray-lu"), or rectified linear unit, and often denoted by $\mathrm{ReLU}(t) \triangleq \max\{t, 0\}$.

Generally, a one-dimensional non-linear function that maps $\mathbb{R}$ to $\mathbb{R}$ such as ReLU is often referred to as an activation function. The model $h_\theta(x)$ is said to have a single neuron partly because it has a single non-linear activation function. (We will discuss more about why a non-linear activation is called a neuron.)

When the input $x \in \mathbb{R}^d$ has multiple dimensions, a neural network with a single neuron can be written as
\[ h_\theta(x) = \mathrm{ReLU}(w^\top x + b), \quad \text{where } w \in \mathbb{R}^d,\; b \in \mathbb{R}, \text{ and } \theta = (w, b) \tag{7.7} \]
The term $b$ is often referred to as the "bias", and the vector $w$ is referred to as the weight vector. Such a neural network has 1 layer. (We will define what multiple layers mean in the sequel.)

Stacking Neurons. A more complex neural network may take the single neuron described above and "stack" them together such that one neuron passes its output as input into the next neuron, resulting in a more complex function.

Let us now deepen the housing prediction example. In addition to the size of the house, suppose that you know the number of bedrooms, the zip code and the wealth of the neighborhood. Building neural networks is analogous to building with Lego bricks: you take individual bricks and stack them together to build complex structures. The same applies to neural networks: we take individual neurons and stack them together to create complex neural networks.

Given these features (size, number of bedrooms, zip code, and wealth), we might then decide that the price of the house depends on the maximum family size it can accommodate. Suppose the family size is a function of the size of the house and number of bedrooms (see Figure 7.2). The zip code may provide additional information such as how walkable the neighborhood is (i.e., can you walk to the grocery store or do you need to drive everywhere). Combining the zip code with the wealth of the neighborhood may predict
the quality of the local elementary school. Given these three derived features (family size, walkable, school quality), we may conclude that the price of the home ultimately depends on these three features.

[Figure 7.1: Housing prices with a "kink" in the graph; the horizontal axis is square feet and the vertical axis is price (in $1000).]

[Figure 7.2: Diagram of a small neural network for predicting housing prices; the inputs Size, # Bedrooms, Zip Code, and Wealth feed the hidden units Family Size, Walkable, and School Quality, which feed the output Price $y$.]

Formally, the input to a neural network is a set of input features $x_1, x_2, x_3, x_4$. We denote the intermediate variables for "family size", "walkable", and "school quality" by $a_1, a_2, a_3$ (these $a_i$'s are often referred to as "hidden units" or "hidden neurons"). We represent each of the $a_i$'s as a neural network with a single neuron with a subset of $x_1, \ldots, x_4$ as inputs. Then as in Figure 7.1, we will have the parameterization:
\begin{align*}
a_1 &= \mathrm{ReLU}(\theta_1 x_1 + \theta_2 x_2 + \theta_3) \\
a_2 &= \mathrm{ReLU}(\theta_4 x_3 + \theta_5) \\
a_3 &= \mathrm{ReLU}(\theta_6 x_3 + \theta_7 x_4 + \theta_8)
\end{align*}
where $(\theta_1, \cdots, \theta_8)$ are parameters. Now we represent the final output $h_\theta(x)$ as another linear function with $a_1, a_2, a_3$ as inputs, and we get³
\[ h_\theta(x) = \theta_9 a_1 + \theta_{10} a_2 + \theta_{11} a_3 + \theta_{12} \tag{7.8} \]
where $\theta$ contains all the parameters $(\theta_1, \cdots, \theta_{12})$.

Now we represent the output as a quite complex function of $x$ with parameters $\theta$. Then you can use this parametrization $h_\theta$ with the machinery of Section 7.1 to learn the parameters $\theta$.

Inspiration from Biological Neural Networks. As the name suggests, artificial neural networks were inspired by biological neural networks. The hidden units $a_1, \ldots, a_m$ correspond to the neurons in a biological neural network, and the parameters $\theta_i$'s correspond to the synapses. However, it's unclear how similar the modern deep artificial neural networks are to the biological ones. For example, perhaps not many neuroscientists think biological neural networks could have 1000 layers, while some modern artificial neural networks do (we will elaborate more on the notion of layers). Moreover, it's an open question whether human brains update their neural networks in a way similar to the way that computer scientists learn artificial neural networks (using backpropagation, which we will introduce in the next section).

Two-layer Fully-Connected Neural Networks. We constructed the neural network in equation (7.8) using a significant amount of prior knowledge/belief about how the "family size", "walkable", and "school quality" are determined by the inputs. We implicitly assumed that we know the family size is an important quantity to look at and that it can be determined by only the "size" and "# bedrooms". Such prior knowledge might not be available for other applications. It would be more flexible and general to have a generic parameterization. A simple way would be to write the intermediate variable $a_1$ as a function of all $x_1, \ldots, x_4$:
\begin{align}
a_1 &= \mathrm{ReLU}(w_1^\top x + b_1), \quad \text{where } w_1 \in \mathbb{R}^4 \text{ and } b_1 \in \mathbb{R} \tag{7.9} \\
a_2 &= \mathrm{ReLU}(w_2^\top x + b_2), \quad \text{where } w_2 \in \mathbb{R}^4 \text{ and } b_2 \in \mathbb{R} \notag \\
a_3 &= \mathrm{ReLU}(w_3^\top x + b_3), \quad \text{where } w_3 \in \mathbb{R}^4 \text{ and } b_3 \in \mathbb{R} \notag
\end{align}
We still define $h_\theta(x)$ using equation (7.8) with $a_1, a_2, a_3$ being defined as above. Thus we have a so-called fully-connected neural network as

³Typically, for a multi-layer neural network, at the end, near the output, we don't apply ReLU, especially when the output is not necessarily a positive number.
visualized in the dependency graph in Figure 7.3, because all the intermediate variables $a_i$'s depend on all the inputs $x_i$'s.

[Figure 7.3: Diagram of a two-layer fully connected neural network. Each edge from node $x_i$ to node $a_j$ indicates that $a_j$ depends on $x_i$. The edge from $x_i$ to $a_j$ is associated with the weight $(w_j^{[1]})_i$, which denotes the $i$-th coordinate of the vector $w_j^{[1]}$. The activation $a_j$ can be computed by taking the ReLU of the weighted sum of the $x_i$'s, with the weights being those associated with the incoming edges, that is, $a_j = \mathrm{ReLU}\left(\sum_{i=1}^{d}(w_j^{[1]})_i x_i\right)$.]

For full generality, a two-layer fully-connected neural network with $m$ hidden units and $d$-dimensional input $x \in \mathbb{R}^d$ is defined as
\begin{align}
\forall j \in [1, \ldots, m], \quad z_j &= {w_j^{[1]}}^\top x + b_j^{[1]}, \quad \text{where } w_j^{[1]} \in \mathbb{R}^d,\; b_j^{[1]} \in \mathbb{R} \tag{7.10} \\
a_j &= \mathrm{ReLU}(z_j), \notag \\
a &= [a_1, \ldots, a_m]^\top \in \mathbb{R}^m \notag \\
h_\theta(x) &= {w^{[2]}}^\top a + b^{[2]}, \quad \text{where } w^{[2]} \in \mathbb{R}^m,\; b^{[2]} \in \mathbb{R} \tag{7.11}
\end{align}
Note that by default the vectors in $\mathbb{R}^d$ are viewed as column vectors, and in particular $a$ is a column vector with components $a_1, a_2, \ldots, a_m$. The indices $[1]$ and $[2]$ are used to distinguish two sets of parameters: the $w_j^{[1]}$'s (each of which is a vector in $\mathbb{R}^d$) and $w^{[2]}$ (which is a vector in $\mathbb{R}^m$). We will have more of these later.

Vectorization. Before we introduce neural networks with more layers and more complex structures, we will simplify the expressions for neural networks with more matrix and vector notation. Another important motivation of
vectorization is the speed perspective in the implementation. In order to implement a neural network efficiently, one must be careful when using for loops. The most natural way to implement equation (7.10) in code is perhaps to use a for loop. In practice, the dimensionalities of the inputs and hidden units are high. As a result, code will run very slowly if you use for loops. Leveraging the parallelism in GPUs is/was crucial for the progress of deep learning.

This gave rise to vectorization. Instead of using for loops, vectorization takes advantage of matrix algebra and highly optimized numerical linear algebra packages (e.g., BLAS) to make neural network computations run quickly. Before the deep learning era, a for loop may have been sufficient on smaller datasets, but modern deep networks and state-of-the-art datasets will be infeasible to run with for loops.

We vectorize the two-layer fully-connected neural network as below. We define a weight matrix $W^{[1]}$ in $\mathbb{R}^{m \times d}$ as the concatenation of all the vectors $w_j^{[1]}$'s in the following way:
\[ W^{[1]} = \begin{bmatrix} {w_1^{[1]}}^\top \\ {w_2^{[1]}}^\top \\ \vdots \\ {w_m^{[1]}}^\top \end{bmatrix} \in \mathbb{R}^{m \times d} \tag{7.12} \]
Now by the definition of matrix-vector multiplication, we can write $z = [z_1, \ldots, z_m]^\top \in \mathbb{R}^m$ as
\[ \underbrace{\begin{bmatrix} z_1 \\ \vdots \\ z_m \end{bmatrix}}_{z\, \in\, \mathbb{R}^{m \times 1}} = \underbrace{\begin{bmatrix} {w_1^{[1]}}^\top \\ {w_2^{[1]}}^\top \\ \vdots \\ {w_m^{[1]}}^\top \end{bmatrix}}_{W^{[1]}\, \in\, \mathbb{R}^{m \times d}} \underbrace{\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_d \end{bmatrix}}_{x\, \in\, \mathbb{R}^{d \times 1}} + \underbrace{\begin{bmatrix} b_1^{[1]} \\ b_2^{[1]} \\ \vdots \\ b_m^{[1]} \end{bmatrix}}_{b^{[1]}\, \in\, \mathbb{R}^{m \times 1}} \tag{7.13} \]
Or succinctly,
\[ z = W^{[1]}x + b^{[1]} \tag{7.14} \]
We remark again that a vector in $\mathbb{R}^d$ in these notes, following the conventions previously established, is automatically viewed as a column vector, and can also be viewed as a $d \times 1$ dimensional matrix. (Note that this is different from numpy where a vector is viewed as a row vector in broadcasting.)
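As a small, hedged numpy sketch (illustrative only; the array names and sizes are assumptions), here is the for-loop implementation of equation (7.10) next to the vectorized equation (7.14); the two compute the same $z$, but the vectorized line dispatches the work to optimized linear algebra routines:

```python
import numpy as np

m, d = 300, 100
W1 = np.random.randn(m, d)      # rows are the vectors w_j^[1]
b1 = np.random.randn(m)
x = np.random.randn(d)

# For-loop version of equation (7.10): one inner product per hidden unit.
z_loop = np.empty(m)
for j in range(m):
    z_loop[j] = W1[j] @ x + b1[j]

# Vectorized version, equation (7.14): z = W^[1] x + b^[1].
z_vec = W1 @ x + b1

assert np.allclose(z_loop, z_vec)
```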
Computing the activations $a \in \mathbb{R}^m$ from $z \in \mathbb{R}^m$ involves an element-wise non-linear application of the ReLU function, which can be computed in parallel efficiently. Overloading ReLU for element-wise application of ReLU (meaning, for a vector $t \in \mathbb{R}^d$, $\mathrm{ReLU}(t)$ is a vector such that $\mathrm{ReLU}(t)_i = \mathrm{ReLU}(t_i)$), we have
\[ a = \mathrm{ReLU}(z) \tag{7.15} \]
Define $W^{[2]} = [{w^{[2]}}^\top] \in \mathbb{R}^{1 \times m}$ similarly. Then, the model in equation (7.11) can be summarized as
\begin{align}
a &= \mathrm{ReLU}(W^{[1]}x + b^{[1]}) \notag \\
h_\theta(x) &= W^{[2]}a + b^{[2]} \tag{7.16}
\end{align}
Here $\theta$ consists of $W^{[1]}, W^{[2]}$ (often referred to as the weight matrices) and $b^{[1]}, b^{[2]}$ (referred to as the biases). The collection of $W^{[1]}, b^{[1]}$ is referred to as the first layer, and $W^{[2]}, b^{[2]}$ the second layer. The activation $a$ is referred to as the hidden layer. A two-layer neural network is also called a one-hidden-layer neural network.

Multi-layer fully-connected neural networks. With this succinct notation, we can stack more layers to get a deeper fully-connected neural network. Let $r$ be the number of layers (weight matrices). Let $W^{[1]}, \ldots, W^{[r]}, b^{[1]}, \ldots, b^{[r]}$ be the weight matrices and biases of all the layers. Then a multi-layer neural network can be written as
\begin{align}
a^{[1]} &= \mathrm{ReLU}(W^{[1]}x + b^{[1]}) \notag \\
a^{[2]} &= \mathrm{ReLU}(W^{[2]}a^{[1]} + b^{[2]}) \notag \\
&\;\;\vdots \notag \\
a^{[r-1]} &= \mathrm{ReLU}(W^{[r-1]}a^{[r-2]} + b^{[r-1]}) \notag \\
h_\theta(x) &= W^{[r]}a^{[r-1]} + b^{[r]} \tag{7.17}
\end{align}
We note that the weight matrices and biases need to have compatible dimensions for the equations above to make sense. If $a^{[k]}$ has dimension $m_k$, then the weight matrix $W^{[k]}$ should be of dimension $m_k \times m_{k-1}$, and the bias $b^{[k]} \in \mathbb{R}^{m_k}$. Moreover, $W^{[1]} \in \mathbb{R}^{m_1 \times d}$ and $W^{[r]} \in \mathbb{R}^{1 \times m_{r-1}}$.

The total number of neurons in the network is $m_1 + \cdots + m_r$, and the total number of parameters in this network is $(d+1)m_1 + (m_1+1)m_2 + \cdots + (m_{r-1}+1)m_r$.
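Here is a minimal, hedged numpy sketch (an illustration, not the notes' own implementation; the layer widths and variable names are assumptions) of the forward pass in equation (7.17), together with the parameter count just given:

```python
import numpy as np

def relu(t):
    return np.maximum(t, 0)

def forward(x, Ws, bs):
    """Forward pass of equation (7.17): ReLU layers followed by a final linear layer."""
    a = x
    for W, b in zip(Ws[:-1], bs[:-1]):
        a = relu(W @ a + b)
    return Ws[-1] @ a + bs[-1]          # last layer is linear (no ReLU)

d, widths = 4, [5, 3, 1]                # m_1 = 5, m_2 = 3, scalar output (m_3 = 1)
dims = [d] + widths
Ws = [np.random.randn(dims[k + 1], dims[k]) for k in range(len(widths))]
bs = [np.random.randn(dims[k + 1]) for k in range(len(widths))]

print(forward(np.random.randn(d), Ws, bs))
# Total parameters match (d+1)m_1 + (m_1+1)m_2 + (m_2+1)m_3 = 25 + 18 + 4 = 47:
print(sum(W.size + b.size for W, b in zip(Ws, bs)))
```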
Sometimes for notational consistency we also write $a^{[0]} = x$, and $a^{[r]} = h_\theta(x)$. Then we have the simple recursion
\[ a^{[k]} = \mathrm{ReLU}(W^{[k]}a^{[k-1]} + b^{[k]}), \quad k = 1, \ldots, r-1 \tag{7.18} \]
Note that this would have been true for $k = r$ as well if there were an additional ReLU in equation (7.17), but often people like to make the last layer linear (aka without a ReLU) so that negative outputs are possible and it's easier to interpret the last layer as a linear model. (More on the interpretability at the "connection to kernel method" paragraph of this section.)

Other activation functions. The activation function ReLU can be replaced by many other non-linear functions $\sigma(\cdot)$ that map $\mathbb{R}$ to $\mathbb{R}$, such as
\begin{align}
\sigma(z) &= \frac{1}{1 + e^{-z}} \qquad \text{(sigmoid)} \tag{7.19} \\
\sigma(z) &= \frac{e^z - e^{-z}}{e^z + e^{-z}} \qquad \text{(tanh)} \tag{7.20}
\end{align}
Why do we not use the identity function for $\sigma(z)$? That is, why not use $\sigma(z) = z$? Assume for the sake of argument that $b^{[1]}$ and $b^{[2]}$ are zeros. Suppose $\sigma(z) = z$; then for a two-layer neural network, we have that
\begin{align}
h_\theta(x) &= W^{[2]}a^{[1]} \tag{7.21} \\
&= W^{[2]}\sigma(z^{[1]}) & \text{by definition} \tag{7.22} \\
&= W^{[2]}z^{[1]} & \text{since } \sigma(z) = z \tag{7.23} \\
&= W^{[2]}W^{[1]}x & \text{from Equation (7.13)} \tag{7.24} \\
&= \tilde{W}x & \text{where } \tilde{W} = W^{[2]}W^{[1]} \tag{7.25}
\end{align}
Notice how $W^{[2]}W^{[1]}$ collapsed into $\tilde{W}$. This is because applying a linear function to another linear function will result in a linear function over the original input (i.e., you can construct a $\tilde{W}$ such that $\tilde{W}x = W^{[2]}W^{[1]}x$). This loses much of the representational power of the neural network as oftentimes the output we are trying to predict has a non-linear relationship with the inputs. Without non-linear activation functions, the neural network will simply perform linear regression.

Connection to the Kernel Method. In the previous lectures, we covered the concept of feature maps. Recall that the main motivation for feature
maps is to represent functions that are non-linear in the input $x$ by $\theta^\top\phi(x)$, where $\theta$ are the parameters and $\phi(x)$, the feature map, is a handcrafted function non-linear in the raw input $x$. The performance of the learning algorithms can depend significantly on the choice of the feature map $\phi(x)$. Oftentimes people use domain knowledge to design a feature map $\phi(x)$ that suits the particular application. The process of choosing the feature maps is often referred to as feature engineering.

We can view deep learning as a way to automatically learn the right feature map (sometimes also referred to as "the representation") as follows. Suppose we denote by $\beta$ the collection of the parameters in a fully-connected neural network (equation (7.17)) except those in the last layer. Then we can abstract $a^{[r-1]}$ as a function of the input $x$ and the parameters in $\beta$: $a^{[r-1]} = \phi_\beta(x)$. Now we can write the model as
\[ h_\theta(x) = W^{[r]}\phi_\beta(x) + b^{[r]} \tag{7.26} \]
When $\beta$ is fixed, $\phi_\beta(\cdot)$ can be viewed as a feature map, and therefore $h_\theta(x)$ is just a linear model over the features $\phi_\beta(x)$. However, when we train the neural network, both the parameters in $\beta$ and the parameters $W^{[r]}, b^{[r]}$ are optimized, and therefore we are not only learning a linear model in the feature space, but also learning a good feature map $\phi_\beta(\cdot)$ itself, so that it's possible to predict accurately with a linear model on top of the feature map. Therefore, deep learning tends to depend less on the domain knowledge of the particular application and often requires less feature engineering. The penultimate layer $a^{[r-1]}$ is often (informally) referred to as the learned features or representations in the context of deep learning.

In the example of house price prediction, a fully-connected neural network does not need us to specify intermediate quantities such as "family size", and may automatically discover some useful features in the penultimate layer (the activation $a^{[r-1]}$), and use them to linearly predict the housing price. Often the feature map / representation obtained from one dataset (that is, the function $\phi_\beta(\cdot)$) can also be useful for other datasets, which indicates that it contains essential information about the data.

However, oftentimes, the neural network will discover complex features which are very useful for predicting the output but may be difficult for a human to understand or interpret. This is why some people refer to neural networks as a black box, as it can be difficult to understand the features it has discovered.
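A small, hedged sketch of this "learned feature map" view (purely illustrative; the helper names and the randomly generated data are assumptions, and in practice $\beta$ would come from training rather than random initialization): freeze $\beta$, treat the penultimate activation as $\phi_\beta(x)$, and fit only the last, linear layer on top of it.

```python
import numpy as np

def relu(t):
    return np.maximum(t, 0)

def phi_beta(x, Ws, bs):
    """Penultimate activation a^[r-1] = phi_beta(x): all layers except the last linear one."""
    a = x
    for W, b in zip(Ws[:-1], bs[:-1]):
        a = relu(W @ a + b)
    return a

rng = np.random.default_rng(0)
d, widths = 4, [5, 3, 1]
dims = [d] + widths
Ws = [rng.standard_normal((dims[k + 1], dims[k])) for k in range(len(widths))]
bs = [rng.standard_normal(dims[k + 1]) for k in range(len(widths))]

# Treat phi_beta as a fixed feature map and fit the last (linear) layer by least squares.
X = rng.standard_normal((50, d))
y = rng.standard_normal(50)                       # placeholder targets for the illustration
F = np.stack([phi_beta(x, Ws, bs) for x in X])    # learned features, shape (50, m_{r-1})
w_last, *_ = np.linalg.lstsq(np.c_[F, np.ones(len(F))], y, rcond=None)
```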
7.3 Backpropagation

In this section, we introduce backpropagation or auto-differentiation, which computes the gradient of the loss $\nabla J^{(j)}(\theta)$ efficiently. We will start with an informal theorem that states that as long as a real-valued function $f$ can be efficiently computed/evaluated by a differentiable network or circuit, then its gradient can be efficiently computed in a similar time. We will then show how to do this concretely for fully-connected neural networks.

Because the formality of the general theorem is not the main focus here, we will introduce the terms with informal definitions. By a differentiable circuit or a differentiable network, we mean a composition of a sequence of differentiable arithmetic operations (additions, subtractions, multiplications, divisions, etc.) and elementary differentiable functions (ReLU, exp, log, sin, cos, etc.). Let the size of the circuit be the total number of such operations and elementary functions. We assume that each of the operations and functions, and their derivatives or partial derivatives, can be computed in $O(1)$ time in the computer.

Theorem 7.3.1: [backpropagation or auto-differentiation, informally stated] Suppose a differentiable circuit of size $N$ computes a real-valued function $f : \mathbb{R}^{\ell} \to \mathbb{R}$. Then, the gradient $\nabla f$ can be computed in time $O(N)$, by a circuit of size $O(N)$.⁴

We note that the loss function $J^{(j)}(\theta)$ for the $j$-th example can indeed be computed by a sequence of operations and functions involving additions, subtractions, multiplications, and non-linear activations. Thus the theorem suggests that we should be able to compute $\nabla J^{(j)}(\theta)$ in a similar time to that for computing $J^{(j)}(\theta)$ itself. This applies not only to the fully-connected neural network introduced in Section 7.2, but also to many other types of neural networks.

In the rest of the section, we will showcase how to compute the gradient of the loss efficiently for fully-connected neural networks using backpropagation. Even though auto-differentiation or backpropagation is implemented in all the deep learning packages such as tensorflow and pytorch, understanding it is very helpful for gaining insights into the working of deep learning.

⁴We note that if the output of the function $f$ does not depend on some of the input coordinates, then we set by default the gradient w.r.t. that coordinate to zero. Setting it to zero does not count towards the total runtime here in our accounting scheme. This is why the gradient can be computed in $O(N)$ time even when $N$ is smaller than the input dimension $\ell$.
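As a quick, hedged illustration of the auto-differentiation the theorem describes (this uses PyTorch, which the notes mention by name, though the snippet itself is my own and assumes a tiny one-neuron model):

```python
import torch

# A one-neuron model J = 0.5 * (y - ReLU(w^T x + b))^2, differentiated automatically.
x = torch.tensor([1.0, -2.0, 3.0])
y = torch.tensor(0.5)
w = torch.tensor([0.1, 0.2, -0.3], requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)

J = 0.5 * (y - torch.relu(w @ x + b)) ** 2
J.backward()                 # reverse-mode autodiff: cost comparable to the forward pass
print(w.grad, b.grad)        # gradients dJ/dw and dJ/db
```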
7.3.1 Preliminary: chain rule

We first recall the chain rule in calculus. Suppose the variable $J$ depends on the variables $\theta_1, \ldots, \theta_p$ via the intermediate variables $g_1, \ldots, g_k$:
\begin{align}
g_j &= g_j(\theta_1, \ldots, \theta_p), \quad j \in \{1, \cdots, k\} \tag{7.27} \\
J &= J(g_1, \ldots, g_k) \tag{7.28}
\end{align}
Here we overload the meaning of the $g_j$'s: they denote both the intermediate variables and the functions used to compute the intermediate variables. Then, by the chain rule, we have that for all $i$,
\[ \frac{\partial J}{\partial \theta_i} = \sum_{j=1}^{k} \frac{\partial J}{\partial g_j}\frac{\partial g_j}{\partial \theta_i} \tag{7.29} \]
For the ease of invoking the chain rule in the following subsections in various ways, we will call $J$ the output variable, $g_1, \ldots, g_k$ the intermediate variables, and $\theta_1, \ldots, \theta_p$ the input variables in the chain rule.

7.3.2 One-neuron neural networks

Simplifying notations: In the rest of the section, we will consider a generic input $x$ and compute the gradient of $h_\theta(x)$ w.r.t. $\theta$. For simplicity, we use $o$ as a shorthand for $h_\theta(x)$ ($o$ stands for output). For simplicity, with slight abuse of notation, we use $J = \frac{1}{2}(y - o)^2$ to denote the loss function. (Note that this overrides the definition of $J$ as the total loss in Section 7.1.) Our goal is to compute the derivative of $J$ w.r.t. the parameter $\theta$.

We first consider the neural network with one neuron defined in equation (7.7). Recall that we compute the loss function via the following sequential steps:
\begin{align}
z &= w^\top x + b \tag{7.30} \\
o &= \mathrm{ReLU}(z) \tag{7.31} \\
J &= \frac{1}{2}(y - o)^2 \tag{7.32}
\end{align}
By the chain rule with $J$ as the output variable, $o$ as the intermediate variable, and $w_i$ the input variable, we have that
\[ \frac{\partial J}{\partial w_i} = \frac{\partial J}{\partial o}\cdot\frac{\partial o}{\partial w_i} \tag{7.33} \]
Invoking the chain rule with $o$ as the output variable, $z$ as the intermediate variable, and $w_i$ the input variable, we have that
\[ \frac{\partial o}{\partial w_i} = \frac{\partial o}{\partial z}\cdot\frac{\partial z}{\partial w_i}. \]
Combining the equation above with equation (7.33), we have
\begin{align*}
\frac{\partial J}{\partial w_i} &= \frac{\partial J}{\partial o}\cdot\frac{\partial o}{\partial z}\cdot\frac{\partial z}{\partial w_i} \\
&= (o - y)\cdot\mathrm{ReLU}'(z)\cdot x_i
\end{align*}
(because $\frac{\partial J}{\partial o} = (o - y)$, $\frac{\partial o}{\partial z} = \mathrm{ReLU}'(z)$, and $\frac{\partial z}{\partial w_i} = x_i$).

Here, the key is that we reduce the computation of $\frac{\partial J}{\partial w_i}$ to the computation of three simpler, more "local" objects $\frac{\partial J}{\partial o}$, $\frac{\partial o}{\partial z}$, and $\frac{\partial z}{\partial w_i}$, which are much simpler to compute because $J$ directly depends on $o$ via equation (7.32), $o$ directly depends on $z$ via equation (7.31), and $z$ directly depends on $w_i$ via equation (7.30). Note that in a vectorized form, we can also write
\[ \nabla_w J = (o - y)\cdot\mathrm{ReLU}'(z)\cdot x \]
Similarly, we compute the gradient w.r.t. $b$ by
\[ \frac{\partial J}{\partial b} = \frac{\partial J}{\partial o}\cdot\frac{\partial o}{\partial z}\cdot\frac{\partial z}{\partial b} = (o - y)\cdot\mathrm{ReLU}'(z) \]
(because $\frac{\partial J}{\partial o} = (o - y)$, $\frac{\partial o}{\partial z} = \mathrm{ReLU}'(z)$, and $\frac{\partial z}{\partial b} = 1$).

7.3.3 Two-layer neural networks: a low-level unpacked computation

Note: this subsection derives the derivatives with low-level notations to help you build up intuition on backpropagation. If you are looking for a clean formula, or you are familiar with matrix derivatives, then feel free to jump to the next subsection directly.

Now we consider the two-layer neural network defined in equation (7.11). We compute the loss $J$ via the following sequence of operations:
\begin{align}
\forall j \in [1, \ldots, m], \quad z_j &= {w_j^{[1]}}^\top x + b_j^{[1]}, \quad \text{where } w_j^{[1]} \in \mathbb{R}^d,\; b_j^{[1]} \in \mathbb{R} \notag \\
a_j &= \mathrm{ReLU}(z_j), \notag \\
a &= [a_1, \ldots, a_m]^\top \in \mathbb{R}^m \notag \\
o &= {w^{[2]}}^\top a + b^{[2]}, \quad \text{where } w^{[2]} \in \mathbb{R}^m,\; b^{[2]} \in \mathbb{R} \notag \\
J &= \frac{1}{2}(y - o)^2 \tag{7.34}
\end{align}
We will use $(w^{[2]})_\ell$ to denote the $\ell$-th coordinate of $w^{[2]}$, and $(w_j^{[1]})_\ell$ to denote the $\ell$-th coordinate of $w_j^{[1]}$. (We will avoid these cumbersome notations once we figure out how to write everything in matrix and vector form.)

By invoking the chain rule with $J$ as the output variable, $o$ as the intermediate variable, and $(w^{[2]})_\ell$ as the input variable, we have

$$\frac{\partial J}{\partial (w^{[2]})_\ell} = \frac{\partial J}{\partial o}\frac{\partial o}{\partial (w^{[2]})_\ell} = (o-y)\,\frac{\partial o}{\partial (w^{[2]})_\ell} = (o-y)\, a_\ell$$

It's more challenging to compute $\frac{\partial J}{\partial (w_j^{[1]})_\ell}$. Towards computing it, we first invoke the chain rule with $J$ as the output variable, $z_j$ as the intermediate variable, and $(w_j^{[1]})_\ell$ as the input variable:

$$\frac{\partial J}{\partial (w_j^{[1]})_\ell} = \frac{\partial J}{\partial z_j}\cdot\frac{\partial z_j}{\partial (w_j^{[1]})_\ell} = \frac{\partial J}{\partial z_j}\cdot x_\ell \qquad \left(\text{because } \frac{\partial z_j}{\partial (w_j^{[1]})_\ell} = x_\ell\right)$$

Thus, it suffices to compute $\frac{\partial J}{\partial z_j}$. We invoke the chain rule with $J$ as the output variable, $a_j$ as the intermediate variable, and $z_j$ as the input variable:

$$\frac{\partial J}{\partial z_j} = \frac{\partial J}{\partial a_j}\frac{\partial a_j}{\partial z_j} = \frac{\partial J}{\partial a_j}\,\mathrm{ReLU}'(z_j)$$

Now it suffices to compute $\frac{\partial J}{\partial a_j}$, and we invoke the chain rule with $J$ as the output variable, $o$ as the intermediate variable, and $a_j$ as the input variable:

$$\frac{\partial J}{\partial a_j} = \frac{\partial J}{\partial o}\frac{\partial o}{\partial a_j} = (o-y)\cdot (w^{[2]})_j$$

Combining the equations above, we obtain

$$\frac{\partial J}{\partial (w_j^{[1]})_\ell} = (o-y)\cdot (w^{[2]})_j\,\mathrm{ReLU}'(z_j)\, x_\ell$$
Next we gauge the runtime of computing these partial derivatives. Let $p$ denote the total number of parameters in the network. We note that $p \geq md$, where $m$ is the number of hidden units and $d$ is the input dimension. For every $j$ and $\ell$, to compute $\frac{\partial J}{\partial (w_j^{[1]})_\ell}$ we apparently need to compute at least the output $o$, which takes at least $p \geq md$ operations. Therefore, at first glance, computing a single gradient takes at least $md$ time, and the total time to compute the derivatives w.r.t. all the parameters is at least $(md)^2$, which is inefficient.

However, the key to backpropagation is that, for different choices of $\ell$, the formulas above for computing $\frac{\partial J}{\partial (w_j^{[1]})_\ell}$ share many terms, such as $(o-y)$, $(w^{[2]})_j$, and $\mathrm{ReLU}'(z_j)$. This suggests that we can re-organize the computation to leverage the shared terms.

It turns out the crucial shared quantities in these formulas are $\frac{\partial J}{\partial o}, \frac{\partial J}{\partial z_1}, \dots, \frac{\partial J}{\partial z_m}$. We now write down the formulas that compute the gradients efficiently in Algorithm 3.

Algorithm 3 Backpropagation for two-layer neural networks
1: Compute the values of $z_1, \dots, z_m$, $a_1, \dots, a_m$, and $o$ as in the definition of the neural network (equation (7.34)).
2: Compute $\frac{\partial J}{\partial o} = (o-y)$.
3: Compute $\frac{\partial J}{\partial z_j}$ for $j = 1, \dots, m$ by
$$\frac{\partial J}{\partial z_j} = \frac{\partial J}{\partial o}\frac{\partial o}{\partial a_j}\frac{\partial a_j}{\partial z_j} = \frac{\partial J}{\partial o}\cdot (w^{[2]})_j\cdot \mathrm{ReLU}'(z_j) \qquad (7.35)$$
4: Compute $\frac{\partial J}{\partial (w_j^{[1]})_\ell}$, $\frac{\partial J}{\partial b_j^{[1]}}$, $\frac{\partial J}{\partial (w^{[2]})_j}$, and $\frac{\partial J}{\partial b^{[2]}}$ by
$$\frac{\partial J}{\partial (w_j^{[1]})_\ell} = \frac{\partial J}{\partial z_j}\cdot \frac{\partial z_j}{\partial (w_j^{[1]})_\ell} = \frac{\partial J}{\partial z_j}\cdot x_\ell$$
$$\frac{\partial J}{\partial b_j^{[1]}} = \frac{\partial J}{\partial z_j}\cdot \frac{\partial z_j}{\partial b_j^{[1]}} = \frac{\partial J}{\partial z_j}$$
$$\frac{\partial J}{\partial (w^{[2]})_j} = \frac{\partial J}{\partial o}\frac{\partial o}{\partial (w^{[2]})_j} = \frac{\partial J}{\partial o}\cdot a_j$$
$$\frac{\partial J}{\partial b^{[2]}} = \frac{\partial J}{\partial o}\frac{\partial o}{\partial b^{[2]}} = \frac{\partial J}{\partial o}$$
7.3.4 Two-layer neural network with vector notation

As we have done before in the definition of neural networks, the equations for backpropagation become much cleaner with proper matrix notation. Here we state the algorithm first and then provide a cleaner derivation via matrix calculus. Let

$$\delta^{[2]} \triangleq \frac{\partial J}{\partial o} \in \mathbb{R}, \qquad \delta^{[1]} \triangleq \frac{\partial J}{\partial z} \in \mathbb{R}^m \qquad (7.36)$$

Here we note that when $A$ is a real-valued variable[5] and $B$ is a vector or matrix variable, $\frac{\partial A}{\partial B}$ denotes the collection of the partial derivatives with the same shape as $B$.[6] In other words, if $B$ is a matrix of dimension $m\times d$, then $\frac{\partial A}{\partial B}$ is a matrix in $\mathbb{R}^{m\times d}$ with $\left(\frac{\partial A}{\partial B}\right)_{ij}$ as the $ij$-th entry. Let $v\odot w$ denote the entry-wise product of two vectors $v$ and $w$ of the same dimension. Now we are ready to describe backpropagation in Algorithm 4.

Algorithm 4 Backpropagation for two-layer neural networks in vectorized notation
1: Compute the values of $z\in\mathbb{R}^m$, $a\in\mathbb{R}^m$, and $o$.
2: Compute $\delta^{[2]} = (o-y)\in\mathbb{R}$.
3: Compute $\delta^{[1]} = (o-y)\cdot W^{[2]\top}\odot \mathrm{ReLU}'(z) \in\mathbb{R}^{m\times 1}$.
4: Compute
$$\frac{\partial J}{\partial W^{[2]}} = \delta^{[2]} a^\top \in\mathbb{R}^{1\times m}, \qquad \frac{\partial J}{\partial b^{[2]}} = \delta^{[2]} \in\mathbb{R}$$
$$\frac{\partial J}{\partial W^{[1]}} = \delta^{[1]} x^\top \in\mathbb{R}^{m\times d}, \qquad \frac{\partial J}{\partial b^{[1]}} = \delta^{[1]} \in\mathbb{R}^m$$

[5] We will avoid using the notation $\frac{\partial A}{\partial B}$ for $A$ that is not a real-valued variable.
[6] If you are familiar with the notion of total derivatives, we note that the dimensionality here is different from that for total derivatives.
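The following is a direct NumPy transcription of Algorithm 4 (our sketch, not part of the notes). It assumes the shapes used in this subsection: W1 of shape (m, d), b1 of shape (m,), W2 of shape (1, m), and a scalar b2; the function and variable names are ours.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def relu_prime(z):
    return (z > 0).astype(float)

def backprop_two_layer(W1, b1, W2, b2, x, y):
    # Forward pass (line 1 of Algorithm 4).
    z = W1 @ x + b1                      # z in R^m
    a = relu(z)                          # a in R^m
    o = float(W2 @ a + b2)               # scalar output
    # Backward pass (lines 2-4 of Algorithm 4).
    delta2 = o - y                                   # dJ/do
    delta1 = delta2 * (W2.ravel() * relu_prime(z))   # dJ/dz, entry-wise product
    grads = {
        "W2": delta2 * a.reshape(1, -1),   # dJ/dW2 = delta2 * a^T, shape (1, m)
        "b2": delta2,                      # dJ/db2
        "W1": np.outer(delta1, x),         # dJ/dW1 = delta1 * x^T, shape (m, d)
        "b1": delta1,                      # dJ/db1, shape (m,)
    }
    return o, grads
```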
Derivation using the chain rule for matrix multiplication. To obtain a succinct derivation of the backpropagation algorithm in Algorithm 4 without working with complex indices, we state extensions of the chain rule in vectorized notation. Stating the most general result would require more knowledge of matrix calculus, so we introduce only the special cases that are most relevant for deep learning.

Suppose $J$ is a real-valued output variable, $z\in\mathbb{R}^m$ is the intermediate variable, and $W\in\mathbb{R}^{m\times d}$, $u\in\mathbb{R}^d$ are the input variables. Suppose they satisfy

$$z = Wu + b, \quad \text{where } W\in\mathbb{R}^{m\times d}$$
$$J = J(z) \qquad (7.37)$$

Then we can compute $\frac{\partial J}{\partial u}$ and $\frac{\partial J}{\partial W}$ by

$$\frac{\partial J}{\partial u} = W^\top\frac{\partial J}{\partial z} \qquad (7.38)$$
$$\frac{\partial J}{\partial W} = \frac{\partial J}{\partial z}\cdot u^\top \qquad (7.39)$$
$$\frac{\partial J}{\partial b} = \frac{\partial J}{\partial z} \qquad (7.40)$$

We can verify that the dimensionality is indeed compatible, because $\frac{\partial J}{\partial z}\in\mathbb{R}^m$, $W^\top\in\mathbb{R}^{d\times m}$, $\frac{\partial J}{\partial u}\in\mathbb{R}^d$, $\frac{\partial J}{\partial W}\in\mathbb{R}^{m\times d}$, and $u^\top\in\mathbb{R}^{1\times d}$.

The chain rule in equation (7.38) only works for the special case where $z = Wu + b$. Another useful case is the following:

$$a = \sigma(z), \quad \text{where } \sigma \text{ is an element-wise activation and } z, a\in\mathbb{R}^d$$
$$J = J(a)$$

Then we have that

$$\frac{\partial J}{\partial z} = \frac{\partial J}{\partial a}\odot\sigma'(z) \qquad (7.41)$$

where $\sigma'(\cdot)$ is the element-wise derivative of the activation function $\sigma$, and $\odot$ is the element-wise product of two vectors of the same dimensionality.

Using equations (7.38), (7.39), and (7.41), we can verify the correctness of Algorithm 4. Indeed, using the notation of the two-layer neural network,

$$\frac{\partial J}{\partial z} = \frac{\partial J}{\partial a}\odot\mathrm{ReLU}'(z)$$
(by invoking equation (7.41) with $J \to J$, $a \to a$, $z \to z$, $\sigma \to \mathrm{ReLU}$)
$$= (o-y)\, W^{[2]\top}\odot\mathrm{ReLU}'(z)$$
(by invoking equation (7.38) with $J \to J$, $z \to o$, $W \to W^{[2]}$, $u \to a$, $b \to b^{[2]}$)
Therefore, $\delta^{[1]} = \frac{\partial J}{\partial z}$, and we have verified the correctness of Line 3 in Algorithm 4. Similarly, let's verify the third equation in Line 4:

$$\frac{\partial J}{\partial W^{[1]}} = \frac{\partial J}{\partial z}\cdot x^\top = \delta^{[1]} x^\top$$

where the first equality invokes equation (7.39) with $J \to J$, $z \to z$, $W \to W^{[1]}$, $u \to x$, $b \to b^{[1]}$, and the second uses the fact that we have proved $\delta^{[1]} = \frac{\partial J}{\partial z}$.

7.3.5 Multi-layer neural networks

In this section, we derive the backpropagation algorithm for the model defined in (7.17). Recall that we have

$$a^{[1]} = \mathrm{ReLU}(W^{[1]}x + b^{[1]})$$
$$a^{[2]} = \mathrm{ReLU}(W^{[2]}a^{[1]} + b^{[2]})$$
$$\cdots$$
$$a^{[r-1]} = \mathrm{ReLU}(W^{[r-1]}a^{[r-2]} + b^{[r-1]})$$
$$a^{[r]} = z^{[r]} = W^{[r]}a^{[r-1]} + b^{[r]}$$
$$J = \tfrac{1}{2}(a^{[r]} - y)^2$$

Here we define both $a^{[r]}$ and $z^{[r]}$ as $h_\theta(x)$ for notational simplicity. Define

$$\delta^{[k]} = \frac{\partial J}{\partial z^{[k]}} \qquad (7.42)$$

The backpropagation algorithm computes the $\delta^{[k]}$'s from $k = r$ down to $k = 1$, and computes $\frac{\partial J}{\partial W^{[k]}}$ from $\delta^{[k]}$, as described in Algorithm 5.

Algorithm 5 Back-propagation for multi-layer neural networks
1: Compute and store the values of the $a^{[k]}$'s and $z^{[k]}$'s for $k = 1, \dots, r$, and $J$. (This is often called the "forward pass.")
2: Compute $\delta^{[r]} = \frac{\partial J}{\partial z^{[r]}} = (z^{[r]} - y)$.
3: for $k = r-1$ down to $1$ do
4:   Compute $\delta^{[k]} = \frac{\partial J}{\partial z^{[k]}} = \left({W^{[k+1]}}^\top\delta^{[k+1]}\right)\odot\mathrm{ReLU}'(z^{[k]})$
5:   Compute
$$\frac{\partial J}{\partial W^{[k+1]}} = \delta^{[k+1]} {a^{[k]}}^\top, \qquad \frac{\partial J}{\partial b^{[k+1]}} = \delta^{[k+1]}$$
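Below is a loop-based sketch of Algorithm 5 (ours, not from the notes). It assumes the model of equation (7.17) with a scalar output, and a parameter layout Ws[k], bs[k] for layers k = 1, ..., r stored 0-indexed; all names are ours.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def relu_prime(z):
    return (z > 0).astype(float)

def backprop_multilayer(Ws, bs, x, y):
    """Ws, bs: lists of r weight matrices / bias vectors for layers 1..r."""
    r = len(Ws)
    # Forward pass: store a^{[k]} and z^{[k]} (As[0] plays the role of x).
    a, zs, As = x, [], [x]
    for k in range(r):
        z = Ws[k] @ a + bs[k]
        zs.append(z)
        a = z if k == r - 1 else relu(z)   # the last layer is linear
        As.append(a)
    J = 0.5 * float((a - y) ** 2)
    # Backward pass: delta^{[k]} = dJ/dz^{[k]}, computed from k = r down to 1.
    grads = [None] * r
    delta = zs[-1] - y                                 # delta^{[r]} = z^{[r]} - y
    grads[r - 1] = (np.outer(delta, As[r - 1]), delta)
    for k in range(r - 2, -1, -1):
        delta = (Ws[k + 1].T @ delta) * relu_prime(zs[k])
        grads[k] = (np.outer(delta, As[k]), delta)     # (dJ/dW, dJ/db) for layer k+1
    return J, grads
```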
7.4 Vectorization over training examples

As we discussed in Section 7.1, in the implementation of neural networks we leverage parallelism across multiple examples. This means that we need to write the forward pass (the evaluation of the outputs) and the backward pass (backpropagation) of the neural network for multiple training examples in matrix notation.

The basic idea. The basic idea is simple. Suppose you have a training set with three examples $x^{(1)}, x^{(2)}, x^{(3)}$. The first-layer activations for each example are as follows:

$$z^{[1](1)} = W^{[1]}x^{(1)} + b^{[1]}$$
$$z^{[1](2)} = W^{[1]}x^{(2)} + b^{[1]}$$
$$z^{[1](3)} = W^{[1]}x^{(3)} + b^{[1]}$$

Note the difference between square brackets $[\cdot]$, which refer to the layer number, and parentheses $(\cdot)$, which refer to the training example number. Intuitively, one would implement this using a for loop. It turns out that we can vectorize these operations as well. First, define

$$X = \begin{bmatrix} | & | & | \\ x^{(1)} & x^{(2)} & x^{(3)} \\ | & | & | \end{bmatrix} \in \mathbb{R}^{d\times 3} \qquad (7.43)$$

Note that we are stacking the training examples in columns and not rows. We can then combine the three computations into a single unified formulation:

$$Z^{[1]} = \begin{bmatrix} | & | & | \\ z^{[1](1)} & z^{[1](2)} & z^{[1](3)} \\ | & | & | \end{bmatrix} = W^{[1]}X + b^{[1]} \qquad (7.44)$$

You may notice that we are attempting to add $b^{[1]}\in\mathbb{R}^{4\times 1}$ to $W^{[1]}X\in\mathbb{R}^{4\times 3}$. Strictly following the rules of linear algebra, this is not allowed.
In practice, however, this addition is performed using broadcasting. We create an intermediate $\tilde b^{[1]}\in\mathbb{R}^{4\times 3}$:

$$\tilde b^{[1]} = \begin{bmatrix} | & | & | \\ b^{[1]} & b^{[1]} & b^{[1]} \\ | & | & | \end{bmatrix} \qquad (7.45)$$

We can then perform the computation $Z^{[1]} = W^{[1]}X + \tilde b^{[1]}$. Oftentimes it is not necessary to explicitly construct $\tilde b^{[1]}$: by inspecting the dimensions in (7.44), you can assume $b^{[1]}\in\mathbb{R}^{4\times 1}$ is correctly broadcast to $W^{[1]}X\in\mathbb{R}^{4\times 3}$.

The matricization approach above generalizes easily to multiple layers, with one subtlety, as discussed below.

Complications/Subtlety in the Implementation. All the deep learning packages or implementations put the data points in the rows of a data matrix. (If the data point itself is a matrix or tensor, then the data are concatenated along the zeroth dimension.) However, most deep learning papers use a notation similar to these notes, where the data points are treated as column vectors.[7] There is a simple conversion to deal with the mismatch: in the implementation, all the columns become row vectors, row vectors become column vectors, all the matrices are transposed, and the orders of the matrix multiplications are flipped. In the example above, using the row-major convention, the data matrix is $X\in\mathbb{R}^{3\times d}$, the first-layer weight matrix has dimensionality $d\times m$ (instead of $m\times d$ as in the two-layer neural network section), and the bias vector is $b^{[1]}\in\mathbb{R}^{1\times m}$. The computation for the hidden activations becomes

$$Z^{[1]} = XW^{[1]} + b^{[1]} \in\mathbb{R}^{3\times m} \qquad (7.46)$$

[7] The instructor suspects that this is mostly because in mathematics we naturally multiply a matrix to a vector on the left hand side.
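The following short NumPy snippet (ours, not from the notes) contrasts the column convention of equation (7.44) with the row-major convention of equation (7.46); the sizes m = 4 hidden units, 3 examples, and d = 5 input dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 5, 4, 3
W1 = rng.normal(size=(m, d))
b1 = rng.normal(size=(m, 1))           # column bias, as in these notes
X_cols = rng.normal(size=(d, n))       # examples stacked in columns, eq. (7.43)

# Column convention, eq. (7.44): b1 (m x 1) is broadcast across the n columns.
Z_cols = W1 @ X_cols + b1              # shape (m, n) = (4, 3)

# Row-major convention used by most implementations, eq. (7.46).
X_rows = X_cols.T                      # (n, d): examples in rows
W1_rows = W1.T                         # (d, m)
b1_rows = b1.T                         # (1, m), broadcast across the n rows
Z_rows = X_rows @ W1_rows + b1_rows    # shape (n, m) = (3, 4)

print(np.allclose(Z_cols, Z_rows.T))   # True: the two conventions agree up to transpose
```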
Part III
Generalization and regularization
Chapter 8
Generalization

This chapter discusses tools to analyze and understand the generalization of machine learning models, i.e., their performance on unseen test examples. Recall that for supervised learning problems, given a training dataset $\{(x^{(i)}, y^{(i)})\}_{i=1}^n$, we typically learn a model $h_\theta$ by minimizing a loss/cost function $J(\theta)$, which encourages $h_\theta$ to fit the data. E.g., when the loss function is the least squares loss (aka mean squared error), we have $J(\theta) = \frac{1}{n}\sum_{i=1}^n (y^{(i)} - h_\theta(x^{(i)}))^2$. This loss function used for training purposes is oftentimes referred to as the training loss/error/cost.

However, minimizing the training loss is not our ultimate goal; it is merely our approach towards the goal of learning a predictive model. The most important evaluation metric of a model is its loss on unseen test examples, which is oftentimes referred to as the test error. Formally, we sample a test example $(x, y)$ from the so-called test distribution $\mathcal{D}$, and measure the model's error on it, e.g., by the mean squared error $(h_\theta(x) - y)^2$. The expected loss/error over the randomness of the test example is called the test loss/error,[1]

$$L(\theta) = \mathbb{E}_{(x,y)\sim\mathcal{D}}[(y - h_\theta(x))^2] \qquad (8.1)$$

Note that measuring the error involves computing an expectation, and in practice it can be approximated by the average error on many sampled test examples, which are referred to as the test dataset.

[1] In the theoretical and statistical literature, we oftentimes call the uniform distribution over the training set $\{(x^{(i)}, y^{(i)})\}_{i=1}^n$, denoted by $\hat{\mathcal{D}}$, the empirical distribution, and call $\mathcal{D}$ the population distribution. Partly because of this, the training loss is also referred to as the empirical loss/risk/error, and the test loss is also referred to as the population loss/risk/error.
The key difference between training and test datasets is that the test examples are unseen, in the sense that the training procedure has not used them. In classical statistical learning settings, the training examples are also drawn from the same distribution as the test distribution $\mathcal{D}$, but the test examples are still unseen by the learning procedure, whereas the training examples are seen.[2]

Because of this key difference between training and test datasets, even if both are drawn from the same distribution $\mathcal{D}$, the test error is not necessarily always close to the training error.[3] As a result, successfully minimizing the training error may not always lead to a small test error. We typically say the model overfits the data if the model predicts accurately on the training dataset but doesn't generalize well to other test examples, that is, if the training error is small but the test error is large. We say the model underfits the data if the training error is relatively large[4] (and in this case, typically the test error is also relatively large).

This chapter studies how the test error is influenced by the learning procedure, especially the choice of model parameterization. We will decompose the test error into "bias" and "variance" terms and study how each of them is affected by the choice of model parameterization, and their tradeoffs. Using the bias-variance tradeoff, we will discuss when overfitting and underfitting occur and how they can be avoided. We will also discuss the double descent phenomenon in Section 8.2 and some classical theoretical results in Section 8.3.

[2] These days, researchers have become increasingly interested in the setting with "domain shift", that is, where the training distribution and the test distribution are different.
[3] The difference between the test error and the training error is often referred to as the generalization gap. The term generalization error in some literature means the test error, and in some other literature means the generalization gap.
[4] E.g., larger than the intrinsic noise level of the data in regression problems.
8.1 Bias-variance tradeoff

Figure 8.1: A running example of training and test datasets for this section.

As an illustrating example, we consider the following training dataset and test dataset, which are also shown in Figure 8.1. The training inputs $x^{(i)}$'s are randomly chosen, and the outputs $y^{(i)}$ are generated by $y^{(i)} = h^\star(x^{(i)}) + \xi^{(i)}$, where the function $h^\star(\cdot)$ is a quadratic function, shown in Figure 8.1 as the solid line, and $\xi^{(i)}$ is observation noise assumed to be generated from $\mathcal{N}(0,\sigma^2)$. A test example $(x, y)$ also has the same input-output relationship $y = h^\star(x) + \xi$, where $\xi\sim\mathcal{N}(0,\sigma^2)$. It's impossible to predict the noise $\xi$, so essentially our goal is to recover the function $h^\star(\cdot)$.

We will consider the test error of learning various types of models. When talking about linear regression, we discussed the problem of whether to fit a "simple" model such as the linear "$y = \theta_0 + \theta_1 x$," or a more "complex" model such as the polynomial "$y = \theta_0 + \theta_1 x + \cdots + \theta_5 x^5$."

We start by fitting a linear model, as shown in Figure 8.2. The best fitted linear model cannot predict $y$ from $x$ accurately even on the training dataset, let alone on the test dataset. This is because the true relationship between $y$ and $x$ is not linear: any linear model is far away from the true function $h^\star(\cdot)$. As a result, the training error is large, and this is a typical situation of underfitting.
Figure 8.2: The best fit linear model has large training and test errors.

The issue cannot be mitigated with more training examples: even with a very large amount of, or even infinite, training examples, the best fitted linear model is still inaccurate and fails to capture the structure of the data (Figure 8.3). Even if the noise is not present in the training data, the issue still occurs (Figure 8.4). Therefore, the fundamental bottleneck here is the linear model family's inability to capture the structure in the data (linear models cannot represent the true quadratic function $h^\star$), not the lack of data. Informally, we define the bias of a model to be the test error even if we were to fit it to a very (say, infinitely) large training dataset. Thus, in this case, the linear model suffers from large bias and underfits (i.e., fails to capture structure exhibited by) the data.

Figure 8.3: The best fit linear model on a much larger dataset still has a large training error.

Figure 8.4: The best fit linear model on a noiseless dataset also has a large training/test error.

Next, we fit a 5th-degree polynomial to the data. Figure 8.5 shows that it fails to learn a good model either. However, the failure pattern is different from the linear model case.
Specifically, even though the learnt 5th-degree polynomial does a very good job of predicting the $y^{(i)}$'s from the $x^{(i)}$'s for the training examples, it does not work well on test examples (Figure 8.5). In other words, the model learnt from the training set does not generalize well to other test examples: the test error is high. Contrary to the behavior of linear models, the bias of the 5th-degree polynomials is small: if we were to fit a 5th-degree polynomial to an extremely large dataset, the resulting model would be close to a quadratic function and be accurate (Figure 8.6). This is because the family of 5th-degree polynomials contains all the quadratic functions (setting $\theta_5 = \theta_4 = \theta_3 = 0$ results in a quadratic function), and therefore 5th-degree polynomials are in principle capable of capturing the structure of the data.

Figure 8.5: The best fit 5th-degree polynomial has zero training error, but still has a large test error and does not recover the ground truth. This is a classic situation of overfitting.

Figure 8.6: The best fit 5th-degree polynomial on a huge dataset nearly recovers the ground truth, suggesting that the culprit in Figure 8.5 is the variance (or lack of data) but not the bias.
The failure of fitting 5th-degree polynomials can be captured by another component of the test error, called the variance of a model fitting procedure. Specifically, when fitting a 5th-degree polynomial as in Figure 8.7, there is a large risk that we're fitting patterns in the data that happen to be present in our small, finite training set, but that do not reflect the wider pattern of the relationship between $x$ and $y$. These "spurious" patterns in the training set are (mostly) due to the observation noise $\xi^{(i)}$, and fitting these spurious patterns results in a model with large test error. In this case, we say the model has a large variance.

Figure 8.7: The best fit 5th-degree models on three different datasets generated from the same distribution behave quite differently, suggesting the existence of a large variance.

The variance can be intuitively (and mathematically, as shown in Section 8.1.1) characterized by the amount of variation across models learnt on multiple different training datasets (drawn from the same underlying distribution). The "spurious patterns" are specific to the randomness of the noise (and inputs) in a particular dataset, and thus are different across multiple training datasets. Therefore, overfitting to the "spurious patterns" of multiple datasets should result in very different models. Indeed, as shown in Figure 8.7, the models learned on the three different training datasets are quite different, each overfitting to the "spurious patterns" of its own dataset.

Often, there is a tradeoff between bias and variance. If our model is too "simple" and has very few parameters, then it may have large bias (but small variance), and it typically suffers from underfitting. If it is too "complex" and has very many parameters, then it may suffer from large variance (but have smaller bias), and thus overfit. See Figure 8.8 for a typical tradeoff between bias and variance.
Figure 8.8: An illustration of the typical bias-variance tradeoff. (The plot shows error against model complexity: the bias² curve decreases, the variance curve increases, and the test error, which equals bias² + variance, is minimized at the optimal tradeoff in between.)

As we will see formally in Section 8.1.1, the test error can be decomposed as a sum of bias and variance. This means that the test error will have a convex curve as the model complexity increases, and in practice we should tune the model complexity to achieve the best tradeoff. For instance, in the example above, fitting a quadratic function does better than either of the extremes of a first-degree or a 5th-degree polynomial, as shown in Figure 8.9.

Figure 8.9: The best fit quadratic model has small training and test error because the quadratic model achieves a better tradeoff.

Interestingly, the bias-variance tradeoff curves or the test error curves do not universally follow the shape in Figure 8.8, at least not when the model complexity is simply measured by the number of parameters. (We will discuss the so-called double descent phenomenon in Section 8.2.) Nevertheless, the principle of the bias-variance tradeoff is perhaps still the first resort when analyzing and predicting the behavior of test errors.
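The following short simulation (ours, not from the notes) reproduces the flavor of the running example: data drawn from an assumed quadratic h* plus Gaussian noise, fit with degree-1, 2, and 5 polynomials. The particular h*, sample size, and noise level are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
h_star = lambda x: 1.5 * x**2 - x + 0.5      # assumed quadratic ground truth
sigma, n_train, n_test = 0.4, 8, 1000

x_tr = rng.uniform(-1, 1, n_train)
y_tr = h_star(x_tr) + sigma * rng.normal(size=n_train)
x_te = rng.uniform(-1, 1, n_test)
y_te = h_star(x_te) + sigma * rng.normal(size=n_test)

for deg in (1, 2, 5):
    coef = np.polyfit(x_tr, y_tr, deg)                      # least-squares polynomial fit
    tr_mse = np.mean((np.polyval(coef, x_tr) - y_tr) ** 2)  # training error
    te_mse = np.mean((np.polyval(coef, x_te) - y_te) ** 2)  # test error
    print(f"degree {deg}: train MSE {tr_mse:.3f}, test MSE {te_mse:.3f}")

# Typically: degree 1 underfits (both errors large), degree 5 overfits
# (small training error, larger test error), and degree 2 balances the two.
```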
8.1.1 A mathematical decomposition (for regression)

To formally state the bias-variance tradeoff for regression problems, we consider the following setup (which is an extension of the setup in the beginning paragraph of Section 8.1). Draw a training dataset $S = \{x^{(i)}, y^{(i)}\}_{i=1}^n$ such that $y^{(i)} = h^\star(x^{(i)}) + \xi^{(i)}$ where $\xi^{(i)}\sim\mathcal{N}(0,\sigma^2)$. Train a model on the dataset $S$, denoted by $\hat h_S$. Take a test example $(x, y)$ such that $y = h^\star(x) + \xi$ where $\xi\sim\mathcal{N}(0,\sigma^2)$, and measure the expected test error, averaged over the random draw of the training set $S$ and the randomness of $\xi$:[5][6]

$$\mathrm{MSE}(x) = \mathbb{E}_{S,\xi}[(y - \hat h_S(x))^2] \qquad (8.2)$$

We will decompose the MSE into a bias and a variance term. We start by stating the following simple mathematical tool, which will be used twice below.

Claim 8.1.1: Suppose $A$ and $B$ are two independent real random variables and $\mathbb{E}[A] = 0$. Then, $\mathbb{E}[(A+B)^2] = \mathbb{E}[A^2] + \mathbb{E}[B^2]$.

As a corollary, because a random variable $A$ is independent of a constant $c$, when $\mathbb{E}[A] = 0$ we have $\mathbb{E}[(A+c)^2] = \mathbb{E}[A^2] + c^2$.

The proof of the claim follows from expanding the square: $\mathbb{E}[(A+B)^2] = \mathbb{E}[A^2] + \mathbb{E}[B^2] + 2\mathbb{E}[AB] = \mathbb{E}[A^2] + \mathbb{E}[B^2]$. Here we used independence to show that $\mathbb{E}[AB] = \mathbb{E}[A]\mathbb{E}[B] = 0$.

Using Claim 8.1.1 with $A = \xi$ and $B = h^\star(x) - \hat h_S(x)$, we have

$$\mathrm{MSE}(x) = \mathbb{E}[(y - \hat h_S(x))^2] = \mathbb{E}[(\xi + (h^\star(x) - \hat h_S(x)))^2] \qquad (8.3)$$
$$= \mathbb{E}[\xi^2] + \mathbb{E}[(h^\star(x) - \hat h_S(x))^2] \qquad \text{(by Claim 8.1.1)}$$
$$= \sigma^2 + \mathbb{E}[(h^\star(x) - \hat h_S(x))^2] \qquad (8.4)$$

Then, let's define $h_{\mathrm{avg}}(x) = \mathbb{E}_S[\hat h_S(x)]$ as the "average model": the model obtained by drawing an infinite number of datasets, training on them, and averaging their predictions on $x$.

[5] For simplicity, the test input $x$ is considered to be fixed here, but the same conceptual message holds when we average over the choice of the $x$'s.
[6] The subscript under the expectation symbol is to emphasize the variables that are considered random by the expectation operation.
Note that $h_{\mathrm{avg}}$ is a hypothetical model for analytical purposes that cannot be obtained in reality (because we don't have an infinite number of datasets). It turns out that in many cases $h_{\mathrm{avg}}$ is (approximately) equal to the model obtained by training on a single dataset with infinitely many samples. Thus, we can also intuitively interpret $h_{\mathrm{avg}}$ this way, which is consistent with our intuitive definition of bias in the previous subsection.

We can further decompose $\mathrm{MSE}(x)$ by letting $c = h^\star(x) - h_{\mathrm{avg}}(x)$ (which is a constant that does not depend on the choice of $S$!) and $A = h_{\mathrm{avg}}(x) - \hat h_S(x)$ in the corollary part of Claim 8.1.1:

$$\mathrm{MSE}(x) = \sigma^2 + \mathbb{E}[(h^\star(x) - \hat h_S(x))^2] \qquad (8.5)$$
$$= \sigma^2 + (h^\star(x) - h_{\mathrm{avg}}(x))^2 + \mathbb{E}[(h_{\mathrm{avg}}(x) - \hat h_S(x))^2] \qquad (8.6)$$
$$= \underbrace{\sigma^2}_{\text{unavoidable}} + \underbrace{(h^\star(x) - h_{\mathrm{avg}}(x))^2}_{\triangleq\,\mathrm{bias}^2} + \underbrace{\mathrm{var}(\hat h_S(x))}_{\triangleq\,\mathrm{variance}} \qquad (8.7)$$

We call the second term the bias (squared) and the third term the variance. As discussed before, the bias captures the part of the error that is introduced due to the lack of expressivity of the model. Recall that $h_{\mathrm{avg}}$ can be thought of as the best possible model learned even with infinite data. Thus, the bias is not due to the lack of data, but is rather caused by the fact that the family of models fundamentally cannot approximate $h^\star$. For example, in the illustrating example of Figure 8.2, because any linear model cannot approximate the true quadratic function $h^\star$, neither can $h_{\mathrm{avg}}$, and thus the bias term has to be large.

The variance term captures how the random nature of the finite dataset introduces errors in the learned model. It measures the sensitivity of the learned model to the randomness in the dataset. It often decreases as the size of the dataset increases.

There is nothing we can do about the first term $\sigma^2$, as we cannot predict the noise $\xi$ by definition.

Finally, we note that the bias-variance decomposition for classification is much less clear than for regression problems. There have been several proposals, but there is as yet no agreement on what is the "right" and/or the most useful formalism.
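Here is a Monte Carlo sketch (ours, not from the notes) of equation (8.7): we repeatedly draw training sets, fit a degree-5 polynomial, and estimate the bias² and variance terms at a fixed test input; the ground truth, sample sizes, and number of repetitions are arbitrary choices consistent with the earlier toy example.

```python
import numpy as np

rng = np.random.default_rng(0)
h_star = lambda x: 1.5 * x**2 - x + 0.5
sigma, n, n_datasets, deg = 0.4, 8, 2000, 5
x0 = 0.3                                       # fixed test input

preds = np.empty(n_datasets)
for s in range(n_datasets):
    x_tr = rng.uniform(-1, 1, n)
    y_tr = h_star(x_tr) + sigma * rng.normal(size=n)
    coef = np.polyfit(x_tr, y_tr, deg)
    preds[s] = np.polyval(coef, x0)            # hat{h}_S(x0) for this dataset S

h_avg = preds.mean()                           # estimate of h_avg(x0)
bias2 = (h_star(x0) - h_avg) ** 2
variance = preds.var()
mse = sigma**2 + np.mean((h_star(x0) - preds) ** 2)   # eq. (8.4), estimated

print(f"bias^2 = {bias2:.4f}, variance = {variance:.4f}")
print(f"sigma^2 + bias^2 + variance = {sigma**2 + bias2 + variance:.4f}  vs  MSE = {mse:.4f}")
```

The two printed totals agree, since the empirical average of $(h^\star(x_0) - \hat h_S(x_0))^2$ splits exactly into the squared deviation of the sample mean plus the sample variance.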
8.2 The double descent phenomenon

Model-wise double descent. Recent works have demonstrated that the test error can exhibit a "double descent" phenomenon in a range of machine learning models, including linear models and deep neural networks.[7] The conventional wisdom, as discussed in Section 8.1, is that as we increase the model complexity, the test error first decreases and then increases, as illustrated in Figure 8.8. However, in many cases we empirically observe that the test error can have a second descent: it first decreases, then increases to a peak around when the model size is just large enough to fit all the training data very well, and then decreases again in the so-called overparameterized regime, where the number of parameters is larger than the number of data points. See Figure 8.10 for an illustration of the typical curves of test error against model complexity (measured by the number of parameters). To some extent, the overparameterized regime with the second descent is considered new to the machine learning community, partly because lightly-regularized, overparameterized models have only been used extensively in the deep learning era. A practical implication of the phenomenon is that one should not hold back from scaling into and experimenting with overparameterized models, because the test error may well decrease again to a level even smaller than the previous lowest point. Actually, in many cases, larger overparameterized models consistently lead to better test performance (meaning there won't be a second ascent after the second descent).

Figure 8.10: A typical model-wise double descent phenomenon. As the number of parameters increases, the test error first decreases and then peaks, typically around when the number of parameters is just sufficient to fit the training data (the classical regime governed by the bias-variance tradeoff). Then, in the overparameterized (modern) regime, the test error decreases again.

[7] The discovery of the phenomenon perhaps dates back to Opper [1995, 2001], and it has been recently popularized by Belkin et al. [2020], Hastie et al. [2019], etc.
Sample-wise double descent. A priori, we would expect that more training examples always lead to smaller test errors: more samples give strictly more information for the algorithm to learn from. However, recent work [Nakkiran, 2019] observes that the test error is not monotonically decreasing as we increase the sample size. Instead, as shown in Figure 8.11, the test error decreases, then increases and peaks around when the number of examples (denoted by $n$) is similar to the number of parameters (denoted by $d$), and then decreases again. We refer to this as the sample-wise double descent phenomenon. To some extent, sample-wise double descent and model-wise double descent are essentially describing similar phenomena: the test error peaks when $n\approx d$.

Explanation and mitigation strategy. The sample-wise double descent, or, in particular, the peak of the test error at $n\approx d$, suggests that the existing training algorithms evaluated in these experiments are far from optimal when $n\approx d$. We would be better off tossing away some examples and running the algorithms with a smaller sample size to steer clear of the peak. In other words, in principle there are other algorithms that can achieve smaller test error when $n\approx d$, but the algorithms evaluated in these experiments fail to do so. The sub-optimality of the learning procedure appears to be the culprit of the peak in both sample-wise and model-wise double descent.

Indeed, with an optimally-tuned regularization (which will be discussed more in Chapter 9), the test error in the $n\approx d$ regime can be dramatically improved, and the model-wise and sample-wise double descent are both mitigated. See Figure 8.11.

The intuition above only explains the peak in the model-wise and sample-wise double descent, but does not explain the second descent in the model-wise double descent, i.e., why overparameterized models are able to generalize so well. The theoretical understanding of overparameterized models is an active research area with many recent advances. A typical explanation is that commonly-used optimizers such as gradient descent provide an implicit regularization effect (which will be discussed in more detail in Section 9.2). In other words, even in the overparameterized regime and with an unregularized loss function, the model is still implicitly regularized, and thus exhibits better test performance than an arbitrary solution that fits the data. For example, for linear models, when $n \ll d$, the gradient descent optimizer with zero initialization finds the minimum-norm solution that fits the data (instead of an arbitrary solution that fits the data), and the minimum-norm regularizer turns out to be sufficiently good for the overparameterized regime (but it's not a good regularizer when $n\approx d$, resulting in the peak of the test error).
Figure 8.11: Left: the sample-wise double descent phenomenon for linear models. Right: the sample-wise double descent with different regularization strengths for linear models. Using the optimal regularization parameter $\lambda$ (optimally tuned for each $n$, shown as the green solid curve) mitigates double descent. Setup: the data distribution of $(x, y)$ is $x\sim\mathcal{N}(0, I_d)$ and $y\sim x^\top\beta + \mathcal{N}(0,\sigma^2)$, where $d = 500$, $\sigma = 0.5$, and $\|\beta\|_2 = 1$.[8]

Finally, we also remark that the double descent phenomenon has mostly been observed when the model complexity is measured by the number of parameters. It is unclear if and when the number of parameters is the best complexity measure of a model. For example, in many situations the norm of the model is used as a complexity measure. As shown in Figure 8.12 (right), for a particular linear case, if we plot the test error against the norm of the learnt model, the double descent phenomenon no longer occurs. This is partly because the norm of the learned model is also peaked around $n\approx d$ (see Figure 8.12 (middle), or Belkin et al. [2019], Mei and Montanari [2022], and the discussion in Section 10.8 of James et al. [2021]). For deep neural networks, the correct complexity measure is even more elusive. The study of the double descent phenomenon is an active research topic.

[8] The figure is reproduced from Figure 1 of Nakkiran et al. [2020]. Similar phenomena are also observed in Hastie et al. [2022], Mei and Montanari [2022].
Figure 8.12: Left: the double descent phenomenon, where the number of parameters is used as the model complexity. Middle: the norm of the learned model is peaked around $n\approx d$. Right: the test error against the norm of the learnt model; the color bar indicates the number of parameters, and the arrows indicate the direction of increasing model size. Their relationship is closer to the conventional wisdom than to double descent. Setup: we consider linear regression with a fixed dataset of size $n = 500$; the input $x$ is a random ReLU feature on Fashion-MNIST, and the output $y\in\mathbb{R}^{10}$ is the one-hot label. This is the same setting as in Section 5.2 of Nakkiran et al. [2020].
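To make the sample-wise picture tangible, here is a rough sketch (ours, not from the notes) of an unregularized linear-regression experiment in the spirit of the figures above, using a much smaller $d$ for speed. It relies on the fact that np.linalg.lstsq returns the minimum-norm least-squares solution when the system is underdetermined, which is what gradient descent from zero initialization converges to for linear regression; all other constants are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
d, sigma = 50, 0.5
beta = rng.normal(size=d)
beta /= np.linalg.norm(beta)                 # ||beta||_2 = 1

def sample(n):
    X = rng.normal(size=(n, d))
    y = X @ beta + sigma * rng.normal(size=n)
    return X, y

X_test, y_test = sample(2000)
for n in (10, 25, 40, 50, 60, 100, 250):
    errs = []
    for _ in range(20):                      # average over 20 training draws
        X, y = sample(n)
        # Minimum-norm least-squares fit; when n < d it interpolates the data.
        theta = np.linalg.lstsq(X, y, rcond=None)[0]
        errs.append(np.mean((X_test @ theta - y_test) ** 2))
    print(f"n = {n:4d}: test MSE {np.mean(errs):.3f}")

# The test error typically peaks near n = d and decreases again for larger n
# (sample-wise double descent); adding a tuned ridge penalty removes the peak.
```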
8.3 Sample complexity bounds (optional readings)

8.3.1 Preliminaries

In this set of notes, we begin our foray into learning theory. Apart from being interesting and enlightening in its own right, this discussion will also help us hone our intuitions and derive rules of thumb about how to best apply learning algorithms in different settings. We will also seek to answer a few questions: First, can we make formal the bias/variance tradeoff that was just discussed? This will also eventually lead us to talk about model selection methods, which can, for instance, automatically decide what order polynomial to fit to a training set. Second, in machine learning it's really generalization error that we care about, but most learning algorithms fit their models to the training set. Why should doing well on the training set tell us anything about generalization error? Specifically, can we relate error on the training set to generalization error? Third and finally, are there conditions under which we can actually prove that learning algorithms will work well?

We start with two simple but very useful lemmas.

Lemma. (The union bound). Let $A_1, A_2, \dots, A_k$ be $k$ different events (that may not be independent). Then

$$P(A_1\cup\cdots\cup A_k) \leq P(A_1) + \ldots + P(A_k).$$

In probability theory, the union bound is usually stated as an axiom (and thus we won't try to prove it), but it also makes intuitive sense: the probability of any one of $k$ events happening is at most the sum of the probabilities of the $k$ different events.

Lemma. (Hoeffding inequality). Let $Z_1, \dots, Z_n$ be $n$ independent and identically distributed (iid) random variables drawn from a Bernoulli($\phi$) distribution. I.e., $P(Z_i = 1) = \phi$ and $P(Z_i = 0) = 1 - \phi$. Let $\hat\phi = (1/n)\sum_{i=1}^n Z_i$ be the mean of these random variables, and let any $\gamma > 0$ be fixed. Then

$$P(|\phi - \hat\phi| > \gamma) \leq 2\exp(-2\gamma^2 n)$$

This lemma (which in learning theory is also called the Chernoff bound) says that if we take $\hat\phi$, the average of $n$ Bernoulli($\phi$) random variables, to be our estimate of $\phi$, then the probability of being far from the true value is small, so long as $n$ is large. Another way of saying this is that if you have a biased coin whose chance of landing on heads is $\phi$, then if you toss it $n$ times and calculate the fraction of times that it came up heads, that will be a good estimate of $\phi$ with high probability (if $n$ is large).
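A quick simulation (ours, not part of the notes) illustrates how loose but reliable the Hoeffding bound is for the coin-tossing interpretation above; the values of $\phi$, $\gamma$, and the sample sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
phi, gamma, trials = 0.3, 0.05, 20000

for n in (50, 200, 1000):
    # "trials" independent estimates of phi, each from n coin tosses.
    phi_hat = rng.binomial(n, phi, size=trials) / n
    empirical = np.mean(np.abs(phi_hat - phi) > gamma)   # P(|phi_hat - phi| > gamma)
    bound = 2 * np.exp(-2 * gamma**2 * n)                # Hoeffding bound
    print(f"n = {n:5d}: empirical {empirical:.4f}  <=  bound {min(bound, 1.0):.4f}")
```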
Using just these two lemmas, we will be able to prove some of the deepest and most important results in learning theory.

To simplify our exposition, let's restrict our attention to binary classification in which the labels are $y\in\{0,1\}$. Everything we'll say here generalizes to other problems, including regression and multi-class classification.

We assume we are given a training set $S = \{(x^{(i)}, y^{(i)});\ i = 1,\dots,n\}$ of size $n$, where the training examples $(x^{(i)}, y^{(i)})$ are drawn iid from some probability distribution $\mathcal{D}$. For a hypothesis $h$, we define the training error (also called the empirical risk or empirical error in learning theory) to be

$$\hat\varepsilon(h) = \frac{1}{n}\sum_{i=1}^n 1\{h(x^{(i)})\neq y^{(i)}\}.$$

This is just the fraction of training examples that $h$ misclassifies. When we want to make explicit the dependence of $\hat\varepsilon(h)$ on the training set $S$, we may also write this as $\hat\varepsilon_S(h)$. We also define the generalization error to be

$$\varepsilon(h) = P_{(x,y)\sim\mathcal{D}}(h(x)\neq y).$$

I.e., this is the probability that, if we now draw a new example $(x, y)$ from the distribution $\mathcal{D}$, $h$ will misclassify it.

Note that we have assumed that the training data was drawn from the same distribution $\mathcal{D}$ with which we're going to evaluate our hypotheses (in the definition of generalization error). This is sometimes also referred to as one of the PAC assumptions.[9]

Consider the setting of linear classification, and let $h_\theta(x) = 1\{\theta^T x\geq 0\}$. What's a reasonable way of fitting the parameters $\theta$? One approach is to try to minimize the training error, and pick

$$\hat\theta = \arg\min_\theta\hat\varepsilon(h_\theta).$$

We call this process empirical risk minimization (ERM), and the resulting hypothesis output by the learning algorithm is $\hat h = h_{\hat\theta}$.

[9] PAC stands for "probably approximately correct," which is a framework and set of assumptions under which numerous results on learning theory were proved. Of these, the assumption of training and testing on the same distribution, and the assumption of independently drawn training examples, were the most important.
We think of ERM as the most "basic" learning algorithm, and it will be this algorithm that we focus on in these notes. (Algorithms such as logistic regression can also be viewed as approximations to empirical risk minimization.)

In our study of learning theory, it will be useful to abstract away from the specific parameterization of hypotheses and from issues such as whether we're using a linear classifier. We define the hypothesis class $\mathcal{H}$ used by a learning algorithm to be the set of all classifiers considered by it. For linear classification, $\mathcal{H} = \{h_\theta : h_\theta(x) = 1\{\theta^T x\geq 0\},\ \theta\in\mathbb{R}^{d+1}\}$ is thus the set of all classifiers over $\mathcal{X}$ (the domain of the inputs) where the decision boundary is linear. More broadly, if we were studying, say, neural networks, then we could let $\mathcal{H}$ be the set of all classifiers representable by some neural network architecture.

Empirical risk minimization can now be thought of as a minimization over the class of functions $\mathcal{H}$, in which the learning algorithm picks the hypothesis

$$\hat h = \arg\min_{h\in\mathcal{H}}\hat\varepsilon(h)$$

8.3.2 The case of finite $\mathcal{H}$

Let's start by considering a learning problem in which we have a finite hypothesis class $\mathcal{H} = \{h_1,\dots,h_k\}$ consisting of $k$ hypotheses. Thus, $\mathcal{H}$ is just a set of $k$ functions mapping from $\mathcal{X}$ to $\{0,1\}$, and empirical risk minimization selects $\hat h$ to be whichever of these $k$ functions has the smallest training error.

We would like to give guarantees on the generalization error of $\hat h$. Our strategy for doing so will be in two parts: first, we will show that $\hat\varepsilon(h)$ is a reliable estimate of $\varepsilon(h)$ for all $h$. Second, we will show that this implies an upper bound on the generalization error of $\hat h$.

Take any one fixed $h_i\in\mathcal{H}$. Consider a Bernoulli random variable $Z$ whose distribution is defined as follows. We're going to sample $(x, y)\sim\mathcal{D}$. Then, we set $Z = 1\{h_i(x)\neq y\}$. I.e., we're going to draw one example, and let $Z$ indicate whether $h_i$ misclassifies it. Similarly, we also define $Z_j = 1\{h_i(x^{(j)})\neq y^{(j)}\}$. Since our training set was drawn iid from $\mathcal{D}$, $Z$ and the $Z_j$'s have the same distribution.

We see that the misclassification probability on a randomly drawn example, that is, $\varepsilon(h_i)$, is exactly the expected value of $Z$ (and of the $Z_j$'s). Moreover, the training error can be written

$$\hat\varepsilon(h_i) = \frac{1}{n}\sum_{j=1}^n Z_j.$$

Thus, $\hat\varepsilon(h_i)$ is exactly the mean of the $n$ random variables $Z_j$ that are drawn iid from a Bernoulli distribution with mean $\varepsilon(h_i)$.
Hence, we can apply the Hoeffding inequality and obtain

$$P(|\varepsilon(h_i) - \hat\varepsilon(h_i)| > \gamma) \leq 2\exp(-2\gamma^2 n).$$

This shows that, for our particular $h_i$, the training error will be close to the generalization error with high probability, assuming $n$ is large. But we don't just want to guarantee that $\varepsilon(h_i)$ will be close to $\hat\varepsilon(h_i)$ (with high probability) for just one particular $h_i$. We want to prove that this will be true simultaneously for all $h\in\mathcal{H}$. To do so, let $A_i$ denote the event that $|\varepsilon(h_i) - \hat\varepsilon(h_i)| > \gamma$. We've already shown that, for any particular $A_i$, it holds true that $P(A_i)\leq 2\exp(-2\gamma^2 n)$. Thus, using the union bound, we have that

$$P(\exists\, h\in\mathcal{H}.\ |\varepsilon(h_i) - \hat\varepsilon(h_i)| > \gamma) = P(A_1\cup\cdots\cup A_k) \leq \sum_{i=1}^k P(A_i) \leq \sum_{i=1}^k 2\exp(-2\gamma^2 n) = 2k\exp(-2\gamma^2 n)$$

If we subtract both sides from 1, we find that

$$P(\neg\exists\, h\in\mathcal{H}.\ |\varepsilon(h_i) - \hat\varepsilon(h_i)| > \gamma) = P(\forall\, h\in\mathcal{H}.\ |\varepsilon(h_i) - \hat\varepsilon(h_i)|\leq\gamma) \geq 1 - 2k\exp(-2\gamma^2 n)$$

(The $\neg$ symbol means "not.") So, with probability at least $1 - 2k\exp(-2\gamma^2 n)$, we have that $\varepsilon(h)$ will be within $\gamma$ of $\hat\varepsilon(h)$ for all $h\in\mathcal{H}$. This is called a uniform convergence result, because this is a bound that holds simultaneously for all (as opposed to just one) $h\in\mathcal{H}$.

In the discussion above, what we did was, for particular values of $n$ and $\gamma$, give a bound on the probability that for some $h\in\mathcal{H}$, $|\varepsilon(h) - \hat\varepsilon(h)| > \gamma$. There are three quantities of interest here: $n$, $\gamma$, and the probability of error; we can bound any one of them in terms of the other two.

For instance, we can ask the following question: given $\gamma$ and some $\delta > 0$, how large must $n$ be before we can guarantee that with probability at least $1-\delta$, the training error will be within $\gamma$ of the generalization error? By setting $\delta = 2k\exp(-2\gamma^2 n)$ and solving for $n$ [you should convince yourself this is the right thing to do!], we find that if

$$n \geq \frac{1}{2\gamma^2}\log\frac{2k}{\delta},$$
then with probability at least $1-\delta$, we have that $|\varepsilon(h) - \hat\varepsilon(h)|\leq\gamma$ for all $h\in\mathcal{H}$. (Equivalently, this shows that the probability that $|\varepsilon(h) - \hat\varepsilon(h)| > \gamma$ for some $h\in\mathcal{H}$ is at most $\delta$.) This bound tells us how many training examples we need in order to make a guarantee. The training set size $n$ that a certain method or algorithm requires in order to achieve a certain level of performance is also called the algorithm's sample complexity.

The key property of the bound above is that the number of training examples needed to make this guarantee is only logarithmic in $k$, the number of hypotheses in $\mathcal{H}$. This will be important later.

Similarly, we can also hold $n$ and $\delta$ fixed and solve for $\gamma$ in the previous equation, and show [again, convince yourself that this is right!] that with probability $1-\delta$, we have that for all $h\in\mathcal{H}$,

$$|\hat\varepsilon(h) - \varepsilon(h)| \leq \sqrt{\frac{1}{2n}\log\frac{2k}{\delta}}.$$

Now, let's assume that uniform convergence holds, i.e., that $|\varepsilon(h) - \hat\varepsilon(h)|\leq\gamma$ for all $h\in\mathcal{H}$. What can we prove about the generalization of our learning algorithm that picked $\hat h = \arg\min_{h\in\mathcal{H}}\hat\varepsilon(h)$?

Define $h^* = \arg\min_{h\in\mathcal{H}}\varepsilon(h)$ to be the best possible hypothesis in $\mathcal{H}$. Note that $h^*$ is the best that we could possibly do given that we are using $\mathcal{H}$, so it makes sense to compare our performance to that of $h^*$. We have:

$$\varepsilon(\hat h) \leq \hat\varepsilon(\hat h) + \gamma \leq \hat\varepsilon(h^*) + \gamma \leq \varepsilon(h^*) + 2\gamma$$

The first inequality used the fact that $|\varepsilon(\hat h) - \hat\varepsilon(\hat h)|\leq\gamma$ (by our uniform convergence assumption). The second used the fact that $\hat h$ was chosen to minimize $\hat\varepsilon(h)$, and hence $\hat\varepsilon(\hat h)\leq\hat\varepsilon(h)$ for all $h$, and in particular $\hat\varepsilon(\hat h)\leq\hat\varepsilon(h^*)$. The third used the uniform convergence assumption again, to show that $\hat\varepsilon(h^*)\leq\varepsilon(h^*)+\gamma$. So, what we've shown is the following: if uniform convergence occurs, then the generalization error of $\hat h$ is at most $2\gamma$ worse than that of the best possible hypothesis in $\mathcal{H}$!
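As a small aside (ours, not from the notes), the sample-size bound derived above is trivial to evaluate numerically; the helper below and its inputs are purely illustrative.

```python
import math

def sample_complexity(k, gamma, delta):
    """Smallest n guaranteeing |eps(h) - eps_hat(h)| <= gamma simultaneously for
    all h in a finite hypothesis class of size k, with probability >= 1 - delta,
    i.e. n >= (1 / (2 gamma^2)) log(2k / delta)."""
    return math.ceil(math.log(2 * k / delta) / (2 * gamma ** 2))

# E.g., k = 1000 hypotheses, gamma = 0.05, delta = 0.01.
print(sample_complexity(1000, 0.05, 0.01))
# Doubling k only adds log(2)/(2 gamma^2) extra examples: the dependence on k is logarithmic.
```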
Let's put all this together into a theorem.

Theorem. Let $|\mathcal{H}| = k$, and let any $n, \delta$ be fixed. Then with probability at least $1-\delta$, we have that

$$\varepsilon(\hat h) \leq \min_{h\in\mathcal{H}}\varepsilon(h) + 2\sqrt{\frac{1}{2n}\log\frac{2k}{\delta}}.$$

This is proved by letting $\gamma$ equal the $\sqrt{\cdot}$ term, using our previous argument that uniform convergence occurs with probability at least $1-\delta$, and then noting that uniform convergence implies that $\varepsilon(\hat h)$ is at most $2\gamma$ higher than $\varepsilon(h^*) = \min_{h\in\mathcal{H}}\varepsilon(h)$ (as we showed previously).

This also quantifies what we were saying previously about the bias/variance tradeoff in model selection. Specifically, suppose we have some hypothesis class $\mathcal{H}$, and are considering switching to some much larger hypothesis class $\mathcal{H}'\supseteq\mathcal{H}$. If we switch to $\mathcal{H}'$, then the first term $\min_h\varepsilon(h)$ can only decrease (since we'd then be taking a min over a larger set of functions). Hence, by learning using a larger hypothesis class, our "bias" can only decrease. However, if $k$ increases, then the second $2\sqrt{\cdot}$ term would also increase. This increase corresponds to our "variance" increasing when we use a larger hypothesis class.

By holding $\gamma$ and $\delta$ fixed and solving for $n$ like we did before, we can also obtain the following sample complexity bound:

Corollary. Let $|\mathcal{H}| = k$, and let any $\delta, \gamma$ be fixed. Then for $\varepsilon(\hat h)\leq\min_{h\in\mathcal{H}}\varepsilon(h) + 2\gamma$ to hold with probability at least $1-\delta$, it suffices that

$$n \geq \frac{1}{2\gamma^2}\log\frac{2k}{\delta} = O\left(\frac{1}{\gamma^2}\log\frac{k}{\delta}\right).$$

8.3.3 The case of infinite $\mathcal{H}$

We have proved some useful theorems for the case of finite hypothesis classes. But many hypothesis classes, including any parameterized by real numbers (as in linear classification), actually contain an infinite number of functions. Can we prove similar results for this setting?

Let's start by going through something that is not the "right" argument. Better and more general arguments exist, but this one will be useful for honing our intuitions about the domain.

Suppose we have an $\mathcal{H}$ that is parameterized by $d$ real numbers. Since we are using a computer to represent real numbers, and IEEE double-precision floating point (double's in C) uses 64 bits to represent a floating point number, this means that our learning algorithm, assuming we're using double-precision floating point, is parameterized by $64d$ bits. Thus, our hypothesis class really consists of at most $k = 2^{64d}$ different hypotheses.
From the Corollary at the end of the previous section, we therefore find that, to guarantee $\varepsilon(\hat h)\leq\varepsilon(h^*) + 2\gamma$ with probability at least $1-\delta$, it suffices that

$$n \geq O\left(\frac{1}{\gamma^2}\log\frac{2^{64d}}{\delta}\right) = O\left(\frac{d}{\gamma^2}\log\frac{1}{\delta}\right) = O_{\gamma,\delta}(d).$$

(The $\gamma,\delta$ subscripts indicate that the last big-$O$ is hiding constants that may depend on $\gamma$ and $\delta$.) Thus, the number of training examples needed is at most linear in the number of parameters of the model.

The fact that we relied on 64-bit floating point makes this argument not entirely satisfying, but the conclusion is nonetheless roughly correct: if what we try to do is minimize training error, then in order to learn "well" using a hypothesis class that has $d$ parameters, generally we're going to need on the order of a linear number of training examples in $d$.

(At this point, it's worth noting that these results were proved for an algorithm that uses empirical risk minimization. Thus, while the linear dependence of sample complexity on $d$ does generally hold for most discriminative learning algorithms that try to minimize training error or some approximation to training error, these conclusions do not always apply as readily to discriminative learning algorithms. Giving good theoretical guarantees on many non-ERM learning algorithms is still an area of active research.)

The other part of our previous argument that's slightly unsatisfying is that it relies on the parameterization of $\mathcal{H}$. Intuitively, this doesn't seem like it should matter: we had written the class of linear classifiers as $h_\theta(x) = 1\{\theta_0 + \theta_1 x_1 + \cdots + \theta_d x_d\geq 0\}$, with $d+1$ parameters $\theta_0,\dots,\theta_d$. But it could also be written $h_{u,v}(x) = 1\{(u_0^2 - v_0^2) + (u_1^2 - v_1^2)x_1 + \cdots + (u_d^2 - v_d^2)x_d\geq 0\}$ with $2d+2$ parameters $u_i, v_i$. Yet, both of these are just defining the same $\mathcal{H}$: the set of linear classifiers in $d$ dimensions.

To derive a more satisfying argument, let's define a few more things. Given a set $S = \{x^{(1)},\dots,x^{(D)}\}$ (no relation to the training set) of points $x^{(i)}\in\mathcal{X}$, we say that $\mathcal{H}$ shatters $S$ if $\mathcal{H}$ can realize any labeling on $S$. I.e., if for any set of labels $\{y^{(1)},\dots,y^{(D)}\}$, there exists some $h\in\mathcal{H}$ so that $h(x^{(i)}) = y^{(i)}$ for all $i = 1,\dots,D$.

Given a hypothesis class $\mathcal{H}$, we then define its Vapnik-Chervonenkis dimension, written $\mathrm{VC}(\mathcal{H})$, to be the size of the largest set that is shattered by $\mathcal{H}$. (If $\mathcal{H}$ can shatter arbitrarily large sets, then $\mathrm{VC}(\mathcal{H}) = \infty$.)

For instance, consider the following set of three points:
[Figure: a set of three points in the plane, with axes $x_1$ and $x_2$.]

Can the set $\mathcal{H}$ of linear classifiers in two dimensions ($h(x) = 1\{\theta_0 + \theta_1 x_1 + \theta_2 x_2\geq 0\}$) shatter the set above? The answer is yes. Specifically, we see that, for any of the eight possible labelings of these points, we can find a linear classifier that obtains "zero training error" on them:

[Figure: the eight possible labelings of the three points, each with a linear separator that achieves zero training error.]

Moreover, it is possible to show that there is no set of 4 points that this hypothesis class can shatter. Thus, the largest set that $\mathcal{H}$ can shatter is of size 3, and hence $\mathrm{VC}(\mathcal{H}) = 3$.

Note that the VC dimension of $\mathcal{H}$ here is 3 even though there may be sets of size 3 that it cannot shatter. For instance, if we had a set of three points lying on a straight line (left figure), then there is no way to find a linear separator for the labeling of the three points shown below (right figure):
[Figure: left, three collinear points; right, a labeling of those three points that no linear classifier can realize.]

In other words, under the definition of the VC dimension, in order to prove that $\mathrm{VC}(\mathcal{H})$ is at least $D$, we need to show only that there's at least one set of size $D$ that $\mathcal{H}$ can shatter.

The following theorem, due to Vapnik, can then be shown. (This is, many would argue, the most important theorem in all of learning theory.)

Theorem. Let $\mathcal{H}$ be given, and let $D = \mathrm{VC}(\mathcal{H})$. Then with probability at least $1-\delta$, we have that for all $h\in\mathcal{H}$,

$$|\varepsilon(h) - \hat\varepsilon(h)| \leq O\left(\sqrt{\frac{D}{n}\log\frac{n}{D} + \frac{1}{n}\log\frac{1}{\delta}}\right).$$

Thus, with probability at least $1-\delta$, we also have that

$$\varepsilon(\hat h) \leq \varepsilon(h^*) + O\left(\sqrt{\frac{D}{n}\log\frac{n}{D} + \frac{1}{n}\log\frac{1}{\delta}}\right).$$

In other words, if a hypothesis class has finite VC dimension, then uniform convergence occurs as $n$ becomes large. As before, this allows us to give a bound on $\varepsilon(\hat h)$ in terms of $\varepsilon(h^*)$. We also have the following corollary:

Corollary. For $|\varepsilon(h) - \hat\varepsilon(h)|\leq\gamma$ to hold for all $h\in\mathcal{H}$ (and hence $\varepsilon(\hat h)\leq\varepsilon(h^*) + 2\gamma$) with probability at least $1-\delta$, it suffices that $n = O_{\gamma,\delta}(D)$.

In other words, the number of training examples needed to learn "well" using $\mathcal{H}$ is linear in the VC dimension of $\mathcal{H}$. It turns out that, for "most" hypothesis classes, the VC dimension (assuming a "reasonable" parameterization) is also roughly linear in the number of parameters. Putting these together, we conclude that for a given hypothesis class $\mathcal{H}$ (and for an algorithm that tries to minimize training error), the number of training examples needed to achieve generalization error close to that of the optimal classifier is usually roughly linear in the number of parameters of $\mathcal{H}$.
Chapter 9

Regularization and model selection

9.1 Regularization

Recall that, as discussed in Section 8.1, overfitting is typically a result of using overly complex models, and we need to choose a proper model complexity to achieve the optimal bias-variance tradeoff. When the model complexity is measured by the number of parameters, we can vary the size of the model (e.g., the width of a neural net). However, the correct, informative complexity measure of the models can be a function of the parameters (e.g., the $\ell_2$ norm of the parameters), which may not necessarily depend on the number of parameters. In such cases, we will use regularization, an important technique in machine learning, to control the model complexity and prevent overfitting.

Regularization typically involves adding an additional term, called a regularizer and denoted by $R(\theta)$ here, to the training loss/cost function:
\[
J_\lambda(\theta) = J(\theta) + \lambda R(\theta) \tag{9.1}
\]
Here $J_\lambda$ is often called the regularized loss, and $\lambda \geq 0$ is called the regularization parameter. The regularizer $R(\theta)$ is a nonnegative function (in almost all cases). In classical methods, $R(\theta)$ is purely a function of the parameter $\theta$, but some modern approaches allow $R(\theta)$ to depend on the training dataset.¹

The regularizer $R(\theta)$ is typically chosen to be some measure of the complexity of the model $\theta$. Thus, when using the regularized loss, we aim to find a model that both fits the data (a small loss $J(\theta)$) and has a small model complexity (a small $R(\theta)$).

¹Here our notation generally omits the dependency on the training dataset for simplicity: we write $J(\theta)$ even though it obviously needs to depend on the training dataset.
The balance between the two objectives is controlled by the regularization parameter $\lambda$. When $\lambda = 0$, the regularized loss is equivalent to the original loss. When $\lambda$ is a sufficiently small positive number, minimizing the regularized loss is effectively minimizing the original loss, with the regularizer acting as a tie-breaker. When the regularizer is extremely large, then the original loss is not effective (and the model will likely have a large bias).

The most commonly used regularization is perhaps $\ell_2$ regularization, where $R(\theta) = \frac{1}{2}\|\theta\|_2^2$. It encourages the optimizer to find a model with small $\ell_2$ norm. In deep learning, it's oftentimes referred to as weight decay, because gradient descent with learning rate $\eta$ on the regularized loss $J_\lambda(\theta)$ is equivalent to shrinking/decaying $\theta$ by a scalar factor of $1 - \eta\lambda$ and then applying the standard gradient step:
\[
\theta \leftarrow \theta - \eta \nabla J_\lambda(\theta) = \theta - \eta\lambda\theta - \eta\nabla J(\theta) = \underbrace{(1 - \lambda\eta)\theta}_{\text{decaying weights}} - \eta\nabla J(\theta) \tag{9.2}
\]
Besides encouraging simpler models, regularization can also impose inductive biases or structures on the model parameters. For example, suppose we had a prior belief that the number of non-zeros in the ground-truth model parameters is small,² which is oftentimes called sparsity of the model. We can impose a regularization on the number of non-zeros in $\theta$, denoted by $\|\theta\|_0$, to leverage such a prior belief. Imposing additional structure on the parameters narrows our search space and makes the complexity of the model family smaller (e.g., the family of sparse models can be thought of as having lower complexity than the family of all models), and thus tends to lead to better generalization. On the other hand, imposing additional structure may risk increasing the bias. For example, if we regularize the sparsity strongly but no sparse models can predict the label accurately, we will suffer from large bias (analogously to the situation when we use linear models to learn data that can only be represented by quadratic functions in Section 8.1).

The sparsity of the parameters is not a continuous function of the parameters, and thus we cannot optimize it with (stochastic) gradient descent. A common relaxation is to use $R(\theta) = \|\theta\|_1$ as a continuous surrogate.³

²For linear models, this means the model just uses a few coordinates of the inputs to make an accurate prediction.
³There has been a rich line of theoretical work that explains why $\|\theta\|_1$ is a good surrogate for encouraging sparsity, but it's beyond the scope of this course. An intuition: assuming the parameter is on the unit sphere, the parameter with smallest $\ell_1$ norm also happens to be the sparsest parameter, with only one non-zero coordinate. Thus, sparsity and the $\ell_1$ norm give the same extremal points to some extent.
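As a sanity check of the weight-decay identity (9.2), the small sketch below (an added illustration, not from the original notes; the quadratic loss and the numerical values are arbitrary assumptions) compares one gradient step on the regularized loss $J_\lambda$ with the "decay, then step" update and confirms that they coincide.

```python
# Sketch: one gradient-descent step on the L2-regularized loss equals
# decaying the weights by (1 - eta*lambda) and then stepping on the
# unregularized loss, as in Eq. (9.2). Loss and numbers are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)

def grad_J(theta):                      # gradient of J(theta) = 0.5 * ||A theta - b||^2
    return A.T @ (A @ theta - b)

theta = rng.standard_normal(3)
eta, lam = 0.1, 0.01                    # learning rate and regularization strength

# (a) One step on the regularized loss J_lambda = J + (lam/2) * ||theta||^2.
step_regularized = theta - eta * (grad_J(theta) + lam * theta)

# (b) Weight decay: shrink theta first, then take a plain gradient step on J.
step_weight_decay = (1 - eta * lam) * theta - eta * grad_J(theta)

print(np.allclose(step_regularized, step_weight_decay))    # True
```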
The regularizers $R(\theta) = \|\theta\|_1$ (also called LASSO) and $R(\theta) = \frac{1}{2}\|\theta\|_2^2$ are perhaps among the most commonly used regularizers for linear models. Other norms and powers of norms are sometimes also used. The $\ell_2$ norm regularization is much more commonly used with kernel methods, because $\ell_1$ regularization is typically not compatible with the kernel trick (the optimal solution cannot be written as a function of inner products of features).

In deep learning, the most commonly used regularizer is $\ell_2$ regularization or weight decay. Other common ones include dropout, data augmentation, regularizing the spectral norm of the weight matrices, and regularizing the Lipschitzness of the model. Regularization in deep learning is an active research area, and it's known that there is another, implicit source of regularization, as discussed in the next section.

9.2 Implicit regularization effect

The implicit regularization effect of optimizers (also called implicit bias or algorithmic regularization) is a new concept/phenomenon observed in the deep learning era. It refers to the fact that optimizers can implicitly impose structure on the parameters beyond what is imposed by the regularized loss. In most classical settings, the loss or regularized loss has a unique global minimum, and thus any reasonable optimizer should converge to that global minimum and cannot impose any additional preferences. However, in deep learning, the loss or regularized loss oftentimes has more than one (approximate) global minimum, and different optimizers may converge to different global minima. Though these global minima have the same or similar training losses, they may be of a different nature and have dramatically different generalization performance. See Figures 9.1 and 9.2 and their captions for an illustration and some experimental results. For example, it's possible that one global minimum gives a much more Lipschitz or sparse model than others and thus has a better test error. It turns out that many commonly-used optimizers (or their components) prefer or bias towards finding global minima with certain properties, leading to better test performance.
[Figure 9.1: An illustration that different global minima of the training loss can have different test performance. The horizontal axis is the parameter $\theta$ and the vertical axis is the loss.]

[Figure 9.2: Left: performance of neural networks trained by two different learning rate schedules on the CIFAR-10 dataset. Although both experiments used exactly the same regularized losses and the optimizers fit the training data perfectly, the models' generalization performance differs substantially. Right: on a different synthetic dataset, optimizers with different initializations have the same training error but different generalization performance.⁴]

In summary, the take-home message here is that the choice of optimizer does not only affect minimizing the training loss, but also imposes implicit regularization and affects the generalization of the model. Even if your current optimizer already converges to a small training error perfectly, you may still need to tune your optimizer for better generalization.

⁴The setting is the same as in Woodworth et al. [2020] and HaoChen et al. [2020].
One may wonder which components of the optimizers bias towards what type of global minima, and what type of global minima may generalize better. These are open questions that researchers are actively investigating. Empirical and theoretical research has offered some clues and heuristics. In many (but definitely far from all) situations, among those settings where optimization can succeed in minimizing the training loss, the use of a larger initial learning rate, smaller initialization, smaller batch size, and momentum appears to help with biasing towards more generalizable solutions. A conjecture (that can be proven in certain simplified cases) is that stochasticity in the optimization process helps the optimizer to find flatter global minima (global minima where the curvature of the loss is small), and flat global minima tend to give more Lipschitz models and better generalization. Characterizing the implicit regularization effect formally is still a challenging open research question.

9.3 Model selection via cross validation

Suppose we are trying to select among several different models for a learning problem. For instance, we might be using a polynomial regression model $h_\theta(x) = g(\theta_0 + \theta_1 x + \theta_2 x^2 + \cdots + \theta_k x^k)$, and wish to decide if $k$ should be 0, 1, ..., or 10. How can we automatically select a model that represents a good tradeoff between the twin evils of bias and variance⁵? Alternatively, suppose we want to automatically choose the bandwidth parameter $\tau$ for locally weighted regression, or the parameter $C$ for our $\ell_1$-regularized SVM. How can we do that?

For the sake of concreteness, in these notes we assume we have some finite set of models $\mathcal{M} = \{M_1, \ldots, M_d\}$ that we're trying to select among. For instance, in our first example above, the model $M_i$ would be an $i$-th degree polynomial regression model. (The generalization to infinite $\mathcal{M}$ is not hard.⁶) Alternatively, if we are trying to decide between using an SVM, a neural network, or logistic regression, then $\mathcal{M}$ may contain these models.

⁵Given that we said in the previous set of notes that bias and variance are two very different beasts, some readers may be wondering if we should be calling them "twin" evils here. Perhaps it'd be better to think of them as non-identical twins. The phrase "the fraternal twin evils of bias and variance" doesn't have the same ring to it, though.
⁶If we are trying to choose from an infinite set of models, say corresponding to the possible values of the bandwidth $\tau \in \mathbb{R}^+$, we may discretize $\tau$ and consider only a finite number of possible values for it. More generally, most of the algorithms described here can all be viewed as performing optimization search in the space of models, and we can perform this search over infinite model classes as well.
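As a concrete running example for the procedures below, the following sketch (an added illustration, not part of the notes; the use of scikit-learn pipelines with PolynomialFeatures and LinearRegression is an assumed implementation choice) builds the finite model family $\{M_0, \ldots, M_{10}\}$ of polynomial regressors of degree 0 through 10.

```python
# Sketch: a finite family of candidate models M_0, ..., M_10, where M_k is
# polynomial regression of degree k. Implemented here (one possible choice)
# with scikit-learn pipelines; each pipeline plays the role of one M_i.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

models = {
    k: make_pipeline(PolynomialFeatures(degree=k), LinearRegression())
    for k in range(11)
}
# models[k].fit(X, y) trains M_k; models[k].predict(X_new) evaluates it.
```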
Cross validation. Let's suppose we are, as usual, given a training set $S$. Given what we know about empirical risk minimization, here's what might initially seem like an algorithm, resulting from using empirical risk minimization for model selection:

1. Train each model $M_i$ on $S$, to get some hypothesis $h_i$.
2. Pick the hypothesis with the smallest training error.

This algorithm does not work. Consider choosing the degree of a polynomial. The higher the degree of the polynomial, the better it will fit the training set $S$, and thus the lower the training error. Hence, this method will always select a high-variance, high-degree polynomial model, which we saw previously is often a poor choice.

Here's an algorithm that works better. In hold-out cross validation (also called simple cross validation), we do the following:

1. Randomly split $S$ into $S_{\text{train}}$ (say, 70% of the data) and $S_{\text{cv}}$ (the remaining 30%). Here, $S_{\text{cv}}$ is called the hold-out cross validation set.
2. Train each model $M_i$ on $S_{\text{train}}$ only, to get some hypothesis $h_i$.
3. Select and output the hypothesis $h_i$ that had the smallest error $\hat{\varepsilon}_{S_{\text{cv}}}(h_i)$ on the hold-out cross validation set. (Here $\hat{\varepsilon}_{S_{\text{cv}}}(h)$ denotes the average error of $h$ on the set of examples in $S_{\text{cv}}$.) The error on the hold-out validation set is also referred to as the validation error.

By testing/validating on a set of examples $S_{\text{cv}}$ that the models were not trained on, we obtain a better estimate of each hypothesis $h_i$'s true generalization/test error. Thus, this approach is essentially picking the model with the smallest estimated generalization/test error. The size of the validation set depends on the total number of available examples. Usually, somewhere between 1/4 and 1/3 of the data is used in the hold-out cross validation set, and 30% is a typical choice. However, when the total dataset is huge, the validation set can be a smaller fraction of the total examples as long as the absolute number of validation examples is decent. For example, for the ImageNet dataset, which has about 1M training images, the validation set is sometimes set to be 50K images, which is only about 5% of the total examples.

Optionally, step 3 in the algorithm may also be replaced with selecting the model $M_i$ according to $\arg\min_i \hat{\varepsilon}_{S_{\text{cv}}}(h_i)$, and then retraining $M_i$ on the entire training set $S$. (This is often a good idea, with one exception being learning algorithms that are very sensitive to perturbations of the initial conditions and/or data. For these methods, $M_i$ doing well on $S_{\text{train}}$ does not necessarily mean it will also do well on $S_{\text{cv}}$, and it might be better to forgo this retraining step.)
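A minimal sketch of hold-out cross validation over the polynomial family from the earlier sketch follows. It is an added illustration, not from the notes; the synthetic data, the 70/30 split, and the squared-error metric are assumptions.

```python
# Sketch: hold-out (simple) cross validation over the candidate models above.
# Synthetic data and the 70/30 split ratio are arbitrary illustrative choices.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + 0.3 * rng.standard_normal(200)     # assumed toy data

# 1. Randomly split S into S_train (70%) and S_cv (30%).
X_tr, X_cv, y_tr, y_cv = train_test_split(X, y, test_size=0.3, random_state=0)

# 2-3. Train each M_k on S_train only, and keep the model with the smallest
#      validation error on S_cv.
val_err = {}
for k, model in models.items():          # `models` from the earlier sketch
    model.fit(X_tr, y_tr)
    val_err[k] = mean_squared_error(y_cv, model.predict(X_cv))

best_k = min(val_err, key=val_err.get)
print("selected degree:", best_k)
# Optional extra step: retrain the selected model on the entire set S.
models[best_k].fit(X, y)
```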
The disadvantage of using hold-out cross validation is that it "wastes" about 30% of the data. Even if we were to take the optional step of retraining the model on the entire training set, it's still as if we're trying to find a good model for a learning problem in which we had $0.7n$ training examples, rather than $n$ training examples, since we're testing models that were each trained on only $0.7n$ examples. While this is fine if data is abundant and/or cheap, in learning problems in which data is scarce (consider a problem with $n = 20$, say), we'd like to do something better.

Here is a method, called k-fold cross validation, that holds out less data each time:

1. Randomly split $S$ into $k$ disjoint subsets of $n/k$ training examples each. Let's call these subsets $S_1, \ldots, S_k$.
2. For each model $M_i$, we evaluate it as follows: For $j = 1, \ldots, k$, train the model $M_i$ on $S_1 \cup \cdots \cup S_{j-1} \cup S_{j+1} \cup \cdots \cup S_k$ (i.e., train on all the data except $S_j$) to get some hypothesis $h_{ij}$, and test the hypothesis $h_{ij}$ on $S_j$, to get $\hat{\varepsilon}_{S_j}(h_{ij})$. The estimated generalization error of model $M_i$ is then calculated as the average of the $\hat{\varepsilon}_{S_j}(h_{ij})$'s (averaged over $j$).
3. Pick the model $M_i$ with the lowest estimated generalization error, and retrain that model on the entire training set $S$. The resulting hypothesis is then output as our final answer.

A typical choice for the number of folds to use here would be $k = 10$. While the fraction of data held out each time is now $1/k$, much smaller than before, this procedure may also be more computationally expensive than hold-out cross validation, since we now need to train each model $k$ times.

While $k = 10$ is a commonly used choice, in problems in which data is really scarce, sometimes we will use the extreme choice of $k = n$ in order to leave out as little data as possible each time. In this setting, we would repeatedly train on all but one of the training examples in $S$, and test on that held-out example. The resulting $n = k$ errors are then averaged together to obtain our estimate of the generalization error of a model. This method has its own name: since we're holding out one training example at a time, it is called leave-one-out cross validation.
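Below is a minimal sketch of the k-fold procedure above, reusing the same candidate models and toy data as the previous sketches (an added illustration; $k = 10$ and the squared-error metric are assumed choices, and leave-one-out corresponds to setting the number of folds to $n$).

```python
# Sketch: k-fold cross validation over the same candidate models.
# k = 10 folds; leave-one-out is the special case k = n (n_splits = len(X)).
import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error

kf = KFold(n_splits=10, shuffle=True, random_state=0)

cv_err = {}
for k, model in models.items():          # `models`, X, y from the earlier sketches
    fold_errors = []
    for train_idx, test_idx in kf.split(X):
        model.fit(X[train_idx], y[train_idx])        # train on all folds except S_j
        pred = model.predict(X[test_idx])            # test on the held-out fold S_j
        fold_errors.append(mean_squared_error(y[test_idx], pred))
    cv_err[k] = np.mean(fold_errors)     # estimated generalization error of M_k

best_k = min(cv_err, key=cv_err.get)
models[best_k].fit(X, y)                 # retrain the selected model on all of S
print("selected degree:", best_k)
```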
Finally, even though we have described the different versions of cross validation as methods for selecting a model, they can also be used more simply to evaluate a single model or algorithm. For example, if you have implemented some learning algorithm and want to estimate how well it performs for your application (or if you have invented a novel learning algorithm and want to report in a technical paper how well it performs on various test sets), cross validation would give a reasonable way of doing so.

9.4 Bayesian statistics and regularization

In this section, we will talk about one more tool in our arsenal for our battle against overfitting.

At the beginning of the quarter, we talked about parameter fitting using maximum likelihood estimation (MLE), and chose our parameters according to
\[
\theta_{\text{MLE}} = \arg\max_\theta \prod_{i=1}^n p(y^{(i)} \mid x^{(i)}; \theta).
\]
Throughout our subsequent discussions, we viewed $\theta$ as an unknown parameter of the world. This view of $\theta$ as being constant-valued but unknown is taken in frequentist statistics. In this view of the world, $\theta$ is not random; it just happens to be unknown, and it's our job to come up with statistical procedures (such as maximum likelihood) to try to estimate this parameter.

An alternative way to approach our parameter estimation problems is to take the Bayesian view of the world, and think of $\theta$ as being a random variable whose value is unknown. In this approach, we would specify a prior distribution $p(\theta)$ on $\theta$ that expresses our "prior beliefs" about the parameters. Given a training set $S = \{(x^{(i)}, y^{(i)})\}_{i=1}^n$, when we are asked to make a prediction on a new value of $x$, we can then compute the posterior distribution on the parameters
\[
p(\theta \mid S) = \frac{p(S \mid \theta)\, p(\theta)}{p(S)}
= \frac{\left(\prod_{i=1}^n p(y^{(i)} \mid x^{(i)}, \theta)\right) p(\theta)}{\int_\theta \left(\prod_{i=1}^n p(y^{(i)} \mid x^{(i)}, \theta)\right) p(\theta)\, d\theta} \tag{9.3}
\]
In the equation above, $p(y^{(i)} \mid x^{(i)}, \theta)$ comes from whatever model you're using
for your learning problem. For example, if you are using Bayesian logistic regression, then you might choose $p(y^{(i)} \mid x^{(i)}, \theta) = h_\theta(x^{(i)})^{y^{(i)}}(1 - h_\theta(x^{(i)}))^{(1 - y^{(i)})}$, where $h_\theta(x^{(i)}) = 1/(1 + \exp(-\theta^T x^{(i)}))$.⁷

When we are given a new test example $x$ and asked to make a prediction on it, we can compute our posterior distribution on the class label using the posterior distribution on $\theta$:
\[
p(y \mid x, S) = \int_\theta p(y \mid x, \theta)\, p(\theta \mid S)\, d\theta \tag{9.4}
\]
In the equation above, $p(\theta \mid S)$ comes from Equation (9.3). Thus, for example, if the goal is to predict the expected value of $y$ given $x$, then we would output⁸
\[
\mathbb{E}[y \mid x, S] = \int_y y\, p(y \mid x, S)\, dy.
\]
The procedure that we've outlined here can be thought of as doing "fully Bayesian" prediction, where our prediction is computed by taking an average with respect to the posterior $p(\theta \mid S)$ over $\theta$. Unfortunately, in general it is computationally very difficult to compute this posterior distribution. This is because it requires taking integrals over the (usually high-dimensional) $\theta$ as in Equation (9.3), and this typically cannot be done in closed form.

Thus, in practice we will instead approximate the posterior distribution for $\theta$. One common approximation is to replace our posterior distribution for $\theta$ (as in Equation 9.4) with a single point estimate. The MAP (maximum a posteriori) estimate for $\theta$ is given by
\[
\theta_{\text{MAP}} = \arg\max_\theta \prod_{i=1}^n p(y^{(i)} \mid x^{(i)}, \theta)\, p(\theta). \tag{9.5}
\]
Note that this is the same formula as for the MLE (maximum likelihood) estimate for $\theta$, except for the prior $p(\theta)$ term at the end.

In practical applications, a common choice for the prior $p(\theta)$ is to assume that $\theta \sim \mathcal{N}(0, \tau^2 I)$. Using this choice of prior, the fitted parameters $\theta_{\text{MAP}}$ will have smaller norm than those selected by maximum likelihood. In practice, this causes the Bayesian MAP estimate to be less susceptible to overfitting than the ML estimate of the parameters. For example, Bayesian logistic regression turns out to be an effective algorithm for text classification, even though in text classification we usually have $d \gg n$.

⁷Since we are now viewing $\theta$ as a random variable, it is okay to condition on its value, and write "$p(y \mid x, \theta)$" instead of "$p(y \mid x; \theta)$."
⁸The integral below would be replaced by a summation if $y$ is discrete-valued.
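To make the connection to Section 9.1 explicit, the short derivation below (an added remark, not part of the original notes) takes logarithms in (9.5) with the Gaussian prior $\theta \sim \mathcal{N}(0, \tau^2 I)$ and shows that MAP estimation is exactly $\ell_2$-regularized maximum likelihood, with regularization parameter $\lambda = 1/\tau^2$ for $R(\theta) = \frac{1}{2}\|\theta\|_2^2$:

```latex
% MAP with a Gaussian prior is L2-regularized MLE.
% Taking logs in (9.5) and using log p(theta) = -||theta||^2 / (2 tau^2) + const:
\begin{align*}
\theta_{\text{MAP}}
  &= \arg\max_\theta \; \sum_{i=1}^n \log p(y^{(i)} \mid x^{(i)}, \theta) + \log p(\theta) \\
  &= \arg\min_\theta \; \underbrace{-\sum_{i=1}^n \log p(y^{(i)} \mid x^{(i)}, \theta)}_{J(\theta)}
     \;+\; \frac{1}{\tau^2}\cdot\underbrace{\tfrac{1}{2}\|\theta\|_2^2}_{R(\theta)},
\end{align*}
% i.e., the regularized loss J_lambda(theta) of Eq. (9.1) with lambda = 1/tau^2.
```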
Bibliography

Mikhail Belkin, Daniel Hsu, Siyuan Ma, and Soumik Mandal. Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proceedings of the National Academy of Sciences, 116(32):15849–15854, 2019.

Mikhail Belkin, Daniel Hsu, and Ji Xu. Two models of double descent for weak features. SIAM Journal on Mathematics of Data Science, 2(4):1167–1180, 2020.

Jeff Z. HaoChen, Colin Wei, Jason D. Lee, and Tengyu Ma. Shape matters: Understanding the implicit bias of the noise covariance. arXiv preprint arXiv:2006.08680, 2020.

Trevor Hastie, Andrea Montanari, Saharon Rosset, and Ryan J. Tibshirani. Surprises in high-dimensional ridgeless least squares interpolation. 2019.

Trevor Hastie, Andrea Montanari, Saharon Rosset, and Ryan J. Tibshirani. Surprises in high-dimensional ridgeless least squares interpolation. The Annals of Statistics, 50(2):949–986, 2022.

Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An Introduction to Statistical Learning, second edition, volume 112. Springer, 2021.

Song Mei and Andrea Montanari. The generalization error of random features regression: Precise asymptotics and the double descent curve. Communications on Pure and Applied Mathematics, 75(4):667–766, 2022.

Preetum Nakkiran. More data can hurt for linear regression: Sample-wise double descent. 2019.

Preetum Nakkiran, Prayaag Venkat, Sham Kakade, and Tengyu Ma. Optimal regularization can mitigate double descent. 2020.
Manfred Opper. Statistical mechanics of learning: Generalization. The Handbook of Brain Theory and Neural Networks, pages 922–925, 1995.

Manfred Opper. Learning to generalize. Frontiers of Life, 3(part 2):763–775, 2001.

Blake Woodworth, Suriya Gunasekar, Jason D. Lee, Edward Moroshko, Pedro Savarese, Itay Golan, Daniel Soudry, and Nathan Srebro. Kernel and rich regimes in overparametrized models. arXiv preprint arXiv:2002.09277, 2020.