
Question

The problem with KNN is that in high dimensions, most points tend to lie on the boundary of the data space. Consider explanatory variables drawn from a spherical multinormal distribution x ~ N(0, I), where x is a random d-vector and I is the d × d identity matrix.

The squared distance from any sample point to the origin has a χ² distribution with mean
d. Consider a prediction point x₀ drawn from this distribution, and let a = x₀/||x₀|| be an
associated unit vector. Let zᵢ = aᵀxᵢ be the projection of each of the training points on this
direction.
(a). Show that the zᵢ are distributed N(0, 1) with expected squared distance from the origin
1, while the target point has expected squared distance d from the origin.
(b). For d = 10 show that the expected distance of a test point from the centre of the
training data is 3.1 standard deviations, while all the training points have expected
distance 0.80 along direction a. So most prediction points see themselves as lying on
the edge of the training set. Note: for this question you need to use a result for the
expected value of the square root of a chi-squared distribution. Either find such a result,
or obtain your answer by simulation.
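
The numbers in part (b) can be checked by simulation, as the note suggests. The sketch below is one possible way to do that, not the expert solution from this page. It assumes the standard result E[√(χ²_d)] = √2 · Γ((d+1)/2) / Γ(d/2), which gives about 3.08 for d = 10, and the fact that E|z| = √(2/π) ≈ 0.80 for z ~ N(0, 1). The function names `expected_chi_norm` and `simulate`, the sample sizes, and the seed are illustrative choices, not part of the question.

```python
import math
import random


def expected_chi_norm(d):
    """Closed form for E[sqrt(chi-squared_d)]: sqrt(2) * Gamma((d+1)/2) / Gamma(d/2)."""
    # lgamma is used instead of gamma to stay numerically stable for large d.
    return math.sqrt(2.0) * math.exp(math.lgamma((d + 1) / 2) - math.lgamma(d / 2))


def simulate(d=10, n_train=50, n_rep=2000, seed=0):
    """Monte Carlo estimate of (E||x0||, E|a^T x_i|) for x ~ N(0, I_d).

    n_train, n_rep and seed are arbitrary illustrative values.
    """
    rng = random.Random(seed)
    total_test = 0.0   # running sum of ||x0|| over replications
    total_train = 0.0  # running sum of |z_i| over all training points
    for _ in range(n_rep):
        # Draw a prediction point and form the unit vector a = x0 / ||x0||.
        x0 = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm0 = math.sqrt(sum(v * v for v in x0))
        a = [v / norm0 for v in x0]
        total_test += norm0
        # Draw training points independently of x0 and project them onto a.
        for _ in range(n_train):
            xi = [rng.gauss(0.0, 1.0) for _ in range(d)]
            z = sum(ai * v for ai, v in zip(a, xi))
            total_train += abs(z)
    return total_test / n_rep, total_train / (n_rep * n_train)


if __name__ == "__main__":
    d = 10
    mean_test, mean_train = simulate(d=d)
    print("closed-form E||x0||        :", expected_chi_norm(d))
    print("simulated   E||x0||        :", mean_test)
    print("closed-form E|z| = sqrt(2/pi):", math.sqrt(2.0 / math.pi))
    print("simulated   E|z|           :", mean_train)
```

The simulated averages should land close to the two closed-form values. Because each training point is drawn independently of x₀, the projection aᵀxᵢ has variance aᵀIa = 1 for any unit vector a, which is the content of part (a).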