HW5-solutions
Homework 5 for ISEN 355 (System Simulation)
(C) 2023 David Eckman
Due Date: Upload to Canvas by 9:00pm (CST) on Friday, March 3.
In [ ]:
# Import some useful Python packages.
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats
import pandas as pd
import random
Problem 1. (20 points)
In this problem, you will be revisiting the topic of applying KDE to the quiz3_times.csv dataset from Homework 4. This dataset contains the times (measured in minutes) it took each student to complete Quiz 3.
In [ ]:
# Import the dataset.
mydata = pd.read_csv('quiz3_times.csv', header=None)
# Convert the data to a list.
times = mydata[mydata.columns[0]].values.tolist()
(a) (2 points)
Recall from the lecture slides that the KDE $\hat{f}$ can be viewed as the average of $n$ normal pdfs, each one centered at an observation $x_i$ for $i = 1, 2, \ldots, n$. Explain why $\hat{f}$ is itself a probability density function. In particular, why is $\hat{f}(x) \geq 0$ for all $x$ and why does $\int_{-\infty}^{\infty} \hat{f}(x) dx = 1$?
Nonnegativity:
Each normal pdf is a function taking only nonnegative values, thus their average is also a function taking only nonnegative values.
Integrates to 1:
Each normal pdf integrates to 1 when integrated from $-\infty$ to $\infty$. By linearity of integration, the integral (from $-\infty$ to $\infty$) of the average of the normal pdfs is equal to the average of the integrals (over the same range). The average of a bunch of 1s is 1.
To put some more math on this last point, define $\hat{f}(x) = \frac{1}{n} \sum_{i=1}^n g_i(x)$ where $g_i$ is the normal pdf with mean $x_i$ and standard deviation $h$. Then $$ \int_{-\infty}^{\infty} \hat{f}(x) dx = \int_{-\infty}^{\infty} \frac{1}{n} \sum_{i=1}^n g_i(x) dx = \frac{1}{n} \sum_{i=1}^n \int_{-\infty}^{\infty} g_i(x) dx = \frac{1}{n} \sum_{i=1}^n 1 = 1. $$
Collectively, we have shown that $\hat{f}$ is a probability density function.
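As a quick numerical sanity check of the argument above, the following sketch builds $\hat{f}$ from a small made-up sample and approximates its integral with a Riemann sum over a wide interval. The observations in `x_obs`, the bandwidth `h`, and the integration range are illustrative assumptions, not the quiz data.

```python
import numpy as np
from statistics import NormalDist

# Hypothetical observations and bandwidth, chosen only for illustration.
x_obs = [2.0, 3.5, 4.0, 7.25, 9.0]
h = 1.0

def fhat(x):
    """Average of n normal pdfs, each centered at one observation."""
    return sum(NormalDist(mu=xi, sigma=h).pdf(x) for xi in x_obs) / len(x_obs)

# Riemann sum over an interval wide enough to capture essentially all the mass.
grid = np.linspace(-20.0, 30.0, 5001)
dx = grid[1] - grid[0]
values = np.array([fhat(x) for x in grid])

total_area = float(np.sum(values) * dx)
is_nonnegative = bool(np.all(values >= 0.0))
print(f"approximate integral = {total_area:.6f}, nonnegative = {is_nonnegative}")
```

The printed integral should be very close to 1, matching the derivation.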
(b) (3 points)
When using the normal kernel, it can be shown that $\hat{f}(x) > 0$ for all $x$. In other words, the support of a random variable having pdf $\hat{f}$ is unbounded. In the case of the quiz time example, this means that the KDE places positive probability on values less than 0 minutes and greater than 15 minutes, both of which are impossible. Describe how you could modify $\hat{f}$ so that it obeys the real-world bounds (greater than 0 and less than 15), while still being a probability density function?
The best approach is to set $\hat{f}(x)$ to 0 for values of $x$ that are $< 0$ or $> 15$. Because we have cut off a strictly positive amount of probability density in the two tails, we need to renormalize the truncated form of $\hat{f}$ so that it integrates to 1. This is accomplished by rescaling it, namely, dividing by the integral of $\hat{f}(x)$ from 0 to 15.
Mathematically, define $$ \tilde{f}(x) = \begin{cases} 0 & \text{if } x < 0 \\ \frac{\hat{f}(x)}{\int_{0}^{15} \hat{f}(u) du} & \text{if } 0 \leq x \leq 15 \\ 0 & \text{if } x > 15. \end{cases} $$
The function $\tilde{f}$ is a probability density with support on the interval $[0, 15]$.
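For the normal kernel, the normalizing constant has a closed form: $\int_0^{15} \hat{f}(u) du = \frac{1}{n} \sum_{i=1}^n \left[ \Phi\left(\frac{15 - x_i}{h}\right) - \Phi\left(\frac{0 - x_i}{h}\right) \right]$, where $\Phi$ is the standard normal cdf. Below is a minimal sketch of $\tilde{f}$ under that observation. The sample `x_obs` and the bandwidth are made up for illustration, and the names `ftilde` and `mass_inside` are hypothetical.

```python
from statistics import NormalDist

# Hypothetical quiz times (minutes) and bandwidth, for illustration only.
x_obs = [0.8, 3.0, 6.5, 11.0, 14.5]
h = 1.0
LOWER, UPPER = 0.0, 15.0

kernels = [NormalDist(mu=xi, sigma=h) for xi in x_obs]

def fhat(x):
    """Untruncated KDE: average of the normal pdfs."""
    return sum(k.pdf(x) for k in kernels) / len(kernels)

# Probability mass that fhat places on [LOWER, UPPER], in closed form.
mass_inside = sum(k.cdf(UPPER) - k.cdf(LOWER) for k in kernels) / len(kernels)

def ftilde(x):
    """Truncated and renormalized KDE with support [LOWER, UPPER]."""
    if x < LOWER or x > UPPER:
        return 0.0
    return fhat(x) / mass_inside

# Midpoint-rule check that ftilde integrates to (approximately) 1.
m = 15000
dx = (UPPER - LOWER) / m
area = sum(ftilde(LOWER + (i + 0.5) * dx) for i in range(m)) * dx
print(f"mass kept = {mass_inside:.4f}, integral of ftilde = {area:.6f}")
```

Observations near a boundary (such as 0.8 and 14.5 here) lose the most mass to truncation, so `mass_inside` falls noticeably below 1 and the rescaling matters.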
(c) (3 points)
Write a function that evaluates the KDE $\hat{f}$ at a given value of $x$. Your function should take as inputs a vector of observations x_data and a bandwidth parameter h and output a value fhatx. As stated in part (a), $\hat{f}$ is the average of $n$ normal pdfs, each of which has mean $x_i$ for $i = 1, 2, \ldots, n$ and standard deviation $h$.
A skeleton function is provided for you to complete. Read the documentation for scipy.stats.norm.pdf to learn what the arguments x, loc, and scale refer to. A print statement is provided to check that the output of your function matches a known case.
You ONLY need to fill in the question marks. No other coding is needed.
(Do NOT use the scipy.stats.gaussian_kde() function as we did in Homework 4. It turns out that that function does not use the bandwidth argument in the same way we saw in class, which is why the built-in Silverman's option was different from the equation we saw in class.)
In [ ]:
def evaluate_fhat(x, x_data, h):
    # Average the n normal pdfs (means x_data, standard deviation h), each evaluated at x.
    fhatx = np.mean(scipy.stats.norm.pdf(x=x, loc=x_data, scale=h))
    return fhatx
# Check your work.
print(f"fhat(x=3) = {round(evaluate_fhat(x=3, x_data=times, h=1.0), 3)} and should be 0.147.")
fhat(x=3) = 0.147 and should be 0.147.
(d) (3 points)
Fix the bandwidth parameter to be $h = 1$. Evaluate the KDE $\hat{f}$ at a grid of points x = np.linspace(0, 15, 151). Plot the resulting kernel density estimate $\hat{f}$ using plt.plot(). Label the axes of your plot.
In [ ]:
x = np.linspace(0, 15, 151)
fhatx = [evaluate_fhat(x=xi, x_data=times, h=1.0) for xi in x]
plt.plot(x, fhatx)
plt.xlabel(r"$x$")
plt.ylabel(r"$\hat{f}(x)$")
plt.title(r"Kernel Density Estimate with $h = 1.0$")
plt.show()
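The grid evaluation can also be done in a single vectorized step with NumPy broadcasting instead of a Python loop. This is only a sketch: the function name `evaluate_fhat_grid` and the small sample `demo_data` are hypothetical, and the normal pdf is written out explicitly rather than calling scipy.

```python
import numpy as np

def evaluate_fhat_grid(x_grid, x_data, h):
    """Evaluate the normal-kernel KDE at every point of x_grid at once."""
    x_grid = np.asarray(x_grid, dtype=float)
    x_data = np.asarray(x_data, dtype=float)
    # z[i, j] = (x_grid[i] - x_data[j]) / h via broadcasting.
    z = (x_grid[:, None] - x_data[None, :]) / h
    kernel_vals = np.exp(-0.5 * z**2) / (h * np.sqrt(2.0 * np.pi))
    # Average over observations (axis 1) to get one density value per grid point.
    return kernel_vals.mean(axis=1)

# Hypothetical data, for illustration only.
demo_data = [2.0, 3.5, 4.0, 7.25, 9.0]
demo_grid = np.linspace(0, 15, 151)
demo_fhat = evaluate_fhat_grid(demo_grid, demo_data, h=1.0)
print(demo_fhat.shape)
```

The intermediate array has one row per grid point and one column per observation, so this trades memory for speed. For 151 grid points and a classroom-sized dataset that cost is negligible.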
(e) (5 points)
In this part, you will generate random variates drawn from the KDE $\hat{f}$. Because $\hat{f}$ is non-parametric, the other methods we will soon learn in class for random-variate generation cannot be applied straightforwardly. There is instead a simple algorithm for drawing a random variate $X$ having distribution $\hat{f}$:
1. Pick one of the observations $x_1, x_2, \ldots, x_n$ uniformly at random. Let $j$ be the index of the selected observation.
2. Generate a normal random variate having mean $x_j$ and standard deviation $h$ where $h > 0$ is the bandwidth parameter used for KDE.
3. Set $X$ to be the value of the normal random variate.
Write a function to implement this sampling algorithm. Your function should return a single random variate X.
Read the documentation for the random package, specifically the sections on the random.randint() and random.normalvariate() functions.
In [ ]:
def sample_from_kde(x_data, h):
    n = len(x_data)
    j = random.randint(1, n)  # Generate a random integer between 1 and n inclusive.
    X = random.normalvariate(mu=x_data[j - 1], sigma=h)  # -1 is because Python indexes from 0.
    return X
print(f"X = {sample_from_kde(x_data=times, h=1)}.")
X = 8.286396370621407.
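One way to check a sampler like this is through its moments. Because $\hat{f}$ is an equally weighted mixture of $N(x_i, h^2)$ distributions, its mean equals the sample mean of the data and its variance equals the (population) variance of the data plus $h^2$. The sketch below draws many variates in one vectorized step and compares the two. The helper `sample_many_from_kde`, the seed, and `demo_data` are illustrative assumptions.

```python
import numpy as np

def sample_many_from_kde(x_data, h, size, rng):
    """Draw `size` variates from the normal-kernel KDE in one vectorized step."""
    x_data = np.asarray(x_data, dtype=float)
    # Step 1: pick an observation index uniformly at random for each draw.
    idx = rng.integers(low=0, high=len(x_data), size=size)
    # Steps 2-3: add N(0, h^2) noise to the selected observations.
    return x_data[idx] + h * rng.standard_normal(size)

rng = np.random.default_rng(seed=355)
demo_data = np.array([2.0, 3.5, 4.0, 7.25, 9.0])  # hypothetical data
h = 1.0
draws = sample_many_from_kde(demo_data, h, size=200_000, rng=rng)

# Mixture moments: mean = mean of the data, variance = population variance + h^2.
print(draws.mean(), demo_data.mean())
print(draws.var(), demo_data.var() + h**2)
```

The extra $h^2$ in the variance is a reminder that sampling from a KDE is resampling the data plus smoothing noise, so a larger bandwidth spreads the variates further from the observations.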
(f) (4 points)
Use your algorithm to generate 10,000 random variates $X_1, X_2, \ldots, X_{10000}$. Plot a histogram of $X_1, X_2, \ldots, X_{10000}$ using 100 bins and the argument density=True. Superimpose your plot of the KDE $\hat{f}$ from part (d). Comment on the shapes of the histogram and the KDE.
In [ ]:
plt.plot(x, fhatx)
X = [sample_from_kde(x_data=times, h=1.0) for _ in range(10000)]
plt.hist(X, bins=100, density=True)
plt.xlabel(r"$x$")
plt.ylabel(r"$\hat{f}(x)$ and histogram density")
plt.title("KDE and Histogram")
plt.show()
The histogram closely matches the fitted KDE, as expected, since the random variates were drawn from $\hat{f}$ itself.
Problem 2. (30 points)
The dataset service_times.csv contains 25 service times (measured in minutes).
In [ ]:
# Import the dataset.
mydata2 = pd.read_csv('service_times.csv', header=None)
# Convert the data to a list.
service_times = mydata2[mydata2.columns[0]].values.tolist()
print(service_times)
[1.88, 0.54, 1.9, 0.15, 0.02, 2.81, 1.5, 0.53, 2.62, 2.67, 3.53, 0.53, 1.8, 0.79, 0.21,
0.8, 0.26, 0.63, 0.36, 2.03, 1.42, 1.28, 0.82, 2.16, 0.05]
(a) (3 points)
Plot a histogram of the service times with an appropriate number of bins. Label your axes. Comment on the shape of the data.
In [ ]:
plt.hist(service_times, bins=5)
plt.title("Histogram of Service Times")
plt.xlabel("Service Times (min)")
plt.ylabel("Frequency")
plt.show()
Most services are completed in under 3 minutes. The data are positively skewed (right-skewed): many short service times and a few long ones.
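The visual impression of positive skew can be backed up numerically. The sketch below uses the 25 service times printed above and compares the mean with the median, then computes the sample skewness as the third central moment divided by the cubed standard deviation. The choice of the population (ddof=0) standard deviation is an assumption made for simplicity.

```python
import numpy as np

# Service times (minutes) as printed above.
service_times = [1.88, 0.54, 1.9, 0.15, 0.02, 2.81, 1.5, 0.53, 2.62, 2.67,
                 3.53, 0.53, 1.8, 0.79, 0.21, 0.8, 0.26, 0.63, 0.36, 2.03,
                 1.42, 1.28, 0.82, 2.16, 0.05]

data = np.asarray(service_times)
mean = float(data.mean())
median = float(np.median(data))

# Sample skewness: third central moment over the cubed (population) std deviation.
skewness = float(np.mean((data - mean) ** 3) / data.std() ** 3)

print(f"mean = {mean:.3f}, median = {median:.3f}, skewness = {skewness:.3f}")
```

A mean larger than the median, together with a positive skewness value, is consistent with the right-skewed shape seen in the histogram.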
(b) (5 points)
Plot the empirical cumulative distribution function (ecdf) of the service time data. Recall that the ecdf is a piecewise constant function with jumps of size $1/n$ where $n$ is the number of observations. You will want to use a function like np.sort() (see documentation) to sort the data first. Label your axes.
In [ ]:
times = np.sort(service_times)
n = len(times)
ecdf = [(i + 1) / n for i in range(n)]
plt.step(times, ecdf, where="post")
plt.ylabel(r"$\hat{F}(x)$")
plt.xlabel(r"Service Time ($x$)")
plt.title("Empirical CDF")
plt.show()
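Besides plotting it, the ecdf can be evaluated at any point $x$ as the fraction of observations less than or equal to $x$. Here is a minimal sketch using np.searchsorted(). The helper name `ecdf_at` is hypothetical, and the data are the service times printed above.

```python
import numpy as np

def ecdf_at(x, data):
    """Fraction of observations less than or equal to x."""
    sorted_data = np.sort(np.asarray(data, dtype=float))
    # side="right" counts every observation <= x, so ties are handled correctly.
    return np.searchsorted(sorted_data, x, side="right") / len(sorted_data)

# Service times (minutes) as printed above.
service_times = [1.88, 0.54, 1.9, 0.15, 0.02, 2.81, 1.5, 0.53, 2.62, 2.67,
                 3.53, 0.53, 1.8, 0.79, 0.21, 0.8, 0.26, 0.63, 0.36, 2.03,
                 1.42, 1.28, 0.82, 2.16, 0.05]

# Eight of the 25 observations are <= 0.53 (the value 0.53 appears twice).
print(ecdf_at(0.53, service_times))
```

Where an observation is repeated, as 0.53 is here, the ecdf jumps by $2/n$ rather than $1/n$ at that value.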
(c) (5 points)
Generate 1000 random variates from the ecdf by sampling with replacement 1000 times from the service times data. You will want to use a function like random.randint() (to sample a random (integer) index) or random.choices() (to directly resample). Documentation for these functions can be found here.
Plot the empirical cdf of the 1000 random variates - this will be another piecewise constant function, but with jumps of size 1/1000. Superimpose the empirical cdf (of the original service time data) you plotted in part (b). If you did things correctly, the two curves should look very similar.
In [ ]:
new_times = random.choices(times, k=1000)
new_times = np.sort(new_times)
n_new = 1000
new_ecdf = [(i + 1) / n_new for i in range(n_new)]
plt.step(new_times, new_ecdf, where="post", label="1000 RVs")
plt.step(times, ecdf, color="red", where="post", label="Original")
plt.ylabel(r"$\hat{F}(x)$")
plt.xlabel(r"Service Time ($x$)")
plt.title("Empirical CDFs of original and generated data")
plt.legend()
plt.show()
(d) (5 points)
Plot the interpolated empirical cumulative distribution function (iecdf) of the original service time data. Recall that the iecdf is a piecewise linear and continuous function with breakpoints at heights of $0, \frac{1}{n-1}, \frac{2}{n-1}, \ldots, \frac{n-2}{n-1}, 1$. Label your axes.
In [ ]:
iecdf = [i / (n - 1) for i in range(n)]
plt.plot(times, iecdf)
plt.ylabel(r"$\tilde{F}(x)$")
plt.xlabel(r"Service Time ($x$)")
plt.title("Interpolated Empirical CDF")
plt.show()
(e) (7 points)
Generate 1000 random variates from the iecdf using the two-stage procedure we discussed in class. To generate a random variate $X$,
1. Choose one of the intervals between successive (sorted) service times uniformly at random.
2. Generate a continuous uniform random variable with lower and upper bounds corresponding to the left and right endpoints of the selected interval. Set $X$ to be the value of the generated uniform random variable.
You will want to use a function like random.randint() (to sample a random (integer) index) and random.uniform() to sample uniformly from within an interval. Documentation for these functions can be found here.
Plot the empirical cdf of the 1000 random variates - this will be a piecewise constant function with jumps of size 1/1000. Superimpose the interpolated empirical cdf (of the original service time data) you plotted in part (d). If you did things correctly, the two curves should look very similar.
In [ ]:
new_iecdf_times = []
n_new = 1000
for i in range(n_new):
    # Stage 1: pick one of the n - 1 intervals between successive sorted times.
    interval_idx = random.randint(1, n - 1)
    # Stage 2: sample uniformly within the selected interval.
    X = random.uniform(times[interval_idx - 1], times[interval_idx])
    new_iecdf_times.append(X)
new_iecdf_times = np.sort(new_iecdf_times)
new_ecdf = [(i + 1) / n_new for i in range(n_new)]
plt.step(new_iecdf_times, new_ecdf, where="post")
plt.plot(times, iecdf, color="red")
plt.ylabel(r"$\hat{F}(x)$ and $\tilde{F}(x)$")
plt.xlabel(r"Service Times ($x$)")
plt.title("IECDF of original data and ECDF of variates sampled from IECDF")
plt.legend(["1000 RVs", "Original"])
plt.show()
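The two-stage procedure has the same distribution as inverse-transform sampling from the iecdf: draw $U \sim \text{Uniform}(0, 1)$ and map it back through the piecewise linear function whose breakpoints sit at heights $0, \frac{1}{n-1}, \ldots, 1$. A sketch of that alternative using np.interp() follows. It is not the method required by the assignment, and the seed and variable names are illustrative.

```python
import numpy as np

# Service times (minutes) as printed above.
service_times = [1.88, 0.54, 1.9, 0.15, 0.02, 2.81, 1.5, 0.53, 2.62, 2.67,
                 3.53, 0.53, 1.8, 0.79, 0.21, 0.8, 0.26, 0.63, 0.36, 2.03,
                 1.42, 1.28, 0.82, 2.16, 0.05]

sorted_times = np.sort(service_times)
n = len(sorted_times)
heights = np.arange(n) / (n - 1)  # 0, 1/(n-1), ..., 1

rng = np.random.default_rng(seed=355)
u = rng.random(1000)

# Inverse of the piecewise linear iecdf: interpolate from heights back to times.
iecdf_draws = np.interp(u, heights, sorted_times)
print(iecdf_draws.min(), iecdf_draws.max())
```

Choosing an interval uniformly at random corresponds to $U$ landing in one of the $n - 1$ equal-height bands, and the uniform draw within the interval corresponds to the linear interpolation inside that band.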
(f) (2 points)
Given this service time data, why might we prefer to sample from the interpolated empirical cdf instead of the empirical cdf?
In [ ]:
print(np.sort(times))
[0.02 0.05 0.15 0.21 0.26 0.36 0.53 0.53 0.54 0.63 0.79 0.8 0.82 1.28
1.42 1.5 1.8 1.88 1.9 2.03 2.16 2.62 2.67 2.81 3.53]
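Sampling from the interpolated empirical cdf allows us to generate intermediate values, whereas sampling from the empirical cdf can only reproduce the 25 observed values. For instance, we could generate service times between 0.82 and 1.28 and between 2.16 and 2.62, two wide gaps in the data where service times are plausible but none happened to be observed.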
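The difference can be seen by counting distinct values. The sketch below, with an arbitrary seed and hypothetical variable names, draws 1000 variates each way from the service times printed above and checks how many fall in the gap between 0.82 and 1.28.

```python
import random

# Service times (minutes) as printed above.
service_times = [1.88, 0.54, 1.9, 0.15, 0.02, 2.81, 1.5, 0.53, 2.62, 2.67,
                 3.53, 0.53, 1.8, 0.79, 0.21, 0.8, 0.26, 0.63, 0.36, 2.03,
                 1.42, 1.28, 0.82, 2.16, 0.05]

random.seed(355)
sorted_times = sorted(service_times)
n = len(sorted_times)

# Resampling from the ecdf can only return observed values.
ecdf_draws = random.choices(sorted_times, k=1000)

# Two-stage sampling from the iecdf fills in the gaps between observations.
iecdf_draws = []
for _ in range(1000):
    k = random.randint(1, n - 1)
    iecdf_draws.append(random.uniform(sorted_times[k - 1], sorted_times[k]))

ecdf_in_gap = sum(0.82 < x < 1.28 for x in ecdf_draws)
iecdf_in_gap = sum(0.82 < x < 1.28 for x in iecdf_draws)
print(len(set(ecdf_draws)), len(set(iecdf_draws)))
print(ecdf_in_gap, iecdf_in_gap)
```

The ecdf draws take at most 24 distinct values (25 observations with one repeated) and never land in the gap, while the iecdf draws are almost all distinct.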
(g) (3 points)
Give an example of a situation where you have data and would prefer to sample from the empirical cdf (i.e., resample your data with replacement) as opposed to fitting a parametric distribution? Explain your reasoning.
We might prefer to sample from the empirical cdf if (1) the stochastic input is modeled as a discrete random variable, (2) we have obtained a large amount of data such that we have observed most of the plausible values the random variable can take, and (3) this data does not follow the shape of a known parametric distribution. In this case, the data can be believed to resemble the unknown underlying pmf better than any common parametric distribution.
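As an illustration of this situation, the sketch below builds an empirical pmf from made-up discrete data (imagine the number of items per customer order) and resamples from it with replacement. The data, the seed, and the variable names are all hypothetical.

```python
import random
from collections import Counter

# Hypothetical discrete observations (e.g., items per customer order).
observed_counts = [1, 1, 2, 1, 3, 2, 1, 5, 2, 1, 1, 4, 2, 1, 3, 1, 2, 2, 1, 8]
n = len(observed_counts)

# Empirical pmf: relative frequency of each observed value.
empirical_pmf = {value: count / n
                 for value, count in sorted(Counter(observed_counts).items())}

# Resampling with replacement is the same as sampling from the empirical pmf.
random.seed(355)
resampled = random.choices(observed_counts, k=10_000)
resampled_pmf = {value: count / len(resampled)
                 for value, count in sorted(Counter(resampled).items())}

print(empirical_pmf)
print(resampled_pmf)
```

With only 20 observations, as in this toy example, condition (2) would not really hold. The sketch shows the mechanics only, and in practice one would want far more data before trusting the empirical pmf.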