HW10-solutions

School: University of Texas
Course: 230
Subject: Statistics
Date: Apr 3, 2024
Pages: 9
Uploaded by: sjobs3121
Homework 10 for ISEN 355 (System Simulation)
(C) 2023 David Eckman

Due Date: Upload to Canvas by 9:00pm (CDT) on Friday, April 21.

In [ ]:
    # Import some useful Python packages.
    import numpy as np
    import matplotlib.pyplot as plt
    import scipy.stats
    import pandas as pd

Problem 1. (23 points)

This problem concerns generating Beta($a=6, b=3$) random variates via acceptance-rejection.

(a) (4 points) Use scipy.stats.beta.rvs() to generate 1000 Beta random variables with parameters $a=6$ and $b=3$. Documentation for the Beta distribution in scipy.stats can be found here. Plot a histogram of the 1000 random variates with 30 bins and density=True. Superimpose the probability density function (pdf) of a Beta($a=6, b=3$) random variable.

In [ ]:
    # Generate 1000 Beta(a=6, b=3) random variates.
    beta_rvs = scipy.stats.beta.rvs(a=6, b=3, size=1000)
    plt.hist(beta_rvs, bins=30, density=True)

    # Evaluate the pdf of Beta(a=6, b=3) on a fine grid.
    x = np.linspace(0, 1, 101)
    pdf = scipy.stats.beta.pdf(x, a=6, b=3)
    plt.plot(x, pdf)
    plt.xlabel(r"$x$")
    plt.ylabel(r"Relative Frequency / pdf $f(x)$")
    plt.title(r"PDF of Beta($a=6, b=3$) and Histogram")
    plt.show()
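As an optional sanity check (not required by the problem), the sample moments of the generated variates can be compared against the known Beta moments, mean $a/(a+b)$ and variance $ab/((a+b)^2(a+b+1))$. The fixed `random_state` below is only for reproducibility of this sketch.

```python
import numpy as np
import scipy.stats

a, b = 6, 3
rvs = scipy.stats.beta.rvs(a=a, b=b, size=1000, random_state=42)

# Theoretical moments of the Beta(a, b) distribution.
mean_theory = a / (a + b)                          # = 2/3
var_theory = a * b / ((a + b) ** 2 * (a + b + 1))  # = 1/45

print(f"sample mean {np.mean(rvs):.3f} vs. theoretical {mean_theory:.3f}")
print(f"sample var  {np.var(rvs, ddof=1):.4f} vs. theoretical {var_theory:.4f}")
```

With 1000 draws, the sample mean should land within a couple of standard errors (about 0.005) of 2/3.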
(b) (5 points) Calculate $M = \max_{x \in [0, 1]} f(x)$ where $f(\cdot)$ is the pdf of the Beta($a=6, b=3$) distribution. You may use the fact that the mode of the Beta distribution is $(a-1)/(a+b-2)$ provided $a > 1$ and $b > 1$. The scipy.stats.beta.pdf() function may prove useful. As a means of double-checking your answer, plot the pdf $f$ (the same pdf you plotted in part (a)) and superimpose a horizontal line at height $M$ using matplotlib.pyplot.hlines(); documentation found here.

In [ ]:
    a = 6
    b = 3
    x_mode = (a - 1) / (a + b - 2)
    # Evaluate the pdf at the mode.
    M = scipy.stats.beta.pdf(x_mode, a=6, b=3)

    plt.plot(x, pdf)
    plt.hlines(M, 0, 1, 'r')
    plt.xlabel(r"$x$")
    plt.ylabel(r"pdf $f(x)$")
    plt.title(r"pdf of Beta($a=6, b=3$) with upper bound")
    plt.show()
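As an extra (optional) check on $M$, the closed-form value at the mode can be compared against a brute-force maximization of the pdf over a dense grid; the two should agree to several decimal places.

```python
import numpy as np
import scipy.stats

a, b = 6, 3
x_mode = (a - 1) / (a + b - 2)  # = 5/7 for a=6, b=3
M_closed = scipy.stats.beta.pdf(x_mode, a=a, b=b)

# Brute-force check: maximize the pdf over a fine grid on [0, 1].
grid = np.linspace(0, 1, 100001)
M_grid = np.max(scipy.stats.beta.pdf(grid, a=a, b=b))

print(f"M (closed form) = {M_closed:.4f}, M (grid search) = {M_grid:.4f}")
```

Both approaches give $M \approx 2.55$.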
(c) (2 points) Recall that a Beta random variable (with any values of $a$ and $b$) takes values on the interval $[0, 1]$. Using your answer from part (b), what is the expected number of Uniform[0, 1] random variates needed by an acceptance-rejection algorithm (with the choice of $M$ you specified in part (b)) to generate a single Beta($a=6, b=3$) random variate?

In [ ]:
    # From class notes, the expected number of uniforms needed to generate one RV is 2 * c,
    # where c = M * (upper bound - lower bound) is the area of the box.
    # The lower and upper bounds of a Beta rv are 0 and 1.
    c = M * (1 - 0)
    print(f"The expected number of Uniform[0, 1] random variates needed is {np.round(2 * c, 3)}.")

The expected number of Uniform[0, 1] random variates needed is 5.1.

(d) (3 points) For the acceptance-rejection method described in part (c), what is the probability that it terminates on the first iteration (i.e., that it uses only 2 Uniform[0, 1] random variates) AND returns a Beta random variate $X$ having a value less than or equal to 0.5?

\begin{align*}
P(\text{success on first iteration and } X \leq 0.5) &= P(Z_2 \leq f(Z_1) \text{ and } Z_1 \leq 0.5) \\
&= P(Z_1 \leq 0.5 \mid Z_2 \leq f(Z_1)) \times P(Z_2 \leq f(Z_1)) \\
&= F(0.5) \times (1/c)
\end{align*}

This is essentially the numerator term in the proof of the validity of the acceptance-rejection method.

In [ ]:
    print(f"The requested probability is {np.round(scipy.stats.beta.cdf(0.5, a=6, b=3) / c, 3)}.")

The requested probability is 0.057.

(e) (6 points) Write a function that generates a single Beta($a=6, b=3$) random variate via acceptance-rejection, as described in Lecture 13. You may either work directly with the random variates $Z_1$ and $Z_2$ discussed in the slides, or with the more primitive Uniform[0, 1] random variates $U_1$ and $U_2$.

In [ ]:
    def generate_beta(a, b):
        # Optional... could instead use the global variable M from part (b).
        x_mode = (a - 1) / (a + b - 2)
        M = scipy.stats.beta.pdf(x_mode, a=a, b=b)
        accept = False
        while not accept:
            # Generate 2 Uniform[0, 1] random variates.
            U1 = scipy.stats.uniform.rvs()
            U2 = scipy.stats.uniform.rvs()
            # Transform into Z1 ~ Uniform[0, 1] and Z2 ~ Uniform[0, M].
            Z1 = 0 + (1 - 0) * U1  # Z1 = U1 in this case.
            Z2 = 0 + (M - 0) * U2  # Z2 = M * U2.
            # Alternatively, directly generate Z1 and Z2:
            # Z1 = scipy.stats.uniform.rvs(loc=0, scale=1)
            # Z2 = scipy.stats.uniform.rvs(loc=0, scale=M)
            if Z2 <= scipy.stats.beta.pdf(Z1, a=a, b=b):
                # Accept Z1 as a Beta(a, b) random variate.
                return Z1  # Returning also breaks the loop.

(f) (3 points) Use your function from part (e) to generate 1000 Beta($a=6, b=3$) random variates via acceptance-rejection. Plot a histogram of the 1000 random variates with 30 bins and density=True. Superimpose the probability density function (pdf) of a Beta($a=6, b=3$) random variable.

In [ ]:
    beta_AR_rvs = [generate_beta(a=6, b=3) for _ in range(1000)]
    plt.hist(beta_AR_rvs, bins=30, density=True)

    x = np.linspace(0, 1, 101)
    pdf = scipy.stats.beta.pdf(x, a=6, b=3)
    plt.plot(x, pdf)
    plt.xlabel(r"$x$")
    plt.ylabel(r"Relative Frequency / pdf $f(x)$")
    plt.title(r"PDF of Beta($a=6, b=3$) and Histogram")
    plt.show()
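The $2c \approx 5.1$ figure from part (c) can also be verified empirically. The optional sketch below re-implements the part (e) acceptance-rejection loop with an added counter (`generate_beta_counting` is a hypothetical helper name, not from the assignment) and averages the number of uniforms consumed per accepted variate.

```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(0)

def generate_beta_counting(a, b, rng):
    """Acceptance-rejection for Beta(a, b), also counting uniforms consumed."""
    x_mode = (a - 1) / (a + b - 2)
    M = scipy.stats.beta.pdf(x_mode, a=a, b=b)
    n_uniforms = 0
    while True:
        Z1 = rng.uniform(0, 1)  # proposal point on the x-axis
        Z2 = rng.uniform(0, M)  # height drawn uniformly in the bounding box
        n_uniforms += 2
        if Z2 <= scipy.stats.beta.pdf(Z1, a=a, b=b):
            return Z1, n_uniforms

counts = [generate_beta_counting(6, 3, rng)[1] for _ in range(2000)]
print(f"average uniforms per variate: {np.mean(counts):.2f}")  # should be near 2c = 5.1
```

Over 2000 variates, the empirical average lands close to the theoretical value $2c \approx 5.1$.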
Problem 2. (27 points)

Recall the file arrival_counts.csv from Homework 8. It contains counts of the number of vehicles that arrived in 1-hour periods from 8am-4pm over the course of 30 days. The code below loads the data and stores it in a numpy ndarray, which you can think of as a matrix. Each row in the matrix corresponds to a day and each column corresponds to a 1-hour period, starting with 8am-9am in Column 0 (remember that Python indexes from 0) and 3-4pm in Column 7.

In [ ]:
    # Import the dataset.
    mydata = pd.read_csv('arrival_counts.csv')
    arrivalcounts = np.array(mydata)
    print(arrivalcounts)

[[253 380 307 179 118 174 162  77]
 [248 350 329 155 121 169 134  62]
 [285 384 310 161 103 169 135  64]
 [291 406 301 167 124 171 164  67]
 [246 396 302 169 128 155 135  77]
 [264 383 314 185 115 153 132  43]
 [256 352 337 166 133 164 137  64]
 [255 381 304 192 120 178 152  53]
 [267 396 318 176 117 166 137  48]
 [253 390 316 181 119 174 176  70]
 [266 356 322 195 118 165 142  60]
 [263 346 305 174 118 185 123  61]
 [284 360 322 166 114 159 141  74]
 [255 385 293 167 121 169 125  61]
 [259 400 313 198 108 183 133  56]
 [249 397 331 177 123 175 124  59]
 [269 367 322 175 112 145 123  54]
 [293 384 320 165 115 188 146  59]
 [256 393 328 186 122 134 125  60]
 [261 343 331 202 121 156 139  52]
 [269 376 314 194 135 154 158  71]
 [270 388 295 155 141 144 155  58]
 [239 367 311 177 116 177 148  61]
 [264 384 314 177 130 154 139  57]
 [250 382 332 175 132 192 150  60]
 [278 367 327 192 124 165 128  77]
 [262 366 339 197 124 173 151  64]
 [261 362 303 186 128 180 150  74]
 [262 345 348 169 126 173 136  66]
 [238 345 316 179 135 186 154  55]]

(a) (10 points) Write a function that generates a day's worth of vehicle arrival times from the fitted NSPP using the thinning method discussed in class. Use the estimated arrival rate function that varies by hour, i.e., the one you fitted in Problem 1(l) of Homework 8.

In [ ]:
    def parking_NSPP_thinning():
        hourly_rates = np.mean(arrivalcounts, axis=0)
        lambda_star = max(hourly_rates)
        T_star = 0
        arrival_times = []  # Accepted arrival times.
        while T_star < 8:  # Simulate arrivals up until 4pm (= 8).
            A = scipy.stats.expon.rvs(scale=1 / lambda_star)
            T_star = T_star + A
            U = scipy.stats.uniform.rvs()
            if T_star < 8:  # T_star may have gone past 8 within the loop.
                if U <= hourly_rates[int(np.floor(T_star))] / lambda_star:
                    arrival_times.append(T_star)
        return arrival_times

(b) (5 points) Run your function from part (a) 30 times to produce 30 days' worth of arrival times. For each day's worth of arrival times, use np.histogram() to count the number of arrivals in each of the eight 1-hour intervals: 8am-9am, 9am-10am, etc. Using np.mean(), calculate the mean counts for each 1-hour interval, averaged over 30 days. Plot the mean arrival count by interval as a piecewise-constant function with 8 pieces. Superimpose the fitted rate function, which is another piecewise-constant function with 8 pieces. Comment on what you see in the plot.
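Before running 30 replications, the thinning logic from part (a) can be sanity-checked on a small piecewise-constant rate function: the mean count in each interval should match the corresponding rate. The sketch below is a self-contained re-implementation with hypothetical rates chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def nspp_thinning(rates, rng):
    """Thinning for a piecewise-constant rate; rates[i] applies on [i, i+1)."""
    lambda_star = max(rates)
    t, arrivals = 0.0, []
    while True:
        t += rng.exponential(1 / lambda_star)  # candidate arrival at rate lambda*
        if t >= len(rates):
            return arrivals
        if rng.uniform() <= rates[int(t)] / lambda_star:  # accept w.p. rate(t)/lambda*
            arrivals.append(t)

rates = [10, 25]  # hypothetical hourly rates over a 2-hour horizon
counts = np.array([np.histogram(nspp_thinning(rates, rng), bins=[0, 1, 2])[0]
                   for _ in range(2000)])
print("mean counts per interval:", np.round(counts.mean(axis=0), 2))
```

Averaged over 2000 simulated days, the per-interval mean counts should be close to the rates [10, 25].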
NOTE: This problem and the rest in the assignment ask you to calculate statistics from your 30 days' worth of generated arrivals, not from the data in arrival_counts.csv.

In [ ]:
    all_arrival_times = [parking_NSPP_thinning() for _ in range(30)]
    hourly_counts = [list(np.histogram(all_arrival_times[idx], bins=np.linspace(0, 8, 9))[0])
                     for idx in range(30)]
    mean_counts = np.mean(hourly_counts, axis=0)  # Simulated data.
    hourly_rates = np.mean(arrivalcounts, axis=0)  # Original data.

    # Optional: Append a duplicate end value to make the plot look nice.
    mean_counts_app = list(mean_counts) + [mean_counts[-1]]
    hourly_rates_app = list(hourly_rates) + [hourly_rates[-1]]

    t = np.arange(9)
    plt.step(t, mean_counts_app, where='post')
    plt.step(t, hourly_rates_app, where='post')
    plt.legend(["Simulated", "Original"])
    plt.xlabel("Time of Day")
    plt.ylabel("Number of Arrivals")
    plt.title("Mean Hourly Arrival Rates/Counts")
    plt.xticks(ticks=[0, 1, 2, 3, 4, 5, 6, 7, 8],
               labels=["8a", "9a", "10a", "11a", "12p", "1p", "2p", "3p", "4p"])
    plt.show()

The mean hourly arrival counts and the fitted hourly rate function look almost identical.

(c) (4 points) Similar to what you did for Problem 1(i) on Homework 8, use np.var() to compute the variance of the 30 generated arrival counts for each 1-hour period. Use plt.scatter() to produce a scatter plot of (sample mean, sample variance) with 8 points, one for each 1-hour period. Superimpose a 45-degree line passing through the origin. Comment on whether your plot shows evidence that the arrival counts you generated satisfy the mean = variance assumption of a Poisson process.

In [ ]:
    var_counts = np.var(hourly_counts, axis=0, ddof=1)
    plt.scatter(mean_counts, var_counts)
    plt.plot([0, 400], [0, 400], 'r')
    plt.xlabel("Sample Means")
    plt.ylabel("Sample Variances")
    plt.title("Means vs Variances")
    plt.show()

The scatter points fall roughly along the 45-degree line through the origin. For a Poisson process, the number of arrivals in any interval is Poisson distributed, and so will have mean equal to variance. The scatterplot supports this, but with only 30 replications, there is still a fair amount of variability in the sample means and sample variances.

(d) (3 points) Similar to what you did for Problem 1(e-f), use the function np.corrcoef() to compute the sample correlation of the generated arrival counts. Print the correlation matrix and use the function np.round() to round the entries to 3 decimal places. Comment on what you observe in the correlation matrix and what you would expect to observe for a non-stationary Poisson process. (Hint: Think about the independent increments assumption.)

In [ ]:
    hourly_counts = np.array(hourly_counts)
    corr_matrix = np.corrcoef(hourly_counts, rowvar=False)
    print(np.round(corr_matrix, 3))

[[ 1.     0.019 -0.014  0.054 -0.075 -0.237  0.416  0.118]
 [ 0.019  1.    -0.184  0.244 -0.183 -0.432 -0.039 -0.115]
 [-0.014 -0.184  1.    -0.25  -0.026 -0.009  0.139  0.026]
 [ 0.054  0.244 -0.25   1.    -0.188 -0.143 -0.03  -0.063]
 [-0.075 -0.183 -0.026 -0.188  1.     0.226  0.031 -0.029]
 [-0.237 -0.432 -0.009 -0.143  0.226  1.    -0.039 -0.203]
 [ 0.416 -0.039  0.139 -0.03   0.031 -0.039  1.     0.024]
 [ 0.118 -0.115  0.026 -0.063 -0.029 -0.203  0.024  1.   ]]

Most off-diagonal correlations in the matrix are closer to zero than to ±1. For a Poisson process, the increments are independent, so we should expect to see a correlation matrix that is close to the identity matrix. We are seeing something like that, but there is still a fair amount of variability with only 30 replications.

(e) (5 points) For this NSPP, on average, how many Uniform[0, 1] random variates does your thinning algorithm need to generate one arrival time? You may assume that in Step 2 of the thinning method, when an exponential random variate is needed, one can be generated (via inversion) from exactly one Uniform[0, 1] random variate. (Hint: Think about the ratio of the expected number of green stars to the expected number of black stars on Slide 6 of the Lecture 14 slides.)

The number of black stars is Poisson distributed with mean $8\lambda^*$, so the expected number of black stars is $8\lambda^*$. Similarly, the number of green stars in time interval $i$ is Poisson distributed with mean $\lambda_i$, where $\lambda_i$ is the hourly rate associated with that interval. Thus the expected number of green stars in interval $i$ is $\lambda_i$, and the expected total number of green stars is $\sum_{i=1}^8 \lambda_i$. We use one Uniform[0, 1] random variate to generate each black star and another to decide whether it becomes a green star. Thus the average number of Uniform[0, 1] random variates needed to generate one arrival time is
$$ \frac{2 \times 8 \times \lambda^*}{\sum_{i=1}^8 \lambda_i}. $$

In [ ]:
    hourly_rates = np.mean(arrivalcounts, axis=0)
    lambda_star = max(hourly_rates)
    print(f"The average number of Uniform[0, 1] RVs per arrival is {round(2 * 8 * lambda_star / np.sum(hourly_rates), 3)}.")

The average number of Uniform[0, 1] RVs per arrival is 3.685.
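The formula above can be checked empirically by instrumenting the thinning loop to count uniforms: each candidate event in the horizon consumes one uniform for the exponential gap (via inversion) and one for the accept/reject step. The sketch below uses hypothetical hourly rates rather than the values from arrival_counts.csv, so it is self-contained.

```python
import numpy as np

rng = np.random.default_rng(2)
rates = [26, 37, 31, 18, 12, 17, 14, 6]  # hypothetical hourly rates, 8am-4pm
lambda_star = max(rates)

total_uniforms, total_arrivals = 0, 0
for _ in range(500):  # 500 simulated days
    t = 0.0
    while True:
        t += rng.exponential(1 / lambda_star)  # one uniform (via inversion)
        if t >= 8:
            break  # overshoot: candidate lies past 4pm, so no accept/reject step
        total_uniforms += 2  # one for the gap, one for accept/reject
        if rng.uniform() <= rates[int(t)] / lambda_star:
            total_arrivals += 1

analytic = 2 * 8 * lambda_star / sum(rates)
print(f"empirical {total_uniforms / total_arrivals:.3f} vs analytic {analytic:.3f}")
```

Over 500 simulated days the empirical ratio settles very close to the analytic value $2 \times 8 \lambda^* / \sum_i \lambda_i$.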