HW9-solutions

html

School

University of Texas *

*We aren’t endorsed by this school

Course

230

Subject

Statistics

Date

Apr 3, 2024

Type

html

Pages

Uploaded by sjobs3121

Homework 9 for ISEN 355 (System Simulation) ¶ (C) 2023 David Eckman Due Date: Upload to Canvas by 9:00pm (CDT) on Monday, April 10. In [ ]: # Import some useful Python packages. import numpy as np import matplotlib.pyplot as plt import scipy.stats Problem 1. (10 points) ¶ (a) (2 points) Recall the linear congruential generator (LCG) introduced in Lecture 11. Write a function that takes as input parameters $m$, $a$, and $c$ and a current value $X_n$ and returns the next random number in the sequence, $X_{n+1}$. In Python, the ``%'' symbol acts as the modulus operator. See Section 6.6 of the Python documentation . In [ ]: # Example of modulus with % symbol. # 5 mod 3 = 2 5 % 3 Out[ ]: 2 In [ ]: def myLCG(current_x, m, a, c): next_x = (a * current_x + c) % m return next_x (b) (5 points) As we did in the LCG example in lecture, set $m=32$, $a=11$, and $c=0$. An LCG with $m=32$ has the potential to generate numbers in the set $\{0, 1, 2, \ldots, 30, 31\}$, but as we saw in class, for this choice of $a$ and $c$, the period is less than 32. For a fixed seed, e.g., $X_0 = 1$, recursively run your algorithm until the sequence of returned numbers repeats. Take turns trying different seeds until you have discovered all cycles of numbers. List all cycles. (Note: Each number between 0 and 31 should appear exactly once in your list of cycles.) In [ ]: m = 32 a = 11 c = 0 ListOfCycles = [] repeat = set() for seed in range(m): period = 0 if seed in repeat: continue # If we've seen this number already, skip it. CurrentCycle = [] current_x = seed while current_x not in repeat: repeat.add(current_x) period = period + 1 CurrentCycle.append(current_x) current_x=myLCG(current_x, m, a, c) ListOfCycles.append(CurrentCycle) print(f"The cycle {CurrentCycle} has a period of {period}.")

The cycle [0] has a period of 1. The cycle [1, 11, 25, 19, 17, 27, 9, 3] has a period of 8. The cycle [2, 22, 18, 6] has a period of 4. The cycle [4, 12] has a period of 2. The cycle [5, 23, 29, 31, 21, 7, 13, 15] has a period of 8. The cycle [8, 24] has a period of 2. The cycle [10, 14, 26, 30] has a period of 4. The cycle [16] has a period of 1. The cycle [20, 28] has a period of 2. (c) (3 points) Repeat part (b) with the values $m=32$, $a=13$, and $c=3$. What is the period of the generator for these choices of parameters? In [ ]: m = 32 a = 13 c = 3 ListOfCycles = [] repeat = set() for seed in range(m): period = 0 if seed in repeat: continue # If we've seen this number already, skip it. CurrentCycle = [] current_x = seed while current_x not in repeat: repeat.add(current_x) period = period + 1 CurrentCycle.append(current_x) current_x = myLCG(current_x, m, a, c) ListOfCycles.append(CurrentCycle) print(f"The cycle {CurrentCycle} has a period of {period}.") The cycle [0, 3, 10, 5, 4, 23, 14, 25, 8, 11, 18, 13, 12, 31, 22, 1, 16, 19, 26, 21, 20, 7, 30, 9, 24, 27, 2, 29, 28, 15, 6, 17] has a period of 32. Problem 2. (5 points) ¶ For your simulation model of ETB, how many random numbers do you expect you might need to simulate one day of activity in the building? You may assume that for each random variate needed by your simulation, it can be transformed from one Uniform(0, 1) random variate. Explain in detail how you arrived at your answer. A simulation model of ETB would need to generate random numbers for the interarrival times of students, the schedules of students, the attendance of students to the classes, among other things. Using the data we have, we can obtain the expected number of individuals entering ETB each day; call this quantity X. To generate the arrival times of the X students, we would need at least X random numbers. (If using inversion to generate arrivals from a NSPP, which we didn't discuss in class, this would be true. If using thinning, the number of random numbers needed would depend on the peak of the arrival rate function, because we generate arrivals at the fastest rate.) For these X students, we might need to generate their daily class schedules, which would feature a random number of classes (likely less than or equal to 4) and random selections of classes. As an upper bound, we might need as many as 5X random numbers: 5 for each student (1 to generate a number 1-4 and then that many of numbers to pick the classes). We also need random numbers to handle the choice of which entrance to use ( 1X), taking the elevators or stairs (upper bounded by 1X), and

how long to spend studying (upper bounded by 3X, assuming no one has more than 3 study periods in a day). If the system is more complicated, more variates might be needed, such as for the activities of non-students (e.g., whether they take an elevator, to which floor they go, how long they spend in the building). From the analysis above, we might be looking at around 11X random numbers. If X is around 1500, for example, we might need about 16500 random numbers to simulate one day of operations in ETB. Answers will vary greatly depending on what level of detail you intend to have in your Simio model. Problem 3. (12 points) ¶ (a) (3 points) Print the first 5 Uniform[0, 1] random variates generated by scipy.stats.uniform.rvs when the underlying pseudo-random number generator is seeded with initial state 1234. Read the documentation of the scipy.stats.uniform class and the random_state argument to see how the size and random_state arguments should be set. In [ ]: random_variates = scipy.stats.uniform.rvs(size=5, random_state=1234) print(f"The first 5 random variates are: {random_variates}.") The first 5 random variates are: [0.19151945 0.62210877 0.43772774 0.78535858 0.77997581]. (b) (4 points) With the generator again seeded at state 1234, generate 10,000 Uniform[0, 1] random variates and plot a histogram of their values using 100 bins. Comment on the shape of the histogram. In [ ]: random_variates = scipy.stats.uniform.rvs(size=10000, random_state=1234) plt.hist(random_variates, bins=100) plt.xlabel(r"Random Variate ($U$)") plt.ylabel("Frequency") plt.title("Histogram of 10,000 Uniform[0, 1] RVs") plt.show()

Your preview ends here

Eager to read complete document? Join bartleby learn and gain access to the full version

Access to all documents
Unlimited textbook solutions
24/7 expert homework help

We should expect to see about 100 Uniform[0, 1] random variates in each bin. The histogram is roughly uniform across the bins, though we can see some bin counts that vary from 100 by about as much as 20. (c) (3 points) Perform a Chi-Square test with 100 equiprobable bins to test whether the data (the 10,000 random variates) come from a Uniform[0, 1] distribution. Report the $p$-value of the test and interpret the conclusion at a significance level of $\alpha = 0.1$. In [ ]: expected = [100] * 100 observed,_ = np.histogram(random_variates, bins=np.linspace(0, 1, 101)) _, p_value = scipy.stats.chisquare(f_obs=observed, f_exp=expected) print(f"The p-value of the Chi-Square test is {round(p_value, 4)}.") The p-value of the Chi-Square test is 0.6886. The p-value is larger than the significance level of $\alpha = 0.1$, therefore we fail to reject the null hypothesis. This means we cannot rule out that the random variates come from a Uniform[0, 1] distribution. (d) (2 points) Perform a K-S test to test whether the data come from a Uniform[0, 1] distribution. Report the $p$-value of the test and interpret the conclusion at a significance level of $\alpha = 0.1$. In [ ]: _, p_value_KS = scipy.stats.kstest(random_variates, scipy.stats.uniform.cdf) print(f"The p-value of the K-S test is {round(p_value_KS, 4)}.") The p-value of the K-S test is 0.8065. The p-value is larger than the significance level of $\alpha = 0.1$, therefore we fail to reject the null hypothesis. This means we cannot rule out that the random variates come from a Uniform[0, 1] distribution.

Problem 4. (11 points) ¶ (a) (5 points) In lecture, we derived an expression for generating a Exponential($\ lambda$) random variate from a Uniform[0, 1] random variate. Use this transformation to generate 1000 Exponential ($\lambda=2$) random variates. Plot a histogram with 30 bins and density=True . Superimpose the probability density function (pdf) of an Exponential($\lambda=2$) random variable. Comment on the fit. In [ ]: lambd = 2 # Generate 1000 exponential(2) RVs via inversion. # Inversion formula is X = (-1/lambda)*ln(1-U) rand_exponential_variates = (-1 / lambd) * np.log(1 - np.random.uniform(size=1000)) plt.hist(rand_exponential_variates, bins=30, density=True) x = np.linspace(0, 3, 100) plt.plot(x, scipy.stats.expon.pdf(x, scale=1/lambd)) plt.xlabel("x") plt.ylabel(r"Normalized frequency / $f(x)$") plt.title(r"PDF of Exponential($\lambda=2$) and Histogram") plt.show() The histogram and pdf closely align over the range [0, 3]. This suggests the inversion method is properly generating Exponential($\lambda=2$) random variates. (b) (2 points) Perform a K-S test to test whether the data (the 1000 random variates) come from an Exponential($\lambda=2$) distribution. Report the $p$-value of the test and interpret the conclusion at a significance level of $\alpha = 0.1$. In [ ]: my_expon = scipy.stats.expon(scale=1/2) # Scale = 1/lambda _, p_value_KS = scipy.stats.kstest(rand_exponential_variates, my_expon.cdf) print(f"The p-value of the K-S test is {round(p_value_KS, 4)}.")

The p-value of the K-S test is 0.4572. The p-value is larger than the significance level of $\alpha = 0.1$, therefore we fail to reject the null hypothesis. This means we cannot rule out that the random variates come from an Exponential($\lambda=2$) distribution. (c) (4 points) In class, we have used a Q-Q plot to compare a sample to a theoretical distribution. A Q-Q plot can also be used to compare two samples of equal size by pairing their empirical quantiles. Use scipy.stats.expon.rvs() to generate 1000 Exponential($\lambda=2$) random variates. Produce a two-sample Q-Q plot to compare your random variates from part (a) with those generated in this part. Plot a 45 degree line passing through the origin. Comment on the quality of the fit. In [ ]: expodata = scipy.stats.expon.rvs(size=1000, scale=1/2) expodata = np.sort(expodata) rand_exponential_variates = np.sort(rand_exponential_variates) #emp_quantiles = np.percentile(rand_exponential_variates, np.linspace(0,100, 101)) #theoretical_quantiles = np.percentile(expodata, np.linspace(0,100, 101)) #plt.plot(theoretical_quantiles, emp_quantiles, 'o') plt.scatter(expodata, rand_exponential_variates) plt.plot([0, 5], [0, 5], 'r') plt.xlabel('scipy.stats Exponential RVs') plt.ylabel('Inversion Exponential RVs') plt.show() Most points are fairly close to 45 degree line. There are some deviations in the right- tail, which is to be expected because the exponential distribution can take high values with low probability. The plot has 1000 points and the deviations are only present in the last 20 or so points. The Q-Q plot suggests that the inversion method is generating random variates having the correct distribution.

Your preview ends here

Eager to read complete document? Join bartleby learn and gain access to the full version

Access to all documents
Unlimited textbook solutions
24/7 expert homework help

Problem 5. (12 points) ¶ For this problem, consider a Burr Type XII distribution, which has a cumulative distribution function $$ F(x) = \begin{cases} 1 - (1+x^c)^{-d} & \text{if } x > 0 \\ 0 & \text{otherwise.}\ end{cases}$$ for $x > 0$ where $c > 0$ and $d > 0$ are parameters of the distribution (a) (4 points) Derive a general expression (in terms of $c$ and $d$) for the inverse cdf, $F^{-1}$. In particular, give an expression of the form $X = F^{-1}(U)$, so that it could be used to transform a Uniform[0, 1] random variate $U$ into a Burr Type XII random variate $X$. Via inversion, we have $$\begin{align*} u &= 1 - (1+x^c)^{-d} \\ (1+x^c)^{-d} &= 1 - u \\ 1+x^c &= (1-u)^{-1/d} \\ x^c &= (1-u)^{-1/d}-1 \\ x &= ((1-u)^{-1/d}- 1)^{1/c}. \end{align*}$$ Thus to generate a Burr Type XII random variate, we should use the expression $X = ((1-U)^{-1/d} -1)^{1/c}$. (b) (3 points) Derive the pdf of a Burr Type XII distribution with arbitrary parameters $c$ and $d$. Use calculus and show your work. (Hint: If you having trouble knowing where to start, refer to the Useful Fact on Slide 5 of the Lecture 4 slides.) The pdf is the derivative of the cdf, so $$\begin{align*} f(x) &= \frac{d}{dx} F(x) \\ &= \frac{d}{dx} (1 - (1 + x^c)^{-d}) \\ &= \frac{d}{dx} (-(1 + x^c)^{-d}) \\ &= d(1 + x^c)^{-d - 1} \frac{d}{dx} (1 + x^c) \\ &= cd(1 + x^c)^{-d - 1}{x^{c-1}} \\ &= cd \frac{x^{c-1}}{(1 + x^c)^{d+1}} \end{align*}$$ Thus we have $$ f(x) = \begin{cases} cd \frac{x^{c-1}}{(1 + x^c)^{d+1}} & \text{ if } x \geq 0 \\ 0 & \text{ otherwise.} \end{cases} $$ (c) (5 points) Using your inversion expression from part (a), generate 1000 Burr Type XII random variates with parameters $c=3$ and $d=1$. Plot a histogram of the 1000 random variates with 30 bins and density=True . Superimpose the pdf of the associated Burr XII distribution over an appropriate range of $x$ values. Comment on the fit. In [ ]: c = 3 d = 1 U = scipy.stats.uniform.rvs(size=1000) # Use inversion expression from part (a). X = ((1 - U)**(-1 / d) - 1)**(1 / c) x = np.linspace(0, 5, 1000) # Use pdf expression from part (b). fx = (c * d * (1 + x**c)**(-d - 1) * x**(c - 1)) plt.hist(X, bins=30, density=True) plt.plot(x, fx) # Restrict the x-axis range. plt.xlim([0, 10]) plt.xlabel(r"$x$") plt.ylabel(r"Normalized Frequency / $f(x)$") plt.title(r"Burr Type XII ($c=3$, $d=1$) PDF and Histogram") plt.show()

The histogram and pdf closely align over the range [0, 10]. This suggests the inversion method is properly generating Burr Type XII($c=3$, $d=1$) random variates.