Assignment 2 (5%)

School: Western University
Course: Statistics 2143
4/30/2021 Assignment 2 (5%) https://owl.uwo.ca/access/content/attachment/9155fbbb-4ff5-4431-b269-d36c50cda88c/Announcements/82d4674e-7831-486c-9531-3212e6761e37/S… 1/6

Instructions

Submit one PDF document per team with the names and student numbers of all members. The project is due Sunday, March 7 (10:00 PM), and is to be submitted via Gradescope.

In this assignment you will conduct an empirical analysis of earthquakes occurring in Greece between January 1, 1901 and December 31, 2017. You will use distributions introduced in class to model earthquake magnitudes, locations, and frequencies.

Download the dataset “project2_data.csv” from the Announcements section of the OWL course site. The data contain 1173 earthquake observations of the 4 variables listed below:

Date: the date of the observed earthquake
Latitude: latitude coordinate in the Northern Hemisphere
Longitude: longitude coordinate in the Eastern Hemisphere
Magnitude: earthquake magnitude as measured on the Richter scale

Answer each of the questions below in full sentences accompanied by reproducible code from the software of your choice (e.g. Excel, RStudio, Python).

Question 1 (2 points)

Compute $\hat{\mu}$, an estimate of the mean $\mu$ of earthquake magnitudes. (1 point)
Provide confidence intervals for $\mu$ with confidence levels of 90%, 95%, and 99%. (1 point)

Answer:

    # Assumed loading step (not shown in the original): read the dataset and
    # attach it so that columns such as Magnitude can be referenced directly
    data <- read.csv("project2_data.csv")
    attach(data)

    # Average magnitude
    muhat <- mean(Magnitude)
    # Number of observations
    nobs <- nrow(data)
    # Standard error of the average magnitude
    muhat_StdErr <- sd(Magnitude)/sqrt(nobs)
    # Confidence levels
    level <- c(.9,.95,.99)
    CI <- matrix(nrow = 3, ncol = 2)
    colnames(CI) <- c("Lower bound","Upper bound")
    rownames(CI) <- level
    for (i in 1:3){
      # Compute the z-score exactly with software:
      # 'qnorm' is the inverse CDF of the normal distribution, i.e. the values listed in a z-table
      zalpha <- qnorm(1-(1-level[i])/2)
      # Compute confidence interval
      CI[i,] <- muhat + c(-zalpha, zalpha)*muhat_StdErr
    }

A sample estimate of the mean of earthquake magnitudes is 5.4352941. The confidence intervals are printed below:
    print(CI)

    ##      Lower bound Upper bound
    ## 0.9     5.412228    5.458360
    ## 0.95    5.407810    5.462779
    ## 0.99    5.399173    5.471415

As expected, the widths of the confidence intervals increase with the confidence level.

Here, some students simply made a typing error with their calculators or forgot digits when writing down their results. Other students rounded intermediate results so crudely that their final answer was far from the exact answer. Never round your answers unless explicitly stated otherwise. Using software also allows you to avoid accumulating errors in this way. I accepted answers from students who used a z-table to obtain approximate z-scores.

Question 2 (5 points)

For this question, assume that the latitude and longitude variables are governed by independent Normal distributions such that $\Phi \sim N(\mu_\Phi, \sigma_\Phi^2)$ and $\Lambda \sim N(\mu_\Lambda, \sigma_\Lambda^2)$. The joint density of $(\Phi, \Lambda)$ is denoted $f(\phi, \lambda)$.

2.1 Estimate the four Normal parameters using sample averages and (adjusted) variances. (2 points)

Answer:

    # Sample mean and standard deviation of Latitude
    muPhi <- mean(Latitude)
    sdPhi <- sd(Latitude)
    # Sample mean and standard deviation of Longitude
    muLambda <- mean(Longitude)
    sdLambda <- sd(Longitude)

We find that $\hat{\mu}_\Phi = 37.9273913$, $\hat{\mu}_\Lambda = 23.8392583$, $\hat{\sigma}_\Phi = 2.0285719$, and $\hat{\sigma}_\Lambda = 2.9541047$.

2.2 Observe that the capital of Greece, Athens, is approximately located at $(\phi, \lambda) = (38, 23.75)$, and that the second most populated city, Thessaloniki, is approximately located at $(\phi, \lambda) = (40.65, 22.9)$. Given the event of an earthquake, compute the probability that it occurs at the Athens coordinates, i.e. $P(\Phi = 38, \Lambda = 23.75)$, and the probability that it occurs at the Thessaloniki coordinates, i.e. $P(\Phi = 40.65, \Lambda = 22.9)$. (1 point)

Answer: The probability of a point event in a continuous space is always zero. For a continuous random variable $X$ with density $f_X$, the probability $P(a \le X \le b)$ can be thought of as the area under the curve between two points $a$ and $b$ such that $a \le b$, or the definite integral

$$P(a \le X \le b) = \int_a^b f_X(x)\,dx.$$
The probability of a point event can then be expressed as

$$P(X = a) = \int_a^a f_X(x)\,dx = 0.$$

The same kind of idea can be applied to a 2-dimensional case like latitude and longitude. The probability of an earthquake at any exact point is zero.

2.3 Given the event of an earthquake, what is the likelihood that it occurs at the Athens coordinates, i.e. $f(38, 23.75)$? What is the likelihood that it occurs at the Thessaloniki coordinates, i.e. $f(40.65, 22.9)$? Which is more likely? (2 points)

Hint: For 2.2 and 2.3, think about the difference between probability and density.

Answer: Since the latitude and longitude are assumed independent, we can express the joint density as the product of the marginal densities.

    # Compute joint density as the product of normal densities with 'dnorm'
    # Athens likelihood
    fAthens <- dnorm(38, muPhi, sdPhi) * dnorm(23.75, muLambda, sdLambda)
    # Thessaloniki likelihood
    fThessaloniki <- dnorm(40.65, muPhi, sdPhi) * dnorm(22.9, muLambda, sdLambda)

The likelihood of an earthquake occurring in Athens is 0.1965357 × 0.1349851 = 0.0265294, and 0.0799041 × 0.1283903 = 0.0102589 in Thessaloniki. Earthquakes are therefore more likely in Athens.

Question 3 (10 points)

We hereafter restrict our attention to strong earthquakes with magnitudes strictly greater than 6 Richter. Consider a random variable $N_t$ that counts the number of earthquakes with magnitude greater than 6 Richter over $t$ years. Under the assumptions that interarrival times are independent and have the memoryless property, we model $N_t$ as a $\text{Poisson}(\theta t)$, where $\theta$ is an annual intensity parameter.

3.1 Estimate $\theta$ given the historical data. (1 point)
What is the probability that at least one earthquake occurs in the next six months? (1 point)
In the next year? (1 point)
In the next decade, what is the probability that fewer than 5 earthquakes occur? (1 point)
Fewer than 10? (1 point)

Answer:
    # Number of earthquakes with magnitude greater than 6
    NbrQuake <- sum(Magnitude > 6)
    # Number of years
    NbrYears <- 2017 - 1901 + 1
    # Estimated annual intensity parameter
    theta <- NbrQuake/NbrYears

In the provided data, there are 129 earthquakes with a magnitude greater than 6. The dataset covers a period of 117 years (January 1, 1901 to December 31, 2017). The resulting estimate of $\theta$ is $\hat{\theta} = 129/117 = 1.1025641$.

Many students used the time in days between the first and last earthquakes in the dataset, or simply miscounted the number of years. Subsequent errors were not penalized if they were consistent with the erroneous intensity parameter.

    # Prob. of at least one earthquake in the next six months (0.5 years)
    # 'ppois' is the CDF of a Poisson distribution
    1-ppois(0,theta*0.5)
    ## [1] 0.4237894

    # in the next year
    1-ppois(0,theta*1)
    ## [1] 0.6679813

The probability of at least one earthquake in the next six months is 0.4237894. The probability of at least one earthquake in the next year is 0.6679813.

    # Prob. of fewer than 5 earthquakes in the next decade
    ppois(4,theta*10)
    ## [1] 0.01484547

    # fewer than 10
    ppois(9,theta*10)
    ## [1] 0.3377344

The probability that fewer than 5 earthquakes occur in the next decade is 0.0148455. The probability that fewer than 10 earthquakes occur in the next decade is 0.3377344.

3.2 Consider the time $T$ (in years) until the next earthquake from today. What is the distribution family of $T$? (1 point)
Given that the previous earthquake just occurred, how much time is expected to pass before the next? (1 point)
Given that the previous earthquake occurred exactly one year ago, how much time is expected to pass before the next? (1 point)

Answer: Assuming that the number of earthquakes over a period of $t$ years can be modeled as $N_t \sim \text{Poisson}(\theta t)$, the interarrival times (in years) can be modeled as $\text{Exponential}(\theta)$. By virtue of the memoryless property of the exponential distribution, it immediately follows that the expected time until the next earthquake does not depend on when the last earthquake occurred. The expected amount of time before the next earthquake is therefore constant at $1/\hat{\theta} = 0.9069767$ years.

For the first part of 3.2, many students said that the time between Poisson events was also Poisson distributed. This directly contradicts the fact that the Poisson distribution is discrete. As a counting process, the values taken by a Poisson random variable are $\{0, 1, 2, \dots\}$, whereas random interarrival times between Poisson events are continuous in $[0, \infty)$. To sum up, the number of earthquakes follows a Poisson distribution, whereas the time between earthquakes follows an Exponential distribution. Other students simply forgot to give an answer for the distribution family, and only computed the expected values.

For the last part of 3.2, some students integrated the probability density function from 1 to infinity. I believe that their intention was to compute the conditional expectation $E[T \mid T > 1]$ of the waiting time $T$, but only the term in red was computed.

3.3 Consider the time $T_k$ (in years) of the $k$th earthquake from today. What is the distribution family of $T_k$? (1 point)
How much time is expected to pass before three more earthquakes occur? (1 point)

Answer: Assuming that the interarrival times are modeled as $\text{Exponential}(\theta)$, the sum of $k$ such interarrival times can be modeled as $\text{Gamma}(k, \theta)$. The expected time until the third earthquake from now is $3/\hat{\theta} = 2.7209302$ years.
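The answers to 3.2 and 3.3 can be checked numerically. The following is a minimal sketch, not part of the original solution: it assumes the estimate of theta is rebuilt from the counts reported in the text (129 strong earthquakes over 117 years) so that the block runs on its own, and the variable names are illustrative. It uses numerical integration to check that the expected remaining wait, given that one year has already passed, equals the unconditional expected wait.

```r
# Assumption: theta is rebuilt from the counts reported in the text
# (129 earthquakes with magnitude > 6 over 117 years)
theta <- 129 / 117

# Expected wait until the next earthquake: mean of Exponential(theta)
expected_next <- 1 / theta

# Memoryless check: expected remaining wait E[T - 1 | T > 1]
# Numerator: integral of (t - 1) * f(t) over t > 1
num <- integrate(function(t) (t - 1) * dexp(t, rate = theta),
                 lower = 1, upper = Inf)$value
# Denominator: P(T > 1), the survival function evaluated at 1
den <- pexp(1, rate = theta, lower.tail = FALSE)
expected_after_one_year <- num / den

# Expected time until the third earthquake: mean of Gamma(shape = 3, rate = theta)
expected_third <- 3 / theta

print(c(expected_next, expected_after_one_year, expected_third))
```

Up to integration error, the first two values should agree, which is the memoryless property at work; the third is three times the first.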
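Many students said that a random variable defined as the sum of $k$ exponential random variables with parameter $\theta$ could be modeled as $\text{Exponential}(\theta/k)$. Although this leads to the correct expected value $k/\theta$, it is not the same distribution as a $\text{Gamma}(k, \theta)$. To see this, notice that the variances of the two distributions differ:

$$\operatorname{Var}\left[\text{Exponential}(\theta/k)\right] = \frac{k^2}{\theta^2} \neq \frac{k}{\theta^2} = \operatorname{Var}\left[\text{Gamma}(k, \theta)\right] \quad \text{for } k > 1.$$

General Comments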
Always work with maximum precision, and simply copy and paste your answers with all the significant digits printed by your software. If you are using a scientific calculator, store intermediate values in memory for subsequent steps.

Some students consider the use of software an obstacle, but a small computational effort saves a lot of time and prevents potential mistakes. Many of the calculations in this assignment require only one or two simple commands, yet many students still carry out tedious calculations by hand. For instance, tasks like evaluating CDFs can be very time-consuming and prone to error.

All in all, this assignment was well done, with a median score of 70.59%, a mean of 73.01%, and a standard deviation of 12.21%. I am very happy to see that most of you had a solid intuition about the relation between the Poisson, Exponential, and Gamma distributions. I hope that this assignment gave you a sense of the practical applications made possible by these statistical models. Ideas for real-world data to use in future assignments are welcome!