STAT847_W24_Reading2


School

University of Waterloo

Course

847

Subject

Statistics

Date

Apr 3, 2024


Uploaded by DoctorMoonCrocodile37

Stat 847 - Reading Assignment 2: Model Selection and Averaging
DUE: Friday February 16, 2024 by 11:59pm Eastern

NOTES
Your assignment must be submitted by the due date listed at the top of this document, and it must be submitted electronically in .pdf format via Crowdmark. Organization and comprehensibility are part of a full solution. Consequently, points will be deducted for solutions that are disorganized or incomprehensible. Furthermore, if you submit your assignment to Crowdmark but do so incorrectly in any way (e.g., you upload your Question 2 solution in the Question 1 box), you will receive a 5% deduction (i.e., 5% of the assignment’s point total will be deducted from your point total).

Reading: Model Selection and Model Averaging [12 marks]
Open the UWaterloo Library website, lib.uwaterloo.ca, and use your WatIAM account to search for and open the book Model Selection and Model Averaging by Gerda Claeskens and Nils Lid Hjort. The following questions can be answered by reading the chapter “1 - Model Selection: data examples and introduction”. Please put your answers to questions 1-3, 4-6, and 7-8 on three separate pages; this can be done in Word with Ctrl + Enter, or in Markdown with \newpage.

In Section 1.1 “Introduction”, answer the following with quotes or your own words (your choice).

1. (1 mark) What is George Box’s maxim about models?
Answer: “All models are wrong, but some are useful.” The true model that actually generated the collected data is usually very complex and unknown, so our purpose is to “guess” an almost-as-good model that can help us make predictions or analyses regarding the data-generating mechanism.

2. (1 mark) Give an example of the principle of parsimony in action in statistical modelling.
Answer: In statistical modelling, following the principle of parsimony, or the “less is more” approach, means picking the simplest model that does a good job of explaining our data.
For instance, suppose we are trying to predict how well students do on a test based on their study hours. If a straight line (a linear model) captures the trend well, there is no need to complicate things by curving the line (adding more parameters in a polynomial model) unless doing so really helps us predict better. In other words, we choose the lowest-degree polynomial that sufficiently captures the relationship between the variables, avoiding overfitting by not including unnecessary higher-degree terms.

3. (1 mark) In one sentence, why would you employ model averaging?
Answer: Model averaging is employed to combine the strengths of several top-performing models and to account for the uncertainty about which of them is best, improving prediction accuracy when no single model clearly outperforms the others.
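The parsimony idea in question 2 and the averaging idea in question 3 can both be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions, not the book’s method: it assumes Gaussian errors so that AIC can be computed from the residual sum of squares, uses AIC (one of several criteria the book covers) to pick a polynomial degree, and uses smoothed AIC weights, one common weighting scheme, to average the candidate models’ predictions. The function names and the study-hours data are hypothetical.

```python
import numpy as np


def polynomial_aic(x, y, degree):
    """AIC of a least-squares polynomial fit, assuming Gaussian errors.

    Additive constants shared by every candidate model are dropped,
    so only differences between AIC values are meaningful.
    """
    coefs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coefs, x)
    rss = float(np.sum(residuals ** 2))
    n = len(y)
    k = degree + 2  # (degree + 1) coefficients plus the error variance
    return n * np.log(rss / n) + 2 * k, coefs


def akaike_weights(aics):
    """Smoothed AIC weights: w_j proportional to exp(-(AIC_j - min AIC) / 2)."""
    aics = np.asarray(aics, dtype=float)
    raw = np.exp(-(aics - aics.min()) / 2)
    return raw / raw.sum()


def select_and_average(x, y, degrees, x_new):
    """Return the AIC-selected degree, the model weights, and the
    weight-averaged prediction at x_new across all candidate degrees."""
    fits = [polynomial_aic(x, y, d) for d in degrees]
    aics = [aic for aic, _ in fits]
    best_degree = degrees[int(np.argmin(aics))]  # parsimony via AIC penalty
    weights = akaike_weights(aics)
    preds = np.array([np.polyval(coefs, x_new) for _, coefs in fits])
    averaged = float(np.dot(weights, preds))     # model averaging
    return best_degree, weights, averaged


# Hypothetical study hours vs. test score; a deterministic alternating
# +/-0.1 pattern stands in for random error so the run is reproducible.
hours = np.arange(20, dtype=float)
noise = 0.1 * (-1.0) ** np.arange(20)
scores = 50 + 2 * hours + noise
best, weights, pred = select_and_average(hours, scores, [1, 2, 3], 10.0)
print(best, np.round(weights, 3), round(pred, 3))
```

Because the underlying trend here is a straight line, the extra quadratic and cubic terms barely reduce the residual sum of squares, so the AIC penalty favours degree 1, while the averaged prediction still gives some weight to the higher-degree fits instead of discarding them outright.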
In Section 1.3 “Who wrote ‘The Quiet Don’?”, answer the following in your own words.

4. (1 mark) What are the three data sets used in this model selection problem?
Answer:
- Sh: published work by Sholokhov
- Kr: published work by Kriukov
- QD: the text of The Quiet Don
Each corpus contained 50,000 words.

5. (2 marks) What are the three competing models that were suggested?
Answer:
- Model M1: Sholokhov is the rightful author, meaning the text corpora Sh (Sholokhov’s known works) and QD (The Quiet Don) come from the same statistical distribution, while Kr (Kriukov’s works) represents a different one.
- Model M2: D and Solzhenitsyn were correct in their claim that Sholokhov is not the author, which means Sh is not statistically compatible with QD, but QD and Kr do come from the same distribution.
- Model M3: Sh, Kr, and QD represent three statistically different corpora.

6. (3 marks) How were the models compared to each other? (A vague description in your own words is good. Details about the methodology aren’t necessary, but do mention what data specifically was important.)
Answer: The models were compared statistically, using Pearson residuals and chi-squared tests to measure how well each model’s predicted sentence lengths agreed with the lengths actually observed in ‘The Quiet Don’. The analysts counted the number of words per sentence in the novel and checked this against what each model would predict from the sentence lengths typical of Sholokhov’s and Kriukov’s known works. This approach helped determine whether ‘The Quiet Don’ shared a writing style closer to one of the two authors or represented a different style altogether. The critical data here was the distribution of sentence lengths (the word count of each sentence), which provided a quantitative basis for comparing the models.
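One way to make the comparison in question 6 concrete is a Pearson chi-squared test of homogeneity on binned sentence-length counts: if two corpora come from the same distribution, their counts should differ only by chance. The sketch below is illustrative only. The bins and counts are hypothetical (not the data from the book), the function name is made up, and it is not necessarily the exact procedure Claeskens and Hjort use. A small statistic for Sh versus QD together with a large one for Kr versus QD would favour M1, the reverse would favour M2, and large values for both would point towards M3.

```python
import numpy as np


def pearson_homogeneity(*rows):
    """Pearson chi-squared homogeneity statistic for an R x K table of counts.

    Each row holds one corpus's counts of sentences in each length bin.
    Returns the statistic, its degrees of freedom, and the table of
    Pearson residuals (observed - expected) / sqrt(expected).
    """
    table = np.array(rows, dtype=float)
    row_totals = table.sum(axis=1, keepdims=True)
    col_totals = table.sum(axis=0, keepdims=True)
    # Expected counts if every row shared the same bin probabilities.
    expected = row_totals * col_totals / table.sum()
    residuals = (table - expected) / np.sqrt(expected)
    chi2 = float(np.sum(residuals ** 2))
    dof = (table.shape[0] - 1) * (table.shape[1] - 1)
    return chi2, dof, residuals


# Hypothetical counts of sentences in five length bins
# (1-5, 6-10, 11-15, 16-20, 21+ words); NOT the book's data.
sh = [50, 120, 110, 70, 50]
kr = [120, 140, 80, 40, 20]
qd = [55, 115, 105, 75, 50]

chi2_sh_qd, dof, _ = pearson_homogeneity(sh, qd)  # expected to be small if M1 holds
chi2_kr_qd, _, _ = pearson_homogeneity(kr, qd)    # expected to be small if M2 holds
print(f"Sh vs QD: {chi2_sh_qd:.2f} on {dof} df")
print(f"Kr vs QD: {chi2_kr_qd:.2f} on {dof} df")
```

The residual table shows which length bins drive any disagreement, which is the role Pearson residuals play in the answer above.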
In Section 1.6 “Football match prediction”, answer the following in your own words.

7. (2 marks) Describe the dataset being used in this problem.
Answer: The dataset for predicting football match outcomes pulls together results from major tournaments between 1998 and 2006: the 1998 World Cup in France, the 2000 European Cup in Belgium and the Netherlands, the 2002 World Cup in Korea and Japan, the 2004 European Cup in Portugal, and the 2006 World Cup in Germany. Each World Cup contributes 64 matches among 32 national teams, and each European Cup contributes 31 matches among 16 teams. The dataset also includes FIFA’s official team rankings from just before each tournament began, giving an indication of each team’s strength and expected performance. This combination of past match results and rankings is used to predict the outcomes of future football matches.

8. (1 mark) Figure 1.5 shows the distribution of football scores, which are whole numbers. Why are there clouds of data points instead of values only at the whole numbers?
Answer: The match results are “jittered” to make the individual results visible. Since many matches end with the same score, plotting them directly would stack the points on top of each other and hide how many matches had each score. Jittering adds a small amount of random noise to each whole-number result, spreading the points around that number so that individual data points become distinguishable.
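The jittering described in question 8 is simple to reproduce. The sketch below assumes uniform noise of at most ±0.2 added independently to each coordinate (the width actually used for Figure 1.5 is not stated here, so 0.2 is an arbitrary choice); the function name and the scores are hypothetical. Keeping the width below 0.5 guarantees that each jittered point stays closer to its true score than to any other whole number.

```python
import random


def jitter(values, width=0.2, seed=None):
    """Return a copy of `values` with independent uniform noise in [-width, width] added."""
    rng = random.Random(seed)  # seeded generator so the plot is reproducible
    return [v + rng.uniform(-width, width) for v in values]


# Hypothetical (goals for, goals against) results; the repeated 1-0 scores
# would sit on exactly the same spot in an unjittered scatter plot.
results = [(1, 0), (1, 0), (2, 1), (1, 0), (0, 0), (3, 2)]
goals_for = [g for g, _ in results]
goals_against = [a for _, a in results]

x = jitter(goals_for, seed=1)
y = jitter(goals_against, seed=2)
for (g, a), xj, yj in zip(results, x, y):
    print(f"{g}-{a} plotted at ({xj:.2f}, {yj:.2f})")
```

The jittered `x` and `y` lists can be passed to any scatter-plot function; rounding them recovers the original scores, so no information is lost.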