STAT847_W24_Reading1

STAT 847: Reading Assignment 1
DUE: Friday January 19, 2024 by 11:59pm Eastern

NOTES
Your assignment must be submitted by the due date listed at the top of this document, and it must be submitted electronically in .pdf format via Crowdmark. Organization and comprehensibility are part of a full solution. Consequently, points will be deducted for solutions that are disorganized or incomprehensible. Furthermore, if you submit your assignment to Crowdmark, but you do so incorrectly in any way (e.g., you upload your Question 2 solution in the Question 1 box), you will receive a 5% deduction (i.e., 5% of the assignment’s point total will be deducted from your point total).

Reading: Hands-On Exploratory Data Analysis with R [8 marks]
Open the UWaterloo Library website, lib.uwaterloo.ca, and use your WatIAM account to search for and open the book Hands-On Exploratory Data Analysis with R, by Radhika Datar and Harish Garg. The following questions can be answered by reading the “Univariate and Control Datasets” chapter. Please put your answers to questions 1-4 on a separate page from your answers to questions 5-8; this can be done in Word with Ctrl + Enter, or in Markdown with \newpage. Each question is worth one mark. (Unless “in your own words” is specified, you can directly quote the book.)

Winter 2024 Reading Assignment Questions

Q1. What’s the name of the test for outliers used in this chapter?
Answer: Tietjen-Moore test

Q2. What does the variable ‘pdays’ represent?
Answer: The variable ‘pdays’ represents the number of days that have elapsed since the customer was last contacted.

Q3. How many rows are there in the bank marketing data?
Answer: 11162

Q4. What does each row represent in the bank marketing data?
Answer: Each row in the dataset represents the details of a single client contact made during a bank marketing campaign. The columns of each row can be categorized as follows:
1. the client’s personal information
2. the client’s financial information
3. details of the interaction regarding the campaign
4. historical information about previous campaigns
Each row is therefore a record of a single marketing interaction with a client, covering personal, financial, and campaign-related information.
Q5. (Challenge) What is the two-sample t-test actually comparing on the “the t-test in R” page?
Answer: The t-test is a method for comparing two samples. On the page in question, the clients’ ages and bank balances are compared, i.e., we are trying to determine whether their means differ significantly. The null hypothesis is that there is no difference between the means; the alternative hypothesis is that there is a difference. In this case, the result suggests rejecting the null hypothesis in favor of the alternative.

Q6. What makes a model parsimonious?
Answer: A model is considered parsimonious if it is simple yet has great explanatory or predictive power, using a minimum number of parameters or predictor variables. It should employ parsimonious covariance structures and only consider relevant variables.

Q7. Almost every named distribution (e.g., the normal, the uniform) has a function that calculates its cumulative distribution function. What is the letter that all such functions start with?
Answer: The letter ‘p’. Density or probability functions start with the letter ‘d’ (e.g., dnorm), and the R function that calculates the corresponding cumulative distribution function starts with the letter ‘p’ (e.g., pnorm).

Q8. According to the Shapiro-Wilk test, are bank balances normally distributed?
Answer: In the example provided in the book, the Shapiro-Wilk test is first performed on a small sample of 10 balances. The p-value is less than 0.05, so we reject the null hypothesis that bank balances are normally distributed (for those 10 observations only). However, the text states that as the sample size increases, the p-value rises above 0.05, in which case we fail to reject the null hypothesis of normality for the larger sample.
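Illustration (not part of the submitted answers): the short R sketch below shows the d/p naming convention from Q7 and the kind of calls behind Q5 and Q8. It uses only base R. The vectors age_like and balance_like are simulated stand-ins invented for this example, not the book's bank marketing data, so the p-values say nothing about the real dataset.

```r
# Q7: the same suffix (norm, unif, exp) with different prefixes.
dnorm(0)      # density of N(0, 1) at 0, equal to 1 / sqrt(2 * pi)
pnorm(0)      # cumulative distribution function of N(0, 1) at 0, equal to 0.5
punif(0.25)   # CDF of Uniform(0, 1) at 0.25, equal to 0.25
pexp(1)       # CDF of Exponential(rate = 1) at 1, equal to 1 - exp(-1)

# Simulated stand-ins (assumption: NOT the book's data).
set.seed(847)
age_like     <- rnorm(200, mean = 40, sd = 10)         # roughly symmetric
balance_like <- rlnorm(200, meanlog = 6, sdlog = 1)    # strongly right-skewed

# Q8: Shapiro-Wilk test; the null hypothesis is that the sample is normal.
sw <- shapiro.test(balance_like)
sw$p.value    # a small p-value is evidence against normality

# Q5: two-sample t-test comparing the means of the two vectors.
# t.test() uses the Welch (unequal variance) version by default.
tt <- t.test(age_like, balance_like)
tt$p.value    # a small p-value suggests the two means differ
```

With the book's data, one would replace the simulated vectors with the corresponding columns of the bank marketing data frame; the column names are not given in this document, so they are left out here.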