1690 HW 6

School: University of Texas
Course: PHM1690
Subject: Statistics
Date: Apr 3, 2024

Malaysia Marshall
PHWM 1690 HW 6

Part A

1. Auto exhaust and lead exposure
   a. The parameter of interest is the average blood lead concentration of police officers in urban environments.
   b. The null and alternative hypotheses can be defined as follows:
      i. $H_0$: The average blood lead concentration of police officers in urban environments is the same as the average blood lead concentration of police officers in the suburbs.
      ii. $H_A$: The average blood lead concentration of police officers in urban environments is different from the average blood lead concentration of police officers in the suburbs.
   c. A CLT-based hypothesis test would be appropriate for this research question. A one-sample t-test would also be appropriate.
   d. The conditions necessary for a CLT-based test are:
      i. Independence of data points, which we can assume given the random sample of police officers.
      ii. Large sample size (n > 30): our sample size is 52 police officers.
      iii. Data are not extremely skewed: we cannot be sure that this condition is met without plotting the data points, but for the purposes of this test we will assume the data follow a nearly normal distribution.
   e. The test statistic was calculated to be 17.07.
   f. The p-value was computed to be $9.914 \times 10^{-23}$.
      i. Stata command: di 2*ttail(52-1, abs(17.06665818))
   g. Because the p-value is far smaller than any conventional significance level, we reject the null hypothesis in favor of the alternative. There is enough statistical evidence that the average blood lead concentration of police officers in urban environments is different from the average blood lead concentration of police officers in the suburbs.

2. Paired or Not, Part II
   a. These data are paired because the same variable is recorded for both Intel and Southwest Airlines.
   b. These data are also paired because we are looking at the same items at each store and noting their prices.
   c. These data are not paired because the two high schools will have different students taking the exam, so there is only one data point per student.

3. High School and Beyond, Part I
   a. Based on the graphs, there does not appear to be a clear difference in the average reading and writing scores.
   b. The reading and writing scores of each student are not independent of each other.
   c. The hypotheses can be written as follows:
      i. $H_0$: There is no difference in the average scores of students on the reading and writing exams.
      ii. $H_A$: There is a difference in the average scores of students on the reading and writing exams.
   d. The paired t-test would be the most appropriate test to answer the research question.
   e. The paired t-test requires that:
      i. Subjects are randomly selected. This is satisfied by the random sample of 25 students from the survey.
      ii. Data points come from a nearly normal distribution. We can see this in the histogram of the differences in scores.
   f. $t = \dfrac{\bar{x}_{diff} - \delta}{s_{diff}/\sqrt{n}} = \dfrac{-0.545 - 0}{8.887/\sqrt{25}} = \dfrac{-0.545}{1.7774} = -0.31$
   g. The p-value is calculated to be 0.76. Since this number is larger than the 0.05 significance level, we fail to reject the null hypothesis.
   h. There is not strong evidence at the 0.05 significance level of a difference in the average scores of the students on the reading and writing exams.
   i. It is possible that a Type II error was made. This would mean that we failed to reject the null hypothesis (there is no difference in the average scores) when the alternative hypothesis (there is a difference in the average scores) is true.
   j. Yes, I would expect the 95% confidence interval to include 0. We failed to reject the null hypothesis at the 0.05 significance level, so 0 is a plausible value for the true mean difference, and the matching confidence interval should contain it.

4. High School and Beyond, Part II
   a. $\bar{x}_{diff} \pm t^{*} \times \dfrac{s_{diff}}{\sqrt{n}} = -0.545 \pm 2.06 \times \dfrac{8.887}{\sqrt{25}} = -0.545 \pm 2.06(1.7774) = -0.545 \pm 3.661444 = (-4.21, 3.12)$
   b. We are 95% confident that the true mean difference between reading and writing scores falls between -4.21 and 3.12 points.
   c. The confidence interval does not provide convincing evidence of a real difference in the average scores because the interval includes the null value, 0.

Part B

1. High School and Beyond, Part III
   a. The data do not provide convincing evidence at the 99% confidence level that there is a difference between the average scores on the two exams. The p-value is calculated to be 0.7618, which is larger than the 0.01 significance level, so we fail to reject the null hypothesis.
   b. We are 99% confident that the true mean difference between the reading and writing scores of all students falls between -5.52 and 4.43.

2. Infectious Disease
   a. The parameter of interest is the average difference between Dr. A's and Dr. B's assessments.
      i. $H_0: \delta = 0$
      ii. $H_A: \delta \neq 0$
   b. A paired t-test would be most appropriate because the data are paired: each result has an assessment from both doctors.
   c. Conditions for hypothesis testing
      i. The observations are independent because they come from randomly selected participants.
      ii. Normality
         a) Histogram: The histogram is shown with a normal probability curve. Compared against the curve, the histogram is unimodal, shows no extreme skew, and is approximately normal.
         b) Boxplot: The boxplot appears to show a relatively normal distribution with one outlier. It does not appear to have any extreme skew.
         c) QQ plot: The QQ plot shows that the data points follow the reference line with no extreme deviations. The distribution can be treated as approximately normal.
         d) Shapiro-Wilk test:

            Variable | Obs        W        V        z   Prob>z
            diff     |  32  0.98375    0.542   -1.271  0.89808

            The Shapiro-Wilk test shows a relatively large p-value (0.89808), which is larger than the significance level of 0.05, meaning we fail to reject the null hypothesis (that the sample came from a normally distributed population) in this case.
   d. A t-test is appropriate because the data are approximately normally distributed with no extreme outliers.
   e. t-test
      i. Using the original variables, the test statistic was calculated as 5.5000.
      ii. Using the difference variable, the test statistic was also calculated as 5.5000.
   f. The p-value can be reported as less than 0.0001, which is smaller than the 0.05 significance level. We have enough statistical evidence that the true average difference between dra and drb is different from zero.
   g. There is a systematic difference between the assessments of Dr. A and Dr. B. This implies that the mean difference is not zero and that some patients may have been incorrectly classified or diagnosed by one (or both) of the doctors.
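
As a cross-check on the High School and Beyond calculations (Part A problems 3 and 4, Part B problem 1), the arithmetic can be redone from the summary statistics alone. The sketch below is not the Stata workflow used for the homework. It is a Python standard-library approximation that assumes the quoted values (n = 25, mean difference -0.545, standard deviation 8.887), takes t* = 2.06 from the write-up, and assumes the table value t* = 2.797 for 99% confidence with 24 degrees of freedom. The helper names (`t_pdf`, `two_sided_p`, `interval`) are made up for illustration, and the p-value comes from numerically integrating the t density rather than from a statistics library.

```python
import math

# Summary statistics quoted in Part A, problem 3 (reading minus writing).
n = 25
mean_diff = -0.545
sd_diff = 8.887
null_value = 0.0


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value using Simpson's rule on [0, |t|]; steps must be even."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3  # P(-|t| < T < |t|)
    return 1 - central_area


def interval(t_star):
    """Confidence interval for the mean difference given a critical value."""
    margin = t_star * se
    return (mean_diff - margin, mean_diff + margin)


se = sd_diff / math.sqrt(n)               # standard error of the mean difference
t_stat = (mean_diff - null_value) / se    # paired t statistic
p_value = two_sided_p(t_stat, n - 1)      # df = n - 1 = 24

ci_95 = interval(2.06)    # t* taken from the write-up
ci_99 = interval(2.797)   # assumed table value for 99% confidence, df = 24

print(f"SE = {se:.4f}, t = {t_stat:.2f}, p = {p_value:.4f}")
print("95% CI:", tuple(round(v, 2) for v in ci_95))
print("99% CI:", tuple(round(v, 2) for v in ci_99))
```

Run as written, the results should agree with the reported values: a test statistic near -0.31, a p-value near 0.76, and intervals of roughly (-4.21, 3.12) and (-5.52, 4.43).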