Practice Midterm

School: University of Alberta
Course: 195
Subject: Mathematics
Date: Apr 3, 2024
Pages: 10

CMPUT 195: Midterm II (March 14, 2024)

Student Name:
Student ID:
CCID:

Instructions:
- Do not open this exam until you are instructed to do so.
- Read the instructions carefully.
- The exam lasts 60 minutes and is worth 15% of your overall course grade.
- Read all questions carefully. Do not skim; you may miss things.
- Use a pen, not a pencil. If you use a pencil, you may not dispute your grade.
- Do not use a pen with red ink.
- For full marks, answer all parts of all questions and show all your work.
- Be concise and give clear, legible answers. Illegible answers will not be marked.
- Cheating is a serious offense under the Code of Student Behavior.
- No books, notes, or other aids are permitted during the exam.
- No smartphones, cellphones, or other electronic devices are allowed.
- You may use an approved, non-programmable calculator.

Good luck!
1. A study was done to determine the effect of various attributes on Canadian housing prices. The attributes collected were age, distance from downtown (dist_dt), area (in m²), and price. Price is plotted on the y-axis in each of the four scatterplots below. A DataFrame containing the correlations was also computed. Use the plots and the DataFrame to answer the questions below. (5 pts)

a) For each scatterplot, state the variable on the x-axis. (2 pts)
Scatterplot A:
Scatterplot B:
Scatterplot C:
Scatterplot D:
b) List the four columns in ascending order of the strength of their correlation with price. (1 pt)

c) Which type of regression model would be used to predict price? Why? (2 pts)
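Correlation strength is judged by the absolute value of r, so a negative correlation can be stronger than a positive one. The sketch below shows one way to rank columns by strength with pandas. The housing values and the column name dist_dt are hypothetical stand-ins, since the study's actual DataFrame is not reproduced here.

```python
import pandas as pd

# Hypothetical housing data (made-up values; the study's data are not given).
housing = pd.DataFrame({
    "age":     [5, 12, 30, 45, 8, 20],
    "dist_dt": [2.0, 5.5, 10.0, 15.0, 3.0, 8.0],
    "area":    [150, 120, 90, 80, 140, 100],
    "price":   [600, 480, 350, 300, 560, 420],
})

# Pearson correlation of every column with price (price with itself is 1).
corr_with_price = housing.corr()["price"]

# Strength ignores the sign, so sort by |r|, weakest first.
ascending_strength = corr_with_price.abs().sort_values()
print(ascending_strength)
```

Sorting on `.abs()` rather than on the raw correlations is the key step: sorting the signed values would put a strong negative correlation first instead of last.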
2. A department store ran two ads, ad A and ad B. For each ad, they recorded the amount spent by 60 customers who had seen that ad. The store wants to determine whether ad A produces greater sales, on average, than ad B. A permutation test was run, and the resulting histogram is shown below. The observed difference was 40. (5 pts)

a) Suppose that the mean sales for ads A and B are $\mu_A$ and $\mu_B$, respectively. State the null and alternative hypotheses of the test in terms of $\mu_A$ and $\mu_B$. (1 pt)
$H_0$:
$H_A$:

b) The red area of the histogram makes up 2.7% of the total area. What is the approximate p-value of the test? (2 pts)
c) Use a significance level of $\alpha = 0.05$ to write a conclusion to the test. (2 pts)
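A permutation test of this kind can be sketched with NumPy as follows. The sketch assumes made-up sales figures (60 customers per ad, drawn from normal distributions), shuffles the ad labels as the null hypothesis allows, and computes a one-sided p-value as the fraction of simulated differences at least as large as the observed one. It illustrates the technique only; it is not the store's actual analysis, and the seed and 10,000 repetitions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical sales for 60 customers per ad (the real data are not given).
sales_a = rng.normal(loc=240, scale=100, size=60)
sales_b = rng.normal(loc=200, scale=100, size=60)

observed = sales_a.mean() - sales_b.mean()

# Under H0 the ad labels are interchangeable, so pool and reshuffle them.
pooled = np.concatenate([sales_a, sales_b])
n_a = len(sales_a)
repetitions = 10_000
simulated = np.empty(repetitions)
for i in range(repetitions):
    shuffled = rng.permutation(pooled)
    simulated[i] = shuffled[:n_a].mean() - shuffled[n_a:].mean()

# One-sided: only differences at least as large as observed count against H0.
p_value = np.mean(simulated >= observed)
alpha = 0.05
print(f"observed = {observed:.2f}, p-value = {p_value:.4f}")
print("reject H0" if p_value <= alpha else "fail to reject H0")
```

The p-value here is the same quantity as the shaded tail area of a permutation histogram: the proportion of shuffled differences that land at or beyond the observed one.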
3. The manager of Edmonton’s MLS (Major League Soccer) team is looking to sign some new players. To estimate how much she will pay them, she collected data on the salaries of forwards (F, red) and defenders (D, blue), as well as their sprint speeds and heights. Use this information, summarized in the figures below, to answer the following questions. (4 pts)

a) Describe the relationship between a player’s sprint speed and salary. (2 pts)

b) Describe the relationship between a player’s height and salary. (2 pts)
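When points in a scatterplot are coloured by group, it can help to compare the correlation over all players with the correlation within each position. The sketch below does this with pandas on a small hypothetical roster; the column names and values are assumptions, not the manager's data.

```python
import pandas as pd

# Hypothetical roster (made-up values; the exam's figures are not reproduced).
players = pd.DataFrame({
    "position":     ["F", "F", "F", "F", "D", "D", "D", "D"],
    "sprint_speed": [33.5, 34.2, 32.8, 35.0, 31.0, 31.8, 30.5, 32.2],
    "height":       [178, 182, 175, 180, 188, 185, 190, 186],
    "salary":       [1.2, 1.6, 0.9, 2.0, 0.8, 0.9, 0.7, 1.0],
})

# r over all players, then r within each position. A pooled r can hide,
# or even reverse, a trend that looks different inside each group.
results = {}
for feature in ["sprint_speed", "height"]:
    overall = players[feature].corr(players["salary"])
    within = {
        pos: group[feature].corr(group["salary"])
        for pos, group in players.groupby("position")
    }
    results[feature] = (overall, within)
    print(feature, round(overall, 2), {k: round(v, 2) for k, v in within.items()})
```

A full description of each relationship would still cover form, direction, strength, and any differences between the red and blue groups; r only summarizes the linear part.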
4. A simulated coin was flipped until it landed on tails, and this was repeated 500,000 times. A relative frequency histogram was plotted of the number of flips needed in each repetition. Below are the relative frequency histogram, descriptive statistics for the histogram, and the distribution of the sample mean drawn from the histogram, with n = 150. (6 pts)

a) Does the "Flips until first tails" histogram approximate the distribution it is sampled from well? Why or why not? (2 pts)
b) Fill in A, B, and C below so that the code block approximates the distribution of a sample mean with n = 150, drawn from the array "flips". (1 pt)
A =
B =
C =

    sample_size = 150
    iterations = 100000

    sample_means = []
    for _ in range(A):

        sample = np.random.choice(flips, B)
        sample_means.append(C)

c) A general rule of thumb states that n should be at least 30. Why do you think this case requires a far greater sample size of n = 150? (1 pt)

d) On the CLT histogram, which two values contain an area of 0.66 between them? State your answer in fractional form or to 2 decimal places. (2 pts)
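The simulation behind this question can be approximated with NumPy, assuming a fair coin. The number of flips up to and including the first tails then follows a geometric distribution with $p = 0.5$, so $P(X = k) = 0.5^k$. The sketch compares simulated relative frequencies with those exact values, builds a distribution of the sample mean for n = 150 using vectorized draws (a different formulation from the loop above), and reads off the middle 66% of the simulated means with percentiles. The seed and the 10,000 iterations are arbitrary choices, not values from the exam.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Assumption: fair coin. geometric(p) counts trials up to and including
# the first success (here, the first tails).
flips = rng.geometric(p=0.5, size=500_000)

# Simulated relative frequency vs. exact probability 0.5**k.
for k in range(1, 6):
    print(k, round(np.mean(flips == k), 4), 0.5 ** k)

# Each row is one sample of size 150 drawn with replacement; average each row.
sample_size = 150
iterations = 10_000
sample_means = rng.choice(flips, size=(iterations, sample_size)).mean(axis=1)

# Middle 66% of the simulated means: cut 17% from each tail.
low, high = np.percentile(sample_means, [17, 83])
print(f"mean = {sample_means.mean():.3f}, SD = {sample_means.std():.3f}")
print(f"middle 66%: {low:.3f} to {high:.3f}")
```

The spread of the simulated means should be close to the population SD divided by $\sqrt{150}$, which is what the CLT predicts.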
MCQ / True-False (15 pts, 1 each)

1. A column’s correlation with itself will never be -1.
A. True B. False

2. The least-squares regression line is defined to have RMSE equal to zero.
A. True B. False

3. Standardizing columns will usually improve correlation scores.
A. True B. False

4. A correlation of -0.7 is stronger than a correlation of 0.5.
A. True B. False

5. A hypothesis test uses the assumption that the alternative hypothesis is true.
A. True B. False

6. Logistic Regression is used to classify observations, but not predict class probabilities.
A. True B. False

7. Logistic Regression should be used when predicting a binary variable.
A. True B. False

8. Linear Regression should be used when predicting actual probabilities, not just classifying observations.
A. True B. False

9. Transforming a column can decrease its correlation strength with the target column.
A. True B. False

10. You cannot use a hypothesis test to prove or disprove a hypothesis.
A. True B. False

11. A hypothesis test shows extremely strong evidence against $H_0$. To conclude, we:
A. Accept $H_A$
B. Fail to reject $H_A$
C. Reject $H_0$
D. Fail to accept $H_0$

12. To analyze the effect of a categorical variable on a numerical variable, we plot the ____ on a ____, grouped by the ____.
A. numerical variable, bar chart, categorical variable
B. categorical variable, bar chart, numerical variable
C. numerical variable, histogram, categorical variable
D. categorical variable, histogram, numerical variable

13. An $r$-value of -0.86 indicates a ____, ____ relationship.
A. weak, negative
B. strong, negative
C. weak, positive
D. strong, positive

14. The logistic curve is useful for modeling probabilities because:
A. It is bounded between 0 and 1.
B. It can extrapolate to probabilities less than 0 or greater than 1.
C. It is a straight line.
D. The logistic curve is not useful for modeling probabilities.

15. Consider two linear models, $y_1 = \beta_0 + \beta_1 x$ and $y_2 = \beta_1 x$, where $\beta_1$ is the same for both models. Which of the following statements is FALSE?
A. If $x$ increases by 1, $y_1$ and $y_2$ will both increase by $\beta_1$.
B. At $x = 0$, $y_1 = \beta_0$ and $y_2 = 0$.
C. At $x = 0$, $y_1$ may be negative.
D. None of the above statements are false.
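The contrast between a logistic curve and a straight line, which several of the questions above touch on, can be checked numerically. This minimal sketch assumes the standard logistic function $\sigma(z) = 1/(1 + e^{-z})$ and an arbitrary comparison line chosen for illustration; it shows that the logistic output stays strictly between 0 and 1 over a wide input range, while the line leaves that interval.

```python
import numpy as np


def logistic(z):
    """Standard logistic (sigmoid) function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


z = np.linspace(-10, 10, 201)
probs = logistic(z)

# Hypothetical straight line through the same midpoint (0.5 at z = 0).
line = 0.5 + 0.1 * z

print("logistic range:", probs.min(), probs.max())  # strictly inside (0, 1)
print("line range:", line.min(), line.max())        # goes below 0 and above 1
```

A linear model fitted to 0/1 outcomes behaves like `line` here: far enough from the data, its predictions stop being valid probabilities.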