CMPUT 195: Midterm II
March 14, 2024
Student Name:
Student ID:
CCID:
Instructions:
• Do not open this exam until you are instructed to do so. Read the instructions carefully.
• The duration of the exam is 60 minutes.
• The exam is worth 15% of your overall course grade.
• Read all questions carefully. Do not read diagonally. You may miss things.
• Use a pen, not a pencil. If you use a pencil, you may not dispute your grade. Do not use a pen with red ink.
• For full marks, answer all parts of all questions and show all your work.
• Be concise and give clear and legible answers.
• Non-legible answers will not be marked.
• Cheating is a serious offense in the Code of Student Behavior.
• No books, notes, or other aids are permitted during the exam.
• No smartphones, cellphones, or other electronic devices are allowed.
• You may use an approved, non-programmable calculator.
• Good luck!
1. A study was done to determine the effect of various attributes on Canadian housing prices. The attributes collected were age, distance from downtown (dist_dt), area in m², and price. Price was plotted on the y-axis in the four scatterplots below. A DataFrame containing the correlations was also computed. Use the plots and DataFrame to answer the questions below. (5 pts)
a) For each scatterplot, state the variable on the x-axis.
(2 pts)
Scatterplot A:
Scatterplot B:
Scatterplot C:
Scatterplot D:
b) List the four columns by strength of correlation with price, in ascending order. (1 pt)
c) Which type of regression model would be used to predict price? Why? (2 pts)
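For practice with the kind of correlation table described in question 1, the following is a minimal sketch on synthetic data. Every number, coefficient, and the random seed are assumptions made up for illustration; none of them come from the study. It uses NumPy's `np.corrcoef`; with the data in a pandas DataFrame, `DataFrame.corr()` would produce the same kind of table.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical synthetic attributes; these values are NOT from the study.
age = rng.uniform(0, 50, n)        # years
dist_dt = rng.uniform(1, 30, n)    # km from downtown
area = rng.uniform(50, 300, n)     # square metres
noise = rng.normal(0, 20_000, n)

# Assumed linear relationship, chosen only to generate example prices.
price = 100_000 + 2_000 * area - 5_000 * dist_dt - 500 * age + noise

# Each row passed to corrcoef is one variable; the result is the
# 4x4 matrix of Pearson correlations between all pairs of variables.
names = ["age", "dist_dt", "area", "price"]
corr = np.corrcoef(np.vstack([age, dist_dt, area, price]))

# The last column holds each variable's correlation with price.
for name, r in zip(names, corr[:, -1]):
    print(f"{name:>8}: r = {r:+.2f}")
```

The diagonal of the matrix is always 1, since every variable is perfectly correlated with itself.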
2. A department store ran two ads, ad A and ad B. For each ad, they recorded the amount spent by 60 customers who had seen the ad. The store wants to determine if ad A has greater sales, on average, than ad B. A permutation test was run, and the resulting histogram is shown below. The observed difference was 40. (5 pts)
a) Suppose that the mean sales for ads A and B are $\mu_A$ and $\mu_B$, respectively. State the null and alternative hypotheses of the test in terms of $\mu_A$ and $\mu_B$. (1 pt)
$H_0$:
$H_A$:
b) The red area of the histogram makes up ≈ 2.7% of the total area. What is the approximate p-value of the test? (2 pts)
c) Use a significance level of α = 0.05 to write a conclusion to the test. (2 pts)
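The permutation procedure behind a histogram like the one in question 2 can be sketched as below. This is only an illustration under stated assumptions: the spending values are randomly generated, and the means, spread, seed, and iteration count are all made up, so the output will not match the exam's figure.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical spending data: 60 customers per ad (values are made up).
sales_a = rng.normal(220, 100, 60)
sales_b = rng.normal(200, 100, 60)

# Observed difference in mean spending, ad A minus ad B.
observed = sales_a.mean() - sales_b.mean()

pooled = np.concatenate([sales_a, sales_b])
n_a = len(sales_a)
iterations = 10_000

diffs = np.empty(iterations)
for i in range(iterations):
    # Shuffling the pooled values reassigns the ad labels at random.
    shuffled = rng.permutation(pooled)
    diffs[i] = shuffled[:n_a].mean() - shuffled[n_a:].mean()

# One-sided tail: share of shuffled differences at least as large
# as the observed difference.
tail_share = np.mean(diffs >= observed)
print(f"observed difference: {observed:.1f}, tail share: {tail_share:.3f}")
```

A histogram of `diffs` shows the differences that arise when the ad labels carry no information, which is the reference distribution the observed difference is compared against.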
3. The manager of Edmonton’s MLS (Major League Soccer) team is looking to sign some new
players. To get an idea of how much she will be paying her new players, she collected data
for the salaries of forwards (F, red) and defenders (D, blue), as well as their sprint speeds
and heights. Use this information, summarized in the figures below, to answer the following
questions.
(4 pts)
a) Describe the relationship between a player’s sprint speed and salary.
(2 pts)
b) Describe the relationship between a player’s height and salary.
(2 pts)
4. A simulated coin was flipped until it landed on tails, and this was repeated 500,000 times. A relative frequency histogram was plotted with the number of flips needed on each repetition. Below is the relative frequency histogram, descriptive statistics for the histogram, and the distribution of the sample mean drawn from the histogram, with n = 150. (6 pts)
a) Does the "Flips until first tails" histogram approximate the distribution it is sampled from well? Why or why not? (2 pts)
b) Fill in A, B, and C below, so that the code block will approximate the distribution of a sample mean with n = 150, drawn from the array “flips”. (1 pt)
A =
B =
C =
    sample_size = 150
    iterations = 100000

    sample_means = []
    for _ in range(A):
        sample = np.random.choice(flips, B)
        sample_means.append(C)
c) A general rule of thumb states that n should be at least 30. Why do you think that this case requires a far greater sample size of n = 150? (1 pt)
d) On the CLT histogram, which two values contain an area of ≈ 0.66 between them? State your answer in fractional form or to 2 decimal places. (2 pts)
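The "flips until first tails" data described in question 4 could be reproduced roughly as follows. This is a sketch that assumes a fair coin and an arbitrary seed; the exam does not state how its `flips` array was generated. It relies on NumPy's `Generator.geometric`, which counts the number of trials up to and including the first success.

```python
import numpy as np

rng = np.random.default_rng(2)

# Number of flips until the first tails, repeated 500,000 times.
# Assumption: a fair coin, so tails has probability 0.5 on each flip.
flips = rng.geometric(p=0.5, size=500_000)

# Relative frequency of each observed flip count.
values, counts = np.unique(flips, return_counts=True)
rel_freq = counts / counts.sum()

print("mean:", flips.mean(), "std:", flips.std())
for v, f in zip(values[:5], rel_freq[:5]):
    print(f"{v} flips: {f:.3f}")
```

Plotting `rel_freq` against `values` as a bar chart gives a relative frequency histogram of the kind the question refers to.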
MCQ / True-False (15 pts, 1 each)
1. A column’s correlation with itself will never be -1.
A. True
B. False
2. The least-squares regression line is defined to have RMSE equal to zero.
A. True
B. False
3. Standardizing columns will usually improve correlation scores.
A. True
B. False
4. A correlation of -0.7 is stronger than a correlation of 0.5.
A. True
B. False
5. A hypothesis test uses the assumption that the alternative hypothesis is true.
A. True
B. False
6. Logistic Regression is used to classify observations, but not predict class probabilities.
A. True
B. False
7. Logistic Regression should be used when predicting a binary variable.
A. True
B. False
8. Linear Regression should be used when predicting actual probabilities, not just classifying
observations.
A. True
B. False
9. Transforming a column can decrease its correlation strength with the target column.
A. True
B. False
10. You cannot use a hypothesis test to prove or disprove a hypothesis.
A. True
B. False
11. A hypothesis test shows extremely strong evidence against $H_0$. To conclude, we:
A. Accept $H_A$
B. Fail to reject $H_A$
C. Reject $H_0$
D. Fail to accept $H_0$
12. To analyze the effect of a categorical variable on a numerical variable, we plot the ____ on a ____, grouped by the ____.
A. numerical variable, bar chart, categorical variable
B. categorical variable, bar chart, numerical variable
C. numerical variable, histogram, categorical variable
D. categorical variable, histogram, numerical variable
13. An r-value of -0.86 indicates a ____, ____ relationship.
A. weak, negative
B. strong, negative
C. weak, positive
D. strong, positive
14. The logistic curve is useful for modeling probabilities because:
A. It is bounded between 0 and 1.
B. It can extrapolate to probabilities less than 0 or greater than 1.
C. It is a straight line.
D. The logistic curve is not useful for modeling probabilities.
15. Consider two linear models: $y_1 = \beta_0 + \beta_1 x$, and $y_2 = \beta_1 x$, where $\beta_1$ is the same for both models. Which of the following statements is FALSE?
A. If $x$ increases by 1, $y_1$ and $y_2$ will both increase by $\beta_1$.
B. At $x = 0$, $y_1 = \beta_0$ and $y_2 = 0$.
C. At $x = 0$, $y_1$ may be negative.
D. None of the above statements are false.