School: New York University
Course: 1305
Subject: Statistics
Date: Jan 9, 2024
STATISTICS AND DATA ANALYSIS
HOMEWORK EXERCISES
PETER LAKNER
1. Import the zagat file in R.
a. Calculate the mean, the standard deviation, the median, the two quartiles, the minimum
and the maximum for the Price variable.
Help: Use the “summary” and the “sd” commands.
b. Create a histogram for the variable Price.
Help: Use the “hist” command, similarly to the way it is done in the Course Supplement
for the Food variable. To specify the breaks, use “breaks=seq(7.5,80.5,by=1)”. For
the axis, use the command “axis(1,at=seq(8,80,8))”. This will show ticks at 8, 16, 24, etc.
R will put a few other ticks on the axis as well, but this is OK. Click on “Export” above
the histogram in RStudio to save your work.
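The commands named in the Help notes can be tried on a small vector before loading the real file. The sketch below uses made-up prices (the zagat data is not reproduced here) and assumes base R's hist() for the histogram; the variable names are illustrative only.

```r
# Made-up prices for illustration only; the real values come from the zagat file.
price <- c(12, 25, 33, 41, 18, 56, 29, 37, 44, 62)

# summary() reports the minimum, quartiles, median, mean and maximum;
# sd() gives the sample standard deviation.
price_summary <- summary(price)
price_sd <- sd(price)
print(price_summary)
print(price_sd)

# Histogram with unit-width bins centred on whole numbers,
# followed by custom axis ticks at 8, 16, ..., 80.
hist(price, breaks = seq(7.5, 80.5, by = 1),
     main = "Histogram of Price", xlab = "Price")
axis(1, at = seq(8, 80, 8))
```

Every value passed to hist() must fall inside the range covered by breaks, otherwise R stops with an error.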
2. On a production line, 5% of the produced items are defective (typically this proportion is
unknown; we assume it to be known for the sake of this exercise). A quality control inspector
selects a random sample of n = 20 items.
a) What is the probability that there will be no defective item in the sample?
b) What is the probability that there will be at least 1 defective item in the sample?
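Questions of this type can be handled with R's binomial functions. The following is a minimal sketch with illustrative values of n and p that differ from the exercise; it assumes the items in the sample are independent.

```r
# Illustrative values, not the ones from the exercise.
n <- 10     # sample size
p <- 0.10   # probability that a single item is defective

# P(X = 0): no defective item in the sample.
p_none <- dbinom(0, size = n, prob = p)

# P(X >= 1) is the complement of P(X = 0).
p_at_least_one <- 1 - p_none

print(c(none = p_none, at_least_one = p_at_least_one))
```

The same complement idea applies to any "at least k" question via pbinom(k - 1, n, p, lower.tail = FALSE).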
3. A normal random variable X has mean 3.0 and standard deviation 0.2. What is the
probability that X falls between 2.75 and 3.1?
4. Suppose that X follows a normal distribution with mean 5.5 and standard deviation 0.3.
Find a number w such that X < w with 30% probability.
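Both kinds of normal-distribution question (a probability for a given interval, and a cutoff for a given probability) map onto pnorm() and qnorm(). The sketch below uses illustrative parameters that are not taken from the exercises.

```r
# Illustrative parameters, not the ones from the exercises.
mu <- 10
sigma <- 2

# P(9 < X < 12): difference of two cumulative probabilities.
p_between <- pnorm(12, mean = mu, sd = sigma) - pnorm(9, mean = mu, sd = sigma)

# Cutoff w with P(X < w) = 0.25: the 25th percentile.
w <- qnorm(0.25, mean = mu, sd = sigma)

print(p_between)
print(w)
```

qnorm() is the inverse of pnorm(), so pnorm(w, mu, sigma) returns the probability that was requested.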
5. Bluefish purchased at the Lime Beach Fishing Terminal produce a filet weight which
has a mean of 4.5 pounds with a standard deviation of 0.8 pound. If a restaurant manager
purchases 50 such fish, then what is the probability that she will have at least 220 pounds
of filets?
6. A company is interested in estimating µ, the mean number of days of sick leave during
the last year taken by all its employees. They select a random sample of 100 employees and
note the number of sick days taken by each employee in the sample. The following sample
statistics are computed: x̄ = 12.2 days, s = 3 days. Find a 95% confidence interval for µ.
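A large-sample confidence interval for a mean has the form x̄ ± z · s/√n. The sketch below uses illustrative summary statistics (not those of the exercise) and assumes the normal approximation is acceptable; with a small sample, qt() would replace qnorm().

```r
# Illustrative summary statistics, not the ones from the exercise.
xbar <- 50   # sample mean
s <- 8       # sample standard deviation
n <- 64      # sample size

# Two-sided 95% interval: 2.5% in each tail.
z <- qnorm(0.975)
margin <- z * s / sqrt(n)

ci <- c(lower = xbar - margin, upper = xbar + margin)
print(ci)
```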
7. A firm that manages rental properties is assessing an expansion into an expensive area
in San Francisco. To cover its costs, the firm needs the average rent in this area to be more
than $1,500 per month. They set up two hypotheses: H0: µ = 1500 and Ha: µ > 1500.
In order to make a decision the firm obtained rents for a sample of n = 115 rental units
in the area. Among these, the average rent is $1,657 with sample standard deviation
s = 581.
(a) Test the above hypothesis at the 5% significance level.
(b) Calculate the p-value for this test.
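The mechanics of a one-sided large-sample test for a mean can be sketched as follows. The numbers are illustrative and are not the rent data above; the normal approximation is assumed.

```r
# Illustrative values, not the ones from the exercise.
mu0 <- 100   # value of the mean under H0
xbar <- 103  # sample mean
s <- 12      # sample standard deviation
n <- 64      # sample size
alpha <- 0.05

# Standardized test statistic.
z_stat <- (xbar - mu0) / (s / sqrt(n))

# Upper-tail p-value, because the alternative is mu > mu0.
p_value <- pnorm(z_stat, lower.tail = FALSE)

# Reject H0 when the p-value falls below the significance level.
reject <- p_value < alpha
print(c(z = z_stat, p_value = p_value))
print(reject)
```

Comparing z_stat with qnorm(1 - alpha) leads to the same decision as comparing p_value with alpha.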
8. The data file HEATING deals with the heating bill for dwelling units of various
numbers of rooms. Use R whenever possible in answering the following questions.
a. Obtain a scatter-plot of the two variables. Which variable should be on the horizontal
axis?
Help: Use the R command “plot(HEATING$ROOMS, HEATING$FUELBILL)”.
b. Find the linear regression equation resulting from regression of FUELBILL on ROOMS.
Give an interpretation for the slope and the intercept.
c. Test the hypothesis that the true slope of the regression line is zero.
d. Predict the FUELBILL for a unit with ROOMS=6.
e. Create a 95% confidence interval for the average FUELBILL of all dwelling units in this
population with ROOMS=6.
Help: Use the following R commands:
x=HEATING$ROOMS
y=HEATING$FUELBILL
new=data.frame(x=6)
conf=predict(lm(y~x),new,interval="confidence")
conf
f. Create a 95% prediction interval for a particular dwelling unit with ROOMS variable
equal to 6.
g. A particular 6 room unit last year had a heating bill of $958. Do you find this amount
unusually high?
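The regression workflow in this exercise can be rehearsed on a small made-up data frame. The values below are invented for illustration and are not the HEATING data; only the column names follow the exercise.

```r
# Made-up data standing in for the HEATING file.
heating <- data.frame(
  ROOMS    = c(3, 4, 5, 5, 6, 7, 8, 9),
  FUELBILL = c(420, 510, 600, 640, 700, 820, 900, 990)
)

# Scatter-plot with the predictor on the horizontal axis.
plot(heating$ROOMS, heating$FUELBILL)

# Regression of FUELBILL on ROOMS; the coefficient table holds the
# estimates, standard errors, t statistics and p-values.
fit <- lm(FUELBILL ~ ROOMS, data = heating)
print(summary(fit)$coefficients)

# Interval for the mean response versus interval for a single new unit.
new <- data.frame(ROOMS = 6)
conf <- predict(fit, new, interval = "confidence")
pred <- predict(fit, new, interval = "prediction")
print(conf)
print(pred)
```

Both intervals are centred on the same fitted value; the prediction interval is wider because it also accounts for the variation of an individual unit around the regression line.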
9. You are considering a quality inspection scheme to use on the spark plugs which are sent
from your supplier. These spark plugs come in shipments of 50,000. Denote the unknown
proportion of defective spark plugs in the shipment by p. Ideally you would like to reject the
shipment if p > .05 and accept it if p ≤ .05. In practice you can’t follow this plan since you
don’t know p. Instead you decide to apply a scheme that consists of the following steps:
A random sample of 20 of the spark plugs will be selected from each shipment. Each of
the selected plugs will be tested to see whether it is defective or not. (The test involves
measuring the plug gap and determining the electrical resistance.) Denote by X the
(random) number of defective plugs in the sample. If X < 2 then the shipment passes your
quality standard. If X ≥ 2 then the shipment fails the quality test and will be returned to
the supplier.
(a) Find the probability that the shipment is rejected when p = .05 (this corresponds to an
“error” since at p = .05 we would want to accept the shipment).
(b) Find the probability that the shipment is accepted when p = .1 (this corresponds to an
“error” again since at p = .1 we would want to reject the shipment).
(c) Find the probability that the shipment is accepted when p = .2.
Note: The value of p is, of course, unknown. In these questions we assume that it has
various concrete values in order to analyze the inspection scheme.
10. We would like to modify the quality control test described in question 9 above in the
following way. We want to pass the shipment if X < w and reject the shipment when X ≥ w,
where w is a number to be determined. Determine the smallest possible value for w
such that the probability of rejecting the shipment when p = .05 is no more than .01, i.e., 1%.
11. A polling agency wants to predict the percentage of votes candidate A will receive
on election day. Their objective is that the difference between the estimate and the actual
fraction of votes candidate A will receive should not exceed half a percent, with 95%
probability. How large a sample should they draw from the population in order to achieve this
objective?
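Sample-size questions for a proportion usually start from the margin-of-error formula E = z · √(p(1 − p)/n), solved for n. The sketch below uses an illustrative margin that differs from the exercise and assumes the conservative choice p = 0.5, which maximizes p(1 − p) when nothing is known about p.

```r
# Illustrative target, not the one from the exercise.
margin <- 0.03      # desired margin of error
conf_level <- 0.95  # confidence level
p_guess <- 0.5      # conservative assumption: p(1 - p) is largest at 0.5

# Two-sided critical value.
z <- qnorm(1 - (1 - conf_level) / 2)

# Solve E = z * sqrt(p(1 - p) / n) for n and round up.
n_required <- ceiling(z^2 * p_guess * (1 - p_guess) / margin^2)
print(n_required)
```

If p is known to lie in a range that excludes 0.5, using the value in that range closest to 0.5 gives a smaller required sample.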
12. The average monthly electricity use per household in the USA is 910 kWh. A local utility
company wants to know whether the average use within the population it serves exceeds
the national average. In a random sample of 100 households the average monthly electricity
use was 920 kWh, with a sample standard deviation of 50 kWh.
a. Use a one-sided test to decide whether the average electricity use within the local
population exceeds the national average. Select the significance level α = .05.
b. Repeat the same test as in part (a), using α = .01.
c. Without calculating the p-value, what can you say about it, based on your answers to
parts (a) and (b)? Help: your answer should be of the form “the p-value is less than a certain
number, and larger than another number”. Give a reason for your answer.
d. Make a decision for the same test as above with α = .1, without any calculation.
e. Make a decision for the same test as above with α = .005, without any calculation.
13. An industrial supply firm sometimes gets calls related to improperly filled orders. This
situation is related to the salesperson’s error in writing up the bill of sale. It happens that
Hank will make an error on the bill of sale with probability 0.07, Jerry will make an error
on the bill of sale with probability 0.04, and Carl will make an error on the bill of sale with
probability 0.11.
It also should be noted that Hank writes 30% of all sales, Jerry writes 30% of all sales, and
Carl writes 40% of all sales.
If the firm receives a call about an improperly filled order, what is the probability that the
bill of sale was written by Hank? by Jerry? by Carl?
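This is an application of Bayes' rule: the posterior probability of each writer is proportional to (share of sales) × (error probability). The sketch below uses illustrative labels and numbers that are not those of the exercise.

```r
# Illustrative numbers, not the ones from the exercise.
share <- c(A = 0.5, B = 0.3, C = 0.2)          # P(writer)
error_rate <- c(A = 0.02, B = 0.05, C = 0.10)  # P(error | writer)

# Joint probabilities P(writer and error).
joint <- share * error_rate

# Total probability of an error, then Bayes' rule.
p_error <- sum(joint)
posterior <- joint / p_error   # P(writer | error)

print(p_error)
print(posterior)
```

The posterior probabilities always sum to 1, which is a quick check on the arithmetic.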
14. A polling agency collected data concerning a coming election. They polled 100 voters.
In this sample 56 people said that they will vote for candidate A. Create a 95% confidence
interval for the proportion of votes candidate A will receive in the election.
15. A manufacturer of boxes of candy is concerned about the proportion of imperfect boxes
- those containing cracked, broken, or otherwise damaged candies.
(a) How large a sample is needed to be 99% confident that the difference between the sample
fraction of imperfect boxes and the population proportion of imperfect boxes is no more than
.015? Assume here that we have absolutely no information concerning the true proportion
of imperfect boxes.
(b) How does your answer to part (a) change if we assume that the population proportion
of imperfect boxes is at least .005 and no more than .1?
16. The SEC requires a company to file Form 8-K to report material changes in its financial
condition or operation. In a sample of 462 firms with material events, only 23 were in
violation of this rule. Are you able to conclude that the true percentage of firms in violation
of the 4-day rule is less than 10%? Use a one-sided test with α = .01.
17. In an earlier study of American cable TV viewers who purchase items from one of the
home shopping channels, it was found that the average age of these shoppers was 51 years.
Suppose you want to test the null hypothesis H0: µ = 51, using a sample of n = 50 TV
shoppers.
(a) Find the p-value of a two-sided test if x̄ = 52.3 and s = 7.1.
(b) Find the p-value of a one-sided test (Ha: µ > 51) if x̄ = 52.3 and s = 7.1.
(c) Find the p-value of a two-sided test if x̄ = 52.3 and s = 10.4.
18. In a random sample of 106 shoppers, 64 favored brand A over brand B. Let p be
the fraction in the entire population of shoppers who prefer brand A over brand B.
(a) A claim is made that p = .7. Set up the null and alternative hypotheses to test this claim
(two-sided). Make a decision using the significance level α = .01.
(b) Calculate the p-value corresponding to this test.
19. A heating contractor sends a repair person to homes in response to calls about heating
problems. The contractor would like to have a way to estimate how long the customer will
have to wait before the repair person can begin work. Data on the number of minutes of
waiting time (Wait.Tim) and the backlog of previous calls waiting for service (Backlog) were
obtained. The data file is available on the class website, under the name WAITTIMEBACKLOG.
Answer the questions below. You may use R for answering these questions.
(a) Find the linear regression equation resulting from regression of Wait.Tim on Backlog.
Give an interpretation for the slope and the intercept.
Help: Answering this and the other questions in this exercise is easier if you first issue
the following command:
attach(WAITTIMEBACKLOG)
This way you can refer directly to the columns in the data file. For example, instead of
WAITTIMEBACKLOG$Backlog you can simply write Backlog.
(b) Calculate the predicted value and the 95% prediction interval for the time to respond to
a call when the backlog is 6.
(c) Consider a regression for a model with the base-10 logarithm of Wait.Tim as a response
and Backlog as a predictor. Run a linear regression in R for this model. Does this model
appear better than the one without taking the logarithm of the Wait Time?
Help: Calculate the base-10 log of the Wait Time using the following R command:
Logtime=log10(Wait.Tim)
Then run a regression using Logtime as the response.
(d) Calculate the predicted value for the log of the Wait Time when the backlog is 6.
(e) Convert your answer to question (d) to a predicted value for the Wait Time when the
backlog is 6.
Help: You need to take the base-10 exponential of the prediction you obtained in part (d).
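The log-transform and back-transform steps can be rehearsed on made-up data. The values below are invented for illustration and are not the WAITTIMEBACKLOG data; only the column names follow the exercise.

```r
# Made-up data standing in for the WAITTIMEBACKLOG file.
wait_data <- data.frame(
  Backlog  = c(0, 1, 2, 3, 4, 5, 6, 7),
  Wait.Tim = c(20, 26, 31, 40, 52, 63, 80, 101)
)

# Response on the base-10 log scale.
wait_data$Logtime <- log10(wait_data$Wait.Tim)
fit_log <- lm(Logtime ~ Backlog, data = wait_data)

# Prediction on the log scale for a backlog of 6.
log_pred <- predict(fit_log, data.frame(Backlog = 6))

# Back-transform: 10 raised to the predicted log value.
wait_pred <- 10^log_pred

print(log_pred)
print(wait_pred)
```

Passing the data frame through the data argument of lm() is an alternative to attach() and avoids leaving column names in the search path.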
20. You will need the data file “sales” for completing this exercise. The file has the following
columns that are relevant to this exercise:
SalesPerSF: Sales per square foot of stores operated by a retail chain,
Income: the median household income in the surrounding community (dollars),
Population000: the size of the community (in thousands).
Market: This is a qualitative variable. There are 3 types of geographic locations: urban,
suburban, and rural. Two dummy variables have been set up, UrbanDummy and
SuburbanDummy. Rural is selected as the base level.
Disregard the other columns in the file.
(a) Run a regression using SalesPerSF as the dependent variable, and Income, Population000,
and the two dummy variables as predictors. Which of the coefficients are significantly
different from zero?
(b) Predict the sales per square foot for a store located in a suburban community with median
household income $71,000, and population size equal to 500,000 people. Create a 95%
prediction interval and a 95% confidence interval. Explain the difference between the two intervals.
(c) Interpret all four coefficients in the estimated regression equation.
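A regression with dummy variables can be rehearsed on simulated data before using the real file. Everything in the data frame below is randomly generated for illustration and is not the “sales” data; only the column names follow the exercise. The new-store row shows one way to encode a suburban location and a population expressed in thousands.

```r
# Simulated data for illustration only; not the "sales" file.
set.seed(1)
n <- 30
market <- rep(c("urban", "suburban", "rural"), each = 10)
sales <- data.frame(
  Income        = round(runif(n, 40000, 90000)),
  Population000 = round(runif(n, 50, 800)),
  UrbanDummy    = as.integer(market == "urban"),
  SuburbanDummy = as.integer(market == "suburban")
)
sales$SalesPerSF <- 100 + 0.002 * sales$Income + 0.05 * sales$Population000 +
  20 * sales$UrbanDummy + 10 * sales$SuburbanDummy + rnorm(n, sd = 15)

# Multiple regression; rural is the base level (both dummies equal to 0).
fit <- lm(SalesPerSF ~ Income + Population000 + UrbanDummy + SuburbanDummy,
          data = sales)
print(summary(fit)$coefficients)

# Suburban store: UrbanDummy = 0, SuburbanDummy = 1.
# A population of 500,000 people is entered as 500 (thousands).
new_store <- data.frame(Income = 71000, Population000 = 500,
                        UrbanDummy = 0, SuburbanDummy = 1)
pred_int <- predict(fit, new_store, interval = "prediction")
conf_int <- predict(fit, new_store, interval = "confidence")
print(pred_int)
print(conf_int)
```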
21. A firm produces metal wheels. The mean diameter of the wheels should be 4 inches.
Because of chance variation and other factors, the diameters of the wheels vary. To test
whether the population average is really 4 inches, the firm selects a random sample of 100
wheels, and finds that the sample mean diameter equals 3.97 inches, and the sample standard
deviation equals .14 inch.
a. What should the firm’s decision be if they use a 5% significance level?
b. What should the firm’s decision be if they use a 1% significance level?