School: New York University
Course: 1305
Subject: Industrial Engineering
Date: Jan 9, 2024
Julia Deutsch
HW: Week 9
STATISTICS AND DATA ANALYSIS
HOMEWORK EXERCISES
8. The data file HEATING deals with the heating bill for dwelling units with various numbers of
rooms. Use R whenever possible in answering the following questions.
e. Create a 95% confidence interval for the average FUELBILL of all dwelling units in this population
with ROOMS=6. Help: Use the following R commands:
x=HEATING$ROOMS
y=HEATING$FUELBILL
new=data.frame(x=6)
conf=predict(lm(y~x),new,interval="confidence")
conf
IN R:
x=HEATING$ROOMS
y=HEATING$FUELBILL
new=data.frame(x=6)
conf=predict(lm(y~x),new,interval='confidence')
conf
       fit      lwr      upr
1 565.1261 539.8317 590.4206
Answer:
Confidence Interval = (539.8317, 590.4206)
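As a quick consistency check, a confidence interval for the mean has the form fit ± t* × SE(fit), so it is symmetric around the fitted value. The sketch below recovers the half-width from the printed `lwr` and `upr` values and divides it by the standard error of the mean reported below in part f ($12.80) to get the implied t multiplier. It uses only numbers copied from this document; the names `half_width` and `t_star` are illustrative.

```r
# Values copied from the predict() output for ROOMS = 6
fit    <- 565.1261
lwr    <- 539.8317
upr    <- 590.4206
se_fit <- 12.80   # standard error of the mean reported in part f

# A confidence interval for the mean is fit +/- t* x SE(fit),
# so its midpoint is the fit and its half-width is t* x SE(fit)
midpoint   <- (lwr + upr) / 2
half_width <- (upr - lwr) / 2
t_star     <- half_width / se_fit

round(c(midpoint = midpoint, half_width = half_width, t_star = t_star), 3)
```

The implied multiplier comes out just under 2, close to the large-sample value of 1.96, as expected for a t quantile with a reasonably large number of residual degrees of freedom.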
f. Create a 95% prediction interval for a particular dwelling unit with ROOMS variable equal to 6.
Predicted Mean Heating Bill: $565.13 | Standard Error of the Mean: $12.80
IN R:
heating_data <- read.csv("path_to_your_file/HEATING.csv")
model <- lm(FUELBILL ~ ROOMS, data = heating_data)
new_data <- data.frame(ROOMS = 6)
pred_interval <- predict(model, newdata = new_data, interval = "prediction", level = 0.95)
print(pred_interval)
● 95% Confidence Interval for the Mean: Lower Bound $539.83, Upper Bound $590.42
● 95% Prediction Interval: Lower Bound $278.45, Upper Bound $851.81
Answer:
This interval suggests that we can be 95% confident that the heating bill for a single dwelling unit
with 6 rooms will fall between approximately $278.45 and $851.81.
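The prediction interval is much wider than the confidence interval because it adds the scatter of an individual unit around the regression line: the confidence interval uses fit ± t* × SE(fit), while the prediction interval uses fit ± t* × sqrt(SE(fit)² + s²), where s is the residual standard error. The sketch below makes this explicit on a small synthetic data set (made up purely for illustration, not the HEATING data): it rebuilds both intervals by hand from `predict(..., se.fit = TRUE)` and compares them with the built-in `interval = "confidence"` and `interval = "prediction"` results. The names `rooms`, `bill` and `fit_lm` are hypothetical.

```r
# Synthetic data for illustration only: this is NOT the HEATING file
set.seed(1)
rooms <- sample(3:11, 60, replace = TRUE)
bill  <- 100 + 75 * rooms + rnorm(60, sd = 140)
fit_lm <- lm(bill ~ rooms)
new <- data.frame(rooms = 6)

# se.fit = TRUE returns the fitted mean, its standard error,
# the residual degrees of freedom and the residual standard error
p <- predict(fit_lm, new, se.fit = TRUE)
t_star <- qt(0.975, df = p$df)

# Confidence interval: only the uncertainty in the estimated mean
ci_manual <- p$fit + c(-1, 1) * t_star * p$se.fit
# Prediction interval: adds the residual variance of a single new unit
pi_manual <- p$fit + c(-1, 1) * t_star * sqrt(p$se.fit^2 + p$residual.scale^2)

ci_builtin <- predict(fit_lm, new, interval = "confidence")
pi_builtin <- predict(fit_lm, new, interval = "prediction")
rbind(ci_manual, ci_builtin = ci_builtin[1, c("lwr", "upr")],
      pi_manual, pi_builtin = pi_builtin[1, c("lwr", "upr")])
```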
g. A particular 6 room unit last year had a heating bill of $958. Do you find this amount unusually
high?
IN R:
> summary(HEATING)
     ROOMS           FUELBILL
 Min.   : 3.000   Min.   : 210.0
 1st Qu.: 5.000   1st Qu.: 426.0
 Median : 6.500   Median : 568.5
 Mean   : 6.611   Mean   : 648.3
 3rd Qu.: 8.000   3rd Qu.: 832.8
 Max.   :11.000   Max.   :1356.0
Answer:
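Yes, a FUELBILL of $958.00 is unusually high for a 6-room unit. The model's predicted mean FUELBILL
for a 6-room unit is only $565.13, and $958.00 lies above the upper limit of the 95% prediction
interval for a single 6-room unit ($851.81). It is also above the overall mean FUELBILL of $648.30 and
the third quartile of $832.80 for all units in the data; a bill of this size is more in line with what
one would expect for a unit with 8 or more rooms.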
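A direct way to judge whether one bill is unusual is to check whether it falls outside the 95% prediction interval from part f, since that interval describes individual units rather than the average. The sketch below does this with the numbers reported in this document. It also gives a rough check of the "8 or more rooms" remark, under the assumption that the fitted line passes through the sample means printed by `summary(HEATING)`; that property holds for least squares, but the printed means are rounded, so the result is approximate. The variable names are illustrative.

```r
# 95% prediction interval for a 6-room unit, copied from part f
pi_lower <- 278.45
pi_upper <- 851.81
fit_6    <- 565.1261   # predicted mean bill at ROOMS = 6 (part e)
observed <- 958

# A single bill is flagged as unusual if it lies outside the prediction interval
is_unusual <- observed < pi_lower || observed > pi_upper
excess     <- observed - pi_upper   # dollars above the upper limit

# Approximate slope (assumption: the least-squares line passes through
# the rounded sample means 6.611 rooms and $648.3 from summary(HEATING))
slope_approx  <- (648.3 - fit_6) / (6.611 - 6)
# Number of rooms at which the fitted line reaches the observed bill
rooms_for_958 <- 6 + (observed - fit_6) / slope_approx

is_unusual
excess
round(rooms_for_958, 1)
```

The observed bill is about $106 above the upper prediction limit, and the fitted line only reaches $958 at roughly 9 rooms.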
19. A heating contractor sends a repair person to homes in response to calls about heating
problems. The contractor would like to have a way to estimate how long the customer will have to
wait before the repair person can begin work. Data on the number of minutes of waiting time
(Wait.Tim) and the backlog of previous calls waiting for service (Backlog) were obtained. The data
file is available on the class website, under the name WAITTIMEBACKLOG.
(c) Consider a regression for a model with the base-10 logarithm of Wait.Tim as a response and
Backlog as a predictor. Run a linear regression in R for this model. Does this model appear better
than the one without taking the logarithm of the Wait Time? Help: Calculate the base-10 log of the
Wait Time using the following R command: Logtime=log10(Wait.Tim). Then run a regression using
Logtime as the response.
IN R:
attach(WAITTIMEBACKLOG_1_)       # put the data frame's columns on the search path so they can be referenced by name
mod <- lm(`Wait Tim` ~ Backlog)  # regression on the original scale
summary(mod)                     # regression output
Logtime <- log10(`Wait Tim`)     # base-10 log of the wait time
mod2 <- lm(Logtime ~ Backlog)    # regression on the log scale
summary(mod2)                    # regression output
Answer:
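The regression model with the log-transformed wait time appears better than the model without the
transformation, since the adjusted R-squared increased from 0.2413 to 0.2804. Because the two models
use different response scales, their R-squared values are not strictly comparable, so this comparison
is suggestive rather than conclusive.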
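The adjusted R-squared values quoted above are stored in the object returned by `summary()` for each model. The sketch below shows where to read them, using a small synthetic data set (invented for illustration, not the WAITTIMEBACKLOG file) in which the wait time grows multiplicatively with the backlog. It uses the `data.frame`-free style of the document's own code; the generating coefficients are hypothetical.

```r
# Synthetic wait-time data for illustration only (NOT WAITTIMEBACKLOG)
set.seed(42)
Backlog  <- rpois(40, lambda = 4)
Wait.Tim <- 10^(1.5 + 0.2 * Backlog + rnorm(40, sd = 0.4))

mod  <- lm(Wait.Tim ~ Backlog)          # original scale
mod2 <- lm(log10(Wait.Tim) ~ Backlog)   # base-10 log scale

# Adjusted R-squared is stored in the summary object
adj_r2 <- c(raw   = summary(mod)$adj.r.squared,
            log10 = summary(mod2)$adj.r.squared)
round(adj_r2, 4)
```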
(d) Calculate the predicted value for the log of the Wait Time when the backlog is 6
log10(Wait Time) = 1.47008 + 0.194(6) = 1.47008 + 1.164 = 2.63408
Answer:
The predicted logarithm of Wait Time when the backlog is 6 is approximately
2.63408
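The same arithmetic in R, plugging the intercept and Backlog slope from `summary(mod2)` into the fitted line. With the fitted model itself, `predict(mod2, newdata = data.frame(Backlog = 6))` should give approximately the same value; it may differ slightly in later decimals because the printed coefficients are rounded.

```r
# Coefficients from summary(mod2), as used in part (d)
b0 <- 1.47008   # intercept
b1 <- 0.194     # slope on Backlog
backlog <- 6

log_wait <- b0 + b1 * backlog   # predicted log10(Wait Time)
log_wait
```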
(e) Convert your answer to question (d) to a predicted value for the Wait Time when the backlog is
6. Help: You need to raise 10 to the power of the prediction you received in part (d).
$$\log_a(Y) = X \iff Y = a^{X}$$
$$\text{Wait Time} = 10^{2.63408}$$
Answer:
Wait Time = 430.6059 ≈ 431 Minutes
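Undoing the base-10 log in R is a single exponentiation; taking `log10()` of the result recovers the value from part (d), which confirms that the two operations are inverses.

```r
log_wait  <- 2.63408       # predicted log10(Wait Time) from part (d)
wait_time <- 10^log_wait   # undo the base-10 log
round(wait_time, 2)
log10(wait_time)           # recovers 2.63408
```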
20. You will need the data file "sales" for completing this exercise. The file has the following columns
that are relevant to this exercise: SalesPerSF: sales per square foot of stores operated by a retail
chain; Income: the median household income in the surrounding community (dollars);
Population000: the size of the community (in thousands); Market: a qualitative variable with 3 types
of geographic location: urban, suburban, and rural. Two dummy variables have been set up,
UrbanDummy and SuburbanDummy, with Rural selected as the base level. Disregard the other
columns in the file.
(a) Run a regression using SalesPerSF as the dependent variable, and Income, Population000, and
the two dummy variables as predictors. Which of the coefficients are significantly different from
zero?
IN R:
attach(sales)   # put the columns of sales on the search path so they can be referenced by name
mod <- lm(SalesPerSF ~ Income + Population000 + UrbanDummy + SuburbanDummy)   # builds regression
summary(mod)    # regression output
Answer:
Because the p-values for Income, Population000, UrbanDummy, and SuburbanDummy are all approximately 0,
there is sufficient evidence to conclude that all of the predictors' coefficients are significantly different from 0.
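The significance check can also be read programmatically from the coefficient table that `summary()` returns, whose columns are `Estimate`, `Std. Error`, `t value` and `Pr(>|t|)`. The sketch below runs on a synthetic stand-in for the "sales" file (every value is invented for illustration, and `sales_demo` is a hypothetical name). It also passes the data through `data =` instead of using `attach()`, which keeps the column lookup tied to one data frame.

```r
# Synthetic stand-in for the "sales" file, for illustration only
set.seed(7)
n <- 75
market <- sample(c("urban", "suburban", "rural"), n, replace = TRUE)
sales_demo <- data.frame(
  Income        = round(runif(n, 40000, 100000)),
  Population000 = round(runif(n, 50, 1500)),
  UrbanDummy    = as.integer(market == "urban"),
  SuburbanDummy = as.integer(market == "suburban")
)
sales_demo$SalesPerSF <- with(sales_demo,
  -80 + 0.005 * Income + 0.2 * Population000 +
  130 * UrbanDummy - 60 * SuburbanDummy + rnorm(n, sd = 40))

# data = avoids attach(): columns are looked up in the data frame directly
mod <- lm(SalesPerSF ~ Income + Population000 + UrbanDummy + SuburbanDummy,
          data = sales_demo)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- summary(mod)$coefficients
significant <- rownames(coefs)[coefs[, "Pr(>|t|)"] < 0.05]
significant
```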
(b) Predict the sales per square foot for a store located in a suburban community with median
household income $71,000, and population size equal to 500,000 people. Create a 95% prediction
interval and a 95% confidence interval. Explain the difference between the two intervals.
> predicted_sales
[1] 310.9872
> prediction_interval
[1] 176.2237 445.7508
Answer:
The predicted sales per square foot for a store located in a suburban community with a median
household income of $71,000 and a population size of 500,000 people is approximately 311.
● The 95% Prediction Interval runs from approximately 176 to 446. It gives the range within which we
expect the sales per square foot of a single new store with these characteristics to fall, with 95%
confidence.
● The 95% Confidence Interval runs from approximately 270 to 352. It gives the range within which
we can be 95% confident that the true mean sales per square foot of all stores with these
characteristics lies.
The key difference between these two intervals is that the prediction interval accounts for the
variability of individual observations around the mean as well as the uncertainty in estimating the
mean, so it is always wider than the confidence interval, which only accounts for the uncertainty in
estimating the true mean value for the population.
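The point prediction of about 311 can be checked by hand from the rounded coefficients reported in part (c): multiply each coefficient by the corresponding input and add. Two encoding details matter: Population000 is in thousands, so 500,000 people is entered as 500, and a suburban store has UrbanDummy = 0 and SuburbanDummy = 1. The commented `predict()` calls at the end show how the same inputs would go into `newdata` with the fitted model `mod` from part (a); `new_store` is a hypothetical name.

```r
# Rounded coefficients reported in part (c)
b <- c(Intercept = -78.11, Income = 0.00475, Population000 = 0.2344,
       UrbanDummy = 133.59, SuburbanDummy = -65.41)

# Suburban store, median income $71,000, population 500,000 people
# (Population000 is in thousands; rural is the base level)
x <- c(Intercept = 1, Income = 71000, Population000 = 500,
       UrbanDummy = 0, SuburbanDummy = 1)

pred_by_hand <- sum(b * x)
pred_by_hand

# With the fitted model from part (a), the same inputs go into newdata:
# new_store <- data.frame(Income = 71000, Population000 = 500,
#                         UrbanDummy = 0, SuburbanDummy = 1)
# predict(mod, newdata = new_store, interval = "confidence", level = 0.95)
# predict(mod, newdata = new_store, interval = "prediction", level = 0.95)
```

The hand calculation gives 310.93, which differs from the reported 310.9872 only because the coefficients above are rounded.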
(c) Interpret all four coefficients in the estimated regression equation.
Constant (Intercept): -78.11. This is the predicted sales per square foot for a rural store (both
dummy variables equal to 0) when Income and Population000 are both zero. It is not practically
interpretable in this context, since median household income and community population cannot
realistically be zero.
Income: 0.00475. For every additional dollar of median household income, sales per square foot
increase by approximately 0.00475 units, holding population and market type constant. This
indicates a positive relationship between household income and sales per square foot.
Population000: 0.2344. For each increase of 1,000 people in the community's population, sales per
square foot increase by approximately 0.2344 units, holding income and market type constant. This
indicates a positive relationship between community size and sales per square foot.
UrbanDummy: 133.59. Sales per square foot in urban locations are, on average, 133.59 units higher
than in rural locations (the base category), holding income and population constant.
SuburbanDummy: -65.41. Sales per square foot in suburban locations are, on average, 65.41 units
lower than in rural locations, holding income and population constant.
Answer:
As explained earlier, the coefficients in the regression equation represent the change
in the dependent variable (SalesPerSF) for a one-unit change in the respective independent
variables (Income, Population000) with the other predictors held constant, and the differences in SalesPerSF for different market
types (Urban and Suburban compared to Rural). The positive coefficients for Income and
Population000 indicate a positive relationship with SalesPerSF, while the UrbanDummy
coefficient shows higher sales in urban areas compared to rural. The negative coefficient for
SuburbanDummy indicates lower sales in suburban areas compared to rural areas.
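The interpretations above can be turned into concrete comparisons using the rounded coefficients from part (c). The sketch below scales the Income and Population000 slopes to more meaningful changes ($10,000 of income, 100,000 residents) and compares urban with suburban stores directly: both dummies are measured against rural, so the urban versus suburban gap is the difference of the two dummy coefficients, 133.59 - (-65.41) = 199.00, holding income and population fixed. The step sizes are illustrative choices, not part of the assignment.

```r
# Rounded slope coefficients from part (c)
b <- c(Income = 0.00475, Population000 = 0.2344,
       UrbanDummy = 133.59, SuburbanDummy = -65.41)

# Change in SalesPerSF for a $10,000 rise in median income, other predictors fixed
income_10k <- unname(b["Income"] * 10000)
# Change for 100,000 more residents (Population000 is measured in thousands)
pop_100k <- unname(b["Population000"] * 100)
# Urban vs suburban gap: both dummies are measured against rural,
# so the difference of their coefficients compares the two directly
urban_vs_suburban <- unname(b["UrbanDummy"] - b["SuburbanDummy"])

c(income_10k = income_10k, pop_100k = pop_100k,
  urban_vs_suburban = urban_vs_suburban)
```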