School

New York University

Course

1305

Subject

Statistics

Date

Jan 9, 2024

Julia Deutsch
HW: Week 9

STATISTICS AND DATA ANALYSIS HOMEWORK EXERCISES

19. A heating contractor sends a repair person to homes in response to calls about heating problems. The contractor would like to have a way to estimate how long the customer will have to wait before the repair person can begin work. Data on the number of minutes of waiting time (Wait.Tim) and the backlog of previous calls waiting for service (Backlog) were obtained. The data file is available on the class website, under the name WAITTIMEBACKLOG.

(c) Consider a regression for a model with the base-10 logarithm of Wait.Tim as a response and Backlog as a predictor. Run a linear regression in R for this model. Does this model appear better than the one without taking the logarithm of the Wait Time?

Help: Calculate the base-10 log of the Wait Time using the following R command:

Logtime=log10(Wait.Tim)

Then run a regression using Logtime as the response.

IN R:

attach(WAITTIMEBACKLOG_1_) #makes the data frame's columns available by name
mod <- lm(`Wait Tim` ~ Backlog) #fits the regression on the raw wait time
summary(mod) #regression output
Logtime <- log10(`Wait Tim`) #base-10 log of the wait time
mod2 <- lm(Logtime ~ Backlog) #fits the regression on the log scale
summary(mod2) #regression output
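The adjusted R-squared of each model can also be pulled out of the summary object directly rather than read off the printed output. The following is only a sketch: the class data file is not reproduced here, so the data frame `wait_demo` and all of its values are synthetic stand-ins invented for illustration, and the numbers it prints will not match the homework output.

```r
# Synthetic stand-in for WAITTIMEBACKLOG (illustrative values, NOT the class data)
set.seed(1)
Backlog  <- rep(0:7, each = 5)
Wait.Tim <- 10^(1.5 + 0.2 * Backlog + rnorm(length(Backlog), sd = 0.3))
wait_demo <- data.frame(Backlog, Wait.Tim)

# Same two models as in the exercise, using data= instead of attach()
mod  <- lm(Wait.Tim ~ Backlog, data = wait_demo)
mod2 <- lm(log10(Wait.Tim) ~ Backlog, data = wait_demo)

# Adjusted R-squared for each model, side by side
adj_r2 <- c(raw   = summary(mod)$adj.r.squared,
            log10 = summary(mod2)$adj.r.squared)
print(round(adj_r2, 4))
```

Because the two responses are on different scales, the adjusted R-squared values are best read alongside residual plots of each model rather than on their own.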
Answer: The regression model with the log-transformed wait time seems better than the one with the untransformed wait time, since the adjusted R-squared increased from 0.2413 to 0.2804.

(d) Calculate the predicted value for the log of the Wait Time when the backlog is 6.

log10(Wait Time) = 1.47008 + 0.194(6) = 2.63408

Answer: The predicted base-10 logarithm of Wait Time when the backlog is 6 is approximately 2.63408.
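The arithmetic in part (d) can be checked in a few lines of R. The intercept and slope are the values quoted in the answer above; the variable names (`b0`, `b1`, `log_wait`, `wait_minutes`) are made up for this sketch. The last two lines apply the inverse of the base-10 log to show what the value corresponds to in minutes.

```r
# Coefficients as reported in the answer above (log10 scale)
b0 <- 1.47008   # intercept
b1 <- 0.194     # slope for Backlog
backlog <- 6

# Predicted log10(Wait Time) at Backlog = 6
log_wait <- b0 + b1 * backlog
print(log_wait)              # 2.63408

# Inverse of log10: raise 10 to the predicted value
wait_minutes <- 10^log_wait
print(round(wait_minutes))   # 431
```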
(e) Convert your answer to question (d) to a predicted value for the Wait Time when the backlog is 6.

Help: You need to take the base-10 exponential of the prediction you received in part (d).

log_a(Y) = X  =>  Y = a^X

Wait Time = 10^2.63408

Answer: Wait Time = 430.6059 ≈ 431 Minutes

20. You will need the data file "sales" for completing this exercise. The file has the following columns that are relevant to this exercise:

SalesPerSF: Sales per square foot of stores operated by a retail chain.
Income: The median household income in the surrounding community (dollars).
Population000: The size of the community (in thousands).
Market: This is a qualitative variable. There are 3 types of geographic locations: urban, suburban, and rural. Two dummy variables have been set up, UrbanDummy and SuburbanDummy. Rural is selected as the base level.

Disregard the other columns in the file.

(a) Run a regression using SalesPerSF as the dependent variable, and Income, Population000, and the two dummy variables as predictors. Which of the coefficients are significantly different from zero?

IN R:

attach(sales) #makes the data frame's columns available by name
mod <- lm(SalesPerSF ~ Income + Population000 + UrbanDummy + SuburbanDummy) #fits the regression
summary(mod) #regression output
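To answer "which coefficients are significantly different from zero", the p-values can be read from the coefficient table that `summary()` returns. The sketch below assumes a 0.05 significance level, which the exercise does not state, and uses a synthetic data frame `sales_demo` invented for illustration in place of the "sales" file, so its output will differ from the homework results.

```r
# Synthetic stand-in for the "sales" file (illustrative values, NOT the class data)
set.seed(2)
n <- 60
market <- rep(c("rural", "suburban", "urban"), each = n / 3)
sales_demo <- data.frame(
  Income        = runif(n, 40000, 100000),
  Population000 = runif(n, 50, 800),
  UrbanDummy    = as.integer(market == "urban"),     # 1 if urban, else 0
  SuburbanDummy = as.integer(market == "suburban")   # 1 if suburban, else 0
)
sales_demo$SalesPerSF <- -80 + 0.005 * sales_demo$Income +
  0.25 * sales_demo$Population000 + 130 * sales_demo$UrbanDummy -
  60 * sales_demo$SuburbanDummy + rnorm(n, sd = 50)

mod <- lm(SalesPerSF ~ Income + Population000 + UrbanDummy + SuburbanDummy,
          data = sales_demo)

# Coefficient table: estimate, standard error, t value, p-value
coef_table <- summary(mod)$coefficients
p_values   <- coef_table[, "Pr(>|t|)"]

# Terms whose p-value is below the assumed 0.05 level
significant <- names(p_values)[p_values < 0.05]
print(round(coef_table, 4))
print(significant)
```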
Answer: Because the p-values for Income, Population000, UrbanDummy, and SuburbanDummy are approximately [value missing in source], there is sufficient evidence to conclude that all of our predictors are significantly different from 0.

(b) Predict the sales per square foot for a store located in a suburban community with median household income $71,000, and population size equal to 500,000 people. Create a 95% prediction interval and a 95% confidence interval. Explain the difference between the two intervals.
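For part (b), both intervals can be obtained from `predict()` by changing its `interval` argument. The following is a minimal sketch on synthetic data: the data frame `demo` and its values are invented for illustration and are not the "sales" file, so the numbers will not match the homework output. Note that a population of 500,000 is entered as 500 because Population000 is in thousands.

```r
# Synthetic stand-in for the "sales" file (illustrative values, NOT the class data)
set.seed(3)
n <- 60
demo <- data.frame(
  Income        = runif(n, 40000, 100000),
  Population000 = runif(n, 50, 800),
  UrbanDummy    = rep(c(0, 0, 1), length.out = n),
  SuburbanDummy = rep(c(0, 1, 0), length.out = n)
)
demo$SalesPerSF <- -80 + 0.005 * demo$Income + 0.25 * demo$Population000 +
  130 * demo$UrbanDummy - 60 * demo$SuburbanDummy + rnorm(n, sd = 50)

mod <- lm(SalesPerSF ~ Income + Population000 + UrbanDummy + SuburbanDummy,
          data = demo)

# Suburban store: income $71,000, population 500,000 (500 in thousands)
new_store <- data.frame(Income = 71000, Population000 = 500,
                        UrbanDummy = 0, SuburbanDummy = 1)

# 95% interval for one new store's sales (individual observation)
pred_int <- predict(mod, newdata = new_store, interval = "prediction", level = 0.95)

# 95% interval for the mean sales of all such stores
conf_int <- predict(mod, newdata = new_store, interval = "confidence", level = 0.95)

print(pred_int)   # columns: fit, lwr, upr
print(conf_int)
```

Both calls return the same point estimate in the `fit` column; only the `lwr` and `upr` bounds differ, with the prediction interval being the wider of the two.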
> predicted_sales
[1] 310.9872
> prediction_interval
[1] 176.2237 445.7508

Answer: The predicted sales per square foot for a store located in a suburban community with a median household income of $71,000 and a population size of 500,000 people is approximately 311.

The 95% Prediction Interval is between approximately 176 and 446. This interval predicts the range within which we can expect the sales per square foot for a similar store under similar conditions to fall 95% of the time.

The 95% Confidence Interval is between approximately 270 and 352. This interval provides a range within which we can be 95% confident that the true mean sales per square foot for all similar stores under similar conditions lies.

The key difference between these two intervals is that the prediction interval accounts for the variability in individual observations and is typically wider than the confidence interval, which only accounts for the uncertainty in estimating the true mean value for the population.

(c) Interpret all four coefficients in the estimated regression equation.

Constant (Intercept): -78.11. This is the baseline sales per square foot when all other predictors are zero. However, it's not practically interpretable in this context since predictors like Income and Population cannot be zero.
Income: 0.00475. For every additional dollar in median household income, sales per square foot increase by approximately 0.00475 units, holding the other predictors fixed. This indicates a positive relationship between household income and sales per square foot.

Population000: 0.2344. For each increase of 1,000 people in the community's population, sales per square foot increase by approximately 0.2344 units, holding the other predictors fixed. This indicates a positive impact of the community's population size on sales per square foot.

UrbanDummy: 133.59. Sales per square foot in urban locations are, on average, 133.59 units higher than in rural areas (the base category) with the same income and population.

SuburbanDummy: -65.41. Sales per square foot in suburban areas are, on average, 65.41 units lower than in rural areas with the same income and population.

Answer: The Income and Population000 coefficients represent the change in the dependent variable (SalesPerSF) for a one-unit change in the respective predictor, and the dummy coefficients represent the differences in SalesPerSF for Urban and Suburban markets compared to Rural. The positive coefficients for Income and Population000 indicate a positive relationship with SalesPerSF, the positive UrbanDummy coefficient shows higher sales in urban areas than in rural areas, and the negative SuburbanDummy coefficient indicates lower sales in suburban areas than in rural areas.
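As a consistency check, the coefficients interpreted above can be plugged into the regression equation for the suburban store from part (b). The coefficient values are the rounded ones quoted in this answer; the vector names `coefs` and `store` are made up for this sketch.

```r
# Rounded coefficients as quoted in part (c)
coefs <- c(Intercept = -78.11, Income = 0.00475, Population000 = 0.2344,
           UrbanDummy = 133.59, SuburbanDummy = -65.41)

# Suburban store: income $71,000, population 500,000 (500 in thousands)
store <- c(Intercept = 1, Income = 71000, Population000 = 500,
           UrbanDummy = 0, SuburbanDummy = 1)

# Predicted sales per square foot = sum of coefficient * predictor value
predicted <- sum(coefs * store)
print(predicted)   # 310.93
```

The result, about 310.93, agrees with the reported prediction of 310.9872 up to the rounding of the coefficients.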