School

New York University

Course

1305

Subject

Statistics

Date

Jan 9, 2024

STATISTICS AND DATA ANALYSIS
HOMEWORK EXERCISES
PETER LAKNER

1. Import the zagat file into R.
a. Calculate the mean, the standard deviation, the median, the two quartiles, the minimum and the maximum for the Price variable. Help: Use the "summary" and the "sd" commands.
b. Create a histogram for the variable Price. Help: Use the "histogram" command, similarly to the way it is done in the Course Supplement for the Food variable. To specify the breaks, use "breaks=seq(7.5,80.5,by=1)". For the axis, use the command "axis(1,at=seq(8,80,8))". This will show ticks at 8, 16, 24, etc. R will also put a few other ticks on the axis, but this is OK. Click on "export" above the histogram in RStudio to save your work.

2. On a production line, 5% of the items produced are defective (typically this proportion is unknown; we assume it to be known for the sake of this exercise). A quality control inspector selects a random sample of n = 20 items.
a) What is the probability that there will be no defective item in the sample?
b) What is the probability that there will be at least 1 defective item in the sample?

3. A normal random variable X has mean 3.0 and standard deviation 0.2. What is the probability that X falls between 2.75 and 3.1?

4. Suppose that X follows a normal distribution with mean 5.5 and standard deviation 0.3. Find a number w such that X < w with 30% probability.

5. Bluefish purchased at the Lime Beach Fishing Terminal produce a filet weight which has a mean of 4.5 pounds with a standard deviation of 0.8 pound. If a restaurant manager purchases 50 such fish, then what is the probability that she will have at least 220 pounds of filets?

6. A company is interested in estimating µ, the mean number of days of sick leave taken during the last year by all its employees. They select a random sample of 100 employees and note the number of sick days taken by each employee in the sample. The following sample statistics are computed: x̄ = 12.2 days, s = 3 days. Find a 95% confidence interval for µ.
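
The probability questions in Exercises 2 to 4 and the interval in Exercise 6 map onto R's built-in distribution functions dbinom, pnorm and qnorm. The following is only a sketch of how the calls could be set up, not an official solution. The variable names are made up for illustration, and using a normal (z) critical value in the last step is an assumption that leans on the large sample size; a t critical value from qt(0.975, df = n - 1) would be a reasonable alternative.

```r
# Exercise 2 setup: n = 20 items, each defective with probability 0.05
p_none <- dbinom(0, size = 20, prob = 0.05)   # P(no defective item)
p_at_least_one <- 1 - p_none                  # complement rule

# Exercise 3 setup: X is normal with mean 3.0 and sd 0.2
p_between <- pnorm(3.1, mean = 3.0, sd = 0.2) -
  pnorm(2.75, mean = 3.0, sd = 0.2)

# Exercise 4 setup: w is the 30th percentile of a normal(5.5, 0.3)
w <- qnorm(0.30, mean = 5.5, sd = 0.3)

# Exercise 6 setup: large-sample 95% interval (z-based, an assumption)
xbar <- 12.2
s <- 3
n <- 100
half_width <- qnorm(0.975) * s / sqrt(n)
ci <- c(lower = xbar - half_width, upper = xbar + half_width)

print(c(p_none = p_none, p_at_least_one = p_at_least_one,
        p_between = p_between, w = w))
print(ci)
```

The same pattern (a "p" function for probabilities, a "q" function for percentiles) carries over to the other distributions used in these exercises.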

7. A firm that manages rental properties is assessing an expansion into an expensive area in San Francisco. To cover its costs, the firm needs the average rent in this area to be more than $1,500 per month. They set up two hypotheses: H0: µ = 1500 and Ha: µ > 1500. In order to make a decision the firm obtained rents for a sample of n = 115 rental units in the area. Among these, the average rent is $1,657 with sample standard deviation s = $581.
(a) Test the above hypothesis at the 5% significance level.
(b) Calculate the p-value for this test.

8. The data file HEATING deals with the heating bill for dwelling units of various numbers of rooms. Use R whenever possible in answering the following questions.
a. Obtain a scatter-plot of the two variables. Which variable should be on the horizontal axis? Help: Use the R command "plot(HEATING$ROOMS, HEATING$FUELBILL)".
b. Find the linear regression equation resulting from regression of FUELBILL on ROOMS. Give an interpretation for the slope and the intercept.
c. Test the hypothesis that the true slope of the regression line is zero.
d. Predict the FUELBILL for a unit with ROOMS=6.
e. Create a 95% confidence interval for the average FUELBILL of all dwelling units in this population with ROOMS=6. Help: Use the following R commands:
    x=HEATING$ROOMS
    y=HEATING$FUELBILL
    new=data.frame(x=6)
    conf=predict(lm(y~x),new,interval="confidence")
    conf
f. Create a 95% prediction interval for a particular dwelling unit with ROOMS variable equal to 6.
g. A particular 6 room unit last year had a heating bill of 958. Do you find this amount unusually high?

9. You are considering a quality inspection scheme to use on the spark plugs which are sent from your supplier. These spark plugs come in shipments of 50,000. Denote the unknown proportion of defective spark plugs in the shipment by p. Ideally you would like to reject the shipment if p > .05 and accept it if p ≤ .05. In practice you can't follow this plan since you don't know p. Instead you decide to apply a scheme that consists of the following steps: A random sample of 20 of the spark plugs will be selected from each shipment. Each of the selected plugs will be tested to see whether it is defective or not. (The test involves measuring the plug gap and determining the electrical resistance.) Denote by X the (random) number of defective plugs in the sample. If X < 2 then the shipment passes your quality standard. If X ≥ 2 then the shipment fails the quality test and will be returned to the supplier.
(a) Find the probability that the shipment is rejected when p = .05 (this corresponds to an "error" since at p = .05 we would want to accept the shipment).
(b) Find the probability that the shipment is accepted when p = .1 (this corresponds to an "error" again since at p = .1 we would want to reject the shipment).
(c) Find the probability that the shipment is accepted when p = .2.
Note: The value of p is, of course, unknown. In these questions we assume that it has various concrete values in order to analyze the inspection scheme.

10. We would like to modify the quality control test described in question 9 above in the following way. We want to pass the shipment if X < w and reject the shipment when X ≥ w, where w is a number to be determined. Determine the smallest possible value for w such that the probability of rejecting the shipment when p = .05 is no more than .01, i.e., 1%.

11. A polling agency wants to predict the percentage of votes candidate A will receive on election day. Their objective is that the difference between the estimate and the actual fraction of votes candidate A will receive should not exceed half a percent, with 95% probability. How large a sample should they draw from the population in order to achieve this objective?

12. The average monthly electricity use per household in the USA is 910 kWh. A local utility company wants to know whether the average use within the population it serves exceeds the national average. In a random sample of 100 households the average monthly electricity use was 920 kWh, with a sample standard deviation of 50 kWh.
a. Use a one-sided test to decide whether the average electricity use within the local population exceeds the national average. Select the significance level α = .05.
b. Repeat the same test as in part (a), using α = .01.
c. Without calculating the p-value, what can you say about it, based on your answers to parts (a) and (b)? Help: Your answer should be of the form "the p-value is less than a certain number, and larger than another number". Give a reason for your answer.
d. Make a decision for the same test as above with α = .1, without any calculation.
e. Make a decision for the same test as above with α = .005, without any calculation.

13. An industrial supply firm sometimes gets calls related to improperly filled orders. This situation is related to the salesperson's error in writing up the bill of sale. It happens that Hank will make an error on the bill of sale with probability 0.07, Jerry will make an error on the bill of sale with probability 0.04, and Carl will make an error on the bill of sale with probability 0.11. It also should be noted that Hank writes 30% of all sales, Jerry writes 30% of all sales, and Carl writes 40% of all sales. If the firm receives a call about an improperly filled order, what is the probability that the bill of sale was written by Hank? By Jerry? By Carl?

14. A polling agency collected data concerning a coming election. They polled 100 voters. In this sample 56 people said that they will vote for candidate A. Create a 95% confidence interval for the proportion of votes candidate A will receive in the election.

15. A manufacturer of boxes of candy is concerned about the proportion of imperfect boxes, that is, those containing cracked, broken, or otherwise damaged candies.
(a) How large a sample is needed to be 99% confident that the difference between the sample fraction of imperfect boxes and the population proportion of imperfect boxes is no more than .015? Assume here that we have absolutely no information concerning the true proportion of imperfect boxes.
(b) How does your answer to part (a) change if we assume that the population proportion of imperfect boxes is at least .005 and no more than .1?

16. The SEC requires a company to file Form 8-K to report material changes in its financial condition or operation. In a sample of 462 firms with material events, only 23 were in violation of this rule. Are you able to conclude that the true percentage of firms in violation of the 4-day rule is less than 10%? Use a one-sided test with α = .01.

17. In an earlier study of American cable TV viewers who purchase items from one of the home shopping channels, it was found that the average age of these shoppers was 51 years. Suppose you want to test the null hypothesis H0: µ = 51, using a sample of n = 50 TV shoppers.
(a) Find the p-value of a two-sided test if x̄ = 52.3 and s = 7.1.
(b) Find the p-value of a one-sided test (Ha: µ > 51) if x̄ = 52.3 and s = 7.1.
(c) Find the p-value of a two-sided test if x̄ = 52.3 and s = 10.4.

18. In a random sample of 106 shoppers, 64 favored brand A over brand B. Let p be the fraction in the entire population of shoppers who prefer brand A over brand B.
(a) A claim is made that p = .7. Set up the null and alternative hypotheses to test this claim (two-sided). Make a decision using the significance level α = .01.
(b) Calculate the p-value corresponding to this test.
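
The proportion tests in Exercises 16 and 18 share the same mechanics: standardize the sample proportion using the standard error under the null hypothesis, then read the p-value off the normal distribution. The sketch below shows one way this could be written in R, using the numbers from Exercise 18. The helper prop_z_test is hypothetical (it is not part of R or of the course materials), and the normal approximation it relies on is an assumption that is usually considered acceptable for a sample of this size.

```r
# Hypothetical helper: large-sample z-test for a single proportion.
# successes: number of "yes" outcomes, n: sample size, p0: null value.
prop_z_test <- function(successes, n, p0) {
  p_hat <- successes / n
  se0 <- sqrt(p0 * (1 - p0) / n)       # standard error under H0
  z <- (p_hat - p0) / se0
  list(
    p_hat = p_hat,
    z = z,
    p_lower = pnorm(z),                # p-value for Ha: p < p0
    p_upper = 1 - pnorm(z),            # p-value for Ha: p > p0
    p_two_sided = 2 * pnorm(-abs(z))   # p-value for Ha: p != p0
  )
}

# Exercise 18 setup: 64 of 106 shoppers, H0: p = 0.7, two-sided
res <- prop_z_test(successes = 64, n = 106, p0 = 0.7)
alpha <- 0.01
reject_h0 <- res$p_two_sided < alpha

print(res)
print(reject_h0)
```

For a one-sided question such as Exercise 16, the same helper could be called with the corresponding counts and the p_lower component compared with α.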

19. A heating contractor sends a repair person to homes in response to calls about heating problems. The contractor would like to have a way to estimate how long the customer will have to wait before the repair person can begin work. Data on the number of minutes of waiting time (Wait.Tim) and the backlog of previous calls waiting for service (Backlog) were obtained. The data file is available on the class website, under the name WAITTIMEBACKLOG. Answer the questions below. You may use R for answering these questions.
(a) Find the linear regression equation resulting from regression of Wait.Tim on Backlog. Give an interpretation for the slope and the intercept. Help: Answering this and the other questions in this exercise is easier if you first issue the following command:
    attach(WAITTIMEBACKLOG)
This way you can refer directly to the columns in the data file. For example, instead of WAITTIMEBACKLOG$Backlog you can simply write Backlog.
(b) Calculate the predicted value and the 95% prediction interval for the time to respond to a call when the backlog is 6.
(c) Consider a regression for a model with the base-10 logarithm of Wait.Tim as the response and Backlog as the predictor. Run a linear regression in R for this model. Does this model appear better than the one without taking the logarithm of the Wait Time? Help: Calculate the base-10 logarithm of the Wait Time using the following R command:
    Logtime=log10(Wait.Tim)
Then run a regression using Logtime as the response.
(d) Calculate the predicted value for the log of the Wait Time when the backlog is 6.
(e) Convert your answer to question (d) to a predicted value for the Wait Time when the backlog is 6. Help: You need to raise 10 to the power of the prediction you obtained in part (d).

20. You will need the data file "sales" for completing this exercise. The file has the following columns that are relevant to this exercise:
SalesPerSF: sales per square foot of stores operated by a retail chain.
Income: the median household income in the surrounding community (dollars).
Population000: the size of the community (in thousands).
Market: This is a qualitative variable. There are 3 types of geographic locations: urban, suburban, and rural. Two dummy variables have been set up, UrbanDummy and SuburbanDummy. Rural is selected as the base level.
Disregard the other columns in the file.
(a) Run a regression using SalesPerSF as the dependent variable, and Income, Population000, and the two dummy variables as predictors. Which of the coefficients are significantly different from zero?
(b) Predict the sales per square foot for a store located in a suburban community with median household income $71,000 and population size equal to 500,000 people. Create a 95% prediction interval and a 95% confidence interval. Explain the difference between the two intervals.
(c) Interpret all four coefficients in the estimated regression equation.

21. A firm produces metal wheels. The mean diameter of the wheels should be 4 inches. Because of chance variation and other factors, the diameters of the wheels vary. To test whether the population average is really 4 inches, the firm selects a random sample of 100 wheels, and finds that the sample mean diameter equals 3.97 inches, and the sample standard deviation equals .14 inch.
a. What should the firm's decision be if they use a 5% significance level?
b. What should the firm's decision be if they use a 1% significance level?
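
The regression exercises (8, 19 and 20) ask for both a confidence interval and a prediction interval at a given predictor value. The sketch below illustrates the difference using predict() on a fitted lm object. The course data files are not reproduced here, so the data frame heating_sim is simulated, and its coefficients and noise level are invented purely for illustration; none of the printed numbers correspond to the real HEATING file.

```r
# Simulated stand-in for the HEATING data (hypothetical values only)
set.seed(1)
rooms <- rep(3:10, each = 5)
fuelbill <- 200 + 100 * rooms + rnorm(length(rooms), sd = 80)
heating_sim <- data.frame(ROOMS = rooms, FUELBILL = fuelbill)

# Regression of FUELBILL on ROOMS
fit <- lm(FUELBILL ~ ROOMS, data = heating_sim)

# New observation at ROOMS = 6; the column name must match the predictor
new <- data.frame(ROOMS = 6)

# Interval for the AVERAGE bill of all 6-room units
conf <- predict(fit, new, interval = "confidence", level = 0.95)

# Interval for the bill of ONE particular 6-room unit
pred <- predict(fit, new, interval = "prediction", level = 0.95)

conf_width <- conf[1, "upr"] - conf[1, "lwr"]
pred_width <- pred[1, "upr"] - pred[1, "lwr"]

print(conf)
print(pred)
```

Both intervals are centered on the same fitted value. The prediction interval is wider because it accounts for the scatter of an individual unit around the regression line in addition to the uncertainty in the estimated line itself.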