Inverse Normal Distribution
Finding the z-value that corresponds to a known cumulative probability under a normal distribution is called an inverse normal calculation. It applies the inverse of the normal cumulative distribution function, also known as the quantile function. This should not be confused with the inverse Gaussian (Wald) distribution, which is a two-parameter family of continuous probability distributions.
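As a minimal sketch of the inverse normal lookup described above, the Python standard library's `statistics.NormalDist` provides an `inv_cdf` method. The helper name `z_critical` is only illustrative.

```python
from statistics import NormalDist


def z_critical(cumulative_probability: float) -> float:
    """Return z such that P(Z <= z) equals the given probability
    for a standard normal variable Z."""
    return NormalDist(mu=0.0, sigma=1.0).inv_cdf(cumulative_probability)


# The familiar two-tailed 5% cutoff: 97.5% of the area lies below it.
z_975 = z_critical(0.975)
print(round(z_975, 2))  # 1.96
```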
Mean, Median, Mode
A measure of central tendency is a central or typical value for a data set or probability distribution. The mean, median, and mode are the most common such measures. They summarize the data set as a whole and give no information about individual observations.
Z-Scores
A z-score describes the position of a raw score in terms of its distance from the mean, measured in standard deviations. Z-scores are useful because they allow comparison between two scores that belong to different normal distributions.
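The definition above amounts to z = (raw score - mean) / standard deviation. A short sketch follows; the two exam scores and their distributions are hypothetical values chosen only to illustrate the comparison.

```python
def z_score(raw: float, mean: float, std_dev: float) -> float:
    """Distance of a raw score from the mean, in standard deviations."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (raw - mean) / std_dev


# Hypothetical scores from two differently scaled tests.
z_a = z_score(85, mean=70, std_dev=10)     # 1.5 standard deviations above the mean
z_b = z_score(620, mean=500, std_dev=100)  # 1.2 standard deviations above the mean

# The first score is relatively stronger even though 620 > 85.
print(z_a > z_b)
```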
Fortune magazine publishes an annual list of the 100 best companies to work for. The data in the “HW7.xlsx” file on Moodle show a portion of the data for a sample of 20 of the companies that made the top 100 list for 2012. The column labeled Rank shows the rank of the company in the Fortune 100 list. The column labeled Size indicates whether the company is small (fewer than 2500 employees), midsize (2500-10,000 employees), or large (more than 10,000 employees). The column labeled Salaried ($1000s) shows the average annual salary for salaried employees, rounded to the nearest $1000. Finally, the column labeled Hourly ($1000s) shows the average annual salary for hourly employees, rounded to the nearest $1000.
-
a) Use these data to develop an estimated regression equation that could be used to predict the average annual salary for salaried employees given the average annual salary for hourly employees.
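For part a), the estimated regression equation has the form Salaried = b0 + b1 * Hourly. The sketch below computes the least-squares intercept and slope by hand. The data in it are placeholders that lie exactly on a line, NOT values from HW7.xlsx, so the coefficients it prints are not the answer to the exercise.

```python
def fit_simple_regression(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x values must not all be equal")
    b1 = s_xy / s_xx          # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


# Placeholder data in $1000s (hypothetical, constructed as y = 10 + 2x).
hourly = [30, 40, 50, 60]
salaried = [70, 90, 110, 130]

b0, b1 = fit_simple_regression(hourly, salaried)
print(f"Salaried = {b0:.2f} + {b1:.2f} * Hourly")
```

With the real spreadsheet, the two lists would be replaced by the 20 values in the Hourly ($1000s) and Salaried ($1000s) columns.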
-
b) Use α = 0.05 to test for overall significance. If you were conducting this test using statistical tables, what would be the critical value you would use for this test? Report the appropriate distribution, degrees of freedom, and the critical value.
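Assuming part b) refers to the one-predictor model from part a), the overall test uses an F distribution with p = 1 numerator degree of freedom and n - p - 1 = 20 - 1 - 1 = 18 denominator degrees of freedom. The upper-tail table value for a 0.05 significance level is approximately 4.41. The sketch below shows the comparison. The sums of squares in it are hypothetical placeholders, not results from HW7.xlsx.

```python
def overall_f_test(ssr, sse, n, p, f_critical):
    """Return (F statistic, numerator df, denominator df, reject H0?)."""
    df_num = p
    df_den = n - p - 1
    msr = ssr / df_num   # mean square due to regression
    mse = sse / df_den   # mean square error
    f_stat = msr / mse
    return f_stat, df_num, df_den, f_stat > f_critical


N_COMPANIES = 20   # sample size stated in the problem
N_PREDICTORS = 1   # Hourly only
F_CRIT = 4.41      # table value for F(1, 18) at the 0.05 level

# Hypothetical sums of squares, for illustration only.
f_stat, df_num, df_den, reject = overall_f_test(
    ssr=900.0, sse=1800.0, n=N_COMPANIES, p=N_PREDICTORS, f_critical=F_CRIT
)
print(f_stat, df_num, df_den, reject)
```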
-
c) To incorporate the effect of size, a categorical variable with three levels, use two dummy variables: one indicating small companies and another indicating midsize ones. The first should take a value of 1 if a company is small, and 0 otherwise. The second should take a value of 1 if a company is midsize, and 0 otherwise. Hint: use a formula of a similar form: =IF(A1="Small",1,0). Develop an estimated regression equation that could be used to predict the average annual salary for salaried employees given the average annual salary for hourly employees and the size of the company.
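The dummy coding in part c) can be sketched in Python as follows, with large companies as the baseline (both dummies 0). The label strings "Small" and "Midsize" are assumptions about how the Size column is spelled in the spreadsheet, the helper names are illustrative, and the rows are placeholders, NOT values from HW7.xlsx. The fit solves the normal equations directly, whereas a spreadsheet regression tool would normally do this step.

```python
def size_dummies(size):
    """Mirror =IF(A1="Small",1,0) and its midsize counterpart."""
    label = size.strip().lower()
    return (1 if label == "small" else 0, 1 if label == "midsize" else 0)


def fit_least_squares(rows, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination.

    rows holds the predictor values per observation (no intercept column).
    Returns [b0, b1, ...] with b0 the intercept.
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Placeholder rows: (Size, Hourly in $1000s, Salaried in $1000s),
# constructed as Salaried = 10 + 2*Hourly + 5*Small + 3*Midsize.
sample = [("Small", 30, 75), ("Small", 40, 95), ("Midsize", 35, 83),
          ("Midsize", 50, 113), ("Large", 45, 100), ("Large", 60, 130)]

predictors = [[hourly, *size_dummies(size)] for size, hourly, _ in sample]
response = [salaried for _, _, salaried in sample]
coefficients = fit_least_squares(predictors, response)
print([round(c, 4) for c in coefficients])  # order: intercept, Hourly, Small, Midsize
```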
-
d) Use the regression equation developed in part c) to predict the average annual salary for salaried employees in a Small company with an average annual salary for hourly employees equal to $60,000.
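One detail in part d) is easy to miss: the Hourly column is recorded in $1000s, so $60,000 enters the equation as 60, and the prediction comes out in $1000s as well. The sketch below illustrates the substitution. The coefficient values are hypothetical placeholders, not the fitted values from HW7.xlsx.

```python
def predict_salaried(b0, b_hourly, b_small, b_midsize, hourly_thousands, small, midsize):
    """Evaluate Salaried = b0 + b_hourly*Hourly + b_small*Small + b_midsize*Midsize."""
    return b0 + b_hourly * hourly_thousands + b_small * small + b_midsize * midsize


# Hypothetical coefficients, for illustration only.
B0, B_HOURLY, B_SMALL, B_MIDSIZE = 10.0, 2.0, 5.0, 3.0

# Small company: Small = 1, Midsize = 0. Hourly salary of $60,000 is 60 in $1000s.
prediction = predict_salaried(B0, B_HOURLY, B_SMALL, B_MIDSIZE,
                              hourly_thousands=60, small=1, midsize=0)
print(f"${prediction * 1000:,.0f}")
```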
-
e) For the estimated regression equation developed in part c), use the t test to determine the significance of the independent variables. Use α = 0.10. If you were to conduct this test using statistical tables, which t distribution (how many degrees of freedom) would you use? What is the critical value for this test?
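For the model in part c), with n = 20 observations and p = 3 independent variables (Hourly and the two dummies), the error degrees of freedom are n - p - 1 = 16. Assuming the usual two-tailed test of a zero coefficient, a 0.10 significance level puts 0.05 in each tail, and the table value for 16 degrees of freedom is approximately 1.746. The sketch below applies that cutoff. The coefficients and standard errors in it are hypothetical placeholders.

```python
def t_test(coefficient, standard_error, t_critical):
    """Return (t statistic, significant?) for H0: coefficient = 0, two-tailed."""
    t_stat = coefficient / standard_error
    return t_stat, abs(t_stat) > t_critical


N_COMPANIES = 20
N_PREDICTORS = 3                            # Hourly, Small, Midsize
DF_ERROR = N_COMPANIES - N_PREDICTORS - 1   # 16
T_CRIT = 1.746                              # table value, 0.05 in each tail, 16 df

# Hypothetical estimates and standard errors, for illustration only.
t_small, keep_small = t_test(5.0, 2.0, T_CRIT)
t_midsize, keep_midsize = t_test(3.0, 4.0, T_CRIT)
print(DF_ERROR, t_small, keep_small, t_midsize, keep_midsize)
```

A variable whose absolute t statistic falls below the cutoff is a candidate for removal in part f).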
-
f) Based upon your findings in part e), adjust the set of independent variables and develop an estimated regression equation that can be used to predict the average annual salary for salaried employees given the average annual salary for hourly employees and the size of the company.
-
g) Report and discuss the unadjusted and adjusted coefficients of determination for your equations in parts c) and f). Did dropping a variable improve the fit of the model?
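For part g), the two measures are R² = 1 - SSE/SST and adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where p is the number of independent variables. The sketch below uses hypothetical sums of squares, not results from HW7.xlsx, to show why the two can move in opposite directions when a variable is dropped.

```python
def r_squared(sse, sst):
    """Unadjusted coefficient of determination."""
    return 1 - sse / sst


def adjusted_r_squared(r2, n, p):
    """Adjust R-squared for the number of independent variables p."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


N_COMPANIES = 20
SST = 2000.0  # hypothetical total sum of squares

# Hypothetical: the three-variable model vs. the same model with one variable dropped.
r2_full = r_squared(sse=400.0, sst=SST)
r2_reduced = r_squared(sse=410.0, sst=SST)
adj_full = adjusted_r_squared(r2_full, N_COMPANIES, p=3)
adj_reduced = adjusted_r_squared(r2_reduced, N_COMPANIES, p=2)

print(round(r2_full, 4), round(adj_full, 4))
print(round(r2_reduced, 4), round(adj_reduced, 4))
```

Dropping a variable can never raise the unadjusted R², so the adjusted value is the one to compare across models with different numbers of variables.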