Question 4
Documentation for CollegeDistance Data
These data are taken from the High School and Beyond survey conducted by the Department
of Education in 1980, with a follow-up in 1986. The survey included students from
approximately 1,100 high schools.
The data in CollegeDistance exclude students in the western states. The data in
CollegeDistanceWest include only those students in the western states.
Series in Data Set
Name       Description
ed         Years of education completed (see below)
female     1 = Female / 0 = Male
black      1 = Black / 0 = Not Black
Hispanic   1 = Hispanic / 0 = Not Hispanic
bytest     Base-year composite test score (achievement tests given to high school seniors in the sample)
dadcoll    1 = Father is a college graduate / 0 = Father is not a college graduate
momcoll    1 = Mother is a college graduate / 0 = Mother is not a college graduate
incomehi   1 = Family income > $25,000 per year / 0 = Family income ≤ $25,000 per year
ownhome    1 = Family owns home / 0 = Family does not own home
urban      1 = School in urban area / 0 = School not in urban area
cue80      County unemployment rate in 1980
stwmfg80   State hourly wage in manufacturing in 1980
dist       Distance from a 4-year college, in tens of miles
tuition    Average state 4-year college tuition, in $1,000s
Years of Education: Rouse computed years of education by assigning 12 years to all members
of the senior class. Each additional year of secondary education counted as one year.
Students with vocational degrees were assigned 13 years, those with AA degrees 14 years,
and those with BA degrees 16 years; those with some graduate education were assigned
17 years, and those with a graduate degree 18 years.
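The coding rule above amounts to a lookup table with a fallback for respondents without a credential. The following is a minimal Python sketch, assuming each respondent's highest credential is available as a label; the labels and the `extra_years` input are hypothetical and are not fields in CollegeDistance, which already contains the computed ed.

```python
from typing import Optional

# Fixed ed values for credentials, as listed in the documentation above.
# The dictionary keys are hypothetical labels, not the survey's own codes.
YEARS_BY_CREDENTIAL = {
    "vocational": 13,
    "aa": 14,
    "ba": 16,
    "some_graduate": 17,
    "graduate": 18,
}


def years_of_education(credential: Optional[str] = None, extra_years: int = 0) -> int:
    """Return ed for one respondent under the coding rule described above.

    Respondents with a credential get its fixed value. Otherwise they get the
    12 years assigned to the senior class plus one year for each additional
    year of schooling (extra_years is an assumed input, not a survey field).
    """
    if credential is None:
        return 12 + extra_years
    return YEARS_BY_CREDENTIAL[credential]


print(years_of_education("ba"))           # 16
print(years_of_education(extra_years=1))  # 13
```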
You ran a regression of ed on all the other variables and obtained the following output:
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.532740389
R Square             0.283812322
Adjusted R Square    0.281350545
Standard Error       1.537759319
Observations         3796

ANOVA
             df     SS             MS            F
Regression   13     3544.073026    272.621002    115.2875937
Residual     3782   8943.309482    2.364703723
Total        3795   12487.38251

             Coefficients    Standard Error   t Stat          P-value
Intercept    8.893532213     0.252992583      35.15333185     1.6778E-234
dist         -0.032586013    0.013323796      -2.445700324    0.01450239
bytest       0.093069305     0.003181759      29.25089857     9.7203E-170
female       0.14392191      0.050444897      2.853051891     0.004353675
black        0.33836742      0.072226404      4.684816096     2.90066E-06
Hispanic     0.349177035     0.078241758      4.462796366     8.3253E-06
incomehi     0.374115378     0.060762082      6.157053322     8.18478E-10
ownhome      0.143257558     0.066881307      2.141967068     0.032259819
dadcoll      0.574015371     0.073754585      7.782775375     9.09068E-15
momcoll      0.37867         0.081528961      4.644607203     3.52311E-06
cue80        0.028259694     0.009874008      2.862028562     0.004232516
stwmfg80     -0.042632589    0.020208496      -2.109636897    0.034955178
urban        0.065166355     0.063650146      1.023820971     0.305985324
tuition      -0.184834534    0.101061915      -1.828923725    0.067489741
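In this output the t Stat column is the coefficient divided by its standard error, and the P-value is the two-sided tail probability of that statistic. The sketch below recomputes both from the reported slope coefficients and standard errors (the intercept is omitted) and flags significance at an assumed 5% level. It uses the standard normal as an approximation to the t distribution with 3782 degrees of freedom, so its p-values may differ from Excel's in the fourth decimal place; the function names are hypothetical.

```python
import math

# (coefficient, standard error) pairs copied from the unrestricted output above.
UNRESTRICTED = {
    "dist": (-0.032586013, 0.013323796),
    "bytest": (0.093069305, 0.003181759),
    "female": (0.14392191, 0.050444897),
    "black": (0.33836742, 0.072226404),
    "Hispanic": (0.349177035, 0.078241758),
    "incomehi": (0.374115378, 0.060762082),
    "ownhome": (0.143257558, 0.066881307),
    "dadcoll": (0.574015371, 0.073754585),
    "momcoll": (0.37867, 0.081528961),
    "cue80": (0.028259694, 0.009874008),
    "stwmfg80": (-0.042632589, 0.020208496),
    "urban": (0.065166355, 0.063650146),
    "tuition": (-0.184834534, 0.101061915),
}


def two_sided_p(t):
    """Two-sided p-value under the standard normal: 2 * (1 - Phi(|t|))."""
    return math.erfc(abs(t) / math.sqrt(2))


def significance_table(coefs, alpha=0.05):
    """Map each regressor to (t statistic, p-value, significant at alpha)."""
    rows = {}
    for name, (b, se) in coefs.items():
        t = b / se
        p = two_sided_p(t)
        rows[name] = (t, p, p < alpha)
    return rows


for name, (t, p, sig) in significance_table(UNRESTRICTED).items():
    print(f"{name:9s} t = {t:8.3f}  p = {p:.4f}  significant at 5%: {sig}")
```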
(a) Are all the coefficients statistically significant? Give reasons.
(b) Do all the variables have the expected signs? Explain.
Irrespective of your findings in part (a), you decided to exclude urban and tuition from your
model. Below is the output of your restricted model.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.531915985
R Square             0.282934615
Adjusted R Square    0.280850123
Standard Error       1.538294625
Observations         3796

ANOVA
             df     SS             MS            F
Regression   11     3533.112766    321.1920697   135.7331001
Residual     3784   8954.269742    2.366350355
Total        3795   12487.38251

             Coefficients    Standard Error   t Stat          P-value
Intercept    8.861373218     0.24970537       35.48731545     2.2539E-238
dist         -0.030803906    0.012337745      -2.496720866    0.012576959
bytest       0.092447362     0.003167406      29.18709361     4.3712E-169
female       0.14337772      0.050453511      2.841778818     0.004510268
black        0.35380829      0.07123451       4.96681021      7.10702E-07
hispanic     0.402351451     0.074264234      5.417836121     6.40833E-08
incomehi     0.366595238     0.060679243      6.041526185     1.67409E-09
ownhome      0.145641624     0.066640862      2.185470293     0.028915488
dadcoll      0.569915283     0.07371817       7.731001473     1.35828E-14
momcoll      0.379183612     0.081549788      4.64971917      3.43733E-06
cue80        0.024417986     0.00960948       2.541030802     0.01109222
stwmfg80     -0.050204408    0.019801292      -2.535410723    0.011271484
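The Regression Statistics block in each output follows directly from its ANOVA table: R Square is SS_reg / SS_tot, Adjusted R Square rescales the sums of squares by their degrees of freedom, Standard Error is the square root of the residual MS, and F is MS_reg / MS_res. A short sketch that rebuilds these figures from the sums of squares, with k taken as the number of slope coefficients (the function name is hypothetical):

```python
import math


def regression_stats(ss_reg, ss_res, n, k):
    """Rebuild Excel's summary statistics from the ANOVA sums of squares.

    n is the number of observations and k the number of slope coefficients
    (regressors, excluding the intercept).
    """
    ss_tot = ss_reg + ss_res
    df_res = n - k - 1
    r2 = ss_reg / ss_tot
    adj_r2 = 1 - (ss_res / df_res) / (ss_tot / (n - 1))
    ms_reg = ss_reg / k
    ms_res = ss_res / df_res
    return {
        "Multiple R": math.sqrt(r2),
        "R Square": r2,
        "Adjusted R Square": adj_r2,
        "Standard Error": math.sqrt(ms_res),
        "F": ms_reg / ms_res,
    }


# Restricted model: 11 regressors after dropping urban and tuition.
restricted = regression_stats(ss_reg=3533.112766, ss_res=8954.269742, n=3796, k=11)
for label, value in restricted.items():
    print(f"{label:18s} {value:.6f}")
```

Passing the unrestricted sums of squares with k = 13 reproduces the first output's statistics in the same way.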
(c) Conduct a joint hypothesis test to determine whether the variables you left out
(urban and tuition) are jointly insignificant.
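One common way to carry out this test is the homoskedasticity-only F test, which compares the residual sums of squares of the two models: F = [(SSR_r - SSR_u) / q] / [SSR_u / (n - k_u - 1)], with q = 2 restrictions (the coefficients on urban and tuition both equal zero). The sketch below applies it to the ANOVA figures above. The helper names are hypothetical, and the p-value and critical value use the closed-form upper tail of the F distribution, P(F > f) = (d / (d + 2f))^(d/2), which holds only when the numerator has 2 degrees of freedom.

```python
# Residual sums of squares and residual df copied from the two ANOVA tables.
SSR_UNRESTRICTED = 8943.309482  # full model, 13 regressors
SSR_RESTRICTED = 8954.269742    # urban and tuition dropped, 11 regressors
DF_UNRESTRICTED = 3782          # n - k - 1 = 3796 - 13 - 1
Q = 2                           # restrictions: beta_urban = 0 and beta_tuition = 0


def joint_f_stat(ssr_r, ssr_u, q, df_u):
    """Homoskedasticity-only F statistic for q exclusion restrictions."""
    return ((ssr_r - ssr_u) / q) / (ssr_u / df_u)


def f2_upper_tail(f, df_den):
    """P(F > f) for an F(2, df_den) variable (numerator df = 2 only)."""
    return (df_den / (df_den + 2.0 * f)) ** (df_den / 2.0)


def f2_critical(alpha, df_den):
    """Value c with P(F(2, df_den) > c) = alpha, inverting f2_upper_tail."""
    return df_den / 2.0 * (alpha ** (-2.0 / df_den) - 1.0)


f_stat = joint_f_stat(SSR_RESTRICTED, SSR_UNRESTRICTED, Q, DF_UNRESTRICTED)
p_value = f2_upper_tail(f_stat, DF_UNRESTRICTED)
critical = f2_critical(0.05, DF_UNRESTRICTED)
print(f"F = {f_stat:.4f}, p = {p_value:.4f}, 5% critical value = {critical:.4f}")
print("reject H0 at 5%" if f_stat > critical else "fail to reject H0 at 5%")
```

With these figures the statistic is about 2.32, below the 5% critical value of about 3.00 (p ≈ 0.099), so under this test urban and tuition are jointly insignificant at the 5% level, though not at the 10% level. The equivalent R-squared form, [(R²_u - R²_r) / q] / [(1 - R²_u) / (n - k_u - 1)], gives the same value. A heteroskedasticity-robust version would need the coefficient covariance matrix, which these outputs do not report.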