BUAN6356_HW1

txt

School

University of Texas, Dallas *

*We aren’t endorsed by this school

Course

6356

Subject

Economics

Date

Feb 20, 2024

Type

txt

Pages

4

Uploaded by CorporalWorld12066

Report
library(DBI)library(RSQLite)wpull <- function(tablename){ con <- DBI::dbConnect(RSQLite::SQLite(),'wooldridge2.db') dt <- DBI::dbReadTable(con,tablename) dt <- data.table(dt) print(DBI::dbReadTable(con,paste(tablename,'labels',sep='_'))) DBI::dbDisconnect(con) rm(con) return(dt)}con <- dbConnect(SQLite(),'wooldridge2.db')dbListTables(con)bwght <- dbReadTable(con,'bwght')bwght <- data.table(bwght)dbReadTable(con,'bwght_labels') dbDisconnect(con)rm(con) #QUESTION - 1wage1 <- wpull('wage1')# 1. Find the average education level in the sample. What are the lowest and highest years of education?summary(wage1)avg_educ <- mean(wage1$educ) avg_educ #1_average education level = 12.56274min_educ <- min(wage1$educ)min_educ #1_lowest years of education = 0max_educ <- max(wage1$educ)max_educ #1_highest years of education 18# 2. Find the average hourly wage in the sample. Does it seem high or low?mean(wage1$wage) #2_average hourly wage = 5.908992 is low # 3. The wage data are reported in 1976 dollars. Using the Economic Report of the President (2021 or later), # obtain and report the Consumer Price Index (CPI) for the years 1976 and 2020#CP1 of 1976 is 56.9, CPI of 2020 is 258.8# 4. Use the CPI values from above to find the average hourly wage in 2020 dollars. Now does the average# hourly wage seem reasonable?mean(wage1$wage)*258.8/56.9 #4_this yields 26.87605 which is reasonable# 5. How many women are in the sample? How many men?table(wage1$female) #5_274 male & 252 female #QUESTION - 2meap01 <- wpull('meap01')# 1.Find the largest and smallest values of math4. Does the range make sense? Explain.summary(meap01) #smallest value is 0 and largest value is 100# 2. How many schools have a perfect pass rate on the math test? What percentage is this of the total# sample?sum(meap01$math4==100) #number of schools with 100 math score = 38mean(meap01$math4==100) #2% of the schools# 3. How many schools have math pass rates of exactly 50%?sum(meap01$math4==50) # 17 schools # 4. Compare the average pass rates for the math and reading scores. Which test is harder to pass? mean(meap01$math4) #71.909mean(meap01$read4) #60.06188# Reading is harder than Math # 5. Find the correlation between math4 and read4. What do you conclude? cor(meap01$math4,meap01$read4) #0.843 - a high correlation # 6. The variable exppp is expenditure per pupil. Find the average of exppp along with its standard# deviation. Would you say there is wide variation in per pupil spending? mean(meap01$exppp) #average of expp is $5194.865sd(meap01$exppp) #standard deviation of expp is 1091.9, not a wide variations# 7. Suppose School A spends $6000 per student and School B spends $5500 per student. By what percent-# age does School A’s spending exceed School B’s? Compare this to 100 [ln(6000) − ln(5500)], # which is the approximation percentage difference based on the difference in the natural logs.a_spending <- 6000b_spending <- 5500percent_dev <- 100*((a_spending- b_spending)/b_spending)percent_dev #exceeds by 9.091% 100*(log(6000)-log(5500)) #8.7% approc difference is 0.39 #QUESTION - 3f01k <- wpull('401k')# 1. Find the average participation rate and the average match rate in the sample of plans.summary(f01k)mean(f01k$prate) #87.363mean(f01k$mrate) #0.7315# 2. Now, estimate the simple regression equation:# prate = β0 + β1mrate,# and report the results along with the sample size and R-squared.model <- lm(prate~mrate, data = f01k)summary(model) #sample size = 1532, R squared = 0.0747# prate = 83.0755 + 5.8611 mrate# 3. Interpret the intercept in your equation. Interpret the coefficient on mrate.#intercept of the eq = 83.0755 i.e when mrate tends to 0, prate becomes 83.0755%#coeff of mrate = 5.8611, prate increases by 5.8611% for every percent increse in mrate# 4. Find the predicted prate when mrate = 3.5. Is this a reasonable prediction? #Explain what is happening here.83.0755 + 5.8611*(3.5) #103.5894 which is >100 and prate cannot exceed 100 # 5. How much of the variation in prate is explained by mrate?#R^2 is 7.47% which implies mrate has very less impact on prate #QUESTION - 4ceosal2 <- wpull('ceosal2')# 1. Find the average salary and the average tenure in the sample. mean(ceosal2$salary) #865.8644mean(ceosal2$ceoten) #7.955summary(ceosal2)# 2. How many CEOs are in their first year as CEO (that is, ceoten = 0)? # What is the longest tenure as a CEO?sum(ceosal2$ceoten==0) #5#longest as ceo is 37 years# 3. Estimate the simple regression model# ln[salary] = β0 + β1ceoten + u# and report
your results in the usual form. What is the (approximate) predicted percentage increase in# salary given one more year as a CEO?model2 <- lm(log(salary)~ceoten, data = ceosal2)summary(model2) #0.97% increase # Question 05wage2 <- wpull('wage2')# 1. Find the average salary and average IQ in the sample. # What is the sample standard deviation of IQ? # (IQ scores are standardized so that the average in the population is 100 with a standard deviation equal to 15.) summary(wage2) # average salary = 957.9, average IQ = 101.3 sd(wage2$IQ) #15.05264 standard dev of IQ# 2. Estimate a simple regression model where a one-point increase in IQ changes wage by a constant dollar amount.# Use this model to find the predicted increase in wage for an increase in IQ of 15 points.# Does IQ explain most of the variation in wage?model3 <- lm(wage~IQ, data = wage2)summary(model3) #wage = 8.3031(IQ)8.3031*(15) #124.5465 dollars increase#r^2 is 9.5%, so IQ doesnt impact wage significantly enough# 3. Now, estimate a model where each one-point increase in IQ has the same percentage effect on wage.model4 <- lm(log(wage)~IQ, data = wage2)summary(model4)# If IQ increases by 15 points, what is the approximate percentage increase in predicted wage# wage = 0.0088072 * IQ0.0088072 * 15 # 13.2108 % increase in wage # Question 6meap93 <- wpull('meap93')# 1. Do you think each additional dollar spent has the same effect on the pass rate, # or does a diminishing effect seem more appropriate? Explain.#it has a dimnishing affect as can be seen from the plot, with increase in expenditure # the pass rate gas to go up but we can see data that is unusally lowerm <- lm(math10~expend, data = meap93)ggplot(meap93,aes(x=expend,y=math10)) + geom_point() + geom_line(aes(y=predict(m)),color='red') + scale_x_continuous('expenditure') + scale_y_continuous('math percentages') cor(meap93$math10,meap93$expend) #0.1815503# 2. In the population model,# math10 = β0 + β1 ln[expend] + u# argue that β1/10 is the percentage point change in math10 given a 10% increase in expend.model5 <- lm(math10~log(expend), data = meap93) summary(model5) #β0 = -69.341, β1 = 11.164 #since logs are denoted as percentages β1/10 translates to β1*10%.# 3. Estimate this model. Report the estimated equation in the usual way, # including the sample size and R-squared.# math10 = -69.341+ 11.164 ln[expend]# sample size = 406, R^2 is 0.02966# 4. How big is the estimated spending effect? Namely, if spending increases by 10%, what is the estimated# percentage point increase in math10?# The coefficient on expend means that the pass rate increases by 11.16%, as the expenditure increases by 100%(1.00).# Hence, if expend increases by 10%(0.1), then the estimated percentage point change in math10 is 1.116 = 11.16×0.1. # 5. One might worry that regression analysis can produce fitted values for math10 that are greater than# 100. Why is this not much of a worry in this data set?summary(meap93)#max math10 is 66.7, implies there are no perfect scores or more than 100 #QUESTION 7# 1. Write out the results in equation form.hprice1<- wpull('hprice1') model9<- lm(price~sqrft+bdrms, data =hprice1)summary(model9)# price = -19.31500+ 0.12844*sqrft + 15.19819 * bdrms# 2. What is the estimated increase in price for a house with one more bedroom, holding square footage# constant?# with sqrft constant, adding 1 bedroom by 15.2 thousands# 3. What is the estimated increase in price for a house with an additional bedroom that is 140 square feet# in size? Compare this to your answer from above.0.12844*(140) + 15.19819 * 1 #33.18 thousands# 4. What percentage of the variation in price is explained by square footage and number of bedrooms?# 63.19% is associated with sqrft and bdrms# 5. The first house in the sample has sqrft = 2438 and bdrms = 4. Find the predicted selling price for# this house from the OLS regression line.0.12844*(2438) + 15.19819 * 4 -19.31500 # 354.6145 thousand# 6. The actual selling price of the first house in the sample was $300000 (so price = 300). Find the residual# for this house. Does it suggest that the buyer underpaid or overpaid for the house?354.6145 - 300 #54.6145 thousand, they buyer underpaid. #question 8ceosal2 <- wpull('ceosal2')# 1. Estimate a model relating annual salary to firm sales and market value. # Make the model of the constant elasticity variety for both independent variables. # Write the results out in equation form.model10 <- lm(log(salary)~log(sales)+log(mktval), data = ceosal2)summary(model10)# ln(salary) = 0.16213ln(sales)+0.10671ln(mktval)+4.62092# 2. Add profits to the model. Why can
this variable not be included in logarithmic form? # Would you say that these firm performance variables explain most of the variation in CEO salaries?model11 <- lm(log(salary)~log(sales)+log(mktval)+log(profits), data = ceosal2)summary(model11) model12 <- lm(log(salary)~log(sales)+log(mktval)+profits, data = ceosal2) summary(model12)# profits cannot be added in ln form because profits take on negative values.# log of a negative value is undefined. # we cannot say that these firm performance explain most of the variation in CEO salaries. # The R-squared is approximately 0.3, meaning that only 30% of the variation in log(salary) is explained. # Profits seems to add very little to the model,suggesting that profits have very little influence on log(salary)# 3. Now also add the variable ceoten to the model. # What is the estimated percentage return for another year of CEO tenure, holding other factors fixed?model13 <- lm(log(salary)~log(sales)+log(mktval) +profits+ceoten, data = ceosal2)summary(model13)# increases salary by 1.168%# 4. Find the sample correlation coefficient between the variables log(mktval) and profits. # Are these varables highly correlated? What does this say about the OLS estimators?cor(log(ceosal2$mktval),ceosal2$profits)#77.7% - high correlation. do not impact in a significantly large #QUESTION 09attend<- wpull('attend')attend$atndrte<-attend$attend/32# 1. Obtain the minimum, maximum, and average values for the variables atndrte, priGPA, and ACT. summary(attend)# for atndrte, min = 6.25%, max = 100%, avg = 81.71%#for priGPA, min = 0.857, max = 3.93, avg = 2.587# for ACT scores, min = 13, max = 32, avg = 22.51# 2. Estimate the model# atndrte = β0 + β1GPA + β2ACT + u# and write the results in equation form. Interpret the intercept. Does it have a useful meaning?model14 <- lm(atndrte~priGPA+ACT, data = attend)summary(model14)# atndrte = 0.75700 + 0.17261*priGPA + (-0.01717)*ACT # the intercept implies that attendance rate is 75.7% when priGPA & ACT tends to 0, which makes no sense# 3. Discuss the estimated slope coefficients. Are there any surprises?#the ACT intercept is not making sense as with 1 increase in score the attendance rate decreases by 1.72%# the priGPA makes sense, with one point increase in GPA attendance increases by 17.26%# 4. What is the predicted atndrte if priGPA = 3.65 and ACT = 20? What do you make of this result?# Are there any students in the sample with these values of the explanatory variables?0.75700 + 0.17261*3.65 + (-0.01717)*20# attendance rate is 104.3627% which is greater tha 100%sum(attend$priGPA == 3.65 & attend$ACT == 20) #there is 1 studentattend$priGPA>3.64 & attend$ACT == 20attend[569]#the student with priGPA = 3.65 and ACT = 20 has attendance rate = 87.5%, is afresher# index attend termGPA priGPA ACT final atndrte hwrte frosh soph# 1: 568 28 3.5 3.65 20 29 0.875 50 1 0# 5. If Student A has priGPA = 3.1 and ACT = 21 and Student B has priGPA = 2.1 and ACT = 26,# what is the predicted difference in their attendance rates?0.75700 + 0.17261*3.1 + (-0.01717)*21#attendance rate is 93.1521% 0.75700 + 0.17261*2.1 + (-0.01717)*26#attendance rate is 67.31%0.931521 - 0.673061 #there is a 25.85% difference # QUESTION 10htv<- wpull('htv')summary(htv)# 1. What is the range of the educ variable in the sample? summary(htv$educ) # 6-20# What percentage of men completed 12th grade but no higher grade?sum(htv$educ==12) 512/1230#512 out of 1230 - 41.63%# Do the men or their parents have, on average, higher levels of education?summary(htv)#On average, men have higher education than their parents# 2. Estimate the regression model by OLS and report the results in the usual form. # educ = β0 + β1motheduc + β2fatheduc + u model15<-lm(educ~motheduc+fatheduc,data=htv)summary(model15)# educ = 6.96435 + 0.30420 *motheduc + 0.19029 * fatheduc # How much sample variation in educ is explained by parents’ education?# 24.93% is explained by mother and father's education# Interpret the coefficient on motheduc.# Motheduc's coefficient is .30420 # every year of mother's education is associated with a 30.4% increase in the men's education# 3. Add the variable abil (a measure of cognitive ability) to the regression above, and report the results in# equation form. Does ability help to explain variations in education, even after controlling for parents’# education? Explain.model16<-lm(educ~motheduc+fatheduc+abil,data=htv)summary(model16)# educ = 8.44869 + 0.18913 *motheduc + 0.111099 * fatheduc + 0.50248*abil# the R^2 has increase to 0.4275, implies that ability is affecting education significantly even
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
with parents education# 4. Now estimate an equation where abil appears in quadratic form:# educ = β0 + β1motheduc + β2fatheduc + β3abil + β4abil^2 + u.# With the estimated coefficients on ability, use calculus to find the value of abil where educ is minimized.# (The other coefficients and values of parents’ education variables have no effect; we are holding parents’ education fixed.)# Notice that abil is measured so that negative values are permissible. # You might also verify that the second derivative is positive so that you do indeed have a minimum. model17<-lm(educ~motheduc+fatheduc+abil+I(abil^2),data=htv)summary(model17)# differentiating with abil on both sides we get d(educ)/d(abil)= β3 + 2β4(abil)=0 abil_min = - 0.401462/(2*(0.050599))abil_min # is -3.967094 (this is minimun since second derv is positive)# 5. Argue that only a small fraction of men in the sample have ability less than the value calculated above. # Why is this important? sum(htv$abil<=abil_min)15/1230 #1.22% of men have the the min ability, this again is similar to the bwght problem# we do not have enough data points to interpret the data# 6. Use the estimates above to plot the relationship beween the predicted education and abil. # Let motheduc and fatheduc have their average values in the sample, 12.18 and 12.45, respectively.htv2<- htvhtv2$motheduc<- as.double(htv2$motheduc)htv2$fatheduc<- as.double(htv2$fatheduc)htv2$motheduc<- 12.18htv2$fatheduc<-12.45ggplot(htv,aes(x=abil,y=educ)) + geom_point() + geom_line(aes(y=predict(model17,htv2)),color='red') + scale_x_continuous('ability') + scale_y_continuous('years of education')