Assignment4

School: Georgia State University
Course: 225
Subject: Business
Date: Apr 3, 2024

Assignment 4 – Business Analytics
Instructor: Pan Li

SUBMISSION & COLLABORATION RULES

Homework Assignment 4 – due 11:59 PM, March 25, in the Assignments section of CANVAS
• This is an individual assignment. The GT honor code applies to this and other assignments!
• There are plenty of resources for getting help with R. You can also ask your instructor, TA, or classmates for help with R.
• On CANVAS, I have posted the R script files I used in my PPT decks for the Experiments, Variable Selection, and Cross Validation modules.
• This assignment also includes my R code to run Parts A and B. You will have to interpret the output to answer the questions in those parts.
• You should submit your assignment as a Word file containing your answers.
• Please name your file lastname_firstname_HW4.docx
• You should submit your homework in the Assignments section of CANVAS.
• You will need to install and use the following R libraries for this homework:
  o tidyverse
  o psych
  o Ecdat

PURPOSE

Understand how to apply and interpret the following concepts:
• Randomized Controlled Experiments
• Natural Experiments
• Prediction – Variable Selection
• Prediction – Cross Validation

Part A (20 points)

Part A is intended to help students understand Randomized Controlled Experiments (RCEs) and how to interpret their output correctly. You will need to interpret the coefficients of a regression used for an RCE and estimate the treatment effect.

Data set to be used for Part A: Ecdat::Star

You will need to download and run HW4.PartA.R

For questions A1 and A2, use the Star dataset in the Ecdat package. Create a new dataset, mydata, that has records only for the “small” and “regular.with.aide” classes. Note: We are not interested in regular-sized classes.
The R code for selecting these two types of classes is:

    mydata <- dplyr::filter(Ecdat::Star, classk == "small.class" | classk == "regular.with.aide")

Create a dummy variable called small, which is 1 for a student in a “small” class and 0 for a student in a “regular.with.aide” class. Create totalscore, which is the sum of the math and reading scores for each record.
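A minimal sketch of the data preparation above, together with the two models asked for in Questions A1 and A2. It assumes the column names used in Ecdat::Star (classk for class type, tmathssk and treadssk for the math and reading scores, totexpk for teacher experience). The helper names prep_star and fit_star_models are hypothetical, and base R stands in for dplyr only so the functions run without extra packages; the dplyr::filter call in the handout selects the same rows.

```r
# Keep only small and regular-with-aide classes, then add the small dummy
# (1 = small.class, 0 = regular.with.aide) and totalscore (math + reading).
# Assumes Ecdat::Star column names: classk, tmathssk, treadssk.
prep_star <- function(star) {
  keep <- star$classk %in% c("small.class", "regular.with.aide")
  mydata <- star[keep, , drop = FALSE]
  mydata$small <- as.numeric(mydata$classk == "small.class")
  mydata$totalscore <- mydata$tmathssk + mydata$treadssk
  mydata
}

# Fit the Question A1 and A2 models.
# Assumes totexpk holds teacher experience, as in Ecdat::Star.
fit_star_models <- function(mydata) {
  list(
    reg_1 = lm(totalscore ~ small, data = mydata),
    reg_2 = lm(totalscore ~ small + totexpk, data = mydata)
  )
}

# Usage with the real data (requires the Ecdat package):
# mydata <- prep_star(Ecdat::Star)
# fits <- fit_star_models(mydata)
# summary(fits$reg_1)
# summary(fits$reg_2)
```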

Please type ?Ecdat::Star in R to get a description of the Star dataset.

Question A1 (10 points)
Run a linear regression model, reg_1, using all the data in mydata, with totalscore as the response variable and small as the predictor.
• Show the output of summary(reg_1).
• What is the estimated coefficient of small?
• What is its p-value?
• Is small statistically significant?
• What is the interpretation of the coefficient of small?

Question A2 (10 points)
Please run a linear regression model, reg_2, using all the data in mydata, with totalscore as the response variable and small and teacher experience as the predictors.
• Show the output of summary(reg_2).
• What are the estimated coefficients of small and teacher experience?
• What are their p-values?
• Are they statistically significant?
• What is the interpretation of the coefficients of small and teacher experience?

Part B (70 points)

Part B is intended to help students understand Natural Experiments and how to interpret their output correctly. You will need to interpret the coefficients of a regression used for a natural experiment and estimate the treatment effect.

Data set to be used for Part B: Ecdat::Treatment

You will need to download and run HW4.PartB.R

We will use the dataset named “Treatment” (in the Ecdat package). Create a new data frame, mydata, that is a copy of Treatment for your analysis, i.e., mydata <- Ecdat::Treatment. You will use this data frame in questions B1-B3.

The National Supported Work (NSW) demonstration project, conducted in the 1970s, measured the impact of training on earnings through a randomized experiment that assigned some individuals to receive training (a treatment group) and others to receive no training (a control group). Dehejia and Wahba (1999, 2002) adapted this data by creating a different “control” sample from national surveys, which allowed them to compare experimental data with methods used in non-experimental settings.
We have adapted the dataset from Dehejia and Wahba (1999) to compare different methods for determining the causal impact of training on wages. We will consider 1978 to be the post-treatment period (AFTER) and 1975 to be the pre-treatment period (BEFORE).

Please enter ?Ecdat::Treatment in RStudio to get a description of the Treatment dataset.

Description

a cross-section from 1974

number of observations: 2675

country: United States

Format

A dataframe containing:
• treat: was the individual in the treated group?
• age: age of the individual
• educ: education of the individual, in years
• ethn: a factor with levels (“other”, “black”, “hispanic”)
• married: is the individual married?
• re74: the individual's real annual earnings in 1974 (pre-treatment)
• re75: the individual's real annual earnings in 1975 (pre-treatment)
• re78: the individual's real annual earnings in 1978 (post-treatment)
• u74: was the individual unemployed in 1974?
• u75: was the individual unemployed in 1975?

Use the mutate function (in the dplyr package) to create a dummy variable, Treated, using this code:

    mydata <- mydata %>% mutate(Treated = as.numeric(treat))

So, if a record has Treated = 1, it belongs to the treatment group; if a record has Treated = 0, it belongs to the control group.

Question B1 (10 points)
Use the regression model re75 = b0 + b1*Treated to calculate the mean pre-treatment (BEFORE) annual earnings (re75) for the control and treatment groups, respectively. Please show how you got your answers with the relevant R output.

Question B2 (10 points)
Use the regression model re78 = b0 + b1*Treated to calculate the mean post-treatment (AFTER) annual earnings (re78) for the control and treatment groups, respectively. Please show how you got your answers with the relevant R output.

Question B3 (10 points)
Please enter the values for A and B from Question B1 and C and D from Question B2 into the following table:
• A =
• B =
• C =
• D =

Question B4 (10 points)
Using the table in B3, what is the difference between the average re78 value and the average re75 value for the Control group? Please show your computation.

Question B5 (10 points)
Using the table in B3, what is the difference between the average re78 value and the average re75 value for the Treatment group? Please show your computation.

Question B6 (10 points)
Enter the values for C – A from Question B4 and D – B from Question B5 into the following table:
• C – A =
• D – B =

Question B7 (10 points)
What is the difference-in-differences estimate for the impact of training on wages? Please show your computation.

Diff-in-Diff = (D – B) – (C – A) =

Part C (10 points; 2 points per question – please select the correct answer(s))

1. We have several linear regression models that are being considered for prediction, and we have training data. If we judge a model's performance only by its Mean Square Error on the training data, we would always get the best linear regression model for prediction. Is this statement True or False?
A. True
B. False

2. Which of the following statements are TRUE about Best Subset Selection? (Select all correct answers)

A. If you have p predictors, there are p separate regressions possible that need to be evaluated.
B. If you have p predictors, there are 2^p separate regressions possible that need to be evaluated.
C. Best Subset Selection is not practical when p (the number of predictors) is over 30. Instead, we have to resort to alternative, more efficient methods.

3. Which of the following statements are TRUE? (Select all correct answers)
A. Stepwise methods – Forward and Backward – are alternative approaches to Best Subset Selection that involve a considerably smaller number of models than Best Subset Selection.
B. Both Forward and Backward stepwise selection methods involve consideration of 1 + p(p + 1)/2 models, which is far more than the 2^p models considered in Best Subset Selection.
C. Both Forward and Backward stepwise selection methods involve consideration of 1 + p(p + 1)/2 models, which is far fewer than the 2^p models considered in Best Subset Selection.

4. Both Best Subset and Forward stepwise will always select the identical set of variables for their best models with 5 variables (assume that p = 10).
A. TRUE
B. FALSE

5. Which of the following statements are TRUE when comparing Ridge Regression and Lasso? (Select all correct answers)
A. Lasso has a very useful advantage, as it produces simpler models that are more interpretable.
B. Ridge Regression will always dominate Lasso over all data sets.
C. As λ becomes very large, the penalty's impact grows and more of the Lasso regression coefficients are set to zero.

*** End of Homework Assignment #4 ***
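A minimal base-R sketch of the arithmetic behind Questions B1 to B7. When an outcome is regressed on a single 0/1 dummy with an intercept, b0 is the control-group mean and b0 + b1 is the treatment-group mean, so the re75 fit gives A and B, the re78 fit gives C and D, and the difference-in-differences is (D – B) – (C – A). The helper names group_means_from_lm and did_table are hypothetical, and the code assumes a mydata with the Treated, re75, and re78 columns described in Part B.

```r
# Regress an outcome on a 0/1 dummy and read the two group means off the
# coefficients: b0 = control mean, b0 + b1 = treatment mean.
group_means_from_lm <- function(outcome, treated) {
  b <- coef(lm(outcome ~ treated))
  c(control = unname(b[1]), treatment = unname(b[1] + b[2]))
}

# Fill in A, B, C, D and the differences used in Questions B4 to B7.
# Assumes mydata has columns Treated (0/1), re75, and re78.
did_table <- function(mydata) {
  before <- group_means_from_lm(mydata$re75, mydata$Treated)  # Question B1
  after  <- group_means_from_lm(mydata$re78, mydata$Treated)  # Question B2
  A <- before[["control"]]
  B <- before[["treatment"]]
  C <- after[["control"]]
  D <- after[["treatment"]]
  list(
    A = A, B = B, C = C, D = D,
    control_change = C - A,               # Question B4
    treatment_change = D - B,             # Question B5
    diff_in_diff = (D - B) - (C - A)      # Question B7
  )
}

# Usage with the real data (requires the Ecdat package):
# mydata <- Ecdat::Treatment
# mydata$Treated <- as.numeric(mydata$treat)
# str(did_table(mydata))
```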

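For Part C, Questions 2 and 3, a short R sketch that evaluates the two model-count formulas, 2^p for best subset selection and 1 + p(p + 1)/2 for forward or backward stepwise selection, for a few values of p. The function name n_models is hypothetical.

```r
# Count the models each selection method fits for p candidate predictors:
#   best subset: 2^p (every subset of predictors, including the empty model)
#   forward or backward stepwise: 1 + p(p + 1)/2
n_models <- function(p) {
  data.frame(
    p = p,
    best_subset = 2^p,
    stepwise = 1 + p * (p + 1) / 2
  )
}

n_models(c(10, 20, 30, 40))
```

At p = 30, best subset selection already requires more than a billion fits, while stepwise selection needs only a few hundred.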