Assignment 5 - ML

School: St. John's University
Course: 602
Subject: Statistics
Date: Jan 9, 2024
Uploaded by lore150

Boston
The following objects are masked from ‘package:ISLR’:

    Auto, Credit

> library(MASS)
> library(car)
Error in library(car) : there is no package called ‘car’
> library(boot)
> library(class)
> # import data and clean
> capstr <- na.omit(capstr)
> dim(capstr)
[1] 5634   14
> names(capstr)
 [1] "gvkey"            "year"             "conm"
 [4] "spquality"        "industry"         "leverage"
 [7] "logassets"        "rdta"             "cashta"
[10] "divta"            "taxes"            "capexta"
[13] "roa"              "leverageincrease"
> #inspect your data
> mean(capstr$leverage)
[1] 0.3427896
> median(capstr$leverage)
[1] 0.3278918
> sd(capstr$leverage)
[1] 0.2064969
> #histograms of variables of interest
> hist(capstr$leverage)
> hist(capstr$logassets)
> #linear regression of leverage
> lm.fit1 <- lm(leverage~logassets, data=capstr)
> summary(lm.fit1)

Call:
lm(formula = leverage ~ logassets, data = capstr)

Residuals:
     Min       1Q   Median       3Q      Max
-0.36522 -0.14258 -0.01556  0.11545  1.50374

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.250999   0.019425  12.922  < 2e-16 ***
logassets   0.010638   0.002228   4.773 1.86e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2061 on 5632 degrees of freedom
Multiple R-squared:  0.004029,  Adjusted R-squared:  0.003853
F-statistic: 22.79 on 1 and 5632 DF,  p-value: 1.857e-06

> plot(lm.fit1)
Hit <Return> to see next plot: #training test split
Hit <Return> to see next plot: train<-(capstr$year<2018)
Hit <Return> to see next plot: test <- capstr[!train,]
Hit <Return> to see next plot: lm.fit3 <- lm(leverage~logassets, data=capstr, subset=train)
> mean((test$leverage-predict(lm.fit3, test))^2)
[1] 0.06209251
> #multiple regression
> lm.fit5 <- lm(leverage~logassets+capexta+rdta+taxes+spquality+divta+cashta, data=capstr, subset=train)
> summary(lm.fit5)

Call:
lm(formula = leverage ~ logassets + capexta + rdta + taxes + spquality + divta + cashta, data = capstr, subset = train)

Residuals:
     Min       1Q   Median       3Q      Max
-0.49552 -0.11788 -0.01290  0.09373  1.30077

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.215996   0.034405   6.278 4.08e-10 ***
logassets    0.009444   0.003231   2.923 0.003496 **
capexta     -0.478974   0.130314  -3.676 0.000243 ***
rdta        -0.872677   0.097419  -8.958  < 2e-16 ***
taxes       -1.175891   0.591416  -1.988 0.046900 *
spqualityA-  0.046032   0.021141   2.177 0.029550 *
spqualityA+ -0.004828   0.028730  -0.168 0.866567
spqualityB   0.095494   0.018510   5.159 2.69e-07 ***
spqualityB-  0.093547   0.018623   5.023 5.47e-07 ***
spqualityB+  0.110354   0.018303   6.029 1.91e-09 ***
spqualityC   0.194454   0.019839   9.801  < 2e-16 ***
spqualityD   0.384456   0.037757  10.182  < 2e-16 ***
divta        0.880130   0.171596   5.129 3.15e-07 ***
cashta      -0.268161   0.034545  -7.763 1.24e-14 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1842 on 2324 degrees of freedom
  (4662 observations deleted due to missingness)
Multiple R-squared:  0.1952,  Adjusted R-squared:  0.1907
F-statistic: 43.35 on 13 and 2324 DF,  p-value: < 2.2e-16

> mean((test$leverage-predict(lm.fit5, test))^2)
Error in eval(predvars, data, env) : object 'capexta' not found
> vif(lm.fit5)
Error in vif(lm.fit5) : could not find function "vif"
> #Add year and industry effects
> lm.fit6 <- lm(leverage~logassets+capexta+rdta+taxes+spquality+divta+cashta+factor(industry), data=capstr,subset=train)
> summary(lm.fit6)

Call:
lm(formula = leverage ~ logassets + capexta + rdta + taxes + spquality + divta + cashta + factor(industry), data = capstr, subset = train)

Residuals:
     Min       1Q   Median       3Q      Max
-0.44866 -0.11462 -0.01504  0.09621  1.33151

Coefficients:
                         Estimate Std. Error t value
(Intercept)             0.2845002  0.0393679   7.227
logassets               0.0088708  0.0032623   2.719
capexta                -0.2056597  0.1439220  -1.429
rdta                   -0.6576442  0.1036640  -6.344
taxes                  -0.4173570  0.5933910  -0.703
spqualityA-             0.0365232  0.0208753   1.750
spqualityA+            -0.0006213  0.0282817  -0.022
spqualityB              0.0792077  0.0184534   4.292
spqualityB-             0.0699337  0.0187556   3.729
spqualityB+             0.0978148  0.0181447   5.391
spqualityC              0.1736531  0.0199029   8.725
spqualityD              0.3608247  0.0373088   9.671
divta                   0.5490122  0.1776610   3.090
cashta                 -0.2229544  0.0346485  -6.435
factor(industry)Bus-eq -0.1053724  0.0189318  -5.566
factor(industry)Chem   -0.0501774  0.0238176  -2.107
factor(industry)Durbl  -0.0709785  0.0247872  -2.864
factor(industry)Enrgy  -0.0899567  0.0386883  -2.325
factor(industry)Fin     0.0028891  0.0208928   0.138
factor(industry)Hlth   -0.0724094  0.0214998  -3.368
factor(industry)Manuf  -0.0588759  0.0232521  -2.532
factor(industry)NoDur  -0.0597369  0.0255610  -2.337
factor(industry)Shops  -0.0782346  0.0201570  -3.881
factor(industry)Telcm   0.0558288  0.0391838   1.425
                       Pr(>|t|)
(Intercept)            6.69e-13 ***
logassets              0.006594 **
capexta                0.153149
rdta                   2.68e-10 ***
taxes                  0.481913
spqualityA-            0.080323 .
spqualityA+            0.982474
spqualityB             1.84e-05 ***
spqualityB-            0.000197 ***
spqualityB+            7.73e-08 ***
spqualityC              < 2e-16 ***
spqualityD              < 2e-16 ***
divta                  0.002024 **
cashta                 1.50e-10 ***
factor(industry)Bus-eq 2.91e-08 ***
factor(industry)Chem   0.035248 *
factor(industry)Durbl  0.004227 **
factor(industry)Enrgy  0.020149 *
factor(industry)Fin    0.890031
factor(industry)Hlth   0.000770 ***
factor(industry)Manuf  0.011405 *
factor(industry)NoDur  0.019523 *
factor(industry)Shops  0.000107 ***
factor(industry)Telcm  0.154352
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.181 on 2314 degrees of freedom
  (4662 observations deleted due to missingness)
Multiple R-squared:  0.2262,  Adjusted R-squared:  0.2186
F-statistic: 29.42 on 23 and 2314 DF,  p-value: < 2.2e-16

> mean((test$leverage-predict(lm.fit6, test))^2)
Error in eval(predvars, data, env) : object 'capexta' not found
> #Logistic Regression
> #simple logistic
> glm.fit <- glm(leverageincrease~logassets, family=binomial, data=capstr)
> summary(glm.fit)

Call:
glm(formula = leverageincrease ~ logassets, family = binomial, data = capstr)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.6956  -0.6610  -0.6528  -0.6454   1.8305

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.65714    0.23668  -7.002 2.53e-12 ***
logassets    0.02735    0.02707   1.010    0.312
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 5552.1  on 5633  degrees of freedom
Residual deviance: 5551.1  on 5632  degrees of freedom
AIC: 5555.1

Number of Fisher Scoring iterations: 4

> #multiple logistic
> glm.fit <- glm(leverageincrease~logassets+capexta+rdta+taxes+spquality+divta+cashta, family=binomial, data=capstr)
> summary(glm.fit)

Call:
glm(formula = leverageincrease ~ logassets + capexta + rdta + taxes + spquality + divta + cashta, family = binomial, data = capstr)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.1002  -0.6740  -0.6298  -0.5868   1.9781

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.973817   0.300742  -6.563 5.27e-11 ***
logassets    0.045184   0.028067   1.610  0.10742
capexta      3.492695   1.102446   3.168  0.00153 **
rdta         1.130905   0.742120   1.524  0.12754
taxes       -0.132474   5.417023  -0.024  0.98049
spqualityA- -0.194172   0.183344  -1.059  0.28957
spqualityA+  0.270778   0.238488   1.135  0.25621
spqualityB  -0.064246   0.154990  -0.415  0.67850
spqualityB-  0.008132   0.155342   0.052  0.95825
spqualityB+ -0.159035   0.154894  -1.027  0.30454
spqualityC   0.143413   0.163208   0.879  0.37956
spqualityD   0.363117   0.294022   1.235  0.21683
divta        1.339934   0.922925   1.452  0.14655
cashta       0.170486   0.286316   0.595  0.55155
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 5552.1  on 5633  degrees of freedom
Residual deviance: 5520.1  on 5620  degrees of freedom
AIC: 5548.1

Number of Fisher Scoring iterations: 4

> #Get predictions on whether leverage will increase
> # First code everything as 0 (no increase)
> glm.pred = rep("0",5634)
> # Recode probabilities greater than .5 as 1 (increase)
> glm.probs<-predict(glm.fit,capstr,type="response")
> glm.pred[glm.probs>.5]="1"
> table(glm.pred,capstr$leverageincrease)

glm.pred    0    1
       0 4538 1095
       1    0    1
> mean(glm.pred==capstr$leverageincrease)
[1] 0.8056443
> #Fit a logistic model on training dataset
> glm.fit <- glm(leverageincrease~logassets+capexta+rdta+taxes+spquality+divta+cashta, family=binomial, subset=train, data=capstr)
> glm.probs<-predict(glm.fit,test,type="response")
Error in eval(predvars, data, env) : object 'capexta' not found
> glm.pred=rep("0",3191)
> glm.pred[glm.probs>.5]="1"
> mean(glm.pred==test$leverageincrease)
[1] NaN
> library(ISLR2)
> library(MASS)
> library(car)
Error in library(car) : there is no package called ‘car’
> library(boot)
> library(class)
> #Fit a logistic model on training dataset
> glm.fit <- glm(leverageincrease~logassets+capexta+rdta+taxes+spquality+divta+cashta, family=binomial, subset=train, data=capstr)
> glm.probs<-predict(glm.fit,test,type="response")
Error in eval(predvars, data, env) : object 'capexta' not found
> #Fit a logistic model on training dataset
> glm.fit <- glm(leverageincrease~logassets+rdta+taxes+spquality+divta+cashta, family=binomial, subset=train, data=capstr)
> glm.probs<-predict(glm.fit,test,type="response")
Error in eval(predvars, data, env) : object 'taxes' not found
> #Fit a logistic model on training dataset
> glm.fit <- glm(leverageincrease~logassets+rdta+spquality+divta+cashta, family=binomial, subset=train, data=capstr)
> glm.probs<-predict(glm.fit,test,type="response")
Error in eval(predvars, data, env) : object 'spquality' not found
> glm.pred=rep("0",3191)
> glm.pred[glm.probs>.5]="1"
> mean(glm.pred==test$leverageincrease)
[1] NaN
> #Linear Discriminant Analysis
> lda.fit1 <- lda(leverageincrease~logassets+capexta+rdta+taxes+spquality+divta+cashta, data=capstr)
> lda.fit1
Call:
lda(leverageincrease ~ logassets + capexta + rdta + taxes + spquality + divta + cashta, data = capstr)

Prior probabilities of groups:
        0         1
0.8054668 0.1945332

Group means:
  logassets    capexta       rdta       taxes spqualityA-
0  8.620847 0.02826140 0.02841552 0.003660968  0.08241516
1  8.662744 0.03144299 0.03230833 0.003812621  0.07208029
  spqualityA+ spqualityB spqualityB- spqualityB+ spqualityC
0  0.02269722  0.2194799   0.2170560   0.2322609  0.1542530
1  0.03193431  0.2107664   0.2198905   0.2025547  0.1806569
  spqualityD      divta    cashta
0 0.01344204 0.02014315 0.1229749
1 0.01824818 0.02128598 0.1313459

Coefficients of linear discriminants:
                    LD1
logassets    0.23315524
capexta     19.18949555
rdta         6.50725922
taxes       -2.15264353
spqualityA- -0.97400065
spqualityA+  1.56080364
spqualityB  -0.30678978
spqualityB-  0.07470397
spqualityB+ -0.78022105
spqualityC   0.81401206
spqualityD   2.05192296
divta        8.27987601
cashta       0.82615561
> plot(lda.fit1)
Error in plot.new() : figure margins too large
> lda.pred <- predict(lda.fit1,capstr)
> names(lda.pred)
[1] "class"     "posterior" "x"
> lda.class <- lda.pred$class
> table(lda.class, capstr$leverageincrease)

lda.class    0    1
        0 4538 1094
        1    0    2
> mean(lda.class==capstr$leverageincrease)
[1] 0.8058218
> lda.fit2<- lda(leverageincrease~logassets+capexta+rdta+taxes+spquality+divta+cashta, data=capstr, subset=train)
> lda.fit2
Call:
lda(leverageincrease ~ logassets + capexta + rdta + taxes + spquality + divta + cashta, data = capstr, subset = train)

Prior probabilities of groups:
        0         1
0.8028229 0.1971771

Group means:
  logassets    capexta       rdta       taxes spqualityA-
0  8.623723 0.02853900 0.02651503 0.003626550  0.08311135
1  8.696534 0.03031127 0.03444373 0.003849535  0.06507592
  spqualityA+ spqualityB spqualityB- spqualityB+ spqualityC
0  0.02397443  0.2258924   0.2226958   0.2296217  0.1491742
1  0.03470716  0.1822126   0.2364425   0.2082430  0.1952278
  spqualityD      divta    cashta
0 0.01331913 0.02021972 0.1193634
1 0.01084599 0.01828916 0.1312630

Coefficients of linear discriminants:
                    LD1
logassets     0.2966223
capexta       5.4377002
rdta         12.5297516
taxes         5.4496934
spqualityA-  -1.7077322
spqualityA+   0.7639195
spqualityB   -1.7749926
spqualityB-  -0.8510942
spqualityB+  -1.3590768
spqualityC   -0.1672927
spqualityD   -1.4752464
divta       -10.9214691
cashta       -0.4883071
> plot(lda.fit2)
Error in plot.new() : figure margins too large
> lda.pred<-predict(lda.fit2, test)
Error in eval(predvars, data, env) : object 'capexta' not found
> names(lda.pred)
[1] "class"     "posterior" "x"
> test.levinc <- test$leverageincrease
> lda.class<-lda.pred$class
> table(lda.class,test.levinc)
Error in table(lda.class, test.levinc) : all arguments must have the same length
> mean(lda.class==test.levinc)
[1] NaN
> attach(capstr)
The following objects are masked from ceo_comp:

    gvkey, industry, leverage, logassets, rdta, roa, year

> library(readxl)
> capstr <- read_excel("Desktop/capstr.xlsx")
> View(capstr)
> library(ISLR2)
> library(MASS)
> library(car)
Error in library(car) : there is no package called ‘car’
> library(boot)
> library(class)
> # import data and clean
> capstr <- na.omit(capstr)
> dim(capstr)
[1] 5634   14
> names(capstr)
 [1] "gvkey"            "year"             "conm"
 [4] "spquality"        "industry"         "leverage"
 [7] "logassets"        "rdta"             "cashta"
[10] "divta"            "taxes"            "capexta"
[13] "roa"              "leverageincrease"
> #inspect your data
> mean(capstr$leverage)
[1] 0.3427896
> median(capstr$leverage)
[1] 0.3278918
> sd(capstr$leverage)
[1] 0.2064969
> #linear regression of leverage
> lm.fit1 <- lm(leverage~logassets, data=capstr)
> summary(lm.fit1)

Call:
lm(formula = leverage ~ logassets, data = capstr)

Residuals:
     Min       1Q   Median       3Q      Max
-0.36522 -0.14258 -0.01556  0.11545  1.50374

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.250999   0.019425  12.922  < 2e-16 ***
logassets   0.010638   0.002228   4.773 1.86e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2061 on 5632 degrees of freedom
Multiple R-squared:  0.004029,  Adjusted R-squared:  0.003853
F-statistic: 22.79 on 1 and 5632 DF,  p-value: 1.857e-06

> #training test split
> train<-(capstr$year<2018)
> test <- capstr[!train,]
> lm.fit3 <- lm(leverage~logassets, data=capstr, subset=train)
> mean((test$leverage-predict(lm.fit3, test))^2)
[1] 0.04393114
> #multiple regression
> lm.fit5 <- lm(leverage~logassets+capexta+rdta+taxes+spquality+divta+cashta, data=capstr, subset=train)
> summary(lm.fit5)

Call:
lm(formula = leverage ~ logassets + capexta + rdta + taxes + spquality + divta + cashta, data = capstr, subset = train)

Residuals:
     Min       1Q   Median       3Q      Max
-0.43731 -0.11904 -0.01312  0.09942  1.61583

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.220487   0.033958   6.493 1.02e-10 ***
logassets    0.008460   0.003217   2.630 0.008594 **
capexta     -0.595151   0.123316  -4.826 1.48e-06 ***
rdta        -0.521944   0.092958  -5.615 2.19e-08 ***
taxes       -1.661234   0.664315  -2.501 0.012461 *
spqualityA-  0.077493   0.020388   3.801 0.000148 ***
spqualityA+ -0.012045   0.028820  -0.418 0.676034
spqualityB   0.097512   0.017649   5.525 3.64e-08 ***
spqualityB-  0.099913   0.017668   5.655 1.74e-08 ***
spqualityB+  0.111681   0.017550   6.364 2.35e-10 ***
spqualityC   0.157216   0.018804   8.361  < 2e-16 ***
spqualityD   0.264586   0.030928   8.555  < 2e-16 ***
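One possible reading of the `object 'capexta' not found` errors in this first run: the `train` and `test` assignments were typed while `plot(lm.fit1)` was still showing `Hit <Return> to see next plot:`, so R may have consumed those lines as keypresses and left older `train`/`test` objects in the workspace. A minimal sketch, assuming that is what happened, draws the four diagnostic plots on one page so no prompt appears, and defines the split only afterwards. The simulated `df` and the names `fit`, `fit_train`, and `test_mse` are hypothetical stand-ins, not objects from the assignment.

```r
# Hypothetical stand-in for capstr: simulated data, not the assignment's file.
set.seed(1)
df <- data.frame(year = rep(2015:2020, each = 50),
                 logassets = rnorm(300, mean = 8.6, sd = 1.5))
df$leverage <- 0.25 + 0.01 * df$logassets + rnorm(300, sd = 0.2)

fit <- lm(leverage ~ logassets, data = df)

# Put the four diagnostic plots on one 2x2 page, then restore the settings.
old_par <- par(mfrow = c(2, 2))
plot(fit)
par(old_par)

# Define the split only after plotting has finished.
train <- df$year < 2018
test  <- df[!train, ]

# Confirm the test set has the model's columns before calling predict().
stopifnot(all(c("leverage", "logassets") %in% names(test)))
fit_train <- lm(leverage ~ logassets, data = df, subset = train)
test_mse  <- mean((test$leverage - predict(fit_train, test))^2)
test_mse
```

Checking `names(test)` before `predict()` is a cheap way to catch a stale test set early, whatever its cause.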
divta        0.698880   0.133180   5.248 1.67e-07 ***
cashta      -0.352665   0.032688 -10.789  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1847 on 2429 degrees of freedom
Multiple R-squared:  0.1743,  Adjusted R-squared:  0.1699
F-statistic: 39.44 on 13 and 2429 DF,  p-value: < 2.2e-16

> mean((test$leverage-predict(lm.fit5, test))^2)
[1] 0.03657678
> #Add year and industry effects
> lm.fit6 <- lm(leverage~logassets+capexta+rdta+taxes+spquality+divta+cashta+factor(industry), data=capstr,subset=train)
> summary(lm.fit6)

Call:
lm(formula = leverage ~ logassets + capexta + rdta + taxes + spquality + divta + cashta + factor(industry), data = capstr, subset = train)

Residuals:
     Min       1Q   Median       3Q      Max
-0.45878 -0.11077 -0.01682  0.09314  1.62986

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)
(Intercept)             0.328070   0.038296   8.567  < 2e-16 ***
logassets               0.006216   0.003209   1.937 0.052899 .
capexta                -0.258067   0.136773  -1.887 0.059304 .
rdta                   -0.370863   0.096991  -3.824 0.000135 ***
taxes                  -0.602740   0.664720  -0.907 0.364625
spqualityA-             0.061749   0.019961   3.094 0.002000 **
spqualityA+            -0.012442   0.028097  -0.443 0.657932
spqualityB              0.074368   0.017384   4.278 1.96e-05 ***
spqualityB-             0.067537   0.017578   3.842 0.000125 ***
spqualityB+             0.092536   0.017186   5.385 7.96e-08 ***
spqualityC              0.128479   0.018772   6.844 9.72e-12 ***
spqualityD              0.249136   0.030222   8.244 2.71e-16 ***
divta                   0.468911   0.135039   3.472 0.000525 ***
cashta                 -0.307321   0.032734  -9.388  < 2e-16 ***
factor(industry)Bus-eq -0.124543   0.018648  -6.679 2.98e-11 ***
factor(industry)Chem   -0.051746   0.022961  -2.254 0.024309 *
factor(industry)Durbl  -0.090336   0.024493  -3.688 0.000231 ***
factor(industry)Enrgy  -0.138919   0.036960  -3.759 0.000175 ***
factor(industry)Fin    -0.017993   0.019967  -0.901 0.367609
factor(industry)Hlth   -0.071240   0.021143  -3.369 0.000765 ***
factor(industry)Manuf  -0.110165   0.022513  -4.893 1.06e-06 ***
factor(industry)NoDur  -0.092187   0.024685  -3.735 0.000192 ***
factor(industry)Shops  -0.131980   0.019895  -6.634 4.02e-11 ***
factor(industry)Telcm   0.066267   0.035486   1.867 0.061962 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1797 on 2419 degrees of freedom
Multiple R-squared:  0.2215,  Adjusted R-squared:  0.2141
F-statistic: 29.93 on 23 and 2419 DF,  p-value: < 2.2e-16

> mean((test$leverage-predict(lm.fit6, test))^2)
[1] 0.0358012
> #Logistic Regression
> #simple logistic
> glm.fit <- glm(leverageincrease~logassets, family=binomial, data=capstr)
> summary(glm.fit)

Call:
glm(formula = leverageincrease ~ logassets, family = binomial, data = capstr)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.6956  -0.6610  -0.6528  -0.6454   1.8305

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.65714    0.23668  -7.002 2.53e-12 ***
logassets    0.02735    0.02707   1.010    0.312
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 5552.1  on 5633  degrees of freedom
Residual deviance: 5551.1  on 5632  degrees of freedom
AIC: 5555.1

Number of Fisher Scoring iterations: 4

> #multiple logistic
> glm.fit <- glm(leverageincrease~logassets+capexta+rdta+taxes+spquality+divta+cashta, family=binomial, data=capstr)
> summary(glm.fit)

Call:
glm(formula = leverageincrease ~ logassets + capexta + rdta + taxes + spquality + divta + cashta, family = binomial, data = capstr)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.1002  -0.6740  -0.6298  -0.5868   1.9781

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.973817   0.300742  -6.563 5.27e-11 ***
logassets    0.045184   0.028067   1.610  0.10742
capexta      3.492695   1.102446   3.168  0.00153 **
rdta         1.130905   0.742120   1.524  0.12754
taxes       -0.132474   5.417023  -0.024  0.98049
spqualityA- -0.194172   0.183344  -1.059  0.28957
spqualityA+  0.270778   0.238488   1.135  0.25621
spqualityB  -0.064246   0.154990  -0.415  0.67850
spqualityB-  0.008132   0.155342   0.052  0.95825
spqualityB+ -0.159035   0.154894  -1.027  0.30454
spqualityC   0.143413   0.163208   0.879  0.37956
spqualityD   0.363117   0.294022   1.235  0.21683
divta        1.339934   0.922925   1.452  0.14655
cashta       0.170486   0.286316   0.595  0.55155
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 5552.1  on 5633  degrees of freedom
Residual deviance: 5520.1  on 5620  degrees of freedom
AIC: 5548.1

Number of Fisher Scoring iterations: 4

> #Get predictions on whether leverage will increase
> # First code everything as 0 (no increase)
> glm.pred = rep("0",5634)
> # Recode probabilities greater than .5 as 1 (increase)
> glm.probs<-predict(glm.fit,capstr,type="response")
> glm.pred[glm.probs>.5]="1"
> table(glm.pred,capstr$leverageincrease)

glm.pred    0    1
       0 4538 1095
       1    0    1
> #Fit a logistic model on training dataset
> glm.fit <- glm(leverageincrease~logassets+capexta+rdta+taxes+spquality+divta+cashta, family=binomial, subset=train, data=capstr)
> glm.probs<-predict(glm.fit,test,type="response")
> glm.pred=rep("0",3191)
> glm.pred[glm.probs>.5]="1"
> mean(glm.pred==test$leverageincrease)
[1] 0.7586963
> #Linear Discriminant Analysis
> lda.fit1 <- lda(leverageincrease~logassets+capexta+rdta+taxes+spquality+divta+cashta, data=capstr)
> lda.fit1
Call:
lda(leverageincrease ~ logassets + capexta + rdta + taxes + spquality + divta + cashta, data = capstr)

Prior probabilities of groups:
        0         1
0.8054668 0.1945332

Group means:
  logassets    capexta       rdta       taxes spqualityA-
0  8.620847 0.02826140 0.02841552 0.003660968  0.08241516
1  8.662744 0.03144299 0.03230833 0.003812621  0.07208029
  spqualityA+ spqualityB spqualityB- spqualityB+ spqualityC
0  0.02269722  0.2194799   0.2170560   0.2322609  0.1542530
1  0.03193431  0.2107664   0.2198905   0.2025547  0.1806569
  spqualityD      divta    cashta
0 0.01344204 0.02014315 0.1229749
1 0.01824818 0.02128598 0.1313459

Coefficients of linear discriminants:
                    LD1
logassets    0.23315524
capexta     19.18949555
rdta         6.50725922
taxes       -2.15264353
spqualityA- -0.97400065
spqualityA+  1.56080364
spqualityB  -0.30678978
spqualityB-  0.07470397
spqualityB+ -0.78022105
spqualityC   0.81401206
spqualityD   2.05192296
divta        8.27987601
cashta       0.82615561
> plot(lda.fit1)
Error in plot.new() : figure margins too large
> lda.pred <- predict(lda.fit1,capstr)
> names(lda.pred)
[1] "class"     "posterior" "x"
> lda.class <- lda.pred$class
> table(lda.class, capstr$leverageincrease)

lda.class    0    1
        0 4538 1094
        1    0    2
> mean(lda.class==capstr$leverageincrease)
[1] 0.8058218
> lda.fit2<- lda(leverageincrease~logassets+capexta+rdta+taxes+spquality+divta+cashta, data=capstr, subset=train)
> lda.fit2
Call:
lda(leverageincrease ~ logassets + capexta + rdta + taxes + spquality + divta + cashta, data = capstr, subset = train)

Prior probabilities of groups:
        0         1
0.8665575 0.1334425

Group means:
  logassets    capexta       rdta       taxes spqualityA-
0  8.560676 0.03063772 0.02823171 0.003542609  0.07888521
1  8.588997 0.03405813 0.02995474 0.003885632  0.08588957
  spqualityA+ spqualityB spqualityB- spqualityB+ spqualityC
0  0.02267359  0.2243741   0.2272083   0.2172886  0.1549362
1  0.03067485  0.1871166   0.2024540   0.2239264  0.1656442
  spqualityD      divta    cashta
0 0.02031176 0.02069909 0.1212555
1 0.01840491 0.02140061 0.1267697

Coefficients of linear discriminants:
                    LD1
logassets    0.07437504
capexta     14.18858846
rdta         2.75013964
taxes       26.01643615
spqualityA- -2.09679273
spqualityA+ -0.80507013
spqualityB  -3.25118683
spqualityB- -2.94930233
spqualityB+ -2.29109597
spqualityC  -2.16343389
spqualityD  -2.73205749
divta       -0.33553702
cashta       0.78987199
> plot(lda.fit2)
Error in plot.new() : figure margins too large
> lda.pred<-predict(lda.fit2, test)
> names(lda.pred)
[1] "class"     "posterior" "x"
> test.levinc <- test$leverageincrease
> lda.class<-lda.pred$class
> table(lda.class,test.levinc)
         test.levinc
lda.class    0    1
        0 2421  770
        1    0    0
> mean(lda.class==test.levinc)
[1] 0.7586963
> attach(capstr)
The following objects are masked from capstr (pos = 4):

    capexta, cashta, conm, divta, gvkey, industry, leverage, leverageincrease, logassets, rdta, roa, spquality, taxes, year

The following objects are masked from ceo_comp:

    gvkey, industry, leverage, logassets, rdta, roa, year

> train.X<-cbind(logassets, roa)[train,]
> test.X<-cbind(logassets, roa)[!train,]
> train.levinc=leverageincrease[train]
> test.levinc=leverageincrease[!train]
> knn.pred1<-knn(train.X,test.X,train.levinc,k=5)
> table(knn.pred,test.levinc)
Error in table(knn.pred, test.levinc) : all arguments must have the same length
> mean(knn.pred==test.levinc)
[1] 0
Warning messages:
1: In `==.default`(knn.pred, test.levinc) :
  longer object length is not a multiple of shorter object length
2: In is.na(e1) | is.na(e2) :
  longer object length is not a multiple of shorter object length
> lm.fit1 <- lm(leverage~logassets, data=capstr, subset=train)
> mean((test$leverage-predict(lm.fit1, test))^2)
[1] 0.04393114
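The `table(knn.pred, test.levinc)` call above refers to `knn.pred`, while the prediction was stored as `knn.pred1`; if `knn.pred` is an older object of a different length, that would account for the length error. Separately, `length()` on a matrix built with `cbind()` counts every cell, so `nrow()` is the safer check on how many observations went into each set. A small sketch on simulated data follows; the names `x`, `y`, `is_train`, and `knn_pred` are made up for illustration and the data are not from `capstr`.

```r
library(class)  # provides knn(); a recommended package bundled with most R installs

set.seed(1)
n <- 200
x <- cbind(logassets = rnorm(n, 8.6, 1.5), roa = rnorm(n, 0.05, 0.1))
y <- factor(rbinom(n, 1, 0.2))
is_train <- seq_len(n) <= 120

train_x <- x[is_train, ]
test_x  <- x[!is_train, ]
train_y <- y[is_train]
test_y  <- y[!is_train]

# length() counts cells (rows * columns); nrow() counts observations.
length(train_x)  # 240 = 120 rows * 2 columns
nrow(train_x)    # 120

# Store the prediction, then tabulate the SAME object name.
knn_pred <- knn(train_x, test_x, train_y, k = 5)
stopifnot(length(knn_pred) == length(test_y))
conf_mat <- table(knn_pred, test_y)
mean(knn_pred == test_y)
```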
> lm.fit2 <- lm(leverage~poly(logassets,2), data=capstr, subset=train)
> mean((test$leverage-predict(lm.fit2, test))^2)
[1] 0.04301659
> lm.fit3 <- lm(leverage~poly(logassets,3), data=capstr, subset=train)
> mean((test$leverage-predict(lm.fit3,test))^2)
[1] 0.04300088
> length(train.X)
[1] 4886
> length(test.X)
[1] 6382
> length(train.levinc)
[1] 2443
> length(test.levinc)
[1] 3191
> split_ratio <- 0.7
> total_samples <- nrow(your_data)
Error in nrow(your_data) : object 'your_data' not found
> total_samples <- nrow(capstr)
> train_indices <- sample(1:total_samples, size = round(total_samples * split_ratio))
> train.X<-cbind(logassets, roa)[train,]
> test.X<-cbind(logassets, roa)[!train,]
> train.levinc=leverageincrease[train]
> test.levinc=leverageincrease[!train]
> knn.pred1<-knn(train.X,test.X,train.levinc,k=5)
> table(knn.pred,test.levinc)
Error in table(knn.pred, test.levinc) : all arguments must have the same length
> length(train.X)
[1] 4886
> length(test.X)
[1] 6382
> length(train.levinc)
[1] 2443
> length(test.levinc)
[1] 3191
> split_ratio <- 0.7
> total_samples <- nrow(capstr)
> train.X <- cbind(logassets, roa)[train, ]
> test.X <- cbind(logassets, roa)[!train, ]
> train.levinc = leverageincrease[train]
> test.levinc = leverageincrease[!train]
> knn.pred1 <- knn(train.X, test.X, train.levinc, k = 5)
> confusion_matrix <- table(knn.pred1, test.levinc)
> accuracy <- mean(knn.pred1 == test.levinc)
> confusion_matrix
         test.levinc
knn.pred1    0    1
        0 2385  748
        1   36   22
> accuracy
[1] 0.754309
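To put the accuracy figures in context, the confusion-matrix counts printed in this transcript can be turned into accuracy, the always-predict-0 baseline, and sensitivity. The sketch below retypes the KNN test-set counts by hand rather than recomputing them; the helper name `summarize_confusion` is hypothetical.

```r
# Counts copied from the KNN confusion matrix printed above
# (rows = predicted class, columns = actual class; filled column by column).
knn_cm <- matrix(c(2385, 36, 748, 22), nrow = 2,
                 dimnames = list(pred = c("0", "1"), actual = c("0", "1")))

# Hypothetical helper: summary metrics from a 2x2 confusion matrix.
summarize_confusion <- function(cm) {
  total <- sum(cm)
  c(accuracy    = sum(diag(cm)) / total,
    baseline    = sum(cm[, "0"]) / total,         # always predict class 0
    sensitivity = cm["1", "1"] / sum(cm[, "1"]))  # share of actual 1s caught
}

knn_metrics <- summarize_confusion(knn_cm)
round(knn_metrics, 4)
```

On these counts the KNN accuracy (about 0.754) sits slightly below the 0.759 obtained by always predicting 0, which is the same figure the logistic and LDA test-set runs report.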