Dickman_William_HW3

School: University of North Carolina, Pembroke
Course: 5190
Subject: Statistics
Date: Feb 20, 2024

Linear & Logistic Regression HW3
By William Dickman, UNCP

Linear Regression Model

1. Are all the independent variables statistically significant in the linear regression model (Clue: look at the t-values of estimated coefficients in the linear regression model)?

Not all of the variables are significant. In the model fitted to all 71 observations, Mining (t = 1.315), Gold (t = 1.066), and Build (t = 0.043) are statistically insignificant at the 5% level. Transport is significant in that model (t = -2.686, p = 0.009), but in the model fitted to the training sample its t-value is -1.924 (p = 0.060), which is below 2.0 in absolute value, so I would treat Transport as borderline.

2. How well does the linear regression model perform in forecasting the 90 days bank bill rate using these 15 independent variables (Clue: look at the min_max accuracy and the MAPE measure, as illustrated in the Linear Regression in R-A Case Study.R script file)?

The linear regression model does well in forecasting the 90 days bank bill rate on the 14-observation test set. The min_max accuracy is 0.9535892, which is close to 1, and the MAPE is 0.04953731, meaning the predictions miss the observed rate by about 5% on average. Note that the fitted model uses 9 of the 15 independent variables, because 6 were dropped after inspecting the correlation matrix.

Linear Regression Model

> BankBillRate.Data <- read.csv("C:/Users/wwdic/Downloads/BankBillData.csv")
> View(BankBillRate.Data)
> head(BankBillRate.Data, 20)
   BankSharePriceIndex AllOrdinaries Develop Mining   Gold
 1              2292.6        5634.6  2976.5  672.6 1148.4
 2              2243.0        5469.1  2761.1  638.5 1160.9
 3              2334.4        5502.3  2992.8  662.2 1200.7
 4              2162.5        5507.8  2750.7  687.7 1214.9
 5              2127.8        5556.5  2761.3  700.6 1239.3
 6              2049.1        5403.1  2607.3  693.2 1186.9
 7              2192.1        5620.7  2783.1  705.8 1144.8
 8              2139.6        5816.4  2724.6  712.4 1079.5
 9              2040.5        5720.1  2710.9  714.7 1145.8
10              2048.6        5631.5  2696.9  707.9 1186.2
11              1886.6        5392.5  2456.4  678.7 1108.6
12              1775.6        5198.0  2301.0  659.7 1086.8
13              1711.4        5033.5  2243.1  608.2 1015.6
14              1797.3        5124.0  2335.5  571.5  917.8
15              1901.2        5504.6  2474.4  602.3  930.7
16              1850.1        5429.3  2478.0  592.6  980.7
17              2003.4        5727.6  2632.5  623.9 1052.3
18              2114.3        5968.1  2740.2  670.7 1238.4
19              2187.1        6036.0  2716.6  696.0 1456.0
20              2338.9        6253.9  2685.3  784.1 1946.5
    Build   Prop Industry Energy Finance Resources Transport
 1 2530.1  981.6   2607.0  814.4  2110.1     976.4    3357.5
 2 2354.2  953.8   2514.7  777.7  2061.1     912.8    3207.9
 3 2460.9 1021.2   2609.7  750.0  2046.7     922.5    3472.7
 4 2370.7 1008.1   2519.3  732.6  2050.2     938.0    3282.6
 5 2382.0  981.2   2489.2  728.7  2021.1     944.3    3121.9
 6 2300.7  975.6   2434.1  708.0  1926.4     930.4    2998.9
 7 2452.2 1006.6   2571.3  709.9  1959.9     955.9    3211.8
 8 2409.3 1005.3   2604.6  728.7  2013.0     969.8    3272.5
 9 2351.0 1012.3   2529.6  704.5  2036.5     966.8    3186.1
10 2235.8 1039.8   2497.1  720.7  2026.7     944.5    3013.1
11 2098.0 1002.6   2395.1  701.6  1991.6     898.3    2819.5
12 1960.2  992.4   2296.1  707.1  1928.3     864.1    2670.7
13 1928.3  986.2   2236.7  670.0  1796.4     805.2    2571.3
14 1922.7  989.3   2310.4  670.8  1864.8     787.9    2562.5
15 2094.8  991.7   2449.8  685.0  1915.8     861.3    2703.9
16 2024.2  989.7   2418.7  677.0  1900.5     846.5    2640.4
17 2120.5 1002.3   2535.9  694.0  1946.9     900.8    2644.8
18 2192.6 1044.1   2602.2  749.6  2005.3     952.9    2701.0
19 2165.3 1064.1   2615.6  771.3  2022.3     967.9    2698.0
20 2188.6 1076.2   2673.1  762.9  2024.7    1023.7    2529.8
   Retail Unemployment   CPI BankBillRate
 1 2807.8         10.9 107.6         8.70
 2 2701.3         11.0 107.6         8.25
 3 2812.8         11.1 107.6         7.70
 4 2764.3         11.3 107.6         7.55
 5 2641.7         11.4 107.6         7.50
 6 2641.7         11.6 107.6         7.55
 7 2798.8         11.7 107.3         6.95
 8 2859.1         11.9 107.3         6.50
 9 2707.7         11.9 107.3         6.40
10 2586.2         12.0 107.4         5.55
11 2454.8         12.0 107.4         5.90
12 2548.9         12.0 107.4         5.95
13 2350.1         12.0 107.9         5.85
14 2242.4         12.0 107.9         5.85
15 2494.7         12.0 107.9         5.90
16 2402.0         12.0 108.9         5.85
17 2444.0         12.0 108.9         5.80
18 2464.3         11.9 108.9         5.35
19 2490.8         11.8 109.3         5.25
20 2646.5         11.8 109.3         5.15
> names(BankBillRate.Data)
 [1] "BankSharePriceIndex" "AllOrdinaries"
 [3] "Develop"             "Mining"
 [5] "Gold"                "Build"
 [7] "Prop"                "Industry"
 [9] "Energy"              "Finance"
[11] "Resources"           "Transport"
[13] "Retail"              "Unemployment"
[15] "CPI"                 "BankBillRate"
> dim(BankBillRate.Data)
[1] 71 16
> str(BankBillRate.Data)
'data.frame': 71 obs. of 16 variables:
 $ BankSharePriceIndex: num 2293 2243 2334 2162 2128 ...
 $ AllOrdinaries      : num 5635 5469 5502 5508 5556 ...
 $ Develop            : num 2976 2761 2993 2751 2761 ...
 $ Mining             : num 673 638 662 688 701 ...
 $ Gold               : num 1148 1161 1201 1215 1239 ...
 $ Build              : num 2530 2354 2461 2371 2382 ...
 $ Prop               : num 982 954 1021 1008 981 ...
 $ Industry           : num 2607 2515 2610 2519 2489 ...
 $ Energy             : num 814 778 750 733 729 ...
 $ Finance            : num 2110 2061 2047 2050 2021 ...
 $ Resources          : num 976 913 922 938 944 ...
 $ Transport          : num 3358 3208 3473 3283 3122 ...
 $ Retail             : num 2808 2701 2813 2764 2642 ...
 $ Unemployment       : num 10.9 11 11.1 11.3 11.4 11.6 11.7 11.9 11.9 12 ...
 $ CPI                : num 108 108 108 108 108 ...
 $ BankBillRate       : num 8.7 8.25 7.7 7.55 7.5 7.55 6.95 6.5 6.4 5.55 ...
> summary(BankBillRate.Data)
 BankSharePriceIndex AllOrdinaries       Develop
 Min.   :1711        Min.   : 5034   Min.   :2243
 1st Qu.:2314        1st Qu.: 6002   1st Qu.:2773
 Median :2820        Median : 7736   Median :3220
 Mean   :2911        Mean   : 7724   Mean   :3508
 3rd Qu.:3249        3rd Qu.: 9037   3rd Qu.:4097
 Max.   :5193        Max.   :11583   Max.   :6098
     Mining            Gold            Build
 Min.   : 571.5   Min.   : 917.8   Min.   :1923
 1st Qu.: 713.5   1st Qu.:1272.2   1st Qu.:2340
 Median : 927.3   Median :1830.3   Median :2435
 Mean   : 877.0   Mean   :1737.3   Mean   :2442
 3rd Qu.: 998.1   3rd Qu.:2111.8   3rd Qu.:2508
 Max.   :1098.2   Max.   :2604.0   Max.   :3194
      Prop           Industry        Energy
 Min.   : 953.8   Min.   :2237   Min.   : 670
 1st Qu.:1029.1   1st Qu.:2613   1st Qu.: 768
 Median :1077.3   Median :3052   Median : 914
 Mean   :1082.6   Mean   :3072   Mean   :1017
 3rd Qu.:1130.5   3rd Qu.:3349   3rd Qu.:1150
 Max.   :1255.2   Max.   :4392   Max.   :1995
    Finance       Resources        Transport
 Min.   :1796   Min.   : 787.9   Min.   :2499
 1st Qu.:2048   1st Qu.: 973.1   1st Qu.:2700
 Median :2318   Median :1282.4   Median :2964
 Mean   :2444   Mean   :1212.5   Mean   :3040
 3rd Qu.:2692   3rd Qu.:1387.5   3rd Qu.:3258
 Max.   :3546   Max.   :1508.8   Max.   :4322
     Retail      Unemployment        CPI
 Min.   :2224   Min.   : 8.90   Min.   :107.3
 1st Qu.:2410   1st Qu.: 9.05   1st Qu.:109.1
 Median :2549   Median : 9.90   Median :111.9
 Mean   :2603   Mean   :10.30   Mean   :113.6
 3rd Qu.:2714   3rd Qu.:11.75   3rd Qu.:119.0
 Max.   :3522   Max.   :12.00   Max.   :120.5
  BankBillRate
 Min.   :4.750
 1st Qu.:5.365
 Median :6.080
 Mean   :6.418
 3rd Qu.:7.520
 Max.   :8.700
> check.missing.value <- complete.cases(BankBillRate.Data)
> View(check.missing.value)
> which(check.missing.value==FALSE)
integer(0)
> hist(BankBillRate.Data$BankBillRate, breaks = 5,
+      include.lowest = TRUE, right = TRUE,
+      density = NULL, angle = 45, col = "lightblue", border = "red",
+      main = paste("Histogram BankBillRate"),
+      xlim = c(min(BankBillRate.Data$BankBillRate), max(BankBillRate.Data$BankBillRate)), ylim = NULL,
+      xlab = "Bank Bill Rate", ylab ="Frequency", axes = TRUE, plot = TRUE, labels = FALSE)
> plot(density(BankBillRate.Data$BankBillRate), type="l",
+      main="Smoothed Density of Bank Bill Rate", lwd=2,
+      xlab="Bank Bill Rate", ylab="Density Estimate",
+      col=hcl(h=195,l=65,c=100))
> boxplot(BankBillRate.Data$BankBillRate, outchar=TRUE, main="Boxplot of Bank Bill Rate",
+         cex=0.7, xlab="Bank Bill Rate", col=hcl(h=195,l=65,c=100))
> qqnorm(BankBillRate.Data$BankBillRate, col=hcl(h=195,l=65,c=100), cex=0.7)
> qqline(BankBillRate.Data$BankBillRate)
> indep.data <- subset(BankBillRate.Data, select = -BankBillRate)
> View(indep.data)
> dim(indep.data)
[1] 71 15
> corr.matrix <- cor(indep.data, method = "pearson")
> corr.matrix
                    BankSharePriceIndex AllOrdinaries
BankSharePriceIndex           1.0000000     0.9678132
AllOrdinaries                 0.9678132     1.0000000
Develop                       0.9824681     0.9497925
Mining                        0.5585195     0.7224221
Gold                          0.2915335     0.4492073
Build                         0.8524573     0.8118839
Prop                          0.7372165     0.7155572
Industry                      0.9838414     0.9776901
Energy                        0.9665149     0.9268622
Finance                       0.9691958     0.9608886
Resources                     0.7948060     0.9084346
Transport                     0.7206772     0.6158779
Retail                        0.6064207     0.4508963
Unemployment                 -0.7044823    -0.7867961
CPI                           0.8557557     0.9119711
                       Develop      Mining        Gold
BankSharePriceIndex  0.9824681  0.55851946  0.29153352
AllOrdinaries        0.9497925  0.72242214  0.44920735
Develop              1.0000000  0.51132366  0.21857797
Mining               0.5113237  1.00000000  0.88189560
Gold                 0.2185780  0.88189560  1.00000000
Build                0.8285718  0.52121534  0.28467751
Prop                 0.6662170  0.53671743  0.54592243
Industry             0.9664299  0.64095709  0.40226318
Energy               0.9828873  0.44525558  0.14465172
Finance              0.9772726  0.60007119  0.31246148
Resources            0.7570773  0.93280713  0.71086435
Transport            0.7732470  0.17479457 -0.08945978
Retail               0.6260435 -0.03748334 -0.16491385
Unemployment        -0.6770390 -0.67997397 -0.35208227
CPI                  0.8463149  0.65438603  0.33107522
                         Build       Prop   Industry
BankSharePriceIndex  0.8524573  0.7372165  0.9838414
AllOrdinaries        0.8118839  0.7155572  0.9776901
Develop              0.8285718  0.6662170  0.9664299
Mining               0.5212153  0.5367174  0.6409571
Gold                 0.2846775  0.5459224  0.4022632
Build                1.0000000  0.6743188  0.8636178
Prop                 0.6743188  1.0000000  0.7883061
Industry             0.8636178  0.7883061  1.0000000
Energy               0.8067534  0.6359402  0.9368569
Finance              0.8174064  0.6860153  0.9684377
Resources            0.7186914  0.6293341  0.8410669
Transport            0.8024921  0.4596345  0.7006322
Retail               0.7260947  0.5013308  0.5936518
Unemployment        -0.5402138 -0.2741836 -0.6667515
CPI                  0.5924890  0.4516416  0.8293676
                        Energy    Finance  Resources
BankSharePriceIndex  0.9665149  0.9691958  0.7948060
AllOrdinaries        0.9268622  0.9608886  0.9084346
Develop              0.9828873  0.9772726  0.7570773
Mining               0.4452556  0.6000712  0.9328071
Gold                 0.1446517  0.3124615  0.7108644
Build                0.8067534  0.8174064  0.7186914
Prop                 0.6359402  0.6860153  0.6293341
Industry             0.9368569  0.9684377  0.8410669
Energy               1.0000000  0.9610850  0.7134835
Finance              0.9610850  1.0000000  0.8061473
Resources            0.7134835  0.8061473  1.0000000
Transport            0.7793019  0.7629800  0.4041056
Retail               0.6185808  0.5520138  0.1871988
Unemployment        -0.6636836 -0.6879245 -0.8193082
CPI                  0.8290987  0.8457304  0.8361467
                      Transport       Retail Unemployment
BankSharePriceIndex  0.72067723  0.606420667 -0.704482268
AllOrdinaries        0.61587790  0.450896283 -0.786796129
Develop              0.77324704  0.626043467 -0.677039030
Mining               0.17479457 -0.037483340 -0.679973966
Gold                -0.08945978 -0.164913847 -0.352082266
Build                0.80249211  0.726094681 -0.540213822
Prop                 0.45963453  0.501330773 -0.274183578
Industry             0.70063219  0.593651832 -0.666751489
Energy               0.77930189  0.618580770 -0.663683622
Finance              0.76298002  0.552013833 -0.687924545
Resources            0.40410556  0.187198798 -0.819308176
Transport            1.00000000  0.781826596 -0.290569290
Retail               0.78182660  1.000000000 -0.003084187
Unemployment        -0.29056929 -0.003084187  1.000000000
CPI                  0.40305043  0.194990904 -0.923222554
                           CPI
BankSharePriceIndex  0.8557557
AllOrdinaries        0.9119711
Develop              0.8463149
Mining               0.6543860
Gold                 0.3310752
Build                0.5924890
Prop                 0.4516416
Industry             0.8293676
Energy               0.8290987
Finance              0.8457304
Resources            0.8361467
Transport            0.4030504
Retail               0.1949909
Unemployment        -0.9232226
CPI                  1.0000000
> library(corrplot)
corrplot 0.92 loaded
> correl.plot <- corrplot(corr.matrix, method = "color")
> correl.plot <- corrplot(corr.matrix, method = "number", number.cex=0.50)
> indep.data.new <- subset(indep.data, select = -c(BankSharePriceIndex, Develop, Energy, AllOrdinaries, Industry, Finance))
> corr.matrix.new <- cor(indep.data.new, method = "pearson")
> corr.matrix.new
                  Mining        Gold      Build       Prop
Mining        1.00000000  0.88189560  0.5212153  0.5367174
Gold          0.88189560  1.00000000  0.2846775  0.5459224
Build         0.52121534  0.28467751  1.0000000  0.6743188
Prop          0.53671743  0.54592243  0.6743188  1.0000000
Resources     0.93280713  0.71086435  0.7186914  0.6293341
Transport     0.17479457 -0.08945978  0.8024921  0.4596345
Retail       -0.03748334 -0.16491385  0.7260947  0.5013308
Unemployment -0.67997397 -0.35208227 -0.5402138 -0.2741836
CPI           0.65438603  0.33107522  0.5924890  0.4516416
              Resources   Transport       Retail
Mining        0.9328071  0.17479457 -0.037483340
Gold          0.7108644 -0.08945978 -0.164913847
Build         0.7186914  0.80249211  0.726094681
Prop          0.6293341  0.45963453  0.501330773
Resources     1.0000000  0.40410556  0.187198798
Transport     0.4041056  1.00000000  0.781826596
Retail        0.1871988  0.78182660  1.000000000
Unemployment -0.8193082 -0.29056929 -0.003084187
CPI           0.8361467  0.40305043  0.194990904
             Unemployment        CPI
Mining       -0.679973966  0.6543860
Gold         -0.352082266  0.3310752
Build        -0.540213822  0.5924890
Prop         -0.274183578  0.4516416
Resources    -0.819308176  0.8361467
Transport    -0.290569290  0.4030504
Retail       -0.003084187  0.1949909
Unemployment  1.000000000 -0.9232226
CPI          -0.923222554  1.0000000
> correl.plot.new <- corrplot(corr.matrix.new, method = "color")
> correl.plot.new <- corrplot(corr.matrix.new, method = "number", number.cex=0.50)
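The transcript drops six predictors after inspecting the correlation matrix, but it does not state the rule that was used. As a rough sketch only, the helper below (high_corr_pairs is a hypothetical name, and the 0.9 cutoff is an assumption, not something the homework specifies) lists each pair of variables whose absolute Pearson correlation exceeds a cutoff. The toy matrix reuses three correlations printed above.

```r
# Hypothetical helper (not part of the original script): list variable pairs
# whose absolute Pearson correlation exceeds a cutoff. The 0.9 default is an
# assumed threshold chosen for illustration.
high_corr_pairs <- function(corr, cutoff = 0.9) {
  # Look only above the diagonal so that each pair is reported once.
  idx <- which(abs(corr) > cutoff & upper.tri(corr), arr.ind = TRUE)
  data.frame(
    var1 = rownames(corr)[idx[, "row"]],
    var2 = colnames(corr)[idx[, "col"]],
    r    = corr[idx],
    stringsAsFactors = FALSE
  )
}

# Toy 3 x 3 matrix built from correlations reported in the transcript.
vars <- c("Mining", "Resources", "Retail")
toy.corr <- matrix(
  c( 1.00000000, 0.9328071, -0.03748334,
     0.93280710, 1.0000000,  0.18719880,
    -0.03748334, 0.1871988,  1.00000000),
  nrow = 3, byrow = TRUE, dimnames = list(vars, vars)
)

flagged <- high_corr_pairs(toy.corr, cutoff = 0.9)
print(flagged)
```

Applied to corr.matrix.new, a check of this kind would still flag Mining with Resources (0.9328) and Unemployment with CPI (-0.9232), so some collinearity may remain among the nine retained predictors.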
> BankBillRate.Data.New <- subset(BankBillRate.Data, select = -c(BankSharePriceIndex, Develop, Energy, AllOrdinaries, Industry, Finance))
> dim(BankBillRate.Data.New)
[1] 71 10
> model.fit.new <- lm(BankBillRate ~ ., data = BankBillRate.Data.New)
> model.fit.new

Call:
lm(formula = BankBillRate ~ ., data = BankBillRate.Data.New)

Coefficients:
 (Intercept)        Mining          Gold         Build
   5.189e+01     3.315e-03     4.912e-04     3.557e-05
        Prop     Resources     Transport        Retail
  -7.695e-03    -7.120e-03    -6.358e-04     2.022e-03
Unemployment           CPI
  -1.812e+00    -1.498e-01

> coef(model.fit.new)
  (Intercept)        Mining          Gold         Build
 5.189210e+01  3.315291e-03  4.912432e-04  3.557092e-05
         Prop     Resources     Transport        Retail
-7.695246e-03 -7.119666e-03 -6.357655e-04  2.022094e-03
 Unemployment           CPI
-1.812056e+00 -1.497882e-01
> summary(model.fit.new)

Call:
lm(formula = BankBillRate ~ ., data = BankBillRate.Data.New)

Residuals:
     Min       1Q   Median       3Q      Max
-0.92847 -0.23067 -0.00377  0.27510  0.87227

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)   5.189e+01  6.050e+00   8.577 4.49e-12 ***
Mining        3.315e-03  2.522e-03   1.315 0.193559
Gold          4.912e-04  4.607e-04   1.066 0.290528
Build         3.557e-05  8.273e-04   0.043 0.965844
Prop         -7.695e-03  1.563e-03  -4.924 6.78e-06 ***
Resources    -7.120e-03  2.109e-03  -3.376 0.001283 **
Transport    -6.358e-04  2.367e-04  -2.686 0.009303 **
Retail        2.022e-03  5.071e-04   3.987 0.000181 ***
Unemployment -1.812e+00  1.785e-01 -10.153 9.94e-15 ***
CPI          -1.498e-01  4.523e-02  -3.311 0.001563 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3839 on 61 degrees of freedom
Multiple R-squared: 0.9047, Adjusted R-squared: 0.8907
F-statistic: 64.37 on 9 and 61 DF, p-value: < 2.2e-16

> n.total <- nrow(BankBillRate.Data)
> n.total
[1] 71
> training.ratio <- 0.80
> n.training <- round(n.total*training.ratio)
> n.training
[1] 57
> n.test <- n.total - n.training
> n.test
[1] 14
> set.seed(1)
> training.index <- sample(1:n.total,n.training)
> training.subdata.new <- BankBillRate.Data.New[training.index,]
> View(training.subdata.new)
> test.subdata.new <- BankBillRate.Data.New[-training.index,]
> View(test.subdata.new)
> model.fit.training.new <- lm(BankBillRate ~ ., data = training.subdata.new)
> summary(model.fit.training.new)

Call:
lm(formula = BankBillRate ~ ., data = training.subdata.new)

Residuals:
     Min       1Q   Median       3Q      Max
-0.86073 -0.24540 -0.00969  0.28216  0.82699

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)   5.228e+01  7.271e+00   7.190 4.21e-09 ***
Mining        4.667e-03  3.147e-03   1.483 0.144796
Gold          4.005e-04  5.188e-04   0.772 0.444012
Build        -9.535e-05  9.785e-04  -0.097 0.922785
Prop         -6.808e-03  1.861e-03  -3.659 0.000639 ***
Resources    -7.765e-03  2.722e-03  -2.852 0.006437 **
Transport    -5.296e-04  2.753e-04  -1.924 0.060423 .
Retail        2.004e-03  6.149e-04   3.259 0.002084 **
Unemployment -1.833e+00  2.284e-01  -8.024 2.35e-10 ***
CPI          -1.614e-01  5.281e-02  -3.056 0.003690 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.401 on 47 degrees of freedom
Multiple R-squared: 0.8976, Adjusted R-squared: 0.878
F-statistic: 45.78 on 9 and 47 DF, p-value: < 2.2e-16

> predict.BankBillRate.score.new <- predict(model.fit.training.new, test.subdata.new)
> View(predict.BankBillRate.score.new)
> observed.vs.predicted.BankBillRate.score.new <- data.frame(cbind(Observed = test.subdata.new$BankBillRate, Predicted = predict.BankBillRate.score.new))
> View(observed.vs.predicted.BankBillRate.score.new)
> corre.observed.vs.predicted.new <- cor(observed.vs.predicted.BankBillRate.score.new)
> corre.observed.vs.predicted.new
           Observed Predicted
Observed  1.0000000 0.9662228
Predicted 0.9662228 1.0000000
> min.max.accuracy.new <- mean(apply(observed.vs.predicted.BankBillRate.score.new, 1, min) / apply(observed.vs.predicted.BankBillRate.score.new, 1, max))
> min.max.accuracy.new
[1] 0.9535892
> mape.new <- mean(abs((observed.vs.predicted.BankBillRate.score.new$Predicted - observed.vs.predicted.BankBillRate.score.new$Observed))
+                  /observed.vs.predicted.BankBillRate.score.new$Observed)
> mape.new
[1] 0.04953731
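
The two forecast measures computed above can be written as small functions, which makes the formulas easier to read: min_max accuracy is the mean of min(observed, predicted) / max(observed, predicted), and MAPE is the mean of |predicted - observed| / observed. The sketch below is illustrative only; the function names are hypothetical and the input values are made up, not taken from the test set.

```r
# Hypothetical helpers mirroring the two formulas used in the transcript.
min_max_accuracy <- function(observed, predicted) {
  # Row-wise min over max, averaged; equals 1 for a perfect forecast.
  mean(pmin(observed, predicted) / pmax(observed, predicted))
}

mape <- function(observed, predicted) {
  # Mean absolute percentage error, expressed as a proportion.
  mean(abs(predicted - observed) / observed)
}

# Made-up values for illustration only (not from the BankBillRate test set).
observed  <- c(5.0, 6.0, 8.0)
predicted <- c(5.5, 5.7, 8.0)

toy.min.max <- min_max_accuracy(observed, predicted)
toy.mape    <- mape(observed, predicted)
print(c(min_max = toy.min.max, mape = toy.mape))
```

For these made-up values the absolute percentage errors are 10%, 5%, and 0%, so the MAPE is 0.05. Both measures assume strictly positive observed values, which holds for the bank bill rates in this data set.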
Logistic Regression Model

1. Are all the independent variables statistically significant in the logistic regression model (Clue: look at the z-values of estimated coefficients in the logistic regression model)?

Using the rule that a coefficient is statistically significant at the 5% level when its absolute z-value is at or above 1.96, not all of the variables are significant. Port of Embarkation (z = 0.224 for Q and z = -0.802 for S), Parents or Children Aboard (z = -0.914), and Fare (z = 0.913) are statistically insignificant because their absolute z-values are below 1.96.

2. How well does the logistic regression model perform in forecasting the Titanic passengers’ survival status using these independent variables (Clue: look at the accuracy rate, as illustrated in the Logistic Regression in R-A Case Study.R script file)?

The accuracy rate for this logistic regression model on the 262-passenger test set is 0.8282443, or about 82.82%, which is a high accuracy rate. For comparison, 815 of the 1,309 passengers in the full data set (about 62%) did not survive, so the model does better than the roughly 62% that always predicting non-survival would achieve on the full data. This shows that the logistic regression model performs well in forecasting passengers’ survival status.

> Titanic.Data <- read.csv("C:/Users/wwdic/Downloads/TitanicDataset.csv", header = TRUE, sep=",")
> View(Titanic.Data)
> dim(Titanic.Data)
[1] 1309 12
> names(Titanic.Data)
 [1] "PassengerId"             "Survived"                "TicketClass"             "Name"
 [5] "Sex"                     "Age"                     "SiblingsOrSpousesAboard" "ParentsOrChildrenAboard"
 [9] "TicketNumber"            "Fare"                    "CabinNumber"             "PortOfEmbarkation"
> str(Titanic.Data)
'data.frame': 1309 obs. of 12 variables:
 $ PassengerId            : int 1 2 3 4 5 6 7 8 9 10 ...
 $ Survived               : int 0 1 1 1 0 0 0 0 1 1 ...
 $ TicketClass            : int 3 1 3 1 3 3 1 3 3 2 ...
 $ Name                   : chr "Braund, Mr. Owen Harris" "Cumings, Mrs. John Bradley (Florence Briggs Thayer)" "Heikkinen, Miss. Laina" "Futrelle, Mrs. Jacques Heath (Lily May Peel)" ...
 $ Sex                    : chr "male" "female" "female" "female" ...
 $ Age                    : num 22 38 26 35 35 ...
 $ SiblingsOrSpousesAboard: int 1 1 0 1 0 0 0 3 0 1 ...
 $ ParentsOrChildrenAboard: int 0 0 0 0 0 0 0 1 2 0 ...
 $ TicketNumber           : chr "A/5 21171" "PC 17599" "STON/O2. 3101282" "113803" ...
 $ Fare                   : num 7.25 71.28 7.92 53.1 8.05 ...
 $ CabinNumber            : chr "" "C85" "" "C123" ...
 $ PortOfEmbarkation      : chr "S" "C" "S" "S" ...
> check.missing.value <- complete.cases(Titanic.Data)
> View(check.missing.value)
> which(check.missing.value==FALSE)
integer(0)
> attach(Titanic.Data)
> ftable(Sex)
Sex female male

       466  843
> ftable(TicketClass)
TicketClass   1   2   3

            323 277 709
> ftable(PortOfEmbarkation)
PortOfEmbarkation   C   Q   S

                  270 123 916
> summary(SiblingsOrSpousesAboard)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.0000  0.0000  0.0000  0.4989  1.0000  8.0000
> summary(Age)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   0.17   22.00   28.00   29.47   36.62   80.00
> summary(ParentsOrChildrenAboard)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.000   0.000   0.000   0.385   0.000   9.000
> summary(Fare)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.000   7.896  14.454  33.297  31.275 512.329
> hist(Titanic.Data$Fare, breaks = 10,
+      include.lowest = TRUE, right = TRUE,
+      density = NULL, angle = 45, col = "lightblue", border = "red",
+      main = paste("Histogram of Fare"),
+      xlim = c(0,515), ylim = NULL,
+      xlab = "Fare", ylab ="Frequency", axes = TRUE, plot = TRUE, labels = FALSE)
> plot(density(Titanic.Data$Fare), type="l",
+      main="Smoothed Density of Fare", lwd=2,
+      xlab="Fare", xlim =c(0,80), ylab="Density Estimate",
+      col=hcl(h=195,l=65,c=100))
> boxplot(Titanic.Data$Fare, outchar=TRUE, main="Boxplot of Fare",
+         cex=0.7, xlab="Fare", col=hcl(h=195,l=65,c=100))
> qqnorm(Titanic.Data$Fare, col=hcl(h=195,l=65,c=100), cex=0.7)
> qqline(Titanic.Data$Fare)
> library(gmodels)
> CrossTable(Survived, Sex, digits=2, prop.r=F, prop.t=F, prop.chisq=F, chisq=T)

   Cell Contents
|-------------------------|
|                       N |
|           N / Col Total |
|-------------------------|

Total Observations in Table: 1309

             | Sex
    Survived |    female |      male | Row Total |
-------------|-----------|-----------|-----------|
           0 |        81 |       734 |       815 |
             |      0.17 |      0.87 |           |
-------------|-----------|-----------|-----------|
           1 |       385 |       109 |       494 |
             |      0.83 |      0.13 |           |
-------------|-----------|-----------|-----------|
Column Total |       466 |       843 |      1309 |
             |      0.36 |      0.64 |           |
-------------|-----------|-----------|-----------|

Statistics for All Table Factors

Pearson's Chi-squared test
------------------------------------------------------------
Chi^2 = 620.2757 d.f. = 1 p = 6.513245e-137

Pearson's Chi-squared test with Yates' continuity correction
------------------------------------------------------------
Chi^2 = 617.3134 d.f. = 1 p = 2.87141e-136

> CrossTable(Survived, TicketClass, digits=2, prop.r=F, prop.t=F, prop.chisq=F, chisq=T)

   Cell Contents
|-------------------------|
|                       N |
|           N / Col Total |
|-------------------------|

Total Observations in Table: 1309

             | TicketClass
    Survived |         1 |         2 |         3 | Row Total |
-------------|-----------|-----------|-----------|-----------|
           0 |       137 |       160 |       518 |       815 |
             |      0.42 |      0.58 |      0.73 |           |
-------------|-----------|-----------|-----------|-----------|
           1 |       186 |       117 |       191 |       494 |
             |      0.58 |      0.42 |      0.27 |           |
-------------|-----------|-----------|-----------|-----------|
Column Total |       323 |       277 |       709 |      1309 |
             |      0.25 |      0.21 |      0.54 |           |
-------------|-----------|-----------|-----------|-----------|

Statistics for All Table Factors

Pearson's Chi-squared test
------------------------------------------------------------
Chi^2 = 91.72368 d.f. = 2 p = 1.209085e-20

> CrossTable(Survived, PortOfEmbarkation, digits=2, prop.r=F, prop.t=F, prop.chisq=F, chisq=T)

   Cell Contents
|-------------------------|
|                       N |
|           N / Col Total |
|-------------------------|

Total Observations in Table: 1309

             | PortOfEmbarkation
    Survived |         C |         Q |         S | Row Total |
-------------|-----------|-----------|-----------|-----------|
           0 |       137 |        69 |       609 |       815 |
             |      0.51 |      0.56 |      0.66 |           |
-------------|-----------|-----------|-----------|-----------|
           1 |       133 |        54 |       307 |       494 |
             |      0.49 |      0.44 |      0.34 |           |
-------------|-----------|-----------|-----------|-----------|
Column Total |       270 |       123 |       916 |      1309 |
             |      0.21 |      0.09 |      0.70 |           |
-------------|-----------|-----------|-----------|-----------|

Statistics for All Table Factors

Pearson's Chi-squared test
------------------------------------------------------------
Chi^2 = 24.19378 d.f. = 2 p = 5.576842e-06
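
As a cross-check on the CrossTable output, the Pearson chi-squared statistic for Survived by Sex can be recomputed from the four cell counts alone with base R's chisq.test. This is only a sketch: the object names are hypothetical, and the counts are copied from the Survived by Sex table reported earlier in the transcript.

```r
# Counts of Survived (rows: 0, 1) by Sex (columns: female, male),
# copied from the CrossTable output for Survived by Sex.
survived.by.sex <- matrix(
  c( 81, 734,
    385, 109),
  nrow = 2, byrow = TRUE,
  dimnames = list(Survived = c("0", "1"), Sex = c("female", "male"))
)

# correct = FALSE turns off Yates' continuity correction, so the result
# should correspond to the first (uncorrected) statistic CrossTable prints.
chisq.result <- chisq.test(survived.by.sex, correct = FALSE)

chisq.stat <- unname(chisq.result$statistic)  # Pearson chi-squared statistic
chisq.df   <- unname(chisq.result$parameter)  # degrees of freedom
print(c(statistic = chisq.stat, df = chisq.df))
```

The statistic should agree with the reported Chi^2 = 620.2757 on 1 degree of freedom. Leaving correct at its default of TRUE would instead apply the continuity correction that CrossTable reports separately.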
> model.fit.logit <- glm(Survived ~ factor(TicketClass) + factor(Sex) +
+                        factor(PortOfEmbarkation) + Age +
+                        SiblingsOrSpousesAboard + ParentsOrChildrenAboard + Fare,
+                        family=binomial(link = "logit"), data = Titanic.Data)
> model.fit.logit

Call: glm(formula = Survived ~ factor(TicketClass) + factor(Sex) +
    factor(PortOfEmbarkation) + Age + SiblingsOrSpousesAboard +
    ParentsOrChildrenAboard + Fare, family = binomial(link = "logit"),
    data = Titanic.Data)

Coefficients:
               (Intercept)        factor(TicketClass)2        factor(TicketClass)3
                  4.687811                   -1.170275                   -2.233693
           factor(Sex)male  factor(PortOfEmbarkation)Q  factor(PortOfEmbarkation)S
                 -3.753269                    0.079425                   -0.177538
                       Age     SiblingsOrSpousesAboard     ParentsOrChildrenAboard
                 -0.044310                   -0.335308                   -0.090660
                      Fare
                  0.001882

Degrees of Freedom: 1308 Total (i.e. Null); 1299 Residual
Null Deviance: 1735
Residual Deviance: 948.1 AIC: 968.1
> model.fit.logit$coeff
               (Intercept)       factor(TicketClass)2       factor(TicketClass)3            factor(Sex)male
               4.687810562               -1.170274637               -2.233693355               -3.753268593
factor(PortOfEmbarkation)Q factor(PortOfEmbarkation)S                        Age    SiblingsOrSpousesAboard
               0.079424990               -0.177538116               -0.044310024               -0.335308239
   ParentsOrChildrenAboard                       Fare
              -0.090660453                0.001881518
> summary(model.fit.logit)

Call:
glm(formula = Survived ~ factor(TicketClass) + factor(Sex) +
    factor(PortOfEmbarkation) + Age + SiblingsOrSpousesAboard +
    ParentsOrChildrenAboard + Fare, family = binomial(link = "logit"),
    data = Titanic.Data)

Coefficients:
                            Estimate Std. Error z value Pr(>|z|)
(Intercept)                 4.687811   0.452535  10.359  < 2e-16 ***
factor(TicketClass)2       -1.170275   0.283297  -4.131 3.61e-05 ***
factor(TicketClass)3       -2.233693   0.287195  -7.778 7.39e-15 ***
factor(Sex)male            -3.753269   0.192205 -19.527  < 2e-16 ***
factor(PortOfEmbarkation)Q  0.079425   0.355198   0.224 0.823063
factor(PortOfEmbarkation)S -0.177538   0.221491  -0.802 0.422808
Age                        -0.044310   0.007485  -5.920 3.23e-09 ***
SiblingsOrSpousesAboard    -0.335308   0.096317  -3.481 0.000499 ***
ParentsOrChildrenAboard    -0.090660   0.099202  -0.914 0.360770
Fare                        0.001882   0.002060   0.913 0.361046
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1735.13 on 1308 degrees of freedom
Residual deviance:  948.06 on 1299 degrees of freedom
AIC: 968.06

Number of Fisher Scoring iterations: 5

> library(DescTools)
Registered S3 method overwritten by 'DescTools':
  method         from
  reorder.factor gdata
> PseudoR2(model.fit.logit)
McFadden
 0.45361
> library(ResourceSelection)
ResourceSelection 0.3-6 2023-06-27
> hoslem.test(Titanic.Data$Survived,fitted(model.fit.logit),g=8)

Hosmer and Lemeshow goodness of fit (GOF) test

data: Titanic.Data$Survived, fitted(model.fit.logit)
X-squared = 20.31, df = 6, p-value = 0.002439

> n.total <- nrow(Titanic.Data)
> n.total
[1] 1309
> training.ratio <- 0.80
> n.training <- round(n.total*training.ratio)
> n.training
[1] 1047
> n.test <- n.total - n.training
> n.test
[1] 262
> set.seed(1)
> training.index <- sample(1:n.total,n.training)
> training.subdata <- Titanic.Data[training.index,]
> View(training.subdata)
> test.subdata <- Titanic.Data[-training.index,]
> View(test.subdata)
> model.fitting.training.logit <- glm(Survived ~ factor(TicketClass) + factor(Sex) +
+                                     factor(PortOfEmbarkation) + Age +
+                                     SiblingsOrSpousesAboard + ParentsOrChildrenAboard + Fare,
+                                     family=binomial(link = "logit"), data =training.subdata)
> model.fitting.training.logit

Call: glm(formula = Survived ~ factor(TicketClass) + factor(Sex) +
    factor(PortOfEmbarkation) + Age + SiblingsOrSpousesAboard +
    ParentsOrChildrenAboard + Fare, family = binomial(link = "logit"),
    data = training.subdata)

Coefficients:
               (Intercept)        factor(TicketClass)2        factor(TicketClass)3
                   5.18488                    -1.30344                    -2.38470
           factor(Sex)male  factor(PortOfEmbarkation)Q  factor(PortOfEmbarkation)S
                  -3.90022                     0.04555                    -0.29356
                       Age     SiblingsOrSpousesAboard     ParentsOrChildrenAboard
                  -0.05377                    -0.38463                    -0.10321
                      Fare
                   0.00288

Degrees of Freedom: 1046 Total (i.e. Null); 1037 Residual
Null Deviance: 1383
Residual Deviance: 718.5 AIC: 738.5
> summary(model.fitting.training.logit)

Call:
glm(formula = Survived ~ factor(TicketClass) + factor(Sex) +
    factor(PortOfEmbarkation) + Age + SiblingsOrSpousesAboard +
    ParentsOrChildrenAboard + Fare, family = binomial(link = "logit"),
    data = training.subdata)

Coefficients:
                            Estimate Std. Error z value Pr(>|z|)
(Intercept)                 5.184880   0.532989   9.728  < 2e-16 ***
factor(TicketClass)2       -1.303438   0.327241  -3.983 6.80e-05 ***
factor(TicketClass)3       -2.384697   0.332402  -7.174 7.28e-13 ***
factor(Sex)male            -3.900222   0.224313 -17.387  < 2e-16 ***
factor(PortOfEmbarkation)Q  0.045554   0.411549   0.111 0.911863
factor(PortOfEmbarkation)S -0.293557   0.257417  -1.140 0.254121
Age                        -0.053771   0.008535  -6.300 2.98e-10 ***
SiblingsOrSpousesAboard    -0.384634   0.115482  -3.331 0.000866 ***
ParentsOrChildrenAboard    -0.103208   0.111490  -0.926 0.354594
Fare                        0.002880   0.002390   1.205 0.228286
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1382.60 on 1046 degrees of freedom
Residual deviance:  718.49 on 1037 degrees of freedom
AIC: 738.49

Number of Fisher Scoring iterations: 5

> predict.survived.score <- predict(model.fitting.training.logit, newdata = test.subdata, type ='response')
> View(predict.survived.score)
> predict.survived.score <- ifelse(predict.survived.score > 0.5,1,0)
> View(predict.survived.score)
> accuracy.rate <- mean(predict.survived.score == test.subdata$Survived)
> accuracy.rate
[1] 0.8282443
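
The accuracy rate above is the share of test passengers whose predicted class, obtained with a 0.5 cutoff, matches the observed class. A confusion matrix and a majority-class baseline put a single accuracy figure in context. The sketch below uses made-up labels and probabilities purely for illustration; none of the values or object names come from the Titanic test set or the original script.

```r
# Made-up labels and predicted probabilities, for illustration only.
actual         <- c(0, 0, 1, 1, 0, 1, 0, 1)
predicted.prob <- c(0.10, 0.40, 0.80, 0.30, 0.60, 0.90, 0.20, 0.70)

# Same 0.5 cutoff as in the transcript.
predicted.class <- ifelse(predicted.prob > 0.5, 1, 0)

# Confusion matrix: rows are actual classes, columns are predicted classes.
confusion <- table(Actual    = factor(actual, levels = c(0, 1)),
                   Predicted = factor(predicted.class, levels = c(0, 1)))
print(confusion)

# Accuracy: proportion of predictions that match the observed class.
accuracy <- mean(predicted.class == actual)

# Majority-class baseline: accuracy of always predicting the commonest class.
baseline <- max(table(actual)) / length(actual)
print(c(accuracy = accuracy, baseline = baseline))
```

In this toy example 6 of 8 predictions are correct (accuracy 0.75) against a baseline of 0.5. Running the same comparison on test.subdata would show how much of the 0.8282443 accuracy is gained over simply predicting the majority class.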