Dickman_William_HW3
School: University of North Carolina, Pembroke
Course: 5190
Subject: Statistics
Date: Feb 20, 2024
Linear & Logistic Regression HW3
By William Dickman, UNCP
Linear Regression Model
1. Are all the independent variables statistically significant in the linear regression model (Clue: look at the t-values of estimated coefficients in the linear regression model)?
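Not all of the independent variables are statistically significant. In the model fitted to all 71 observations, Mining (t = 1.315), Gold (t = 1.066), and Build (t = 0.043) have absolute t-values below 2.0 and p-values above 0.05, so they are statistically insignificant. Transport is significant in that fit (t = -2.686, p = 0.009), but in the model fitted to the 57-observation training sample its t-value is -1.924 (p = 0.060), which is below 2.0 in absolute value, so I would treat Transport as borderline at best.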
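As a rough illustration of the t-value check, the sketch below fits a linear model and flags the slope coefficients whose absolute t-value falls below 2. It uses R's built-in mtcars data as a stand-in (an assumption, not the BankBillData.csv file), and the names fit, coef.table, slopes, and weak.terms are illustrative only.

```r
# Illustrative only: mtcars stands in for the bank bill data.
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# summary()$coefficients is a matrix with the columns
# "Estimate", "Std. Error", "t value", and "Pr(>|t|)".
coef.table <- summary(fit)$coefficients

# Drop the intercept row, then flag the terms with |t| < 2.
slopes <- coef.table[rownames(coef.table) != "(Intercept)", , drop = FALSE]
weak.terms <- rownames(slopes)[abs(slopes[, "t value"]) < 2]
print(weak.terms)
```

The same lines could be pointed at model.fit.new or model.fit.training.new to list the weak terms in either fit.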
2. How well does the linear regression model perform in forecasting the 90 days bank bill rate using these 15 independent variables (Clue: look at the min_max accuracy and the MAPE measure, as illustrated in the Linear Regression in R-A Case Study.R script file)?
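The linear regression model forecasts the 90 days bank bill rate well. On the 14 held-out test observations, the min_max accuracy is 0.9535892, which is close to 1, and the MAPE is 0.04953731, meaning the predictions are off by about 5% of the observed rate on average. The correlation between the observed and predicted rates is 0.9662228. Note that the fitted model uses the 9 independent variables that remain after dropping six highly correlated ones, not all 15.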
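To make the two measures concrete, here is a minimal sketch that computes them on three made-up observed/predicted pairs (hypothetical numbers chosen for easy arithmetic, not values from the bank bill data). It uses pmin() and pmax(), which give the same row-wise minimum and maximum as the apply() calls in the transcript.

```r
# Hypothetical observed and predicted rates (not from BankBillData.csv).
observed  <- c(5.0, 6.0, 8.0)
predicted <- c(5.5, 5.4, 8.0)

# Min-max accuracy: smaller value divided by larger value for each pair,
# then averaged. A value of 1 means every prediction is exact.
min.max.accuracy <- mean(pmin(observed, predicted) / pmax(observed, predicted))

# MAPE: absolute error as a fraction of the observed value, averaged.
# A value of 0 means every prediction is exact.
mape <- mean(abs(predicted - observed) / observed)

print(round(min.max.accuracy, 4))
print(round(mape, 4))
```

The first two pairs are each off by 10% of the observed value and the third is exact, so the MAPE works out to 0.2/3, or about 0.067.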
Linear Regression Model
> BankBillRate.Data <- read.csv("C:/Users/wwdic/Downloads/BankBillData.csv")
> View(BankBillRate.Data)
> head(BankBillRate.Data, 20)
BankSharePriceIndex AllOrdinaries Develop Mining Gold
1 2292.6 5634.6 2976.5 672.6 1148.4
2 2243.0 5469.1 2761.1 638.5 1160.9
3 2334.4 5502.3 2992.8 662.2 1200.7
4 2162.5 5507.8 2750.7 687.7 1214.9
5 2127.8 5556.5 2761.3 700.6 1239.3
6 2049.1 5403.1 2607.3 693.2 1186.9
7 2192.1 5620.7 2783.1 705.8 1144.8
8 2139.6 5816.4 2724.6 712.4 1079.5
9 2040.5 5720.1 2710.9 714.7 1145.8
10 2048.6 5631.5 2696.9 707.9 1186.2
11 1886.6 5392.5 2456.4 678.7 1108.6
12 1775.6 5198.0 2301.0 659.7 1086.8
13 1711.4 5033.5 2243.1 608.2 1015.6
14 1797.3 5124.0 2335.5 571.5 917.8
15 1901.2 5504.6 2474.4 602.3 930.7
16 1850.1 5429.3 2478.0 592.6 980.7
17 2003.4 5727.6 2632.5 623.9 1052.3
18 2114.3 5968.1 2740.2 670.7 1238.4
19 2187.1 6036.0 2716.6 696.0 1456.0
20 2338.9 6253.9 2685.3 784.1 1946.5
Build Prop Industry Energy Finance Resources Transport
1 2530.1 981.6 2607.0 814.4 2110.1 976.4 3357.5
2 2354.2 953.8 2514.7 777.7 2061.1 912.8 3207.9
3 2460.9 1021.2 2609.7 750.0 2046.7 922.5 3472.7
4 2370.7 1008.1 2519.3 732.6 2050.2 938.0 3282.6
5 2382.0 981.2 2489.2 728.7 2021.1 944.3 3121.9
6 2300.7 975.6 2434.1 708.0 1926.4 930.4 2998.9
7 2452.2 1006.6 2571.3 709.9 1959.9 955.9 3211.8
8 2409.3 1005.3 2604.6 728.7 2013.0 969.8 3272.5
9 2351.0 1012.3 2529.6 704.5 2036.5 966.8 3186.1
10 2235.8 1039.8 2497.1 720.7 2026.7 944.5 3013.1
11 2098.0 1002.6 2395.1 701.6 1991.6 898.3 2819.5
12 1960.2 992.4 2296.1 707.1 1928.3 864.1 2670.7
13 1928.3 986.2 2236.7 670.0 1796.4 805.2 2571.3
14 1922.7 989.3 2310.4 670.8 1864.8 787.9 2562.5
15 2094.8 991.7 2449.8 685.0 1915.8 861.3 2703.9
16 2024.2 989.7 2418.7 677.0 1900.5 846.5 2640.4
17 2120.5 1002.3 2535.9 694.0 1946.9 900.8 2644.8
18 2192.6 1044.1 2602.2 749.6 2005.3 952.9 2701.0
19 2165.3 1064.1 2615.6 771.3 2022.3 967.9 2698.0
20 2188.6 1076.2 2673.1 762.9 2024.7 1023.7 2529.8
Retail Unemployment CPI BankBillRate
1 2807.8 10.9 107.6 8.70
2 2701.3 11.0 107.6 8.25
3 2812.8 11.1 107.6 7.70
4 2764.3 11.3 107.6 7.55
5 2641.7 11.4 107.6 7.50
6 2641.7 11.6 107.6 7.55
7 2798.8 11.7 107.3 6.95
8 2859.1 11.9 107.3 6.50
9 2707.7 11.9 107.3 6.40
10 2586.2 12.0 107.4 5.55
11 2454.8 12.0 107.4 5.90
12 2548.9 12.0 107.4 5.95
13 2350.1 12.0 107.9 5.85
14 2242.4 12.0 107.9 5.85
15 2494.7 12.0 107.9 5.90
16 2402.0 12.0 108.9 5.85
17 2444.0 12.0 108.9 5.80
18 2464.3 11.9 108.9 5.35
19 2490.8 11.8 109.3 5.25
20 2646.5 11.8 109.3 5.15
> names(BankBillRate.Data)
 [1] "BankSharePriceIndex" "AllOrdinaries"
 [3] "Develop"             "Mining"
 [5] "Gold"                "Build"
 [7] "Prop"                "Industry"
 [9] "Energy"              "Finance"
[11] "Resources"           "Transport"
[13] "Retail"              "Unemployment"
[15] "CPI"                 "BankBillRate"
> dim(BankBillRate.Data)
[1] 71 16
> str(BankBillRate.Data)
'data.frame': 71 obs. of 16 variables:
$ BankSharePriceIndex: num 2293 2243 2334 2162 2128 ...
$ AllOrdinaries : num 5635 5469 5502 5508 5556 ...
$ Develop : num 2976 2761 2993 2751 2761 ...
$ Mining : num 673 638 662 688 701 ...
$ Gold : num 1148 1161 1201 1215 1239 ...
$ Build : num 2530 2354 2461 2371 2382 ...
$ Prop : num 982 954 1021 1008 981 ...
$ Industry : num 2607 2515 2610 2519 2489 ...
$ Energy : num 814 778 750 733 729 ...
$ Finance : num 2110 2061 2047 2050 2021 ...
$ Resources : num 976 913 922 938 944 ...
$ Transport : num 3358 3208 3473 3283 3122 ...
$ Retail : num 2808 2701 2813 2764 2642 ...
$ Unemployment : num 10.9 11 11.1 11.3 11.4 11.6 11.7 11.9 11.9 12 ...
$ CPI : num 108 108 108 108 108 ...
$ BankBillRate : num 8.7 8.25 7.7 7.55 7.5 7.55 6.95 6.5 6.4 5.55 ...
> summary(BankBillRate.Data)
 BankSharePriceIndex AllOrdinaries      Develop
 Min.   :1711        Min.   : 5034   Min.   :2243
 1st Qu.:2314        1st Qu.: 6002   1st Qu.:2773
 Median :2820        Median : 7736   Median :3220
 Mean   :2911        Mean   : 7724   Mean   :3508
 3rd Qu.:3249        3rd Qu.: 9037   3rd Qu.:4097
 Max.   :5193        Max.   :11583   Max.   :6098
     Mining            Gold             Build
 Min.   : 571.5   Min.   : 917.8   Min.   :1923
 1st Qu.: 713.5   1st Qu.:1272.2   1st Qu.:2340
 Median : 927.3   Median :1830.3   Median :2435
 Mean   : 877.0   Mean   :1737.3   Mean   :2442
 3rd Qu.: 998.1   3rd Qu.:2111.8   3rd Qu.:2508
 Max.   :1098.2   Max.   :2604.0   Max.   :3194
      Prop           Industry        Energy
 Min.   : 953.8   Min.   :2237   Min.   : 670
 1st Qu.:1029.1   1st Qu.:2613   1st Qu.: 768
 Median :1077.3   Median :3052   Median : 914
 Mean   :1082.6   Mean   :3072   Mean   :1017
 3rd Qu.:1130.5   3rd Qu.:3349   3rd Qu.:1150
 Max.   :1255.2   Max.   :4392   Max.   :1995
    Finance       Resources        Transport
 Min.   :1796   Min.   : 787.9   Min.   :2499
 1st Qu.:2048   1st Qu.: 973.1   1st Qu.:2700
 Median :2318   Median :1282.4   Median :2964
 Mean   :2444   Mean   :1212.5   Mean   :3040
 3rd Qu.:2692   3rd Qu.:1387.5   3rd Qu.:3258
 Max.   :3546   Max.   :1508.8   Max.   :4322
     Retail      Unemployment        CPI
 Min.   :2224   Min.   : 8.90   Min.   :107.3
 1st Qu.:2410   1st Qu.: 9.05   1st Qu.:109.1
 Median :2549   Median : 9.90   Median :111.9
 Mean   :2603   Mean   :10.30   Mean   :113.6
 3rd Qu.:2714   3rd Qu.:11.75   3rd Qu.:119.0
 Max.   :3522   Max.   :12.00   Max.   :120.5
  BankBillRate
 Min.   :4.750
 1st Qu.:5.365
 Median :6.080
 Mean   :6.418
 3rd Qu.:7.520
 Max.   :8.700
> check.missing.value <- complete.cases(BankBillRate.Data)
> View(check.missing.value)
> which(check.missing.value==FALSE)
integer(0)
> hist(BankBillRate.Data$BankBillRate, breaks = 5,
+ include.lowest = TRUE, right = TRUE,
+ density = NULL, angle = 45, col = "lightblue", border = "red",
+ main = paste("Histogram BankBillRate"),
+ xlim = c(min(BankBillRate.Data$BankBillRate), max(BankBillRate.Data$BankBillRate)), ylim = NULL,
+ xlab = "Bank Bill Rate", ylab ="Frequency", axes = TRUE, plot = TRUE, labels = FALSE)
> plot(density(BankBillRate.Data$BankBillRate), type="l",
+ main="Smoothed Density of Bank Bill Rate", lwd=2,
+ xlab="Bank Bill Rate", ylab="Density Estimate",
+ col=hcl(h=195,l=65,c=100))
> boxplot(BankBillRate.Data$BankBillRate, outchar=TRUE, main="Boxplot of Bank Bill Rate",
+ cex=0.7, xlab="Bank Bill Rate", col=hcl(h=195,l=65,c=100))
> qqnorm(BankBillRate.Data$BankBillRate, col=hcl(h=195,l=65,c=100), cex=0.7)
> qqline(BankBillRate.Data$BankBillRate)
> indep.data <- subset(BankBillRate.Data, select = -BankBillRate)
> View(indep.data)
> dim(indep.data)
[1] 71 15
> corr.matrix <- cor(indep.data, method = "pearson")
> corr.matrix
BankSharePriceIndex AllOrdinaries
BankSharePriceIndex 1.0000000 0.9678132
AllOrdinaries 0.9678132 1.0000000
Develop 0.9824681 0.9497925
Mining 0.5585195 0.7224221
Gold 0.2915335 0.4492073
Build 0.8524573 0.8118839
Prop 0.7372165 0.7155572
Industry 0.9838414 0.9776901
Energy 0.9665149 0.9268622
Finance 0.9691958 0.9608886
Resources 0.7948060 0.9084346
Transport 0.7206772 0.6158779
Retail 0.6064207 0.4508963
Unemployment -0.7044823 -0.7867961
CPI 0.8557557 0.9119711
Develop Mining Gold
BankSharePriceIndex 0.9824681 0.55851946 0.29153352
AllOrdinaries 0.9497925 0.72242214 0.44920735
Develop 1.0000000 0.51132366 0.21857797
Mining 0.5113237 1.00000000 0.88189560
Gold 0.2185780 0.88189560 1.00000000
Build 0.8285718 0.52121534 0.28467751
Prop 0.6662170 0.53671743 0.54592243
Industry 0.9664299 0.64095709 0.40226318
Energy 0.9828873 0.44525558 0.14465172
Finance 0.9772726 0.60007119 0.31246148
Resources 0.7570773 0.93280713 0.71086435
Transport 0.7732470 0.17479457 -0.08945978
Retail 0.6260435 -0.03748334 -0.16491385
Unemployment -0.6770390 -0.67997397 -0.35208227
CPI 0.8463149 0.65438603 0.33107522
Build Prop Industry
BankSharePriceIndex 0.8524573 0.7372165 0.9838414
AllOrdinaries 0.8118839 0.7155572 0.9776901
Develop 0.8285718 0.6662170 0.9664299
Mining 0.5212153 0.5367174 0.6409571
Gold 0.2846775 0.5459224 0.4022632
Build 1.0000000 0.6743188 0.8636178
Prop 0.6743188 1.0000000 0.7883061
Industry 0.8636178 0.7883061 1.0000000
Energy 0.8067534 0.6359402 0.9368569
Finance 0.8174064 0.6860153 0.9684377
Resources 0.7186914 0.6293341 0.8410669
Transport 0.8024921 0.4596345 0.7006322
Retail 0.7260947 0.5013308 0.5936518
Unemployment -0.5402138 -0.2741836 -0.6667515
CPI 0.5924890 0.4516416 0.8293676
Energy Finance Resources
BankSharePriceIndex 0.9665149 0.9691958 0.7948060
AllOrdinaries 0.9268622 0.9608886 0.9084346
Develop 0.9828873 0.9772726 0.7570773
Mining 0.4452556 0.6000712 0.9328071
Gold 0.1446517 0.3124615 0.7108644
Build 0.8067534 0.8174064 0.7186914
Prop 0.6359402 0.6860153 0.6293341
Industry 0.9368569 0.9684377 0.8410669
Energy 1.0000000 0.9610850 0.7134835
Finance 0.9610850 1.0000000 0.8061473
Resources 0.7134835 0.8061473 1.0000000
Transport 0.7793019 0.7629800 0.4041056
Retail 0.6185808 0.5520138 0.1871988
Unemployment -0.6636836 -0.6879245 -0.8193082
CPI 0.8290987 0.8457304 0.8361467
Transport Retail Unemployment
BankSharePriceIndex 0.72067723 0.606420667 -0.704482268
AllOrdinaries 0.61587790 0.450896283 -0.786796129
Develop 0.77324704 0.626043467 -0.677039030
Mining 0.17479457 -0.037483340 -0.679973966
Gold -0.08945978 -0.164913847 -0.352082266
Build 0.80249211 0.726094681 -0.540213822
Prop 0.45963453 0.501330773 -0.274183578
Industry 0.70063219 0.593651832 -0.666751489
Energy 0.77930189 0.618580770 -0.663683622
Finance 0.76298002 0.552013833 -0.687924545
Resources 0.40410556 0.187198798 -0.819308176
Transport 1.00000000 0.781826596 -0.290569290
Retail 0.78182660 1.000000000 -0.003084187
Unemployment -0.29056929 -0.003084187 1.000000000
CPI 0.40305043 0.194990904 -0.923222554
CPI
BankSharePriceIndex 0.8557557
AllOrdinaries 0.9119711
Develop 0.8463149
Mining 0.6543860
Gold 0.3310752
Build 0.5924890
Prop 0.4516416
Industry 0.8293676
Energy 0.8290987
Finance 0.8457304
Resources 0.8361467
Transport 0.4030504
Retail 0.1949909
Unemployment -0.9232226
CPI 1.0000000
> library(corrplot)
corrplot 0.92 loaded
> correl.plot <- corrplot(corr.matrix, method = "color")
> correl.plot <- corrplot(corr.matrix, method = "number", number.cex=0.50)
> indep.data.new <- subset(indep.data, select = -c(BankSharePriceIndex, Develop, Energy, AllOrdinaries, Industry, Finance))
> corr.matrix.new <- cor(indep.data.new, method = "pearson")
> corr.matrix.new
Mining Gold Build Prop
Mining 1.00000000 0.88189560 0.5212153 0.5367174
Gold 0.88189560 1.00000000 0.2846775 0.5459224
Build 0.52121534 0.28467751 1.0000000 0.6743188
Prop 0.53671743 0.54592243 0.6743188 1.0000000
Resources 0.93280713 0.71086435 0.7186914 0.6293341
Transport 0.17479457 -0.08945978 0.8024921 0.4596345
Retail -0.03748334 -0.16491385 0.7260947 0.5013308
Unemployment -0.67997397 -0.35208227 -0.5402138 -0.2741836
CPI 0.65438603 0.33107522 0.5924890 0.4516416
Resources Transport Retail
Mining 0.9328071 0.17479457 -0.037483340
Gold 0.7108644 -0.08945978 -0.164913847
Build 0.7186914 0.80249211 0.726094681
Prop 0.6293341 0.45963453 0.501330773
Resources 1.0000000 0.40410556 0.187198798
Transport 0.4041056 1.00000000 0.781826596
Retail 0.1871988 0.78182660 1.000000000
Unemployment -0.8193082 -0.29056929 -0.003084187
CPI 0.8361467 0.40305043 0.194990904
Unemployment CPI
Mining -0.679973966 0.6543860
Gold -0.352082266 0.3310752
Build -0.540213822 0.5924890
Prop -0.274183578 0.4516416
Resources -0.819308176 0.8361467
Transport -0.290569290 0.4030504
Retail -0.003084187 0.1949909
Unemployment 1.000000000 -0.9232226
CPI -0.923222554 1.0000000
> correl.plot.new <- corrplot(corr.matrix.new, method = "color")
> correl.plot.new <- corrplot(corr.matrix.new, method = "number", number.cex=0.50)
> BankBillRate.Data.New <- subset(BankBillRate.Data, select = -c(BankSharePriceIndex, Develop, Energy, AllOrdinaries, Industry, Finance))
> dim(BankBillRate.Data.New)
[1] 71 10
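The six predictors removed above are very highly correlated with other predictors in the correlation matrix. The transcript does not state the cutoff that was used, so the sketch below is only one possible way to list such pairs: it builds a small simulated data frame (toy, an assumption) and reports every pair whose absolute Pearson correlation exceeds an assumed cutoff of 0.95.

```r
# Simulated stand-in data: x2 is nearly a copy of x1, x3 is unrelated.
set.seed(42)
x1 <- rnorm(50)
toy <- data.frame(x1 = x1,
                  x2 = x1 + rnorm(50, sd = 0.05),
                  x3 = rnorm(50))

r <- cor(toy, method = "pearson")

# Assumed cutoff; the original analysis does not say which value it used.
cutoff <- 0.95

# Look only above the diagonal so each pair is reported once.
idx <- which(abs(r) > cutoff & upper.tri(r), arr.ind = TRUE)
high.pairs <- data.frame(var1 = rownames(r)[idx[, "row"]],
                         var2 = colnames(r)[idx[, "col"]],
                         r    = r[idx])
print(high.pairs)
```

Replacing toy with indep.data would list the candidate pairs among the 15 original predictors.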
> model.fit.new <- lm(BankBillRate ~ ., data = BankBillRate.Data.New)
> model.fit.new
Call:
lm(formula = BankBillRate ~ ., data = BankBillRate.Data.New)
Coefficients:
 (Intercept)       Mining         Gold        Build
   5.189e+01    3.315e-03    4.912e-04    3.557e-05
        Prop    Resources    Transport       Retail
  -7.695e-03   -7.120e-03   -6.358e-04    2.022e-03
Unemployment          CPI
  -1.812e+00   -1.498e-01
> coef(model.fit.new)
  (Intercept)        Mining          Gold         Build
 5.189210e+01  3.315291e-03  4.912432e-04  3.557092e-05
         Prop     Resources     Transport        Retail
-7.695246e-03 -7.119666e-03 -6.357655e-04  2.022094e-03
 Unemployment           CPI
-1.812056e+00 -1.497882e-01
> summary(model.fit.new)
Call:
lm(formula = BankBillRate ~ ., data = BankBillRate.Data.New)
Residuals:
     Min       1Q   Median       3Q      Max
-0.92847 -0.23067 -0.00377  0.27510  0.87227
Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)   5.189e+01  6.050e+00   8.577 4.49e-12 ***
Mining        3.315e-03  2.522e-03   1.315 0.193559
Gold          4.912e-04  4.607e-04   1.066 0.290528
Build         3.557e-05  8.273e-04   0.043 0.965844
Prop         -7.695e-03  1.563e-03  -4.924 6.78e-06 ***
Resources    -7.120e-03  2.109e-03  -3.376 0.001283 **
Transport    -6.358e-04  2.367e-04  -2.686 0.009303 **
Retail        2.022e-03  5.071e-04   3.987 0.000181 ***
Unemployment -1.812e+00  1.785e-01 -10.153 9.94e-15 ***
CPI          -1.498e-01  4.523e-02  -3.311 0.001563 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3839 on 61 degrees of freedom
Multiple R-squared: 0.9047, Adjusted R-squared: 0.8907
F-statistic: 64.37 on 9 and 61 DF, p-value: < 2.2e-16
> n.total <- nrow(BankBillRate.Data)
> n.total
[1] 71
> training.ratio <- 0.80
> n.training <- round(n.total*training.ratio)
> n.training
[1] 57
> n.test <- n.total - n.training
> n.test
[1] 14
> set.seed(1)
> training.index <- sample(1:n.total,n.training)
> training.subdata.new <- BankBillRate.Data.New[training.index,]
> View(training.subdata.new)
> test.subdata.new <- BankBillRate.Data.New[-training.index,]
> View(test.subdata.new)
> model.fit.training.new <- lm(BankBillRate ~ ., data = training.subdata.new)
> summary(model.fit.training.new)
Call:
lm(formula = BankBillRate ~ ., data = training.subdata.new)
Residuals:
     Min       1Q   Median       3Q      Max
-0.86073 -0.24540 -0.00969  0.28216  0.82699
Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)   5.228e+01  7.271e+00   7.190 4.21e-09 ***
Mining        4.667e-03  3.147e-03   1.483 0.144796
Gold          4.005e-04  5.188e-04   0.772 0.444012
Build        -9.535e-05  9.785e-04  -0.097 0.922785
Prop         -6.808e-03  1.861e-03  -3.659 0.000639 ***
Resources    -7.765e-03  2.722e-03  -2.852 0.006437 **
Transport    -5.296e-04  2.753e-04  -1.924 0.060423 .
Retail        2.004e-03  6.149e-04   3.259 0.002084 **
Unemployment -1.833e+00  2.284e-01  -8.024 2.35e-10 ***
CPI          -1.614e-01  5.281e-02  -3.056 0.003690 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.401 on 47 degrees of freedom
Multiple R-squared: 0.8976, Adjusted R-squared: 0.878
F-statistic: 45.78 on 9 and 47 DF, p-value: < 2.2e-16
> predict.BankBillRate.score.new <- predict(model.fit.training.new, test.subdata.new)
> View(predict.BankBillRate.score.new)
> observed.vs.predicted.BankBillRate.score.new <- data.frame(cbind(Observed = test.subdata.new$BankBillRate, Predicted = predict.BankBillRate.score.new))
> View(observed.vs.predicted.BankBillRate.score.new)
> corre.observed.vs.predicted.new <- cor(observed.vs.predicted.BankBillRate.score.new)
> corre.observed.vs.predicted.new
Observed Predicted
Observed 1.0000000 0.9662228
Predicted 0.9662228 1.0000000
> min.max.accuracy.new <- mean(apply(observed.vs.predicted.BankBillRate.score.new, 1, min) / apply(observed.vs.predicted.BankBillRate.score.new, 1, max))
> min.max.accuracy.new
[1] 0.9535892
> mape.new <- mean(abs((observed.vs.predicted.BankBillRate.score.new$Predicted - observed.vs.predicted.BankBillRate.score.new$Observed))
+ /observed.vs.predicted.BankBillRate.score.new$Observed)
> mape.new
[1] 0.04953731
Logistic Regression Model
1. Are all the independent variables statistically significant in the logistic regression model (Clue: look at the z-values of estimated coefficients in the logistic regression model)?
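Using the rule that a coefficient is statistically significant at the 5% level when its absolute z-value is at or above 1.96, not all of the variables are significant. Port of Embarkation (z = 0.224 for Q and z = -0.802 for S), Parents or Children Aboard (z = -0.914), and Fare (z = 0.913) are statistically insignificant because their absolute z-values are below 1.96. Ticket Class, Sex, Age, and Siblings or Spouses Aboard are all significant.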
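A factor with three levels, such as Port of Embarkation, enters the model as two dummy coefficients, so each z-value only compares one level with the baseline level. A likelihood-ratio test on the whole term is a common complement. The sketch below shows both checks on simulated data; the data frame toy, its column names, and the coefficients used to generate it are all assumptions for illustration, not the Titanic data.

```r
# Simulated stand-in data (not TitanicDataset.csv).
set.seed(1)
n <- 400
toy <- data.frame(
  sex  = factor(sample(c("female", "male"), n, replace = TRUE)),
  port = factor(sample(c("C", "Q", "S"), n, replace = TRUE)),
  age  = runif(n, 1, 70)
)
# Assumed generating model: survival depends on sex and age, not on port.
p <- plogis(2 - 3 * (toy$sex == "male") - 0.02 * toy$age)
toy$survived <- rbinom(n, 1, p)

fit <- glm(survived ~ sex + port + age,
           family = binomial(link = "logit"), data = toy)

# Per-coefficient check: z-values below 1.96 in absolute value.
z <- summary(fit)$coefficients[, "z value"]
print(round(z, 3))
print(names(z)[abs(z) < 1.96])

# Per-term check: likelihood-ratio test for dropping each term in turn.
lr <- drop1(fit, test = "Chisq")
print(lr)
```

In the drop1() table the row for port tests both dummy coefficients jointly, which is the relevant comparison when deciding whether to keep the factor.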
2. How well does the logistic regression model perform in forecasting the Titanic passengers’ survival status using these independent variables (Clue: look at the accuracy rate, as illustrated in the Logistic Regression in R-A Case Study.R script file)?
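The accuracy rate for this logistic regression model is 0.8282443, or 82.82% (rounded), on the 262 test passengers, using a 0.5 probability cutoff. This is a high accuracy rate: for comparison, 815 of the 1,309 passengers (about 62%) did not survive, so always predicting non-survival would be right only about 62% of the time on the full data. This shows that the logistic regression model performs well in forecasting passengers’ survival status.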
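The accuracy calculation can be sketched on a handful of made-up cases. The vectors actual and prob below are hypothetical values, not Titanic predictions; the sketch applies a 0.5 cutoff, computes the accuracy rate, and adds a confusion matrix and a majority-class baseline for context.

```r
# Hypothetical survival outcomes and predicted probabilities.
actual <- c(1, 0, 1, 1, 0, 0, 1, 0)
prob   <- c(0.9, 0.2, 0.4, 0.7, 0.6, 0.1, 0.8, 0.3)

# Classify as survived (1) when the predicted probability exceeds 0.5.
predicted <- ifelse(prob > 0.5, 1, 0)

# Accuracy rate: share of cases where the prediction matches the outcome.
accuracy.rate <- mean(predicted == actual)

# Confusion matrix: rows are actual outcomes, columns are predictions.
confusion <- table(Actual = actual, Predicted = predicted)

# Baseline: accuracy from always predicting the more common outcome.
baseline <- max(mean(actual == 1), mean(actual == 0))

print(accuracy.rate)
print(confusion)
print(baseline)
```

Here six of the eight cases are classified correctly, so the accuracy rate is 0.75 against a baseline of 0.5.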
> Titanic.Data <- read.csv("C:/Users/wwdic/Downloads/TitanicDataset.csv", header = TRUE, sep=",")
> View(Titanic.Data)
> dim(Titanic.Data)
[1] 1309 12
> names(Titanic.Data)
 [1] "PassengerId"             "Survived"                "TicketClass"             "Name"
 [5] "Sex"                     "Age"                     "SiblingsOrSpousesAboard" "ParentsOrChildrenAboard"
 [9] "TicketNumber"            "Fare"                    "CabinNumber"             "PortOfEmbarkation"
> str(Titanic.Data)
'data.frame': 1309 obs. of 12 variables:
$ PassengerId : int 1 2 3 4 5 6 7 8 9 10 ...
$ Survived : int 0 1 1 1 0 0 0 0 1 1 ...
$ TicketClass : int 3 1 3 1 3 3 1 3 3 2 ...
$ Name : chr "Braund, Mr. Owen Harris" "Cumings, Mrs. John Bradley (Florence Briggs Thayer)" "Heikkinen, Miss. Laina" "Futrelle, Mrs. Jacques Heath (Lily May Peel)" ...
$ Sex : chr "male" "female" "female" "female" ...
$ Age : num 22 38 26 35 35 ...
$ SiblingsOrSpousesAboard: int 1 1 0 1 0 0 0 3 0 1 ...
$ ParentsOrChildrenAboard: int 0 0 0 0 0 0 0 1 2 0 ...
$ TicketNumber : chr "A/5 21171" "PC 17599" "STON/O2. 3101282" "113803" ...
$ Fare : num 7.25 71.28 7.92 53.1 8.05 ...
$ CabinNumber : chr "" "C85" "" "C123" ...
$ PortOfEmbarkation : chr "S" "C" "S" "S" ...
> check.missing.value <- complete.cases(Titanic.Data)
> View(check.missing.value)
> which(check.missing.value==FALSE)
integer(0)
> attach(Titanic.Data)
> ftable(Sex)
Sex female male
466 843
> ftable(TicketClass)
TicketClass 1 2 3
323 277 709
> ftable(PortOfEmbarkation)
PortOfEmbarkation C Q S
270 123 916
> summary(SiblingsOrSpousesAboard)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.0000  0.0000  0.0000  0.4989  1.0000  8.0000
> summary(Age)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   0.17   22.00   28.00   29.47   36.62   80.00
> summary(ParentsOrChildrenAboard)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.000   0.000   0.000   0.385   0.000   9.000
> summary(Fare)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.000   7.896  14.454  33.297  31.275 512.329
> hist(Titanic.Data$Fare, breaks = 10,
+ include.lowest = TRUE, right = TRUE,
+ density = NULL, angle = 45, col = "lightblue", border = "red",
+ main = paste("Histogram of Fare"),
+ xlim = c(0,515), ylim = NULL,
+ xlab = "Fare", ylab ="Frequency", axes = TRUE, plot = TRUE, labels = FALSE)
> plot(density(Titanic.Data$Fare), type="l",
+ main="Smoothed Density of Fare", lwd=2,
+ xlab="Fare", xlim =c(0,80), ylab="Density Estimate",
+ col=hcl(h=195,l=65,c=100))
> boxplot(Titanic.Data$Fare, outchar=TRUE, main="Boxplot of Fare",
+ cex=0.7, xlab="Fare", col=hcl(h=195,l=65,c=100))
> qqnorm(Titanic.Data$Fare, col=hcl(h=195,l=65,c=100), cex=0.7)
> qqline(Titanic.Data$Fare)
> library(gmodels)
> CrossTable(Survived, Sex, digits=2, prop.r=F, prop.t=F, prop.chisq=F, chisq=T)
Cell Contents
|-------------------------|
| N |
| N / Col Total |
|-------------------------|
Total Observations in Table: 1309
             | Sex
    Survived |    female |      male | Row Total |
-------------|-----------|-----------|-----------|
           0 |        81 |       734 |       815 |
             |      0.17 |      0.87 |           |
-------------|-----------|-----------|-----------|
           1 |       385 |       109 |       494 |
             |      0.83 |      0.13 |           |
-------------|-----------|-----------|-----------|
Column Total |       466 |       843 |      1309 |
             |      0.36 |      0.64 |           |
-------------|-----------|-----------|-----------|
Statistics for All Table Factors
Pearson's Chi-squared test
------------------------------------------------------------
Chi^2 = 620.2757 d.f. = 1 p = 6.513245e-137
Pearson's Chi-squared test with Yates' continuity correction
------------------------------------------------------------
Chi^2 = 617.3134 d.f. = 1 p = 2.87141e-136
> CrossTable(Survived, TicketClass, digits=2, prop.r=F, prop.t=F, prop.chisq=F, chisq=T)
Cell Contents
|-------------------------|
| N |
| N / Col Total |
|-------------------------|
Total Observations in Table: 1309
             | TicketClass
    Survived |         1 |         2 |         3 | Row Total |
-------------|-----------|-----------|-----------|-----------|
           0 |       137 |       160 |       518 |       815 |
             |      0.42 |      0.58 |      0.73 |           |
-------------|-----------|-----------|-----------|-----------|
           1 |       186 |       117 |       191 |       494 |
             |      0.58 |      0.42 |      0.27 |           |
-------------|-----------|-----------|-----------|-----------|
Column Total |       323 |       277 |       709 |      1309 |
             |      0.25 |      0.21 |      0.54 |           |
-------------|-----------|-----------|-----------|-----------|
Statistics for All Table Factors
Pearson's Chi-squared test
------------------------------------------------------------
Chi^2 = 91.72368 d.f. = 2 p = 1.209085e-20
> CrossTable(Survived, PortOfEmbarkation, digits=2, prop.r=F, prop.t=F, prop.chisq=F, chisq=T)
Cell Contents
|-------------------------|
| N |
| N / Col Total |
|-------------------------|
Total Observations in Table: 1309
             | PortOfEmbarkation
    Survived |         C |         Q |         S | Row Total |
-------------|-----------|-----------|-----------|-----------|
           0 |       137 |        69 |       609 |       815 |
             |      0.51 |      0.56 |      0.66 |           |
-------------|-----------|-----------|-----------|-----------|
           1 |       133 |        54 |       307 |       494 |
             |      0.49 |      0.44 |      0.34 |           |
-------------|-----------|-----------|-----------|-----------|
Column Total |       270 |       123 |       916 |      1309 |
             |      0.21 |      0.09 |      0.70 |           |
-------------|-----------|-----------|-----------|-----------|
Statistics for All Table Factors
Pearson's Chi-squared test
------------------------------------------------------------
Chi^2 = 24.19378 d.f. = 2 p = 5.576842e-06
> model.fit.logit <- glm(Survived ~ factor(TicketClass) + factor(Sex) +
+ factor(PortOfEmbarkation) + Age +
+ SiblingsOrSpousesAboard + ParentsOrChildrenAboard + Fare,
+ family=binomial(link = "logit"), data = Titanic.Data)
> model.fit.logit
Call: glm(formula = Survived ~ factor(TicketClass) + factor(Sex) + factor(PortOfEmbarkation) + Age + SiblingsOrSpousesAboard + ParentsOrChildrenAboard + Fare, family = binomial(link = "logit"), data = Titanic.Data)
Coefficients:
               (Intercept)        factor(TicketClass)2        factor(TicketClass)3
                  4.687811                   -1.170275                   -2.233693
           factor(Sex)male  factor(PortOfEmbarkation)Q  factor(PortOfEmbarkation)S
                 -3.753269                    0.079425                   -0.177538
                       Age     SiblingsOrSpousesAboard     ParentsOrChildrenAboard
                 -0.044310                   -0.335308                   -0.090660
                      Fare
                  0.001882
Degrees of Freedom: 1308 Total (i.e. Null); 1299 Residual
Null Deviance: 1735
Residual Deviance: 948.1 AIC: 968.1
> model.fit.logit$coeff
               (Intercept)        factor(TicketClass)2        factor(TicketClass)3             factor(Sex)male
               4.687810562                -1.170274637                -2.233693355                -3.753268593
factor(PortOfEmbarkation)Q  factor(PortOfEmbarkation)S                         Age     SiblingsOrSpousesAboard
               0.079424990                -0.177538116                -0.044310024                -0.335308239
   ParentsOrChildrenAboard                        Fare
              -0.090660453                 0.001881518
> summary(model.fit.logit)
Call:
glm(formula = Survived ~ factor(TicketClass) + factor(Sex) + factor(PortOfEmbarkation) + Age + SiblingsOrSpousesAboard + ParentsOrChildrenAboard + Fare, family = binomial(link = "logit"), data = Titanic.Data)
Coefficients:
                            Estimate Std. Error z value Pr(>|z|)
(Intercept)                 4.687811   0.452535  10.359  < 2e-16 ***
factor(TicketClass)2       -1.170275   0.283297  -4.131 3.61e-05 ***
factor(TicketClass)3       -2.233693   0.287195  -7.778 7.39e-15 ***
factor(Sex)male            -3.753269   0.192205 -19.527  < 2e-16 ***
factor(PortOfEmbarkation)Q  0.079425   0.355198   0.224 0.823063
factor(PortOfEmbarkation)S -0.177538   0.221491  -0.802 0.422808
Age                        -0.044310   0.007485  -5.920 3.23e-09 ***
SiblingsOrSpousesAboard    -0.335308   0.096317  -3.481 0.000499 ***
ParentsOrChildrenAboard    -0.090660   0.099202  -0.914 0.360770
Fare                        0.001882   0.002060   0.913 0.361046
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1735.13 on 1308 degrees of freedom
Residual deviance: 948.06 on 1299 degrees of freedom
AIC: 968.06
Number of Fisher Scoring iterations: 5
> library(DescTools)
Registered S3 method overwritten by 'DescTools':
  method         from
  reorder.factor gdata
> PseudoR2(model.fit.logit)
McFadden
 0.45361
> library(ResourceSelection)
ResourceSelection 0.3-6 2023-06-27
> hoslem.test(Titanic.Data$Survived,fitted(model.fit.logit),g=8)
Hosmer and Lemeshow goodness of fit (GOF) test
data: Titanic.Data$Survived, fitted(model.fit.logit)
X-squared = 20.31, df = 6, p-value = 0.002439
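As a cross-check on the McFadden value reported above, the pseudo R-squared can be reproduced by hand from the two deviances printed by summary(model.fit.logit). For a 0/1 response the deviance equals -2 times the log-likelihood, so McFadden's measure reduces to one minus the ratio of residual to null deviance. The variable names below are illustrative.

```r
# Deviances copied from the summary(model.fit.logit) output.
null.deviance     <- 1735.13
residual.deviance <- 948.06

# McFadden's pseudo R-squared for a binary (0/1) response:
# 1 - logLik(model) / logLik(null) = 1 - residual deviance / null deviance.
mcfadden <- 1 - residual.deviance / null.deviance
print(round(mcfadden, 4))
```

The result agrees with the 0.45361 returned by PseudoR2() up to rounding of the printed deviances.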
> n.total <- nrow(Titanic.Data)
> n.total
[1] 1309
> training.ratio <- 0.80
> n.training <- round(n.total*training.ratio)
> n.training
[1] 1047
> n.test <- n.total - n.training
> n.test
[1] 262
> set.seed(1)
> training.index <- sample(1:n.total,n.training)
> training.subdata <- Titanic.Data[training.index,]
> View(training.subdata)
> test.subdata <- Titanic.Data[-training.index,]
> View(test.subdata)
> model.fitting.training.logit <- glm(Survived ~ factor(TicketClass) + factor(Sex) +
+ factor(PortOfEmbarkation) + Age +
+ SiblingsOrSpousesAboard + ParentsOrChildrenAboard + Fare,
+ family=binomial(link = "logit"), data = training.subdata)
> model.fitting.training.logit
Call: glm(formula = Survived ~ factor(TicketClass) + factor(Sex) + factor(PortOfEmbarkation) + Age + SiblingsOrSpousesAboard + ParentsOrChildrenAboard + Fare, family = binomial(link = "logit"), data = training.subdata)
Coefficients:
               (Intercept)        factor(TicketClass)2        factor(TicketClass)3
                   5.18488                    -1.30344                    -2.38470
           factor(Sex)male  factor(PortOfEmbarkation)Q  factor(PortOfEmbarkation)S
                  -3.90022                     0.04555                    -0.29356
                       Age     SiblingsOrSpousesAboard     ParentsOrChildrenAboard
                  -0.05377                    -0.38463                    -0.10321
                      Fare
                   0.00288
Degrees of Freedom: 1046 Total (i.e. Null); 1037 Residual
Null Deviance: 1383
Residual Deviance: 718.5 AIC: 738.5
> summary(model.fitting.training.logit)
Call:
glm(formula = Survived ~ factor(TicketClass) + factor(Sex) + factor(PortOfEmbarkation) + Age + SiblingsOrSpousesAboard + ParentsOrChildrenAboard + Fare, family = binomial(link = "logit"), data = training.subdata)
Coefficients:
                            Estimate Std. Error z value Pr(>|z|)
(Intercept)                 5.184880   0.532989   9.728  < 2e-16 ***
factor(TicketClass)2       -1.303438   0.327241  -3.983 6.80e-05 ***
factor(TicketClass)3       -2.384697   0.332402  -7.174 7.28e-13 ***
factor(Sex)male            -3.900222   0.224313 -17.387  < 2e-16 ***
factor(PortOfEmbarkation)Q  0.045554   0.411549   0.111 0.911863
factor(PortOfEmbarkation)S -0.293557   0.257417  -1.140 0.254121
Age                        -0.053771   0.008535  -6.300 2.98e-10 ***
SiblingsOrSpousesAboard    -0.384634   0.115482  -3.331 0.000866 ***
ParentsOrChildrenAboard    -0.103208   0.111490  -0.926 0.354594
Fare                        0.002880   0.002390   1.205 0.228286
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1382.60 on 1046 degrees of freedom
Residual deviance: 718.49 on 1037 degrees of freedom
AIC: 738.49
Number of Fisher Scoring iterations: 5
> predict.survived.score <- predict(model.fitting.training.logit, newdata = test.subdata, type = 'response')
> View(predict.survived.score)
> predict.survived.score <- ifelse(predict.survived.score > 0.5,1,0)
> View(predict.survived.score)
> accuracy.rate <- mean(predict.survived.score == test.subdata$Survived)
> accuracy.rate
[1] 0.8282443