School: University of Washington
Course: 484
Subject: Accounting
Date: May 31, 2024
HM1
Yuchen Zou
2024-04-09
##Q1 During modeling, convert the variable “default” to numeric, i.e., between 0 and 1. Using the linear
probability model (“lm” in R) with the adjustment for predicted probabilities less than 0 set to some very small
number (e.g., 1e-5) and probabilities greater than 1 set just below 1. Predict the binary variable “pred_default”
as Yes/No using the modified fitted values from the previous step. Using the actual outcomes in the “Default”
dataset, compute the confusion matrix. Hint: table(pred_default, Default$default)
library(ISLR2)
## Warning: package 'ISLR2' was built under R version 4.2.3
View(Default)
attach(Default)
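# as.numeric() on the No/Yes factor gives 1/2, which is what labels the table columns
Default$default<- as.numeric(Default$default)
# Subtract 1 inside the formula so the linear probability model sees a 0/1 response
fit<- lm(I(default - 1) ~ student + balance + income, data = Default)
predicted_probs<- predict(fit, type = "response")
# Adjust fitted values that fall outside (0, 1)
predicted_probs[predicted_probs<0] <- 1e-5
predicted_probs[predicted_probs>1] <- 1-(1e-5)
preb_default<- ifelse(predicted_probs > 0.5, "Yes","No")
confusion_matrix<- table(preb_default,Default$default)
confusion_matrix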
##
## preb_default    1    2
##           No 9667  333
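The adjustment step described in Q1 can also be written with `pmax()` and `pmin()`. The sketch below is only an illustration on simulated data: `x`, `y` and the helper `clip_probs` are hypothetical names, not part of the assignment, and the response is assumed to be coded 0/1.

```r
# Illustrative sketch on simulated data; all names here are hypothetical.
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, size = 1, prob = plogis(-1 + 2 * x))  # 0/1 response

# Linear probability model: fitted values can fall outside [0, 1]
lpm <- lm(y ~ x)
raw_probs <- fitted(lpm)

# Clamp fitted values into [eps, 1 - eps]
clip_probs <- function(p, eps = 1e-5) pmin(pmax(p, eps), 1 - eps)
adj_probs <- clip_probs(raw_probs)

# Classify at 0.5 and tabulate against the actual outcomes
pred <- ifelse(adj_probs > 0.5, "Yes", "No")
table(pred, actual = y)
```

Clamping in one helper keeps the two boundary adjustments together, which makes it harder to mix up the direction of the comparisons.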
##Q2 Now instead of “lm,” run a weighted least squares model using “gls.” You’ll need to run library(nlme) in
order to do gls. The weights are specified as a linear function of the modified fitted values from the previous
step. Again, adjust the predicted probabilities from the gls model to reasonable values, predict the binary
response as Yes/No, and compute the confusion matrix.
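library(nlme)
# varFixed() needs its variance covariate as a column of the data, so attach the
# adjusted LPM probabilities to a copy of Default (Default itself stays unchanged)
wls_data<- cbind(Default, lpm_prob = predicted_probs)
# Variance proportional to lpm_prob, i.e. each observation is weighted by 1/lpm_prob
gls_fit<- gls(I(default - 1) ~ student + balance + income,
              weights = varFixed(~ lpm_prob), data = wls_data)
predict_probs_gls<- fitted(gls_fit)
# Adjust fitted values that fall outside (0, 1)
predict_probs_gls[predict_probs_gls<0] <- 1e-5
predict_probs_gls[predict_probs_gls>1] <- 1-(1e-5)
preb_default_gls<- ifelse(predict_probs_gls > 0.5, "Yes","No")
confusion_matrix_2<- table(preb_default_gls,Default$default)
confusion_matrix_2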
##
## preb_default_gls    1    2
##               No 9667  333
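For a 0/1 response, the textbook variance is p(1 - p), which suggests weights of 1 / (p(1 - p)) in the second-stage fit. The sketch below illustrates that idea on simulated data using base R's `lm(weights = )` rather than `gls`. The data, the variable names and the weighting choice are all assumptions made for illustration, not necessarily what the assignment intends.

```r
# Illustrative two-step weighted least squares on simulated data.
set.seed(2)
x <- runif(500)
y <- rbinom(500, size = 1, prob = 0.1 + 0.6 * x)  # 0/1 response

# Step 1: OLS fit, then clamp the fitted values away from 0 and 1
ols <- lm(y ~ x)
p_hat <- pmin(pmax(fitted(ols), 1e-5), 1 - 1e-5)

# Step 2: refit with weights inversely proportional to the Bernoulli variance
w <- 1 / (p_hat * (1 - p_hat))
wls <- lm(y ~ x, weights = w)

# Compare the OLS and WLS coefficients side by side
rbind(ols = coef(ols), wls = coef(wls))
```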
##Q3 Randomly partition the data into training, validation and test sets (proportions = 80% + 10% + 10%). Set
a seed of your choice for reproducibility. Using the naive Bayes classifier (“naiveBayes” in R library “e1071”) on
the training set, predict on the training, validation and test sets and compute the 3 confusion matrices. The
levels in the confusion matrices should be Yes/No. Note that the response variable should be of class “factor.”
Hint for naiveBayes: ISLR textbook, section 4.7.5.
library(e1071)
## Warning: package 'e1071' was built under R version 4.2.3
set.seed(123)
n<- nrow(Default)
train_index<- sample(seq_len(n),size = 0.8*n)
remianing_index<- setdiff(seq_len(n),train_index)
vaild_index<- sample(remianing_index, 0.1*n)
test_index<- setdiff(remianing_index, vaild_index)
train_data<- Default[train_index, ]
vaild_data<- Default[vaild_index, ]
test_data<- Default[test_index, ]
train_data$default<- as.factor(train_data$default)
vaild_data$default<- as.factor(vaild_data$default)
test_data$default<- as.factor(test_data$default)
fit_nb<- naiveBayes(default ~ ., data = train_data)
train_pred<- predict(fit_nb, train_data)
vaild_pred<- predict(fit_nb, vaild_data)
test_pred<- predict(fit_nb, test_data)
confusion_matrix_train<- table(train_pred , train_data$default)
confusion_matrix_vaild<- table(vaild_pred , vaild_data$default)
confusion_matrix_test<- table(test_pred , test_data$default)
confusion_matrix_train
##
## train_pred    1    2
##          1 7692  190
##          2   41   77
confusion_matrix_vaild
##
## vaild_pred   1   2
##          1 962  23
##          2   5  10
confusion_matrix_test
##
## test_pred   1   2
##         1 960  25
##         2   7   8
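An 80/10/10 partition can also be built by shuffling the row numbers once and slicing the result, which makes it easy to check that the three sets are disjoint. This is a minimal sketch with a made-up row count `n`; the index names are hypothetical.

```r
set.seed(123)
n <- 1000                      # hypothetical number of rows
idx <- sample(seq_len(n))      # random permutation of the row numbers
n_train <- floor(0.8 * n)
n_valid <- floor(0.1 * n)

# Slice the permutation into three non-overlapping pieces
train_idx <- idx[seq_len(n_train)]
valid_idx <- idx[n_train + seq_len(n_valid)]
test_idx  <- idx[(n_train + n_valid + 1):n]

# Sizes of the three sets
c(train = length(train_idx), valid = length(valid_idx), test = length(test_idx))

# Every row should appear in exactly one set
length(unique(c(train_idx, valid_idx, test_idx))) == n
```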
##Q4 Using the same training, validation and test sets, run a logistic regression (“glm” in R for binary response)
on the training data. Note that the response variable should be of class “factor.” Again, predict on the training,
validation and test sets and compute the 3 confusion matrices. The levels in the confusion matrices should be
Yes/No.
logistic_model<- glm(default ~ ., data = train_data, family = binomial)
train_pred_glm<- ifelse(predict(logistic_model, type = "response")> 0.5, "Yes", "No")
vaild_pred_glm<- ifelse(predict(logistic_model, newdata = vaild_data, type = "response")> 0.5, "Yes", "No")
test_pred_glm<- ifelse(predict(logistic_model, newdata = test_data, type = "response")> 0.5, "Yes", "No")
confusion_matrix_train_glm<- table(train_pred_glm , train_data$default)
confusion_matrix_vaild_glm<- table(vaild_pred_glm , vaild_data$default)
confusion_matrix_test_glm<- table(test_pred_glm , test_data$default)
confusion_matrix_train_glm
##
## train_pred_glm    1    2
##            No  7698  180
##            Yes   35   87
confusion_matrix_vaild_glm
##
## vaild_pred_glm   1   2
##            No  964  23
##            Yes   3  10
confusion_matrix_test_glm
##
## test_pred_glm   1   2
##           No  964  23
##           Yes   3  10
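When the response is a factor with levels No/Yes, `glm` with `family = binomial` models the probability of the second level, and wrapping the predictions in `factor()` with both levels keeps the confusion matrix 2 x 2 even if one class is never predicted. The sketch below shows this on simulated data; the data frame `sim` and its columns are hypothetical.

```r
set.seed(3)
# Simulated data with a No/Yes factor response (all names are hypothetical)
balance <- runif(400, 0, 2500)
p <- plogis(-6 + 0.004 * balance)
default <- factor(ifelse(runif(400) < p, "Yes", "No"), levels = c("No", "Yes"))
sim <- data.frame(default, balance)

# glm models the probability of the second factor level ("Yes")
fit <- glm(default ~ balance, data = sim, family = binomial)
prob_yes <- predict(fit, type = "response")

# Predictions as a factor with both levels, so the table is always 2 x 2
pred <- factor(ifelse(prob_yes > 0.5, "Yes", "No"), levels = c("No", "Yes"))
table(pred, actual = sim$default)
```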
##Q5 The linear probability model is only a rough baseline: its fitted values are not constrained to lie
between 0 and 1, and here it predicts "No" for every observation, so it misses all 333 defaulters. The WLS
model reweights observations according to their estimated variance, but it produces the same all-"No"
confusion matrix. The naive Bayes model assumes the predictors are independent within each class, while
logistic regression models the log-odds of default as a linear function of the predictors and can handle both
categorical and continuous predictors. Overall, I believe the logistic regression model performs best: it
classifies 974 of the 1,000 test observations correctly versus 968 for naive Bayes, and it is also ahead on the
training set (7,785 vs. 7,769 of 8,000) and the validation set (974 vs. 972 of 1,000).
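The comparison in Q5 can be made concrete by computing accuracy and sensitivity from the test-set confusion matrices reported in Q3 and Q4. The counts below are copied from that output; the helper functions `accuracy` and `sensitivity` are illustrative names, and column "2" is taken to be the default class because `as.numeric()` maps the factor levels No/Yes to 1/2.

```r
# Test-set confusion matrices copied from the Q3 and Q4 output
# (rows = predicted, columns = actual; "2" is the default class)
nb_test  <- matrix(c(960, 7, 25, 8), nrow = 2,
                   dimnames = list(pred = c("1", "2"), actual = c("1", "2")))
glm_test <- matrix(c(964, 3, 23, 10), nrow = 2,
                   dimnames = list(pred = c("No", "Yes"), actual = c("1", "2")))

# Accuracy: share of observations on the diagonal
accuracy <- function(cm) sum(diag(cm)) / sum(cm)
# Sensitivity: share of actual defaulters (second column) predicted as defaulters
sensitivity <- function(cm) cm[2, 2] / sum(cm[, 2])

c(nb = accuracy(nb_test), glm = accuracy(glm_test))
c(nb = sensitivity(nb_test), glm = sensitivity(glm_test))
```

Both models are right on most observations mainly because defaults are rare, so sensitivity is the more informative of the two numbers here.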
##Q6
options(repos = "https://cran.rstudio.com/")
cmc<- read.csv("E:/econ484/cmc.data")
View(cmc)
names(cmc)[names(cmc) == 'X24']<- 'wife_age'
names(cmc)[names(cmc) == 'X2']<- 'wife_edu'
names(cmc)[names(cmc) == 'X3']<- 'husband_edu'
names(cmc)[names(cmc) == 'X3.1']<- 'num_children'
names(cmc)[names(cmc) == 'X1']<- 'wife_religion'
names(cmc)[names(cmc) == 'X1.1']<- 'wife_working'
names(cmc)[names(cmc) == 'X2.1']<- 'husband_occupation'
names(cmc)[names(cmc) == 'X3.2']<- 'standard_of_living_index'
names(cmc)[names(cmc) == 'X0']<- 'media_exposure'
names(cmc)[names(cmc) == 'X1.2']<- 'contraceptice_method'
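Column names such as `X24` and `X3.1` are what `read.csv` produces when a file has no header line and its first data row is used as the header instead, which also drops that row from the data. A possible alternative, sketched here on two inline sample rows rather than the real file, is to pass `header = FALSE` together with `col.names`. The first row is reconstructed from the `X...` names above, the second is made up for illustration, and `cmc_demo` and `cmc_cols` are hypothetical names.

```r
# Two illustrative rows in the same 10-column layout (no header line)
raw <- "24,2,3,3,1,1,2,3,0,1
45,1,3,10,1,1,3,4,0,1"

cmc_cols <- c("wife_age", "wife_edu", "husband_edu", "num_children",
              "wife_religion", "wife_working", "husband_occupation",
              "standard_of_living_index", "media_exposure",
              "contraceptice_method")

# header = FALSE keeps the first row as data; col.names sets the names directly
cmc_demo <- read.csv(text = raw, header = FALSE, col.names = cmc_cols)
str(cmc_demo)
```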
install.packages("mlogit")
## Installing package into 'C:/Users/tamel/AppData/Local/R/win-library/4.2'
## (as 'lib' is unspecified)
## package 'mlogit' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
##  C:\Users\tamel\AppData\Local\Temp\Rtmpa0Pwoz\downloaded_packages
install.packages("zoo")
## Installing package into 'C:/Users/tamel/AppData/Local/R/win-library/4.2'
## (as 'lib' is unspecified)
## package 'zoo' successfully unpacked and MD5 sums checked
## Warning: cannot remove prior installation of package 'zoo'
## Warning in file.copy(savedcopy, lib, recursive = TRUE): problem copying
## C:\Users\tamel\AppData\Local\R\win-library\4.2\00LOCK\zoo\libs\x64\zoo.dll to
## C:\Users\tamel\AppData\Local\R\win-library\4.2\zoo\libs\x64\zoo.dll: Permission
## denied
## Warning: restored 'zoo'
##
## The downloaded binary packages are in
##  C:\Users\tamel\AppData\Local\Temp\Rtmpa0Pwoz\downloaded_packages
library(zoo)
## Warning: package 'zoo' was built under R version 4.2.3
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
##     as.Date, as.Date.numeric
library(mlogit)
## Warning: package 'mlogit' was built under R version 4.2.3
## Loading required package: dfidx
## Warning: package 'dfidx' was built under R version 4.2.3
##
## Attaching package: 'dfidx'
## The following object is masked from 'package:stats':
##
##     filter
long_data<- mlogit.data(cmc, choice = "contraceptice_method" ,shape = "wide")
View(long_data)
mlogit_model <- mlogit(contraceptice_method ~ 1 | wife_age + wife_edu + husband_edu + num_children +
    wife_religion + wife_working + husband_occupation + standard_of_living_index + media_exposure,
    data = long_data)
summary(mlogit_model)
##
## Call:
## mlogit(formula = contraceptice_method ~ 1 | wife_age + wife_edu +
##     husband_edu + num_children + wife_religion + wife_working +
##     husband_occupation + standard_of_living_index + media_exposure,
##     data = long_data, method = "nr")
##
## Frequencies of alternatives:choice
##       1       2       3
## 0.42663 0.22622 0.34715
##
## nr method
## 5 iterations, 0h:0m:0s
## g'(-H)^-1g = 0.000362
## successive function values within tolerance limits
##
## Coefficients :
##                             Estimate Std. Error z-value  Pr(>|z|)
## (Intercept):2              -3.229492   0.759486 -4.2522 2.117e-05 ***
## (Intercept):3              -0.099123   0.627502 -0.1580 0.8744843
## wife_age:2                 -0.045815   0.012060 -3.7988 0.0001454 ***
## wife_age:3                 -0.105979   0.011256 -9.4150 < 2.2e-16 ***
## wife_edu:2                  0.881807   0.114534  7.6991 1.377e-14 ***
## wife_edu:3                  0.366726   0.087486  4.1918 2.767e-05 ***
## husband_edu:2              -0.084184   0.133938 -0.6285 0.5296593
## husband_edu:3               0.055833   0.100691  0.5545 0.5792400
## num_children:2              0.344962   0.042878  8.0453 8.882e-16 ***
## num_children:3              0.351056   0.038624  9.0891 < 2.2e-16 ***
## wife_religion:2            -0.478887   0.200415 -2.3895 0.0168722 *
## wife_religion:3            -0.325347   0.198101 -1.6423 0.1005213
## wife_working:2              0.032842   0.168364  0.1951 0.8453437
## wife_working:3              0.168330   0.150681  1.1171 0.2639382
## husband_occupation:2       -0.081099   0.097250 -0.8339 0.4043236
## husband_occupation:3        0.179077   0.084399  2.1218 0.0338566 *
## standard_of_living_index:2  0.342624   0.096783  3.5401 0.0003999 ***
## standard_of_living_index:3  0.228939   0.072548  3.1557 0.0016013 **
## media_exposure:2           -0.445152   0.388015 -1.1473 0.2512759
## media_exposure:3           -0.479717   0.272739 -1.7589 0.0785969 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log-Likelihood: -1390.4
## McFadden R^2:  0.11469
## Likelihood ratio test : chisq = 360.25 (p.value = < 2.22e-16)
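Question: Which is(are) the variable(s) that affect the current contraceptive method choice significantly?
From the summary, the wife's age, the wife's education, the number of children and the standard-of-living
index are significant at the 1% level for both alternatives 2 and 3 (relative to the baseline, alternative 1).
The wife's religion is significant at the 5% level only for alternative 2, and the husband's occupation only for
alternative 3. The husband's education, the wife's working status and media exposure are not significant at the
5% level. Among the significant predictors, the wife's education has the largest estimated coefficient (0.88 for
alternative 2), so it appears to affect the choice of contraceptive method the most.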
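Multinomial logit coefficients are on the log-odds scale relative to the baseline alternative, so exponentiating them gives the multiplicative change in the odds for a one-unit increase in the predictor. The sketch below does this for the two `wife_edu` coefficients copied from the summary, and then shows how linear predictors map to choice probabilities. The helper `choice_probs` and the linear-predictor values passed to it are hypothetical, for illustration only.

```r
# Coefficients for wife_edu copied from the summary output
b_wife_edu <- c(alt2 = 0.881807, alt3 = 0.366726)

# Multiplicative change in the odds of each alternative versus the
# baseline (alternative 1) for a one-unit increase in wife_edu
exp(b_wife_edu)

# Turning linear predictors into choice probabilities
# (the baseline alternative has its linear predictor fixed at 0)
choice_probs <- function(eta) {
  u <- c(alt1 = 0, eta)
  exp(u) / sum(exp(u))
}

# Hypothetical linear predictors, for illustration only
choice_probs(c(alt2 = -0.5, alt3 = 0.2))
```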