---
title: "Lab 4"
author: "Kimberly Boswell"
date: "2024-01-31"
output: pdf_document
---
## Special Symbols

When you copy and paste certain symbols like `-` and `^`, they may throw an error, so make sure to type out your code. Secondly, characters like \#, \%, and \$ may also throw errors when knitted, so be sure to put a backslash in front of them. These symbols are used for other purposes in Markdown, so the backslash instructs R to print them as is.

### Mathematical Notation in Markdown
An advantage of Markdown is that it combines a plain-text editor, the R console, and a LaTeX compiler. If you'd like to include mathematical equations in Markdown, you can use several constructs:

(i) a pair of single dollar signs (inline)
(ii) a pair of double dollar signs (displayed on its own line)
(iii) the align environment

Say you want to write $x_i$ in subscript form; inline it would look like $x_i$ or $\alpha$. Or, for the sum of the $x_i$'s on its own line:
$$ \sum\limits_{i=1}^n x_i $$
As you type, you should see a preview, which should let you know whether your formatting is correct. If there were a series of equations, we could use the align environment. Adding an asterisk (`align*`) removes the numbering. Say you wanted to label only one line in a display; you could put the `\notag` command on the others. We use the ampersand for alignment:
\begin{align}
y &= mx+c \notag \\
& = b_2 x + b_1
\end{align}
If you wanted to write a hypothesis test for whether the marginal effect of the explanatory variable of a standard linear model is 0 or not, how could you write this out?
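For example, using the simple linear model notation $y = \beta_1 + \beta_2 x + e$ (one common choice of notation; yours may differ), the test could be written as:
\begin{align*}
H_0 &: \beta_2 = 0 \\
H_1 &: \beta_2 \neq 0
\end{align*}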
## Continuation of Food Dataset Processing
Yet again, we will start by using the food dataset to illustrate, and then segue into a novel dataset for further practice.

```{r message = F}
library(POE5Rdata)
library(margins)
library(tseries)
data(food)
reg <- lm(food_exp ~ income, food)
```
## Marginal Effect
For this, we can use the margins package (if you don't have it installed, that must be done first using `install.packages("margins")`). Thereafter, you can use `library(margins)` to load it. The `margins()` function gives us the average marginal effect. Try it on `reg` and see:

```{r}
margins(reg)
```
Now, if we wanted to find the confidence interval for the marginal effect:

```{r}
margins_summary(reg)
```
If we store the margins output as an object, we're able to plot it.

```{r}
m <- margins(reg)
plot(m)
```
We can also find the marginal effect at a particular point. This is especially helpful for nonlinear regression models, such as a quadratic specification. Here, we find the marginal effect of income on food expenditure where income is \$2,000 (i.e., `income = 20`):

```{r}
margins(reg, at = list(income= 20))
margins_summary(reg, at = list(income= 20), level = 0.9)
```
## Finding p-values

Recall that our test statistic is:
$$ t_{\text{stat}} = \dfrac{\text{Estimate} - \text{Parameter}}{\text{s.e.}(\text{Estimate})} $$
If we use the critical value method, we would use the `qt` function in conjunction with $\alpha$ to find our rejection region. If instead we were to use the p-value method, we would need to find the probability of obtaining that test statistic or one more extreme. If our test statistic were 2.37 at 38 degrees of freedom, what is the p-value for a:

(i) right-tailed test
(ii) two-tailed test
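A sketch of how one might compute these with R's `pt()` and `qt()` functions, using the 2.37 and 38 from the question above:

```{r}
t_stat <- 2.37
df <- 38

# (i) right-tailed test: P(T >= t_stat)
pt(t_stat, df, lower.tail = FALSE)

# (ii) two-tailed test: 2 * P(T >= |t_stat|)
2 * pt(abs(t_stat), df, lower.tail = FALSE)

# for comparison, the critical value for a right-tailed test at alpha = 0.05
qt(0.95, df)
```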
## Prediction vs. Confidence Intervals for a Linear Combination of Parameters

The predicted value $\hat{y}$ is a random variable, since it depends on the sample; therefore, we can calculate a confidence interval and test hypotheses about it, provided we can determine its distribution and variance. Note that there are two types of intervals.

```{r predict}
predict(reg, data.frame(income=20),interval="confidence",level=0.95)
predict(reg, data.frame(income=20),interval="prediction",level=0.95)
```
What do you notice about these two outputs, both the estimate and the bounds?
```{r}
# grid of income values spanning the observed range
income <- seq(from = min(food$income), to = max(food$income))

# confidence interval for the mean response, and prediction (forecast) interval
ypredict  <- predict(reg, newdata = data.frame(income), interval = "confidence")
yforecast <- predict(reg, newdata = data.frame(income), interval = "prediction")

# plot the fitted line with both sets of interval bounds
matplot(income,
        cbind(ypredict[, 1], ypredict[, 2], ypredict[, 3],
              yforecast[, 2], yforecast[, 3]),
        type = "l", lty = c(1, 2, 2, 3, 3),
        col = c("black", "red", "red", "blue", "blue"),
        ylab = "food expenditure", xlab = "income")
points(food$income, food$food_exp)
legend("topleft", cex = 0.7,
       legend = c("E[y|x]", "lwr_pred", "upr_pred",
                  "lwr_forcst", "upr_forcst"),
       lty = c(1, 2, 2, 3, 3),
       col = c("black", "red", "red", "blue", "blue"))
```
## Goodness of Fit
The element of the `summary` output containing the $R^2$ is `r.squared`, and we can extract it with the `$` selection operator, i.e.
```{r rsq}
summary(reg)$r.squared
```
Remember, you are able to check `attributes()` to see what elements are at your disposal.
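For example, to see the components of the summary object:

```{r}
attributes(summary(reg))
```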
## Important Output from R for the exam

You will need the output from the following commands for your midterm: `summary`, `anova`, `vcov`.

```{r refresher}
summary(reg)
anova(reg)
vcov(reg)
```
## Checking for Normality of Residuals and Other Diagnostics

We can either use diagrams or supplement them with a statistical test.

```{r resid}
hist(resid(reg), probability = T)
lines(density(resid(reg)), col = "blue")
plot(reg)
jarque.bera.test(resid(reg))
```
## Influential Observations

### AUX.

You may be interested in exploring outliers, high-leverage points, and/or influential observations worth removing.

- Outlier: has a large residual
- High Leverage: the x value is far from its mean
- Influential obs: causes large changes to the coefficients of the model
Why are we doing this? To spot influential or anomalous observations.

What methods do we use for this?

+ Observation and reasoning: the value may simply be illogical, e.g. an age of 0 or greater than 100.
+ Inspecting the boxplot: if there is a cluster of values outside the boxplot whiskers, you may not want to eliminate them right away. Check the Cook's distance first, and if the outlier also has a high Cook's distance (see the criteria below), then we consider eliminating it. **Make sure to mention in your final report or analysis that you removed any outliers.**
```{r eval=FALSE}
# Identifying outliers relative to the boxplot fences
# (`data` and `variable` are placeholders for your data frame and column)
quartiles <- quantile(data$variable, probs = c(.25, .75), na.rm = FALSE)
iqr <- IQR(data$variable)
UQ <- quartiles[2] + 1.5 * iqr   # upper fence
LQ <- quartiles[1] - 1.5 * iqr   # lower fence
outliers_removed <- subset(data, data$variable > LQ & data$variable < UQ)

# OR we could also use the feature within boxplot() to do this
outliers <- boxplot(data$variable, plot = FALSE)$out
outliers_removed <- data[-which(data$variable %in% outliers), ]
```
+ Cook's distance: takes into account both leverage and residual.

Criteria for elimination:

a. being greater than three times the average Cook's distance, or
b. being greater than $k/N$ (where $k$ = number of predictors and $N$ = number of observations), or
c. being greater than $4/N$, or
d. being higher than 1.

```{r eval = FALSE}
# How to find Cook's distance
library(ggplot2)
data(diamonds)
model <- lm(y ~ x, data = diamonds)

# find Cook's distance for each observation in the dataset
cooksD <- cooks.distance(model)
plot(model, which = 4)
# change to 1 to show the residual plot, for part(6)

# How to identify influential points using different criteria
n <- nobs(model)                  # number of observations
p <- length(coef(model)) - 1      # number of predictors
influential_obs  <- as.numeric(names(cooksD)[cooksD > (2 * (p + 1) / n)])
influential_obs1 <- as.numeric(names(cooksD)[cooksD > (3 * mean(cooksD))])
influential_obs2 <- as.numeric(names(cooksD)[cooksD > 1])

# New data frame with influential observations removed
# (assumes influential_obs is non-empty)
outliers_removed <- diamonds[-influential_obs, ]
```
You might want to re-estimate your model in (2) after removing these observations to see the effect. If you have a lot of influential observations, this could indicate a problem with your regression model, so hold off on eliminating these points. For any observations that are eliminated, be sure to include your explanation for removal. Alternatively,

```{r}
hats <- as.data.frame(hatvalues(reg))
plot(hatvalues(reg), type = 'h')
```
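As a rough guide (a common rule of thumb, not stated in this lab), observations whose hat value exceeds about twice the average leverage are often flagged for a closer look; a minimal sketch:

```{r}
# flag observations with leverage above twice the average hat value
# (rule-of-thumb threshold; adjust as appropriate for your analysis)
high_leverage <- which(hatvalues(reg) > 2 * mean(hatvalues(reg)))
high_leverage
```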
## Importing Files from other sources