Lab 2 (Spring 2023)
Prof Cat.
2023-02-10
Problem 1
Download the “interruption” data from Canvas, and import this data into R. This dataset has
two variables that describe the number of interruptions a person counted before we defined
what an interruption was (int1) and after we came up with a shared definition (int2). Report
the mean, range, and standard deviation for both variables.
Then, use R to calculate the
standard deviation of each variable “by hand” (watch videos from Lecture 3 readings to see
how to do this). Finally, explain how the mean, range, and standard deviation changed after
we operationalized what an interruption is (e.g., compare these statistics for int1 and int2).
First I’ll load and check the dataset.
int <- read.csv("~/Dropbox/Teaching Datasets/interruptions_SP23.csv") # loading data
head(int) # checking to make sure the data loaded correctly.
##   int1 int2
## 1   25   25
## 2   30   50
## 3   29   12
## 4   29   12
## 5   24    9
## 6   25   17
Next, graphing the variables.
par(mfrow = c(1, 2)) # using the par function to split the graphics window
hist(int$int1)
hist(int$int2)
[Figure: two side-by-side histograms. Left: "Histogram of int$int1" (x-axis: int$int1, y-axis: Frequency). Right: "Histogram of int$int2" (x-axis: int$int2, y-axis: Frequency).]
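One thing to keep in mind with par(mfrow = ...) is that the setting sticks around for the rest of the R session until you change it back. Here is a minimal sketch of saving and restoring the old settings. The vectors x1 and x2 are made-up numbers for illustration, not the interruption data.

```r
# Made-up values for illustration only (not the lab data)
x1 <- c(25, 30, 29, 29, 24, 25, 40, 12)
x2 <- c(25, 50, 12, 12, 9, 17, 15, 14)

old_par <- par(mfrow = c(1, 2))  # split the window; par() returns the previous settings
hist(x1, main = "x1")
hist(x2, main = "x2")
par(old_par)                     # restore the previous layout
```

After par(old_par), new plots go back to filling the whole graphics window.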
library(psych) # the describe function comes from this package
describe(int) # using the describe function; from the 'psych' package. you can use other methods to do t
##      vars   n  mean    sd median trimmed  mad min max range skew kurtosis   se
## int1    1 197 21.94 10.52     20   21.25 7.41   1  57    56 0.77     0.99 0.75
## int2    2 197 15.02  6.06     15   14.64 4.45   1  50    49 1.60     7.20 0.43
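If you don’t want to use the psych package, the same three statistics can be pulled out with base R. This is just a sketch: the data frame toy and the helper summarize_col are hypothetical names, and the numbers are made up rather than taken from the interruption file.

```r
# Hypothetical toy data frame standing in for the interruption data
toy <- data.frame(int1 = c(25, 30, 29, NA, 24, 25),
                  int2 = c(25, 50, 12, 12, NA, 17))

# Mean, SD, and range (max minus min) for one column, ignoring missing values
summarize_col <- function(x) {
  c(mean  = mean(x, na.rm = TRUE),
    sd    = sd(x, na.rm = TRUE),
    range = diff(range(x, na.rm = TRUE)))
}

stats <- sapply(toy, summarize_col)  # one column of results per variable
round(stats, 2)
```

Note that range() on its own returns the minimum and maximum; wrapping it in diff() gives the single "range" number that describe reports.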
Okay, now I’ll calculate the SD by hand:
resI1 <- int$int1 - mean(int$int1, na.rm = TRUE) # residuals for int1
resI2 <- int$int2 - mean(int$int2, na.rm = TRUE) # residuals for int2
SSE1 <- sum(resI1^2, na.rm = TRUE) # SSE for int1; na.rm = TRUE is needed because sum() returns NA if any residual is missing
SSE2 <- sum(resI2^2, na.rm = TRUE) # SSE for int2
nI1 <- length(na.omit(int$int1)) # sample size for int1, omitting missing data
nI2 <- length(na.omit(int$int2)) # sample size for int2, omitting missing data
sqrt(SSE1/(nI1 - 1)) # sd for int1; should match the sd that describe() reported (10.52)
sqrt(SSE2/(nI2 - 1)) # sd for int2; should match the sd that describe() reported (6.06)
Note that student estimates might be off if they did not a) remove missing data from the sample size, or b) divide by n-1 (rather than n) in the equation. This is fine (no penalty)! The key idea is that the SD is the square root of the average squared residual (the SSE divided by n-1).
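To see how the missing-data handling and the n-1 fit together, here is a small sketch with a made-up vector (x is hypothetical, not the lab data). It also shows what happens if you forget na.rm inside sum().

```r
# Made-up scores with one missing value (not the lab data)
x <- c(4, 8, NA, 6, 2)

res <- x - mean(x, na.rm = TRUE)   # residuals; the NA stays NA
sse <- sum(res^2, na.rm = TRUE)    # sum of squared residuals, skipping the NA
n   <- sum(!is.na(x))              # sample size counts only observed values

sd_by_hand <- sqrt(sse / (n - 1))  # divide by n - 1, then take the square root
sd_by_hand
sd(x, na.rm = TRUE)                # built-in function, for comparison

sqrt(sum(res^2) / (n - 1))         # forgetting na.rm in sum() gives NA
```

The by-hand value and sd(x, na.rm = TRUE) agree because sd() uses the same n-1 formula.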
So, the mean (21.94 to 15.02), range (56 to 49), and standard deviation (10.52 to 6.06) of the number of interruptions all went down after we operationalized an interruption. This makes sense because our definition a) clarified what an interruption was (ensuring that people’s answers would be more similar to each other = less variation = lower SD and a narrower range) and b) focused on counting the number of times one individual was interrupted (whereas before the operationalization, some students were counting interruptions from both people), which lowers the counts and therefore the mean.
Note: I didn’t remove outliers here; student answers might be slightly different if they did.
Problem 2
Download the “cal_mini_data_SP23.csv” dataset from bCourses. Load the data into R, check
to make sure it loaded correctly, and report the sample size.
mini <- read.csv("~/Dropbox/Teaching Datasets/cal_mini_data_SP23.csv", stringsAsFactors = T)
head(mini)
##   fb.friends insta.followers insta.follow bored thirsty tired satisfied
## 1          3            1477         1815     4       7     7         7
## 2          0             945         1001     3       8     9         8
## 3        802             561          571     5       6    10         9
## 4         69             134          149     4       7     7         7
## 5          0             285          882     3       3     5         6
## 6          0            1200          667     7       4    10         8
##   oski.love r.love socmeduse data.pow corp.pow hard.work privilege catdogpref
## 1         3      6         7        2        8         8         6       cats
## 2         5      3         8        8        9         8         7       dogs
## 3         4      7         8        5        8         5         6       cats
## 4         4      7         5        3        8         8         8       dogs
## 5        10      4        10       10       10         8         8       dogs
## 6         4      1         1       10        6        10         9       dogs
##   tuhoburapref calsports is.female long.hair have.water shoe.size height
## 1       horses       Yes       Yes       Yes        Yes       8.0     64
## 2       horses       Yes       Yes       Yes        Yes       9.5     70
## 3       horses       Yes       Yes       Yes        Yes       6.0     64
## 4       horses        No       Yes        No         No       7.0     63
## 5         rats       Yes       Yes       Yes         No       8.5     66
## 6  butterflies       Yes       Yes       Yes        Yes       8.0     68
nrow(mini)
## [1] 123
Choose one continuous (numeric) variable from this dataset.
Graph the variable using the
hist() function, and use arguments to change the color of this graph, the labels of this graph,
and the title of the graph.
Paste this graph into your lab.
Below the histogram, use R to
report the mean, median, standard deviation, and range for this variable. Then describe the
shape of this distribution (e.g., normal / skew / kurtosis) and what you learn about our class
from this graph.
names(mini)
##  [1] "fb.friends"      "insta.followers" "insta.follow"    "bored"
##  [5] "thirsty"         "tired"           "satisfied"       "oski.love"
##  [9] "r.love"          "socmeduse"       "data.pow"        "corp.pow"
## [13] "hard.work"       "privilege"       "catdogpref"      "tuhoburapref"
## [17] "calsports"       "is.female"       "long.hair"       "have.water"
## [21] "shoe.size"       "height"
hist(mini$corp.pow,
     main = "A Histogram",                                    # title of the graph
     xlab = "Perceptions of Corporate Power Over User Data",  # x-axis label
     col = "black",                                           # fill color of the bars
     border = "white")                                        # outline color of the bars
[Figure: histogram titled "A Histogram" (x-axis: Perceptions of Corporate Power Over User Data, y-axis: Frequency).]
describe(mini$corp.pow)
##    vars   n mean   sd median trimmed  mad min max range  skew kurtosis   se
## X1    1 123 7.93 1.75      8    8.15 1.48   2  10     8 -1.04     1.29 0.16
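The mean is 7.93, the median is 8, the standard deviation is 1.75, and the range is 8 (from 2 to 10). The graph is very negatively / left skewed (skew = -1.04) - this means the majority of students believe that corporations hold power over our user data, with only a few students giving low ratings.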
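If you want to check the direction of skew yourself, here is a rough sketch using one common moment-based definition of skewness. The ratings vector is made up (not the class data), and describe() may apply a slightly different small-sample adjustment, so don’t expect the numbers to match it exactly.

```r
# Made-up 1-10 ratings bunched at the high end (not the class data)
ratings <- c(10, 9, 9, 8, 8, 8, 8, 7, 7, 6, 4, 2)

m  <- mean(ratings)
m2 <- mean((ratings - m)^2)      # second central moment
m3 <- mean((ratings - m)^3)      # third central moment

skew <- m3 / m2^1.5              # moment-based skewness; negative = long left tail
skew
mean(ratings) < median(ratings)  # the left tail pulls the mean below the median
```

A negative value goes with a long tail on the left, which is the same pattern as the corp.pow histogram.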
Now, choose a categorical (non-numeric) variable from this dataset. Graph the variable using
the plot() function, and use the summary() function to report the number of people in each
group. Below the graph, describe what you learn about our class from the graph.
plot(mini$tuhoburapref)
[Figure: bar plot of mini$tuhoburapref with one bar each for butterflies, horses, rats, and turtles.]
summary(mini$tuhoburapref)
## butterflies      horses        rats     turtles
##          31          44          15          33
Our class is a fan of horses (44 of 123 students) and does not appear to like rats very much (only 15 students).
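Counts are easier to compare as proportions. Here is a sketch that rebuilds a factor from the four counts reported by summary() above (pref is a hypothetical stand-in for mini$tuhoburapref) and converts the counts to shares of the class.

```r
# Rebuild a factor from the counts reported by summary() above
pref <- factor(rep(c("butterflies", "horses", "rats", "turtles"),
                   times = c(31, 44, 15, 33)))

counts <- table(pref)         # same counts as summary() on a factor
props  <- prop.table(counts)  # share of the class in each group

counts
round(props, 2)
names(which.max(counts))      # most popular category
```

With the real data you would run table() and prop.table() directly on the column instead of rebuilding it.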
Problem 3
Load the covid_behavior_data.csv dataset into R, check to make sure it loaded correctly, and
report the sample size.
covid <- read.csv("~/Dropbox/Teaching Datasets/covid_behavior_data.csv", stringsAsFactors = T)
head(covid)
##   age Handwash Mask Sanitize SocialDistance SelfIsolate gender ethnicity
## 1  NA       NA   NA       NA             NA          NA   <NA>      <NA>
## 2  41       NA   NA       NA             NA          NA      M         W
## 3  52        4    4        4              4           4      W         W
## 4  60       NA   NA       NA             NA          NA      M         W
## 5  39       NA   NA       NA             NA          NA      W         W
## 6  28        1    3        0              2           0      M         W
##   political_party     EXTRA     AGREE    CONSC    NEGEM OPENN
## 1            <NA>        NA        NA       NA       NA    NA
## 2               R        NA        NA       NA       NA    NA
## 3               R 0.3333333 4.0000000 3.333333 2.666667     4
## 4               R        NA        NA       NA       NA    NA
## 5               R        NA        NA       NA       NA    NA
## 6               R 2.3333333 0.3333333 2.000000 1.000000     1
nrow(covid)
## [1] 842
Looks like the data loaded correctly.
Choose one continuous (numeric) variable from the dataset and one categorical variable from
the dataset - use the codebook as a guide for what these variables mean. Graph each variable.
Report descriptive statistics for the continuous variable and frequency of the levels for the
categorical variable. Then, describe what you learn about the people in this dataset from each
graph.
par(mfrow = c(1, 2)) # splitting my graphics window into a 1x2 grid
hist(covid$CONSC)
plot(covid$ethnicity)
[Figure: two side-by-side plots. Left: "Histogram of covid$CONSC" (x-axis: covid$CONSC, y-axis: Frequency). Right: bar plot of covid$ethnicity with one bar each for AA, EA, O, and W.]
library(psych) # I installed this package in a previous R sesh.
describe(covid$CONSC)
##    vars   n mean   sd median trimmed  mad  min max range skew kurtosis   se
## X1    1 405 2.96 0.93      3    3.06 0.99 0.33   4  3.67 -0.7    -0.42 0.05
summary(covid$CONSC)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
##  0.3333  2.3333  3.0000  2.9646  3.6667  4.0000     437
Participants’ conscientiousness scores are negatively skewed (skew = -0.7), with scores piled up toward the high end of the scale; I would have expected something looking more normal given what I know about personality differences (which tend to be normal). This makes me think there’s something weird going on in the data (maybe the data are incorrect?) or the sample (maybe there’s something different about these people that makes them super conscientious...).
Participants also do not seem representative of the ethnic breakdown of the US; lots of white people in the data, not many people of color, and latino/a/x people are not represented (or are lumped into some other category??). This seems problematic if we want to use these data to make a claim about what people in general are like. Though, to be honest, this is fairly standard in psychological research; we will talk more about this in a few weeks, so stay tuned!
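One more thing worth checking before trusting the histogram: the summary() output above reports 437 NA's out of 842 rows, so just over half of the conscientiousness scores are missing. Here is a sketch of how to count missing values, using a made-up vector (consc is hypothetical, not the covid data).

```r
# Made-up scores with missing values (not the covid data)
consc <- c(3.33, NA, 2.00, NA, NA, 4.00, 2.67, NA)

n_missing   <- sum(is.na(consc))   # how many values are NA
pct_missing <- mean(is.na(consc))  # proportion of values that are NA
n_missing
pct_missing

437 / 842                          # proportion missing implied by the summary() output
```

This works because is.na() returns TRUE/FALSE values, which R treats as 1/0 when summing or averaging.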
Problem 4
I’m not going to do this for the lab key. But cool that you did!
Problem 5
OPTIONAL CHALLENGE : Use Google to find a way to calculate the mode in R. Check that
your method works by defining two variables in R - one with a set of numbers that has one
mode, and one with a set of numbers that has two modes. [For example : variable1 <- c(1, 1,
2) has two modes]. Use the mode function on each variable to confirm that the mode function
works. Screenshot your code and output to calculate and test the mode, with a link to where
you found the code.
I found the following code by googling “calculate mode in R”: https://stackoverflow.com/questions/2547402/how-to-find-the-statistical-mode/8189441#8189441
Modes <- function(x) {
  ux <- unique(x)
  tab <- tabulate(match(x, ux))
  ux[tab == max(tab)]
}
I’ll test the code to see if it works.
modetest1 <- c(1, 1, 1, 3, 4, 6, 42)
modetest2 <- c(1, 1, 1, 3, 4, 6, 42, 42, 42)
Modes(modetest1)
## [1] 1
Modes(modetest2)
## [1]  1 42
It works. Yay.
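A quick note on why a custom function is needed at all: base R does have a function called mode(), but it reports how a value is stored (e.g., "numeric"), not the most common value. The sketch below repeats the Modes function so it runs on its own and tries it on a few extra made-up vectors.

```r
# Same function as above, repeated so this block runs on its own
Modes <- function(x) {
  ux <- unique(x)                 # distinct values
  tab <- tabulate(match(x, ux))   # count of each distinct value
  ux[tab == max(tab)]             # every value tied for the highest count
}

mode(c(1, 1, 2))                  # base R: storage mode, not the most common value
Modes(c(1, 1, 2))                 # one mode
Modes(c(1, 1, 2, 2))              # two modes
Modes(c("dogs", "cats", "dogs"))  # also works on character vectors
```

Because the function returns every value tied for the top count, it handles the two-mode case without any extra code.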