Lab 10
11/8
Goals
Paired and two-sample t-tests in R.
See the Quick Summary of Chapter 12 of Whitlock and Schluter and print out the XXXXXX
Part 1: Paired t-test – Mosquitoes and malaria
For the first part of the lab, we are going to examine some data on whether or not mosquitoes
preferentially bite people who have malaria (which is transmitted by mosquitoes). Researchers
conducted a series of trials in which a patient with malaria and two uninfected people were put
in a controlled environment. Then the researchers released a mosquito and measured which
person attracted the mosquito first.
Our data set is from a series of 100 tests (each with a different mosquito). The number of
mosquitoes attracted to the patient with malaria was recorded as the “before” value. Next, the
patient was treated for malaria. When the patient was no longer showing symptoms of the
disease, the experiment was repeated using the same three people. In this “after” treatment,
none of the three people showed signs of having malaria. Once again, the preferences of 100
mosquitoes were tested, and the number of mosquitoes that were attracted to the patient that
had been treated for malaria was recorded as the “after” measurement.
A pair of “before” and “after” measurements constitutes a single trial. These measurements
represent a natural pairing because they record the number of mosquitoes that a particular
person attracted, and the only factor that varies between the measurements is whether the
person was showing symptoms of malaria (the “before” measurement) or not (the “after”
measurement). To control for the fact that different people may be more or less attractive to
mosquitoes (regardless of the person's malarial status), a total of twelve patients with malaria
were examined in this way.
Question 1. What are the appropriate null and alternative hypotheses for this experiment?
Step 1: Get the data into an RStudio or http://www.r-fiddle.org/#/ session:
raw = "group,before,after
A,19,40
B,77,8
C,11,50
D,28,40
E,42,0
F,85,7
G,90,23
H,12,18
I,78,14
J,50,25
K,62,9
L,50,41"
fakefile = textConnection(raw)
data = read.csv(fakefile, header=TRUE)
print(data)
Step 2: Make a histogram of the change in the number of mosquitoes attracted by executing:
hist(data$before - data$after, xlab="Change in # of mosquitoes (before - after)") ;
Use the histogram to get a sense of the magnitude of the differences between the number of
mosquitoes attracted to a person when the person had malaria and the number attracted when
the person did not have malaria.
Question 2. If we find evidence that the difference is a positive number, what will our biological
conclusion be?
Based on visual inspection, do you think that there will be significant evidence to reject the null
hypothesis that the infection status of a person does not affect his/her attractiveness to
mosquitoes? (It is OK if this is a guess, just think a bit about whether the data seem to be clear
about whether or not there is an effect of infection status.)
Step 3: We are interested in whether the difference between the before-treatment and
after-treatment data is significantly different from 0. Because we have a pair of measurements that
are naturally linked to each other (they come from trials that involved the same group of three
people), this problem is an example of a paired t-test. Recall that when we do a paired t-test, we
are effectively just doing a one-sample t-test on the difference between each member of the
pair. So we will use the same type of R command that we used last week:
t.test(data$before - data$after, mu=0) ;
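The claim that a paired t-test is a one-sample t-test on the differences can be checked by hand. The following is a minimal sketch, not part of the original lab: it re-enters the before and after columns as plain vectors so it runs on its own, and the variable names (d, se_d, t_by_hand, p_by_hand, builtin) are illustrative choices. It computes the t-statistic as the mean difference divided by its standard error and compares it to what t.test reports.

```r
# Before/after counts copied from the Step 1 data (patients A through L).
before <- c(19, 77, 11, 28, 42, 85, 90, 12, 78, 50, 62, 50)
after  <- c(40, 8, 50, 40, 0, 7, 23, 18, 14, 25, 9, 41)

# A paired t-test works on the per-patient differences.
d <- before - after
n <- length(d)

# One-sample t-statistic for H0: mean difference = 0.
se_d <- sd(d) / sqrt(n)              # standard error of the mean difference
t_by_hand <- (mean(d) - 0) / se_d    # t = (dbar - mu0) / SE

# Two-sided P-value from the t distribution with n - 1 degrees of freedom.
p_by_hand <- 2 * pt(-abs(t_by_hand), df = n - 1)

# The built-in one-sample test on the differences should agree.
builtin <- t.test(d, mu = 0)
print(c(t_by_hand = t_by_hand, t_builtin = unname(builtin$statistic)))
print(c(p_by_hand = p_by_hand, p_builtin = builtin$p.value))
```

If the two printed pairs match, the by-hand calculation and R's test are doing the same arithmetic.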
Question 3. What do you conclude about whether or not malaria infection makes a person more
attractive to mosquitoes?
Note that you can also conduct the test by passing in both vectors (rather than a vector of
differences) by using:
t.test(data$before, data$after, mu=0, paired=TRUE) ;
Question 4. Did you get the same result when you performed the paired t-test this way?
Step 4: For a paired t-test, it can be convenient to look at a scatterplot of the pairs of
measurements with the equality line plotted:
axr = c(0, max(c(data$before, data$after)))
plot(data$before, data$after, xlab="before", ylab="after", xlim=axr,
ylim=axr);
abline(0, 1);
The axr line creates a vector containing 0 and the largest value in the dataset. We use the xlim
and ylim arguments to tell R to plot this range along both axes. The abline function takes the
y-intercept (0) and slope (1) of a line to overlay on the plot.
Part 2: Two-sample t-tests - London taxi driver study
Maguire et al. (2000) studied whether obtaining spatial knowledge changes people's brains:
“Taxi drivers in London must undergo extensive training, learning how to navigate
between thousands of places in the city. This training is colloquially known as
‘being on The Knowledge’ and takes about 2 years to acquire on average. To be
licensed to operate, it is necessary to pass a very stringent set of police
examinations. London taxi drivers are therefore ideally suited for the study of
spatial navigation.”
The hippocampus is a region of the brain that is known to have a role in facilitating spatial
memory. In particular, the posterior region of the hippocampus has been hypothesized to be
associated with spatial memory. Maguire et al. (2000) compared the size of the hippocampus in
people who had been taxi drivers for a long time (and presumably had acquired a large amount
of spatial information), versus those who have just become taxi drivers (and presumably have
not yet learned all of the spatial information). The size of brain regions of taxi drivers was
measured using MRI.
Ideally, we would analyze this data in the form of a test of the relationship between two
continuous, numerical variables - we would compare the length of time someone has been a
taxi driver to the size of that person's hippocampus. We have not covered the statistical
methods for such an analysis yet. For today's lab, we will “bin” the amount of time that someone
has been a taxi driver into two groups: those who have been taxi drivers less than 15 years, and
those who have been taxi drivers for at least 15 years. This creates two populations of subjects
for the study. Maguire et al. (2000) were interested in whether aspects of the brain differ
between these groups.
Step 5: Get the data in R’s memory by executing:
raw = "months.driving,post.hippo.diff,ant.hippo
54,-2.4,110.68
60,0.35,97.55
62,-0.44,95.85
67,-3.53,114.55
72,-1.68,90.29
90,-4.05,115.63
102,-1.33,114.08
139,-4.77,111.77
223,-3.88,71.59
258,0,103.58
258,-2.2,71.59
272,0.07,84.26
272,1.41,69.89
340,3.09,85.96
349,3.98,96.93" ;
fakefile = textConnection(raw) ;
data = read.csv(fakefile, header=TRUE) ;
print(data) ;
Question 5. What are the appropriate null and alternative hypotheses?
Step 6: The first column of the worksheet is the number of months that the subject has been a
taxi driver. The next two columns are measurements of the volume of the hippocampus. The
post.hippo.diff column is the volume of the posterior part of the hippocampus (in mm³) minus
the average volume of that part of the brain in a person of the same age. Larger numbers
correspond to larger posterior portions of the hippocampus; negative numbers indicate a
smaller-than-average posterior hippocampus.
The measurements for the anterior portion did not require age-standardization, so those data (in
the ant.hippo column) are the actual volumes of that region.
Question 6. Are these data paired (can we use a paired t-test)?
Step 7: Let’s see if there is any hint of a relationship between the amount of time as a taxi driver
and the post.hippo.diff by making a scatterplot:
plot(data$months.driving, data$post.hippo.diff,
xlab="months as taxi driver",
ylab="Posterior hippocampus difference.") ;
Because the researchers believe that experience as a taxi driver causes the hippocampus to
change, we put the months spent as a driver on the X-axis (explanatory variable) and the
posterior hippocampus volume on the Y-axis (response variable). Later in the course, we will
discuss how to compare two continuous variables (like the brain volume measurements and the
number of months). For now, we will bin the data into two groups: those subjects who have been
taxi drivers for less than 15 years (180 months) vs. those who have been driving taxis for at
least 180 months. First, let's create two variables to hold the posterior hippocampus data for
the two groups (lt for the <15 years group, and geq for the >=15 years group):
lt = data$post.hippo.diff[data$months.driving < 180] ;
geq = data$post.hippo.diff[data$months.driving >= 180] ;
We can visualize the differences between the groups using a pair of boxplots:
boxplot(lt, geq, names = c("<15 yrs", ">= 15 yrs"),
ylab="post.hippo.diff") ;
Question 7. Based on the boxplots, do you think that we will find a significant difference in the
size of the hippocampus between those drivers with < 15 years experience and those with >= 15
years experience?
Step 8: Let’s take a look at some summary stats:
require(pastecs) ;
stat.desc(lt) ;
stat.desc(geq) ;
If you see a message which ends in something like “there is no package called ‘pastecs’” then
you will need to execute the command
install.packages("pastecs");
And then try the previous 3 lines again.
Question 8. Use the summary statistics that you displayed to calculate a t-statistic for the
difference in means for the posterior hippocampus (comparing the difference to an expected
value under the null of zero). What is the value of the t-statistic?
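Recall that:
$$s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}$$
$$SE_{\bar{Y}_1 - \bar{Y}_2} = \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$$
$$t = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{SE_{\bar{Y}_1 - \bar{Y}_2}}$$
$$df = n_1 + n_2 - 2$$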
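The pooled-variance formulas above translate almost line for line into R. The sketch below is an illustration, not part of the original lab: pooled_t is a hypothetical helper name, and the vectors x and y are made-up numbers (not the taxi driver data) so that the lab calculation is still left to you. It assumes the null hypothesis that the difference in population means is 0.

```r
# Hypothetical helper: two-sample t-statistic with a pooled variance.
pooled_t <- function(y1, y2) {
  n1 <- length(y1)
  n2 <- length(y2)
  df <- n1 + n2 - 2

  # Pooled sample variance: df-weighted average of the two sample variances.
  sp2 <- ((n1 - 1) * var(y1) + (n2 - 1) * var(y2)) / df

  # Standard error of the difference between the two sample means.
  se <- sqrt(sp2 * (1 / n1 + 1 / n2))

  # t-statistic; the null value of (mu1 - mu2) is 0.
  t <- ((mean(y1) - mean(y2)) - 0) / se

  list(sp2 = sp2, se = se, t = t, df = df)
}

# Made-up example values, for illustration only.
x <- c(4.1, 5.3, 3.8, 6.0, 5.1)
y <- c(6.2, 7.1, 5.9, 6.8)

res <- pooled_t(x, y)
print(res)

# Cross-check against R's equal-variance two-sample t-test.
check <- t.test(x, y, var.equal = TRUE)
print(check$statistic)
```

Note that the sign of t depends on which group is passed first, since the function subtracts the mean of y2 from the mean of y1.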
Question 9. Set a variable called df to hold the correct degrees of freedom and use:
qt(0.975, df=df)
to find the critical value.
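In R, qt() and pt() are inverses of each other for the t distribution: qt() takes a cumulative probability and returns the t-value (quantile) at that probability, while pt() takes a t-value and returns the cumulative probability below it. A critical value is a quantile. The sketch below uses an arbitrary df_example of 10, chosen only for illustration and not the degrees of freedom for this lab.

```r
# Arbitrary degrees of freedom for illustration (not the lab's value).
df_example <- 10

# Quantile function: the t-value with 97.5% of the distribution below it.
# This is the two-sided critical value for alpha = 0.05.
t_crit <- qt(0.975, df = df_example)
print(t_crit)

# Distribution function: the probability below a given t-value.
# Feeding the critical value back in recovers 0.975.
prob_below <- pt(t_crit, df = df_example)
print(prob_below)

# pt() applied to 0.975 treats 0.975 as a t-value and returns a probability,
# so the result always lies between 0 and 1 and cannot be a critical value.
not_a_critical_value <- pt(0.975, df = df_example)
print(not_a_critical_value)
```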
Question 10. Which population mean did you subtract in order to calculate the t-statistic? If you
were to get a positive value for the difference in means, what would this indicate biologically?
(Would it appear that taxi driving increased or decreased the size of the posterior
hippocampus?)
Step 8: Recall that in this course we are not going to worry about Welch's t-test (which
does not assume equal variances). So to check your work in R, use the version of the t-test that
assumes that the variances in the 2 populations are equal:
t.test(lt, geq, var.equal = TRUE);
Question 11. What is the P-value reported by R? Make sure that the t-statistic agrees with the
one you calculated for Question 8.
Question 12. What should we conclude from this study?
Step 9: Do the data appear to fit the assumptions of normality? Execute the commands:
qqnorm(lt) ;
qqnorm(geq) ;
You'll see a plot similar to the normal quantile plot that we discussed in lecture. R puts the
quantiles from your data on the y-axis and the quantiles of the standard normal on the x-axis.
Does the plot appear to be approximately linear?
These data are not obviously non-normal, so we would usually just conduct the t-test as we
have just done above. If normality tests had shown strong deviations from normality, then
transforming the data or using a non-parametric test (such as the Mann-Whitney U-test) would
be recommended.
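For reference, the fallback mentioned above can be sketched in base R. This is an illustration rather than a required part of the lab: it assumes the Shapiro-Wilk test (shapiro.test) as the formal normality test, which the handout does not name, and it uses wilcox.test, which is how R provides the Mann-Whitney U-test for two independent samples. The lt and geq vectors are re-entered from the Step 5 data so the block runs on its own.

```r
# Posterior hippocampus differences, split at 180 months as in Step 7.
lt  <- c(-2.4, 0.35, -0.44, -3.53, -1.68, -4.05, -1.33, -4.77)
geq <- c(-3.88, 0, -2.2, 0.07, 1.41, 3.09, 3.98)

# Shapiro-Wilk normality test for each group (an assumed choice of test).
# A small P-value would suggest a departure from normality.
sw_lt  <- shapiro.test(lt)
sw_geq <- shapiro.test(geq)
print(c(lt = sw_lt$p.value, geq = sw_geq$p.value))

# Mann-Whitney U-test; R calls this the Wilcoxon rank-sum test.
mw <- wilcox.test(lt, geq)
print(mw)
```

With samples this small, normality tests have little power, so the quantile plots remain the more informative check.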
Step 10: Now, following the same procedure as above, use R to perform a two-sample t-test for
the anterior portion of the hippocampus.
Question 13. Are the results similar to what was found earlier for the posterior hippocampus?
What is the P-value for this comparison?
References: Maguire, Eleanor A., et al. "Navigation-related structural change in the hippocampi
of taxi drivers." Proceedings of the National Academy of Sciences 97.8 (2000): 4398-4403.